Study shows AI struggles with NYT Connections game despite advanced capabilities


A recent study led by Tuhin Chakrabarty, an assistant professor in Stony Brook's Department of Computer Science, in collaboration with researchers from Columbia University, examined how well AI models handle abstract reasoning challenges. The research focused on the New York Times word game 'Connections,' which serves as a benchmark for testing Large Language Models (LLMs).

Despite the prowess of AI and machine learning in defeating top chess players, the study found that even the best-performing LLM tested, Claude 3.5 Sonnet, could fully solve only 18% of 'Connections' games. The finding is based on an analysis of more than 400 games, in which both novice and expert human players outperformed the AI models.

In 'Connections,' players must organize a 4x4 grid of 16 words into four groups based on shared characteristics. For instance, words like 'Followers,' 'Sheep,' 'Puppets,' and 'Lemmings' can be grouped as 'Conformists.' Success in this task requires reasoning across various knowledge forms, including semantic and encyclopedic understanding.
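To make the notion of "fully solving" a game concrete, here is a minimal Python sketch of how a proposed grouping could be checked against a puzzle's solution. It is an illustration only, not the scoring code used in the study: the function names `normalize` and `score` are hypothetical, and apart from the 'Conformists' group quoted above, the example words are invented placeholders.

```python
from __future__ import annotations

from typing import Iterable


def normalize(groups: Iterable[Iterable[str]]) -> set[frozenset[str]]:
    """Turn a list of word groups into a set of case-insensitive frozensets,
    so that neither word order nor group order affects the comparison."""
    return {frozenset(word.strip().lower() for word in group) for group in groups}


def score(proposed: Iterable[Iterable[str]],
          solution: Iterable[Iterable[str]]) -> tuple[int, bool]:
    """Return (number of groups that exactly match the solution, fully solved?)."""
    proposed_groups = normalize(proposed)
    solution_groups = normalize(solution)
    correct = len(proposed_groups & solution_groups)
    return correct, correct == len(solution_groups)


# Only the first group comes from the article; the other three are
# made-up placeholders chosen so that 'Gold' and 'Lead' act as red herrings.
solution = [
    ["Followers", "Sheep", "Puppets", "Lemmings"],  # Conformists
    ["Apple", "Pear", "Plum", "Fig"],               # hypothetical: fruits
    ["Red", "Blue", "Green", "Gold"],               # hypothetical: colors
    ["Iron", "Tin", "Zinc", "Lead"],                # hypothetical: metals
]

# A guess that gets two groups right but swaps 'Gold' and 'Lead'.
guess = [
    ["Sheep", "Lemmings", "Followers", "Puppets"],
    ["Fig", "Plum", "Pear", "Apple"],
    ["Red", "Blue", "Green", "Lead"],
    ["Iron", "Tin", "Zinc", "Gold"],
]

groups_correct, fully_solved = score(guess, solution)
print(groups_correct, fully_solved)
```

Under this all-or-nothing check, a guess counts as a full solve only when all four groups match exactly, which is why a model can recover some groups in a puzzle and still not be credited with solving it.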

Chakrabarty explained, "While the task might seem easy to some, many of these words can be easily grouped into several other categories." He noted that these alternative groupings act as red herrings, deliberately included to make the game harder.

The research found that LLMs are relatively strong at tasks involving semantic relations but struggle with other kinds of knowledge, such as multiword expressions and knowledge that combines word form and meaning. Five LLMs were tested: Google's Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet, OpenAI's GPT-4o, Meta's Llama 3.1 405B, and Mistral Large 2 (Mistral-AI, 2024). The results indicated that while these models could partially solve some puzzles, their overall performance fell short of that of human players.

For further details on this study, readers can visit the AI Innovation Institute website.
