Category: benchmarks

Not even Pokémon is safe from AI benchmarking controversy. Last week, a post on X went viral, claiming that Google’s latest Gemini model surpassed Anthropic’s flagship Claude model in the original Pokémon video game trilogy. Reportedly, Gemini had reached Lavender Town in a developer’s Twitch stream; Claude was stuck at Mount Moon as of late…

People are using Super Mario to benchmark AI now

March 3, 2025

Did xAI lie about Grok 3’s benchmarks?

February 22, 2025

Debates over AI benchmarks, and how they’re reported by AI labs, are spilling out into public view. This week, an OpenAI employee accused Elon Musk’s AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of the co-founders of xAI, Igor Babuschkin, insisted that the company was…

AI isn’t very good at history, new paper finds

January 19, 2025

AI researcher François Chollet is co-founding a nonprofit to build benchmarks for AGI

January 8, 2025

Former Google engineer and influential AI researcher François Chollet is co-founding a nonprofit to help develop benchmarks that’ll probe AI for “human-level” intelligence. The nonprofit, the ARC Prize Foundation, will be led by Greg Kamradt, an ex-Salesforce engineering director and founder of the AI product studio Leverage. Kamradt will serve as president and a member…

Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024

December 31, 2024

When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti. It’s become something of a meme as well as a benchmark: Seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself…

AI, benchmarks, Global Security News, pokemon

Debates over AI benchmarking have reached Pokémon

April 14, 2025

AI, benchmarks, games, Gaming, Global IT News, Global Security News, super mario bros