Not even Pokémon is safe from AI benchmarking controversy. Last week, a post on X went viral, claiming that Google’s latest Gemini model surpassed Anthropic’s flagship Claude model in the original Pokémon video game trilogy. Reportedly, Gemini had reached Lavendar Town in a developer’s Twitch stream; Claude was stuck at Mount Moon as of late…
Category: pokemon
AI, Anthropic, Benchmark, claude 3.7 sonnet, Gaming, Global IT News, Global Security News, pokemon, pokemon red
Anthropic used Pokémon to benchmark its newest AI model
Anthropic used Pokémon to benchmark its newest AI model. Yes, really. In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the…