At least we can watch AI play Mario. © 2024 TechCrunch. All rights reserved. For personal use only.
Category: benchmarks
AI, benchmarks, Global IT News, Global Security News, Grok, openai, xAI
Did xAI lie about Grok 3’s benchmarks?
Debates over AI benchmarks, and how AI labs report them, are spilling out into public view. This week, an OpenAI employee accused Elon Musk’s AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. One of the co-founders of xAI, Igor Babuschkin, insisted that the company was…
AI, benchmarks, Global IT News, Global Security News, hallucinations, LLMs, Research, TC
AI isn’t very good at history, new paper finds
Top LLMs performed poorly on a high-level history test, a new paper has found.
AGI, AI, arc prize, arc prize foundation, arc-agi, benchmarks, Francois Chollet, generative ai, Global IT News, Global Security News
AI researcher François Chollet is co-founding a nonprofit to build benchmarks for AGI
Former Google engineer and influential AI researcher François Chollet is co-founding a nonprofit to help develop benchmarks that’ll probe AI for “human-level” intelligence. The nonprofit, the ARC Prize Foundation, will be led by Greg Kamradt, an ex-Salesforce engineering director and founder of the AI product studio Leverage. Kamradt will serve as president and a member…
2024, AI, benchmarking, benchmarks, Connect 4, generative ai, Global IT News, Global Security News, Minecraft, pictionary, spaghetti, Will Smith
Will Smith eating spaghetti and other weird AI benchmarks that took off in 2024
When a company releases a new AI video generator, it’s not long before someone uses it to make a video of actor Will Smith eating spaghetti. It’s become something of a meme as well as a benchmark: seeing whether a new video generator can realistically render Smith slurping down a bowl of noodles. Smith himself…