Category: Benchmark

Manus AI is one of the hottest AI agent startups around, recently raising $75 million at a half-billion-dollar valuation in a round led by Benchmark. But two unnamed sources told Semafor that the investment is now under review by the U.S. Treasury Department over its compliance with 2023 restrictions on investing in Chinese companies.…

Sarah Tavel, Benchmark’s first woman GP, transitions to venture partner

April 29, 2025

Eight years after joining Benchmark as the firm’s first woman general partner, Sarah Tavel announced on X that she is transitioning to a more limited role at the storied venture firm. In her new position as a venture partner, Tavel will continue to make investments and serve on existing company boards, but she will have…

Chinese AI startup Manus reportedly gets funding from Benchmark at $500M valuation

April 25, 2025

Chinese startup Manus AI, which works on building tools related to AI agents, has picked up $75 million in a funding round led by Benchmark at a roughly $500 million valuation, according to Bloomberg. The company will use the money to expand to new markets, including the U.S., Japan, and the Middle East, Bloomberg noted,…

Meta’s benchmarks for its new AI models are a bit misleading

April 6, 2025

One of the new flagship AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test that has human raters compare the outputs of models and choose which they prefer. But it seems the version of Maverick that Meta deployed to LM Arena differs from the version that’s widely available to developers.…

Anthropic used Pokémon to benchmark its newest AI model

February 24, 2025

Anthropic used Pokémon to benchmark its newest AI model. Yes, really. In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the…

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

February 16, 2025

Every Sunday, NPR host Will Shortz, The New York Times’ crossword puzzle guru, gets to quiz thousands of listeners in a long-running segment called the Sunday Puzzle. While written to be solvable without too much foreknowledge, the brainteasers are usually challenging even for skilled contestants. That’s why some experts think they’re a promising way to…

Meta launches new program to improve speech and translation AI

February 7, 2025

Meta is launching a new program in partnership with UNESCO to collect speech recordings and transcriptions the company said will help the development of future openly available AI. The program, the Language Technology Partner Program, is seeking collaborators who can contribute more than 10 hours of speech recordings with transcriptions, large amounts of written text,…

These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models

February 5, 2025

Even some of the best AI can’t beat this new benchmark

January 23, 2025

The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems. The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions touching on subjects like mathematics, humanities, and the natural sciences. To make…

Decart adds another $32M at a $500M+ valuation

December 19, 2024

A young startup that emerged from stealth less than two months ago with big-name backers and bigger ambitions to make a splash in the world of AI is returning to the spotlight. Decart is building what its CEO and co-founder Dean Leitersdorf describes as “a fully vertically integrated AI research lab,” alongside enterprise and consumer…

The US is reviewing Benchmark’s investment into Chinese AI startup Manus

May 9, 2025

