Manus AI is one of the hottest AI agent startups around, recently raising $75 million at a half-billion-dollar valuation in a round led by Benchmark. But two unnamed sources told Semafor that the investment is now under review by the U.S. Treasury Department over its compliance with 2023 restrictions on investing in Chinese companies.…
Category: Benchmark
Sarah Tavel, Benchmark’s first woman GP, transitions to venture partner
Eight years after joining Benchmark as the firm’s first woman general partner, Sarah Tavel announced on X that she is transitioning to a more limited role at the storied venture firm. In her new position as a venture partner, Tavel will continue to make investments and serve on existing company boards, but she will have…
Chinese AI startup Manus reportedly gets funding from Benchmark at $500M valuation
Chinese startup Manus AI, which works on building tools related to AI agents, has picked up $75 million in a funding round led by Benchmark at a roughly $500 million valuation, according to Bloomberg. The company will use the money to expand to new markets, including the U.S., Japan, and the Middle East, Bloomberg noted,…
Meta’s benchmarks for its new AI models are a bit misleading
One of the new flagship AI models Meta released on Saturday, Maverick, ranks second on LM Arena, a test that has human raters compare the outputs of models and choose which they prefer. But it seems the version of Maverick that Meta deployed to LM Arena differs from the version that’s widely available to developers.…
Anthropic used Pokémon to benchmark its newest AI model
Anthropic used Pokémon to benchmark its newest AI model. Yes, really. In a blog post published Monday, Anthropic said that it tested its latest model, Claude 3.7 Sonnet, on the Game Boy classic Pokémon Red. The company equipped the model with basic memory, screen pixel input, and function calls to press buttons and navigate around the…
These researchers used NPR Sunday Puzzle questions to benchmark AI ‘reasoning’ models
Every Sunday, NPR host Will Shortz, The New York Times’ crossword puzzle guru, gets to quiz thousands of listeners in a long-running segment called the Sunday Puzzle. While written to be solvable without too much foreknowledge, the brainteasers are usually challenging even for skilled contestants. That’s why some experts think they’re a promising way to…
Meta launches new program to improve speech and translation AI
Meta is launching a new program in partnership with UNESCO to collect speech recordings and transcriptions that, the company said, will help develop future openly available AI. The program, the Language Technology Partner Program, is seeking collaborators who can contribute more than 10 hours of speech recordings with transcriptions, large amounts of written text,…
Even some of the best AI can’t beat this new benchmark
The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems. The benchmark, called Humanity’s Last Exam, includes thousands of crowdsourced questions touching on subjects like mathematics, humanities, and the natural sciences. To make…
Decart adds another $32M at a $500M+ valuation
A young startup that emerged from stealth less than two months ago with big-name backers and bigger ambitions to make a splash in the world of AI is returning to the spotlight. Decart is building what its CEO and co-founder Dean Leitersdorf describes as “a fully vertically integrated AI research lab,” alongside enterprise and consumer…