AI benchmarks systematically ignore how humans disagree, Google study finds

A Google study finds that the standard three to five human raters per test example often aren't enough for reliable AI ...

the-decoder

OpenAI reshuffles leadership as health issues force key executives to step back

Several leadership changes are underway at OpenAI. Fidji Simo, CEO of the newly created "AGI Deployment" division, is taking sick leave for several weeks to deal with an autoimmune disease affecting ...

Alibaba's Qwen team makes AI models think deeper with new algorithm

Alibaba's Qwen team has developed a new training algorithm for reasoning models that assigns different weights to individual tokens based on how much each step influences the subsequent chain of ...

Anthropic says Claude Code's usage drain comes down to peak-hour caps and ballooning contexts

Anthropic has looked into complaints from users who were hitting their Claude Code usage limits much faster than expected. According to Anthropic's Lydia Hallie, tighter limits during peak hours and ...

Microsoft's MAI-Transcribe-1 runs 2.5x faster than its predecessor at $0.36 per audio hour

Microsoft has introduced MAI-Transcribe-1, a speech-to-text model supporting 25 languages that achieves the lowest word error rate of any model tested on the FLEURS ...

AI offensive cyber capabilities are doubling every six months, safety researchers find

AI safety research firm Lyptus Research has published a new study on the offensive cybersecurity capabilities of AI models. The study is based on the METR time-horizon method and involved testing with ...

Anthropic leak reveals new model "Claude Mythos" with "dramatically higher scores on tests" than any previous model

The leaked blog posts have allegedly surfaced online; the information matches what Fortune shared in a follow-up article. There are two versions of the same blog post that only differ in the model's ...

Microsoft rolls out Copilot Cowork more broadly and lets AI models check each other's work

Microsoft is making "Copilot Cowork" more widely available and launching a new AI research agent. The previously announced feature builds on Claude Cowork and lets the system handle multi-step tasks ...

EU bars AI-generated content from official communications, according to Politico

Politico reports that the EU Commission, Parliament, and Council have banned their communications teams from using fully AI-generated videos and images. AI may only be used to optimize existing visual ...

The New York Times drops freelancer whose AI tool copied from an existing book review

The New York Times cut ties with freelance writer Alex Preston after it turned out an AI tool he'd used had copied from an existing book review. Preston was writing a review of Jean-Baptiste Andrea's ...

Anthropic drops the surcharge for million-token context windows, making Opus 4.6 and Sonnet 4.6 far cheaper

Anthropic is making Claude's extra-large context window a lot cheaper. The Opus 4.6 and Sonnet 4.6 models now offer a context window of one million tokens at the standard price. Until now, Anthropic ...
