The Sequence Radar: From Model to Team: Several Models are Better than One: Sakana’s Blueprint for Collective AI

Sakana's new framework combines different models seamlessly at inference time.

Created Using GPT-4o

Next Week in The Sequence:

  • Knowledge: We explore AI evals for creativity.

  • Engineering: We dive into Amazon’s Strands agentic framework.

  • Opinion: We discuss the limits of autonomy in AI agents.

  • Research: We dive into Sakana AI’s new AB-MCTS method.

Let’s Go! You can subscribe to The Sequence below:

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: Several Models are Better than One: Sakana’s Blueprint for Collective AI

Sakana AI has rapidly emerged as one of my favorite and most innovative AI research labs in the current landscape. Founded by former Google Brain and DeepMind researchers, the lab has already made headlines with its work on evolutionary methods, model orchestration, and adaptive reasoning.

Now they have released an impressive new inference-time framework.

In the latest evolution of inference-time AI, Sakana AI has introduced a compelling framework that pushes beyond traditional single-model reasoning: Adaptive Branching Monte Carlo Tree Search, or AB-MCTS. At its core, AB-MCTS reflects a philosophical shift in how we think about large language model (LLM) reasoning. Rather than treating generation as a flat, linear process, Sakana’s approach reframes inference as a strategic exploration through a search tree—navigating between depth (refinement of existing ideas), width (generation of new hypotheses), and even model selection itself. The result is a system that begins to resemble collaborative, human-like thinking.

AB-MCTS is an inference-time algorithm grounded in the principles of Monte Carlo Tree Search, a method historically associated with planning in board games like Go. Sakana adapts this mechanism to textual reasoning by using Thompson Sampling to decide whether to continue developing a promising response or branch into an unexplored avenue. This adaptive process means the system is no longer bound to fixed temperature sampling or deterministic prompting. Instead, it engages in a kind of probabilistic deliberation, allocating its computational resources to the most promising parts of the solution space as determined in real-time.
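To make the decision rule concrete, here is a minimal, illustrative Thompson-sampling loop for the "go wider vs. go deeper" choice at a single tree node. It assumes rewards are thresholded to 0/1 and modeled with Beta posteriors; Sakana's actual formulation works with continuous evaluator scores and richer node statistics, so treat this as a conceptual sketch rather than the algorithm itself.

```python
import random

class WidenOrDeepenSampler:
    """Thompson sampling over the two actions AB-MCTS weighs at a node:
    'widen' (generate a brand-new candidate answer) vs. 'deepen' (refine an
    existing one). Rewards are thresholded to 0/1 and modeled with Beta
    posteriors -- a simplification of AB-MCTS, which scores candidates
    continuously."""

    def __init__(self):
        # Beta(1, 1) priors: start with no preference between the actions.
        self.alpha = {"widen": 1.0, "deepen": 1.0}
        self.beta = {"widen": 1.0, "deepen": 1.0}

    def choose_action(self) -> str:
        # Sample one plausible success rate per action; act greedily on the draws.
        draws = {a: random.betavariate(self.alpha[a], self.beta[a]) for a in self.alpha}
        return max(draws, key=draws.get)

    def update(self, action: str, reward: float) -> None:
        # Count a "success" when the evaluator score clears a fixed threshold.
        if reward >= 0.5:
            self.alpha[action] += 1.0
        else:
            self.beta[action] += 1.0


# Toy usage: pretend refinement pays off 70% of the time and widening 30%.
sampler = WidenOrDeepenSampler()
for _ in range(200):
    action = sampler.choose_action()
    success_rate = 0.7 if action == "deepen" else 0.3
    sampler.update(action, 1.0 if random.random() < success_rate else 0.0)
print(sampler.alpha)  # "deepen" accumulates far more successes
```

Over a fixed budget of calls, the sampler shifts compute toward whichever move keeps paying off, which is the same exploration–exploitation trade-off AB-MCTS manages across the whole search tree.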

But the real breakthrough lies in the extension: Multi-LLM AB-MCTS. In this paradigm, multiple LLMs—including frontier models like OpenAI’s o4-mini, Google DeepMind’s Gemini 2.5 Pro, and DeepSeek’s R1—are orchestrated into a dynamic ensemble. At each point in the reasoning tree, the system not only decides what to do next (go deeper or go wider) but also who should do it. This introduces a novel third axis to inference: model routing. Initially unbiased, the system learns to favor models that historically perform better on certain subtasks, effectively turning a collection of models into a coherent team.
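The same bandit machinery extends naturally to that third axis. The sketch below treats model selection as Thompson sampling over per-model Beta posteriors; the model names are placeholders for whatever backends are registered, `call_llm` and `evaluator_score` in the trailing comment are hypothetical stand-ins, and the real Multi-LLM AB-MCTS folds this choice into the widen/deepen search rather than running it as a standalone router.

```python
import random

# Placeholder identifiers for whatever LLM backends are registered with the search.
MODELS = ["o4-mini", "gemini-2.5-pro", "deepseek-r1"]

class ModelRouter:
    """Thompson sampling over *which* model expands the next node: a simplified
    stand-in for the model-selection axis of Multi-LLM AB-MCTS. Each model keeps
    a Beta posterior over its chance of producing a high-scoring candidate, so
    routing starts unbiased and gradually favors whichever model keeps
    succeeding on the task at hand."""

    def __init__(self, models):
        self.alpha = {m: 1.0 for m in models}
        self.beta = {m: 1.0 for m in models}

    def pick_model(self) -> str:
        draws = {m: random.betavariate(self.alpha[m], self.beta[m]) for m in self.alpha}
        return max(draws, key=draws.get)

    def update(self, model: str, succeeded: bool) -> None:
        if succeeded:
            self.alpha[model] += 1.0
        else:
            self.beta[model] += 1.0


router = ModelRouter(MODELS)
# In a real loop (hypothetical helpers): model = router.pick_model();
# candidate = call_llm(model, prompt);
# router.update(model, evaluator_score(candidate) >= threshold)
```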

The implications for real-world AI systems are profound. By decoupling capability from a single monolithic model, AB-MCTS provides a path to compositional reliability. Enterprises can now imagine deploying systems where reasoning chains are distributed across specialized models, dynamically assigned at runtime based on contextual performance. This not only improves robustness but opens up opportunities for cost optimization, interpretability, and safety. Moreover, Sakana has open-sourced the framework—dubbed TreeQuest—under Apache 2.0, inviting both researchers and practitioners to integrate it into their pipelines.

What Sakana has achieved with AB-MCTS is a blueprint for how we might scale intelligence not just by increasing parameters or data, but by scaling the search process itself. It borrows from the playbooks of both biological evolution and algorithmic planning, combining breadth, depth, and diversity in a structured, learnable way. In doing so, it reframes LLMs as components of larger reasoning ecosystems—systems that can adapt, deliberate, and even self-correct. The age of collective intelligence at inference-time may just be getting started.

🔎 AI Research

Title: SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

AI Lab: Allen Institute for AI & Yale University
SciArena introduces a community-driven platform for evaluating foundation models on scientific literature tasks using researcher votes, producing a leaderboard based on over 13,000 preference votes. It also proposes SciArena-Eval, a benchmark that assesses how well models can serve as automated evaluators compared to human judgments, revealing substantial gaps between LLM-based and human evaluation accuracy.

Title: Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

AI Lab: Carnegie Mellon University, University of Washington, University of Pennsylvania, The Hong Kong Polytechnic University
This study shows that reinforcement learning (RL) fine-tuning on math reasoning data enhances generalization to other reasoning and non-reasoning tasks, while supervised fine-tuning (SFT) often leads to capability degradation. Using latent-space PCA and token distribution analyses, the authors attribute this to SFT-induced representation drift and propose RL as a more robust training paradigm for transferable intelligence.

Title: Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact

AI Lab: Multi-institutional collaboration including University of Central Florida, Cornell, Vector Institute, Meta, Amazon, Oxford, and others
This review outlines a roadmap toward Artificial General Intelligence (AGI) grounded in cognitive neuroscience, agentic AI, memory, and modular reasoning, highlighting the limitations of token-based models and the importance of world models and agent architectures. It calls for interdisciplinary alignment and socially grounded, explainable, and adaptive systems to move from statistical learning to general-purpose intelligence.

Title: Zero-shot Antibody Design in a 24-well Plate

AI Lab: Chai Discovery Team
The paper presents Chai-2, a multimodal generative model that enables zero-shot design of antibodies and miniproteins with experimentally validated success rates of 16% for antibodies and 68% for miniproteins—orders of magnitude higher than prior methods. Demonstrating the ability to design binders to novel targets without known antibodies, Chai-2 reduces discovery time from months to weeks using as few as 20 experimental designs per target.

Title: Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

AI Lab: Sakana AI
This paper introduces AB-MCTS (Adaptive Branching Monte Carlo Tree Search), a novel inference-time search framework that dynamically decides whether to “go wider” by exploring new answers or “go deeper” by refining existing ones using external feedback, improving LLM performance without additional training. AB-MCTS outperforms repeated sampling and standard MCTS across coding, reasoning, and ML tasks by balancing exploration and exploitation in a principled, budget-aware manner.

Title: Fast and Simplex: 2-Simplicial Attention in Triton

AI Lab: Meta AI and University of Texas at Austin
This paper proposes the 2-simplicial Transformer, a generalization of standard dot-product attention to trilinear attention forms, improving token efficiency and scaling behavior for reasoning, coding, and mathematical tasks. Through a custom Triton kernel implementation, the model exhibits more favorable scaling exponents under token constraints, outperforming traditional Transformers on key benchmarks such as MMLU and GSM8k.
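For intuition, here is a dense, single-head sketch of the trilinear scoring the paper generalizes attention toward. It is not the paper's Triton kernel: the joint softmax over position pairs and the elementwise-product pair value used below are assumptions chosen to keep the example short, and a practical implementation would tile and mask this computation.

```python
import numpy as np

def two_simplicial_attention(q, k1, k2, v1, v2):
    """Single-head, dense sketch of trilinear (2-simplicial) attention.
    Each query attends to *pairs* of positions (j, k): the logit is the
    trilinear form <q_i, k1_j, k2_k>, the softmax runs jointly over (j, k),
    and the value for a pair is taken as the elementwise product v1_j * v2_k
    (an assumption for illustration). Shapes: all inputs (n, d); output (n, d)."""
    # logits[i, j, k] = sum_d q[i, d] * k1[j, d] * k2[k, d]
    logits = np.einsum("id,jd,kd->ijk", q, k1, k2)
    weights = np.exp(logits - logits.max(axis=(1, 2), keepdims=True))
    weights /= weights.sum(axis=(1, 2), keepdims=True)
    pair_values = np.einsum("jd,kd->jkd", v1, v2)  # value for each (j, k) pair
    return np.einsum("ijk,jkd->id", weights, pair_values)

# Toy check on random inputs: cost is O(n^2 * d) per query, hence the custom kernel.
rng = np.random.default_rng(0)
n, d = 4, 8
out = two_simplicial_attention(*(rng.normal(size=(n, d)) for _ in range(5)))
print(out.shape)  # (4, 8)
```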

🤖 AI Tech Releases

Ernie 4.5

Baidu released the newest version of its marquee Ernie model.

DeepSWE

Together AI open-sourced DeepSWE, a new coding agent based on Qwen3.

🛠 AI in Production

Agent Auditing at Salesforce

Salesforce discusses the architecture powering Agentforce’s auditing capabilities.

📡 AI Radar



