Alibaba's new ZeroSearch framework represents a major milestone in information retrieval.
Next Week in The Sequence:
Our series about evals continues with a review of instruction-following benchmarks. In engineering, we look into the newly released Llama Firewall. Our opinion section explores some new ideas for MCP. Research discusses the controversial paper "The Leaderboard Illusion."
You can subscribe to The Sequence below:
📝 Editorial: Keep an Eye on Alibaba’s ZeroSearch
Alibaba has been one of the AI labs pushing the boundaries of frontier models with its Qwen model family. However, the Chinese internet giant doesn't seem to have plans to stop there.
A few days ago, Alibaba released ZeroSearch, which marks a key technical advancement in retrieval-augmented training for language models. The framework introduces a self-supervised paradigm in which large language models (LLMs) simulate search engine behavior, removing dependence on commercial APIs such as Google Search. This shift not only lowers the financial burden of reinforcement learning-based training but also provides a controlled environment in which to shape the retrieval process. ZeroSearch challenges a core assumption in modern LLM training: that high-quality external search queries are necessary for effective information retrieval and question answering.
At the heart of ZeroSearch is a curriculum-driven training strategy. The system begins with a supervised fine-tuning (SFT) phase, where the LLM is taught to generate both relevant and non-relevant documents for a given query. This serves as the basis for a simulated search environment. Reinforcement learning with proximal policy optimization (PPO) is then applied in increasingly difficult rollouts. The curriculum ensures the model experiences a broad range of query-retrieval interactions, improving its reasoning and grounding without relying on real-time internet access.
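To make the curriculum concrete, here is a minimal, purely illustrative sketch of the idea in Python. All names are our assumptions, not the released framework's API: `simulated_search` stands in for the fine-tuned simulation LLM, and `curriculum_noise` is an assumed linear schedule that degrades document quality as training progresses.

```python
# Purely illustrative sketch of ZeroSearch-style curriculum rollouts;
# none of these names come from the released framework.
import random

def simulated_search(query: str, noise_ratio: float) -> list:
    """Stand-in for the simulation LLM: returns a mix of relevant
    and deliberately noisy documents for the query."""
    docs = []
    for _ in range(5):
        if random.random() < noise_ratio:
            docs.append(f"[noisy document unrelated to: {query}]")
        else:
            docs.append(f"[relevant document for: {query}]")
    return docs

def curriculum_noise(step: int, total_steps: int,
                     start: float = 0.1, end: float = 0.6) -> float:
    """Assumed schedule: raise retrieval difficulty linearly over training."""
    return start + (end - start) * step / total_steps

# During PPO training, each rollout would query this environment instead
# of a live search API; here we just print a few curriculum stages.
for step in (0, 250, 500, 750):
    ratio = curriculum_noise(step, total_steps=1000)
    print(f"step {step}: noise={ratio:.2f}",
          simulated_search("capital of France?", ratio)[0])
```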
ZeroSearch's empirical performance is notable. In evaluation across seven QA benchmarks, a 7B parameter model trained with simulated retrieval performed on par with models trained using actual Google Search data. More impressively, a 14B model trained entirely using ZeroSearch's simulation paradigm outperformed its real-search-trained counterpart. Cost comparisons further validate the design: training with external API calls cost $586.70 for 64K queries, while the simulated approach using a 14B model on four A100 GPUs cost just $70.80—an 88% cost reduction.
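The headline saving is simple arithmetic on the paper's two figures:

```python
# Sanity-checking the reported saving using the figures from the paper.
api_cost = 586.70  # USD for 64K queries through a commercial search API
sim_cost = 70.80   # USD for the same load, simulated by a 14B model on 4x A100s
print(f"Cost reduction: {1 - sim_cost / api_cost:.0%}")  # -> Cost reduction: 88%
```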
A key technical benefit of ZeroSearch is its ability to decouple retrieval quality from search engine output noise. Traditional methods inherit variability and biases from commercial engines, while ZeroSearch enables fine-grained control over retrieval data. This introduces a new axis of optimization in LLM training, where the quality and diversity of retrieved documents can be systematically tuned to support task-specific capabilities such as fact verification, grounded generation, or multi-hop reasoning.
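One way to picture this control, as a hedged sketch rather than anything from the paper: because the "search engine" is itself a prompted model, the trainer can ask for exactly the kind of document a rollout needs, something no live engine guarantees. The templates below are hypothetical.

```python
# Hypothetical prompt templates illustrating fine-grained control over
# retrieval quality; these are ours, not from the ZeroSearch paper.
RELEVANT_TMPL = ("Write a short passage that directly answers the question.\n"
                 "Question: {q}\nPassage:")
NOISY_TMPL = ("Write a short passage that is on-topic but does NOT contain "
              "the answer to the question.\nQuestion: {q}\nPassage:")

def build_retrieval_prompt(query: str, want_relevant: bool) -> str:
    """Select the template that yields the desired document quality."""
    return (RELEVANT_TMPL if want_relevant else NOISY_TMPL).format(q=query)

print(build_retrieval_prompt("Who wrote Dune?", want_relevant=False))
```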
In addition to the methodological innovation, Alibaba has open-sourced the ZeroSearch framework, datasets, and pre-trained checkpoints. This decision reflects a broader shift toward reproducible and democratized AI research. Developers and researchers can now integrate ZeroSearch with their own training pipelines, enabling cost-efficient experiments with large-scale RLHF and retrieval conditioning without external API constraints.
ZeroSearch sets a precedent for the future of retrieval-augmented generation. It presents a credible alternative to web-based search as a training signal, with strong implications for cost reduction, model alignment, and safety. For AI developers focused on scalable training regimes, reinforcement learning, and search-enhanced reasoning, ZeroSearch offers a technically rigorous and open alternative that redefines how retrieval can be integrated into foundation model development.
🔎 AI Research
ZEROSEARCH
In the paper "ZEROSEARCH: Incentivize the Search Capability of LLMs without Searching," researchers from Alibaba's Tongyi Lab introduce a reinforcement learning framework that enhances LLM search capabilities without relying on real search engines. Instead, they use a simulation LLM to generate both relevant and noisy documents through supervised fine-tuning and curriculum-based rollouts, achieving performance that rivals or surpasses real search engine-based methods at significantly lower cost.
High Risk Data Generation
“Teaching Models to Understand (but not Generate) High-risk Data” – University of Southern California & Allen Institute for AI
This paper introduces SLUNG, a pretraining paradigm that allows language models to learn from high-risk content (e.g., toxic or copyrighted text) without being trained to generate it. By applying masked or unlikelihood loss to high-risk tokens, the model learns to understand but not reproduce such data, improving performance on tasks like toxicity detection without increasing harmful outputs.
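A minimal sketch of the unlikelihood variant of this idea follows; shapes and names are our assumptions, not the authors' code. The usual next-token loss applies to safe tokens, while probability mass on tokens flagged as high-risk is penalized, so the model can condition on risky context without learning to emit it.

```python
# Sketch of a selective unlikelihood loss in the spirit of SLUNG;
# assumed shapes and names, not the authors' implementation.
import torch
import torch.nn.functional as F

def selective_unlikelihood_loss(logits, targets, high_risk_mask):
    """logits: (T, V); targets: (T,); high_risk_mask: (T,) bool."""
    log_probs = F.log_softmax(logits, dim=-1)
    tok_logp = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)
    # Safe tokens: ordinary next-token loss, -log p(x_t).
    safe_loss = -tok_logp[~high_risk_mask]
    # High-risk tokens: unlikelihood loss, -log(1 - p(x_t)), which pushes
    # probability away from reproducing the risky token.
    risky_p = tok_logp[high_risk_mask].exp()
    risky_loss = -torch.log1p(-risky_p.clamp(max=1 - 1e-6))
    return torch.cat([safe_loss, risky_loss]).mean()

logits = torch.randn(6, 50)                # toy 6-token sequence, vocab 50
targets = torch.randint(0, 50, (6,))
mask = torch.tensor([0, 0, 1, 1, 0, 0]).bool()
print(selective_unlikelihood_loss(logits, targets, mask))
```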
Scalable CoT
In the paper "Scalable Chain of Thoughts via Elastic Reasoning," researchers from Salesforce AI Research introduce Elastic Reasoning, a framework that enables large reasoning models to produce more efficient chain-of-thought outputs by separating the reasoning process into “thinking” and “solution” phases with independently allocated budgets. This approach improves performance under strict inference constraints and generalizes well to unseen budget scenarios while maintaining or even improving solution quality.
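The core mechanism is easy to sketch. What follows is an assumed interface, not Salesforce's release; the point is simply that the solution phase keeps its full token budget even when thinking is cut short.

```python
# Two-phase budgeted generation in the spirit of Elastic Reasoning;
# the model interface below is a stub, not a real library API.
class StubModel:
    """Placeholder so the sketch runs; swap in a real LLM client."""
    def generate(self, prompt, max_new_tokens, stop=None):
        return f"<up to {max_new_tokens} generated tokens>"

def generate_with_budgets(model, prompt, think_budget=256, solution_budget=128):
    # Phase 1: reasoning, truncated at its own independent budget.
    thinking = model.generate(prompt + "<think>",
                              max_new_tokens=think_budget, stop="</think>")
    # Phase 2: the solution always receives its full budget, even when
    # the thinking phase was interrupted mid-stream.
    full_prompt = prompt + "<think>" + thinking + "</think>"
    return model.generate(full_prompt, max_new_tokens=solution_budget)

print(generate_with_budgets(StubModel(), "Solve 12 * 13."))
```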
Complex Problem Solving
“Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey” – Ant Group, Zhejiang University, and The University of Hong Kong. This survey reviews how large language models can be used for multi-step reasoning in complex domains such as mathematics, scientific research, and software engineering. It explores techniques including chain-of-thought prompting, knowledge retrieval, and verification mechanisms, and proposes a comprehensive framework for problem-solving with LLMs, while highlighting their current limitations.
Beyond Theorem Proving
“Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving” – Shanghai Jiao Tong University
This work formalizes the notion of problem-solving as a deterministic MDP and proposes FPS (Formal Problem-Solving) and D-FPS (Deductive FPS) frameworks to decouple reasoning from answer verification in formal environments like Lean. It introduces new benchmarks (FormalMath500, MiniF2F-Solving, PutnamBench-Solving) and the Restricted Propositional Equivalence (RPE) method for robust symbolic evaluation.
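For readers who want the formal picture, here is one hedged reading of "problem solving as a deterministic MDP," in our notation rather than necessarily the paper's:

```latex
% States S are partial proof environments, actions A are tactic
% applications, and the transition function is the proof checker itself.
\[
  \mathcal{M} = (\mathcal{S}, \mathcal{A}, T, r), \qquad
  T : \mathcal{S} \times \mathcal{A} \to \mathcal{S}
\]
\[
  r(s) =
  \begin{cases}
    1 & \text{if } s \text{ contains a verified final answer,} \\
    0 & \text{otherwise.}
  \end{cases}
\]
```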
Benchmarking LLMs’ Swarm Intelligence
“Benchmarking LLMs’ Swarm Intelligence” – Renmin University of China
This paper proposes SwarmBench, a benchmark for evaluating LLMs' capacity for decentralized coordination in multi-agent systems with limited local perception and communication. It simulates tasks like pursuit, flocking, and transport, revealing that while some models demonstrate emergent strategies, current LLMs struggle with robust planning and alignment under swarm-like constraints.
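To see what "limited local perception" means in practice, here is a toy sketch (illustrative only, not the benchmark code): each agent perceives a small window of the grid and must act without any global view or central controller.

```python
# Toy illustration of the local-perception constraint SwarmBench imposes.
def local_view(grid, x, y, radius=1):
    """Return the (2r+1) x (2r+1) patch an agent at (x, y) can see."""
    h, w = len(grid), len(grid[0])
    return [[grid[i][j] if 0 <= i < h and 0 <= j < w else "#"
             for j in range(y - radius, y + radius + 1)]
            for i in range(x - radius, x + radius + 1)]

grid = [list("....."), list("..A.."), list("....T")]  # A: agent, T: target
for row in local_view(grid, 1, 2):  # the agent cannot see the target
    print("".join(row))
```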
Improving Text Comprehension
“LLM-based Text Simplification and its Effect on User Comprehension and Cognitive Load” – Google researchers present a Gemini-based system for minimally lossy text simplification and validate it with a randomized study of 4,563 participants. The study shows that simplified texts improve comprehension (+3.9% MCQ accuracy), confidence, and perceived ease, especially in challenging domains like biomedical literature, demonstrating the real-world utility of LLM-driven simplification.
🤖 AI Tech Releases
Anthropic Web Search API
Anthropic launched web search via its API.
Llama Firewall
Meta open sourced a security framework for Llama.
Le Chat Enterprise
Mistral announced Le Chat Enterprise, its enterprise platform.
🛠 AI in Production
Forecasting at Lyft
Lyft discusses its architecture for temporal forecasting.
📡AI Radar
Instacart CEO Fidji Simo joined OpenAI as CEO of Applications.
At Stripe’s Sessions conference, Mark Zuckerberg unveiled a vision to transform the ad industry with AI.
ServiceNow launched AI Control Tower, a platform to govern and manage AI agents.
ServiceNow also signed a definitive agreement to buy Data.World.
Robert Fergus, who co-founded Facebook AI Research (FAIR) before leaving for Google, is back at Meta leading FAIR.
AI agents startup Sett raised $27 million in a new round of funding.
Agentic data insights platform Wisdom AI announced a $23 million funding round.
The US Treasury Department is investigating Benchmark's investment in Chinese AI startup Manus AI.
CoreWeave wants to raise $1.5 billion in debt after its disappointing IPO.
Task-centric AI model builder Fastino raised $17.5 million in a new round.
Business AI agentic platform Relevance AI raised a $24 million Series B.