The Sequence Radar #751: Last Week in AI: K2’s Brains, Lambda’s Capacity, ARR Gravitas | By The Digital Insider
An amazing new model that pushes the boundaries of reasoning, plus more deals and ARR news.
Next Week in The Sequence:
We continue our series about synthetic data generation with an exploration of the different types of synthetic data. Our AI of the week covers the amazing Kimi K2 Thinking. The opinion section explores the state of memory in foundation models.
Subscribe and don’t miss out:
📝 Editorial: Last Week in AI: K2’s Brains, Lambda’s Capacity, ARR Gravitas
This week crystallized three threads—technical progress, compute access, and business scale—that define where AI is heading.
Moonshot’s Kimi K2 is the week’s purest technical release. It pushes the Mixture-of-Experts playbook further: a huge total parameter budget, a relatively small number of experts “activated” per token, and a training stack centered on stability and agentic post-training. The “K2 Thinking” variant leans hard into long-horizon reasoning and tool use, with credible jumps on coding and logic tasks. Two takeaways matter. First, open weights plus a transparent training recipe give the community something to study and remix instead of guessing at secret sauce. Second, K2 is a concrete demonstration that MoE architectures—paired with disciplined data and post-training—can match or beat dense giants on cost/perf without relying on brute force. If you care about reproducibility and TCO, this is a north star, not a novelty.
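To make the “few experts activated per token” idea concrete, here is a minimal top-k routing sketch in PyTorch. It is an illustrative toy under assumed sizes, not Moonshot’s implementation; the layer dimensions, expert count, and top-k value are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Illustrative top-k Mixture-of-Experts feed-forward layer (hypothetical sizes)."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # token -> expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                  # x: (tokens, d_model)
        scores = self.router(x)                            # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)     # keep only top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# Usage: only top_k of n_experts run for each token.
layer = ToyMoELayer()
y = layer(torch.randn(4, 512))
```

The structural point is that per-token compute scales with `top_k`, not with the total number of experts, which is how a very large parameter budget can stay relatively cheap to train and serve.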
On the industrial side, Lambda’s large-compute pact with Microsoft is the clearest sign that GPU scarcity is being professionalized into bookable, multi-year capacity. The shape of the deal is straightforward: a diversified pipeline of top-tier accelerators, delivered under a contract that smooths supply shocks and shortens time-to-capacity for model teams. Translation for practitioners: fewer roulette spins to secure training windows, more predictable paths for scaled fine-tuning and serving, and the beginning of a healthier spot market for bursts. Translation for startups: access to frontier-class clusters is drifting from “who you know” toward “what you can reserve,” which levels the playing field—at least a little—against hyperscaler lock-in.
Then there’s the money talk. OpenAI and Anthropic both projected eye-popping revenue run-rates, reframing AI as a utility build-out more than a software SKU. The headline isn’t just the numbers; it’s the operating model behind them. Premium agentic capabilities—code assistants, retrieval-augmented systems, and tool-driven workflows—are converting into durable enterprise spend. That, in turn, justifies long-cycle bets on data centers, power, and supply chains that would make a telecom CFO nod in recognition. It also resets expectations for everyone else: if the leaders are locking in capacity and turning usage into recurring revenue, the strategic gap isn’t only model quality; it’s infrastructure, go-to-market, and governance.
That’s the week: one open model that matters, one supply deal that changes access, and two revenue signals that justify the capex—and remind us that the center of gravity is shifting toward teams who can turn reasoning into reliable, governed workflows at scale.
🔎 AI Research
Scaling Agent Learning via Experience Synthesis
Authors: Meta Superintelligence Labs; FAIR at Meta; University of Chicago; UC Berkeley.
Summary: The paper introduces DreamGym, a unified RL framework that replaces costly real-environment rollouts with a reasoning-based experience model, an active replay buffer, and a curriculum task generator to synthesize diverse, causally grounded trajectories for agent training. Across WebShop, ALFWorld, and WebArena, DreamGym matches strong RL baselines using only synthetic interactions and delivers sizable sim-to-real gains while requiring far fewer real-world rollouts.
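As a rough illustration of how those three components could fit together, here is a schematic Python sketch of a synthetic-rollout training loop. Every class, method, and parameter name below is hypothetical and stands in for the paper’s components rather than reproducing DreamGym’s actual API; the RL update itself is omitted.

```python
import random
from collections import deque

class CurriculumTaskGenerator:
    """Hypothetical stand-in: proposes tasks whose difficulty tracks agent progress."""
    def __init__(self):
        self.difficulty = 1
    def next_task(self):
        return {"goal": f"toy-task-{random.randint(0, 9)}", "difficulty": self.difficulty}
    def update(self, success_rate):
        if success_rate > 0.7:          # ramp difficulty as the agent improves
            self.difficulty += 1

class ExperienceModel:
    """Hypothetical stand-in for the reasoning-based experience model:
    it predicts the next state and reward instead of stepping a real environment."""
    def step(self, task, state, action):
        next_state = state + [action]
        reward = 1.0 if len(next_state) >= task["difficulty"] else 0.0
        done = reward > 0 or len(next_state) > 5
        return next_state, reward, done

def synthesize_rollout(policy, task, model, max_steps=6):
    state, traj = [], []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = model.step(task, state, action)
        traj.append((list(state), action, reward))
        state = next_state
        if done:
            break
    return traj

# Synthetic rollouts fill an active replay buffer that a standard RL update
# (not shown) would then consume instead of costly real-environment rollouts.
buffer = deque(maxlen=10_000)
tasks, model = CurriculumTaskGenerator(), ExperienceModel()
policy = lambda state: random.choice(["a", "b"])   # placeholder policy
for episode in range(100):
    traj = synthesize_rollout(policy, tasks.next_task(), model)
    buffer.extend(traj)
    tasks.update(success_rate=sum(r for _, _, r in traj))
```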
Nested Learning: The Illusion of Deep Learning Architectures
Authors: Google Research (USA).
Summary: The paper proposes Nested Learning (NL), a paradigm that treats modern models—including optimizers—as systems of nested, multi-level optimization problems with their own “context flows,” explaining in-context learning as compression of context and suggesting added “levels” for higher-order abilities. Using this lens, the authors introduce richer “deep” optimizers, a self-modifying sequence model, and a continuum memory system that together form the HOPE module, which shows strong results on language modeling and commonsense reasoning benchmarks.
CodeClash: Benchmarking Goal-Oriented Software Engineering
Authors: Stanford University; Princeton University; Cornell University.
Summary: The paper introduces CodeClash, a tournament-style benchmark where LMs iteratively edit codebases that then compete head-to-head in arenas (e.g., BattleSnake, Poker, RoboCode) to optimize high-level objectives—revealing capabilities beyond unit-test correctness. Across 1,680 tournaments, models showed creativity but shared failures in strategic reasoning and codebase maintenance; notably, top models lost every round to a bot written by an expert human.
LiveTradeBench: Seeking Real-World Alpha with Large Language Models
Authors: University of Illinois Urbana–Champaign.
Summary: LiveTradeBench is a live, multi-market trading environment (U.S. equities and Polymarket) that streams prices/news and evaluates LLM agents on portfolio-allocation decisions, exposing gaps between static benchmark scores and real-world decision-making. Over 50 live trading days with 21 LLMs, performance in one market didn’t generalize to another and high LMArena scores didn’t predict superior trading outcomes.
RedCodeAgent: Automatic Red-Teaming Agent Against Diverse Code Agents
Authors: University of Chicago; University of Illinois Urbana–Champaign; VirtueAI; Microsoft Research; UK AI Safety Institute; University of Oxford; UC Berkeley.
Summary: RedCodeAgent is an automated red-teaming system that learns from past attacks (memory), combines multiple jailbreak tools (including code-substitution), and uses sandboxed execution to discover vulnerabilities in code agents beyond static benchmarks. It consistently outperforms baseline jailbreak methods across many risky scenarios and languages, while remaining efficient and uncovering new vulnerabilities in real-world assistants like Cursor and Codeium.
Towards a Future Space-Based, Highly Scalable AI Infrastructure System Design (Project Suncatcher)
Authors: Google Research.
Summary: Google proposes solar-powered “data centers” in space—constellations of satellites with free-space optical links and radiation-tested TPUs—to tap near-continuous solar energy and reduce terrestrial resource strain; formation-flying and short-range DWDM optical links are key enablers. Early analyses show feasibility across inter-satellite bandwidth, orbital control, TPU radiation tolerance, and launch-cost trajectories that could drop to ≲$200/kg to LEO by the mid-2030s.
🤖 AI Tech Releases
Kimi K2 Thinking
Moonshot AI released Kimi K2 Thinking, a reasoning model that excels in agentic tasks.
Magentic Marketplace
Microsoft open-sourced Magentic Marketplace, a simulation environment for agentic markets.
📡 AI Radar
OpenAI CEO Sam Altman says the company expects to end 2025 above $20B ARR and has roughly $1.4T in data-center commitments over the next eight years (statement on X).
Amazon unveils Kindle Translate (beta) to help KDP authors publish AI-translated ebooks (English↔Spanish; German→English to start).
Inception raises $50M to build diffusion-based LLMs for code and text, aiming for big latency/efficiency gains.
Snap and Perplexity sign a partnership to bring conversational AI search into Snapchat.
Replika founder Eugenia Kuyda launches Wabi (“YouTube for apps”) with a $20M pre-seed.
SoftBank and OpenAI form “SB OAI Japan,” a joint venture to market the Cristal intelligence enterprise AI offering in Japan starting in 2026.
Anthropic’s internal plan (as reported) targets up to $70B in revenue and $17B in cash flow in 2028, driven by B2B demand.
Lambda announces a multibillion-dollar, multi-year agreement with Microsoft to deploy AI infrastructure using tens of thousands of NVIDIA GPUs.
Poolside: following its Project Horizon announcement (a 2-GW Texas AI campus with CoreWeave as anchor tenant), reports say NVIDIA is weighing an investment of up to $1B. (poolside.ai)
AUI raises $20M at a $750M valuation cap, highlighting a neurosymbolic AI breakthrough. (Business Wire)
Published on The Digital Insider at https://is.gd/znGymC.