The Sequence Radar #735: OpenAI x AMD, DevDay, Reflection, and Gemini Enterprise | By The Digital Insider

Surprising partnerships and major product releases.

Created Using GPT-5

Next Week in The Sequence:

  • Knowledge: Our series on AI interpretability covers chain-of-thought monitoring as an interpretability technique.

  • AI of the Week: We are going to discuss Samsung’s 7-million-parameter model that shocked the AI world with its performance.

  • Opinion: Let’s discuss whether AMD can really compete with NVIDIA after the OpenAI deal.


📝 Editorial: Last Week in AI: OpenAI x AMD, DevDay, Reflection, and Gemini Enterprise

This week’s AI news was less about shiny demos and more about industrial alignment: the compute supply chain, the agent application stack, the race to open‑frontier research, and a decisive enterprise land‑grab all moved in tandem. The throughline is simple but consequential: capacity is becoming the product, agents are becoming the interface, and enterprises are demanding reliability over novelty. Together these forces are reshaping the incentives of model companies, chipmakers, and platform providers.

OpenAI’s expanded partnership with AMD reframes compute not as a commodity purchase but as a co‑managed capability. Read beyond the headline and you see the operating model shifting: model roadmaps, training cadence, and inference service‑levels are increasingly co‑planned with silicon vendors. This is about diversifying the supply chain and turning GPU logistics into part of the model company’s control plane. The signal to builders is clear: design for hardware optionality and assume that capacity planning will meaningfully shape your product’s latency, cost, and reliability envelopes.

On the software plane, OpenAI’s DevDay repositioned ChatGPT as both runtime and distribution layer. “Apps in ChatGPT” and a first‑class Apps SDK make the chat surface feel more like an OS‑level shell, while an agent toolkit packages the operational guts—planning, tools, and execution harnesses—so teams can ship production‑grade assistants. Model tiering and enterprise controls round out the story: mix deep‑reasoning models where quality matters with lighter realtime variants where throughput and cost dominate. The thesis is pragmatic: the next platform play is agentic, and the store is the chat client you already have.

Meanwhile, Reflection AI’s launch plants a flag for an “open frontier” lab in the U.S.—ambitious, research‑forward, and explicitly oriented toward agentic coding and general‑purpose models. Beyond the branding, the bet is that credibility in the next cycle will come from shipping code‑doing systems while also contributing publishable science, rather than choosing one lane. If large incumbents consolidate platforms, Reflection is wagering that the next generational leap will be won by labs that move quickly between applied agents and frontier‑scale research, with openness as both a differentiator and a recruiting magnet.

Google, for its part, drew the enterprise map with Gemini Enterprise: a single front door for Gemini models, prebuilt agents, and no/low‑code tooling. Think of it as a structured path from chat to line‑of‑business automation, with security and data governance inherited from Google Cloud and Workspace. Early customer momentum and a straightforward packaging approach make it feel less like a point product and more like an operating environment—one that raises the competitive floor for what “enterprise‑ready” means in 2025: agent orchestration, system‑of‑record integrations, and measurable productivity deltas out of the box.

Taken together, these moves sketch an end‑to‑end pipeline: AMD supplies predictable capacity; OpenAI packages agentic primitives and a distribution channel; Google codifies enterprise adoption; and Reflection pressures the frontier from the open flank. The practical guidance for builders: design for heterogeneity across hardware and models, target agent reliability over raw benchmark deltas, and anchor go‑to‑market in real workflows rather than feature tours. The platform war has moved from the demo table to the P&L—where latency, uptime, compliance, and unit economics now matter as much as model specs. That’s a healthier game for everyone who plans to ship.

🔎 AI Research

Less is More: Recursive Reasoning with Tiny Networks

AI Lab: Samsung SAIL Montréal.

Summary: The paper introduces the Tiny Recursion Model (TRM)—a single 2-layer, ~7M-parameter network that recursively updates a latent state and answer with deep supervision, removing HRM’s two-network hierarchy, fixed-point assumptions, and extra ACT pass. TRM significantly improves generalization on hard puzzles, reaching ~87.4% on Sudoku-Extreme, ~85.3% on Maze-Hard, ~44.6% on ARC-AGI-1, and ~7.8% on ARC-AGI-2 with far fewer parameters than HRM and outperforming many large CoT LLMs.
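The recursion the summary describes can be sketched in a few lines. This is a toy illustration of TRM's control flow under our reading of the summary, not the paper's implementation: a single tiny network is reused to refine a latent state from the input and current answer, then to update the answer, with a loss at each outer step (deep supervision). The scalar stand-ins for the network are entirely hypothetical.

```python
# Minimal sketch of the TRM control flow (a sketch, not the paper's code).
# One tiny network is reused: f_latent refines the latent state z from
# (x, y, z) for n_inner steps, then f_answer updates the answer y from
# (y, z). Repeating this outer loop with a loss at each step is the
# "deep supervision" recipe; the toy lambdas below stand in for the
# real 2-layer network.

def trm_solve(x, y, z, f_latent, f_answer, n_inner=6, T=3):
    """Run T supervised recursion steps; return the trajectory of answers."""
    answers = []
    for _ in range(T):                 # outer loop: deep-supervision steps
        for _ in range(n_inner):       # inner loop: refine the latent state
            z = f_latent(x, y, z)
        y = f_answer(y, z)             # update the answer from the latent
        answers.append(y)              # a loss would be applied to each y
    return answers

# Toy stand-ins: scalars instead of token grids, averaging instead of a net.
f_latent = lambda x, y, z: 0.5 * (z + x - y)
f_answer = lambda y, z: y + z
```

With these toy functions the answer trajectory converges toward the input x, illustrating how repeated latent refinement plus answer updates can substitute for depth.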

Hybrid Architectures for Language Models: Systematic Analysis and Design Insights

AI Lab: FAIR at Meta.

Summary: The paper systematically compares inter-layer and intra-layer Transformer–Mamba hybrids and finds intra-layer fusion delivers the best quality-throughput Pareto, robust long-context retrieval, and smooth MoE scaling; practical recipes include ~1:5 hybrid ratios for efficiency and never placing Transformer blocks at the front. It offers concrete design guidance (e.g., block ratios/placement) and shows hybrids beat homogeneous Transformers or SWA at equal compute.
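The "~1:5 ratio, no Transformer blocks at the front" recipe can be made concrete with a toy layer scheduler. This is our own illustration of the stated design guidance, not code or an API from the paper; the function name and the exact spacing rule are assumptions.

```python
# Illustrative hybrid layer schedule following the recipe in the summary:
# roughly one attention (Transformer) block per five Mamba blocks, and
# never an attention block at the very front. The scheduling function is
# a hypothetical sketch, not the paper's method.

def hybrid_schedule(n_layers, period=6):
    """Place one attention block every `period` layers, starting late."""
    layers = []
    for i in range(n_layers):
        # attention lands at positions period-1, 2*period-1, ... (never index 0)
        layers.append("attn" if (i + 1) % period == 0 else "mamba")
    return layers
```

For a 12-layer stack this yields ten Mamba blocks and two attention blocks, with the first attention block appearing only at depth six.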

Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data

AI Lab: NVIDIA.

Summary: Injecting diverse reasoning data during pretraining yields durable gains (≈+19% on expert benchmarks) that later SFT cannot recover, while SFT benefits most from high-quality long CoT data; naively scaling mixed-quality SFT can even hurt math performance. The work proposes an asymmetric data-allocation rule—diversity early, quality late—and shows “latent” benefits from high-quality pretraining that unlock after SFT.

Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning

AI Lab: ETH Zürich & Max Planck Institute for Intelligent Systems.

Summary: The authors introduce TTC-RL, where an agent self-curates task-relevant training data at test time and continues RL training, substantially boosting pass@1 and pass@k on math/coding/GPQA (e.g., Qwen3-8B AIME/CodeElo gains) and approaching the performance of long-context “thinking” models. TTC-RL raises the performance ceiling, specializes tightly to target tasks, and the paper proposes a “latent improvement” metric to separate real reasoning gains from formatting.
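The TTC-RL loop as summarized has two moves: self-curate training tasks similar to the target, then continue RL on them before answering. The sketch below shows only that control flow; the similarity measure, task pool, and update rule are placeholders we invented, not the paper's method.

```python
# Toy sketch of the TTC-RL loop described above (our illustration, not
# the authors' code): at test time, select the k training tasks most
# similar to the target task, then continue RL on that curated curriculum
# before attempting the target.

def ttc_rl(target, task_pool, similarity, rl_update, policy, k=2, steps=3):
    # 1) self-curate a test-time curriculum: top-k most similar tasks
    curriculum = sorted(
        task_pool, key=lambda t: similarity(t, target), reverse=True
    )[:k]
    # 2) continue RL on the curated tasks before answering the target
    for _ in range(steps):
        for task in curriculum:
            policy = rl_update(policy, task)
    return policy, curriculum
```

Any similarity function and RL update can be plugged in; the point is that specialization happens on the fly, per target task, rather than in a fixed training run.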

Reinforced Generation of Combinatorial Structures: Applications to Complexity Theory

AI Lab: Google DeepMind & Google.

Summary: Using the AlphaEvolve coding agent, the paper discovers new extremal graphs/gadgets that (i) strengthen near-tight certification hardness via Ramanujan graphs and (ii) improve MAX-k-CUT inapproximability, achieving ~0.987 for MAX-4-CUT and 55/57 for MAX-3-CUT, while also refining algorithmic upper bounds. A key enabler is AI-driven 10,000× verifier speedups that make searching larger structures feasible, with all final results verified by brute-force checks.

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

AI Lab: Stanford University (with Texas A&M, UC San Diego & Lambda).

Summary: The paper introduces AGENTFLOW, a trainable agentic framework that coordinates planner–executor–verifier–generator modules and optimizes the planner in the loop via Flow-GRPO, which broadcasts a single trajectory-level outcome to each turn to solve long-horizon credit assignment. On 10 benchmarks it boosts accuracy (+14.9% search, +14.0% agentic, +14.5% math, +4.1% science) and even surpasses GPT-4o; Figure 1 visualizes these gains and the system setup.
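The credit-assignment idea attributed to Flow-GRPO can be sketched simply: each sampled trajectory earns one outcome reward, which is normalized across the group (GRPO-style) and broadcast identically to every turn. Function and variable names below are ours, assumed for illustration, not the paper's API.

```python
# Hedged sketch of trajectory-level reward broadcasting (our illustration
# of the Flow-GRPO idea as summarized, not the paper's implementation).
# Each trajectory gets one verifiable 0/1 outcome; advantages are
# normalized across the sampled group and copied to every turn.

def broadcast_advantages(group_outcomes, turns_per_traj):
    """group_outcomes: final 0/1 reward per trajectory in the sampled group.
    Returns a per-turn advantage list for each trajectory."""
    mean = sum(group_outcomes) / len(group_outcomes)
    var = sum((r - mean) ** 2 for r in group_outcomes) / len(group_outcomes)
    std = var ** 0.5 or 1.0  # avoid divide-by-zero when all rewards tie
    advantages = []
    for r, n_turns in zip(group_outcomes, turns_per_traj):
        a = (r - mean) / std              # group-normalized advantage
        advantages.append([a] * n_turns)  # same signal at every turn
    return advantages
```

Broadcasting one outcome to all turns sidesteps per-step reward modeling, which is exactly the long-horizon credit-assignment shortcut the summary highlights.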

🤖 AI Tech Releases

AgentKit

OpenAI launched AgentKit, a native stack for building and managing AI agents.

ChatGPT Apps

OpenAI introduced a new generation of apps integrated into their chat experience.

Gemini 2.5 Computer Use

Google released Gemini 2.5 Computer Use, a model specialized in web and mobile agentic tasks.

Codex GA

OpenAI announced the general availability of Codex with new features such as a new SDK and Slack integration.

Petri

Anthropic open-sourced Petri, an auditing tool for AI safety research.

📡 AI Radar



Published on The Digital Insider at https://is.gd/0fV8Tk.
