The Sequence Radar #739: Last Week in AI: From Vibes to Verbs: Agent Skills, Haiku 4.5, Veo 3.1, and nanochat | By The Digital Insider

Lots of fun developments for practical AI applications.

Created Using GPT-5

Next Week in The Sequence:

A few fun things: we will continue our series about AI interpretability. We will be releasing a long piece about fine-tuning vs. reinforcement learning that you cannot miss, and we will dive into Anthropic’s new Agent Skills.


📝 Editorial: Last Week in AI: From Vibes to Verbs: Agent Skills, Haiku 4.5, Veo 3.1, and nanochat

This week in AI was just a lot of fun: the frontier is racing, but the tooling is finally congealing into something you can depend on. Fewer magic tricks, more scaffolding. You can feel the distance compress between an idea, a script, and a shipped product.

Anthropic’s Agent Skills shift agents from “one giant brain” to a set of precisely scoped capabilities you can load on demand. Instead of a universal assistant improvising across everything, Claude can snap into a well-defined mode—say, Excel analyst, RFP writer, or procurement agent—each packaged with instructions, tools, and resources. That sounds mundane, but real enterprises run on checklists, templates, and compliance. By turning those artifacts into first-class skills, you get repeatability, auditability, and fewer accidental side quests. In practice this looks like clean interfaces: a skill declares what it can do, which APIs it can call, and how outputs are formatted. This also reduces context bloat: you don’t stuff the model with the whole company; you mount the one binder that matters and detach it when you’re done.
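
To make the “mount the one binder” idea concrete, here is a minimal Python sketch of a scoped skill plus a loader. The Skill fields, the mount helper, and the naive keyword selection are illustrative assumptions, not Anthropic’s actual Agent Skills format.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: the field names and layout are illustrative,
# not Anthropic's actual Agent Skills schema.
@dataclass
class Skill:
    name: str                       # what the skill is called
    description: str                # when the agent should reach for it
    instructions: str               # the "binder": checklist, template, style guide
    allowed_tools: list[str] = field(default_factory=list)  # APIs it may call
    output_format: str = "markdown"                         # how outputs come back

excel_analyst = Skill(
    name="excel-analyst",
    description="Summarize and validate spreadsheet data",
    instructions="Follow the finance team's QA checklist; flag empty columns.",
    allowed_tools=["read_workbook", "run_formula"],
    output_format="table",
)

def mount(skills: list[Skill], task: str) -> str:
    """Expose only the skills relevant to this task instead of stuffing
    the model with the whole company's context."""
    # Naive keyword match, purely illustrative; a real router would be smarter.
    relevant = [s for s in skills
                if any(w in task.lower() for w in s.description.lower().split())]
    blocks = [f"## Skill: {s.name}\nTools: {', '.join(s.allowed_tools)}\n"
              f"Output: {s.output_format}\n{s.instructions}" for s in relevant]
    return f"Task: {task}\n\n" + "\n\n".join(blocks)

print(mount([excel_analyst], "Summarize spreadsheet data for Q3 vendors"))
```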

Alongside that procedural upgrade, Claude Haiku 4.5 leans into the small-but-capable regime. The appeal is not just latency or price—it’s the idea that most work doesn’t need Olympian IQ; it needs a fast, reliable contributor who shows up instantly and follows the playbook. Haiku 4.5 claims near-Sonnet coding quality at a fraction of the cost with materially lower time-to-first-token. When you pair Haiku with Agent Skills, you start designing systems around time-to-useful: a lightweight model spins up, mounts two or three skills (style guide, spreadsheet ops, vendor database), executes with crisp boundaries, then gets out of the way. This is how you scale to thousands of concurrent, low-variance tasks without melting your budget.
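
Here is a rough sketch of what designing around time-to-useful can look like: mount a handful of skills, default to a small fast model, and escalate only when a task is genuinely hard. The model names, the call_model stub, and the escalation heuristic are all hypothetical, not any vendor’s API.

```python
# Hypothetical "time-to-useful" dispatcher: a few skills, a cheap default model,
# escalation only when needed. `call_model` is a stand-in, not a real SDK call.
def call_model(model: str, prompt: str) -> str:
    return f"[{model}] {prompt[:48]}..."  # placeholder response

SMALL, LARGE = "small-fast-model", "large-frontier-model"

def handle(task: str, skills: list[str], hard: bool = False) -> str:
    prompt = f"Skills mounted: {', '.join(skills[:3])}\nTask: {task}"
    # Default to the low-latency model; pay for the big one only when warranted.
    return call_model(LARGE if hard else SMALL, prompt)

tasks = ["reformat vendor table", "draft RFP section", "novel pricing strategy"]
for t in tasks:
    print(handle(t, ["style-guide", "spreadsheet-ops", "vendor-db"], hard="strategy" in t))
```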

On the creative side, Google DeepMind Veo 3.1 nudges video generation from “cool clips” toward directable sequences. The headline is control. You can specify characters, locations, objects, and transitions, and iterate toward the kind of continuity normally earned in an editing suite. Audio gets cleaner, motion is more stable, and the model is less surprised by your intentions. The important mental shift is to treat video synthesis as a programmable pipeline, not prompt roulette. The more granular the handles—shot duration, camera intent, scene constraints—the more you can unit-test narrative structure the same way you test code paths. For teams building ads, explainers, or product demos, this moves generative video from whimsical novelty into an iterative craft.
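
As a toy illustration of “unit-testing narrative structure,” here is a hypothetical shot-list data structure with a few structural checks you could run before spending compute on renders. The field names and checks are invented; this is not Veo 3.1’s API.

```python
from dataclasses import dataclass

# Hypothetical shot-list structure for a generative-video pipeline.
@dataclass
class Shot:
    description: str   # scene content and characters
    duration_s: float  # shot duration in seconds
    camera: str        # camera intent, e.g. "slow dolly-in"

storyboard = [
    Shot("Product on desk, morning light", 3.0, "static wide"),
    Shot("Close-up of hands opening the box", 4.0, "slow dolly-in"),
    Shot("Logo reveal on clean background", 2.0, "static close"),
]

def check_structure(shots: list[Shot], max_total_s: float = 15.0) -> None:
    """'Unit-test' the cut before generating a single frame."""
    assert sum(s.duration_s for s in shots) <= max_total_s, "cut is too long"
    assert all(s.duration_s >= 1.0 for s in shots), "shots under 1s rarely read"
    assert "logo" in shots[-1].description.lower(), "demo should end on the logo"

check_structure(storyboard)
print("storyboard passes structural checks")
```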

Finally, Andrej Karpathy’s “nanochat” is this week’s best educational artifact. It’s an end-to-end ChatGPT-style system distilled to the essentials: tokenizer, pretraining, SFT, RL, eval, inference, and a minimal web UI. The superpower here is line-of-sight: every stage is short enough to read and cheap enough to run, so the path from blank GPU to functional chat agent is hours, not weeks. That lowers the barrier for students and teams alike: clone, run, modify, measure. Want to experiment with a custom reward model? Swap a few lines. Curious about inference quirks? Tweak the sampler and observe. In a field that often hides complexity behind opaque stacks, nanochat is a public service—an opinionated baseline you can reason about and extend.
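
For a flavor of the “tweak the sampler and observe” loop, here is a generic temperature-plus-top-k sampler of the kind you might swap into a minimal inference stack. It is a sketch, not nanochat’s actual code.

```python
import numpy as np

# Generic temperature + top-k sampling over a vector of logits.
def sample_next_token(logits: np.ndarray, temperature: float = 0.8, top_k: int = 50) -> int:
    logits = logits / max(temperature, 1e-6)      # sharpen or flatten the distribution
    if top_k and top_k < logits.size:
        cutoff = np.sort(logits)[-top_k]          # keep only the top-k logits
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())         # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(logits.size, p=probs))

# Observe how temperature changes behavior on a toy distribution.
toy_logits = np.array([2.0, 1.0, 0.5, -1.0])
print([sample_next_token(toy_logits, temperature=t, top_k=3) for t in (0.2, 0.8, 1.5)])
```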

If there’s a theme, it’s specialization with handles: scoped agency that loads the right binder, compact models that cut latency, video systems that expose practical levers, and a reference stack you can actually read. Less spectacle, more engineering. That’s progress.

🔎 AI Research

DeepMMSearch-R1: Empowering Multimodal LLMs in Multimodal Web Search

AI Lab: Apple, Johns Hopkins University

Summary: DeepMMSearch-R1 is a multimodal LLM that performs on-demand, multi-turn web searches with dynamic query generation and cropped image-based search, trained via SFT followed by online RL. It also introduces the DeepMMSearchVQA dataset and a three-tool pipeline (text search, grounding/cropping, image search) to enable self-reflection and state-of-the-art results on knowledge-intensive VQA.
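
Below is an illustrative sketch of that three-tool loop; the tool stubs and the turn-by-turn routing are hypothetical stand-ins for the pipeline described, not the paper’s implementation.

```python
# Toy three-tool loop: text search, grounding/cropping, image search.
def text_search(query: str) -> str:
    return f"web results for '{query}'"

def ground_and_crop(image: str, target: str) -> str:
    return f"{image} cropped to '{target}'"

def image_search(image_crop: str) -> str:
    return f"visually similar pages for '{image_crop}'"

def answer_vqa(image: str, question: str, max_turns: int = 3) -> str:
    evidence: list[str] = []
    for turn in range(max_turns):
        if turn == 0:      # first, look up the question itself
            evidence.append(text_search(question))
        elif turn == 1:    # then search on the relevant image region
            evidence.append(image_search(ground_and_crop(image, target="main subject")))
        else:              # refine the text query with what was found so far
            evidence.append(text_search(question + " " + evidence[-1]))
    return f"answer to '{question}' grounded in {len(evidence)} pieces of evidence"

print(answer_vqa("photo.jpg", "Which landmark is shown, and when was it built?"))
```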

Robot Learning: A Tutorial

AI Lab: University of Oxford & Hugging Face

Summary: This tutorial surveys the shift from classical, model-based control to data-driven robot learning, and walks through RL, behavioral cloning, and emerging generalist, language-conditioned robot policies. It also presents the open-source lerobot stack and LeRobotDataset with practical, ready-to-run examples across the robotics pipeline.
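
For the smallest possible mental model of the imitation-learning half of that stack, here is a generic behavioral-cloning loop on synthetic data. It is plain PyTorch with invented dimensions, not the lerobot or LeRobotDataset API.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 12, 4
demo_obs = torch.randn(1024, obs_dim)   # stand-in demonstration observations
demo_act = torch.randn(1024, act_dim)   # stand-in expert actions

# A small policy network trained to imitate the demonstrations.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    idx = torch.randint(0, demo_obs.size(0), (64,))  # sample a mini-batch of demos
    loss = nn.functional.mse_loss(policy(demo_obs[idx]), demo_act[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final imitation loss: {loss.item():.4f}")
```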

Tensor Logic: The Language of AI

AI Lab: Pedro Domingos, University of Washington

Summary: The paper proposes “tensor logic,” a programming model that unifies neural and symbolic AI by expressing rules as tensor equations (einsum) equivalent to Datalog operations, enabling learning and inference in a single framework. It demonstrates how to implement neural nets, symbolic reasoning, kernel machines, and graphical models, and discusses scaling via Tucker decompositions and GPU-centric execution.
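
A tiny worked example of the core idea: the Datalog rule path(X,Z) :- edge(X,Y), edge(Y,Z) written as an einsum over an adjacency matrix. The graph and code are illustrative, not taken from the paper.

```python
import numpy as np

# A small chain graph 0 -> 1 -> 2 -> 3 as an adjacency matrix.
edge = np.zeros((4, 4), dtype=int)
edge[0, 1] = edge[1, 2] = edge[2, 3] = 1

# Join on the shared variable Y and sum it out: exactly what einsum does.
two_hop = np.einsum("xy,yz->xz", edge, edge)  # number of length-2 paths x -> z
path = (two_hop > 0).astype(int)              # existential quantification over Y

print(path)  # path[0,2] and path[1,3] are 1: the rule derived the two-hop facts
```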

Agent Learning via Early Experience

AI Lab: Meta Superintelligence Labs, FAIR at Meta, The Ohio State University

Summary: The authors introduce “early experience,” a reward-free training paradigm where agents use the consequences of their own exploratory actions as supervision, with two concrete strategies—implicit world modeling and self-reflection. Across eight environments, these methods improve effectiveness and OOD generalization and provide strong initializations for downstream RL.
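
Here is a toy, reward-free sketch of the “implicit world modeling” half of that recipe: random exploratory actions generate (state, action, next state) tuples, and those observed consequences supervise a small predictive model. The environment, dimensions, and network are invented for illustration.

```python
import torch
import torch.nn as nn

state_dim, act_dim = 8, 3

def toy_env_step(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    # Fake dynamics with no reward signal at all.
    return state + 0.1 * action.sum(dim=-1, keepdim=True)

# 1. Collect exploratory experience with random actions.
states = torch.randn(512, state_dim)
actions = torch.randn(512, act_dim)
next_states = toy_env_step(states, actions)

# 2. Use the observed consequences as supervision for an implicit world model.
world_model = nn.Sequential(nn.Linear(state_dim + act_dim, 64), nn.ReLU(),
                            nn.Linear(64, state_dim))
opt = torch.optim.Adam(world_model.parameters(), lr=1e-3)
for _ in range(200):
    pred = world_model(torch.cat([states, actions], dim=-1))
    loss = nn.functional.mse_loss(pred, next_states)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"world-model loss after early experience: {loss.item():.4f}")
```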

Qwen3Guard Technical Report

AI Lab: Qwen

Summary: Qwen3Guard introduces multilingual safety guardrail models in two variants—Generative (instruction-following tri-class judgments: safe/controversial/unsafe) and Stream (token-level, real-time moderation for streaming)—released in 0.6B/4B/8B sizes with support for 119 languages. It reports state-of-the-art prompt/response safety classification across English, Chinese, and multilingual benchmarks and is released under Apache 2.0.
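
As a schematic of how a token-level streaming guardrail can sit inside a generation loop, the sketch below classifies each partial response as safe, controversial, or unsafe and halts the stream on “unsafe.” The keyword classifier stub is a placeholder, not Qwen3Guard’s actual interface.

```python
# Toy stream-level moderation: stop emitting tokens once the partial text turns unsafe.
def classify_stream(partial_text: str) -> str:
    blocked = ("make a weapon", "steal credentials")
    if any(phrase in partial_text.lower() for phrase in blocked):
        return "unsafe"
    if "gray-area" in partial_text.lower():
        return "controversial"
    return "safe"

def stream_with_guardrail(tokens: list[str]) -> str:
    shown: list[str] = []
    for tok in tokens:
        shown.append(tok)
        if classify_stream(" ".join(shown)) == "unsafe":
            return " ".join(shown[:-1]) + " [stopped by guardrail]"
    return " ".join(shown)

print(stream_with_guardrail("here is how to make a weapon from parts".split()))
print(stream_with_guardrail("here is a recipe for banana bread".split()))
```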

🤖 AI Tech Releases

Claude Haiku 4.5

Anthropic released Claude Haiku 4.5, its latest small model, which delivers performance comparable to Sonnet 4 at a fraction of the cost.

Veo 3.1

Google DeepMind released the new version of its marquee video generation model.

Qwen3-VL

Alibaba Qwen released Qwen3-VL 4B and 8B, two small vision-language models optimized for reasoning and instruction following.

nanochat

Andrej Karpathy released nanochat, an open-source, end-to-end training and inference pipeline for building a ChatGPT-style model.

Agent Skills

Anthropic released Agent Skills, which let you specialize Claude for specific tasks with packaged instructions, scripts, and resources.

📡 AI Radar



