The Sequence Radar #526: Llama 4 Scout and Maverick are Here! | By The Digital Insider

A major release for open source generative AI.

Next Week in The Sequence:

Our series on AI evals continues with an exploration of the types of benchmarks. The opinion series explores the trend of all the major AI labs creating the same primitives (research, reasoning, search, etc.) and its implications. In research, we dive into the new Llama 4 release. Engineering explores another cool framework.

📝 Editorial: Llama 4 Scout and Maverick are Here!

I had written an editorial for this newsletter and then Meta dropped the Llama 4 release on Saturday. Oh well, time to rewrite the whole thing, but it was definitely worth it because this Llama release is a big deal: new architectures and enhanced multimodality.

Llama 4 debuts with two models: Llama 4 Scout and Llama 4 Maverick. Both are natively multimodal, processing images as well as text (Meta reports pre-training on text, image, and video data). This versatility positions them as foundational models for next-generation applications requiring rich contextual understanding across modalities.

At the core of Llama 4 lies a mixture-of-experts (MoE) architecture, in which only a subset of the model's parameters is activated for each token. Llama 4 Scout has 16 experts and 17 billion active parameters, and Meta says it fits on a single H100 GPU with Int4 quantization. It also supports a 10 million token context window, which should help with long-range dependency tasks such as legal document analysis, scientific literature synthesis, and full-codebase reasoning. Meanwhile, Llama 4 Maverick scales the architecture further to 128 experts and a total of 400 billion parameters, with results in reasoning, multilingual tasks, and code generation that rival models like DeepSeek V3.
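To make the distinction between active and total parameters concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Meta's implementation: the `ToyMoE` class, its tiny dimensions, and the plain softmax gating are assumptions chosen for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class ToyMoE:
    """Hypothetical toy MoE layer: a linear router plus one dim x dim matrix per expert."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.top_k = top_k
        self.router = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(num_experts)]
        self.experts = [
            [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
            for _ in range(num_experts)
        ]

    def forward(self, x):
        """Route one token vector to its top_k experts and mix their outputs."""
        logits = matvec(self.router, x)
        chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: self.top_k]
        weights = softmax([logits[i] for i in chosen])
        out = [0.0] * self.dim
        for w, i in zip(weights, chosen):
            y = matvec(self.experts[i], x)  # only the chosen experts are evaluated
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, chosen

    def param_counts(self):
        """Return (total, active) parameter counts for a single token."""
        router = len(self.router) * self.dim
        per_expert = self.dim * self.dim
        total = router + per_expert * len(self.experts)
        active = router + per_expert * self.top_k
        return total, active


moe = ToyMoE(dim=8, num_experts=16, top_k=1)
output, chosen_experts = moe.forward([1.0] * 8)
total_params, active_params = moe.param_counts()
print(f"experts used: {chosen_experts}, active/total params: {active_params}/{total_params}")
```

In this toy layer every token touches the router and one expert, so the per-token compute tracks the active count while memory still has to hold all 16 experts. The same gap, at a much larger scale, is what separates the active and total parameter figures quoted above.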

This release signals a strategic pivot from Meta, influenced by the growing competitiveness of open-source models from the broader AI community. Llama 4 Behemoth, still in training, has 288 billion active parameters and nearly two trillion in total, and Meta aims to surpass frontier models like GPT-4.5 and Claude Sonnet 3.7 in STEM-specific benchmarks. Meta's early reported results suggest it is succeeding, with Behemoth ahead on several logic and scientific reasoning evaluations.

What sets Llama 4 apart is not just its performance, but its accessibility. Both Scout and Maverick are released under open terms and available through platforms like Hugging Face and llama.com, reinforcing Meta's commitment to democratizing advanced AI tooling. This stands in stark contrast to the increasingly closed ecosystem of some competitors, and it provides researchers, developers, and startups the building blocks to create high-performance AI systems without prohibitive licensing barriers.

Meta's anticipated $65 billion AI infrastructure spend in 2025 underscores the seriousness of its intent. Llama 4 is not merely another model drop; it is a recalibration of the LLM landscape, balancing technical sophistication with an open invitation to build. In the era of scale, efficiency, and multimodality, Llama 4 is not just keeping pace — it is setting the tempo.

🔎 AI Research

Responsible Path to AGI

Google DeepMind published a very long paper titled “An Approach to Technical AGI Safety and Security” which outlines their proactive strategy for navigating the development of artificial general intelligence, emphasizing readiness, risk assessment, and collaboration. This approach involves a systematic exploration of four main risk areas – misuse, misalignment, accidents, and structural risks – and details their ongoing efforts in monitoring progress, implementing safety and security measures, and fostering an ecosystem for responsible AGI development.

CURIE

Google Research published a paper detailing CURIE, a new benchmark designed to evaluate the potential of large language models in scientific problem-solving by testing their long-context understanding, reasoning, and information extraction abilities across six scientific disciplines. Its top contributions include a suite of ten challenging tasks based on full-length scientific papers that represent realistic scientific workflows and novel model-based evaluation metrics to assess the varied and heterogeneous forms of ground truth annotations.

UniDisc

In the paper "Unified Multimodal Discrete Diffusion", researchers from Carnegie Mellon University present UniDisc, a novel unified multimodal discrete diffusion model for jointly understanding and generating text and images. The model leverages discrete diffusion through masking and demonstrates capabilities in tasks such as joint image-text inpainting, outperforming autoregressive models in terms of performance and inference-time compute, while also offering enhanced controllability and editability.
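As a rough illustration of what "discrete diffusion through masking" means, the sketch below shows a generic forward masking step and an iterative unmasking loop over a toy joint sequence of image and text tokens. It is not UniDisc's code: the token names, the `forward_mask` and `reverse_unmask` helpers, and the `oracle_denoise` stand-in (which simply looks up the answer where a trained network would predict it) are all assumptions for illustration.

```python
import math
import random

MASK = "<mask>"


def forward_mask(tokens, t, rng):
    """Forward (noising) process: replace each token with MASK with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def reverse_unmask(noisy, denoise, steps, rng):
    """Reverse process: fill in masked positions over a fixed number of steps.

    Each step reveals roughly an equal share of the positions still masked,
    so the final step always clears whatever remains.
    """
    seq = list(noisy)
    for remaining in range(steps, 0, -1):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        k = math.ceil(len(masked) / remaining)
        for i in rng.sample(masked, k):
            seq[i] = denoise(seq, i)
    return seq


# Toy joint sequence: three hypothetical image tokens followed by three text tokens.
target = ["img_12", "img_7", "img_3", "a", "red", "bird"]


def oracle_denoise(seq, i):
    """Stand-in for a trained denoiser; a real model would predict from context."""
    return target[i]


rng = random.Random(0)
noisy = forward_mask(target, t=0.5, rng=rng)
restored = reverse_unmask(noisy, oracle_denoise, steps=3, rng=rng)
print(noisy)
print(restored)
```

Because any subset of positions can be masked, the same loop covers inpainting: positions that were never masked are left untouched, and only the masked image or text tokens are regenerated.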

ECLeKTic

Google Research published ECLeKTic, a novel benchmark designed to evaluate the ability of large language models (LLMs) to transfer knowledge across different languages by using a closed-book question answering task based on single-language Wikipedia articles. This dataset assesses whether LLMs can access and utilize knowledge originally present in one language when questions are posed in other languages, highlighting discrepancies in current models and providing a tool for improvement.

AI for Software Eng

In the paper "AI for Software Engineering: The State of the Art and Promising Directions", researchers from institutions including University of California, Berkeley and MIT CSAIL provide a comprehensive overview of the field of AI for software engineering, highlighting its recent progress and remaining challenges. The paper offers a structured taxonomy of tasks beyond code generation, emphasizes key limitations of current models, and proposes promising research directions to achieve higher levels of automation in software development.

CodeARC

In the paper "CodeARC: A Dataset and Evaluation Framework for Inductive Program Synthesis of General-Purpose Python Functions", the authors introduce CodeARC, the first comprehensive dataset for general-purpose inductive program synthesis, featuring 1114 Python functions with initial input-output examples. This benchmark is designed to evaluate the ability of LLM agents to synthesize general programming tasks and employs differential testing for correctness evaluation, revealing that existing LLMs face significant challenges on this dataset.
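Differential testing, mentioned above as the correctness check, can be sketched in a few lines: run a hidden reference function and a synthesized candidate on the same generated inputs and report the first input where they disagree. This is a generic illustration, not CodeARC's harness; the `differential_test` helper, the input generator, and the example functions are hypothetical.

```python
import random


def run(fn, args):
    """Call fn and capture either its return value or the exception type it raised."""
    try:
        return ("ok", fn(*args))
    except Exception as exc:
        return ("err", type(exc).__name__)


def differential_test(reference, candidate, gen_input, trials=200, seed=0):
    """Return the first generated input on which the two functions differ, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        args = gen_input(rng)
        if run(reference, args) != run(candidate, args):
            return args
    return None


def reference(xs):
    """Hidden target function: sorted list of distinct values."""
    return sorted(set(xs))


def buggy_candidate(xs):
    """Synthesized guess that forgets to drop duplicates."""
    return sorted(xs)


def correct_candidate(xs):
    """Synthesized guess that matches the reference with different code."""
    return sorted(dict.fromkeys(xs))


def gen_input(rng):
    """Lists of 4 to 6 values drawn from {0, 1, 2}, so duplicates always occur."""
    return ([rng.randint(0, 2) for _ in range(rng.randint(4, 6))],)


counterexample = differential_test(reference, buggy_candidate, gen_input)
no_counterexample = differential_test(reference, correct_candidate, gen_input)
print("buggy candidate fails on:", counterexample)
print("correct candidate fails on:", no_counterexample)
```

A candidate that matches only the initial input-output examples can still be caught this way, since the generator probes inputs the examples never covered.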

🤖 AI Tech Releases

HallOumi

Oumi introduced a frontier model for claims verification.

Midjourney v7

Midjourney released its new image generation model.

Devin 2.0

Cognition released the second version of its software engineering agent.

Nova Act

Amazon introduced Nova Act, its new web browsing agent.

🛠 AI in Production

LLMs at Pinterest

Pinterest shares details of its LLM-powered search capabilities.

📡AI Radar


Published on The Digital Insider at https://is.gd/1uTZar.