The Sequence Radar #549: Google, Microsoft and Anthropic Monster AI Week | By The Digital Insider

Probably the biggest week of AI releases in recent years.


Next Week in The Sequence:

Our series about evals dives into safety benchmarks. In our opinion installment, we discuss a simple but controversial topic: what is an AI agent? In engineering, we will chat about the new Magentic-UI framework, while the research section will focus on Meta’s recent J1 paper.


📝 Editorial: Google, Microsoft and Anthropic Monster AI Week

Apparently, there are not enough weeks in a year in AI! So much so that Anthropic, Microsoft and Google all decided to host major conferences and announcements last week. It is really impossible to write about anything else when I have been playing with Claude 4 and Google

Google I/O 2025: A Leap Towards AGI

At Google I/O 2025, the spotlight was on Google's pursuit of Artificial General Intelligence (AGI). A centerpiece of this vision is Gemini 2.5, a model engineered to boost reasoning and creativity across a wide range of tasks. Google also introduced Project Astra, a universal AI assistant capable of real-time, multimodal interactions. Notably, Astra integrates with Android XR smart glasses—developed with Gentle Monster and Warby Parker—offering users a deeply immersive AR experience that merges the digital and physical worlds.

Reimagining Search and Creative Media

Google is redefining the search paradigm through its conversational AI Mode, supported by Deep Search and Project Mariner for highly contextual results. In the creative domain, the release of Veo 3 and Imagen 4 showcases new frontiers in AI-generated video and imagery. Veo 3 excels at producing synchronized audio-visual outputs, opening up novel possibilities in filmmaking. Complementing this, Google unveiled Flow, an AI filmmaking assistant designed to streamline creative workflows.

Microsoft Build 2025: The Era of Agentic AI

At Microsoft Build 2025, the focus turned toward agentic AI—autonomous agents designed to independently carry out complex tasks. Microsoft launched the Azure AI Foundry, a full-stack environment for building, monitoring, and deploying AI agents with a strong emphasis on reliability and governance. Accompanying this was the debut of Entra Agent ID, a security layer to ensure the safe and accountable operation of these agents.

Empowering Developers and Edge AI

Microsoft rolled out a suite of upgrades for developers, including enhancements to GitHub Copilot and the new Copilot Studio, enabling the orchestration of multi-agent systems. The introduction of Phi-4-mini also marked a significant step toward on-device intelligence, allowing AI to be embedded directly into web applications through the Edge browser. These efforts reflect Microsoft’s push to decentralize and democratize access to AI development.

Anthropic's Claude 4 Series

Anthropic revealed the Claude 4 series, led by Claude Opus 4 and Claude Sonnet 4, optimized for sustained and nuanced reasoning. Claude Opus 4 is particularly noteworthy for its ability to maintain contextual coherence over hours of interaction—ideal for sophisticated coding or analytical work. The models come with upgraded safety protocols, reinforcing Anthropic’s emphasis on alignment and ethical AI behavior.

Developer-First Features and Robust APIs

To enhance usability, Anthropic introduced new APIs featuring executable code blocks, persistent memory caching, and comprehensive file handling. These capabilities are tailored to help developers build more autonomous and intelligent agents capable of managing complex task flows with minimal intervention.

A Shared Vision for AI's Evolution

Collectively, these announcements reflect an industry-wide pivot toward more capable, agentic, and human-aligned AI systems. While each company pursues this goal from different angles, the convergence around autonomy, multimodality, and developer empowerment highlights a unified trajectory in AI evolution.


🔎 AI Research

Google Research at Google I/O

Google showcased its latest research efforts in AI across education, healthcare, and sustainability. Key initiatives include MedGemma and AMIE for AI-assisted healthcare, LearnLM for education aligned with learning science, and contributions to Gemini’s multilingual, grounded, and multimodal capabilities. The company also demonstrated AI models for wildfire detection (FireSat), quantum research, and scientific discovery through the AI Co-Scientist system.

Meta AI in Advanced Science

Meta AI announced a series of open-source breakthroughs spanning molecular modeling, language processing, and neuroscience. Highlights include Open Molecules 2025 (OMol25) and the Universal Model for Atoms (UMA), which together advance molecular property prediction and materials discovery. They also introduced Adjoint Sampling, a scalable reward-based generative modeling technique, and presented new neuroscience findings that align brain activity with LLM behavior during language learning. The research emphasizes Meta’s commitment to scientific openness, collaboration, and AI-for-science impact.

HumaniBench

In HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation, researchers from the Vector Institute and the University of Central Florida introduce HumaniBench, the first benchmark focused on evaluating large multimodal models (LMMs) against seven Human-Centered AI (HCAI) principles such as fairness, empathy, and robustness. It comprises 32,000+ real-world image–question pairs and evaluates 15 state-of-the-art LMMs, revealing significant alignment gaps, especially in ethical and multilingual reasoning.

AGENTIF

In AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios, researchers from Tsinghua University and Zhipu AI present AGENTIF, a benchmark for evaluating how well LLMs follow long and constraint-heavy instructions in real-world agentic tasks, constructed from 50 agent-based applications. It introduces a taxonomy of constraint types and shows that most LLMs perform poorly under complex instruction-following settings due to issues with conditionals, tools, and semantic constraints.

AceReason-Nemotron

In AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning, NVIDIA presents AceReason-Nemotron, which demonstrates that large-scale reinforcement learning (RL) on math prompts first and then code prompts significantly improves reasoning abilities in 7B and 14B models, outperforming distillation-based baselines on AIME 2025 and LiveCodeBench. The authors also contribute a robust data pipeline and training methodology, showing that RL can push SFT-trained models to solve previously unsolvable problems.

LMGAME-BENCH

In LMGAME-BENCH: How Good are LLMs at Playing Games?, researchers from UC San Diego, UC Berkeley, and MBZUAI introduce LMGame-Bench, a benchmark suite that uses six diverse video games to test LLM abilities in perception, memory, and long-horizon planning, enhanced with perception and memory scaffolds. It reveals distinct model behaviors, confirms cross-game generalization through RL fine-tuning, and highlights benchmark transferability to planning tasks like WebShop and Blocksworld.

🤖 AI Tech Releases

Gemma 3n

Google released Gemma 3n preview, a new version optimized for mobile computing.

Devstral

Mistral released Devstral, an agentic system for software engineering tasks.

Magentic-UI

Microsoft released Magentic-UI, an agent built on the Magentic framework that specializes in web tasks.

🛠 AI in Production

1000 AI Models

Meta discusses the architecture supporting 1000+ AI models for Instagram.

📡 AI Radar


Published on The Digital Insider at https://is.gd/Hg39S3.
