The Sequence Radar: AI Browsers are Coming | By The Digital Insider

Perplexity and OpenAI announced initiatives in that area.

Next Week in The Sequence:

Over the next few weeks, you are going to see us experimenting with new content sections based on the installments that regularly get the most traction. In a market inundated with newsletters that publish LLM-generated paper analyses without any original opinion, I would like to double down on the things that we do best: keeping you current in AI and discussing original ideas. I have some fresh ideas that I would like to test in those areas.


📝 Editorial: AI Browsers are Coming

Get accustomed to this term: AI browser. You are going to hear a lot about it in the next few months!

After years in which Google Chrome and Microsoft Edge have dominated the browser market, a new wave of AI-first platforms is poised to challenge their hegemony by embedding advanced language models directly into the browsing core. Platforms like Perplexity’s Comet and the rumored OpenAI browser are transforming our web gateway from a static rendering engine into a dynamic AI assistant, offering conversational search, real-time content synthesis, and automated workflows that redefine navigation and productivity.

Perplexity’s Comet, launched in July 2025, exemplifies this shift by placing an AI agent in the sidebar to parse on-page content, manage multiple tabs, and automate multi-step workflows, all within a familiar Chromium shell that supports existing extensions and bookmarks. Early adopters laud the browser’s uncanny ability to distill hours of online research into concise bullet points and to handle end-to-end tasks like finding the best hotel deals or populating spreadsheets. Meanwhile, OpenAI’s impending release promises to extend the ChatGPT ecosystem into a full-fledged browser, where users may interface with web content solely through a chat window that interprets commands and orchestrates actions behind the scenes.

What sets AI-first browsers apart is their natural language interface, which replaces traditional keyword queries with nuanced, conversational dialogue. This allows users to ask follow-up questions, refine search parameters on the fly, and receive contextually aware responses tailored to their needs. In professional settings, whether legal research, academic literature reviews, or market analysis, the ability to auto-summarize disparate sources and maintain a thematic thread across web pages can dramatically cut down on cognitive load and accelerate decision-making.

Traditional browser vendors are not standing still. Google has woven generative features and Gemini (formerly Bard) integrations into Chrome, and Microsoft’s preview of Copilot modes in Edge hints at a future where every browser window is an AI cockpit. Even niche players like The Browser Company are experimenting with embedded assistants that perform on-the-fly translation, sentiment analysis, and intelligent shopping recommendations. These moves underscore the fact that the next frontier in browser innovation is not new layout designs or performance benchmarks, but the depth and responsiveness of integrated AI capabilities.

As Perplexity, OpenAI, and their competitors vie for a share of browser mindshare, the ultimate question becomes not which homepage we set but which AI collaborator we choose to navigate the web. In this unfolding chapter of digital exploration, browsers will no longer be defined by tabs and toolbars, but by the intelligence they bring to each click and keystroke. For developers, content creators, and policymakers alike, the mission is clear: to harness this wave of AI-driven browsing in ways that maximize efficiency, uphold trust, and safeguard the open ethos of the internet.

🔎 AI Research

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

Salesforce Research
Builds on VLM2Vec by introducing a unified embedding space that effectively aligns video, image, and document representations through a novel contrastive loss and joint cross-modal attention modules. Demonstrates state-of-the-art results on video-text retrieval (YouCook2, MSR-VTT), image-text retrieval (MSCOCO, Flickr30K), and visual document understanding (DocVQA) benchmarks while improving computational efficiency.
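
For readers who have not worked with contrastive objectives, here is a minimal sketch of the generic idea in plain Python. It uses a standard InfoNCE-style loss with in-batch negatives, which is an assumption for illustration and not the paper's own loss; the function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, targets, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    queries[i] is assumed to pair with targets[i]; every other target
    in the batch acts as a negative for that query.
    """
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine(query, t) / temperature for t in targets]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of picking the matching target.
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy 2-D "embeddings" (illustrative values only).
text_vectors = [[1.0, 0.0], [0.0, 1.0]]
aligned_visual = [[0.9, 0.1], [0.1, 0.9]]   # each item sits near its text pair
swapped_visual = [[0.1, 0.9], [0.9, 0.1]]   # each item sits near the wrong pair

print("aligned loss:", info_nce(text_vectors, aligned_visual))
print("swapped loss:", info_nce(text_vectors, swapped_visual))
```

The loss is small when each query sits closest to its own target and large when the pairs are mismatched, which is the pressure that pulls matching items from different modalities toward the same region of a shared embedding space.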

Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

SAP Labs
Introduces a two-stage fine-tuning framework—first on synthetic, tool-augmented data to teach correct tool invocation, then on human-annotated dialogues to model realistic, disambiguation-driven interactions. Achieves a 69 percent reduction in harmful or hallucinated API calls compared to standard instruction tuning, while improving user-observed task success by 33 percent.

LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing

Stanford University
Presents LitBench, the first benchmark for evaluating LLM-generated creative writing, featuring 2,480 debiased, human-labeled test pairs across four literary genres and a 43,827-pair training corpus of human preference labels. Includes guidelines to reduce annotator bias and demonstrates that current LLMs lag significantly behind human writers, highlighting directions for future model improvements.

Evaluating Large Language Models Trained on Code

OpenAI
Systematically assesses four code-focused LLMs fine-tuned on Python, JavaScript, Java, and Go, revealing that code models excel at automated code write-and-explain tasks but underperform compared to general LLMs on code summarization and reasoning benchmarks. Finds that specialized models benefit most from chain-of-thought prompting in reasoning-intensive tasks, informing best practices for code model deployment.

MedGemma Technical Report

Google Research & Google DeepMind
Introduces MedGemma, a suite of open, medically-tuned vision-language models built on the Gemma 3 architecture, with a 4B-parameter multimodal variant and a 27B-parameter text-only variant. Demonstrates strong zero-shot and fine-tuned performance across 25 medical benchmarks (including radiology report generation, image classification, and EHR question answering) while maintaining competitive general-purpose capabilities.

🤖 AI Tech Releases

Grok 4

xAI released Grok 4, and the results are quite impressive.

Comet

Perplexity launched Comet, its AI-first web browser.

Phi-4 Flash Reasoning

Microsoft released Phi-4-mini-flash-reasoning, a reasoning LLM optimized for inference speed.

Open Model Architecture

The LMSys research lab released Open Model Architecture (OME), a new Kubernetes platform with models as first-class components.

Reachy Mini

Hugging Face launched Reachy Mini, an open source robot designed for human-robot interaction.

📡 AI Radar


Published on The Digital Insider at https://is.gd/zUKyvb.