The Sequence Radar: AI Browsers are Coming

Perplexity and OpenAI announced initiatives in that area.

Next Week in The Sequence:

Over the next few weeks, you are going to see us experimenting with new content sections based on the installments that regularly get the most traction. In a market inundated by newsletters that publish paper analyses done by LLMs without any original opinion, I would like to double down on the things that we can do best: keeping you current in AI and discussing original ideas. I have some fresh ideas that I would like to test in those areas.

Let’s Go! You can subscribe to The Sequence below:

📝 Editorial: AI Browsers are Coming

Get accustomed to this term: AI browser. You are going to hear a lot about it in the next few months!

After more than a decade in which Google Chrome and Microsoft Edge have dominated the browser market, a new wave of AI-first platforms is poised to challenge their hegemony by embedding advanced language models directly into the browsing core. Platforms like Perplexity’s Comet and the rumored OpenAI browser are transforming our web gateway from a static rendering engine into a dynamic AI assistant, offering conversational search, real-time content synthesis, and automated workflows that redefine navigation and productivity.

Perplexity’s Comet, launched in July 2025, exemplifies this shift by placing an AI agent in the sidebar to parse on-page content, manage multiple tabs, and automate multi-step workflows, all within a familiar Chromium shell that supports existing extensions and bookmarks. Early adopters laud the browser’s uncanny ability to distill hours of online research into concise bullet points and to handle end-to-end tasks like finding the best hotel deals or populating spreadsheets. Meanwhile, OpenAI’s impending release promises to extend the ChatGPT ecosystem into a full-fledged browser, where users may interface with web content solely through a chat window that interprets commands and orchestrates actions behind the scenes.

What sets AI-first browsers apart is their natural language interface, which replaces traditional keyword queries with nuanced, conversational dialogue. This allows users to ask follow-up questions, refine search parameters on the fly, and receive contextually aware responses tailored to their needs. In professional settings, whether legal research, academic literature reviews, or market analysis, the ability to auto-summarize disparate sources and maintain a thematic thread across web pages can dramatically reduce cognitive load and accelerate decision-making.

Traditional browser vendors are not standing still. Google has woven generative features and Gemini (formerly Bard) integrations into Chrome, and Microsoft’s Edge preview of Copilot modes hints at a future where every browser window is an AI cockpit. Even niche players like The Browser Company are experimenting with embedded assistants that perform on-the-fly translation, sentiment analysis, and intelligent shopping recommendations. These moves underscore the fact that the next frontier in browser innovation is not new layout designs or performance benchmarks, but the depth and responsiveness of integrated AI capabilities.

As Perplexity, OpenAI, and their competitors vie for a share of browser mindshare, the ultimate question becomes not which homepage we set but which AI collaborator we choose to navigate the web. In this unfolding chapter of digital exploration, browsers will no longer be defined by tabs and toolbars, but by the intelligence they bring to each click and keystroke. For developers, content creators, and policymakers alike, the mission is clear: to harness this wave of AI-driven browsing in ways that maximize efficiency, uphold trust, and safeguard the open ethos of the internet.

🔎 AI Research

VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents

Salesforce Research
Builds on VLM2Vec by introducing a unified embedding space that effectively aligns video, image, and document representations through a novel contrastive loss and joint cross-modal attention modules. Demonstrates state-of-the-art results on video-text retrieval (YouCook2, MSR-VTT), image-text retrieval (MSCOCO, Flickr30K), and visual document understanding (DocVQA) benchmarks while improving computational efficiency.

Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

SAP Labs
Introduces a two-stage fine-tuning framework—first on synthetic, tool-augmented data to teach correct tool invocation, then on human-annotated dialogues to model realistic, disambiguation-driven interactions. Achieves a 69 percent reduction in harmful or hallucinated API calls compared to standard instruction tuning, while improving user-observed task success by 33 percent.

LitBench: A Benchmark and Dataset for Reliable Evaluation of Creative Writing

Stanford University
Presents LitBench, the first benchmark for evaluating LLM-generated creative writing, featuring 2,480 debiased, human-labeled test pairs across four literary genres and a 43,827-pair training corpus of human preference labels. Includes guidelines to reduce annotator bias and demonstrates that current LLMs lag significantly behind human writers, highlighting directions for future model improvements.

Evaluating Large Language Models Trained on Code

OpenAI
Systematically assesses four code-focused LLMs fine-tuned on Python, JavaScript, Java, and Go, revealing that code models excel at automated code write-and-explain tasks but underperform compared to general LLMs on code summarization and reasoning benchmarks. Finds that specialized models benefit most from chain-of-thought prompting in reasoning-intensive tasks, informing best practices for code model deployment.

MedGemma Technical Report

Google Research & Google DeepMind
Introduces MedGemma, a suite of open, medically tuned vision-language models built on the Gemma 3 architecture, with a 4B-parameter multimodal variant and a 27B-parameter text-only variant. Demonstrates strong zero-shot and fine-tuned performance across 25 medical benchmarks (including radiology report generation, image classification, and EHR question answering) while maintaining competitive general-purpose capabilities.

🤖 AI Tech Releases

Grok 4

xAI released Grok 4, and the results are quite impressive.

Comet

Perplexity launched Comet, its AI-first web browser.

Phi-4 Flash Reasoning

Microsoft released Phi-4-mini-flash-reasoning, a reasoning LLM optimized for inference speed.

Open Model Engine

The LMSys research lab released Open Model Engine (OME), a new Kubernetes platform with models as first-class components.

Reachy Mini

Hugging Face launched Reachy Mini, an open source robot designed for human-robot interaction.

📡 AI Radar

Google lands Windsurf CEO and team in $2.4B AI coding 'acquihire' – Google secures Windsurf’s tech and key executives including CEO Varun Mohan in a $2.4 billion licensing and hiring deal to bolster DeepMind’s Gemini agentic coding efforts, after OpenAI's planned $3 billion acquisition fell through.
Microsoft Reports $500M Saved Right After 9K Layoffs – Microsoft reveals over $500 million in AI‑driven savings across departments just days after cutting about 9,000 jobs.
OpenAI to Drop AI‑Powered Web Browser Soon – OpenAI is set to launch its own AI‑centric web browser in the coming weeks to rival Chrome.
ZeroEntropy Scores $4.2M to Build the Future of AI Search – A YC‑backed founder secures $4.2 million for ZeroEntropy, aiming to layer smarter AI‑powered search using RAG.
LangChain Nears Unicorn Status with $1B Valuation Round – Sources say AI toolmaker LangChain is raising a round valuing it at around $1 billion led by IVP.
French AI Star Mistral Eyes $1 Billion Raise from Global Investors – Paris‑based Mistral is reportedly in talks to raise a $1 billion equity round backed by MGX Fund and others.
Replit Picks Microsoft Over Google in Strategic Cloud Partnership – Replit partners with Microsoft, boosting its presence in Azure despite continuing support for Google Cloud.
CoreWeave & Core Scientific Unite in $9B All‑Stock AI Deal – CoreWeave acquires data‑center provider Core Scientific in a $9 billion all‑stock merger to scale AI infrastructure.
AWS to launch AI Agent marketplace alongside Anthropic – Amazon Web Services is set to unveil a dedicated AI agent marketplace next week at its New York Summit, enabling startups like Anthropic to offer AI agents directly to AWS customers through a curated platform.
Knox raises $6.5M to challenge Palantir in federal compliance – Knox has secured $6.5 million in seed funding to accelerate its platform that helps SaaS companies navigate FedRAMP compliance in under three months, positioning itself as a challenger to Palantir’s dominant FedStart offering.
Sarah Smith launches $16M solo GP fund fueled by AI – VC Sarah Smith closes a $16 million fund and asserts AI empowers solo general partners like herself to scale operations and deliver value up to 10× faster, transforming early-stage investing.

Published on The Digital Insider at https://is.gd/zUKyvb.

Julio Marchi © Speaks Out Network
