10x faster models and the consulting angle for AI
Hey, I’m Ben. I build stuff with agents, even though I’m not technical. Here’s all the stuff I’m reading and tinkering with. If you want to start building or level up your ‘vibe-coding’ skills, join our community.
Hey folks,
Google is back on top of the benchmark charts with Gemini 3.1 Pro. It’s impressive on paper and genuinely strong at reasoning tasks and at creating SVGs, but there’s a speed issue. Many folks are really enjoying it for frontend work, once they manage to get it working. And there’s drama again: a lot of people got their Google accounts banned for using their Google AI/Antigravity subscriptions to run Gemini 3.1 Pro through OpenClaw.
A 2.5-year-old hardware startup, Taalas, built a chip with the weights of Llama 3.1 baked into the hardware, which lets it reach an output speed of ~17k tokens/second. For comparison, Groq is at ~600 tokens/second and Cerebras at ~2k tokens/second. The model on the chip (they call it “silicon llama”) is largely fixed, since the weights can’t be rewritten, but it supports custom context-window sizes and LoRA fine-tuning. I compared the same model on their chat demo and in Groq’s playground. As expected, it is dumber on Taalas’s demo (due to low-quality quantisation), but at this stage, proving that “any AI model can be made 10x faster and cost 20x less” matters more. They plan to release a reasoning-model version very soon, and frontier LLMs are on the roadmap too.
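A quick back-of-the-envelope check of those speed numbers, as a small Python sketch. It only uses the approximate figures quoted above; the names `SPEEDS_TOKENS_PER_SEC`, `speedup` and `seconds_for` are made up for illustration, and it ignores time-to-first-token and network overhead, so treat it as rough arithmetic rather than a benchmark.

```python
# Back-of-the-envelope comparison of the output speeds quoted above.
# Approximate public figures from this issue, not our own benchmarks.
SPEEDS_TOKENS_PER_SEC = {
    "Taalas": 17_000,
    "Cerebras": 2_000,
    "Groq": 600,
}


def speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times faster the first speed is than the second."""
    return fast_tps / slow_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Seconds to stream `tokens` output tokens at a steady `tps`.

    Ignores time-to-first-token and network overhead.
    """
    return tokens / tps


if __name__ == "__main__":
    taalas = SPEEDS_TOKENS_PER_SEC["Taalas"]
    for name, tps in SPEEDS_TOKENS_PER_SEC.items():
        print(
            f"{name:<9} {seconds_for(1_000, tps):5.2f} s per 1,000 tokens, "
            f"Taalas is {speedup(taalas, tps):4.1f}x faster"
        )
```

On these numbers, 1,000 tokens take about 0.06 s on Taalas versus 0.5 s on Cerebras and about 1.67 s on Groq, so the gap is roughly 8.5x against Cerebras and roughly 28x against Groq.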
OpenAI is partnering with four major consulting firms (BCG, McKinsey, Accenture and Capgemini) to get enterprises onto its new platform, “Frontier”, which lets you create AI coworkers. Weren’t consulting shops supposed to die with AI?
Claude Code updates: built-in support for git worktrees so you can run agents in parallel, app previews for running apps in Claude Code desktop, and a new security-scanning feature in beta.
Why’s there always a meeting bot in your Zoom call? Blame Recall.ai. They power every meeting AI app, from Cluely to HubSpot to ClickUp. Recall.ai handles the hard part: getting recording data across meeting platforms. Get started with $100 in credits.*
🌐 What I’m consuming
- Anthropic says Chinese model makers “stole” Claude chats to improve their own models. That opens a can of worms: (a) Why is it fair use when Anthropic does the same to the internet and to book authors? (b) Is this just a lobbying attempt? (c) Are the claims even honest? And a lot more.
- The 2028 Global Intelligence Crisis - A fictional thought exercise from Citrini Research that is fueling another selloff. Here’s a counter-essay to it.
- The shortcomings of SWE-bench Verified and why OpenAI will no longer report it.
- Vibe coding is the new product management.
- Inside Felix - The OpenClaw AI earning $1,000s a week.
- The filesystem is the database for an agentic personal OS.
- Elaborate AGENTS.md or CLAUDE.md files might be hurting the performance of your agents.
- Aesthetics of AI - Different ways AI products are approaching their brands visually.
- Agentic Engineering Patterns by Simon Willison - Patterns for getting the best results out of coding agents.
- The software development lifecycle is dead.
- Inference Engineering by Baseten - A book for AI engineers to learn how to serve different types of models (LLMs, media gen models, and more) to millions of users.
⚙️ Tools and demos
- AssemblyAI Universal-3 Pro - Prompt your speech model to get jargon, speakers, and formatting right the first time. Free to try through Feb.*
- here.now - Free, instant web hosting for agents, static content only.
- mdnb - A Markdown notebook for macOS.
- Rork Max - One-shot almost any app for iPhone or any other Apple device (including watches, TVs and Vision Pro). I’m an investor.
- Interpreter - Desktop agent that can fill PDFs, edit your Excel and Word docs, and learn new skills. Runs locally, works with any model.
- Wideframe - AI agent that speeds up the 75% of video work happening outside the editor.
- Typefully has a new writing assistant to help you write better (not just more).
- Trajectory Explorer by Raindrop - Every decision your agent made, searchable in seconds.
- FasterGH - GitHub with instant navigation and a modern UI. (repo)
- Quipslop - A live game where different models try their best to be funny. (repo)
- Shiori - A beautifully simple read-it-later app.
- I was looking for a way to add a “browser” to a web app I’m working on and came across Hyperbeam and lifo.sh.
🥣 Dev Dish
- WebSockets in the Responses API - for low-latency, long-running agents that make lots of tool calls. OpenAI also has a new speech-to-speech model in the API: GPT-Realtime-1.5.
- Multimodal function calling is now available in the Gemini Interactions API.
- Cloudflare’s new MCP server uses Code Mode and takes up fewer than 1,000 tokens in the context window.
- Chowder - UI patterns for agents on mobile. (demo)
- Agentsview - A local web app for browsing, searching, and analysing your past AI coding sessions.
- mdr - A lightweight, fast Markdown viewer with Mermaid diagram support.
- fastpass - A CLI for rapidly configuring Cloudflare Access.
- Tools from vercel-labs - a visual JSON editor and a renderer that turns JSON into a PDF.
- api2cli - A Claude Code skill that turns any API into a working CLI and then wraps that CLI in a skill.
- A collection of skills from Matt Pocock for writing PRDs, turning them into issues, building those issues with a Ralph loop, and doing manual QA.
🍦 Afters
- Based on Arena scores, GPT-5.2-Chat-Latest (the model in ChatGPT) is a big improvement over the raw GPT-5.2.
- Rows (a modern spreadsheet that pivoted to being an AI data analyst) is joining Superhuman (the combined company of Superhuman email, Grammarly and Coda).
- Agentica claims it has solved all of ARC-AGI-3’s puzzles.
- Standard Intelligence’s new foundation model, FDM-1, learns to use computers from videos, not just screenshots.
- ChatGPT finds an error in Terence Tao’s argument for an Erdős problem.
Enjoy this newsletter? Forward it to a friend.
That’s it for today. Feel free to comment and share your thoughts. 👋
- Find me on X, LinkedIn, or Instagram
- Read about me and Ben’s Bites
- 📷 thumbnail by @keshavatearth