

Software Products in The Age of Big Coding AIs

Last Sunday, I shared with my team at Liquid a more extensive version of this article on how we should think about building software and products in the age of powerful coding AI agents. I hope you find it useful too.

The world of software engineering and products is undergoing a significant shift, with AI agents becoming more and more capable of completing coding jobs autonomously. We must understand this shift from first principles as creators, curators, designers, programmers, and product owners to build software products that matter in the age of god-like AIs.

Since 2020, we have gone from coding language models that auto-complete in IDEs, to models that write functions from descriptions, to Claude Code (and its counterparts from Cursor, Cognition, Replit, and Codex), which let big AI models act as agents that co-develop and iterate on software together with human operators.

Almost all of the coding agents/products above are reactive, with the following user journey for solving increasingly complex problems: the operator describes the task → the LLM attempts an answer and returns to the user → the user follows up. Repeat.

Where are the big coding AIs going? With the new wave of big AI releases (Opus 4.6, GPT 5.3 codex), there is a significant focus on full agent autonomy, where the user describes a full-blown complex project and the AI tries to zero-shot the whole thing. Nicholas’s blog is a must-read if you want to dig deeper.

The way the big AIs do this is either by breaking the problem into many sub-problems for many subagents, each with access to tools/skills/memory and solving within its own context limit, or by using recursive language models, which treat their limited context length preciously and use recursive allocation of tasks plus reads and writes from and to memory to handle tasks of theoretically infinite context length.
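The recursive pattern can be sketched in a few lines. Everything here is a hypothetical stand-in: `call_model` plays the role of an LLM call, and a plain dict plays the role of external memory.

```python
# Sketch of recursive task handling under a fixed context budget.
# `call_model` is a hypothetical stand-in for any LLM call; the
# `memory` dict plays the role of external read/write storage.

CONTEXT_BUDGET = 4000  # max characters the model sees at once (illustrative)

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call."""
    return f"result({len(prompt)} chars)"

def solve(task: str, memory: dict) -> str:
    if len(task) <= CONTEXT_BUDGET:
        # Small enough: solve directly and persist the result.
        result = call_model(task)
        memory[task[:40]] = result
        return result
    # Too big: split into halves, solve each half recursively, then
    # combine only the short sub-results, never the raw inputs.
    mid = len(task) // 2
    left = solve(task[:mid], memory)
    right = solve(task[mid:], memory)
    return call_model(f"combine: {left} | {right}")

memory: dict = {}
answer = solve("x" * 10000, memory)
```

The key property is that no single `call_model` invocation ever sees more than the context budget, regardless of how large the original task is.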

One layer higher, and significantly noisier, we have Clawbook and OpenClaw. I think Jack Clark’s (Anthropic co-founder) take describes most closely how I see this evolving.

As a secondary effect, this trend strongly impacts the nature of SaaS, from UIs for human operators to agents operating software through 1) self-built services, 2) third-party services, and 3) APIs (if enterprises still provide them).

Modularizing our software products and platforms is key to coping with this evolution. Pricing and value are also shifting from seat-based licensing to consumption and outcomes, but more on that at a later time.

Software and Product Design

Given the above paradigm shift and the increasing availability of autonomous coding agents, as software developers, ML engineers, and product builders, we should ask:

What has to change in the way we work as a tech organization? What should an AI product look like, given the existence of these oracles around us? What does an AI product look like in Liquid’s case (small hyper-specialized foundation models, LEAP, and Apollo)? For whom are we building software, products, and platforms?

These are fundamental questions that every software startup should be asking and have a good roadmap for, but here is my take:

AI will be writing most of our code. We will inevitably, and perhaps more extensively than ever, be using SoTA big coding AIs in our day-to-day work. But we will make sure we understand their strengths and limitations as they evolve.

Coding is becoming orchestration + testing + verification: agents will write more of the code, but we will choose what to build, define constraints, and prove it works in the real world. The bulk of human programming will be designing executable PRDs and example implementations, with clear tests and feedback loops.

Be aware of the BIG SLOP. As of today, AI-generated code is admittedly sloppy and inefficient once your problem goes beyond frontend design. Review the code carefully; we must make sure we do not push inefficient code to our repos.

To optimize better and converge faster with AI agents, at the start of your brainstorming sessions, provide examples of efficient code.

AIs today tend to strictly follow instructions for the task at hand, and sometimes ignore or even overwrite previous features they wrote. In complex, multi-stage prototyping projects, make sure your instructions are chained clearly so that previously implemented features are protected.
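One concrete guardrail is keeping a pinned regression test for every previously shipped feature, so an agent that silently overwrites feature A while building feature B fails immediately. A minimal sketch, with an illustrative `slugify` standing in for any earlier feature:

```python
# Minimal regression guard: one pinned test per previously shipped
# feature. Keeping these in the suite stops later agent runs from
# silently regressing old behavior while working on something new.

def slugify(title: str) -> str:
    # Earlier agent-written feature we want to protect.
    return "-".join(title.lower().split())

def test_slugify_feature_still_works() -> None:
    # Behavior pinned from the original implementation.
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Mixed   Spacing ") == "mixed-spacing"

test_slugify_feature_still_works()
```

Chained prompts can then include the instruction "all existing tests must pass" rather than relying on the agent remembering earlier features.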

Model evaluation discipline. Robustify our evaluation pipelines to the point that the gap between demos/PoCs and production runs is minimal.
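One minimal shape for this discipline: score the same task set under both a demo-like and a production-like configuration and fail when the gap exceeds a threshold. The dataset, models, and threshold below are purely illustrative:

```python
# Toy eval-discipline check: measure the demo-vs-production accuracy
# gap on a shared golden dataset and enforce a maximum allowed gap.

GOLDEN = [("2+2", "4"), ("3*3", "9"), ("10-7", "3")]  # tiny golden set
MAX_GAP = 0.10  # fail if production trails the demo by >10 points

def accuracy(predict, dataset) -> float:
    hits = sum(1 for question, gold in dataset if predict(question) == gold)
    return hits / len(dataset)

# Stand-ins for the same system in two configurations.
demo_model = lambda q: str(eval(q))  # idealized demo setup
prod_model = lambda q: str(eval(q))  # production-like setup

gap = accuracy(demo_model, GOLDEN) - accuracy(prod_model, GOLDEN)
assert gap <= MAX_GAP, f"demo/production gap too large: {gap:.2f}"
```

Running this in CI makes "the demo worked" an insufficient claim: the production configuration must score within the agreed band of the demo on the same golden data.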

What has to change in how we work?

We will have to radically change our internal engineering culture for agent-era production if we are to maximally utilize the power of autonomous agents. Let me explain what I mean:

  1. Turn every tech spec and PRD into code. For every project, the contracts, scope, tool definitions, interfaces, budget, tests, evals, benchmarks, prod policy checks, and hardware runtimes should exist as an executable version of the PRD, not just as notes. Let me give an example:
  • Tech spec (for humans): audio function calling must respond in under 100 ms at p95 on Snapdragon X / ARM laptop-class hardware.

  • Code version (for machines): a. a benchmark harness runs on each target device; b. CI fails if p95 > 100 ms.
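As a sketch, the CI-side check could look like the following. The 100 ms budget comes from the spec above; `run_audio_function_call` is a hypothetical entry point, simulated here with a short sleep:

```python
# Benchmark harness sketch: measure p95 latency of a target function
# and fail CI when the budget from the executable PRD is exceeded.
import time

P95_BUDGET_MS = 100.0  # from the tech spec
N_RUNS = 50

def run_audio_function_call() -> None:
    """Hypothetical entry point; a tiny sleep simulates real work."""
    time.sleep(0.002)

samples = []
for _ in range(N_RUNS):
    start = time.perf_counter()
    run_audio_function_call()
    samples.append((time.perf_counter() - start) * 1000.0)

samples.sort()
p95 = samples[int(0.95 * len(samples)) - 1]  # simple empirical p95
assert p95 < P95_BUDGET_MS, f"p95 {p95:.1f} ms exceeds {P95_BUDGET_MS} ms budget"
```

In a real setup this would run per target device, with the assertion wired into the CI gate so an agent's change cannot merge while the spec is violated.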

  2. Tests and evals are the main focus. Treat harnesses as product infrastructure.
  3. Agents are “time-blind,” as Nicholas says, and are context-limited. Build/test output, logs, and docs must be structured for machine parsing.
  4. Have a unified format for code performance reviews to deal with inefficient code.
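Structuring output for machine parsing can start very simply: emit build/test events as one JSON object per line, so agents can filter and diff them without brittle text scraping. A minimal standard-library sketch (the field names are illustrative):

```python
# Emit build/test events as JSON Lines so agents can parse them
# reliably instead of scraping free-form text logs.
import json
import sys
import time

def log_event(stage: str, status: str, **fields) -> str:
    """Write one machine-parsable event per line and return it."""
    event = {"ts": time.time(), "stage": stage, "status": status, **fields}
    line = json.dumps(event, sort_keys=True)
    sys.stdout.write(line + "\n")
    return line

line = log_event("unit-tests", "fail", failed=3, total=120,
                 first_failure="test_latency_p95")
```

An agent consuming this stream can recover the failing stage and counts with a single `json.loads` per line, with no regexes over human-oriented prose.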

What should we build and for whom?

When designing software products, we must always keep the progression of big AI capabilities in mind. If your product is mainly prompt templates, a generic planner and orchestrator, a thin wrapper around tool calling, or a nicer UI for running an agent, it is at high risk of disruption. It is important to build products that simultaneously serve three users: agents, human operators, and the client stakeholders. In the era of big AIs, I think the following products are worth building:

  1. Tools that allow verifying the intermediate outputs and decisions of AIs. These range from agentic evals and dynamic test pipelines to simulation environments plus golden datasets. In general, we need software that enhances our confidence and trust in AI execution.
  2. Tools that help enterprises transform their internal processes and data pipelines to become agent-parsable. Legacy enterprises do not have the infrastructure necessary for a continuously improving agentic AI layer. There are many opportunities to explore here.
  3. Software for constraint-native compute and on-device AI, where physics and regulations matter: low-latency, air-gapped runtimes, hardware-aware benchmarking, fleet rollout/rollback, device policy, and telemetry.
  4. AI-for-science software: specialized foundation models for predictive medicine, drug discovery, the frontiers of math and physics, energy, and materials science.
  5. Human-in-the-loop software: products deliberately designed to keep human experts, validators, and operators in the loop. This could be data, RL, and simulation environments for enhancing the reliability of AI systems in decision-critical applications.

As a foundation model company, we at Liquid go one layer deeper and think about the core capabilities essential to power the product classes above. For instance: extending the context length of models, verifiability of foundation models, adding various data modalities, memory management, choice of architecture, and meta-algorithms that let us explore the fundamental limitations of human- and agent-curated data, evals, and environments.

Looking forward

In the agent era, the value of a software platform like our own edge AI platform, LEAP, lies in constructing it from verifiable capability modules: contracts, harnesses, evals, benchmarks, and production-grade rollouts that agents can reliably call, test, and compose. We should build software products for agents, humans, and stakeholders simultaneously, and make proof in production the primary artifact.