--- Summary:

  • AI engineering is evolving from autocomplete to autonomous software factories. GovTech Singapore applies risk-tiered governance, ensuring critical systems remain human-supervised while low-risk projects move toward autonomous agent networks.
  • The primary bottleneck has shifted from engineering capacity to decision speed. Procurement and compliance must be redesigned for machine-speed delivery; organizational inertia becomes the main constraint as agents generate code faster than humans can approve it.
  • The profession is shifting toward “supervisory engineering,” where the focus is on writing precise specifications and verifying AI outputs. Junior engineers often adapt natively, while seniors must move beyond using AI for simple lookups.
  • Code is becoming ephemeral and regenerable, making specifications and decision history the true durable artefacts. This shift requires new audit practices to preserve institutional knowledge and accountability when source code is disposable.
  • Opinionated platforms are essential for security at scale. By encoding policies into infrastructure and deployment pipelines, organizations create a “fast but safe path” where compliance happens automatically rather than through manual checklists.
  • The falling cost of custom software disrupts traditional vendor and SaaS models. Agencies may “vibe code” tailored alternatives to expensive SaaS subscriptions, forcing vendors to offer specialized expertise instead of mere engineering headcount.

--- Full Article:

How AI is reshaping software engineering in the Singapore government, and what comes next

Sau Sheong

Image: Generated by Nano Banana 2 from an original photo by Marvin Meyer on Unsplash

I’ve been having a lot of conversations lately. With engineers, with leaders in the industry, with people building tools that didn’t exist six months ago. And one thing has become very clear to me. AI isn’t just changing software engineering. It’s reshaping the whole thing, from how we write code to how we think about what code even is.

This isn’t a future thing. It’s happening right now, and it’s moving faster than most of us can keep up with.

At GovTech Singapore, we’ve been thinking hard about this. We put together an AI Strategy for Software Engineering earlier this year, and I’ve been actively engaging with technologists in Singapore and around the world, contributing a public sector perspective to an industry that honestly has more open questions than settled answers. What I’ve seen is a field in profound transition. Tools are evolving faster than organisations can adapt. The role of the engineer is being redefined. Old certainties about how software is built, reviewed and governed? They don’t hold anymore.

For government, this is both an extraordinary opportunity and a serious challenge. And the response isn’t just about adopting new tools. It’s about rethinking how organisations decide, govern and learn.

I’ve organised this article in two parts. The first covers the broader industry landscape as I’ve observed it. The second covers what we’re doing about it at GovTech.

The information in this article didn’t come from me alone; it emerged through conversations within our internal teams and engagements with local and global technology leaders and practitioners in the industry.

The five levels of AI engineering

One of the more useful frameworks I’ve come across is Dan Shapiro’s five-level model for understanding where AI sits in software development. It helped me figure out where different teams are, think about trajectory, and make deliberate choices rather than just being swept along by the pace of change.

Level 1 is what some cheekily call “spicy autocomplete”. AI suggests code completions within the developer’s current context. The developer stays in control. This is where most organisations started, back when GitHub Copilot first appeared.

Level 2 is AI coding assistants. AI executes multi-step tasks across files and tools. Think Claude Code, Cursor, Windsurf. Most government agencies are currently transitioning here.

Level 3 is autonomous development agents. AI independently takes tickets from backlog to deployment. Humans shift to defining requirements and reviewing outputs. This is supervisory engineering. Very few organisations have actually reached this level.

Level 4 is collaborative agent networks, where multiple specialised agents work together on design, coding, testing and deployment. Humans orchestrate. This is largely theoretical but it’s the near-term horizon.

Level 5 is what you might call the software factory. Organisations describe desired business outcomes and entire systems emerge from agent collaboration. Humans focus on strategy and product vision. This is the theoretical endpoint.

As you probably realise, different parts of an organisation can and probably should operate at different levels simultaneously. Low-risk, easily rebuilt systems might accelerate toward Level 3 or 4, while critical national infrastructure should stay at Level 2 with stronger governance. The framework saves you from the trap of chasing the highest level uniformly. Each level jump requires fundamentally different governance, risk management and quality assurance. I’ve seen organisations try to use Level 1 governance frameworks with Level 3 capabilities. It doesn’t end well.
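
To make the risk-tiering concrete, here’s a minimal sketch of how a “maximum autonomy level per risk tier” policy might be expressed as code. The tier names and the mapping itself are purely illustrative, not a description of any actual GovTech policy. The value of writing it this way is that a platform can enforce it automatically, a theme I’ll come back to.

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """The five levels, expressed as an ordered enum."""
    AUTOCOMPLETE = 1       # AI suggests, developer stays in control
    CODING_ASSISTANT = 2   # AI executes multi-step tasks under supervision
    AUTONOMOUS_AGENT = 3   # AI takes tickets from backlog to deployment
    AGENT_NETWORK = 4      # specialised agents collaborate, humans orchestrate
    SOFTWARE_FACTORY = 5   # systems emerge from described business outcomes

# Illustrative only: a system's risk tier caps the autonomy level
# any team working on it may operate at.
MAX_AUTONOMY_BY_RISK_TIER = {
    "critical_infrastructure": AutonomyLevel.CODING_ASSISTANT,
    "citizen_data": AutonomyLevel.CODING_ASSISTANT,
    "internal_business": AutonomyLevel.AUTONOMOUS_AGENT,
    "low_risk_rebuildable": AutonomyLevel.AGENT_NETWORK,
}

def autonomy_permitted(risk_tier: str, requested: AutonomyLevel) -> bool:
    """True if the requested level of AI autonomy is allowed for
    systems in the given risk tier."""
    return requested <= MAX_AUTONOMY_BY_RISK_TIER[risk_tier]

assert autonomy_permitted("low_risk_rebuildable", AutonomyLevel.AUTONOMOUS_AGENT)
assert not autonomy_permitted("critical_infrastructure", AutonomyLevel.AUTONOMOUS_AGENT)
```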

Before diving into the detail, here’s the summary of what’s shifting. Three changes stood out from every conversation I had, and they go well beyond engineering.

First, the bottleneck is inverting. Engineering capacity is no longer the constraint. Decision speed is. Governance, procurement and compliance designed for human-speed delivery will become the primary brake on value.

Second, code may not be the durable artefact anymore. Specifications, domain models and decision history may matter more than source code, which becomes regenerable and ephemeral.

Third, team structures are about to shrink dramatically. If one person with AI can reach proof-of-concept before any engineering resource is engaged, everything we assume about team composition, vendor engagement and project economics needs re-examination.

These three shifts recur throughout what follows, and they inform everything we’re doing at GovTech in response.

What I’ve been hearing from the front line

From my conversations both internally and externally, fifteen major themes kept coming up. Each of them resonated with my own experience, so I think they’re worth sharing.

The bottleneck has shifted, and nobody’s ready

This is probably the most fundamental insight I took away. The constraint isn’t engineering capacity anymore. Agents can produce work faster than humans can review it, faster than customers can absorb it, faster than organisations can adapt to it. One person I spoke with put it bluntly. “Humanity is not ready for this much software.” I think they’re right.

For decades, the common complaint has been that business stakeholders with product and feature needs outpace the engineers who build them. Backlogs grow. Priorities shift. Engineering becomes the bottleneck everyone plans around. That dynamic is about to reverse. When agents can generate working code in hours rather than weeks, engineering capacity stops being the scarce resource. What becomes scarce is the ability to decide what should be built, to review whether it’s right, and to absorb the consequences of shipping it.

For those of us in government, this inversion is especially acute. Procurement processes, approval chains and compliance requirements were all designed for human-speed delivery. They assume that building takes longer than deciding. When that assumption breaks, these processes don’t just slow things down. They become the primary constraint on value delivery. An agent can produce a working prototype overnight, but the approval to deploy it might take months. The work of software delivery is being automated, but the work of deciding what to build and whether it’s right has become the critical path.

This has implications that go beyond process reform. It means the organisations that will move fastest aren’t those with the best engineering teams. They’re those with the shortest distance between a decision and an action. Layers of review, committees that meet monthly, sign-off chains that span multiple departments, all these become the real bottleneck. Those who can decide quickly will outpace those who merely build quickly.

What is the artefact, if not code?

I found multiple conversations, especially amongst engineers, circling around a provocative question. If AI can regenerate code on demand, is code even the right thing to preserve? I heard arguments for specifications, domain models and decision history as potentially more durable artefacts. The story of why decisions were made may matter more than the final implementation. Code, in this view, becomes ephemeral, regenerable, and potentially not even worth reading.

Think about what this actually means in practice. Today, when an engineer leaves a team, the code they wrote stays behind. It’s the institutional memory. Future engineers read it, understand the system, and build on it. But if code can be regenerated from a sufficiently detailed specification, then the specification becomes the thing worth preserving, not the code. The code is just one possible expression of the spec, and perhaps not even the best one by the time someone needs to revisit it.

This sits uneasily with me. In government, audit trails, accountability and long-term stewardship are paramount. We maintain systems for decades. We need to explain why things work the way they do, sometimes years after the people who built them have moved on. If code becomes disposable, what exactly are we auditing? What are we preserving?

The answer may be that the conceptual model, not the codebase, becomes the primary artefact of record. The specifications, the decision logs, the reasoning behind trade-offs, the history of what was tried and why it was rejected. These become the durable layer. Code becomes the output, regenerated as needed from that layer, possibly by a different model, in a different language, for a different platform.

However, there is a counter-argument worth taking seriously. If AI can generate specifications from code just as easily as it can generate code from specifications, why not keep code as the primary artefact and generate the spec as needed? Code, after all, is unambiguous. It compiles, it runs, it can be tested. Specifications are prone to drift, incompleteness and interpretation. A codebase is always a faithful representation of what the system actually does, even if it doesn’t explain why.

This is a genuinely open question, and I don’t think the answer is the same for every context. For systems that are rebuilt frequently, where the implementation is cheap and the reasoning is expensive, the specification and decision history are probably more durable. For long-lived systems where the implementation embodies hard-won edge cases and regulatory logic that no specification fully captures, the code may remain the better source of truth. In practice, we may need both, maintained in tandem, with AI keeping them in sync rather than forcing us to choose one over the other.

Either way, the implications for how we define, manage and preserve institutional knowledge are significant. We’ve spent decades optimising for code as the source of truth. Whether the source of truth shifts to specifications, stays with code, or becomes a living relationship between the two, much of our documentation practices, version control workflows and knowledge management systems will need to evolve.

Trust, care, and what’s lost in abstraction

As engineers increasingly rely on AI to navigate codebases they can’t fully understand, questions of trust and stewardship become acute. My own principle is that fully trusting an LLM’s answer is folly, but using LLMs to navigate toward an answer is wise. The distinction matters. An LLM can point you to the right file, surface a pattern you hadn’t considered, or draft a solution that gets you eighty percent of the way there. But treating that output as authoritative, without verification, without understanding, is where things go wrong. The value is in the navigation, not the destination it hands you.

There’s a deeper concern I keep coming back to: part of the value of pair programming comes from explaining things to your pair. That act solidifies the explainer’s own understanding. With AI as the pair, that forcing function disappears. You don’t need to explain your reasoning to an agent. It doesn’t ask why you chose one approach over another, or push back when your assumptions are shaky. The conversation is one-directional in a way that human pairing never is. Active knowledge decays without practice. If understanding implementation details becomes optional, we need to be very deliberate about how we maintain genuine expertise in our teams.

There’s also a more subtle form of erosion. When engineers stop reading code because an agent can summarise it, when they stop debugging because an agent can fix it, when they stop designing because an agent can propose an architecture, they don’t just lose specific knowledge. They lose the muscle memory of thinking through hard problems and the intuition that tells a senior engineer that something feels wrong before they can articulate why. That intuition is built through years of direct engagement with code. If we abstract that away entirely, we may produce engineers who are highly productive but brittle. Fast in normal conditions, helpless when something genuinely novel goes wrong.

I noticed this in myself recently. I was using Claude Code to work through a codebase I hadn’t touched in months. The agent navigated it faster than I could have, found the right files, made the changes, ran the tests. It was impressive. But afterwards I realised I had no real understanding of what had changed or why it worked. I’d approved the output without building any mental model of it. If something broke a week later, I’d have to start from scratch. That experience made the abstraction risk feel very concrete to me. The speed is real, but so is the erosion of understanding, and if we’re not careful, we’ll end up with teams that can ship fast but can’t explain what they shipped.

This doesn’t mean we should resist these tools. It means we need to design deliberately for the knowledge we want to retain. That might mean requiring engineers to write the specification before the agent writes the code. It might mean rotating engineers through periods of hands-on implementation alongside periods of agent-supervised work. It might mean treating code review not as a quality gate but as a learning exercise, the last remaining point where engineers are forced to engage deeply with what’s being built. Whatever the mechanism, the goal is the same. Ensure that speed doesn’t come at the cost of understanding, because understanding is what you need when speed alone isn’t enough.

Platforms as the enabling layer

Security, governance, agent experience, quality assurance. All of these need platforms to succeed. There’s a nice phrase I heard, the “fast but safe path.” Create guardrails so teams can move quickly within limits. Encode constraints into the platform itself rather than blocking progress with manual approvals. The platform becomes the mechanism for embedding practices like security, quality and governance that teams would otherwise skip.

The logic is straightforward. If you make compliance a manual checklist, teams will skip it under deadline pressure. If you embed compliance into the deployment pipeline so that it happens automatically, teams comply by default. The platform doesn’t ask for permission. It doesn’t rely on goodwill or discipline. It simply makes the right thing the easy thing, and the wrong thing hard or impossible.
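
To illustrate, here’s a minimal sketch of a policy-as-code deployment gate. The check names and the Deployment shape are invented for this example; a real pipeline would run these as pipeline stages. The principle is the same either way: the policy is evaluated automatically, and failing it blocks the deployment rather than generating a checklist item.

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    service: str
    risk_tier: str
    has_passing_tests: bool
    dependencies_scanned: bool
    secrets_in_code: bool
    human_signoff: bool = False

def policy_checks(d: Deployment) -> list[tuple[bool, str]]:
    """Policy encoded as code: each rule returns (passed, reason)."""
    checks = [
        (d.has_passing_tests, "automated test suite must pass"),
        (d.dependencies_scanned, "dependency vulnerability scan required"),
        (not d.secrets_in_code, "no credentials may be committed"),
    ]
    # Higher-risk tiers keep a human in the loop; lower tiers don't.
    if d.risk_tier in ("critical_infrastructure", "citizen_data"):
        checks.append((d.human_signoff, "human sign-off required for this tier"))
    return checks

def deploy(d: Deployment) -> None:
    failures = [reason for passed, reason in policy_checks(d) if not passed]
    if failures:
        # The pipeline blocks automatically; no checklist, no committee.
        raise PermissionError(f"{d.service} blocked by policy: {failures}")
    print(f"{d.service}: policy passed, deploying")

deploy(Deployment("permit-dashboard", "low_risk_rebuildable",
                  has_passing_tests=True, dependencies_scanned=True,
                  secrets_in_code=False))
```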

Current government platforms are generally un-opinionated, if they exist at all. Very often, government systems depend on policy, rather than code, for enforcement. We publish guidelines, circulate circulars, and trust that teams will follow them. That works when the pace of delivery is slow and teams are small enough to oversee directly. It breaks down completely when AI is generating code at volume, when non-technical builders are deploying applications without engineering oversight, and when the number of systems in flight exceeds anyone’s ability to track manually. Policy that isn’t encoded into infrastructure is policy that will be ignored, not out of malice, but out of speed.

Platforms matter and we need to double down on them. In an AI-augmented world, the platform is not just infrastructure. It’s the primary mechanism through which an organisation exerts quality control, maintains security posture, and ensures governance at scale. Without opinionated platforms, we’re relying on humans to enforce standards that machines are generating faster than humans can review. That’s not a sustainable model.

How we think about agents matters

Here’s a core tension that hasn’t been settled. Should we treat agents as non-deterministic technical systems? Or as team members requiring onboarding and mentoring? Or somewhere in between?

This isn’t just philosophical navel-gazing. It determines governance models, investment decisions, and how the human role is defined.

If you treat agents as technical systems, you reach for familiar engineering controls. Inputs and outputs are validated. Behaviour is bounded by rules. Failures are handled through retries, circuit breakers and rollback mechanisms. The governance model looks like infrastructure management. Uptime, compliance, access control. This is comfortable territory for most engineering organisations, including ours.

If you treat agents as team members, the picture changes significantly. You think about onboarding them to your codebase and standards. You give them context about your organisation’s conventions and constraints. You evaluate their output the way you’d evaluate a colleague’s pull request, not just for correctness but for judgment. You invest in making them better over time through feedback loops rather than just configuration changes.

Most organisations, including ours, will probably land somewhere in between. But the point is that each framing leads to very different decisions. The “technical system” framing leads you to invest in guardrails, sandboxing and monitoring. The “team member” framing leads you to invest in context engineering, shared knowledge and progressive trust. The hybrid framing, which is where I think we’ll end up, means doing both simultaneously, and that’s genuinely hard to design for.

For government, this ambiguity has immediate practical consequences. How do we classify an agent’s access to sensitive code? Do we apply the same clearance and access frameworks we use for contractors? When an agent produces a security vulnerability, is that an infrastructure failure or a performance issue? These aren’t hypothetical questions but decisions we need to make now, and the answers depend entirely on which mental model we adopt.

Organisational readiness gates everything

AI amplifies existing conditions. It doesn’t create them. Strong teams become faster and dysfunctional teams become more chaotic. There’s a “you must be this tall for AI” threshold. Organisations need baseline maturity in common practices, cost attribution, organised business data and governance before AI platforms deliver value. Without the right conditions, AI makes things worse, not better. You have to prepare the system first.

I’ve seen this play out concretely. A team with clear coding standards, well-organised repositories, and disciplined documentation can hand an AI agent meaningful context and get meaningful output. A team with inconsistent practices, undocumented tribal knowledge and scattered codebases gets output that’s confidently wrong, and worse, wrong in ways that are hard to detect. AI doesn’t compensate for organisational dysfunction. It accelerates it. If your codebase is a mess, AI will produce more mess, faster.

The typical adoption pattern of making it work, making it reliable, then making it secure leaves organisations vulnerable during the early phases. “Make it work first” is fine for a weekend prototype. It’s dangerous when applied to government systems handling citizen data. Many organisations are deliberately staying “behind the leading edge” to reduce risk, and for government, where failure consequences are high, this cautious approach has merit. But it can’t become an excuse for inaction. The gap between doing nothing and doing something reckless is wide, and there’s plenty of room in that gap for deliberate, well-governed experimentation.

The readiness challenge also has a people dimension. Organisations where leadership genuinely understands the technology, even at a conceptual level, make better decisions about where and how to apply it. Organisations where leadership delegates AI strategy entirely to technical teams without engaging with the substance tend to either over-invest in hype or under-invest out of anxiety. Readiness isn’t just about technical infrastructure. It’s about institutional literacy.

The brownfield reality

There’s a dimension of readiness that deserves its own discussion, because it’s largely absent from the AI success stories circulating online. Most of those stories assume reasonably clean system architectures. Greenfield projects, modern stacks, well-defined boundaries between services. Our context is different. Some of our architectures are genuinely hard to reason about end to end, even for experienced engineers. Lambda-heavy event flows with hidden coupling. Brittle batch jobs that have evolved over years through incremental patches. Sprawling architectures where no single person understands the full dependency chain.

In these environments, rewriting code is not the hard part. Dealing with the downstream operational impact is. AI may generate syntactically correct code, but it does not fully understand hidden dependencies, production quirks, or the institutional knowledge that lives in people’s heads rather than in documentation. A change that looks correct in isolation can cascade through event-driven systems in ways that are difficult to predict and painful to debug. If humans already struggle to understand these systems, and we layer AI-driven rewrites on top without first improving our understanding of the system itself, we could unintentionally amplify fragility rather than reduce it.

This isn’t an excuse to hold back. We still need to move. But it does mean that AI maturity can be gated by system maturity. Before you can safely apply AI to a legacy system, you need to understand that system well enough to evaluate the AI’s output. That means investing in observability, dependency mapping, and documentation. The unglamorous work of making existing systems legible has to happen before or alongside the more visible work of deploying AI coding tools. This is precisely why our Graphiqode workstream matters so much. You can’t safely modernise what you can’t see.

The human role is being redefined, but not yet designed

The shift is from traditional software engineering to what people are calling supervisory engineering. That means directing, evaluating and correcting AI outputs, writing specifications and acceptance criteria, and exercising judgment about what to build. This requires new skills that many current engineers don’t have, and there are no clear pathways to acquire them yet.

What should a staff engineer spend their time on? The role is clearly shifting. Nobody has fully designed what it shifts to. Today, a staff engineer’s value comes from deep technical knowledge, the ability to make sound architectural decisions, and the credibility to guide teams through complex implementation. In a world where agents handle much of the implementation, that value proposition doesn’t disappear, but it migrates. The staff engineer’s job becomes less about knowing how to build something and more about knowing what should be built, whether the agent’s output is trustworthy, and where the real risks lie.

This is a harder role, not an easier one. It requires a broader range of judgment. You need to understand systems at the architectural level, evaluate AI output for subtle correctness issues that tests might not catch, reason about security and compliance implications, and make trade-off decisions that balance speed against risk. You also need to do all of this while managing the cognitive load of supervising multiple agents working in parallel, any one of which might be producing plausible but flawed output.

Unfortunately we don’t yet have training programmes, career ladders, or even job descriptions that reflect this new reality. Universities are still teaching people to write code. Professional development frameworks are still organised around implementation skills. The gap between what the role demands and what we’re preparing people for is widening, and it will take deliberate effort to close it.

Infrastructure and standards uncertainty

The field is experiencing unprecedented volatility. Innovation cycles are compressing from years to days, creating decision paralysis and security vulnerabilities. Large enterprises are constrained by liability concerns and the unanswered question of who’s actually liable when things go wrong. When an AI agent introduces a vulnerability into production code, is the liability with the developer who approved it, the organisation that deployed the tool, or the vendor that built the model? Nobody has a clear answer, and until they do, large organisations will move cautiously.

There’s also the question of sovereign AI models, energy and compute constraints affecting token economics, and the fact that nobody agrees on what “AI-assisted” versus “AI-native” versus “AI-first” even means. These aren’t just semantic differences. An “AI-assisted” team uses AI tools within existing workflows. An “AI-native” team redesigns workflows around AI capabilities. An “AI-first” organisation rebuilds its entire operating model on the assumption that AI is the primary producer of code. Each requires fundamentally different investment, governance and talent strategies. Teams are experimenting without clear intent or alignment on which of these they’re actually pursuing.

For government, the volatility creates a particular challenge. We need to make platform and tooling decisions that will hold for years, in an environment where the leading tools change every few months. Committing too early risks lock-in to approaches that quickly become obsolete. Waiting too long risks falling so far behind that catching up becomes prohibitively expensive. The right posture, I think, is to invest in abstractions and platforms that are tool-agnostic where possible, while making deliberate, reversible bets on specific tools where the productivity gains justify the risk.

New primitives for agent-based systems

Traditional software primitives like APIs, databases and deployment pipelines are being supplemented by new ones designed for agent interaction. These include tool registries that let agents discover and use capabilities dynamically, context protocols that let agents share state and reasoning, permission and trust frameworks that govern what agents can do autonomously versus what requires human approval, and memory and learning systems that let agents improve from experience.

These aren’t just infrastructure details. They represent a fundamental shift in how software systems are composed and operated. For decades, we’ve designed systems for human developers to build and human operators to run. The interfaces, the documentation, the error messages, the deployment workflows, all of it assumes a human in the loop. Agent-based systems need a different set of primitives because agents interact with systems differently. They don’t read documentation the way humans do and they don’t navigate UIs. Instead, they need machine-readable descriptions of capabilities, structured context about the systems they’re working with, and clear boundaries defining what they can and cannot do.
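
To give a flavour of what an agent-facing primitive might look like, here’s a sketch of a machine-readable tool registry with explicit trust boundaries. The schema is invented for illustration; real implementations would more likely build on emerging standards such as the Model Context Protocol.

```python
from dataclasses import dataclass
from enum import Enum

class Approval(Enum):
    AUTONOMOUS = "autonomous"      # agent may call this freely
    HUMAN_REVIEW = "human_review"  # agent must queue for human approval

@dataclass
class ToolSpec:
    """A capability described for agents rather than humans:
    structured metadata instead of prose documentation."""
    name: str
    description: str    # consumed by the agent's planner
    input_schema: dict  # JSON-Schema-style contract
    approval: Approval  # the trust boundary, made explicit

REGISTRY: dict[str, ToolSpec] = {}

def register(tool: ToolSpec) -> None:
    REGISTRY[tool.name] = tool

def discover(keywords: set[str]) -> list[ToolSpec]:
    """Pull model: an agent queries for relevant tools at runtime
    instead of having every capability pushed into its prompt."""
    return [t for t in REGISTRY.values()
            if keywords & set(t.description.lower().split())]

register(ToolSpec("query_service_metrics",
                  "read latency and error metrics for a deployed service",
                  {"service": "string", "window_minutes": "integer"},
                  Approval.AUTONOMOUS))
register(ToolSpec("restart_service",
                  "restart a deployed service instance",
                  {"service": "string"},
                  Approval.HUMAN_REVIEW))

print([t.name for t in discover({"restart"})])  # ['restart_service']
```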

If you’re building platforms today, you need to be thinking about agent experience the same way you think about developer experience. A platform that’s easy for humans to use but opaque to agents will become a bottleneck as agent-driven workflows become the norm. This means API design needs to account for agent consumption, not just human consumption. System metadata needs to be rich enough for agents to reason about dependencies and impacts. And permission models need to be granular enough to give agents appropriate autonomy without exposing systems to unacceptable risk.

This is new territory for most organisations, including ours. We’re still building for developer experience. The shift to agent experience is coming, and the organisations that start designing for it now will have a significant advantage when agent-driven workflows become the default mode of operation.

Learning systems and organisational memory

Individual AI interactions need to evolve into systemic organisational learning loops, capturing reasoning, failures and context. Right now, most AI usage in engineering is stateless. An engineer prompts an agent, gets a result, and the interaction is forgotten. The next engineer working on the same system starts from scratch. All the context that was built up in that first interaction, the dead ends explored, the constraints discovered, the trade-offs weighed, is lost.

There are some key architectural decisions here. Choosing knowledge graphs over markdown files, because relationships between concepts matter as much as the concepts themselves. Treating the story as data, because the history of iterations and decisions is more valuable than the final code. Keeping knowledge source-centric so information stays close to its source to avoid sync issues. And adopting a pull rather than push model so agents discover information at runtime rather than having rules pushed into prompts.
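
Here’s a minimal sketch of what “story as data” and the pull model might look like, using an invented DecisionNode structure. The substance is in the links between decisions: an agent, or a new team member, can walk the reasoning chain at runtime instead of having the whole history pushed into its prompt.

```python
from dataclasses import dataclass, field

@dataclass
class DecisionNode:
    """One engineering decision, stored as data rather than prose.
    The story lives in the links between decisions."""
    id: str
    summary: str
    rationale: str
    because_of: list[str] = field(default_factory=list)  # prior decision ids

GRAPH: dict[str, DecisionNode] = {}

def add(node: DecisionNode) -> None:
    GRAPH[node.id] = node

def why(node_id: str, depth: int = 0) -> None:
    """Walk the reasoning chain behind a decision at runtime. An agent
    pulls this on demand instead of having history pushed into prompts."""
    node = GRAPH[node_id]
    print("  " * depth + f"{node.id}: {node.summary} ({node.rationale})")
    for prior in node.because_of:
        why(prior, depth + 1)

add(DecisionNode("D-001", "use an event queue between services",
                 "peak loads overwhelmed synchronous calls"))
add(DecisionNode("D-002", "add retries with a dead-letter queue",
                 "events were silently lost on consumer crash",
                 because_of=["D-001"]))

why("D-002")  # prints the decision and the chain of reasoning behind it
```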

There’s a nice principle someone called the alien intelligence principle. Agents don’t learn like humans. We’re tempted to teach them the way we’d teach a junior engineer, front-loading instructions and best practices. But that approach has diminishing returns with stochastic systems. Rather than over-engineering instructions upfront, let agents fail, then build the specific skills identified by those failures. This way, the failures are the signal. They tell you what the agent actually needs to know, as opposed to what you assumed it needed to know.

For government, the learning systems challenge has an additional dimension. We need organisational memory that persists across leadership changes and team turnover. The institutional knowledge embedded in a senior engineer’s head is valuable but fragile. If we can capture the reasoning and context behind engineering decisions in structured, machine-readable formats, we create institutional memory that’s both more durable and more accessible than what exists today. The agents of the future shouldn’t just build software. They should understand why the software was built the way it was.

Code review’s uncertain future

Tiering software by risk profile, with review rigour proportional to rebuild cost and failure impact, is emerging as the dominant framework. For high-risk code, people are discussing formal verification, specification-checking and test-centric review. The field hasn’t converged on a single answer, but the movement away from universal line-by-line review is clear.

The logic is compelling. Line-by-line code review made sense when humans wrote every line and the volume was manageable. When AI is generating thousands of lines per day, human line-by-line review becomes physically impossible without either slowing delivery to a crawl or expanding review teams to unsustainable sizes. Something has to give.

The central design challenge is this. What has to be in place before you can confidently push large AI-generated changesets without traditional review? The emerging answer involves a combination of comprehensive automated testing, specification-level review where humans verify that the right thing was built rather than how it was built, formal verification for critical paths, and AI-assisted review tools that flag anomalies for human attention. None of these individually replaces human review. Together, they might.

There’s an intriguing parallel here. Could there be an “AI Development Centre” model, like offshore development centres, where AI agents handle routine development while human oversight focuses on specification review, architecture coherence, security and compliance? The offshore development model works because humans review outputs rather than supervising every keystroke. The same logic could apply to agent-generated code, where human engineers focus on what matters most and trust automated systems to handle the rest.

For government, the stakes in getting this wrong are high. Under-review leads to vulnerabilities in systems that handle citizen data. Over-review leads to bottlenecks that negate the productivity benefits of AI tools entirely. The tiered approach, where the rigour of review is proportional to the risk of the system, is the right direction. But calibrating those tiers, deciding which systems are truly low-risk and which only appear to be, is judgment-intensive work that we’re still learning how to do.

Business value beyond developer productivity

True platform value lies in enabling previously impossible business outcomes, not just faster coding. AI spend needs to be tied back to products so the business can decide whether token costs are worth the value delivered. This sounds obvious, but in practice most organisations are measuring AI adoption through developer productivity metrics like lines of code generated, time saved and pull requests merged. These are input metrics. They tell you how much activity is happening, not whether that activity is producing value.

For government, this means understanding what citizen outcomes AI-generated software is enabling, not just how many lines of code were produced. Did a new service reduce processing times for permit applications? Did a prototype tool help policy officers identify gaps in service delivery? Did an AI-built dashboard give decision-makers visibility they didn’t have before? These are the questions that matter. Near-term financial ROI is probably the wrong metric for government AI investment. The time horizons are too long, the benefits too diffuse, and the counterfactuals too hard to establish.

A portfolio approach makes more sense than traditional ROI calculations. Try many initiatives, back the ones that work, and tie wins to outcomes. Accept that some experiments will fail, and treat those failures as learning rather than waste. This is uncomfortable for government organisations accustomed to justifying every dollar of expenditure upfront, but it’s the only approach that works in an environment where nobody can predict which AI applications will deliver the most value until they’re tried.

Self-healing and the future of operational resiliency

Future systems will need to distinguish between self-healing (returning to homeostasis) and self-improving (actively evolving). The difference matters. Self-healing means a system detects a failure and restores itself to a known good state. Self-improving means a system detects a failure, diagnoses the root cause, and modifies itself to prevent recurrence. We’re comfortable with the first. The second introduces questions about autonomous change that most governance frameworks aren’t designed to handle.

You’ll need “time machine” capabilities for clean rollbacks without database corruption, incident commander agents to manage response, and unified ledgers aggregating disparate logs into a single source of truth. These aren’t luxury features. They’re prerequisites for trusting AI-operated systems in production. If an agent makes a change that breaks something, you need to be able to undo that change completely and cleanly, including any data state changes that occurred in the interim. Without that capability, autonomous operation is too risky for any system that matters.
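
Here’s a minimal sketch of the “time machine” idea, with an invented snapshot interface. Every autonomous change is recorded in an append-only ledger together with the state needed to undo it, so a bad change can be reverted completely, data included. It also marks the line between self-healing and self-improving: the rollback is automatic, while any change to prevent recurrence goes back through review.

```python
import copy

class TimeMachine:
    """Snapshots system state before every autonomous change, so a
    bad change can be undone completely and cleanly, data included."""

    def __init__(self, state: dict):
        self.state = state
        self.ledger: list[dict] = []  # unified, append-only history

    def apply(self, description: str, change: dict) -> None:
        self.ledger.append({
            "description": description,
            "snapshot": copy.deepcopy(self.state),  # last known-good state
        })
        self.state.update(change)

    def rollback(self) -> str:
        """Self-healing: return to homeostasis by restoring the last
        known-good snapshot. Self-improvement (changing the system to
        prevent recurrence) would be a new, reviewed change instead."""
        entry = self.ledger.pop()
        self.state = entry["snapshot"]
        return f"rolled back: {entry['description']}"

tm = TimeMachine({"schema_version": 3, "feature_x": False})
tm.apply("agent enables feature X and migrates schema",
         {"feature_x": True, "schema_version": 4})
# Incident detected: revert the whole change, data state included.
print(tm.rollback(), tm.state)
```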

There’s a concept I find particularly relevant for government, which is the “subconscious” knowledge graph. Imagine ingesting years of post-mortems into knowledge graphs, allowing agents to weigh historical hypotheses against real-time telemetry. It replicates the operational intuition that experienced engineers have built over decades, and preserves it institutionally when those engineers move on. Every government technology organisation I know has lost critical operational knowledge when senior engineers retired or moved on. If we can capture that knowledge in a form that agents can reason about, we make our systems more resilient not just technically, but institutionally.

The vision is systems that don’t just recover from failures but learn from them, accumulating operational wisdom over time in a way that no human team can match for consistency and recall. We’re not there yet. But the architectural foundations we lay now, the knowledge graphs, the unified logging, the structured post-mortem processes, will determine whether we get there in years or decades.

Changes to careers, roles and team structures

Perhaps the most consequential dimension of this transition is its impact on careers, team structures and the experience of the people doing the work. The technology conversation is important, but if I’m honest, this is where I spend most of my time thinking. Tools can be adopted, platforms can be built, governance frameworks can be redesigned. But people are harder to change than systems, and the human dimension of this transition is where the real risk of getting it wrong lies.

Junior engineers, more viable than expected

Counter to the prevailing narrative about AI eliminating entry-level roles, I’m quite positive about junior engineers in general. The common argument is that if AI can do what junior engineers do, we won’t need junior engineers. I think this misreads the situation.

AI-native juniors carry potential advantages that are easy to underestimate. They need to prove themselves, which creates motivation to master new tools. They’re willing to make mistakes, which is essential in stochastic AI environments where perfect output isn’t guaranteed and experimentation is the path to quality. And they lack organisational baggage, with no preconceptions about the “right way” to build software. A junior engineer doesn’t have to unlearn twenty years of habits. They can adopt AI-native workflows as their default mode of working from day one.

There’s also something more fundamental at play. Junior engineers who grow up with AI tools develop a different kind of intuition. They learn to evaluate AI output rather than write everything from scratch. They learn to prompt effectively, to recognise when an agent is confidently wrong, and to navigate between multiple AI-generated options. These are the skills of supervisory engineering, and juniors are learning them natively in a way that many seniors are struggling to acquire.

Curiosity and broad knowledge may now be more valuable hiring signals than traditional coding proficiency. The engineer who can understand a business problem, frame it as a specification, evaluate whether an AI’s solution actually addresses it, and identify edge cases the agent missed. That engineer is valuable regardless of whether they can implement a red-black tree from memory. We need to update our hiring criteria accordingly.

The risk, of course, is that we produce a generation of engineers who can supervise AI but can’t function without it. That’s a real concern, and it needs to be addressed through deliberate training design. But the viability of junior engineers in an AI-augmented world is, I think, much higher than the prevailing narrative suggests.

Senior engineers, a pattern of resistance

Conversely, the broader sentiment around senior engineers has been more negative than I expected. The resistance to adoption doesn’t manifest as outright refusal but as using new tools to do familiar things rather than fundamentally rethinking approaches. Senior engineers who don’t adapt deeply risk becoming bottlenecks in AI-native workflows, slowing down the teams they’re meant to lead.

Many years ago, in the earlier days of Java, I spent a long time persuading a C programmer to write in Java. When I finally managed to convince him, I realised that he simply continued writing C, but in Java syntax. He took none of the advantages of Java, only the disadvantages, and ended up declaring that Java was no good at all. This is the déjà vu I’m feeling now.

As I try to convince engineers to use AI tools, many simply use these tools within the same processes they have always followed. They gain none of the advantages but all of the disadvantages, and this convinces them that the tools only make them less productive.

The pattern is predictable. A senior engineer tries Claude Code or Copilot. They use it the way they use Stack Overflow, as a lookup tool for syntax they’ve forgotten or boilerplate they don’t want to type. They don’t restructure their workflow around it. They don’t rethink how they decompose problems. They don’t let the agent take a first pass at a whole feature while they focus on specification and review. They slot the tool into their existing process, find that it adds friction without transforming outcomes, and conclude that it’s overhyped.

This isn’t a failure of the tools. It’s a failure of adoption. The value of agentic coding tools doesn’t come from doing the same things slightly faster. It comes from doing fundamentally different things. Working at a higher level of abstraction, delegating implementation to agents, spending more time on specification and review and less on typing. That requires a willingness to let go of direct control that many senior engineers find deeply uncomfortable. Their expertise, their identity, their career trajectory. All of it has been built on the ability to implement. Asking them to stop implementing and start supervising feels like asking them to give up the thing that makes them valuable.

In other words, exposure to AI tools won’t drive transformation. The industry broadly hasn’t cracked how to change this. But I believe the answer lies in structured intervention rather than passive exposure. You can’t just hand someone a tool and expect them to reinvent their workflow. You need protected time for experimentation. You need peer examples from engineers they respect. You need leadership that models the new ways of working rather than just mandating them. And critically, you need to make it safe to be bad at something new. Senior engineers are used to being the expert in the room. Learning to work with AI tools means being a beginner again, and that’s a psychologically difficult transition that organisations need to actively support.

There’s a harder question lurking behind all of this. Some companies are now tying performance evaluation, and in some cases continued employment, to demonstrable AI adoption. It’s a forcing function. If structured intervention and peer examples don’t move the needle fast enough, do we eventually need to make AI proficiency a formal expectation of the role? I don’t think we’re there yet. Mandating tool usage before people understand why creates resentment rather than transformation. But I do think we need to be honest about the possibility that encouragement alone won’t be sufficient, and that at some point, the ability to work effectively with AI tools may need to become a baseline expectation for engineering roles, not a bonus.

The emerging skill of supervisory engineering

The new core competency is supervisory engineering. That means directing, evaluating and correcting AI outputs. Writing specifications and acceptance criteria. Reasoning at the level of systems and outcomes rather than lines of code. Exercising judgment about what to delegate to agents and what to reserve for human decision-making.

This is quite different from traditional engineering. A good implementer can be a poor supervisor, and vice versa. Implementation rewards depth. The ability to hold a complex system in your head and work through it methodically. Supervision rewards breadth. The ability to evaluate whether an agent’s output is correct, secure, performant, and aligned with requirements, often across multiple workstreams simultaneously. It requires a different kind of attention, less like a craftsman and more like an air traffic controller.

This is the skill AI can’t replace, and it’s not well taught by traditional computer science curricula or current professional development frameworks. Universities teach algorithms, data structures, and system design. They don’t teach how to write a specification precise enough for an agent to implement correctly, how to review a thousand-line changeset you didn’t write for subtle correctness issues, or how to decompose a project into tasks that are well-suited for agent delegation versus tasks that require human judgment. These are the skills that will define the next generation of effective engineers, and we have no established curriculum for them.

The burnout risk is real too. Managing multiple agents in parallel could significantly increase cognitive load and context switching. Imagine supervising three agents working on different features simultaneously, each producing output that needs review, each hitting blockers that need human intervention, each generating questions that require context you have to reconstruct from memory. The transition will be especially tough for engineers accustomed to deep, focused implementation work, where you enter a flow state and emerge hours later with something built. Supervisory engineering is inherently fragmented, and that fragmentation is cognitively expensive.

Organisations need to design for this reality rather than just assuming engineers will adapt. That means thinking carefully about how many agents an engineer can effectively supervise, building tooling that helps manage the cognitive overhead of multi-agent workflows, and creating team structures that allow engineers to balance supervisory work with periods of focused, deep work. The worst outcome would be to unlock the productivity gains of AI while burning out the people who are supposed to be guiding it.

Three shifts beyond engineering

I previewed these three shifts earlier, but having worked through the detail, I want to draw out their implications more concretely. These aren’t engineering problems that can be solved by engineering teams alone. They require responses from leadership, from procurement, from HR, from governance functions that have traditionally operated at arm’s length from the technology practice. That’s what makes them hard, and that’s what makes them important.

Decision speed becomes the bottleneck

Consider a concrete scenario. An engineering team uses AI agents to build a working prototype of a new citizen-facing service in two weeks. The prototype works. It’s been tested. It solves a real problem. Now it needs approval to go to production. The security review takes three weeks. The procurement approval for the hosting environment takes a month. The data governance review takes another two weeks. The compliance sign-off requires a committee that meets quarterly. By the time the service is approved, the technology landscape has shifted, the requirements have evolved, and the team has moved on to other priorities.

This isn’t a hypothetical. It’s the pattern I see emerging. The engineering work is no longer the long pole in the tent. The long pole is everything that wraps around the engineering work. The approvals, the reviews, the sign-offs, the committees. These processes were designed for a world where building took months and the approval cycle was a small fraction of the total delivery timeline. When building takes days, the approval cycle becomes the dominant cost.

The response can’t simply be to remove governance. Government needs governance. Citizens expect and deserve it. But we need governance that operates at a fundamentally different tempo. That might mean pre-approved deployment environments where applications that meet certain criteria can go live without case-by-case review. It might mean risk-tiered approval processes where low-risk applications follow a fast track while high-risk systems get full scrutiny. It might mean delegating approval authority closer to the teams doing the work, with periodic auditing rather than upfront gatekeeping. The specific mechanisms will vary, but the principle is the same. Governance needs to be redesigned for machine-speed delivery, or it becomes the thing that prevents machine-speed delivery from delivering any value at all.

The artefact may not be code

Our entire infrastructure for managing software, from version control systems to code review tools to CI/CD pipelines to static analysis, is built around the assumption that code is the primary artefact. If code becomes regenerable, if it can be rebuilt from a specification by a different agent using a different model in a different language, then the specification is what we should be versioning, reviewing and preserving. The code is just one rendering of it.

In a government context, this has implications for accountability and audit. Today, when something goes wrong, we can trace it back to a specific commit, a specific code change, a specific engineer who approved it. If code is being regenerated by agents from specifications, the accountability chain shifts. Who wrote the specification? Who approved the agent’s interpretation of it? Who verified that the regenerated code faithfully implements the intent? These are different questions from the ones our current audit frameworks are designed to answer.

Government systems can run for decades. If the primary artefact is a specification with its decision history rather than a codebase, we need preservation practices for specifications that are at least as rigorous as what we have for code. That means structured formats, versioning, and the ability to reconstruct the reasoning behind decisions long after the people who made them have moved on.

Team structures will change fundamentally

Today, a typical government digital project might start with a product manager, a designer, a tech lead, and several engineers. The team is assembled upfront based on an estimate of the engineering effort required. With AI, a product manager or designer can build a working prototype themselves, validate it with users, iterate on it, and only bring in engineers when the concept has been proven and needs to be hardened for production. The team forms around validated demand rather than anticipated demand.

This solo-to-scale model dramatically reduces the cost of experimentation. Instead of committing a full team to an unproven idea, you commit a single person with AI tools. If the idea doesn’t work, the sunk cost is days, not months. If it does work, you scale the team to match the validated need. This is how startups have always operated, but AI makes it feasible within large organisations that have traditionally required full team mobilisation to get anything built.

For government HR and workforce planning, this is a significant disruption. How do you plan headcount when a team of three with AI tools might deliver what previously required fifteen? How do you structure career progression when the path from individual contributor to team lead assumes teams of a certain size? How do you run procurement when the build-versus-buy calculation has fundamentally changed? These aren’t questions the technology function can answer alone. They require engagement from across the organisation, and the sooner that engagement begins, the better positioned we’ll be.

Vendor and SaaS disruption

Traditional engagement models for both outsourced vendors and SaaS products are under pressure. Government needs to start reassessing them now, before circumstances force it. This isn’t about vendors or SaaS becoming irrelevant overnight. It’s about the assumptions underlying our engagement models being quietly undermined, and the risk of continuing to operate on assumptions that no longer hold.

The changing vendor landscape

The traditional value proposition of outsourced vendors, namely capacity, specialised skills and speed to market, is being challenged by agent orchestration that can deliver verified code at scale. For decades, the logic has been straightforward. You need more engineering capacity than you can hire internally, so you engage a vendor to supply it. The vendor brings people, the people write code, and you pay for the effort. The economics work because software engineering is labour-intensive, skilled engineers are scarce, and building a large internal team is slow and expensive.

AI is eroding every leg of that argument. If agents can generate code at scale, the labour intensity drops. If AI tools make individual engineers dramatically more productive, you need fewer of them. If a single person with AI tools can reach proof-of-concept independently, the traditional model of augmenting with vendor capacity for initial builds becomes less compelling. The vendor’s value proposition of “we’ll bring you twenty engineers” loses its force when your internal team of five, equipped with AI, can match that output.

This doesn’t mean vendors become obsolete. But the value they provide needs to shift. Vendors may need to move toward platform and tooling provision, providing the infrastructure and developer experience layers that organisations don’t want to build in-house. They may need to focus on highly specialised domain expertise that AI can’t replicate. Deep knowledge of regulatory frameworks, legacy system architectures, or industry-specific compliance requirements. And they may need to evolve into integration and orchestration services that span complex system landscapes, where the challenge isn’t building individual components but making disparate systems work together.

Government procurement frameworks should start to reflect this evolution. Today, most government IT procurement is structured around effort-based contracts like time and materials, or fixed-price for a defined scope of work. If the engineering effort for a given scope shrinks dramatically because of AI, these contract structures stop making sense. We may need to move toward outcome-based procurement, where vendors are paid for results rather than effort. We may need shorter contract cycles that reflect the pace of change in the tooling landscape. And we need evaluation criteria that assess a vendor’s ability to leverage AI effectively, not just their ability to supply headcount.

There’s also a timing dimension to this. Vendors who recognise the shift early and reposition themselves around AI-augmented delivery, platform services, and domain expertise will thrive. Vendors who continue to sell capacity as their primary offering will find that offering increasingly difficult to justify. Government, as one of the largest buyers of technology services in any economy, has both the opportunity and the responsibility to signal this shift through how it procures.

The SaaS question

The SaaS value proposition faces a similar challenge, though the dynamics are slightly different. The core logic of SaaS is that it’s cheaper to share a product across many customers than for each customer to build their own. The vendor amortises development costs across a large customer base, achieves economies of scale, and passes some of those savings on. The customer gets a product that would be too expensive to build alone, continuously maintained and updated, for a predictable subscription fee.

If AI can build a subset of features that meets an agency’s needs, why pay for a full SaaS subscription? Non-technical builders are already “vibe coding” dashboards and tools that would previously have warranted SaaS procurement. Forward-leaning startups are building their own CRMs, project management tools and analytics platforms. The marginal cost of custom software, long the justification for SaaS subscriptions, is falling rapidly.

The key phrase there is “subset of features.” Most organisations use only a fraction of what any SaaS product offers. They’re paying for the full product but using twenty percent of it. If AI can build that twenty percent as a custom tool tailored to their specific workflow, the value proposition of paying for the other eighty percent becomes hard to defend. The custom-built tool does exactly what you need, integrates with your existing systems in exactly the way you want, and doesn’t come with the compromises and workarounds that inevitably accompany adopting someone else’s product.

This doesn’t mean SaaS is obsolete. Complex, compliance-heavy or deeply integrated products still make sense. Nobody should be vibe coding their own identity management system or building a custom email platform. Products that depend on network effects, that require continuous security patching, or that need to track evolving regulatory requirements across multiple jurisdictions remain firmly in SaaS territory. The maintenance burden alone would overwhelm any AI-generated alternative.

But the middle ground is shifting. The category of tools where SaaS was the obvious choice because building was too expensive is shrinking. Internal dashboards, workflow tools, data visualisation, simple CRUD applications, lightweight project tracking. These are all candidates for AI-built alternatives that cost a fraction of a SaaS subscription and fit the organisation’s needs more precisely.

For government procurement, this means every SaaS renewal should include a genuine assessment of whether the product could be partially or fully replaced by an AI-built alternative. Not because it always can, but because the default assumption that SaaS is always cheaper than building needs to be tested case by case. The calculation is changing, and procurement decisions should reflect that rather than defaulting to patterns established when custom software was prohibitively expensive.
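
To make that renewal-time assessment concrete, here’s a back-of-envelope sketch in Python. Every figure is a hypothetical placeholder rather than GovTech data; the point is the shape of the comparison, amortised build-and-maintain cost versus subscription cost for the subset of features actually used, not the specific numbers.

```python
# Illustrative only: a build-vs-subscribe comparison with placeholder numbers.

def annual_saas_cost(seats: int, price_per_seat_month: float) -> float:
    """Annual subscription cost for a SaaS product."""
    return seats * price_per_seat_month * 12

def annual_build_cost(initial_build: float, yearly_maintenance: float,
                      years: int) -> float:
    """Average annual cost of an AI-built alternative, amortising the
    initial build over its expected lifetime."""
    return initial_build / years + yearly_maintenance

saas = annual_saas_cost(seats=200, price_per_seat_month=30)        # 72,000/yr
build = annual_build_cost(initial_build=40_000,
                          yearly_maintenance=15_000, years=3)      # ~28,333/yr

print(f"SaaS: {saas:,.0f}/yr  vs  build: {build:,.0f}/yr")
```

Run with real figures at each renewal, the interesting cases are the ones where the two numbers cross within a single contract cycle.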

From vibe coding to agentic engineering

We’ve framed our strategic response around a concept that’s rapidly becoming central to global discussions. Vibe coding describes the phenomenon of non-technical users like product managers, policy officers and designers building functional software through AI assistance with minimal traditional coding skills. In Singapore’s government context, this is already happening, with and without formal sanction.

This isn’t just about non-technical builders creating entirely new applications. Product managers and designers can now add features to existing products too. This democratisation of building capability is a significant shift in team composition and value creation.

I saw this firsthand when a product manager in one of our teams showed me a working internal tool he’d built over a weekend using an AI coding assistant. It pulled real data, had a usable interface, and solved a genuine workflow problem his team had been waiting months to get engineering time for. My first reaction was excitement. My second was a knot in my stomach, because it had no security review, no tests, no lifecycle plan, and no one besides him knew it existed. That moment crystallised the whole challenge for me. The capability is extraordinary, but without guardrails it creates risk at exactly the speed it creates value.

Using the five-level framework, our engineering teams are currently operating at Level 2 and preparing for Level 3. But this strategy also addresses a different audience. For the growing community of non-technical builders, we’re extending Level 2 governance, quality assurance and software lifecycle management to people who are effectively operating at Level 1.

However, this isn’t a permanent strategy. Agent swarms, orchestration and collaborative agent networks will continue to evolve, fundamentally changing team dynamics well beyond the current “vibe” paradigm. Agentic engineering is a transitional phase leading toward agent orchestration, not an end state. We need to explicitly acknowledge this trajectory and start preparing for what comes after.

How we’re responding

Our AI strategy for software engineering isn’t just conceptual. It’s being actively built across several workstreams, each targeting a specific dimension of the challenge. To make the link between diagnosis and response explicit, each workstream below maps to one or more of the problems identified in the industry landscape, and each moves us toward a specific level in the five-level framework.

AI coding assistants

Addresses the decision speed bottleneck and human role redefinition. Moves us from Level 1 to solid Level 2, with foundations for Level 3.

The most visible workstream is deploying AI coding tools across our organisation and the broader Whole-of-Government estate. We initially tested IDE-based tools like GitHub Copilot, Windsurf and GitLab Duo in sandbox environments, and the data was encouraging. With Copilot alone, high-usage teams produced roughly double the merge request volume of low-usage teams. But as time has gone by, I’ve realised the industry has moved on. Claude Code has emerged as the dominant agentic coding tool in industry discussions, while traditional IDEs were largely absent from those conversations. This suggests the terminal is re-emerging as the primary interface for software development, with IDEs needing fundamental reinvention.

We’re now focusing on two standard offerings for GovTech and eventually WOG, specifically Copilot and Claude Code. These are approved for GovTech use, and we’re proposing credits of $* per engineer per month to ensure our people have the tools they need. The preferred tool will be installed by default on developer devices, making it a key enabler for developer efficiency and team scale.

A key governance challenge has come up. AI coding tools are evolving faster than central approval cycles can accommodate. We’re proposing that AI coding-related approvals be delegated, with periodic reporting to maintain oversight without creating bottlenecks. It’s a microcosm of the broader tension between governance and the speed of AI development.

Shared context and the Agent Prime Directives

Addresses organisational readiness, learning systems and brownfield complexity. Enables Level 2 consistency and is a prerequisite for Level 3.

One of the workstreams I’m most excited about is what we call the Agent Prime Directives. It’s a centrally managed tool that provides the shared context, skills and tools that make AI coding assistants effective and consistent across teams. Think of it as curated context packs covering golden paths, IM8 security controls and platform documentation, combined with reusable skills and prompt templates that any team can consume. It’s open to community contributions, centrally maintained but locally consumed.

The impact has been striking. Some engineers in GovTech have already stopped writing code by hand entirely and are using only Claude Code with the Agent Prime Directives for all their work. That’s not a mandate. It’s organic adoption driven by genuine productivity gains. The Agent Prime Directives are a key enabler for scaling AI adoption consistently, and our next focus is extending them to support specifications and verification, which will be essential as we move toward Level 3 autonomous agents and beyond.
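
To give a feel for the shape of this, here’s a hypothetical sketch in Python. The class names and fields are my illustrative assumptions, not the actual Agent Prime Directives format, but they capture the idea of centrally maintained, versioned context that any team can consume.

```python
# Hypothetical structure for a centrally managed "context pack".
from dataclasses import dataclass, field

@dataclass
class ContextPack:
    name: str                                             # e.g. "im8-security-controls"
    version: str                                          # centrally maintained, versioned
    documents: list[str] = field(default_factory=list)    # golden paths, platform docs
    prompt_templates: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)       # reusable agent skills

def assemble_context(packs: list[ContextPack]) -> str:
    """Concatenate pack documents into a single payload a coding
    assistant can consume at session start."""
    sections = []
    for pack in packs:
        sections.append(f"## {pack.name} (v{pack.version})")
        sections.extend(pack.documents)
    return "\n\n".join(sections)

# Usage: teams consume packs locally; the content is curated centrally.
im8 = ContextPack(
    name="im8-security-controls", version="2.3",
    documents=["Never store secrets in source.", "Log all privileged actions."],
)
print(assemble_context([im8]))
```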

Code classification and governance

Addresses infrastructure uncertainty and standards volatility. Governance foundation for all levels.

A significant workstream addresses how government code is classified and governed when AI systems are involved. The core insight is that code does not carry the same classification as system data. Code is typically lower risk, and this separation unlocks safe AI tool usage while maintaining appropriate governance. Code classification is approved and operational within our teams, enabling AI tools to be used with appropriate guardrails based on code tier, with a clear audit trail for compliance. But the governance challenge goes beyond classification. Each configuration of agent capabilities, models and components carries a distinct compliance posture.

Think about it this way. A Claude Code deployment using a Singapore-based model with read-only tools on code in a lower classification tier presents a fundamentally different risk profile from a Copilot deployment using an overseas model with broad access. Governance in this environment isn’t about applying blanket policies. It’s about finding compliant paths through a graph of possibilities.

This “governance as graph traversal” concept is directly relevant to our compliance requirements. Policy clarity, particularly for agency-owned code that we develop, needs attention to resolve current blockers.
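
As a minimal sketch of what traversing that configuration graph could look like, consider the Python below. The dimensions mirror the example above, but the policy rule is an invented simplification for illustration, not actual IM8 policy.

```python
# "Governance as graph traversal": every configuration of tool, model
# location, access level and code tier is a node; compliance checking
# is a walk over those nodes. The policy rule here is invented.
from itertools import product

DIMENSIONS = {
    "tool": ["claude_code", "copilot"],
    "model_location": ["singapore", "overseas"],
    "access": ["read_only", "broad"],
    "code_tier": ["open", "official", "restricted"],
}

def is_compliant(cfg: dict) -> bool:
    """Simplified illustrative policy: restricted-tier code requires a
    locally hosted model and read-only tool access."""
    if cfg["code_tier"] == "restricted":
        return (cfg["model_location"] == "singapore"
                and cfg["access"] == "read_only")
    return True

all_configs = [dict(zip(DIMENSIONS, vals))
               for vals in product(*DIMENSIONS.values())]
compliant = [cfg for cfg in all_configs if is_compliant(cfg)]
print(f"{len(compliant)} of {len(all_configs)} configurations are compliant")
```

The practical point is that compliance becomes a property you compute per configuration, rather than a blanket yes or no for a tool.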

Code review evolution

Addresses code review’s uncertain future and the trust and abstraction risk. Enables safe operation at Level 2, essential for Level 3.

The future of code review is one of our most actively considered questions. We’re converging on a tiered framework where review rigour is proportional to rebuild cost and failure impact. For low-risk, easily rebuilt systems, traditional line-by-line review may give way to specification review, test quality assessment and formal verification. For high-risk national systems, human review stays essential. But even there, what humans review is shifting from implementation details to system properties and outcomes.
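
A sketch of how such a tiering function might look, with illustrative thresholds of my own rather than an agreed GovTech policy:

```python
# Tiered review assignment under a simple two-axis model of rebuild
# cost and failure impact. Tiers and thresholds are illustrative.

def review_tier(rebuild_cost: str, failure_impact: str) -> str:
    """Map a system's risk profile to a review regime.
    Both inputs are 'low', 'medium' or 'high'."""
    if failure_impact == "high":
        # High-risk national systems: human review stays essential,
        # focused on system properties and outcomes.
        return "human review of specifications, properties and outcomes"
    if failure_impact == "medium" or rebuild_cost == "high":
        return "specification review plus AI-assisted line review"
    # Low-risk, easily rebuilt systems.
    return "specification review, test quality assessment, automated checks"

print(review_tier("low", "low"))
print(review_tier("medium", "high"))
```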

To support this transition, we’ve built Prelude, an in-house AI code review tool that complements existing security and code scanning tools by covering areas traditional tools miss. Prelude reviews for code bugs, best practices, security vulnerabilities, and will soon include IM8 compliance checking. The next phase will add automated patching of identified bugs. Early results are encouraging. Roughly half of Prelude’s review comments have been rated by engineers as either a good catch or genuinely helpful, and we’re continuously improving the signal-to-noise ratio based on feedback. As AI coding tools increase code volume, Prelude reduces the review burden on human engineers, catching issues in both human-written and AI-generated code before they reach production.

Legacy modernisation with AI

Addresses the brownfield reality, artefact preservation and organisational memory. Critical for making Level 2–3 viable on legacy estates.

One of the most strategically significant workstreams addresses Singapore government’s substantial estate of legacy systems. We’ve developed Graphiqode, an in-house tool that analyses legacy codebases to build visual dependency graphs, extract business rules and generate documentation.

The approach follows three phases. First, analyse. AI reviews legacy codebases to map dependencies, data models and business logic. Second, document. The tool auto-generates system documentation, architecture maps and decision records. Third, modernise. Documentation is fed into AI agents to improve, refactor and rebuild systems.

This approach is consistent with what industry leaders have described, treating the conceptual model rather than the code as the source of truth.
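
Graphiqode itself isn’t public, but the analyse phase can be illustrated with a toy Python example that maps import dependencies across a codebase using the standard library’s ast module. The real tool goes much further, into data models and business rules, but the principle of recovering structure from code is the same.

```python
# Toy illustration of the "analyse" phase: build an import-dependency
# graph for every Python file under a source root.
import ast
from pathlib import Path

def import_graph(root: str) -> dict[str, set[str]]:
    """Map each module under `root` to the modules it imports."""
    graph: dict[str, set[str]] = {}
    for path in Path(root).rglob("*.py"):
        deps: set[str] = set()
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                deps.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                deps.add(node.module)
        graph[path.stem] = deps
    return graph

# Each edge is a dependency an agent (or a human) can follow when
# extracting business rules or planning a refactor. "src" is a
# placeholder path.
for module, deps in import_graph("src").items():
    print(module, "->", sorted(deps))
```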

Platform and visibility

Addresses platforms as the enabling layer, organisational readiness and decision speed. Infrastructure for all levels.

We’re building the enabling infrastructure that makes everything else possible. Three tools are central here.

Backstage provides a service catalogue to track the growing number of prototypes and AI-developed applications across our organisation, with system ownership documentation, health tracking and resource graphs that allow AI agents to understand and improve systems.

GovPaas, ShipHats and RabbitDeploy provide secure, compliant hosting for AI-developed applications and prototypes, with standard CI/CD pipelines and golden path deployment.

Astrolabe provides engineering metrics and maturity tracking to measure the impact of AI-augmented workflows on throughput, quality and team health.

These tools collectively embody what I’d call the platform as Trojan horse principle. Platforms are the mechanism through which practices that teams would otherwise skip get embedded. If the right thing is also the easy thing, teams will do the right thing.

What ties all of this together is a coherent idea-to-product pipeline. A non-technical officer prototypes an idea using a vibe coding tool. The application gets registered and tracked in Backstage, then deployed safely on GCC (Government on Commercial Cloud). Prelude runs automated security scans and patching. When the prototype shows enough promise to warrant product approval and funding, engineers come in, using agentic coding assistants with the Agent Prime Directives for context engineering. AI-assisted code review through Prelude continues throughout. Automated compliance, penetration testing and log scanning happen in the background. And when it launches, automated exploratory testing validates the product. The entire journey, from idea to production, is tracked, auditable and on approved platforms. No step requires the builder to go outside the guardrails.
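
One way to see why this pipeline is auditable end to end is to write it down as explicit, ordered stages. The sketch below is my own rendering of the narrative above, not the actual implementation:

```python
# The idea-to-product pipeline as named, ordered, traceable stages.
PIPELINE = [
    ("prototype", "officer builds an idea with a vibe coding tool"),
    ("register", "application recorded and tracked in Backstage"),
    ("deploy", "hosted safely on GCC via golden-path CI/CD"),
    ("scan", "Prelude runs automated security review and patching"),
    ("productise", "engineers take over with agentic assistants and "
                   "the Agent Prime Directives"),
    ("assure", "automated compliance, penetration testing, log scanning"),
    ("launch", "automated exploratory testing validates the product"),
]

def audit_trail(pipeline) -> None:
    """Every step is named and ordered, so the journey is traceable."""
    for step, (stage, description) in enumerate(pipeline, start=1):
        print(f"{step}. [{stage}] {description}")

audit_trail(PIPELINE)
```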

Research into formal verification

Addresses code review at scale and trust in AI-generated code. Prerequisite for Level 3–4 on high-risk systems.

Looking further ahead, we’re in discussion with researchers on formal verification of AI-generated code, focusing on practical feasibility. As AI generates more code at higher volumes, formal verification offers mathematical proof of correctness, not just testing. This becomes increasingly important for high-risk systems where human review can’t scale.
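
For readers unfamiliar with the idea, here’s a toy Lean 4 example of my own (assuming a recent toolchain with the omega tactic), not drawn from our research discussions. Unlike a test suite, which samples inputs, the theorem is a machine-checked guarantee over every possible input.

```lean
-- A property that testing can only sample but a proof establishes
-- for every input: double n is always even.
def double (n : Nat) : Nat := n + n

theorem double_is_even (n : Nat) : ∃ k, double n = 2 * k :=
  ⟨n, by unfold double; omega⟩  -- witness k = n; omega closes the arithmetic
```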

Our approach to people and teams

The industry observations I described earlier apply directly to our situation. Here’s how we’re responding.

On junior engineers, I intend to keep investing. We’ll continue graduate and intern hiring programmes, but modify onboarding to embrace AI-native workflows from day one. Willingness to experiment and make mistakes should be treated as features, not bugs. Curiosity and broad knowledge are the hiring signals that matter most right now.

On senior engineers, passive exposure won’t be enough. I’m designing active intervention programmes. Dedicated hackathons, protected learning days, time for genuine experimentation. The investment in these programmes isn’t optional. It’s the single most important lever we have for transforming existing teams.

On supervisory engineering, we need to build deliberate pathways. This skill isn’t being taught by universities or current training frameworks. We need to create the conditions for engineers to develop it through practice, structured mentoring and exposure to agent-driven workflows. The burnout risk from managing multiple agents in parallel is real, and our team designs need to account for it explicitly rather than just assuming adaptation will happen on its own.

What needs to happen

Pulling together the industry landscape, our strategic response and the work already underway, here’s what I think we need to focus on next.

We need active senior engineering engagement. Design and deploy structured interventions like hackathons, dedicated learning days and protected experimentation time to push senior engineers into deep AI engagement. Passive exposure won’t drive transformation.

We need to invest in learning systems. Build organisational memory through knowledge graphs that capture reasoning, failures and context, not just final artefacts. Treat the story as the data. The history of decisions is more valuable than the code.

We need to prepare for the bottleneck shift. Develop strategies for decision-making at agent speed. Approval processes, governance chains and change management will become the constraints. Design processes for machine-speed delivery.

We need to rethink code review. Move toward tiered review based on risk profile. Invest more heavily in specification review, test quality assessment and formal verification rather than line-by-line review for all changes.

We need to tier by risk, not uniformly. Develop frameworks for treating different software differently based on rebuild cost and failure impact. Not everything needs the same rigour. Applying the same governance to all code is both inefficient and insufficiently protective where it matters most.

We need to accelerate platform investment. Use platforms as the mechanism to embed practices that teams would otherwise skip. Make the right thing the easy thing.

We should explore solo-to-scale team models. Pilot projects where single individuals or very small teams take concepts to proof-of-value before scaling. Measure what this does to time-to-value, headcount requirements and bottlenecks.

We should reassess vendor and SaaS engagement models. Examine the value proposition of outsourced vendors and SaaS in an agent-orchestrated world. Start reassessing now.

We should continue junior hiring with AI-native onboarding. Leverage the positive sentiment around AI-native juniors by continuing graduate and intern programmes with AI workflows integrated from day one.

We need to prepare for what comes after agentic engineering. Recognise agentic engineering as transitional. Start exploring the agent orchestration and swarm patterns that will define the next phase of team dynamics and software delivery.

And we need to develop AI bias evaluation frameworks. Assess AI tools not just for capability but for potential commercial bias, particularly for procurement decisions with long-term platform implications.

Where this leaves us

We now live in a period of profound transition, with more open questions than settled answers. The shift from AI-assisted to AI-native development isn’t just a technological change. It’s a fundamental rethinking of how software is built, governed, valued, and what it means to be a software engineer.

For us in Singapore government, the strategic direction is clear and validated. Our vibe coding to agentic engineering strategy is on the right track, but it has to be understood as a transitional phase, not a destination. Junior engineers are more viable than prevailing narratives suggest, and I intend to keep investing in AI-native onboarding. Senior engineers need active intervention because passive exposure alone won’t cut it. Team structures may radically shrink, and the economics of vendor and SaaS engagement are being rewritten.

The organisations that will thrive through this transition aren’t necessarily those with the most advanced tools. They’re those that recognise we’re entering a genuinely different paradigm. One where the artefact may not be code, the bottleneck is human cognition and institutional process, and the core competency is knowing which decisions to delegate and which to reserve for human judgment.

The field agrees the role of engineering is shifting. Nobody has fully envisaged what it shifts to. That’s the work ahead, and it’s work that we at GovTech are uniquely positioned to help lead.

Success will require investment in foundations like platforms, governance, AI fluency and internal tooling. It’ll require tolerance for stochastic outcomes, willingness to redesign organisational structures around new constraints, and an honest reckoning with what’s gained and what’s lost as abstraction layers multiply.

The five-level framework provides a valuable mental model for understanding this progression and making deliberate choices about where to invest and how fast to move. Different parts of government will appropriately operate at different levels, and the governance and talent implications of each level transition need to be designed explicitly, not discovered accidentally.

The state of software development in Singapore government is one of managed, deliberate transition, grounded in strategy, validated by global peers, and clear-eyed about the profound changes ahead. The work of making that transition well has already begun.