Agentic File System Abstraction for Context Engineering
- Source: https://arxiv.org/pdf/2512.05470
- arXiv: 2512.05470 (cs.SE)
- Author metadata (arXiv submission): Xiwei Xu
- Clipped: 2026-03-09 (SGT)
TL;DR
The paper argues that the bottleneck in GenAI systems has shifted from model tuning to context engineering. It proposes a Unix-inspired architecture (“everything is a file”) in which memory, tools, external knowledge, and human inputs are mounted and governed through a unified file-system abstraction, then operationalized via a three-part context pipeline: Constructor, Updater, Evaluator.
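The “everything is a file” idea can be made concrete with a small sketch: heterogeneous context sources are mounted under one namespace and read through a uniform interface. This is an illustrative assumption of mine, not AIGNE’s actual API; all class and method names (`ContextSource`, `AgentFS`, `mount`, `read`) are hypothetical.

```python
from abc import ABC, abstractmethod


class ContextSource(ABC):
    """A mountable context source exposing a file-like read interface.

    Hypothetical interface; the paper's AFS unifies memory, tools,
    knowledge bases, and human inputs behind one namespace like this.
    """

    @abstractmethod
    def read(self, path: str) -> str:
        ...


class MemoryStore(ContextSource):
    """One concrete source: a mutable, indexed memory (a dict here)."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def write(self, path: str, content: str) -> None:
        self._entries[path] = content

    def read(self, path: str) -> str:
        return self._entries[path]


class AgentFS:
    """Unified namespace: every context source is mounted like a filesystem."""

    def __init__(self) -> None:
        self._mounts: dict[str, ContextSource] = {}

    def mount(self, prefix: str, source: ContextSource) -> None:
        self._mounts[prefix] = source

    def read(self, path: str) -> str:
        # Longest-prefix match, as in real mount tables.
        # Raises ValueError if nothing is mounted for this path.
        prefix = max((p for p in self._mounts if path.startswith(p)), key=len)
        return self._mounts[prefix].read(path[len(prefix):])


fs = AgentFS()
mem = MemoryStore()
fs.mount("/memory/", mem)
mem.write("user/prefs", "prefers concise answers")
print(fs.read("/memory/user/prefs"))  # -> prefers concise answers
```

A tool adapter or a retrieval backend would implement the same `read` interface and be mounted under another prefix (e.g. `/tools/`), which is what lets governance and traceability policies apply uniformly.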
Core contributions
- Reframes context engineering as a software architecture problem, not just prompt design.
- Proposes an Agentic File System (AFS) abstraction to manage heterogeneous context sources under one namespace.
- Maps design to software engineering principles: abstraction, modularity, separation of concerns, traceability, composability.
- Defines a persistent context repository lifecycle with:
  - History (immutable log),
  - Memory (structured/indexed, mutable),
  - Scratchpad (temporary working context).
- Formalizes key GenAI constraints driving the architecture:
  - token-window limits,
  - model statelessness,
  - probabilistic/non-deterministic outputs.
- Introduces a closed-loop context pipeline:
  - Context Constructor (selects, scores, and compresses context),
  - Context Updater (injects/refreshes context at runtime),
  - Context Evaluator (validates outputs, updates memory, triggers human review when needed).
- Implements the approach in the open-source AIGNE framework and demonstrates two exemplars:
  - a memory-enabled agent,
  - an MCP-based GitHub assistant.
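The repository tiers and the closed-loop pipeline above can be sketched together. This is a minimal toy, assuming naive keyword scoring and a word-count token budget; the function names (`construct`, `update`, `evaluate`) and the 0.5 confidence threshold are illustrative, not from the paper or AIGNE.

```python
from dataclasses import dataclass, field


@dataclass
class ContextRepository:
    """The paper's three tiers: History, Memory, Scratchpad."""
    history: list[str] = field(default_factory=list)      # immutable, append-only log
    memory: dict[str, str] = field(default_factory=dict)  # structured, mutable index
    scratchpad: list[str] = field(default_factory=list)   # temporary working context


def construct(repo: ContextRepository, query: str, budget: int) -> list[str]:
    """Context Constructor: select and score entries, compress to a token budget.
    Scoring here is naive keyword overlap; a real system would use retrieval."""
    scored = sorted(
        repo.memory.values(),
        key=lambda e: sum(w in e.lower() for w in query.lower().split()),
        reverse=True,
    )
    picked, used = [], 0
    for entry in scored:
        cost = len(entry.split())  # crude token estimate
        if used + cost <= budget:
            picked.append(entry)
            used += cost
    return picked


def update(repo: ContextRepository, new_fact: str) -> None:
    """Context Updater: inject fresh context into the scratchpad mid-run."""
    repo.scratchpad.append(new_fact)


def evaluate(repo: ContextRepository, output: str, confidence: float) -> bool:
    """Context Evaluator: log the output, promote it to memory if it passes,
    and flag low-confidence outputs for human review."""
    repo.history.append(output)  # history is append-only, never edited
    if confidence < 0.5:
        return False  # trigger human review
    repo.memory[f"fact-{len(repo.memory)}"] = output
    return True


repo = ContextRepository()
repo.memory["geo"] = "Paris is the capital of France"
repo.memory["sky"] = "The sky is blue"
print(construct(repo, "capital of France", budget=10))
```

The loop closes because `evaluate` writes validated outputs back into `memory`, which the next `construct` call can then select from.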
Practical implications
- Pushes teams toward treating context as versioned, governed infrastructure (closer to DevOps/DataOps) instead of ad-hoc prompts.
- Improves auditability/provenance for enterprise AI systems.
- Supports human-in-the-loop co-work as a first-class design element.
Caveats / limits
- Primarily an architectural proposal plus a framework implementation; the text offers few quantitative benchmark comparisons.
- Evaluation is exemplar-driven (proof-of-feasibility) rather than broad empirical validation across many domains.
- Real-world performance depends on the quality of retrieval, metadata discipline, and governance-policy design.