Agentic File System Abstraction for Context Engineering

TL;DR

The paper argues that the bottleneck in GenAI systems has shifted from model tuning to context engineering. It proposes a Unix-inspired architecture (“everything is a file”) in which memory, tools, external knowledge, and human inputs are mounted and governed through a unified file-system abstraction, then operationalized via a three-part context pipeline: Constructor, Updater, Evaluator.

Core contributions

  • Reframes context engineering as a software architecture problem, not just prompt design.
  • Proposes an Agentic File System (AFS) abstraction to manage heterogeneous context sources under one namespace.
  • Maps design to software engineering principles: abstraction, modularity, separation of concerns, traceability, composability.
  • Defines a persistent context repository lifecycle with:
    • History (immutable log),
    • Memory (structured/indexed, mutable),
    • Scratchpad (temporary working context).
  • Formalizes key GenAI constraints driving the architecture:
    • token-window limits,
    • model statelessness,
    • probabilistic/non-deterministic outputs.
  • Introduces a closed-loop context pipeline:
    1. Context Constructor (select/score/compress context),
    2. Context Updater (inject/refresh context during runtime),
    3. Context Evaluator (validate outputs, update memory, trigger human review when needed).
  • Implements the approach in the open-source AIGNE framework and demonstrates it with two exemplars:
    • memory-enabled agent,
    • MCP-based GitHub assistant.
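To make the “everything is a file” idea concrete, here is a minimal sketch of an AFS-style unified namespace with the three repository tiers (History, Memory, Scratchpad) mounted under path prefixes. All names here (`AgentFS`, `mount`, `read`, `write`) are hypothetical illustrations, not the AIGNE API.

```python
from typing import Dict, Tuple

class AgentFS:
    """Route reads/writes to per-mount stores via path prefixes, Unix-style."""

    def __init__(self) -> None:
        self._mounts: Dict[str, Dict[str, str]] = {}

    def mount(self, prefix: str) -> None:
        # Each mount point gets its own backing store (a dict here;
        # in a real system: a log, a vector index, a tool adapter, ...).
        self._mounts[prefix] = {}

    def _resolve(self, path: str) -> Tuple[Dict[str, str], str]:
        # Longest-prefix match, as a real mount table would do.
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path.startswith(prefix):
                return self._mounts[prefix], path[len(prefix):]
        raise FileNotFoundError(path)

    def write(self, path: str, data: str) -> None:
        store, rel = self._resolve(path)
        store[rel] = data

    def read(self, path: str) -> str:
        store, rel = self._resolve(path)
        return store[rel]

# One namespace over heterogeneous context sources, per the paper's design:
fs = AgentFS()
fs.mount("/history/")     # immutable log (append-only by convention)
fs.mount("/memory/")      # structured/indexed, mutable memory
fs.mount("/scratchpad/")  # temporary working context
fs.write("/memory/user_profile", "prefers concise answers")
print(fs.read("/memory/user_profile"))  # prefers concise answers
```

A production version would enforce the History tier's immutability and attach metadata (provenance, timestamps) to each entry; the point here is only that one path-based interface can front very different context backends.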
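The closed-loop Constructor → Updater → Evaluator pipeline can likewise be sketched in a few lines. The function names and the naive word-overlap scoring below are assumptions for illustration only; the paper's framework would use real retrieval, compression, and validation logic.

```python
from typing import Dict, List, Tuple

def construct(sources: Dict[str, str], query: str, budget: int = 200) -> str:
    """Context Constructor: score candidate snippets against the query and
    keep the best ones under a size budget (token-window limit stand-in)."""
    q_words = set(query.lower().split())
    scored: List[Tuple[int, str]] = []
    for path, text in sources.items():
        score = len(q_words & set(text.lower().split()))
        scored.append((score, f"[{path}] {text}"))
    scored.sort(reverse=True)
    out, used = [], 0
    for score, snippet in scored:
        if score > 0 and used + len(snippet) <= budget:
            out.append(snippet)
            used += len(snippet)
    return "\n".join(out)

def update(sources: Dict[str, str], new_fact: str) -> None:
    """Context Updater: refresh the repository with mid-run findings."""
    sources["/scratchpad/latest"] = new_fact

def evaluate(answer: str, required: str) -> bool:
    """Context Evaluator: validate the output; a failure here would trigger
    memory repair or human review in a full system."""
    return required.lower() in answer.lower()

sources = {
    "/memory/user": "user prefers metric units",
    "/knowledge/doc1": "water boils at 100 C at sea level",
}
ctx = construct(sources, "at what temperature does water boil?")
answer = f"Based on context: {ctx}"   # stand-in for the (stateless) model call
ok = evaluate(answer, "100 C")
if ok:
    update(sources, "answered boiling-point question")
```

Because the model itself is stateless, everything the loop learns must flow back through the Updater into the repository; the Evaluator is what closes the loop and decides whether memory gets written or a human gets pulled in.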

Practical implications

  • Pushes teams toward treating context as versioned, governed infrastructure (closer to DevOps/DataOps) instead of ad-hoc prompts.
  • Improves auditability/provenance for enterprise AI systems.
  • Supports human-in-the-loop co-work as a first-class design element.

Caveats / limits

  • Mainly an architectural proposal plus a framework implementation; the text offers limited quantitative benchmark comparisons.
  • Evaluation is exemplar-driven (proof-of-feasibility) rather than broad empirical validation across many domains.
  • Real-world performance depends on quality of retrieval, metadata discipline, and governance policy design.