Design notes
AgenticMD Design Notes
Companion to SPEC.md. SPEC states what conformance requires; this document states why those choices were made and what was considered and set aside. The reasoning here informs but does not constrain conformance.
Why split SPEC and design rationale
Two kinds of content live in spec projects:
- Normative. Rules an implementation must enforce. What conformance requires.
- Informative. Reasoning behind the rules. Why the choices were made. What was considered and rejected.
Mixing them in one document makes both harder to read. An implementer needs the rules cleanly; an evaluator needs the reasoning cleanly. SPEC.md is the first; DESIGN_NOTES.md is the second.
Standard pattern in spec projects — see RFCs (“Considerations” sections), OpenAPI (separate FAQ), language designs (rationale docs).
The node-type vocabulary — design rationale
The question
The vocabulary codifies seven types: domain_brief,
architecture_note, decision, guardrail, correction, failure,
project_rule. Why these seven? Why not six, or eight? On what
principle is closedness defended?
The answer in two parts
1. Empirical sufficiency. The seven are the set observed in production use across the reference corpus (~74 topic files, ~1,200 typed nodes). They were not designed top-down; they were extracted from a working corpus during a conformance audit. The claim being made is empirical sufficiency, not exhaustiveness — sufficient to express the agent-state-interaction behaviors documented to date, nothing more.
2. Principled organization. Empirical sufficiency alone is a weak claim — “it works for our one corpus” is not a defense against the critique “you’d find more if you looked at more corpora.” The principled companion is the dimension along which the seven are non-overlapping. The dimension is how each type interacts with the agent’s existing state.
| relation to existing agent state | type |
|---|---|
| gates further loading | domain_brief |
| adds to knowledge | architecture_note |
| adds to knowledge + locks rationale | decision |
| subtracts from action space (per task) | guardrail |
| invalidates prior knowledge | correction |
| conditionally activated by symptoms | failure |
| modifies operating posture across tasks | project_rule |
Each type occupies a distinct region of this relation-space. The addition gate (SPEC §4.3) requires an eighth type to occupy a region none of the seven occupies.
This converts “is the vocabulary exhaustive?” from a philosophical question (which it cannot answer) into an operational one (which it can): “show a relation along this dimension that none of the seven occupies, AND a corpus where that relation matters.”
Stress-tests of the dimension
The dimension was stress-tested against three failure modes before being adopted into the spec:
Test 1: Does any type sit at two points along the dimension?
decision was the most likely failure — it both adds knowledge and
forecloses options (a chosen path implicitly subtracts the unchosen
ones). But the primary relation is add+lock; the subtraction is a
downstream consequence, not the relation itself. The dimension
tolerates primary relations without requiring every secondary effect
to map separately. Holds.
Test 2: Are there relations not covered that aren’t example or
open_question?
| candidate relation | verdict |
|---|---|
| ”future obligation / promise” | folds into project_rule — obligation to act is a posture committing to action |
| ”passive observation / status” | folds into architecture_note — additive knowledge |
| ”alternative / fork” | folds into architecture_note + decision composition |
| ”warning / caveat” | folds into guardrail (subtractive on action space) or correction (if it invalidates) |
| “credit / acknowledgment” | social rather than functional; doesn’t have a relation-to-agent-state at all |
The two named candidates (example, open_question — see watchlist
below) are the strongest gaps. No third was found.
Test 3: Could the dimension itself fragment?
Natural worry: “subtracts from action space” (guardrail) and
“modifies posture” (project_rule) both look like operations on
what-the-agent-may-do. Are they actually one relation?
They are not. guardrail narrows the action space for a specific
task; project_rule modifies the agent’s operating mode across all
tasks. The first changes a per-task constraint; the second changes
the agent’s frame for evaluating any task. Distinct.
The dimension holds under these tests.
Why this dimension and not another
The dimension is principled but not unique. Other valid organizing dimensions exist:
- Temporal — was-true / is-true / will-be-true / unresolved
- Epistemic — certain / probabilistic / contested
- Modal — descriptive / prescriptive / proscriptive
The agent-state-interaction dimension is selected because it matches what AgenticMD is modeling: how an agent should treat content it retrieves. A spec organizing documentation for human historians might legitimately pick the temporal dimension; a spec for adversarial review might pick the epistemic one. The agent-state-interaction dimension is the one that earns its place in a retrieval-and-treatment spec for software agents.
Other dimensions either don’t carve cleanly here (temporal cross-cuts the types — corrections can be about past or present claims) or model something other than agent behavior (epistemic models source reliability, not agent action).
This is a defense by fit-to-purpose, not by uniqueness. SPEC §4.2 states this explicitly.
Chronology of the seven types
The node types did not come from agent improvisation during spec drafting. They were already in production use across the reference corpus when the extraction agent ran the conformance audit. The audit discovered the existing distribution; it did not propose it.
Usage data from the audit:
| type | count | share | likely load-bearing? |
|---|---|---|---|
architecture_note | 575 | 48% | yes — workhorse |
domain_brief | 172 | 14% | yes — convention-required |
guardrail | 131 | 11% | yes — deliberate authorial choice each time |
decision | 113 | 10% | yes |
correction | 92 | 8% | yes — used purposefully |
failure | 59 | 5% | mid — distinct shape, narrower applicability |
project_rule | 43 | 4% | lowest usage AND flagged in review for split — converging evidence to revisit at v1.0 |
Six of seven pass the load-bearing smell test. project_rule wobbles
and is on the v1.0 review list.
Watchlist (candidates for v1.0)
These types are not in the v0.3 vocabulary but have plausible candidacies. The watchlist is timeboxed to v1.0 — indefinite watchlist is not acceptable. If no second-corpus evidence has emerged by the v1.0 timeline, each candidate will be explicitly resolved: either promoted to formal type (if the dimension shows it is distinct) or formally folded into existing types (with documentation of how the existing types should absorb the use case).
example — the strongest candidate
Relation under the dimension: “adaptable substrate the agent transforms” — a thing the agent can copy and mutate.
This is genuinely absent from the seven. architecture_note adds
knowledge; decision constrains future change; guardrail subtracts
action. None of those is “here is a thing the agent can copy and
mutate.”
Distinct retrieval: triggered by “show me,” “give me a template,” “I want to write one like X.” Different from “how does X work.”
Distinct treatment: copy/adapt, not understand/summarize.
Why not in v0.3: currently lives inside architecture_note
bodies. The reference corpus doesn’t show the existing convention
breaking down. Watchlist for v1.0; if a second corpus shows examples
buried in architecture_note bodies hurting retrieval (briefs return
notes when the operator wants templates), promotion is justified.
open_question — secondary candidate
Relation under the dimension: “uncertainty to preserve.”
Treatment is genuinely distinct: the agent must not act as if the matter is settled; it must surface the uncertainty and defer to the user.
Why not in v0.3: the reference corpus shows few of these. Teams tend to decide or stay silent. Could be valuable for in-flight or incomplete corpora. Watchlist for v1.0.
Folds — not candidates
These were considered and folded into existing types:
specification(testable MUST/SHOULD contracts) — prohibitions →guardrail, choices →decision, descriptions of contracts →architecture_notewith MUST language in the bodymigration_note(version-conditional behavior) —correctioninvalidates; migration supplements. Distinction is real but v0.x corpora don’t have migration complexity. v2+ if at all.definition(single-term lookup) — glossary topics with per-termarchitecture_notesections cover this; the granularity is rightbackground(why a domain exists) — folds intodomain_brief; same purpose at a different altitude
§6.3 resolution semantics — why deferred
SPEC §6.3 declines to mandate a resolution mechanism in v0.x. The v1.0 obligation is to promote the non-normative defaults to MUST.
The reasoning for deferral:
- The v0.x draft window exists for the schema to move.
- Pinning ambiguous semantics before a second implementation exists is how specs end up with deprecation sections.
- Every spec that hard-coded resolution semantics in early drafts has had to walk them back when a second implementation surfaced reasonable disagreements.
The non-normative defaults in §6.3 give publishers a starting point without committing the spec to those exact defaults. When a second binding emerges (a database backend, a JSON-LD binding, etc.) and stress-tests the semantics, v1.0 will promote what survived to MUST.
The residual weakness
The (purpose, retrieval, treatment) triples defend why each of the seven is real. The dimension defends why the seven are non-overlapping. Neither derives the seven from first principles.
The spec can show “each type has a defensible triple” and “no two types occupy the same relation along the dimension.” It cannot show “every defensible triple corresponds to one of the seven types.”
This asymmetry is the residual weakness. The honest path forward is empirical-sufficiency with the operational addition gate, not philosophical claims of completeness. A credible derivation would require actual cognitive-science work on documentation-agent behavior — outside the scope of a v1.0 spec.
If such a derivation becomes available (from research, from a second corpus exposing real gaps, from a different organizing dimension that turns out to carve more cleanly), the spec is positioned to absorb it without losing what’s already working.
Why we dropped .amd
v0.2 registered .amd as a canonical extension for AgenticMD
documents. v0.4 reverses that decision. The reasoning:
The TSX/JSX analogy was wrong
The strongest v0.2-era argument for .amd borrowed the TSX/JSX
pattern: same substrate, different intent, dedicated extension. That
analogy is wrong. .tsx deserves its own extension because the
syntax actually differs — JSX adds non-JavaScript token sequences
that a standard JS parser cannot handle. AgenticMD adds no new syntax.
The frontmatter is YAML, the body is CommonMark, the section markers
{#id node:type} are valid CommonMark heading-attribute syntax. There
is nothing for a parser to do differently. A custom extension here
was signaling without substance.
The comparable-format pattern
| Project | Extension | Syntax differs from substrate? |
|---|---|---|
| OpenAPI | .yaml / .json | No — content-identified |
| JSON Schema | .json | No — content-identified |
| Pandoc Markdown | .md | Stricter rules, same syntax |
| GitHub Flavored Markdown | .md | Extended rules, same syntax |
| MyST | .md | Stricter profile, same syntax |
| MDX | .mdx | Yes — embeds JSX |
| AsciiDoc | .adoc | Yes — different markup |
| reStructuredText | .rst | Yes — different markup |
The pattern: dedicated extension when the syntax actually differs; same extension when only the discipline differs. Pandoc Markdown is in the Pandoc category. MDX is in the MDX category. AgenticMD belongs with Pandoc.
The free-tooling argument is decisive
The whole reason markdown is the chosen substrate is universal tool
support. Every renderer (GitHub, GitLab, Bitbucket), IDE (VS Code,
JetBrains, Vim), code-review tool, syntax highlighter, spell-checker,
formatter, prettier plugin, vale linter, and LLM context window
already handles .md. None handle .amd without per-tool
configuration.
This is the same trust-preservation argument that justified the markdown substrate in SPEC §1.4: a human reviewer should be able to inspect what an agent wrote without an intermediary tool. A custom extension violates that property — the reviewer’s editor now needs configuration to render the file. We chose markdown to keep the substrate free; inventing a new extension surrenders the win for the signaling benefit, which is small.
Discrimination happens at the corpus level
The mixed-content-directory argument for .amd assumed file-level
discrimination was required. It isn’t. Two cleaner mechanisms exist:
- Corpus-level declaration (SPEC §1.5.1): a directory contains an
AgenticMD corpus iff it (or some directory reachable from it) holds
a file with
node_type: corpus_root. Within scope, all.mdfiles follow the discipline. This mirrors how.editorconfigandtsconfig.jsondiscover their scopes. - Content-based identification: a file with
node_type: topic_rootfrontmatter plus typed##markers has a signature that any tool willing to read 50 lines of YAML can identify. The validator already does this.
In neither case is a custom extension load-bearing.
The reference corpus already told us
The strongest single signal: the reference corpus never migrated to
.amd in the first place. AI Studio’s docs/book-ai/ directory is
and always has been .md. After v0.2 was published, the reference
adopter — the one entity with the most reason to follow the spec
exactly — quietly chose not to follow it on the extension question.
v0.4 is not reversing a decision adopters made; it is reversing a decision adopters quietly ignored. The spec is catching up to its own reference corpus.
Why this was hard to see in v0.2
v0.2 was still operating under the “AgenticMD is a file format” framing. v0.3 reframed it as “a retrieval model with a markdown binding,” which already implied that the substrate is incidental and the discipline is the contribution. The extension question wasn’t on the table during the v0.3 reframe, so the v0.2 decision persisted through v0.3 unexamined. v0.4 is the catch-up.
The trajectory is convergent: v0.2 → v0.3 sharpened the framing; v0.4 drops the extension that the new framing already obsoleted. Each correction makes the next one visible. This is the v0.x draft window working as intended.
Substrate independence
SPEC framing positions AgenticMD as a retrieval model with a CommonMark + YAML binding. The substrate is chosen because every renderer already supports it; the same retrieval model could be expressed in other bindings:
- Database table — typed sections as rows, frontmatter as columns, references as foreign keys
- JSON-LD — typed sections as nodes, references as IRIs
- Structured columnar store — same idea, different indexing
None of these have been built. The markdown binding is the one v0.x ships. A second binding is on the v1.0+ list partly because it would stress-test substrate-independence and partly because it would give the resolution-semantics question (§6.3) a second voice to disagree with.
The substrate is incidental, not load-bearing. The contribution is the retrieval model.
See also
SPEC.md— the normative specification (what conformance requires)CHANGELOG.md— what changed in each versionREADME.md— project orientation