Your Voice File Shouldn't Be a Dead Markdown File

There's a thesis going around that's basically correct: your voice is your last competitive moat. We've hit peak AI homogenization. Every LinkedIn post sounds the same. Every code review reads alike. Every proposal leans on the same three hedging phrases. When everyone runs the same frontier model on default settings, professional output flattens into one generic voice.

The fix people are sharing: a voice file. Run yourself through a ~100-question interview, then compress the transcript into a ~4k-token markdown file that encodes how you think, how you communicate, your domain expertise and battle scars, and your contrarian takes and frameworks. Build it once, drop it into any AI tool, sound like yourself again.

The instinct is right. The format is already obsolete.

The static markdown voice file is a v0

Be fair to it first: a markdown voice file is simple, it's zero-infra, it works offline, and it lives in one portable file you can hand to a ghostwriter or version in Obsidian. That's a real starting point and a lot of people have built one. Keep it if it's working for you.

But once you actually use it for a while, four cracks show:

It goes stale. Your voice changes. To update it you re-run the interview and re-compress. So most people never do, and the file slowly drifts from who they are now.
It can't be queried. You paste the entire 4k-token blob into every prompt — your contrarian takes on databases when you're writing about hiring, your communication style facets when you need a war story. One undifferentiated lump, every time.
It has no history. A flat file can't tell you how your thinking evolved, when you recorded a take, or that you softened on something last quarter. It's a snapshot with no provenance.
It's a single blob. It's portable as one file, but it isn't shared — there's no clean way for a team to build and recall a collective voice from it.

The idea — a durable identity layer you bring to any model — is exactly right. The blob is just the wrong data structure for it.

prime makes your voice live

prime is the durable, queryable identity layer the voice file always wanted to be. Instead of a flat file, each interview answer becomes one embedded voice node in a durable store: a thinking pattern, a communication-style rule, a piece of domain expertise, a contrarian take, a framework. You record facets once — and from then on you don't paste your voice, you recall the slice that matters for the task in front of you.

That store is durable the same way the rest of AllSource is: every facet you record is an immutable event in AllSource Core (write-ahead log with CRC32 checksums and fsync, snapshotted to Snappy-compressed Parquet). Your voice survives restarts and lives on your own disk. It is never a row in Postgres and never an in-memory cache that vanishes — it's an event log you own.
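To make the durability claim concrete, here is a minimal sketch of the general technique. It uses an append-only log where each record carries a CRC32 checksum and is fsynced before the write returns, and replay stops at the first record that fails its checksum. It is not AllSource Core's on-disk format, which this post doesn't specify. `VoiceNode`, `record_facet`, the length-plus-CRC record layout, and the `communication_style` type string are assumptions for illustration. Parquet snapshotting is left out.

```python
import json
import os
import struct
import tempfile
import zlib
from dataclasses import asdict, dataclass

# Facet kinds from the text; the exact strings (especially communication_style) are assumed.
FACET_TYPES = {"thinking_pattern", "communication_style", "domain_expertise",
               "contrarian_take", "strategic_framework"}

# Hypothetical record header: payload length, then CRC32 of the payload (little-endian u32s).
HEADER = struct.Struct("<II")


@dataclass(frozen=True)
class VoiceNode:
    """One interview answer, stored as its own facet rather than as part of a blob."""
    node_id: str
    facet_type: str
    title: str
    body: str


def append_event(path: str, event: dict) -> None:
    """Append one immutable event and fsync it before returning."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    with open(path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())  # the event is on disk before the caller moves on


def record_facet(path: str, node: VoiceNode) -> None:
    """Validate the facet type, then log a creation event for the node."""
    if node.facet_type not in FACET_TYPES:
        raise ValueError(f"unknown facet type: {node.facet_type}")
    append_event(path, {"type": "prime.node.created", **asdict(node)})


def read_events(path: str) -> list:
    """Replay the log, stopping at the first truncated or checksum-failing record."""
    with open(path, "rb") as f:
        data = f.read()
    events, offset = [], 0
    while offset + HEADER.size <= len(data):
        length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn write or corruption: nothing after this point is trusted
        events.append(json.loads(payload))
        offset = start + length
    return events


log_path = os.path.join(tempfile.mkdtemp(), "voice.wal")
record_facet(log_path, VoiceNode(
    node_id="n-001",
    facet_type="contrarian_take",
    title="Add a database is usually the wrong fix",
    body="Replication of the system you already run, not bolting on a new stateful component.",
))
with open(log_path, "ab") as f:
    f.write(b"\x10\x00\x00\x00junk")  # simulate a torn write at the tail of the log
print(read_events(log_path))
```

The checksum is what lets replay distinguish a half-written tail from a real event. The corrupt bytes appended at the end are skipped, and the recorded facet survives.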

The magic moment: recall the right slice by meaning

Here's the working hero feature, and it's the whole point. We seeded a real voice file — 12 facets — into the live allsource-prime binary (v0.21.6) and asked it to recall for this writing task:

"Write a short LinkedIn post arguing that when your system's availability hurts, adding another database is usually the wrong fix."

prime_recall returned, ranked by meaning, exactly the facets that matter for this task — and nothing else:

recalled facet	type	score
"Add a database is usually the wrong fix"	contrarian_take	0.757
"Distributed systems durability"	domain_expertise	0.688
"Event sourcing battle scars"	domain_expertise	0.619
"Make the implicit explicit"	strategic_framework	0.587
"Invert the question"	thinking_pattern	0.573

The query never said "contrarian take" or "replication," yet recall still surfaced the user's own contrarian thesis as the top hit at 0.757, followed by their durability expertise and a relevant framework. That's what a flat blob can't do: recall returns your voice for this task, not your whole voice every time.

(Numbers here trace to the captured demo in tooling/voice-demo/RESULTS.md; nothing is hand-written.)
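Ranking "by meaning" is nearest-neighbor search over embeddings. The sketch below ranks facets by cosine similarity and keeps the top k. That prime uses cosine similarity specifically is an assumption. The three-dimensional vectors are toys standing in for 384-dim embeddings, and the "Hiring take" facet is hypothetical. The demo's scores came from prime's real embedder, not from this code.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 for the same direction, 0.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recall(query_vec, facets, k=5):
    """Score every facet against the query and keep the top k, best first."""
    scored = [(cosine(query_vec, vec), title) for title, vec in facets.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors; real embeddings are 384-dimensional.
facets = {
    "Add a database is usually the wrong fix": [0.9, 0.1, 0.0],
    "Invert the question": [0.3, 0.3, 0.9],
    "Hiring take (hypothetical)": [0.0, 1.0, 0.0],
}
query = [1.0, 0.0, 0.1]
for score, title in recall(query, facets, k=2):
    print(f"{score:.3f}  {title}")
```

Because each facet is scored independently, an unrelated facet simply ranks low and falls outside the top k. That is how recall avoids pasting your whole voice into every prompt.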

The proof: voice-ON vs voice-OFF, same prompt

We then had an agent write the post twice, same prompt — once with that recalled slice injected, once generic. The difference is the product.

voice-OFF (no recall — generic):

When your system's availability starts to suffer, it can be tempting to reach for another database to help share the load. However, adding a new database isn't always the best solution... Here are a few things to keep in mind: 🚀 Complexity ... 🔄 Consistency ... 💸 Cost ... What's your experience been? Would love to hear your thoughts in the comments below. 👇

Hedged, list-y, emoji-laden, and hyped: the default LLM register everyone now recognizes.

voice-ON (recalled facets injected):

Adding a database is usually the wrong fix.

When availability hurts, the reflex is to bolt on another stateful component — a replica database, a cache, a queue. It feels like progress. It isn't. You just added a second consistency surface to keep in sync and a second thing to page you at 3am.

...I watched a team add a read-replica datastore to fix p99 latency. Six weeks later the outage wasn't the original database — it was the replication lag between the two. They'd traded one failure mode for a worse one.

Punchline first. A concrete war story. No emoji, no hype. The recalled contrarian thesis carried through verbatim: "replication of the system you already run," not "bolting on a new stateful component." That difference comes entirely from the recalled voice slice, the only variable between the two completions.
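The voice-ON/voice-OFF comparison holds everything fixed except the injected slice. Here is a minimal sketch of how such a prompt might be assembled. `build_prompt` and its wording are hypothetical, since the post doesn't show how the agent formats injected facets. The task string and the facet tuples come from the demo above.

```python
def build_prompt(task, recalled):
    """Prefix a writing task with recalled facets; with nothing recalled, return it bare."""
    if not recalled:
        return task  # voice-OFF: the generic baseline
    lines = ["Write in the author's own voice. Relevant facets, most relevant first:"]
    # Order by recall score; the score decides order but isn't shown to the model.
    for title, facet_type, _score in sorted(recalled, key=lambda r: r[2], reverse=True):
        lines.append(f"- [{facet_type}] {title}")
    lines += ["", f"Task: {task}"]
    return "\n".join(lines)


task = ("Write a short LinkedIn post arguing that when your system's availability "
        "hurts, adding another database is usually the wrong fix.")
recalled = [
    ("Invert the question", "thinking_pattern", 0.573),
    ("Add a database is usually the wrong fix", "contrarian_take", 0.757),
]
voice_off = build_prompt(task, [])
voice_on = build_prompt(task, recalled)
print(voice_on)
```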

Your voice has history

A markdown blob has no provenance. A prime voice file time-travels: prime_history on any facet shows when it was recorded and every change since. In the demo, querying one facet returns its prime.node.created event with a real timestamp. You can see how your voice evolved — and prove it.
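History falls out of event sourcing: a facet's state at any moment is the result of replaying its events up to that moment. This sketch assumes a hypothetical event shape (`node_id`, `at`, `body`). The demo only shows a `prime.node.created` event, so the `prime.node.updated` event, its timestamps, and both versions of the take are illustrative.

```python
from datetime import datetime


def facet_history(events, node_id):
    """Every event for one facet, oldest first."""
    mine = [e for e in events if e["node_id"] == node_id]
    return sorted(mine, key=lambda e: datetime.fromisoformat(e["at"]))


def facet_as_of(events, node_id, when):
    """Replay one facet's events up to `when` and return its text at that moment."""
    text = None  # the facet did not exist yet
    for e in facet_history(events, node_id):
        if datetime.fromisoformat(e["at"]) > when:
            break
        text = e["body"]
    return text


events = [
    {"type": "prime.node.created", "node_id": "n-001",
     "at": "2025-01-10T09:00:00+00:00", "body": "Never add a second database."},
    # Hypothetical later edit: the author softened the take.
    {"type": "prime.node.updated", "node_id": "n-001",
     "at": "2025-04-02T14:30:00+00:00", "body": "Adding a database is usually the wrong fix."},
]
print(facet_as_of(events, "n-001", datetime.fromisoformat("2025-02-01T00:00:00+00:00")))
```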

Portable across every AI tool

prime keeps the portability promise the trend makes ("one file, works across Claude, GPT, Gemini") and makes it native. Your voice file is one --data-dir. The prime_* tools speak MCP, an open protocol, rather than anything Claude-specific, so any MCP-capable agent can call them: Claude Code, Claude Desktop, Cursor, Cline, Windsurf, Codex, Gemini. Point each tool's MCP config at the same data dir and your voice follows you with no copy-paste:

{
  "mcpServers": {
    "prime": {
      "command": "allsource-prime",
      "args": ["--data-dir", "~/.prime/voice"]
    }
  }
}

Same store, spoken by every agent. You version your voice like code, and it deploys everywhere — without re-pasting a 4k-token blob into each tool.
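Since the entry is identical for every client, a small script can merge it into each tool's MCP config without clobbering other servers. This sketch uses hypothetical helpers and expands `~` to an absolute path. MCP clients generally spawn servers directly rather than through a shell, so whether a literal `~` gets expanded is up to the server, and an absolute path sidesteps the question. Where each tool keeps its config file varies and isn't covered here.

```python
import json
import os


def prime_server_entry(data_dir):
    """The allsource-prime entry from the config above, with the data dir made absolute."""
    return {
        "command": "allsource-prime",
        "args": ["--data-dir", os.path.abspath(os.path.expanduser(data_dir))],
    }


def merge_into_config(config, data_dir):
    """Return a copy of an MCP config with a "prime" server added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input stays untouched
    servers["prime"] = prime_server_entry(data_dir)
    merged["mcpServers"] = servers
    return merged


# A hypothetical existing config that already lists another server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
merged = merge_into_config(existing, "~/.prime/voice")
print(json.dumps(merged, indent=2))
```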

Team voice: a shared identity layer

Individual voice is the start. The same mechanism scales to a team's collective voice — shared ADR style, code-review culture, conventions, the contrarian takes your team actually defends. Add the hosted-sync flags and your facets ship to your team's tenant:

{
  "mcpServers": {
    "prime": {
      "command": "allsource-prime",
      "args": ["--data-dir", "~/.prime/voice", "--sync-to", "https://api.all-source.xyz"]
    }
  }
}

Local-only stays the free default; hosted/team is a pull, not a gate. Auth terminates at the Control Plane, and Core never authenticates public traffic. Prefer setting PRIME_API_KEY in the environment over passing the key inline, so it never lands in a committed config.
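One way to honor the "key in the environment, not the config" rule is to have whatever launches the server refuse to enable sync unless PRIME_API_KEY is already set. This is a hypothetical launcher-side helper. It assumes, as the paragraph above implies, that allsource-prime reads PRIME_API_KEY from its own environment. How that variable reaches the server process depends on your MCP client.

```python
import os


def sync_args(data_dir, sync_url, env=None):
    """Build server args for hosted sync, keeping the API key out of argv and config."""
    env = os.environ if env is None else env
    if not env.get("PRIME_API_KEY"):
        # Fail early instead of falling back to an inline key in a committed file.
        raise RuntimeError("export PRIME_API_KEY before enabling --sync-to")
    return ["--data-dir", data_dir, "--sync-to", sync_url]


args = sync_args("~/.prime/voice", "https://api.all-source.xyz",
                 env={"PRIME_API_KEY": "example-key"})
print(args)
```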

Build yours in 60 seconds

prime ships as a skill and a command inside the mammoth plugin; the voice file is the identity layer of the same durable memory substrate mammoth gives your agent. Install once (cargo runs in your shell; the /plugin commands run inside Claude Code):

cargo install allsource-prime
/plugin marketplace add all-source-os/all-source
/plugin install mammoth

Then run the interview:

/voice run

Pick a short pass (~20 questions, four per facet) or the full ~100. The agent records each answer as an embedded voice node as you go — no separate "compress the transcript" step. After that, the skill recalls your relevant voice slice automatically before it drafts any first-person writing. You don't paste your voice. It just shows up.
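With the five facet types named earlier, "four per facet" works out to 5 × 4 = 20 questions for the short pass. The sketch below interleaves facets and records each answer the moment it arrives. The real /voice skill's question order isn't documented here, and `ask` and `record` are hypothetical stand-ins for the agent's question and for prime_add_node + prime_embed.

```python
# Facet type strings are assumed, matching the demo where the demo shows them.
FACET_TYPES = ["thinking_pattern", "communication_style", "domain_expertise",
               "contrarian_take", "strategic_framework"]


def short_pass(per_facet=4):
    """Interleave facets so each round of five questions covers every facet once."""
    return [(facet, i) for i in range(per_facet) for facet in FACET_TYPES]


def run_interview(schedule, ask, record):
    """Ask each question and record its answer immediately; no compress step at the end."""
    for facet, i in schedule:
        record(facet, ask(facet, i))


recorded = []
run_interview(
    short_pass(),
    ask=lambda facet, i: f"answer {i} about {facet}",               # stand-in for you
    record=lambda facet, answer: recorded.append((facet, answer)),  # stand-in for prime
)
print(len(recorded), "facets recorded")
```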

It composes with the rest of the family: caveman compresses what your agent says; mammoth remembers what it learned; the voice file is how you say it.

caveman make few token. mammoth never forget token. voice file make token sound like you.

Static markdown vs prime voice (fair comparison)

	static markdown voice file	prime voice (recall)
Simplicity / zero-infra	✅ one file, nothing to run	⚠️ runs a local MCP server
Works offline, today	✅ yes	⚠️ after a one-time embedder warm (`allsource-prime --mode warm`, or vendor the model via `PRIME_EMBED_MODEL_DIR`); a fresh install's first embed fetches the model, after which record/recall run fully offline
Stays current	❌ re-run interview + recompress	✅ append one facet
Query by relevance	❌ paste the whole 4k-token blob	✅ `prime_recall` returns only the relevant slice (top hit 0.757 in the demo)
History / provenance	❌ none	✅ `prime_history` time-travels every facet
Portable across MCP agents	⚠️ re-paste the file per tool	✅ one `--data-dir`, every MCP agent natively
Team sharing	❌ hand a file around	✅ hosted sync (`--sync-to`) — shared team voice
Auto-compressed one-file export	✅ that's all it is	✅ generated on demand (12 nodes / 5 domains / 77 tokens in the demo)

prime wins on durability, queryability, history, portability, and team sharing, and it wins without a strawman. Markdown genuinely wins on simplicity and on being a single zero-infra file; prime's auto-compressed export works too, generated live on demand (see honest limits below).
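A compressed one-file export can be regenerated from the nodes at any time rather than maintained by hand. This sketch assumes a "domain" is a facet type and renders one line per domain. prime_index's actual output format isn't shown in this post. The word count printed at the end is only a rough proxy, since tokenizers count differently.

```python
from collections import defaultdict


def compressed_index(nodes):
    """One line per domain, titles joined with ' | ', domains in sorted order."""
    by_domain = defaultdict(list)
    for domain, title in nodes:
        by_domain[domain].append(title)
    return "\n".join(f"{domain}: " + " | ".join(titles)
                     for domain, titles in sorted(by_domain.items()))


# The five facets from the recall demo, keyed by facet type.
nodes = [
    ("contrarian_take", "Add a database is usually the wrong fix"),
    ("domain_expertise", "Distributed systems durability"),
    ("domain_expertise", "Event sourcing battle scars"),
    ("strategic_framework", "Make the implicit explicit"),
    ("thinking_pattern", "Invert the question"),
]
index = compressed_index(nodes)
print(index)
print("rough size:", len(index.split()), "whitespace-separated words")
```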

Honest limits

We treat over-claiming as the cardinal sin for an identity product, because the entire point of a voice file is trust. So, plainly:

What works today, proven against the real binary (allsource-prime 0.21.6, now on crates.io): recording facets (prime_add_node + prime_embed), prime_recall (the relevant-slice recall — the hero feature, demo top hit 0.757), the auto-generated compressed voice index (prime_index returns the live, populated index — 12 nodes, 5 domains, 77 tokens — so the one-file export and --auto-inject work too), prime_stats, and prime_history. The voice flow runs end to end on these.
The first prime_embed on a fresh install needs the network once. The embedder model (all-MiniLM-L6-v2, 384 dims) is fetched from Hugging Face on first use, so the very first embed fails behind a proxy or firewall, or offline, until the model is cached. Make it offline-clean by warming once (allsource-prime --mode warm) or vendoring the model (PRIME_EMBED_MODEL_DIR), or skip the bundled embedder entirely by passing your own 384-dim vector to prime_embed / prime_recall. Tracked in #200.
prime_context's tiered recall works across all tiers: L0 is stats-only (it carries no vectors by design; that's the tier's job, not a gap), L1 carries recent-conversation context, and L2 now returns full hybrid recall, meaning the compressed index plus vector hits plus graph nodes (an L2 query returns the populated index + 4 vectors + 3 graph nodes, top node score 0.73). The earlier vector-arm gap is fixed (commit 5083017). There is no remaining prime_context or prime_index TODO.
The real, honest limit: recall quality scales with how many facets you've recorded. A voice file with a handful of answers recalls a thinner slice than one with the full ~100-question pass — the index and the ranking are only as rich as what you've put in. Record more facets, recall more of your voice.
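For the bring-your-own-vector route in the list above, a minimal pre-flight check might look like this. The 384-dim requirement comes from the text. L2-normalizing before sending is an assumption: it doesn't change cosine rankings, but whether prime expects normalized input isn't stated. `prepare_vector` is a hypothetical helper.

```python
import math

EMBED_DIM = 384  # all-MiniLM-L6-v2's output size, per the text


def prepare_vector(vec):
    """Check a caller-supplied embedding is 384 finite floats, then L2-normalize it."""
    if len(vec) != EMBED_DIM:
        raise ValueError(f"expected {EMBED_DIM} dims, got {len(vec)}")
    if not all(math.isfinite(x) for x in vec):
        raise ValueError("vector contains NaN or infinity")
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("a zero vector has no direction to compare")
    return [x / norm for x in vec]


vec = prepare_vector([0.5] * EMBED_DIM)
print(len(vec), round(sum(x * x for x in vec), 6))
```

One design caution: vectors from a different embedding model live in a different space even at the same dimension. If you mix them with all-MiniLM-L6-v2 vectors in one store, similarity scores between the two sets stop meaning anything.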

The shift

In 2026 everyone has the same frontier models. The differentiator is the identity layer you bring — and that layer shouldn't be a file that goes stale the moment you save it. Make it live. Make it queryable. Make it yours, with a history, portable across every tool, shareable across your team.

What would your 100 questions reveal about how you actually think — and wouldn't you rather they stayed current?

cargo install allsource-prime
/plugin marketplace add all-source-os/all-source
/plugin install mammoth
/voice run

Your voice file shouldn't be a dead markdown file. Make it live.