AllSource Prime queries agent memory in 12 microseconds. Not milliseconds. Microseconds.
For context: zer0dex's vector search takes 70ms. Mem0 and Letta are in the hundreds of milliseconds. A typical HTTP round-trip is 50-200ms.
12μs means memory lookup is invisible — faster than the time it takes to print the response. Here's how.
The Architecture
```
Query → DashMap projection → O(1) lookup → result
(no serialization, no network, no disk)
```
Every Prime projection is a DashMap — Rust's concurrent hash map with sharded locking. When you call prime_neighbors("node:person:alice"), it's a hash table lookup. No SQL parsing, no query planning, no disk I/O.
Why DashMap?
| Data Structure | Read Latency | Concurrent Reads | Write Contention |
|---|---|---|---|
| HashMap + RwLock | ~50ns | Readers block during writes | High (one global lock) |
| DashMap | ~100ns | Contend only within a shard | Per-shard only |
| SQLite | ~50μs | Single writer | Full table lock |
| PostgreSQL | ~500μs | MVCC | Row-level |
DashMap splits its keyspace across independently locked shards (by default, roughly four times the number of CPU cores, rounded up to a power of two). A read only contends when the exact shard its key hashes to is being written. In practice, with graph data spread across thousands of entity IDs, contention is near zero.
The Projection Stack
```
Event arrives → WAL (durable) → Broadcast to projections

NodeState projection:     entity_id → {properties, timestamps}
AdjacencyList projection: entity_id → [(peer, edge_id, relation)]
VectorIndex projection:   entity_id → [f32; N] + HNSW index
DomainIndex projection:   domain → [entity_ids]
CrossDomain projection:   (domain_a, domain_b) → [edge_ids]
```
Each projection processes the event independently. The WAL provides durability. The DashMap provides speed. Projections are rebuilt from WAL on startup (with optional checkpoint acceleration).
Vector Search: HNSW at Memory Speed
For semantic queries, we use an HNSW (Hierarchical Navigable Small World) index via instant-distance. The index lives in memory alongside the DashMap projections.
| Operation | Latency |
|---|---|
| Embed (store vector) | ~1μs (DashMap insert) |
| Similar (k-NN search) | ~100μs for k=10 on 10K vectors |
| Recall (vector + graph) | ~200μs (search + 1-hop expansion) |
The HNSW index is rebuilt lazily on first query after mutations. For read-heavy workloads (the common case for agent memory), this means near-zero overhead.
Ingest: 469K Events/Second
```
Event → CRC32 checksum → WAL append → fsync batch → DashMap insert → Projection broadcast
```
The WAL batches fsyncs every 100ms (configurable). This means a power failure can lose at most 100ms of events — a trade-off between durability and throughput.
Parquet compaction runs in the background, converting WAL segments to columnar storage with Snappy compression. This provides 60-80% space savings without affecting query latency.
Why This Matters for Agents
Agent memory queries happen in the hot path — between user input and agent response. Every millisecond of memory lookup adds to perceived latency:
| Memory System | Query Latency | Impact on Response |
|---|---|---|
| AllSource Prime | 12μs | Invisible |
| zer0dex (ChromaDB) | 70ms | Barely noticeable |
| Mem0 (cloud API) | 200-500ms | Noticeable pause |
| Letta (LLM-managed) | 500ms+ | Visible delay |
At 12μs, you can query memory on every single message without the user ever noticing. That's the difference between "memory is a tool the agent calls" and "memory is always there."
Try It
```sh
cargo install allsource-prime
allsource-prime --data-dir ~/.prime/memory --mode http --port 3905

# Benchmark it yourself
curl -w "time_total: %{time_total}s\n" http://localhost:3905/api/v1/prime/stats
```
