brain.db is the personal knowledge graph behind the
1n2.org universe — a 2.84 GB SQLite store holding the
MadBitcoins, World Crypto Network and Bitcoin Group
transcripts going back to spring 2013, Thomas Hunt's full Twitter
archive plus live scrapes of the WCN / Tone Vays / Ben Arc / Josh Scigala
handles, every published Curio Wiki article, the daily reports, and
the canonical entity database that stitches it all together.
brain.db is Thomas Hunt's working archive — thirteen
years of show transcripts (MadBitcoins, WCN, The Bitcoin Group),
the Twitter archive export plus the freshly-scraped feeds of
@WorldCryptoNet,
@ToneVays,
@arcbtc, and
@JScigala,
the 140 Curio Wiki articles, the published reports, Google News digests,
Plex viewing history, and the canonical entity database
(310 profiles as of this build)
that links every name to every mention.
Everything — raw row, tag, link, embedding — lives in one
SQLite file at ~/brain/brain.db. The public surfaces
(/brain/, /brain/ask/,
the entity profiles, the predictions verifier, the reports) are all
rendered windows onto this one store.
Every value below is the output of a live SQL query against
~/brain/brain.db at build time. Snapshot
2026-05-30T16:34Z.
items · tags · links · embeddings · items_fts · /entities/
The store is a single SQLite database file running in
WAL (Write-Ahead Logging) journal mode so that the
hourly capture writers (voice memos, Twitter monitor, Plex sync, news
fetchers) and the readers (the public site generator, the
/brain/ask/ RAG endpoint, the entity-summary pipeline)
don't block each other. Schema is denormalized on purpose: every
captured thing — a tweet, a transcript segment, a screenshot, a
news headline, a voice memo — lands as one row in
items, tagged with a source string that
identifies which collector wrote it.
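As a minimal sketch of why WAL matters here, the following opens a throwaway database file in WAL mode and shows a reader getting an answer while a writer still holds an uncommitted transaction. The file name, the trimmed-down items columns, and the 'voice_memo' source string are illustrative assumptions, not the real schema.

```python
import os
import sqlite3
import tempfile


def count_items(conn):
    """Return the number of rows a connection can currently see."""
    cur = conn.execute("SELECT count(*) FROM items")
    n = cur.fetchone()[0]
    cur.close()
    return n


def demo_wal_read_during_write():
    """Show that a reader is not blocked by an open write transaction in WAL mode."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # throwaway file, not ~/brain/brain.db

        # isolation_level=None lets us issue BEGIN / COMMIT explicitly.
        writer = sqlite3.connect(path, isolation_level=None)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        writer.execute(
            "CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, raw_text TEXT)"
        )

        reader = sqlite3.connect(path, timeout=1.0)

        # A collector opens a write transaction and leaves it uncommitted.
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO items VALUES ('a1', 'voice_memo', 'hello')")

        # The reader still gets an answer: the last committed snapshot.
        during = count_items(reader)

        writer.execute("COMMIT")
        after = count_items(reader)

        reader.close()
        writer.close()
        return mode, during, after
```

The reader sees the pre-write snapshot during the transaction and the new row after the commit. Note that WAL still allows only one writer at a time, so concurrent collectors queue behind each other rather than behind readers.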
Dedup is enforced by a partial UNIQUE index on
(source, source_id), so collectors are free to
re-run as often as they like: if the upstream ID has already been
seen, the index rejects the row and, with a conflict-tolerant write
such as INSERT OR IGNORE, the insert is silently skipped:
CREATE UNIQUE INDEX idx_items_src_sid
ON items(source, source_id)
WHERE source_id IS NOT NULL;
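On its own the index raises a constraint error for a duplicate, so a silent skip needs a conflict-tolerant insert. The sketch below pairs the index with INSERT OR IGNORE; the capture() helper, the trimmed column list, and the sample source strings are hypothetical, and the real collectors may handle conflicts differently.

```python
import sqlite3


def open_store():
    """Create an in-memory stand-in for the items table plus the dedup index."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (
            id        TEXT PRIMARY KEY,
            source    TEXT NOT NULL,
            source_id TEXT,
            raw_text  TEXT
        );
        CREATE UNIQUE INDEX idx_items_src_sid
            ON items(source, source_id)
            WHERE source_id IS NOT NULL;
    """)
    return conn


def capture(conn, item_id, source, source_id, raw_text):
    """Insert one item; return True if a row was written, False if it was skipped."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO items (id, source, source_id, raw_text) "
        "VALUES (?, ?, ?, ?)",
        (item_id, source, source_id, raw_text),
    )
    return cur.rowcount == 1


conn = open_store()
first = capture(conn, "id-1", "twitter", "12345", "first scrape")
again = capture(conn, "id-2", "twitter", "12345", "re-run of the same tweet")
# Rows without an upstream ID fall outside the partial index and are all kept.
memo1 = capture(conn, "id-3", "voice_memo", None, "memo one")
memo2 = capture(conn, "id-4", "voice_memo", None, "memo two")
total = conn.execute("SELECT count(*) FROM items").fetchone()[0]
```

The second twitter insert is dropped because its (source, source_id) pair already exists, while both memos land because their source_id is NULL.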
items. The core table: id (sha1), source,
source_id, file_path, ISO8601
created_at + captured_at,
raw_text, summary,
details_json, status (captured /
linked / refined), tier (working / cold).
561,095 rows.
tags. Many-to-one join from item_id to tag
string. Drives the topical facets in /brain/'s
sidebar — #bitcoin, #twitter-algo, etc.
496,262 edges.
links. Item-to-item edges with a kind (similar,
quote, reply, …) and a numeric
score. Built nightly by the graphify pass.
2.9 million edges.
embeddings. One float vector per item, used by /brain/ask/
for retrieval. 587,817 vectors covering ~95% of the writable
surface.
items_fts. FTS5 virtual table over raw_text +
summary. Built contentless
(content='') for write throughput, so MATCH returns
id only — the readable text has to be joined
back from items.
Per-collector run state, the queue of items that still need LLM summarization or entity-tagging, and a slot for facts that contradict other facts — the substrate for future reconciliation passes.
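To make the tag and link tables above concrete, here is a small sketch of how the facet counts and the item-to-item edges could be queried. Only the table names and their roles come from the text; the column names (src_id, dst_id), the types, and the sample rows are assumptions for illustration.

```python
import sqlite3

# Column names and types are assumptions; only the table roles come from the text.
SCHEMA = """
CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, raw_text TEXT);
CREATE TABLE tags  (item_id TEXT NOT NULL, tag TEXT NOT NULL,
                    PRIMARY KEY (item_id, tag));
CREATE TABLE links (src_id TEXT NOT NULL, dst_id TEXT NOT NULL,
                    kind TEXT NOT NULL, score REAL NOT NULL);
"""


def build_demo():
    """Build a tiny in-memory store with three items, three tags, three edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
        ("a", "twitter", "tweet about bitcoin"),
        ("b", "the_bitcoin_group", "panel segment about bitcoin"),
        ("c", "twitter", "tweet about the timeline algorithm"),
    ])
    conn.executemany("INSERT INTO tags VALUES (?, ?)", [
        ("a", "#bitcoin"), ("b", "#bitcoin"), ("c", "#twitter-algo"),
    ])
    conn.executemany("INSERT INTO links VALUES (?, ?, ?, ?)", [
        ("a", "b", "similar", 0.91),
        ("a", "c", "similar", 0.40),
        ("c", "a", "reply", 1.0),
    ])
    return conn


def tag_facets(conn, limit=10):
    """Tag counts, largest first: the kind of data a facet sidebar needs."""
    return conn.execute(
        "SELECT tag, count(*) AS n FROM tags "
        "GROUP BY tag ORDER BY n DESC, tag LIMIT ?",
        (limit,),
    ).fetchall()


def neighbors(conn, item_id, kind, limit=5):
    """Outgoing edges of one kind from an item, strongest score first."""
    return conn.execute(
        "SELECT dst_id, score FROM links "
        "WHERE src_id = ? AND kind = ? ORDER BY score DESC LIMIT ?",
        (item_id, kind, limit),
    ).fetchall()
```

Calling tag_facets(build_demo()) yields one row per tag with its count, and neighbors(conn, "a", "similar") walks the edges of one kind out of a single item.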
Because items_fts is contentless, you can't just
SELECT * FROM items_fts WHERE items_fts MATCH '?' and
read the snippet directly: FTS5 doesn't store the body, so the
text columns come back NULL. The public /brain/ask/ endpoint uses
the FTS index only to rank candidate IDs and reads snippet text
from items.raw_text with a LIKE-based search. If you ever query the
store yourself, do the same: FTS to narrow, items
to read.
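The "FTS to narrow, items to read" pattern can be sketched as follows. A contentless FTS5 table exposes only its rowid, so this example assumes the index is populated with the same rowid as the matching items row; that mapping, the trimmed schema, and the sample rows are assumptions rather than the store's confirmed layout. It also needs a Python build whose SQLite includes FTS5.

```python
import sqlite3


def build_demo():
    """Create items plus a contentless FTS5 index keyed by the items rowid."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT,
                            raw_text TEXT, summary TEXT);
        CREATE VIRTUAL TABLE items_fts
            USING fts5(raw_text, summary, content='');
    """)
    rows = [
        ("t1", "the_bitcoin_group",
         "Saylor says MicroStrategy will keep buying bitcoin", "treasury talk"),
        ("t2", "mad_bitcoins",
         "Saylor joined the panel to talk about mining", "panel intro"),
    ]
    for item_id, source, raw_text, summary in rows:
        cur = conn.execute("INSERT INTO items VALUES (?, ?, ?, ?)",
                           (item_id, source, raw_text, summary))
        # Assumption: the FTS rowid mirrors the items rowid.
        conn.execute(
            "INSERT INTO items_fts(rowid, raw_text, summary) VALUES (?, ?, ?)",
            (cur.lastrowid, raw_text, summary),
        )
    return conn


def search(conn, match_expr, limit=50):
    """Use FTS5 to rank candidate rows, then read the text from items."""
    return conn.execute(
        """
        SELECT i.id, i.source, substr(i.raw_text, 1, 220) AS snippet
        FROM items_fts
        JOIN items AS i ON i.rowid = items_fts.rowid
        WHERE items_fts MATCH ?
        ORDER BY rank
        LIMIT ?
        """,
        (match_expr, limit),
    ).fetchall()


conn = build_demo()
hits = search(conn, "NEAR(saylor microstrategy)")
# Reading a text column straight from the contentless index yields NULL.
body = conn.execute(
    "SELECT raw_text FROM items_fts WHERE items_fts MATCH 'saylor'"
).fetchone()[0]
```

Here hits contains only the row where both terms occur close together, and body is None, which is why the snippet has to come from items.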
Everything published at 1n2.org is a rendered window onto this one store. When a number changes here, those downstream surfaces change at their next build.
310 people, projects, organizations, shows, and Curio cards
— each one cross-referenced against items
transcript hits and tweet appearances to build the "mentioned
in" backlink list on its profile page.
Auto-expanded nightly by the wiki cron, which pulls quote
candidates from items_fts and writes
well-attributed paragraphs. Each article carries
[[entity-slug]] references that resolve against the
same entity store.
The 19-panelist accuracy chart joins each verified-or-falsified
call back to the TBG transcript segment that contains the original
quote. The correct/wrong/abstain totals come from a SQL
pass over items tagged
#prediction-verified.
The 86-guest swimlane chart draws its canonical names from the
entity store (which itself was built off the
tbg-mirrors/guests.json + Whisper-variant correction
log).
The site-wide search index ingests both the rendered HTML
pages and a curated public-safe slice of
items (no private capture, no health, no location,
no DMs), so a query like "Saylor 2014" can return a
MadBitcoins transcript hit alongside a wiki article.
The retrieval-augmented Q&A endpoint embeds the user's
question, runs an ANN search over the 587k vectors in
embeddings, joins back to items.raw_text
for the actual quote, and returns a cited answer with the
transcript or tweet ID as the source.
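The retrieval flow described for /brain/ask/ (embed, search the vectors, join back to items, cite the ID) can be sketched roughly like this. Several parts are stand-ins and should be read as assumptions: toy_embed is a hashed bag-of-words rather than a real embedding model, the search is exact brute-force cosine similarity rather than ANN, and the embeddings columns (item_id, vector as a float32 BLOB) are hypothetical.

```python
import math
import sqlite3
import struct
import zlib

DIM = 256  # toy dimensionality; the real vector size is not stated in the text


def toy_embed(text):
    """Stand-in embedding: hashed bag of words, L2-normalised. Not a real model."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def pack(vec):
    """Serialise a vector as a float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Deserialise a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, raw_text TEXT);
        CREATE TABLE embeddings (item_id TEXT PRIMARY KEY, vector BLOB);
    """)
    rows = [
        ("t1", "the_bitcoin_group", "saylor says bitcoin is digital property"),
        ("t2", "curio_wiki", "curio cards were minted in 2017"),
        ("t3", "plex", "plex viewing history sync"),
    ]
    for item_id, source, raw_text in rows:
        conn.execute("INSERT INTO items VALUES (?, ?, ?)",
                     (item_id, source, raw_text))
        conn.execute("INSERT INTO embeddings VALUES (?, ?)",
                     (item_id, pack(toy_embed(raw_text))))
    return conn


def ask(conn, question, k=3):
    """Rank items by cosine similarity to the question and return cited quotes."""
    q = toy_embed(question)
    scored = []
    for item_id, blob in conn.execute("SELECT item_id, vector FROM embeddings"):
        # Vectors are unit length, so the dot product is the cosine similarity.
        score = sum(a * b for a, b in zip(q, unpack(blob)))
        scored.append((score, item_id))
    scored.sort(reverse=True)

    results = []
    for score, item_id in scored[:k]:
        source, raw_text = conn.execute(
            "SELECT source, raw_text FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        results.append({"id": item_id, "source": source,
                        "quote": raw_text, "score": score})
    return results
```

A call such as ask(build_demo(), "what did saylor say about bitcoin") returns the nearest items with their IDs, which is what lets an answer cite a transcript or tweet as its source. At the scale of hundreds of thousands of vectors a linear scan like this would be slow, which is presumably why the real endpoint uses an approximate index.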
Where the data came from. Every artifact path below is on disk at the host writing this page — the dates are file mtimes, not narrative.
condition_on_previous_text=True re-pass that
improves cross-segment coherence.
items.
items. Runs via cron every 10 minutes.
items.
The verification rule is the same as in every published report on this site: a number is only allowed on the page if the SQL that produced it can be re-run, and a date is only allowed if the artifact exists at the path cited. If you find a number here that won't reproduce, that's a bug and worth filing.
Last 12 items written to items, by
captured_at. The hourly capture loop is dominated
right now by the WCN segment ingest from the live transcription
supervisor — that's why the source column reads
world_crypto_network over and over.
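A query of roughly this shape would produce such a "latest writes" list. It is a sketch: it assumes captured_at is stored as an ISO8601 string in a single consistent format and timezone (so that string order equals time order), and the demo rows are made up.

```python
import sqlite3
from collections import Counter


def latest_items(conn, n=12):
    """Return the n most recently captured items, newest first."""
    return conn.execute(
        """
        SELECT id, source, substr(captured_at, 1, 16) AS when_,
               substr(raw_text, 1, 80) AS snippet
        FROM items
        ORDER BY captured_at DESC
        LIMIT ?
        """,
        (n,),
    ).fetchall()


def source_mix(rows):
    """Count how many of the returned rows came from each collector."""
    return Counter(row[1] for row in rows)


def build_demo():
    """Fifteen made-up rows, one per minute, all from the same collector."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, "
        "captured_at TEXT, raw_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [(f"seg-{i:02d}", "world_crypto_network",
          f"2026-05-30T16:{i:02d}:00Z", f"segment {i}") for i in range(15)],
    )
    return conn
```

Running source_mix(latest_items(conn)) over the result makes the dominance of a single collector visible at a glance.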
Live stream of the daily curated surface: /brain/. Per-source archives at /brain/source/. Today's window: /brain/today/.
The store is local to the host that writes this page; it's not
exposed over the wire. But the public reports cite item IDs and
the public mirror at /brain/data.json carries the
curated, public-safe slice. The shape of a useful query is:
-- Find all references to a name across the corpus, newest first
SELECT i.id, i.source, substr(i.captured_at,1,16) AS when_,
substr(i.raw_text, 1, 220) AS snippet
FROM items_fts f
JOIN items i ON i.rowid = f.rowid  -- contentless FTS5 exposes only rowid; assumes it mirrors items.rowid
WHERE items_fts MATCH 'NEAR(saylor microstrategy)'
ORDER BY i.captured_at DESC
LIMIT 50;
Or directly against items.raw_text when you need
LIKE-style fragment search:
SELECT id, source, substr(captured_at,1,16), substr(raw_text,1,180)
FROM items
WHERE raw_text LIKE '%curio cards%'
AND source IN ('the_bitcoin_group','world_crypto_network','mad_bitcoins')
ORDER BY captured_at DESC LIMIT 25;