About the knowledge store — brain.db

§ 01 · what brain.db is

A personal knowledge spine.

brain.db is Thomas Hunt's working archive — thirteen years of show transcripts (MadBitcoins, WCN, The Bitcoin Group), the Twitter archive export plus the freshly-scraped feeds of @WorldCryptoNet, @ToneVays, @arcbtc, and @JScigala, the 140 Curio Wiki articles, the published reports, Google News digests, Plex viewing history, and the canonical entity database (310 profiles as of this build) that links every name to every mention. Everything — raw row, tag, link, embedding — lives in one SQLite file at ~/brain/brain.db. The public surfaces (/brain/, /brain/ask/, the entity profiles, the predictions verifier, the reports) are all rendered windows onto this one store.

§ 02 · by the numbers

By the numbers.

Every value below is the output of a live SQL query against ~/brain/brain.db at build time. Snapshot 2026-05-30T16:34Z.

Items

561,095

rows in items

Breakdown by source category

Twitter (archive export + live handle scrapes): twitter_archive, twitter_like_archive, twitter_tweet_header, twitter_follower, twitter_algo, twitter_tweets_*, twitter_replies_*

405,134

Facebook export (reactions, messages, posts, photos, comments): facebook_reaction, facebook_message, facebook_photo, facebook_post, facebook_comment

62,678

Screenshots (auto-captured + manual): screenshot

24,970

Google News digests: gnews_tech, gnews_bitcoin, gnews_ai, …

21,303

Plex media + watch history: plex_media (11,817) + plex_watch (3,335)

15,434

Lifecycle places (Google location): lifecycle_place

13,028

World Crypto Network episode bodies: world_crypto_network

4,162

Mad Bitcoins episode bodies: mad_bitcoins

812

The Bitcoin Group episode bodies: the_bitcoin_group

482

Recent growth

last 24 hours

15,295

new items captured

last 7 days

38,928

~5,560 per day

last 30 days

135,290

~4,510 per day

most recent capture

2026-05-30T16:30Z

WCN segment, hourly cron
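The recent-growth figures are windowed counts over captured_at. A minimal sketch of what such a query might look like, assuming captured_at is stored as an ISO 8601 UTC string (so string order is chronological order); the function name window_count and the toy rows are illustrative and not taken from the build script:

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def window_count(conn, now, days):
    """Return (rows captured in the last `days` days, rounded per-day average)."""
    # Assumption: captured_at looks like '2026-05-30T16:30:00Z', so a plain
    # string comparison against the cutoff selects the right rows.
    cutoff = (now - timedelta(days=days)).strftime("%Y-%m-%dT%H:%M:%SZ")
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM items WHERE captured_at >= ?", (cutoff,)
    ).fetchone()
    return n, round(n / days)


# Toy rows standing in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, captured_at TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        ("a", "world_crypto_network", "2026-05-30T16:30:00Z"),
        ("b", "world_crypto_network", "2026-05-29T09:00:00Z"),
        ("c", "plex_watch", "2026-05-01T04:00:00Z"),
    ],
)
now = datetime(2026, 5, 30, 16, 34, tzinfo=timezone.utc)
print(window_count(conn, now, 1))   # last 24 hours
print(window_count(conn, now, 7))   # last 7 days
print(window_count(conn, now, 30))  # last 30 days
```

The per-day figures on this page are the window total divided by the window length: 38,928 / 7 ≈ 5,561 and 135,290 / 30 ≈ 4,510.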

§ 03 · how it works

How it works.

The store is a single SQLite database file running in WAL (Write-Ahead Logging) journal mode, so the scheduled capture writers (voice memos, Twitter monitor, Plex sync, news fetchers) do not block the readers (the public site generator, the /brain/ask/ RAG endpoint, the entity-summary pipeline), and the readers do not block the writers. Writers still take turns with one another, since SQLite allows only one writer at a time. The schema is denormalized on purpose: every captured thing (a tweet, a transcript segment, a screenshot, a news headline, a voice memo) lands as one row in items, tagged with a source string that identifies which collector wrote it.
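WAL is a persistent per-database setting that a connection switches on with a PRAGMA. A minimal sketch of the reader/writer behaviour, assuming a throwaway file in a temp directory rather than the real ~/brain/brain.db; the busy_timeout value and the cut-down items table are illustrative choices, not settings documented here:

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(db_path)
# The PRAGMA returns the journal mode now in effect.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# Writers still take turns in WAL mode, so wait briefly instead of failing.
writer.execute("PRAGMA busy_timeout=5000")
writer.execute("CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT)")
writer.commit()

reader = sqlite3.connect(db_path)


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM items").fetchall()[0][0]


# Start a write transaction and leave it open...
writer.execute("INSERT INTO items VALUES ('a', 'voice_memo')")
# ...the reader is not blocked; it sees the last committed snapshot.
seen_during_write = count(reader)
writer.commit()
seen_after_commit = count(reader)

print(mode, seen_during_write, seen_after_commit)
```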

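Dedup is enforced by a partial UNIQUE index on (source, source_id), which rejects a second row for an upstream ID that has already been seen. Collectors are therefore free to re-run as often as they like: as long as they write with a conflict-tolerant insert such as INSERT OR IGNORE, the duplicate is silently skipped rather than raising an error. Rows with no source_id are exempt from the check: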

CREATE UNIQUE INDEX idx_items_src_sid
  ON items(source, source_id)
  WHERE source_id IS NOT NULL;
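A unique index on its own makes a duplicate insert fail with a constraint error; the silent skip comes from the insert statement. A minimal sketch, assuming collectors write with INSERT OR IGNORE (the actual statement is not shown on this page) and using a cut-down items table; the capture helper and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (
        id        TEXT PRIMARY KEY,
        source    TEXT NOT NULL,
        source_id TEXT,
        raw_text  TEXT
    );
    CREATE UNIQUE INDEX idx_items_src_sid
      ON items(source, source_id)
      WHERE source_id IS NOT NULL;
    """
)


def capture(conn, item_id, source, source_id, raw_text):
    """Insert one item; return True if a row was written, False if skipped."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO items (id, source, source_id, raw_text) "
        "VALUES (?, ?, ?, ?)",
        (item_id, source, source_id, raw_text),
    )
    return cur.rowcount == 1


first = capture(conn, "id-1", "twitter_archive", "12345", "hello")
# Same (source, source_id) on a re-run: the row is skipped, no error raised.
rerun = capture(conn, "id-2", "twitter_archive", "12345", "hello")
# Rows without an upstream ID fall outside the partial index.
shot_a = capture(conn, "id-3", "screenshot", None, "first capture")
shot_b = capture(conn, "id-4", "screenshot", None, "second capture")
total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(first, rerun, shot_a, shot_b, total)
```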

items

The core table: id (sha1), source, source_id, file_path, ISO8601 created_at + captured_at, raw_text, summary, details_json, status (captured / linked / refined), tier (working / cold). 561,095 rows.

links

Item-to-item edges with a kind (similar, quote, reply, …) and a numeric score. Built nightly by the graphify pass. 2.9 million edges.

embeddings

One float vector per item, used by /brain/ask/ for retrieval. 587,817 vectors covering ~95% of the writable surface.

items_fts

FTS5 virtual table over raw_text + summary. Built contentless (content='') for write throughput, so the index stores tokens but not the text: a MATCH query can return only the rowid of each hit, and the readable text has to be joined back from items.

capture_metadata · refine_events · contradictions

Per-collector run state, the queue of items that still need LLM summarization or entity-tagging, and a slot for facts that contradict other facts — the substrate for future reconciliation passes.

caveat · search

Because items_fts is contentless, you can't just SELECT * FROM items_fts WHERE items_fts MATCH ? and read the snippet directly: FTS5 doesn't store the body, so the text columns come back NULL. The public /brain/ask/ endpoint uses the FTS index only to narrow and rank candidate IDs, and runs a LIKE-based search against items.raw_text for snippet display. If you ever query the store yourself, do the same: FTS to narrow, items to read.
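The "FTS to narrow, items to read" pattern can be sketched as below. This assumes the FTS rowid mirrors the rowid of the matching items row (the page does not say how the two are keyed), and it requires a Python build whose SQLite includes FTS5; the search helper and the toy rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, raw_text TEXT, summary TEXT);
    -- content='' makes the index contentless: tokens only, no stored text.
    CREATE VIRTUAL TABLE items_fts USING fts5(raw_text, summary, content='');
    """
)

toy_rows = [
    ("id-1", "mad_bitcoins", "toy transcript line mentioning Saylor", "toy summary"),
    ("id-2", "gnews_tech", "toy headline about something else", "toy summary"),
]
for item_id, source, raw_text, summary in toy_rows:
    cur = conn.execute(
        "INSERT INTO items VALUES (?, ?, ?, ?)", (item_id, source, raw_text, summary)
    )
    # Assumption: index each item under the rowid of its items row.
    conn.execute(
        "INSERT INTO items_fts (rowid, raw_text, summary) VALUES (?, ?, ?)",
        (cur.lastrowid, raw_text, summary),
    )


def search(conn, query, limit=5):
    """FTS to narrow, items to read."""
    # Step 1: the contentless index yields matching rowids, best rank first.
    hits = conn.execute(
        "SELECT rowid FROM items_fts WHERE items_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    # Step 2: fetch the readable text from items for each candidate.
    results = []
    for (rowid,) in hits:
        row = conn.execute(
            "SELECT id, source, raw_text FROM items WHERE rowid = ?", (rowid,)
        ).fetchone()
        if row is not None:
            results.append(row)
    return results


# Reading text straight from the contentless table yields NULL (None).
stored = conn.execute(
    "SELECT raw_text FROM items_fts WHERE items_fts MATCH ?", ("saylor",)
).fetchone()[0]
found = search(conn, "saylor")
print(stored, found)
```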

§ 04 · what's connected to it

What's connected to it.

Everything published at 1n2.org is a rendered window onto this one store. When a number changes here, those downstream surfaces change at their next build.

consumer · entities

Canonical entity database

310 people, projects, organizations, shows, and Curio cards, each one cross-referenced against transcript hits and tweet appearances in items to build the "mentioned in" backlink list on its profile page.

consumer · wiki

Curio Wiki (140 articles)

Auto-expanded nightly by the wiki cron, which pulls quote candidates from items_fts and writes well-attributed paragraphs. Each article carries [[entity-slug]] references that resolve against the same entity store.

consumer · predictions

Predictions verified leaderboard

The 19-panelist accuracy chart joins each verified-or-falsified call back to the TBG transcript segment that contains the original quote. The correct/wrong/abstain totals come from a SQL pass over items tagged #prediction-verified.

consumer · timeline

TBG timeline

The 86-guest swimlane chart draws its canonical names from the entity store (which was itself built from tbg-mirrors/guests.json plus the Whisper-variant correction log).

consumer · search

Unified site search

The site-wide search index ingests both the rendered HTML pages and a curated public-safe slice of items (no private capture, no health, no location, no DMs), so a query like "Saylor 2014" can return a MadBitcoins transcript hit alongside a wiki article.

consumer · ask

/brain/ask/ (RAG)

The retrieval-augmented Q&A endpoint embeds the user's question, runs an ANN search over the 587k vectors in embeddings, joins back to items.raw_text for the actual quote, and returns a cited answer with the transcript or tweet ID as the source.
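The retrieval step can be sketched as follows. This is a toy: it assumes hand-written 3-dimensional vectors in place of a real embedding model and an exhaustive cosine scan in place of the ANN index (neither the model nor the index type is named on this page). The table layouts, the JSON vector encoding, and the retrieve function are all illustrative:

```python
import json
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, raw_text TEXT);
    CREATE TABLE embeddings (item_id TEXT PRIMARY KEY, vector TEXT);
    """
)
toy = [
    ("id-1", "the_bitcoin_group", "toy transcript segment about mining", [0.9, 0.1, 0.0]),
    ("id-2", "twitter_archive", "toy tweet about a conference", [0.1, 0.9, 0.0]),
    ("id-3", "gnews_ai", "toy headline about AI chips", [0.0, 0.2, 0.9]),
]
for item_id, source, text, vec in toy:
    conn.execute("INSERT INTO items VALUES (?, ?, ?)", (item_id, source, text))
    conn.execute("INSERT INTO embeddings VALUES (?, ?)", (item_id, json.dumps(vec)))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(conn, question_vec, k=2):
    """Return the k nearest items as (id, source, raw_text, score)."""
    # Exhaustive scan: a stand-in for the ANN search, fine only at toy scale.
    scored = [
        (cosine(question_vec, json.loads(vec_json)), item_id)
        for item_id, vec_json in conn.execute("SELECT item_id, vector FROM embeddings").fetchall()
    ]
    scored.sort(reverse=True)
    out = []
    for score, item_id in scored[:k]:
        # Join back to items for the quotable text and the citable ID.
        source, text = conn.execute(
            "SELECT source, raw_text FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        out.append((item_id, source, text, round(score, 3)))
    return out


# A question vector that points mostly along the first axis.
hits = retrieve(conn, [1.0, 0.0, 0.0])
for hit in hits:
    print(hit)
```

Each returned tuple carries the item ID alongside the text, which is what lets the answer cite a specific transcript or tweet.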

§ 05 · provenance

Provenance.

Where the data came from. Every artifact path below is on disk at the host writing this page — the dates are file mtimes, not narrative.

Pipeline

Artifact on disk

Last run

Whisper transcription (yt-dlp + ggml-base.en): show audio → segmented text. A corrective re-pass with condition_on_previous_text=True improves cross-segment coherence.

~/brain/transcripts/_ctx_prompt/ (5,580 transcript files)

2026-05-23

Twitter archive ingest: personal Twitter data export (tweets, likes, headers, followers, algo timeline) parsed into items.

~/brain/brain/capture/twitter_archive.py

2026-04-12

Live Twitter handle scrape (Playwright + Chrome profile): attaches to an authenticated Chrome session and scrapes the timelines of WCN / Tone Vays / Ben Arc / Josh Scigala / thuntnet / TBG.

~/Sites/1n2.org/tweetster/scrape_with_chrome.py

2026-05-30

Voice memos transcription: Voice Memos.app capture → Whisper transcribe → items. Runs via cron every 10 minutes.

~/brain/brain/capture/voice_memos.py

cron

Curio Wiki cron (LLM expansion): picks 2-3 wiki topics per day, drafts via Claude, gates on a 200-word quality check, and deploys to the droplet only on change.

~/Sites/1n2.org/content-cron.sh + wiki-expander.sh

2026-05-30 01:08

Entity database build: per-kind Python modules (people, shows, projects, organizations, cards, vegas) → rendered HTML profiles + JSON dump + backlinks.

~/Sites/1n2.org/_entities/build.py

2026-05-30 09:31

Google News digests (3x daily): RSS pull across Bitcoin / Tech / AI feeds, OG backfill, dedup against existing items.

~/brain/brain/capture/gnews_rss.py + gnews_og_backfill.py

cron

Plex weekly sync: watch history + library from three servers (tnas, tnas2, dolphin), Sunday 04:00.

~/brain/brain/capture/plex_history.py

2026-05-25

The verification rule is the same as in every published report on this site: a number is only allowed on the page if the SQL that produced it can be re-run, and a date is only allowed if the artifact exists at the path cited. If you find a number here that won't reproduce, that's a bug and worth filing.
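The rule lends itself to a mechanical check. A minimal sketch, assuming each published number is paired with the SQL that produced it and each date with an artifact path; the helper names are hypothetical, and comparing mtimes in UTC is an assumption (the page does not say which timezone its dates use):

```python
import os
import sqlite3
import tempfile
from datetime import date, datetime, timezone


def number_reproduces(conn, sql, claimed):
    """True if re-running `sql` returns the single value printed on the page."""
    (actual,) = conn.execute(sql).fetchone()
    return actual == claimed


def date_reproduces(path, claimed):
    """True if the artifact exists and its mtime (taken in UTC) falls on `claimed`."""
    if not os.path.exists(path):
        return False
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    return mtime.date() == claimed


# Toy store with three rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO items VALUES (?)", [("a",), ("b",), ("c",)])

# Toy artifact whose mtime is pinned to a known instant.
artifact = os.path.join(tempfile.mkdtemp(), "build.py")
with open(artifact, "w") as fh:
    fh.write("# placeholder\n")
stamp = datetime(2026, 5, 30, 9, 31, tzinfo=timezone.utc).timestamp()
os.utime(artifact, (stamp, stamp))

print(number_reproduces(conn, "SELECT COUNT(*) FROM items", 3))
print(date_reproduces(artifact, date(2026, 5, 30)))
print(date_reproduces(artifact + ".missing", date(2026, 5, 30)))
```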

§ 06 · recent activity

Recently ingested.

Last 12 items written to items, by captured_at. The hourly capture loop is dominated right now by the WCN segment ingest from the live transcription supervisor — that's why the source column reads world_crypto_network over and over.
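A listing like this one can come from a single ordered query. A minimal sketch, assuming captured_at is an ISO 8601 string (so descending string order is newest first); the 80-character preview length, the recent helper, and the toy rows are illustrative:

```python
import sqlite3

RECENT_SQL = """
    SELECT captured_at, source, substr(raw_text, 1, 80) AS preview
    FROM items
    ORDER BY captured_at DESC
    LIMIT ?
"""


def recent(conn, n=12):
    """Return the n most recently captured items, newest first."""
    return conn.execute(RECENT_SQL, (n,)).fetchall()


# Toy rows standing in for the real store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, captured_at TEXT, raw_text TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        ("a", "world_crypto_network", "2026-05-30T16:29:00Z", "toy segment one"),
        ("b", "world_crypto_network", "2026-05-30T16:30:00Z", "toy segment two"),
        ("c", "gnews_ai", "2026-05-30T12:00:00Z", "toy headline"),
    ],
)
latest = recent(conn, 2)
for row in latest:
    print(row)
```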

2026-05-30T16:30Z

world_crypto_network

Hi there and welcome to the World Crypto Network. I'm Jamie Nelson and today I am…

2026-05-30T16:29Z

world_crypto_network