brain.db · about the knowledge store

One SQLite file.
Thirteen years of primary source.

brain.db is the personal knowledge graph behind the 1n2.org universe — a 2.84 GB SQLite store holding the MadBitcoins, World Crypto Network and Bitcoin Group transcripts going back to spring 2013, Thomas Hunt's full Twitter archive plus live scrapes of the WCN / Tone Vays / Ben Arc / Josh Scigala handles, every published Curio Wiki article, the daily reports, and the canonical entity database that stitches it all together.

§ 01 · what brain.db is

A personal knowledge spine.

brain.db is Thomas Hunt's working archive — thirteen years of show transcripts (MadBitcoins, WCN, The Bitcoin Group), the Twitter archive export plus the freshly-scraped feeds of @WorldCryptoNet, @ToneVays, @arcbtc, and @JScigala, the 140 Curio Wiki articles, the published reports, Google News digests, Plex viewing history, and the canonical entity database (310 profiles as of this build) that links every name to every mention. Everything — raw row, tag, link, embedding — lives in one SQLite file at ~/brain/brain.db. The public surfaces (/brain/, /brain/ask/, the entity profiles, the predictions verifier, the reports) are all rendered windows onto this one store.

§ 02 · by the numbers

By the numbers.

Every value below is the output of a live SQL query against ~/brain/brain.db at build time. Snapshot 2026-05-30T16:34Z.

Items
561,095
rows in items
Tags
496,262
edges in tags
Cross-item links
2,946,247
edges in links
Vector embeddings
587,817
rows in embeddings
FTS5 docs
875,657
indexed rows in items_fts
DB file size
2.84 GB
3,044,995,072 bytes (2.84 GiB in binary units, about 3.04 GB decimal) · WAL mode
Show transcripts
5,456
TBG + WCN + MadBitcoins episodes
Canonical entities
310
profiles at /entities/

Breakdown by source category

Category · source strings · items
Twitter (archive export + live handle scrapes) · twitter_archive, twitter_like_archive, twitter_tweet_header, twitter_follower, twitter_algo, twitter_tweets_*, twitter_replies_* · 405,134
Facebook export (reactions, messages, posts, photos, comments) · facebook_reaction, facebook_message, facebook_photo, facebook_post, facebook_comment · 62,678
Screenshots (auto-captured + manual) · screenshot · 24,970
Google News digests · gnews_tech, gnews_bitcoin, gnews_ai, … · 21,303
Plex media + watch history · plex_media (11,817) + plex_watch (3,335) · 15,434
Lifecycle places (Google location) · lifecycle_place · 13,028
World Crypto Network episode bodies · world_crypto_network · 4,162
Mad Bitcoins episode bodies · mad_bitcoins · 812
The Bitcoin Group episode bodies · the_bitcoin_group · 482
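A per-category breakdown like the one above can be sketched as a GROUP BY over the source column followed by a prefix roll-up. This is a minimal sketch, not the site's build code: the prefixes come from the source strings listed above, but the CATEGORY_PREFIXES map and both function names are hypothetical.

```python
import sqlite3

# Prefixes taken from the source strings in the breakdown; the roll-up
# itself is an assumed reconstruction, not the actual build step.
CATEGORY_PREFIXES = [
    ("twitter_", "Twitter"),
    ("facebook_", "Facebook"),
    ("gnews_", "Google News"),
    ("plex_", "Plex"),
]


def category_for(source):
    """Map a source string to its category; unmatched sources stand alone."""
    for prefix, name in CATEGORY_PREFIXES:
        if source.startswith(prefix):
            return name
    return source


def breakdown(conn):
    """Return [(category, item_count)] sorted by count, largest first."""
    totals = {}
    query = "SELECT source, count(*) FROM items GROUP BY source"
    for source, n in conn.execute(query):
        category = category_for(source)
        totals[category] = totals.get(category, 0) + n
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [("a", "twitter_archive"), ("b", "twitter_algo"), ("c", "screenshot")],
    )
    for category, n in breakdown(conn):
        print(f"{category}: {n:,}")
```

Rolling up in Python rather than in SQL keeps the prefix list in one editable place; a CASE expression in the query would work equally well.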

Recent growth

last 24 hours
15,295
new items captured
last 7 days
38,928
~5,560 per day
last 30 days
135,290
~4,510 per day
most recent capture
2026-05-30T16:30Z
WCN segment, hourly cron

§ 03 · how it works

How it works.

The store is a single SQLite database file running in WAL (Write-Ahead Logging) journal mode, so the scheduled capture writers (voice memos, Twitter monitor, Plex sync, news fetchers) and the readers (the public site generator, the /brain/ask/ RAG endpoint, the entity-summary pipeline) don't block each other. Writers still take turns, since SQLite allows only one write transaction at a time. The schema is denormalized on purpose: every captured thing (a tweet, a transcript segment, a screenshot, a news headline, a voice memo) lands as one row in items, tagged with a source string that identifies which collector wrote it.
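To make the reader/writer behaviour concrete, here is a minimal sketch of one connection holding a write transaction open while a second connection reads. The two-column items table and the wal_demo name are illustrative assumptions; the real schema has more columns.

```python
import os
import sqlite3
import tempfile


def wal_demo(db_path):
    """Return (journal_mode, rows visible to a reader during an open write)."""
    # isolation_level=None means autocommit; transactions are explicit below.
    writer = sqlite3.connect(db_path, isolation_level=None)
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    writer.execute(
        "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, raw_text TEXT)"
    )
    writer.execute("INSERT INTO items VALUES ('a', 'committed row')")

    # Open a write transaction and leave it pending, as a capture run might.
    writer.execute("BEGIN IMMEDIATE")
    writer.execute("INSERT INTO items VALUES ('b', 'uncommitted row')")

    # A second connection can still read; it sees the last committed snapshot.
    reader = sqlite3.connect(db_path, timeout=0.1)
    seen = reader.execute("SELECT count(*) FROM items").fetchone()[0]
    reader.close()

    writer.execute("COMMIT")
    writer.close()
    return mode, seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_demo(os.path.join(tmp, "demo.db")))
```

The reader returns immediately with only the committed row in view; the uncommitted insert becomes visible to new read transactions after COMMIT.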

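Dedup is enforced by a partial UNIQUE index on (source, source_id). Collectors are free to re-run as often as they like: if the upstream ID has already been seen, the index rejects the duplicate, and an insert written as INSERT OR IGNORE is silently skipped (a plain INSERT would raise a constraint error instead):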

CREATE UNIQUE INDEX idx_items_src_sid
  ON items(source, source_id)
  WHERE source_id IS NOT NULL;
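A minimal sketch of how a collector could lean on that index, assuming it writes with INSERT OR IGNORE. The trimmed four-column items table and the make_db and capture helpers are hypothetical; only the index definition comes from this page.

```python
import sqlite3


def make_db():
    """Build an in-memory store with a trimmed-down items table (assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id TEXT PRIMARY KEY, source TEXT, source_id TEXT, raw_text TEXT)"
    )
    conn.execute(
        "CREATE UNIQUE INDEX idx_items_src_sid"
        " ON items(source, source_id)"
        " WHERE source_id IS NOT NULL"
    )
    return conn


def capture(conn, item_id, source, source_id, raw_text):
    """Insert one item; return True if written, False if skipped as a dupe."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO items (id, source, source_id, raw_text)"
        " VALUES (?, ?, ?, ?)",
        (item_id, source, source_id, raw_text),
    )
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = make_db()
    print(capture(conn, "id1", "twitter_archive", "12345", "first run"))
    print(capture(conn, "id2", "twitter_archive", "12345", "second run"))
```

Because the index is partial, rows with a NULL source_id are never treated as duplicates of one another, which suits captures that have no upstream ID.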

items

The core table: id (sha1), source, source_id, file_path, ISO8601 created_at + captured_at, raw_text, summary, details_json, status (captured / linked / refined), tier (working / cold). 561,095 rows.

tags

Join table mapping item_id to tag string; an item can carry many tags and a tag can cover many items. Drives the topical facets in /brain/'s sidebar (#bitcoin, #twitter-algo, etc.). 496,262 edges.
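The sidebar facets could be computed with a simple aggregate over tags. A minimal sketch, assuming the table has exactly the item_id and tag columns named above; facet_counts and items_with_tag are hypothetical helper names.

```python
import sqlite3


def facet_counts(conn, limit=20):
    """Return [(tag, item_count)] for the most-used tags, largest first."""
    return conn.execute(
        "SELECT tag, count(*) AS n FROM tags"
        " GROUP BY tag ORDER BY n DESC, tag LIMIT ?",
        (limit,),
    ).fetchall()


def items_with_tag(conn, tag):
    """Return the item ids carrying one tag, in stable order."""
    rows = conn.execute(
        "SELECT item_id FROM tags WHERE tag = ? ORDER BY item_id", (tag,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (item_id TEXT, tag TEXT)")
    conn.executemany(
        "INSERT INTO tags VALUES (?, ?)",
        [("a", "#bitcoin"), ("b", "#bitcoin"), ("a", "#twitter-algo")],
    )
    print(facet_counts(conn))
    print(items_with_tag(conn, "#bitcoin"))
```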

links

Item-to-item edges with a kind (similar, quote, reply, …) and a numeric score. Built nightly by the graphify pass. 2,946,247 edges.

embeddings

Float vectors used by /brain/ask/ for retrieval. 587,817 vectors covering ~95% of the writable surface; because that count exceeds the 561,095 rows in items, the mapping is not strictly one vector per item.

items_fts

FTS5 virtual table over raw_text + summary. Built contentless (content='') for write throughput, so a MATCH query can return only the matching rowid; the readable text has to be joined back from items.

capture_metadata · refine_events · contradictions

Per-collector run state, the queue of items that still need LLM summarization or entity-tagging, and a slot for facts that contradict other facts — the substrate for future reconciliation passes.

caveat · search

Because items_fts is contentless, you can't run SELECT * FROM items_fts WHERE items_fts MATCH ? and read the snippet directly: FTS5 doesn't store the body, so every column except rowid comes back NULL. The public /brain/ask/ endpoint uses the FTS index only to rank candidate IDs and pulls the snippet text from items.raw_text with a LIKE-based search. If you ever query the store yourself, do the same: FTS to narrow, items to read.
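The "FTS to narrow, items to read" pattern can be sketched as follows. It assumes the Python build of SQLite includes FTS5 and that each FTS row is inserted with the rowid of its items row; the trimmed schema and the build, add_item, and search helpers are illustrative, not the store's actual code.

```python
import sqlite3


def build(conn):
    """Create a trimmed items table plus a contentless FTS5 index over it."""
    conn.execute(
        "CREATE TABLE items ("
        " id TEXT PRIMARY KEY, source TEXT, raw_text TEXT, summary TEXT)"
    )
    conn.execute(
        "CREATE VIRTUAL TABLE items_fts"
        " USING fts5(raw_text, summary, content='')"
    )


def add_item(conn, item_id, source, raw_text, summary=""):
    """Write the row to items, then index it under the same rowid."""
    cur = conn.execute(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        (item_id, source, raw_text, summary),
    )
    conn.execute(
        "INSERT INTO items_fts(rowid, raw_text, summary) VALUES (?, ?, ?)",
        (cur.lastrowid, raw_text, summary),
    )


def search(conn, query, limit=10):
    """FTS narrows and ranks; the readable snippet comes from items."""
    return conn.execute(
        "SELECT i.id, i.source, substr(i.raw_text, 1, 220)"
        " FROM items_fts JOIN items i ON i.rowid = items_fts.rowid"
        " WHERE items_fts MATCH ?"
        " ORDER BY bm25(items_fts) LIMIT ?",
        (query, limit),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    add_item(conn, "sha-a", "mad_bitcoins", "Saylor said MicroStrategy is buying")
    add_item(conn, "sha-b", "screenshot", "A picture of a cat")
    print(search(conn, "NEAR(saylor microstrategy)"))
```

Reading raw_text straight from items_fts in this setup yields NULL, which is why the join back to items is not optional.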

§ 04 · what's connected to it

What's connected to it.

Everything published at 1n2.org is a rendered window onto this one store. When a number changes here, those downstream surfaces change at their next build.

consumer · entities

Canonical entity database

310 people, projects, organizations, shows, and Curio cards — each one cross-referenced against items transcript hits and tweet appearances to build the "mentioned in" backlink list on its profile page.

consumer · wiki

Curio Wiki (140 articles)

Auto-expanded nightly by the wiki cron, which pulls quote candidates from items_fts and writes well-attributed paragraphs. Each article carries [[entity-slug]] references that resolve against the same entity store.

consumer · predictions

Predictions verified leaderboard

The 19-panelist accuracy chart joins each verified-or-falsified call back to the TBG transcript segment that contains the original quote. The correct/wrong/abstain totals come from a SQL pass over items tagged #prediction-verified.

consumer · timeline

TBG timeline

The 86-guest swimlane chart draws its canonical names from the entity store (which itself was built off the tbg-mirrors/guests.json + Whisper-variant correction log).

consumer · search

Unified site search

The site-wide search index ingests both the rendered HTML pages and a curated public-safe slice of items (no private capture, no health, no location, no DMs), so a query like "Saylor 2014" can return a MadBitcoins transcript hit alongside a wiki article.

consumer · ask

/brain/ask/ (RAG)

The retrieval-augmented Q&A endpoint embeds the user's question, runs an ANN search over the 587k vectors in embeddings, joins back to items.raw_text for the actual quote, and returns a cited answer with the transcript or tweet ID as the source.
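The retrieval step can be sketched in miniature. This is not the endpoint's implementation: an exhaustive cosine scan stands in for the ANN search, the question is assumed to be embedded already, and the embeddings(item_id, vector) layout with float32 blobs, the toy 3-dimensional vectors, and every function name here are assumptions.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack a list of floats as float32 bytes (assumed storage format)."""
    return array("f", vec).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return list(values)


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(conn, question_vec, k=3):
    """Score every vector, keep the top k, join back to items for the quote."""
    scored = [
        (cosine(question_vec, from_blob(blob)), item_id)
        for item_id, blob in conn.execute("SELECT item_id, vector FROM embeddings")
    ]
    scored.sort(reverse=True)
    results = []
    for score, item_id in scored[:k]:
        source, raw_text = conn.execute(
            "SELECT source, raw_text FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        results.append((item_id, source, raw_text, score))
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, source TEXT, raw_text TEXT)")
    conn.execute("CREATE TABLE embeddings (item_id TEXT, vector BLOB)")
    conn.execute("INSERT INTO items VALUES ('a', 'mad_bitcoins', 'a transcript line')")
    conn.execute("INSERT INTO embeddings VALUES ('a', ?)", (to_blob([1.0, 0.0, 0.0]),))
    print(retrieve(conn, [1.0, 0.0, 0.0], k=1))
```

A full scan is linear in the number of vectors, so at the scale quoted above an approximate index is the practical choice; the scan is used here only because it is easy to verify.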

§ 05 · provenance

Provenance.

Where the data came from. Every artifact path below is on disk at the host writing this page — the dates are file mtimes, not narrative.

Pipeline · Artifact on disk · Last run
Whisper transcription (yt-dlp + ggml-base.en): show audio → segmented text; a corrected re-pass with condition_on_previous_text=True improves cross-segment coherence · ~/brain/transcripts/_ctx_prompt/ (5,580 transcript files) · 2026-05-23
Twitter archive ingest: personal Twitter data export (tweets, likes, headers, followers, algo timeline) parsed into items · ~/brain/brain/capture/twitter_archive.py · 2026-04-12
Live Twitter handle scrape (Playwright + Chrome profile): attaches to an authenticated Chrome session and scrapes the timelines of WCN / Tone Vays / Ben Arc / Josh Scigala / thuntnet / TBG · ~/Sites/1n2.org/tweetster/scrape_with_chrome.py · 2026-05-30
Voice memos transcription (every 10 min): Voice Memos.app capture → Whisper transcribe → items, run by cron every 10 minutes · ~/brain/brain/capture/voice_memos.py · cron
Curio Wiki cron (LLM expansion): picks 2-3 wiki topics per day, drafts via Claude, gates on a 200-word quality check, and deploys to the droplet only on change · ~/Sites/1n2.org/content-cron.sh + wiki-expander.sh · 2026-05-30 01:08
Entity database build: per-kind Python modules (people, shows, projects, organizations, cards, vegas) → rendered HTML profiles + JSON dump + backlinks · ~/Sites/1n2.org/_entities/build.py · 2026-05-30 09:31
Google News digests (3x daily): RSS pull across Bitcoin / Tech / AI feeds, OG backfill, dedup against existing items · ~/brain/brain/capture/gnews_rss.py + gnews_og_backfill.py · cron
Plex weekly sync: watch history + library from three servers (tnas, tnas2, dolphin), Sunday 04:00 · ~/brain/brain/capture/plex_history.py · 2026-05-25 (last Sun)

The verification rule is the same as in every published report on this site: a number is only allowed on the page if the SQL that produced it can be re-run, and a date is only allowed if the artifact exists at the path cited. If you find a number here that won't reproduce, that's a bug and worth filing.

§ 06 · recent activity

Recently ingested.

Last 12 items written to items, by captured_at. The hourly capture loop is dominated right now by the WCN segment ingest from the live transcription supervisor — that's why the source column reads world_crypto_network over and over.

2026-05-30T16:30Z · world_crypto_network · Hi there and welcome to the World Crypto Network. I'm Jamie Nelson and today I am…
2026-05-30T16:29Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T16:26Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T16:23Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T16:20Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T16:13Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T16:10Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T16:08Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T16:03Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T16:00Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T15:56Z · world_crypto_network · (unprocessed segment, transcription pending)
2026-05-30T15:53Z · world_crypto_network · (unprocessed segment, transcription pending)
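A listing like this one could come from a single ordered query. A minimal sketch, assuming captured_at is stored as a sortable ISO8601 string and that a segment awaiting transcription has a NULL or empty raw_text; the placeholder handling and the recent_items name are hypothetical.

```python
import sqlite3

PLACEHOLDER = "(unprocessed segment, transcription pending)"


def recent_items(conn, n=12):
    """Return the n newest items as (captured_at, source, snippet)."""
    return conn.execute(
        "SELECT captured_at, source,"
        "       coalesce(nullif(substr(raw_text, 1, 80), ''), ?)"
        " FROM items ORDER BY captured_at DESC LIMIT ?",
        (PLACEHOLDER, n),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id TEXT PRIMARY KEY, source TEXT, captured_at TEXT, raw_text TEXT)"
    )
    conn.executemany(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        [
            ("a", "world_crypto_network", "2026-05-30T16:30Z", "Hi there"),
            ("b", "world_crypto_network", "2026-05-30T16:29Z", None),
        ],
    )
    for row in recent_items(conn):
        print(" · ".join(row))
```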

Live stream of the daily curated surface: /brain/. Per-source archives at /brain/source/. Today's window: /brain/today/.

§ appendix · querying

If you want to query it yourself.

The store is local to the host that writes this page; it's not exposed over the wire. But the public reports cite item IDs and the public mirror at /brain/data.json carries the curated, public-safe slice. The shape of a useful query is:

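-- Find items where two terms occur near each other, newest first.
-- items_fts is contentless, so join back to items on rowid for the text
-- (assumes each FTS row was inserted with the rowid of its items row).
SELECT i.id, i.source, substr(i.captured_at,1,16) AS when_,
       substr(i.raw_text, 1, 220) AS snippet
FROM items_fts f
JOIN items i ON i.rowid = f.rowid
WHERE items_fts MATCH 'NEAR(saylor microstrategy)'
ORDER BY i.captured_at DESC
LIMIT 50;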

Or directly against items.raw_text when you need LIKE-style fragment search:

SELECT id, source, substr(captured_at,1,16), substr(raw_text,1,180)
FROM items
WHERE raw_text LIKE '%curio cards%'
  AND source IN ('the_bitcoin_group','world_crypto_network','mad_bitcoins')
ORDER BY captured_at DESC LIMIT 25;