# The 6 Episodes Missing from `by_episode` — Audit

**Date:** 2026-05-28
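**Status:** Investigation complete. All 6 episodes have **complete local audio**, and 5 of 6 have functional local transcripts (a working replacement for the broken #455 transcript has been recovered from Spreaker). Their absence from `by_episode` is caused by a guest-extractor parser bug, not by an audio recovery problem.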

## TL;DR

| | Count |
|---|---|
| Episodes missing from `guests.json::by_episode` | **6** |
| Local audio present + playable | **6 / 6** |
| Local transcripts present + functional | **5 / 6** (#455 is broken: Whisper output is a "music" loop) |
| External-source coverage (Spreaker) | **6 / 6** |
| Recovery action taken | Downloaded Spreaker's auto-generated transcripts for #455 (replacing the broken local one), plus #467 and #477 as backups |
| Recovery still needed | **None** for audio. Re-run the guest-extractor on these 6 descriptions to populate `by_episode`. |

## The 6 missing episodes

| # | Date | Title | Description guests |
|---|---|---|---|
| **#1** | 2013-10-18 | *Walmart and Bitcoin, Amazon, Bitcoin Trust, Mining* | Andreas Antonopoulos, Derrick J. Freeman, Davi Barker, MadBitcoins |
| **#3** | 2013-11-01 | *Bitcoin Anonymity, Zerocoin, Altcoins & Bitcoin ATMs* | Andreas, Adam B. Levine, Davi Barker, MadBitcoins |
| **#4** | 2013-11-09 | *$300, Silk Road 2.0, Selfish Mining, Bitcoin Jokes* | Andreas, Adam B. Levine, Davi Barker, MadBitcoins |
| **#455** | 2025-05-17 | *Coinbase LEAK! — Arizona — Steak n Shake — More Companies* | Juan S Galt, Dan Eve, Robert Allen, Thomas Hunt |
| **#467** | 2025-09-13 | *Debate Recap — Gen Alpha — Trez Sink — $1M Parabola?* | Jed (OpenCryptoX), Victoria Jones, Thomas Hunt |
| **#477** | 2026-01-02 | *90K Returns — ATM Crime — Crime Pays — Happy Holidays* | Thomas Hunt (solo episode) |

**Pattern:** 3 episodes from the very beginning (#1, #3, #4) and 3 recent ones (#455, #467, #477). The early descriptions either use a one-line `Featuring X, Y, Z, and W` list (#1) or name a guest outside the regular panel (#3, #4). The recent ones replace the standard `Featuring…` block with all-caps `FEATURING:` plus a colon, and #455 also opens with a non-standard intro (`"Secret business plan: ..."`).

## Local audio coverage (all 6 present, all playable)

Audio under `/Volumes/20TB-FOUND/AI-DATA/backups/bitcoingroup-audio/audio/`. Verified via `ffprobe`:

| # | File | Size | Duration |
|---|---|---|---|
| #1 | `TBG-001.m4a` | 38.4 MB | 39:44 |
| #3 | `TBG-003.m4a` | 27.6 MB | 28:35 |
| #4 | `TBG-004.m4a` | 22.7 MB | 23:30 |
| #455 | `TBG-455.m4a` | 97.3 MB | 1:40:36 |
| #467 | `TBG-467.m4a` | 80.4 MB | 1:22:49 |
| #477 | `TBG-477.m4a` | 15.4 MB | 15:54 (short — #477 turned out to be a Thomas-solo episode, per Spreaker + YT description) |

No 0-byte or corrupt files. The folder still holds all 482 `.m4a` files reported by the earlier audit.
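The `ffprobe` check could be scripted roughly as follows. This is a minimal sketch, not the script that produced the table: it assumes `ffprobe` is on `PATH`, reports sizes in decimal megabytes (10^6 bytes, an assumption about how the table was computed), and flags a 0-byte file or a missing duration as suspect. A truly corrupt file makes `ffprobe` exit non-zero, which `check=True` turns into an exception.

```python
import json
import subprocess

def probe(path: str) -> dict:
    """Return the 'format' section of ffprobe's JSON output for one file."""
    # -v error hides the banner; -show_format gives container-level size and
    # duration; -of json makes the output machine-readable.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_format", "-of", "json", path],
        capture_output=True, text=True, check=True,  # non-zero exit raises
    ).stdout
    return json.loads(out)["format"]

def summarize(fmt: dict) -> tuple[float, str]:
    """(size in decimal MB, duration as M:SS or H:MM:SS) from a format dict."""
    size_mb = round(int(fmt["size"]) / 1_000_000, 1)
    total = round(float(fmt["duration"]))
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return size_mb, (f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}")

def is_suspect(fmt: dict) -> bool:
    """Flag 0-byte files and files that report no usable duration."""
    size = int(fmt.get("size") or 0)
    duration = float(fmt.get("duration") or 0)
    return size == 0 or duration <= 0
```

In use, this would be a loop over `TBG-*.m4a` in the audio folder calling `summarize(probe(path))` and `is_suspect(...)` on each file.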

## Local transcript coverage (5 of 6 functional)

Transcripts under `~/Sites/1n2.org/tbg-mirrors/transcripts/TBG-NNN.html`. Body-text word counts:

| # | Transcript | Status |
|---|---|---|
| #1 | 6,443 words | ✓ good |
| #3 | 5,012 words | ✓ good |
| #4 | 3,674 words | ✓ good |
| **#455** | **538 words, all "music music music…"** | ❌ **BROKEN** — Whisper detected music throughout rather than speech. Likely the audio leads with a long music intro and Whisper's VAD never engaged. |
| #467 | 14,274 words | ✓ good |
| #477 | 2,259 words | ✓ good (matches 15min runtime) |
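A crude automatic check for a Whisper music loop like #455 is to count body-text words and test whether a single token dominates. The sketch assumes each transcript is plain HTML with its text inside `<body>`; the word regex and the 50% threshold are arbitrary choices for illustration, not the method behind the counts above.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class BodyText(HTMLParser):
    """Collect text inside <body>, ignoring <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.chunks.append(data)

def body_words(html: str) -> list[str]:
    """Lower-cased words from the <body> text of one transcript page."""
    parser = BodyText()
    parser.feed(html)
    parser.close()
    return re.findall(r"[a-z0-9']+", " ".join(parser.chunks).lower())

def looks_like_music_loop(words: list[str], threshold: float = 0.5) -> bool:
    """True if the transcript is empty or one token exceeds `threshold` of it."""
    if not words:
        return True
    _, top_count = Counter(words).most_common(1)[0]
    return top_count / len(words) > threshold
```

A frequency test like this does not need to know the word "music" in advance, so it would also catch other single-token hallucination loops.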

**The broken #455 transcript is a separate problem from the `by_episode` gap.** The guest-extractor populates `by_episode` from `yt-descriptions.json`, not from transcripts, so transcript quality is not why any of the 6 (including #455) is missing; see the "extractor bug" section below.

## External-source coverage — full redundancy on Spreaker

The show's canonical podcast distribution is via **Spreaker** under "Forbidden Knowledge Network / The World Crypto Network Podcast":
- iTunes id `825708806` → resolves to `https://www.spreaker.com/show/3478703/episodes/feed`
- Feed contains **1000 items**, 317 of them parseable as TBG episode numbers (range #1–#496), 679 are clips/recaps/other WCN content
- All 6 missing episodes are present, with podcast-quality MP3 enclosures within roughly 1% of the local file sizes, except #1, which is a different cut (see table)
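The feed-matching step might look like the sketch below. It assumes a standard podcast RSS layout (`<item>` with `<title>`, `<enclosure>` and `<itunes:duration>`) and matches titles against `The Bitcoin Group #N`, the pattern seen in the Spreaker titles in the table below. `audit_external.py` may do this differently; `fetch_feed`, `parse_feed` and `coverage` are hypothetical names.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://www.spreaker.com/show/3478703/episodes/feed"
ITUNES = "{http://www.itunes.com/dtds/podcast-1.0.dtd}"  # iTunes RSS namespace
EPISODE_RE = re.compile(r"The Bitcoin Group #(\d+)")

def fetch_feed(url: str = FEED_URL) -> bytes:
    """Download the raw RSS bytes (ElementTree handles the encoding)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()

def parse_feed(xml_data) -> list[dict]:
    """One dict per <item>; 'episode' is None when the title has no TBG number."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        m = EPISODE_RE.search(title)
        enc = item.find("enclosure")
        items.append({
            "title": title,
            "episode": int(m.group(1)) if m else None,
            "url": enc.get("url") if enc is not None else None,
            "bytes": int(enc.get("length") or 0) if enc is not None else 0,
            "duration": item.findtext(f"{ITUNES}duration"),
        })
    return items

def coverage(items: list[dict], wanted: list[int]) -> dict[int, list[dict]]:
    """Map each wanted episode number to the feed items carrying it."""
    return {ep: [it for it in items if it["episode"] == ep] for ep in wanted}
```

For this audit the call would be `coverage(parse_feed(fetch_feed()), [1, 3, 4, 455, 467, 477])`; an empty list for any episode would mean Spreaker lacks it.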

| # | Spreaker title | MP3 size | Duration match |
|---|---|---|---|
| #1 | *The Bitcoin Group #1 (Live) - Walmart and Bitcoin...* | 34.5 MB | local 39:44 vs Spreaker 35:59 — **different cut** (local is ~4min longer; likely the original YouTube live version, Spreaker is the edited rebroadcast) |
| #3 | *The Bitcoin Group #3 (Live) - Bitcoin Anonymity...* | 27.5 MB | identical (1715s) |
| #4 | *The Bitcoin Group #4 (Live) - $300 - Silk Road 2.0...* | 22.6 MB | identical (1410s) |
| #455 | *The Bitcoin Group #455 - Coinbase LEAK!* | 96.6 MB | identical (6036s) |
| #467 | *The Bitcoin Group #467 - Debate Recap...* | 79.5 MB | match (~2 s difference) |
| #477 | *The Bitcoin Group #477 - 90K Returns...* | 15.3 MB | match (~4 s difference) |
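The match vs. different-cut call in this table can be made mechanical with a duration tolerance and a size tolerance. The thresholds below (5 s, 1.5%) are assumptions, chosen so that the 2 to 4 s duration gaps and the ~1.1% size gap on #467 count as matches while #1's gap of almost 4 minutes does not; `compare_cut` is a hypothetical helper.

```python
def to_seconds(ts: str) -> int:
    """Parse 'SS', 'M:SS' or 'H:MM:SS' into whole seconds."""
    total = 0
    for part in ts.strip().split(":"):
        total = total * 60 + int(part)
    return total

def compare_cut(local_dur: str, remote_dur: str,
                local_mb: float, remote_mb: float,
                max_dur_gap_s: int = 5, max_size_gap: float = 0.015) -> str:
    """'match' if duration and size both agree within tolerance, else 'different cut'."""
    dur_gap = abs(to_seconds(local_dur) - to_seconds(remote_dur))
    size_gap = abs(local_mb - remote_mb) / local_mb  # relative to the local file
    if dur_gap <= max_dur_gap_s and size_gap <= max_size_gap:
        return "match"
    return "different cut"
```

Checking both duration and size catches re-encodes that keep the runtime but change the content, as well as edits that trim a few minutes.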

For #455, #467, #477 (the post-2025 episodes), Spreaker ALSO publishes **auto-generated transcripts** (SRT/VTT/TXT formats) via `transcription.spreaker.com/starship/…`. The older 2013 episodes don't have Spreaker transcripts (the feature only started around 2023).
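Using the timecodes from these files means parsing SRT. A minimal reader for standard SRT is sketched below; it assumes Spreaker's files follow the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` layout, which has not been checked against the downloaded files. `Cue` and `parse_srt` are hypothetical names.

```python
import re
from dataclasses import dataclass

# "HH:MM:SS,mmm --> HH:MM:SS,mmm"; also accepts "." as the millisecond separator.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)

@dataclass
class Cue:
    start: float  # seconds from the start of the episode
    end: float
    text: str

def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(srt: str) -> list[Cue]:
    """Split an SRT file into cues; blocks without a timing line are skipped."""
    cues = []
    blocks = re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            m = TIMING.search(line)
            if m:
                g = m.groups()
                text = " ".join(ln.strip() for ln in lines[i + 1:] if ln.strip())
                cues.append(Cue(_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues
```

Keeping cues as start/end seconds makes it straightforward to map a quoted line back to a position in the audio.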

### Other external sources checked

- **SoundCloud** — only one URL anywhere in the corpus, and it's a guest appearance by Thomas Hunt on someone else's show. No TBG SoundCloud presence.
- **Stitcher** — no URLs found.
- **Libsyn / Anchor / Buzzsprout / Megaphone** — no URLs found.
- **Internet Archive / Wayback Machine** — not probed; redundant given Spreaker has full coverage.
- **YouTube** — primary source; all 6 still have at least one of {WCN, Mad Bitcoins} channel video IDs in `data.json`. (Re-downloading from YouTube is possible if Spreaker ever went down.)

### Episodes with duplicate Spreaker entries (Spanish re-uploads)

Spotted during the scan — not relevant to our 6, but logging for awareness: **TBG #209, #246, #247, #248** have 2 Spreaker entries each. Inspection of the audio URLs suggests these are Spanish-translation re-uploads (`el_grupo_bitcoin_…` slugs alongside `the_bitcoin_group_…` slugs).
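Duplicates like these can be flagged by grouping feed items on episode number and labeling each entry by its audio-URL slug. The item shape (dicts with `episode` and `url` keys) and the helper names are hypothetical; the two slug prefixes are the ones observed above.

```python
from collections import defaultdict

def find_duplicates(items: list[dict]) -> dict[int, list[dict]]:
    """Episodes that appear more than once in the feed, with all their entries."""
    by_ep: dict[int, list[dict]] = defaultdict(list)
    for it in items:
        if it.get("episode") is not None:
            by_ep[it["episode"]].append(it)
    return {ep: its for ep, its in by_ep.items() if len(its) > 1}

def slug_language(url: str) -> str:
    """Guess the upload language from the audio URL slug."""
    if "el_grupo_bitcoin_" in url:
        return "spanish"
    if "the_bitcoin_group_" in url:
        return "english"
    return "unknown"
```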

## Recovery actions taken

Saved to `/Volumes/20TB-FOUND/AI-DATA/backups/bitcoingroup-audio/recovered/` (does not overwrite primary `audio/` folder):

- `TBG-455.spreaker.txt` — **18,663 words** of real transcript content (replaces the broken 538-word local one). Opening line: *"The Bitcoin Group, the American original... We'd like to welcome our panelists, Dan Eve, the Cr…"* (confirms Dan Eve, Juan Galt, Robert Allen on the panel per the YT description).
- `TBG-467.spreaker.txt` — 14,804 words (redundant backup; local already works).
- `TBG-477.spreaker.txt` — 2,365 words (redundant backup; local already works).

The 3 MP3s for #1/#3/#4 from Spreaker were NOT downloaded: the local m4a files already play cleanly and are either the same cut as Spreaker's (#3, #4) or a longer one (#1).

## Why these 6 didn't make it into `by_episode` (the actual bug)

This is a **guest-extractor parser bug**, not a data-availability problem. Verified by reading each YT description:

- **#1**: leads with `"***** UPDATE: NEW EDITED VERSION NOW AVAILABLE..."` then `"Featuring Andreas M. Antonopoulos (http://..."` on a single line. The extractor probably looks for line-by-line `Name (url)` entries; this one-line "Featuring X, Y, Z, and W" format isn't matched.
- **#3, #4**: standard `Featuring…\n<name> (url)\n…` format. Should have worked. Both list `Adam B. Levine`, who is not a recognized co-host; the extractor may be filtering names against a whitelist of known WCN panelists for these early episodes.
- **#455**: starts with `"Secret business plan: Buy and Hold Bitcoin..."` then `"FEATURING:"` in all-caps with a colon. Likely the extractor matches `"Featuring…"` (with the ellipsis character `…`) and doesn't match the all-caps colon form.
- **#467, #477**: same all-caps `FEATURING:` format as #455. Same parser miss.
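A more tolerant extractor could treat all three cases the same way: find a `featuring` header at the start of a line in any case, with an optional `…`, `...` or `:`, then read either the rest of that line as a comma/`and`-separated list or the following non-blank lines as one name each, with no whitelist. This is a sketch under those assumptions, not a patch to the existing extractor (whose code is not shown here); `extract_guests` is a hypothetical name, and the sample descriptions it was checked against are abbreviated illustrations rather than the real text.

```python
import re

# A "featuring" header at the start of a line, any case, optionally followed by
# "…", "..." or ":"; group 1 is whatever else sits on that line.
HEADER = re.compile(r"^[ \t]*featuring\b[ \t]*(?:…|\.{3}|:)?[ \t]*(.*)$",
                    re.IGNORECASE | re.MULTILINE)
PAREN = re.compile(r"\([^)]*\)")  # "(http://...)" or "(OpenCryptoX)"
URL = re.compile(r"https?://\S+")
# Separators for the one-line form: ", ", ", and ", " and ".
SPLIT = re.compile(r"\s*,\s*(?:and\s+)?|\s+and\s+", re.IGNORECASE)

def _clean(raw: str) -> str:
    """Drop parenthesised text and bare URLs, then trim bullets and punctuation."""
    return URL.sub("", PAREN.sub("", raw)).strip(" \t-*•:;,(")

def extract_guests(desc: str) -> list[str]:
    """Return guest names from a YT description, or [] if no header is found."""
    m = HEADER.search(desc)
    if not m:
        return []
    rest = m.group(1).strip()
    if rest:
        # One-line form (#1): "Featuring A (url), B (url), and C (url)"
        parts = SPLIT.split(PAREN.sub("", rest))
    else:
        # Block form (#3, #4, #455, #467, #477): one name per line after the
        # header, ending at the first blank line once names have started.
        parts = []
        for line in desc[m.end():].splitlines():
            if not line.strip():
                if parts:
                    break
                continue
            parts.append(line)
    names = [_clean(p) for p in parts]
    return [n for n in names if n]  # no whitelist: keep every non-empty name
```

Dropping the whitelist means any line inside the block is accepted as a name, so a stray non-guest line there would be picked up; that trade-off seems acceptable given that each of these descriptions already carries the full panel list.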

## Recommended next steps

1. **Patch the guest-extractor** to handle two description-format variants and one filtering issue:
   - one-line "Featuring X, Y, Z, and W" (covers #1)
   - `FEATURING:` all-caps with colon (covers #455, #467, #477)
   - whitelist relaxation for early episodes that include outside-the-show guests like Adam B. Levine (covers #3, #4)
2. **Re-run the extractor** against `yt-descriptions.json` to populate `by_episode` for these 6. The full panel list per episode is already in the description for each — no human review needed.
3. **For #455 specifically**: replace the local broken transcript with the Spreaker-recovered text under `tbg-timeline/_recovered/`. Re-run any downstream consumers (predictions, narratives, signoff-tracker) that fed off the local transcript.
4. **Optional**: ingest the Spreaker SRT files for #455/#467/#477 — they include timecodes which the local Whisper-HTML transcripts don't preserve. Useful for the predictions tracker and clip-engine.
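Step 2 could be run as a backfill that only touches episodes with no `by_episode` entry, so existing data is never overwritten. The schema here is an assumption (`by_episode` keyed by episode-number string, `yt-descriptions.json` as a flat `{episode: description}` map), and `extract` stands in for whatever function the patched extractor exposes. Writing to a separate output path keeps the original `guests.json` intact for diffing.

```python
import json
from typing import Callable

Extractor = Callable[[str], list[str]]

def backfill(guests: dict, descriptions: dict[str, str], extract: Extractor) -> list[str]:
    """Fill guests['by_episode'] only where an episode has no entry yet.

    Returns the episode keys that were added; existing entries are left alone.
    """
    by_ep = guests.setdefault("by_episode", {})
    added = []
    for ep, desc in descriptions.items():
        if ep in by_ep:
            continue
        names = extract(desc)
        if names:
            by_ep[ep] = names
            added.append(ep)
    return added

def backfill_files(guests_path: str, desc_path: str, out_path: str,
                   extract: Extractor) -> list[str]:
    """Load both JSON files, backfill, and write the result to out_path."""
    with open(guests_path, encoding="utf-8") as f:
        guests = json.load(f)
    with open(desc_path, encoding="utf-8") as f:
        descriptions = json.load(f)
    added = backfill(guests, descriptions, extract)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(guests, f, indent=2, ensure_ascii=False)
    return added
```

If the patch is correct, the returned list should contain exactly the 6 episodes from this audit; anything beyond that would be worth reviewing before merging.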

## Audit scripts (in `tbg-timeline/`)

- `audit_external.py` — parses the Spreaker RSS, matches by TBG episode #, reports coverage
- (already-committed) `scan_hosts.py`, `scan_hosts2.py` from the absent-host research — also useful here as transcript checkers

---

*Investigation only — no timeline data was modified. The 6 episodes will join `by_episode` once the extractor patch lands.*
