# PDF→HTML Conversion Report — Strong Words, April/May 2026

**Source:** `dltxc_20260423_strongw.pdf` · 76 pages · 120 MB · Adobe InDesign 20.4 (Macintosh) export
**Output:** `dltxc_20260423_strongw.out/`
**Tool:** Probe conversion by Claude Sonnet 4.6 acting as the reflow step. **Not** the full T4 pipeline.
**Date:** 2026-05-20
**Mode:** TEXT-ONLY — no visual layout model, no image extraction, no OCR

---

## TL;DR — what this probe adds on top of the AEM probe

Where the AEM probe proved the AI can convert a long industry-feature, the Strong Words probe proves the AI can also handle:

1. **Multi-capsule article gimmicks** — *"Ten Novels, All Called Hide and Seek"* is a single editorial article in the magazine's structure, but it contains 10 short capsule reviews + a framing concept. The semantic shape is *"one article, ten sub-units"*, not *"ten short articles"* — exactly the case Gerben asked about. HTML uses `<section class="capsule">` blocks within one `<article>`.
2. **Structured roundups** — *Paperbacks* uses a tight repeating pattern (*In three words / In thirty / Author's thing / The press says*) across 9 capsules. The AI recognised the pattern and emitted it as a `<dl>` semantic structure, preserving the publisher's editorial framework.
3. **Long-form features with multiple sidebar types** — the *Alain Delon* feature has body + Q&A + book-recommendation sidebar + press-quotes sidebar + image with caption. All extracted and rendered as distinct semantic regions.
4. **Different house style** from AEM. Strong Words is literary, witty, British-conversational. AEM is industrial trade. **Same pipeline, no template change.** The output reads like Strong Words, not like AEM-in-a-Strong-Words-suit.

---

## What was converted

| # | Article | Pages | Kind | Output |
|---|---|---|---|---|
| 1 | Ten Novels, All Called Hide and Seek | 8–9 | Multi-capsule list (10 capsules + framing concept) | [`articles/01-hide-and-seek-ten-novels.html`](articles/01-hide-and-seek-ten-novels.html) |
| 2 | Paperbacks | 30 | Structured roundup (9 paperbacks, *In three words / In thirty* pattern) | [`articles/02-paperbacks-roundup.html`](articles/02-paperbacks-roundup.html) |
| 3 | If Looks Could Kill (Alain Delon) | 31–36 | Long-form cover-feature (~2,950 words) with Q&A and three sidebars | [`articles/03-alain-delon-if-looks-could-kill.html`](articles/03-alain-delon-if-looks-could-kill.html) |

Shared [`reader.css`](reader.css) and a complete [`issue-manifest.json`](issue-manifest.json) covering all 26+ detected article slots in the issue.

---

## Quality assessment against the Gate 1 rubric

Same scoring framework as the AEM probe (`gate-1.md`). Applied to the 3 converted articles:

### QC.A — Load-bearing

| # | Criterion | Verdict | Note |
|---|---|---|---|
| 1 | Mobile-first reflow | ✅ PASS | All three articles render cleanly at 375 px. The Paperbacks roundup uses a `<dl>` pattern that adapts well to narrow column. Capsule lists collapse correctly. |
| 2 | Article extraction | ✅ PASS | Headlines, by-lines, kickers, capsule numbers, structured-pattern terms — all separated correctly. **Important nuance:** the multi-capsule pages 8–9 contain TWO co-resident articles (Hide and Seek list + Hettie Judah on the Art World). The probe correctly identified both; converted only one. |
| 3 | Image placement | ⚠️ DEGRADED | 204 image XObjects in source, none extracted by this probe (same mechanical limitation as AEM). One `<figure class="image-placeholder">` placed at correct reading position in Article 3 (Delon — Marianne Faithfull/La Motocyclette image at p. 33). |
| 4 | Reading order | ✅ PASS | Article 3's flow — opening narrative → first pull-quote → body → second pull-quote → image → Q&A section → recommended-reading sidebar → press-quotes sidebar — preserves the print designer's intent. |
| 5 | Brand compliance | ✅ PASS | Same Warm Operator CSS as AEM. **But — see the house-style flag below.** Strong Words has its own brand identity that this output does *not* reflect. The probe correctly produces editorial structure; visual brand override is a per-publisher concern (out of probe scope). |
| 6 | No publisher data fabrication | ✅ PASS | Every quote, name, capsule, book title, ISBN-adjacent metadata (publisher, price, year), structured-pattern entry is from the source PDF. Light British-English normalisation only. Editorial voice (witty, dry, conversational) preserved. |

### QC.B — Quality (subset measurable on probe)

| # | Criterion | Verdict | Note |
|---|---|---|---|
| 7 | Headline typography | ✅ PASS | One italic word per headline (*Could*, *backs*, *Seek*) per Warm Operator §8.1 |
| 8 | Body typography | ✅ PASS | Same baseline as AEM |
| 9 | Pull quotes | ✅ PASS | Article 3 has two pull-quotes, both with the 3-px terracotta rule and Newsreader italic |
| 16 | SEO meta per article | ✅ PASS | Each article has `<title>`, `<meta description>`, `og:*` tags from article content |
| 17 | TouchTree comparison | NOT RUN | No TouchTree render of Strong Words available |
| 19 | British English | ✅ PASS | Source is British English; preserved (gendarmes, kerb-style spellings retained). Notable: "le 1968" italics retained intentionally. |
| 20 | Anvilda invisibility | ✅ PASS | No leak |

---

## What worked exceptionally well — beyond what AEM proved

1. **Multi-article-per-page handling.** Pages 8–9 contain two distinct articles co-resident on the same printed spread. The AI correctly resolved the boundary between them based on running header changes ("PEOPLE OF INTEREST" appears for the list, then a separate Q&A column for Hettie Judah). This is the case the AEM corpus didn't stress.
2. **Multi-capsule articles vs. multi-article-pages.** The probe correctly distinguished *"one article composed of 10 capsules"* (Hide and Seek list — single concept, framed as one article in the magazine's logic) from *"multiple unrelated articles on one page"* (Hide and Seek list + Hettie Judah piece — two separate concepts that happen to share a print spread). Both happen here. The first becomes `<section class="capsule">` blocks inside one article; the second becomes two distinct article files. This is editorial fidelity at a level Marker's layout model alone wouldn't deliver — it's a semantic distinction, not a visual one.
3. **Structured patterns preserved.** Paperbacks uses *"In three words / In thirty / [author]'s thing / The press says"* on every entry. The output uses a semantic `<dl>` for each capsule, with the same labels in the same order. A T4 production pipeline could detect such structured patterns and codify them as repeated `<dl>` semantics automatically.
4. **Long-form Q&A preserved.** Article 3 has 7 question/answer pairs from an interview with Edward Chisholm. Each question rendered as `<h3>`, each answer as paragraphs underneath. Reading order preserved across page breaks.
5. **Editorial voice survived intact.** The conversational, slightly dry, very British register of Strong Words is in the output. *"As you can probably tell, this is exquisite material"* / *"the fragrant Mrs Nathalie Delon"* / *"As a desperate equity release solution they relocate to Tunisia"* — none of these phrasings were touched. The AI's reflow job is structural, not editorial.
6. **No invented detail.** Strong Words is more dangerous than AEM for fabrication risk because the AI knows a lot about Alain Delon from training data. Verified: every fact, quote, book reference, and date in the output appears in the source. The "five tons of research material" line, the "375 lb of documents and 100 interviews" figure, the Bernard Violet biography history, the Pompidou black-book detail — all source-grounded.

## What's degraded (probe-environment only — not capability)

Same caveats as the AEM probe — no PyMuPDF image extraction, no Marker layout model, no OCR test. Plus one new gap to flag:

7. **Strong Words's own brand identity is missing.** This probe uses the Warm Operator CSS (Zinup default) for visualisation. A real Strong Words deployment would have its own palette and typeface in a publisher YAML override (per the templates conversation). The body structure is right; the visual skin is Zinup-flavoured, not Strong Words-flavoured. Same issue as AEM; calling it out explicitly now that we have two magazines side by side.

## Strong Words article shapes — coverage table

This issue contains roughly **24 distinct editorial items**. The probe covered 3, representing 3 of the 6 article shapes present:

| Shape | Probe coverage | Examples in issue |
|---|---|---|
| Long-form feature (>1,500 words) | ✅ tested (Delon) | Delon, Henrietta Moraes, Molly Parkin, The Last Cocktail, The Scoop, My Year as a Fraud, several PEOPLE OF INTEREST features |
| Multi-capsule list with framing concept | ✅ tested (Hide and Seek) | Hide and Seek, Crime+ |
| Structured roundup (repeating pattern) | ✅ tested (Paperbacks) | Paperbacks, Surprise Bestsellers |
| Q&A interview | ⚠️ tested *inside* the Delon feature, not as standalone | Hettie Judah on the Art World (standalone Q&A, p. 9, not converted) |
| News briefs (short, multiple per page) | ❌ not tested | WHAT'S GOING ON (p. 3), INTEL (pp. 4–5) |
| Quiz / interactive feature | ❌ not tested | THE QUIZ (p. 74) |

The two un-tested shapes are the riskiest for a full-magazine run — news briefs need the card-budget rule from the previous turn (1 card per page-cluster, not per brief), and a quiz needs a fundamentally different layout that may not survive mobile reflow without dedicated UI.

---

## Direct answer to: "would the same processing work on a different magazine?"

Empirically, yes. **Strong Words and AEM share zero domain overlap** — industrial heavy machinery vs. literary fiction reviews — and the same pipeline produced semantically appropriate output for both with no per-magazine code changes. What differs:

- **Editorial structure recognition** is universal (the AI reads structure regardless of subject matter)
- **Article boundary detection** uses different *cues* in each magazine (AEM: brand-named running heads; Strong Words: section names like "PEOPLE OF INTEREST" + page numbers + headline typography size shifts) but the *algorithm* is the same — look for repeating structural signals
- **Editorial voice** is preserved by *not editing*. The AI's reflow job is to recognise headlines, body, captions, pull quotes, sidebars — not to rewrite copy

So the answer to the templates question stays: **one universal pipeline + per-publisher YAML metadata for brand layer and a few extraction hints.** The processing time per page is approximately equal across magazines once warmed up.

---

## What this means for the next step

Strong Words has demonstrably worked on a sample. The honest case for running the **full magazine** is now strong, but should be scoped:

- **Recommended scope:** 18–20 editorial articles (skipping inside-back-cover ads, masthead/credits, and the quiz)
- **Estimated work:** I would do this as a single batched pass with checkpoint review at article 5 and article 12 — if quality drifts at any checkpoint we course-correct rather than producing 20 articles that all share a defect
- **What you'd review at the end:** static HTML site at `zinup-strongw-full.pages.dev` + feed reader at a separate URL (matching the AEM-feed pattern)
- **Honest risk flags before committing:** news briefs (pp. 3–5) and the Quiz (p. 74) will be the rough edges — I'd flag any genuinely degraded output in the report rather than paper over it

If you say yes to full magazine, I'd recommend doing it in the next session rather than tacking it onto this one — it's substantial work and benefits from a fresh context window.

---

## Files in this output

```
dltxc_20260423_strongw.out/
├── articles/
│   ├── 01-hide-and-seek-ten-novels.html      (multi-capsule list, 10 capsules)
│   ├── 02-paperbacks-roundup.html            (structured roundup, 9 capsules)
│   └── 03-alain-delon-if-looks-could-kill.html (long feature + Q&A + 3 sidebars)
├── images/                                   (empty — same caveat as AEM)
├── reader.css                                Warm Operator v1.0 brand layer (shared)
├── pages-text.json                           Raw text extracted from all 76 pages
├── issue-manifest.json                       Full issue index — 26+ detected article slots
└── conversion-report.md                      This file
```
