Two Pages and Eleven Words: How Pagefind Exposed a Year-Long Crawlability Bug

Two pages. Eleven words.

That's what Pagefind indexed the first time I ran it against 8-Bit Oracle's production build. Two pages. Eleven words. Out of 256 pages of hexagram commentary, FAQ content, and divination tools.

I'd been building this site for over a year. It had structured data, OpenGraph images, a sitemap. It looked fine in a browser. Google could find the homepage. Everything seemed indexed.

I'd actually noticed the crawlability issues before — pages not appearing in search results, content missing from Google's cached versions. But I'd attributed it to a fundamental limitation of Next.js App Router's RSC streaming architecture. Server components stream content as RSC payload in <script> tags, and I assumed crawlers simply couldn't parse that format. It felt like a framework-level trade-off I'd have to live with.

It wasn't. The framework was fine. My auth gate was the problem.

The invisible site

Adding Pagefind — a static site search indexer — was supposed to be a small quality-of-life feature. Let users search hexagrams, commentary, FAQ answers. Wire it up, point it at the build output, done.

Instead it became a diagnostic tool. Pagefind indexes what's actually in the HTML. Not what React hydrates client-side. Not what's in RSC payload <script> tags. The HTML. And the HTML was empty.

The root cause was a pattern I'd written early in the project and never questioned: an auth gate in UnifiedAuthProvider that checked isHydrated from a Zustand store. During SSG, the store never hydrates. isHydrated stays false. The provider returns null. All that reaches the static HTML is the fallback of the wrapping <Suspense> boundary:

<div className="min-h-screen" />

Every page. Every hexagram. Every piece of commentary I'd written, every FAQ answer, every variant interpretation across four languages. All of it, at the SSG layer, was a single empty div.

The content existed — buried in RSC payload scripts that browsers execute on hydration. Humans visiting the site never noticed because hydration happened fast. But crawlers, search engines, and now Pagefind saw exactly what was in the HTML: nothing.
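
The failure mode can be modeled without React at all. The sketch below is a deliberately simplified, framework-free stand-in: it collapses the provider and its Suspense boundary into one function, strings stand in for the component tree, and AuthState, renderGated, and renderUngated are hypothetical names rather than the real UnifiedAuthProvider code. It only illustrates why a build-time render with isHydrated stuck at false emits nothing but the fallback.

```typescript
// Framework-free model of the auth gate. Strings stand in for rendered HTML.
// All names here are illustrative assumptions, not the site's real code.

interface AuthState {
  isHydrated: boolean; // stays false during the static build
  user: string | null;
  isLoading: boolean;
}

// What the build sees: no browser, so the store is never hydrated.
const buildTimeState: AuthState = { isHydrated: false, user: null, isLoading: true };

const FALLBACK = '<div class="min-h-screen"></div>';

// Gated version: bail out before rendering any children.
function renderGated(state: AuthState, children: string): string {
  if (!state.isHydrated) {
    return FALLBACK; // the only markup that reaches the static HTML
  }
  return `<main>${children}</main>`;
}

// Ungated version: children always render; auth-dependent parts are expected
// to cope with the safe defaults (user: null, isLoading: true).
function renderUngated(_state: AuthState, children: string): string {
  return `<main>${children}</main>`;
}

const page = "<h1>Hexagram 1</h1><p>Commentary text</p>";
console.log(renderGated(buildTimeState, page)); // fallback only
console.log(renderUngated(buildTimeState, page)); // full content
```

The gated render only produces content once isHydrated is true, which never happens at build time.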

Two layers of invisibility

Fixing the auth gate was necessary but not sufficient. Removing the Suspense/isHydrated pattern meant SSG could render the page structure — but several pages still rendered their substantive content only through client-side interactive components.

The FAQ page? All Q&A pairs lived inside a <FAQAccordion> client component. SSG HTML: empty.

Plum Blossom divination? Entirely a <PlumBlossomClient> component. SSG HTML: a page title and nothing else.

Hexagram commentary variants — the Sheldrake, Foundation, Mucha interpretations that form the core intellectual content of the site? Rendered inside <CommentaryPerspectives>, a tabbed client component. SSG HTML: tabs with no text.

The site had two layers of content invisibility:

1. The auth gate: blocking all SSG rendering at the provider level
2. Client-only content: interactive components that rendered text only after hydration

The fix

Three changes, each addressing a different facet of the problem:

Remove the auth gate. The if (!isHydrated) return null pattern in UnifiedAuthProvider was protecting against a flash of unauthenticated content. But the cost, making every page invisible to crawlers, was catastrophically disproportionate. The fix: remove the gate and the wrapping Suspense boundary, let SSG render the full component tree. The auth context already provides safe defaults (user: null, isLoading: true), and components that depend on auth state already handle them gracefully.

Generate static params for content pages. Several content pages — about, FAQ, features, privacy, terms, patterns, plum blossom — were missing generateStaticParams, meaning they weren't pre-rendered at build time. The hexagram pages already had static params, but their content was invisible due to the auth gate. Adding generateStaticParams across all content pages ensured Pagefind had a complete set of HTML files to index.
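
As a rough sketch of this change, assuming the content pages live under a dynamic [locale] route segment: the route layout and the locale codes below are assumptions, since this post only says the site has four locales.

```typescript
// Hypothetical locale list: the real codes are not given in this post.
const LOCALES = ["en", "zh", "ja", "es"] as const;

type LocaleParams = { locale: (typeof LOCALES)[number] };

// The Next.js App Router calls generateStaticParams at build time and
// pre-renders one page per returned params object. The file location is
// assumed here to be something like app/[locale]/faq/page.tsx.
export function generateStaticParams(): LocaleParams[] {
  return LOCALES.map((locale) => ({ locale }));
}
```

With four locales this yields four pre-rendered HTML files per content page, which is what gives a static indexer something to read.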

Add server-rendered content alongside interactive components. For each client-only component, I added a hidden <div className="sr-only"> containing the same text, server-rendered. FAQ questions and answers from the schema data. Plum Blossom descriptions from i18n translations. All hexagram variant commentaries with data-pagefind-weight="10" to boost search relevance.

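This sr-only pattern is worth pausing on. It borrows the technique used for screen reader accessibility: content that's present in the DOM but visually hidden. Where the hidden text duplicates what an interactive component already exposes, as in the commentary block, it also carries aria-hidden="true", which hides it from assistive technology, so the audience there is indexers. The interactive components remain the user-facing experience. The hidden divs ensure the content exists at the HTML level.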

11 → 6,000 → 50,000

The fix came in stages, and Pagefind tracked the score.

After removing the auth gate and adding static params to content pages: 6,121 words across 256 pages. More than a 500x improvement over eleven words. The page structure was finally visible: titles, navigation, metadata. But it was still wrong, because the hexagram commentary, the bulk of the site's content, was still locked inside <CommentaryPerspectives> tabs.

Each hexagram has extensive commentary: judgment, image, practical integration, digital artifact, historical context, citations. Multiple variant interpretations — Sheldrake, Foundation, Mucha. Across 64 hexagrams, 4 locales, and several variants each, that's tens of thousands of words of carefully written content. All of it rendered by a client component. All of it invisible at build time.

The single change that moved the needle from 6k to 50k was adding sr-only server-rendered <DFWCommentary> alongside the interactive tabs:

<div data-pagefind-weight="10" className="sr-only" aria-hidden="true">
  {variants.map((v, i) => (
    <DFWCommentary key={v.slug || i} data={v.data} locale={locale} />
  ))}
</div>

DFWCommentary has no "use client" directive — when rendered directly in a server component page, it outputs HTML at build time. The interactive <CommentaryPerspectives> tabs remain the user-facing experience. But now the same text exists in the DOM for indexers, and Pagefind found all of it.

50,080 words across 292 pages. The site's actual content, finally visible at the HTML layer.

I'd been building in public for over a year with a fundamental crawlability defect, misdiagnosing it as a framework limitation rather than a bug in my own code. I was testing the way humans use the site — in a browser, after hydration. The SSG output, the thing search engines and indexers actually see, was a shell.

The lesson

There's a gap between "works in a browser" and "exists in HTML." Modern frameworks make this gap easy to create and hard to detect. Client components, Suspense boundaries, auth gates, hydration patterns — each one is a potential point where content becomes invisible to anything that reads raw HTML.

Pagefind found the bug because it's honest. It indexes what's there. If your build output is empty divs, it tells you: two pages, eleven words.

If you're building a content-heavy site with Next.js or any SSR/SSG framework, run a static indexer against your build output. Not to add search (though that's nice). To verify that the content you think you're publishing is actually there.
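
Even without Pagefind, a crude word count over the built HTML catches the same class of bug. The following is a minimal sketch, not how Pagefind itself counts words: countVisibleWords, findThinPages, the threshold of 50, and the sample pages are all assumptions, and the regex-based tag stripping is a rough approximation rather than a real HTML parser.

```typescript
// Rough approximation of "what text is in the HTML": drop <script> and
// <style> blocks (which is where RSC payloads live), drop the remaining
// tags, then count whitespace-separated words.
function countVisibleWords(html: string): number {
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ");
  return text.split(/\s+/).filter((word) => word.length > 0).length;
}

// Return the paths whose static HTML has fewer than minWords words.
function findThinPages(pages: Record<string, string>, minWords = 50): string[] {
  return Object.entries(pages)
    .filter(([, html]) => countVisibleWords(html) < minWords)
    .map(([path]) => path);
}

// Made-up input: one empty shell with a script payload, one page with text.
const samplePages: Record<string, string> = {
  "/en/hexagram/1":
    '<div class="min-h-screen"></div><script>/* payload with plenty of words */</script>',
  "/en/about": "<main><h1>About</h1><p>" + "word ".repeat(60) + "</p></main>",
};

console.log(findThinPages(samplePages)); // [ '/en/hexagram/1' ]
```

To run this against a real build, read each generated .html file into the map. The point is the same as with Pagefind: measure the HTML, not the hydrated page.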

The middleware also needed a fix — Pagefind's search index files were being caught by the locale redirect middleware and returning 404s. Sometimes the bug is architectural. Sometimes it's a regex. Usually it's both.
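
For the middleware side, here is a sketch of the kind of exclusion involved, assuming Pagefind writes its bundle to its default /pagefind/ directory. needsLocaleRedirect, the locale codes, and the bypass pattern are hypothetical and are not the site's actual middleware.

```typescript
// Hypothetical locale codes; the real list is not given in this post.
const SUPPORTED_LOCALES = ["en", "zh", "ja", "es"];

// Paths that must bypass the locale redirect: Next.js internals, the
// Pagefind bundle directory, and anything ending in a simple file extension.
const BYPASS = /^\/(?:_next|pagefind)(?:\/|$)|\.[a-z0-9]+$/i;

// True when a request path lacks a locale prefix and should be redirected.
function needsLocaleRedirect(pathname: string): boolean {
  if (BYPASS.test(pathname)) return false;
  const firstSegment = pathname.split("/")[1] ?? "";
  return !SUPPORTED_LOCALES.includes(firstSegment);
}

console.log(needsLocaleRedirect("/pagefind/pagefind.js")); // false
console.log(needsLocaleRedirect("/faq")); // true
console.log(needsLocaleRedirect("/en/faq")); // false
```

In a Next.js middleware the same exclusion is often expressed as a negative lookahead in config.matcher instead of a check inside the handler. Either way, the search bundle has to be reachable at the path the client requests.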