Speed Up GPT-J-6B: From Minutes to Seconds with GGML Conversion

November 26, 2023

A guide to dramatically improving GPT-J-6B inference speed. I'll show you how I converted a Hugging Face model from its standard format to GGML, slashing wait times from minutes to seconds on my own machine. Covers the conversion process, memory requirements, and running the optimized model.
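For orientation, the workflow the post covers roughly follows the GPT-J example in the ggml repository. A minimal sketch, assuming the ggerganov/ggml repo layout circa late 2023; script paths, quantize arguments, and output filenames vary between ggml versions, so check your checkout's `examples/gpt-j` README:

```shell
# Fetch ggml and build its GPT-J example binaries
git clone https://github.com/ggerganov/ggml
cd ggml && mkdir build && cd build
cmake .. && make -j gpt-j gpt-j-quantize
cd ..

# Download the Hugging Face checkpoint (~24 GB of fp32 weights)
git lfs install
git clone https://huggingface.co/EleutherAI/gpt-j-6b models/gpt-j-6b

# Convert to GGML format; the trailing "1" selects fp16 output (~12 GB)
python3 examples/gpt-j/convert-h5-to-ggml.py models/gpt-j-6b 1

# Optionally quantize to 4-bit, cutting memory to roughly 4 GB
# (older ggml builds take a numeric type code here, e.g. 2 for q4_0)
./build/bin/gpt-j-quantize models/gpt-j-6b/ggml-model-f16.bin \
    models/gpt-j-6b/ggml-model-q4_0.bin 2

# Run inference with the quantized model
./build/bin/gpt-j -m models/gpt-j-6b/ggml-model-q4_0.bin \
    -p "Hello, my name is"
```

The fp16 conversion alone is usually enough for the minutes-to-seconds speedup on CPU; quantization mainly trades a little quality for a much smaller memory footprint.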

Related Posts

Teaching Taste to an Agent: 4,096 Classical Chinese Verses, 7 Composition Rules, and the Skill That Paints Them

On turning Han dynasty oracular poetry into AI-generated ink paintings by encoding art direction as a reusable agent skill. Also: why content filters hate ancient China, and what 'painterly language' means when the painter is a diffusion model.

February 14, 2026

The Junzi Hypothesis: What If Alignment Is a Seed, Not a Cage?

A raw brainstorm on whether initial model weights could predispose toward naturally aligned behavior, and why the Confucian junzi might be a better alignment target than 'helpful and harmless.'

January 25, 2026

Whisper Cantonese Transcription: Force zh, Not yue

Counter-intuitive finding: forcing Cantonese (yue) on Whisper causes decoder collapse. Force Chinese (zh) instead for coherent, searchable transcripts—even for Cantonese content.

January 13, 2026