# Building a Closed-Loop Content Engine
AI Disclaimer · 27 April, 2026
In a few hours I built a closed-loop template-meme remixing engine. Feed it a corpus of images, and it extracts a custom data model and generates new memes that preserve the aura, with a human-in-the-loop local webapp ranking system that makes each generation progressively better.
## Scraping the corpus
I used Bright Data's API to pull roughly 9,000 images from a specific public account. It cost about $15, and I prepaid some OpenAI credits to experiment with the new gpt-image-2 for generation.
## Extracting information
Each image runs through an extraction pass that pulls out its structural DNA into a strict JSON schema. The fields that matter:
- `format` — image_macro, chat_screenshot, vintage_art_text, etc.
- `humor_type` — ironic, post_ironic, anti_humor, depressive_confession...
- `joke_mechanic` — bait_and_switch, tonal_mismatch, quote_with_unrelated_image...
- `remix_template` — a fill-in-the-blank pattern, e.g. `"{character} doing {normal_activity} in {unhinged_context}"`
- `why_it_works` — one or two plain sentences
- `subversion_score` — 1 (conventional) to 5 (radical)
Plus OCR'd text, visual elements paired with their role in the joke, and cultural references. The full schema has ~80 enum values across format/humor/mechanic.
A real extraction from a niche meme page looks like:
```json
{
  "format": "vintage_art_recontext",
  "humor_type": "depressive_confession",
  "joke_mechanic": "quote_with_unrelated_image",
  "remix_template": "{renaissance_painting} captioned with {modern_dysfunction}",
  "why_it_works": "Prestige imagery validates a low-status confession.",
  "subversion_score": 3
}
```
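The schema can be sketched as a small dataclass with validation. The field names follow the list above; the dataclass itself and the range check are my assumptions about a minimal implementation, not the actual code:

```python
import json
from dataclasses import dataclass


@dataclass
class MemeDNA:
    """One extraction-pass result: the structural DNA of a single meme."""
    format: str
    humor_type: str
    joke_mechanic: str
    remix_template: str
    why_it_works: str
    subversion_score: int

    def __post_init__(self):
        # Enforce the 1 (conventional) to 5 (radical) range.
        if not 1 <= self.subversion_score <= 5:
            raise ValueError("subversion_score must be 1-5")


def parse_extraction(raw: str) -> MemeDNA:
    """Parse one strict-JSON extraction into a validated record."""
    return MemeDNA(**json.loads(raw))
```

In practice the enum fields (~80 values across format/humor/mechanic) would be real `Enum` types so bad extractions fail loudly instead of polluting the corpus.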
## The loop
Extracted DNA across the corpus gets compiled into a voice profile, a document the generator reads before every run: recurring themes, preferred joke mechanics with account-specific flavor, caption grammar quirks with verbatim examples, taboo topology (what it crosses vs. what it avoids), cultural anchors, and visual register.
Generation: pick a random meme as the source, generate N remix candidates conditioned on the voice profile, score them with an LLM judge, and send the winner to image generation.
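The generation step can be sketched as a small pipeline. Here `generate`, `judge`, and `render` stand in for the actual model calls, so they are passed in as callables; the function names and signature are assumptions:

```python
import random


def run_generation(corpus, voice_profile, generate, judge, render, n=8, seed=None):
    """One loop iteration: pick a random source meme, produce n remix
    candidates conditioned on the voice profile, score each with the
    judge, and render only the winner (one image call per iteration)."""
    rng = random.Random(seed)
    source = rng.choice(corpus)
    candidates = [generate(source, voice_profile) for _ in range(n)]
    winner = max(candidates, key=judge)
    return render(winner)
```

Keeping the model calls injectable also makes the loop cheap to test with stubs before burning image-generation credits.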
Then humans rate the output in a local webapp. The core rubric evolved by watching failure modes:
| Dimension | Scale | What it catches |
|---|---|---|
| Caption quality | 1-5 | Standalone funny? |
| Overall | 1-10 | Would this fit on the account? |
| Dud source | flag | Source wasn't worth remixing |
| Natural language | text | My notes on the result |
The engine reads the human labels and mutates the judge prompt based on score distributions — penalizing the failure modes that keep recurring, reinforcing what's working. The judge gets sharper. The system converges toward the voice.
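A minimal sketch of the prompt-mutation step, assuming failure modes are tagged per rated output (the tag names, threshold, and function are hypothetical, not the engine's actual code):

```python
from collections import Counter


def mutate_judge_prompt(base_prompt, ratings, threshold=0.3):
    """Append a penalty instruction for every failure mode that recurs
    in more than `threshold` of the human-labeled outputs."""
    if not ratings:
        return base_prompt
    counts = Counter(mode for r in ratings for mode in r.get("failure_modes", []))
    lines = [base_prompt]
    for mode, n in counts.most_common():
        if n / len(ratings) > threshold:
            lines.append(f"Penalize candidates that exhibit: {mode}.")
    return "\n".join(lines)
```

The real engine would also reinforce high-scoring patterns, but the core idea is the same: the judge prompt is a mutable artifact steered by human score distributions.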
An afterthought: the system could be exposed to its own engagement analytics, feeding that data back into the rubric as a new success metric via a scheduled job.
## Finding use for Gemini CLI
Better models produce better memes, but I didn't want to burn API credits.
I have a barely used Gemini subscription, so I asked Claude to add support for spawning the Gemini CLI as a child process with verbose JSON output flags, piping in large text prompts. It's slow, but sufficient.
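The wrapper is essentially a child process with the prompt piped over stdin. Since the Gemini CLI's flag names vary by version (check `gemini --help`), this sketch takes the full command as a parameter rather than hard-coding flags:

```python
import subprocess


def call_cli(cmd, prompt, timeout=300):
    """Spawn a CLI model as a child process, pipe the prompt over stdin,
    and return its stdout. `cmd` is the full command list, e.g. ["gemini"]
    plus whatever flags your installed version uses for non-interactive
    JSON output -- treat the exact flags as configuration, not gospel."""
    result = subprocess.run(
        cmd, input=prompt, capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        raise RuntimeError(f"CLI failed: {result.stderr.strip()}")
    return result.stdout
```

Piping over stdin avoids shell argument-length limits, which matters once the voice profile and source DNA push prompts into the tens of kilobytes.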
## The insights
Systematically curating and analyzing content is quite an experience. It makes you pause and wonder about the societal effects of image diffusion models and LLMs producing and distributing content at massive scale, on any topic. It's dangerous when it gets spread on social networks. The only bottleneck now is distribution.
## Hope?
Personally, I imagine a future where network requests are intercepted by a local proxy middleware and the content is filtered by a fast local LLM: a brain-rot shield. Of course, this idea will only be used by power users; most consumers will unfortunately keep consuming real slop. Time will tell what the future of content holds...
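The filtering core of such a shield could look something like this. Here `classify` stands in for a hypothetical fast local model, and the labels are invented for illustration:

```python
def filter_content(items, classify, block_labels=("slop", "engagement_bait")):
    """Drop items whose classifier label is in the blocklist. A local
    proxy would apply this to response bodies (posts, captions, feeds)
    before they ever reach the client."""
    return [item for item in items if classify(item) not in block_labels]
```

The hard parts are elsewhere (TLS interception, latency budget, and a classifier small enough to run per-request), but the decision logic itself stays this simple.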
