Building a Closed-Loop Content Engine

27 April, 2026

In a few hours, I built a closed-loop content engine. Feed it a corpus of images, and it extracts a data model from them and generates new memes that preserve the aura. A human-in-the-loop ranking system, running as a local webapp, makes each generation batch progressively better.

Scraping the corpus

I used Bright Data's API to pull roughly 9,000 images from a page, which cost me about $15. I also prepaid some OpenAI credits to experiment with the new gpt-image-2 for generation.

Extracting information

Each image runs through an extraction pass that pulls its structural DNA into a strict JSON schema. The core fields:

- `format` — image_macro, chat_screenshot, vintage_art_text, etc.
- `humor_type` — ironic, post_ironic, anti_humor, depressive_confession...
- `joke_mechanic` — bait_and_switch, tonal_mismatch, quote_with_unrelated_image...
- `remix_template` — a fill-in-the-blank pattern, e.g. `"{character} doing {normal_activity} in {unhinged_context}"`
- `why_it_works` — one or two plain sentences
- `subversion_score` — 1 (conventional) to 5 (radical)

It also captures OCR'd text, elements paired with their role in the joke, and cultural references. The full schema has ~80 enum values across format/humor/mechanic.

A real extraction from a niche meme page looks like:

{
  "format": "vintage_art_recontext",
  "humor_type": "depressive_confession",
  "joke_mechanic": "quote_with_unrelated_image",
  "remix_template": "{renaissance_painting} captioned with {modern_dysfunction}",
  "why_it_works": "Prestige imagery validates a low-status confession.",
  "subversion_score": 3
}
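One way to keep extractions inside the schema is to validate every model response before it enters the corpus. The sketch below is a minimal Python version of that idea. The enum sets are illustrative subsets built from the values mentioned above, not the full list of roughly 80, and the names `Extraction` and `parse_extraction` are hypothetical.

```python
import json
from dataclasses import dataclass

# Illustrative subsets only; the full schema has roughly 80 enum values.
FORMATS = {"image_macro", "chat_screenshot", "vintage_art_text", "vintage_art_recontext"}
HUMOR_TYPES = {"ironic", "post_ironic", "anti_humor", "depressive_confession"}
JOKE_MECHANICS = {"bait_and_switch", "tonal_mismatch", "quote_with_unrelated_image"}


@dataclass(frozen=True)
class Extraction:
    format: str
    humor_type: str
    joke_mechanic: str
    remix_template: str
    why_it_works: str
    subversion_score: int

    def __post_init__(self) -> None:
        # Reject values outside the closed vocabularies.
        for field, allowed in (
            ("format", FORMATS),
            ("humor_type", HUMOR_TYPES),
            ("joke_mechanic", JOKE_MECHANICS),
        ):
            value = getattr(self, field)
            if value not in allowed:
                raise ValueError(f"{field}={value!r} is not in the schema")
        # subversion_score runs from 1 (conventional) to 5 (radical).
        score = self.subversion_score
        if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
            raise ValueError(f"subversion_score={score!r} must be an integer from 1 to 5")


def parse_extraction(raw_json: str) -> Extraction:
    """Parse one model response; missing or unexpected keys raise TypeError."""
    return Extraction(**json.loads(raw_json))
```

Failing loudly on an unknown enum value matters here because everything downstream counts these fields, so a stray spelling would quietly split one category into two.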

The loop

Extracted data across the corpus gets compiled into a voice profile: a markdown document that serves as the system prompt for every run. It covers recurring themes, preferred joke mechanics with account-specific flavor, caption grammar quirks with examples, cultural references, and visual register.
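The frequency-driven part of a profile like this can be sketched as a simple aggregation. The Python below is an assumption about how such a compile step might look, not the actual pipeline: it only counts enum fields and renders them as markdown, while the qualitative sections (themes, caption grammar, references) would need a separate summarization pass. `build_voice_profile` and `SECTIONS` are hypothetical names.

```python
from collections import Counter

# Which extraction fields to summarize, and the heading each one gets.
SECTIONS = [
    ("format", "Formats"),
    ("humor_type", "Humor types"),
    ("joke_mechanic", "Joke mechanics"),
]


def build_voice_profile(extractions: list[dict], top_n: int = 3) -> str:
    """Render the most common enum values per field as a markdown document."""
    if not extractions:
        raise ValueError("cannot build a profile from an empty corpus")
    lines = ["# Voice profile", ""]
    for field, title in SECTIONS:
        counts = Counter(item[field] for item in extractions)
        lines.append(f"## {title}")
        for value, count in counts.most_common(top_n):
            share = count / len(extractions)
            lines.append(f"- {value}: {share:.0%} of corpus")
        lines.append("")
    return "\n".join(lines)
```

The returned string can be passed as the system prompt as-is, which keeps the profile readable and easy to edit by hand between runs.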

Generation: pick a random meme as source, generate N remix candidates conditioned on the voice profile, score them with an LLM judge, and send the winner to image generation.
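The control flow of that step is small enough to sketch. In the Python below, the candidate generator and the judge are passed in as plain callables, so no particular model API is assumed; in practice both would wrap LLM calls. The function name, argument names, and the returned dict shape are all hypothetical.

```python
import random
from typing import Callable, Optional


def run_generation_step(
    corpus: list[dict],
    voice_profile: str,
    generate_candidate: Callable[[str, dict], str],
    judge: Callable[[str, str], float],
    n_candidates: int = 5,
    rng: Optional[random.Random] = None,
) -> dict:
    """Pick a source meme, generate N remixes, and return the judge's favorite."""
    rng = rng or random.Random()
    source = rng.choice(corpus)
    # Every candidate is conditioned on the same voice profile and source.
    candidates = [generate_candidate(voice_profile, source) for _ in range(n_candidates)]
    scored = [(judge(voice_profile, caption), caption) for caption in candidates]
    # Compare on score only; on a tie, max() keeps the earliest candidate.
    best_score, winner = max(scored, key=lambda pair: pair[0])
    return {"source": source, "winner": winner, "judge_score": best_score}
```

Only the winner would go on to image generation. Since that is presumably the most expensive call in the loop, it makes sense to filter on cheap text candidates first.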

Then humans rate the output in a local webapp. The core rubric evolved by watching failure modes:

| Dimension | Scale | What it catches |
| --- | --- | --- |
| Caption quality | 1-5 | Standalone funny? |
| Overall | 1-10 | Would this fit on the account? |
| Dud source | flag | Source wasn't worth remixing |
| Natural language | text | My notes on the result |

The engine reads the human labels and mutates the judge prompt based on score distributions — penalizing the failure modes that keep recurring, reinforcing what's working. The judge gets sharper. The system converges toward the voice.
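A rough sketch of what "mutating the judge prompt from score distributions" could look like, in Python. This is one possible reading, not the engine's actual logic. It assumes each human label has been joined with the `joke_mechanic` of the candidate it rates, groups the `overall` scores by mechanic, and appends penalize and reinforce sections to a base prompt. The thresholds, the `min_samples` cutoff, and all names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mutate_judge_prompt(
    base_prompt: str,
    labels: list[dict],
    low: float = 4.0,
    high: float = 7.0,
    min_samples: int = 3,
) -> str:
    """Append guidance to the judge prompt based on mean `overall` per mechanic."""
    by_mechanic: dict[str, list[int]] = defaultdict(list)
    for label in labels:
        if label.get("dud_source"):
            continue  # a bad source says little about the remix itself
        by_mechanic[label["joke_mechanic"]].append(label["overall"])

    penalize, reinforce = [], []
    for mechanic, scores in sorted(by_mechanic.items()):
        if len(scores) < min_samples:
            continue  # too few ratings to act on
        avg = mean(scores)
        line = f"- {mechanic} (mean overall {avg:.1f} over {len(scores)} ratings)"
        if avg <= low:
            penalize.append(line)
        elif avg >= high:
            reinforce.append(line)

    sections = [base_prompt.rstrip()]
    if penalize:
        sections.append("Penalize candidates that rely on:\n" + "\n".join(penalize))
    if reinforce:
        sections.append("Favor candidates that use:\n" + "\n".join(reinforce))
    return "\n\n".join(sections)
```

Rebuilding the guidance from a fixed base prompt on every run, rather than appending to the previous version, keeps the judge prompt from growing without bound as labels accumulate.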

The insights and hope

It's quite an experience to systematically curate and analyze content; it makes you pause and wonder about the societal effects of AI-generated content on social media. Personally, I hope for a future where content is intercepted by a local proxy or a local LLM in the browser that filters out the low-effort junk. A brain rot shield. Time will tell!