diff --git a/AI_HYBRID_GENERATION_WORKFLOW.md b/AI_HYBRID_GENERATION_WORKFLOW.md new file mode 100644 index 0000000..9654fd7 --- /dev/null +++ b/AI_HYBRID_GENERATION_WORKFLOW.md @@ -0,0 +1,262 @@ +# AI Hybrid Generation Workflow + +## Goal + +Allow admins to generate either: + +- a single AI question +- or multiple AI questions in one run + +without losing control over the quality of each generated item. + +The system should support both precision workflows and exploration workflows. + +## Core Principle + +Generation request and generated items must be treated as different things. + +That means: + +1. One admin action creates a **generation run** +2. One generation run can produce one or many **generated variants** +3. Each generated variant remains an individually reviewable item + +This is the cleanest way to support both single and bulk generation. + +## Why This Is Better + +Admins do not always have the same intent. + +### Precision mode + +The admin wants: + +- one strong output +- high control +- easy review + +This is best served by single generation. + +### Exploration mode + +The admin wants: + +- multiple candidates +- idea exploration +- later curation + +This is best served by bulk generation. + +A rigid one-size-fits-all generation flow is worse for both modes. + +## Recommended Model + +### Parent / Basis Question + +The canonical source or promoted basis item. + +### Generation Run + +Represents one AI request. + +Suggested fields: + +- parent question id +- source question version id +- target difficulty +- requested count +- model +- prompt version +- created by +- created at +- optional operator notes + +### Generated Variant + +Each output item from the generation run. + +Suggested fields: + +- generation run id +- parent question id +- source version id +- difficulty +- status +- stem +- options +- answer +- explanation +- review notes +- reviewer +- reviewed at + +## Required Lifecycle + +Each generated item must be individually manageable. + +Suggested statuses: + +- `draft` +- `approved` +- `rejected` +- `archived` +- `stale` + +This is required even when a run generates many items at once. + +## UX Principle + +Do not treat bulk output as one indivisible package. + +Bulk generation should be: + +- one producer action +- many independently reviewable outputs + +This means the admin can: + +- approve 2 items +- reject 1 item +- archive 1 item +- regenerate only one item + +from the same generation run. + +## Recommended Admin UX + +Inside the parent question page: + +### Generation Form + +- target difficulty +- model +- count +- optional notes or style instructions +- generate button + +### Guidance Text + +The system should guide, not over-restrict. + +Recommended copy: + +- “You can generate one or many variants in one run.” +- “Recommended: 1–3 variants per run for better consistency and easier review.” +- “Larger runs may reduce cost per item but increase overlap, correlated mistakes, and review effort.” + +### Result View + +After generation, show each item separately with actions: + +- approve +- reject +- archive +- edit +- regenerate this item +- compare with parent + +## Recommendation vs Restriction + +The product should not hard-limit normal admin workflow at very low counts like 2 or 3. + +Instead: + +- provide recommendation text in the UI +- allow single and bulk generation +- preserve admin control + +However, the backend should still apply a technical safety ceiling. + +Example: + +- no UX hard limit at 2 or 3 +- backend safety limit at something like 20 or 50 + +This is not a workflow restriction. It is abuse and cost protection. + +## Recommended Count Guidance + +### 1 item + +Best for: + +- high quality +- careful review +- final production-ready generation + +### 2–3 items + +Best default. + +Good balance of: + +- cost +- quality +- review effort + +### 4–8 items + +Useful for exploration. + +Tradeoff: + +- more candidate variety +- heavier review burden +- higher chance of repeated structure and correlated errors + +### More than 8 items + +Should still be allowed if product policy permits, but treated as exploration mode. + +The UI should warn that: + +- review effort increases +- quality consistency may drop +- variants may become repetitive + +## Cost and Quality Insight + +Bulk generation can reduce cost per item because: + +- parent context is sent once +- prompt overhead is amortized + +But quality risk increases because: + +- errors can repeat across all outputs in a run +- structure can become too similar +- weaker prompts can produce multiple low-quality siblings + +So the best product design is: + +- permit bulk +- recommend lower counts +- review outputs individually + +## Recommended Policy + +1. Allow both single and bulk generation +2. Keep generated items reviewable one-by-one +3. Store lineage through generation run metadata +4. Provide UI recommendations instead of rigid low hard caps +5. Enforce only a high backend safety cap +6. Keep parent question as the operational center of the workflow + +## Product Direction + +The ideal flow is: + +1. Admin opens parent question +2. Admin chooses difficulty, model, and count +3. System creates one generation run +4. System creates one or many generated child variants +5. Admin reviews each child separately +6. Admin approves, rejects, archives, or regenerates per item + +This gives: + +- low friction +- high control +- strong auditability +- better quality governance +- flexibility for different admin intents