# IRT-Powered Adaptive Question Bank System

## Final Project Brief \& Technical Specification

**Project Name:** IRT Bank Soal (Adaptive Question Bank with AI Generation)
**Client:** Sejoli Tryout Multi-Website Platform
**Tech Stack:** FastAPI + PostgreSQL + SQLAlchemy + FastAPI Admin + OpenRouter AI
**Deployment:** aaPanel VPS (Python Manager + PgSQL Manager)
**Version:** 1.2.0 Final (Hybrid CTT+IRT + Dynamic Normalization)
**Last Updated:** March 21, 2026, 9:31 AM WIB

***

## 🎯 Executive Summary

Sistem bank soal adaptif **hybrid** yang FULLY COMPATIBLE dengan Excel klien existing, dengan enhancement untuk:

- **Classical Test Theory (CTT)** - EXACT formula dari screenshot Excel klien (p, bobot, NM, NN)
- **Item Response Theory (IRT)** - Modern adaptive testing dengan theta estimation
- **AI Generation** - Auto-generate soal variants Mudah/Sulit via OpenRouter (Qwen3 Coder 480B)
- **Dynamic Normalization** - Rataan/SB calculated real-time atau manual input
- **Multi-Website Support** - 1 backend untuk N WordPress sites (Mat SD, Bahasa SMA, dll)
- **Non-Destructive** - 100% backward compatible dengan cara kerja klien sekarang

**Core Capabilities:**

1. Dual Scoring Mode: CTT (p, bobot) \& IRT (θ, b) berjalan paralel
2. Screenshot Compatible: Import langsung dari Excel klien (p=140/458)
3. Exact Formula Match: Implementasi persis formula Excel klien
4. Dynamic Normalization: Auto-calculate rataan/SB atau static mode
5. AI Question Generation: Generate Mudah/Sulit dari basis Sedang (CTT)
6. Full Audit Trail: Track CTT→IRT transition per item

***

## 📋 Exact Client Formulas (From Excel Analysis)

### STEP 1: Tingkat Kesukaran (TK) per Soal

```
Formula: p = Σ Benar / Total Peserta

Excel: =D464/$A$463
├─ D464 = Jumlah siswa yang jawab benar soal 1
└─ A463 = Total peserta (e.g., 458)

Example: p = 140/458 = 0.3057 → "Sedang"
```


### STEP 2: Bobot per Soal

```
Formula: Bobot = 1 - p

Excel: =1-D4

Example: Bobot = 1 - 0.3057 = 0.6943

Interpretation:
- Soal mudah (p=0.8) → bobot=0.2 (nilai rendah)
- Soal sulit (p=0.1) → bobot=0.9 (nilai tinggi)
```


### STEP 3: Total Benar per Siswa

```
Formula: Total_Benar = COUNT(jawaban benar)

Excel: =SUM(D454:W454)  [20 soal]

Example: Siswa benar 15 soal → Total_Benar = 15
```


### STEP 4: Total Bobot Earned per Siswa

```
Formula: Total_Bobot = Σ (bobot_soal × jawaban_siswa)

Excel: =SUMPRODUCT($D$5:$W$5, D454:W454)
├─ $D$5:$W$5 = Array bobot [0.69, 0.85, 0.42, ...]
└─ D454:W454 = Jawaban [1, 1, 0, 1, ...]

Example:
  Soal 1: bobot=0.69 × jawaban=1 → 0.69
  Soal 2: bobot=0.85 × jawaban=1 → 0.85
  Soal 3: bobot=0.42 × jawaban=0 → 0.00
  ...
  Total_Bobot = 12.5
```


### STEP 5: Nilai Mentah (NM) [0-1000 scale]

```
Formula: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000

Excel: =(Y454/$X$5)*1000
├─ Y454 = Total bobot siswa (e.g., 12.5)
└─ $X$5 = Total bobot maksimum (sum semua bobot, 18.3)

Example: NM = (12.5 / 18.3) × 1000 = 683
Range: 0-1000 (percentage-like scale)
```


### STEP 6: Nilai Nasional (NN) - Z-Score Normalized

```
Formula: NN = 500 + 100 × ((NM - Rataan) / SB)

Excel: =500+(100*((Z454-500)/100))

Components:
- 500 = Target mean (center point)
- 100 = Target standard deviation
- Rataan = Actual mean of NM from all participants
- SB = Actual standard deviation of NM

⚠️  CURRENT CLIENT ISSUE:
Rataan = 500 (hardcoded) → NN = 500 + (NM - 500) = NM
SB = 100 (hardcoded)
Result: NO actual normalization (NN always equals NM)

✅ OUR FIX: Dynamic calculation with 3 modes
```


### Kategori Kesulitan (CTT Standard)

```
Tingkat Kesukaran (p):
p < 0.30   → Sukar  (Difficult)
0.30 ≤ p ≤ 0.70 → Sedang (Medium)
p > 0.70   → Mudah (Easy)

Bobot Implications:
p=0.09 → Bobot=0.91 (Sukar, high weight)
p=0.50 → Bobot=0.50 (Sedang, medium weight)
p=0.85 → Bobot=0.15 (Mudah, low weight)
```


***

## 🔄 CTT vs IRT: Understanding Both Approaches

### Classical Test Theory (CTT) - Client Method

**Kelebihan CTT:**

- Mudah dipahami admin/guru
- Tidak butuh banyak data (minimal 100 siswa)
- Compatible dengan sistem existing
- Cepat dihitung
- Formula transparent (visible in Excel)

**Keterbatasan CTT:**

- Sample-dependent (p berubah tiap kelompok)
- Tidak adaptive (soal fixed order)
- Butuh soal baru tiap tes (tidak bisa reuse efisien)
- Normalization issue (jika rataan/SB hardcoded)


### Item Response Theory (IRT) - Modern Adaptive

**Core Formula (1PL Rasch):**

```
P(θ) = 1 / (1 + e^-(θ - b))

θ = Kemampuan user (-3 to +3)
b = Kesulitan item (-3 to +3)

θ = -2 (lemah)  → P(correct) di b=-1 = 73%
θ = 0 (average) → P(correct) di b=0 = 50%
θ = +2 (kuat)   → P(correct) di b=+2 = 50%
```

**Kelebihan IRT:**

- Item-invariant (b tetap meski kelompok berbeda)
- Adaptive (pilih soal sesuai kemampuan real-time)
- Reuse efficient (1000 user, tiap slot 3 variant cukup)
- Akurat lebih cepat (15 soal IRT = 30 soal CTT)

**Keterbatasan IRT:**

- Butuh kalibrasi (min 100-500 responses per item)
- Kompleks untuk admin non-psikometri
- Butuh sistem adaptive (tidak bisa paper-based)


### Hybrid Solution (This System)

| Aspek | CTT Mode (Start) | Hybrid Mode (Transition) | IRT Mode (Goal) |
| :-- | :-- | :-- | :-- |
| **Admin Input** | p-value dari screenshot | Edit p atau b, sync otomatis | Edit b, p calculated |
| **Item Selection** | Fixed order slot 1-30 | Mixed (CTT fixed + IRT adaptive) | Fully adaptive CAT |
| **Scoring** | NM → NN (screenshot) | Paralel CTT \& IRT scores | θ → NN mapped |
| **Normalization** | Static atau Dynamic | Choose per tryout | Dynamic recommended |
| **AI Generation** | Dari p basis | Dari p atau b | Dari b calibrated |
| **Reuse** | Minimal | Moderate (cache variants) | Maximum (infinite pool) |


***

## 🏗️ System Architecture

### High-Level Flow (Hybrid + Dynamic Normalization)

```
┌─────────────────────────────────────────┐
│  WP Site 1 (Mat SD)  │  WP Site 2 (Bahasa SMA)
│  Sejoli Tryout       │  Sejoli Tryout
│  CTT Mode: Fixed     │  IRT Mode: Adaptive
│  website_id=1        │  website_id=2
└─────────────────────────────────────────┘
           │                    │
           └────────┬───────────┘
                    │ REST API
                    │ POST /next_item
                    │ {mode: "ctt"|"irt"|"hybrid"}
                    ▼
    ┌──────────────────────────────┐
    │  FastAPI Backend (aaPanel)   │
    ├──────────────────────────────┤
    │ Hybrid Scoring Engine        │
    │ ├─ CTT: NM from p-bobot      │
    │ ├─ IRT: θ from responses     │
    │ ├─ Normalization: Dynamic    │
    │ └─ Return primary + secondary│
    │                              │
    │ Dynamic Normalization Engine │
    │ ├─ Rataan = AVG(all NM)      │
    │ ├─ SB = STDEV(all NM)        │
    │ ├─ Mode switch: Static→Dynamic
    │ └─ Real-time update per user │
    │                              │
    │ Item Selection Strategy      │
    │ ├─ CTT: Slot order (1→2→3)   │
    │ ├─ IRT: CAT (b ≈ θ)          │
    │ └─ Hybrid: First 10 CTT, IRT │
    └────────────┬─────────────────┘
                 │
                 ▼
    ┌──────────────────────────────┐
    │  PostgreSQL Database         │
    ├──────────────────────────────┤
    │ items (ADDED: ctt_p, bobot)  │
    │ user_answers (ADDED: nm, nn) │
    │ tryout_config (ADDED: modes) │
    │ tryout_stats (NEW: stats)    │
    └──────────────────────────────┘
```


***

## 💾 Database Schema (v1.2 Final)

### Table: tryout_config

```sql
CREATE TABLE tryout_config (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,
    
    -- Mode Control
    scoring_mode VARCHAR(20) DEFAULT 'ctt',  -- 'ctt', 'irt', 'hybrid'
    selection_mode VARCHAR(20) DEFAULT 'fixed', -- 'fixed', 'adaptive', 'hybrid'
    
    -- CTT Settings
    min_peserta_for_ctt INTEGER DEFAULT 100,
    
    -- Normalization Settings
    normalization_mode VARCHAR(20) DEFAULT 'static', -- 'static', 'dynamic', 'hybrid'
    static_rataan FLOAT DEFAULT 500,
    static_sb FLOAT DEFAULT 100,
    min_sample_for_dynamic INTEGER DEFAULT 100,
    
    -- IRT Settings
    enable_irt_when_calibrated BOOLEAN DEFAULT FALSE,
    min_calibration_sample INTEGER DEFAULT 200,
    theta_estimation_method VARCHAR(20) DEFAULT 'mle', -- 'mle', 'eap', 'map'
    
    -- Transition Settings
    hybrid_transition_slot INTEGER DEFAULT 10,
    fallback_to_ctt_on_error BOOLEAN DEFAULT TRUE,
    
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    
    UNIQUE(website_id, tryout_id)
);
```


### Table: tryout_stats

```sql
CREATE TABLE tryout_stats (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,
    
    -- Running Statistics
    participant_count INTEGER DEFAULT 0,
    total_nm_sum FLOAT DEFAULT 0,            -- Σ all NM scores
    total_nm_sq_sum FLOAT DEFAULT 0,         -- Σ (NM^2) for variance calc
    
    -- Calculated Values (updated on each new participant)
    current_rataan FLOAT,                    -- AVG(all NM)
    current_sb FLOAT,                        -- STDEV(all NM)
    min_nm FLOAT,
    max_nm FLOAT,
    
    -- Metadata
    last_calculated_at TIMESTAMPTZ,
    last_participant_id INTEGER,
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    
    UNIQUE(website_id, tryout_id)
);

CREATE INDEX idx_tryout_stats_lookup ON tryout_stats(website_id, tryout_id);
```


### Table: user_answers

```sql
CREATE TABLE user_answers (
    id SERIAL PRIMARY KEY,
    wp_user_id INTEGER NOT NULL,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    level VARCHAR(20) NOT NULL,
    item_id INTEGER NOT NULL,
    
    -- Response Data
    response INTEGER NOT NULL,             -- 0=incorrect, 1=correct
    time_spent INTEGER,
    
    -- CTT Scoring
    ctt_bobot_earned FLOAT,                -- Bobot if correct, 0 if wrong
    ctt_total_bobot_cumulative FLOAT,      -- Running Σ bobot earned
    ctt_nm FLOAT,                          -- Nilai Mentah (0-1000)
    ctt_nn FLOAT,                          -- Nilai Nasional (normalized)
    
    -- Normalization Applied
    rataan_used FLOAT,                     -- Rataan value at this calculation
    sb_used FLOAT,                         -- SB value at this calculation
    normalization_mode_used VARCHAR(20),   -- 'static', 'dynamic', 'hybrid'
    
    -- IRT Scoring
    irt_theta FLOAT,                       -- Ability estimate at this point
    irt_theta_se FLOAT,                    -- Standard error
    irt_information FLOAT,                 -- Information value at this item
    
    -- Metadata
    scoring_mode_used VARCHAR(20),         -- 'ctt', 'irt', 'hybrid'
    answered_at TIMESTAMPTZ DEFAULT NOW(),
    
    FOREIGN KEY (item_id) REFERENCES items(id) ON DELETE CASCADE,
    UNIQUE(wp_user_id, website_id, tryout_id, slot, level)
);

CREATE INDEX idx_user_answers_lookup ON user_answers(wp_user_id, website_id, tryout_id);
CREATE INDEX idx_user_answers_scoring ON user_answers(scoring_mode_used, ctt_nn, irt_theta);
```


### Table: items

```sql
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    level VARCHAR(20) NOT NULL,            -- 'Mudah', 'Sedang', 'Sulit'
    stem TEXT NOT NULL,
    options JSONB NOT NULL,
    correct CHAR(1) NOT NULL,
    explanation TEXT,
    
    -- CTT Parameters (Screenshot Compatible)
    ctt_p FLOAT,                           -- Proportion correct (0.09 from screenshot)
    ctt_bobot FLOAT,                       -- 1 - p (0.91)
    ctt_category VARCHAR(20),              -- 'Sukar', 'Sedang', 'Mudah'
    
    -- IRT Parameters (Adaptive)
    irt_b FLOAT DEFAULT 0.0,               -- Difficulty (-3 to +3)
    irt_a FLOAT DEFAULT 1.0,               -- Discrimination (optional)
    irt_c FLOAT DEFAULT 0.25,              -- Guessing (optional)
    
    -- Calibration Status
    calibrated BOOLEAN DEFAULT FALSE,      -- TRUE when 100+ responses analyzed
    calibration_sample_size INTEGER DEFAULT 0,
    calibration_date TIMESTAMPTZ,
    
    -- Legacy Fields
    generated_by VARCHAR(10) NOT NULL,     -- 'admin' or 'ai'
    ai_model VARCHAR(50),
    basis_item_id INTEGER,
    category_id INTEGER,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    
    FOREIGN KEY (basis_item_id) REFERENCES items(id) ON DELETE SET NULL
);

CREATE INDEX idx_items_lookup ON items(website_id, tryout_id, slot, level);
CREATE INDEX idx_items_calibrated ON items(calibrated, calibration_sample_size);
CREATE INDEX idx_items_ctt ON items(ctt_p, ctt_category);
```


***

## 🎯 AI Question Generation (OpenRouter)

### Recommended Models (OpenRouter Free Tier)

| Model | Kenapa Cocok | Cost |
| :-- | :-- | :-- |
| **Qwen3 Coder 480B** | Math/reasoning expert, generate soal + solusi akurat, control difficulty | Free |
| **Llama 3.3 70B Instruct** | Multilingual (Indonesia), Bloom's Taxonomy, recall→analyze | Free |
| **DeepSeek R1/Math** | Math specialist (algebra/geo), outperform frontier models | Low (\$0.1/1M tokens) |

### AI Generation Workflow

**Context:** User 123, Tryout A, Slot 2 (Attempt 2)

1. Python API hitung θ → perlu "Sulit"
2. Check DB: Ada soal Sulit slot 2? ❌
3. AI Generate:

```
POST OpenRouter {
  model: 'qwen3-coder-480b',
  prompt: "Generate 1 soal Mat SD level Sulit mirip [basis_soal]..."
}
```

4. Parse response → INSERT items (website_id=1, level=Sulit, generated_by='ai')
5. Serve soal baru ke frontend

### Prompt Template (Standardized)

```
Context: Tryout {tryout_id} slot {slot} level {Sulit/Mudah}.
Basis soal: {basis_stem}.
Generate: 1 soal baru {level} dengan:
- Stem: 1 kalimat jelas
- Options: A B C D, 1 benar, 3 distractor logis
- Jawaban: huruf + penjelasan singkat
Bahasa: Indonesia, topik: {category}
```


### Reuse Strategy (Perfect for Scale)

```
User123, Tryout A, Slot 2, Attempt 1: Soal Sedang (statik)
User123, Tryout A, Slot 2, Attempt 2: AI generate → Soal Sulit (simpan DB)

User456, Tryout A, Slot 2, Attempt 2: Check if exist
  IF ada Soal Sulit → REUSE (cache hit!)
  ELSE → AI generate baru

Scenario 1000 users × 3 attempts:
- Static: 1000 × 30 × 3 = 90,000 soal unik (impossible)
- With AI + Reuse: ~30 static + 60 AI variants = 90 total (99.9% reuse!)
```


***

## 🔧 CTT Scoring Engine Implementation

```python
import numpy as np
from typing import List, Dict
from models import Item, TryoutConfig, TryoutStats
from datetime import datetime

def calculate_ctt_score_exact(
    responses: List[Dict], 
    items: List[Item], 
    config: TryoutConfig,
    db: Session
) -> Dict:
    """
    Calculate CTT score using EXACT client Excel formula
    
    Formula breakdown:
    1. p = Σ Benar / Total Peserta (per soal)
    2. Bobot = 1 - p
    3. Total_Bobot_Siswa = SUMPRODUCT(bobot_array, jawaban_array)
    4. NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
    5. NN = 500 + 100 × ((NM - Rataan) / SB)
    """
    
    # STEP 1: Calculate total bobot earned (SUMPRODUCT equivalent)
    total_bobot_earned = 0.0
    total_bobot_max = 0.0
    total_benar = 0
    
    for response, item in zip(responses, items):
        bobot = item.ctt_bobot  # Pre-calculated as 1 - p
        total_bobot_max += bobot
        
        if response['correct'] == 1:
            total_bobot_earned += bobot
            total_benar += 1
    
    # STEP 2: Calculate NM (Nilai Mentah)
    if total_bobot_max == 0:
        nm = 0.0
    else:
        nm = (total_bobot_earned / total_bobot_max) * 1000
    
    # STEP 3: Get Rataan and SB based on normalization mode
    rataan, sb, norm_mode = get_normalization_params(
        config, 
        db, 
        nm  # Current NM to add to stats
    )
    
    # STEP 4: Calculate NN (Nilai Nasional)
    if sb == 0 or sb is None:
        nn = 500.0
    else:
        nn = 500 + 100 * ((nm - rataan) / sb)
    
    # Clip NN to reasonable range
    nn = float(np.clip(nn, 0, 1000))
    
    return {
        "mode": "ctt",
        "total_benar": total_benar,
        "total_bobot_earned": round(total_bobot_earned, 2),
        "total_bobot_max": round(total_bobot_max, 2),
        "nm": round(nm, 1),
        "nn": round(nn, 1),
        "rataan_used": round(rataan, 2),
        "sb_used": round(sb, 2),
        "normalization_mode": norm_mode,
        "breakdown": {
            "percentage": round((total_bobot_earned / total_bobot_max) * 100, 1) if total_bobot_max > 0 else 0
        }
    }


def get_normalization_params(
    config: TryoutConfig, 
    db: Session,
    current_nm: float
) -> tuple[float, float, str]:
    """
    Get rataan and SB based on normalization mode
    Returns: (rataan, sb, mode_used)
    """
    
    # Get or create stats
    stats = db.query(TryoutStats).filter_by(
        website_id=config.website_id,
        tryout_id=config.tryout_id
    ).first()
    
    if not stats:
        stats = TryoutStats(
            website_id=config.website_id,
            tryout_id=config.tryout_id,
            participant_count=0,
            total_nm_sum=0,
            total_nm_sq_sum=0
        )
        db.add(stats)
        db.commit()
    
    # Update running stats with current NM
    stats.participant_count += 1
    stats.total_nm_sum += current_nm
    stats.total_nm_sq_sum += (current_nm ** 2)
    
    # Calculate dynamic rataan and SB
    n = stats.participant_count
    if n > 1:
        mean = stats.total_nm_sum / n
        variance = (stats.total_nm_sq_sum / n) - (mean ** 2)
        std_dev = np.sqrt(max(0, variance))
        
        stats.current_rataan = mean
        stats.current_sb = std_dev
        stats.last_calculated_at = datetime.utcnow()
    else:
        # First participant, use static
        stats.current_rataan = config.static_rataan
        stats.current_sb = config.static_sb
    
    db.commit()
    
    # Determine which values to use based on mode
    if config.normalization_mode == 'static':
        return (
            config.static_rataan,
            config.static_sb,
            'static'
        )
    
    elif config.normalization_mode == 'dynamic':
        if stats.participant_count >= 2:
            return (
                stats.current_rataan,
                stats.current_sb,
                'dynamic'
            )
        else:
            return (
                config.static_rataan,
                config.static_sb,
                'static_fallback'
            )
    
    elif config.normalization_mode == 'hybrid':
        if stats.participant_count >= config.min_sample_for_dynamic:
            return (
                stats.current_rataan,
                stats.current_sb,
                'hybrid_dynamic'
            )
        else:
            return (
                config.static_rataan,
                config.static_sb,
                'hybrid_static'
            )
    
    else:
        return (config.static_rataan, config.static_sb, 'static')
```


***

## 📊 IRT Theta Estimation (MLE)

```python
from scipy.optimize import minimize
import numpy as np

def estimate_theta_mle(responses: List[int], items: List[Item]) -> float:
    """
    Estimate ability (theta) using Maximum Likelihood Estimation
    
    1PL Rasch Model: P(θ) = 1 / (1 + e^-(θ - b))
    
    Args:
        responses: [1, 0, 1, 1, 0, ...] correct/incorrect
        items: [Item(irt_b=-0.5), Item(irt_b=0.2), ...]
    
    Returns:
        theta estimate
    """
    
    def neg_log_likelihood(theta_val):
        ll = 0
        for response, item in zip(responses, items):
            b = item.irt_b if item.irt_b else 0
            # P(θ) = 1 / (1 + e^-(θ - b))
            p = 1 / (1 + np.exp(-(theta_val - b)))
            # Log-likelihood
            if response == 1:
                ll += np.log(max(p, 1e-10))  # Avoid log(0)
            else:
                ll += np.log(max(1 - p, 1e-10))
        return -ll  # Negative for minimization
    
    # Initial guess: middle of scale
    theta_init = 0
    
    # Optimize
    result = minimize(
        neg_log_likelihood, 
        x0=[theta_init], 
        method='L-BFGS-B',
        bounds=[(-3, 3)]  # Reasonable theta range
    )
    
    theta_estimate = float(result.x[0])
    return theta_estimate


def estimate_theta_se(theta: float, items: List[Item]) -> float:
    """
    Calculate standard error of theta estimate
    Using Fisher information
    """
    information = 0
    for item in items:
        b = item.irt_b if item.irt_b else 0
        p = 1 / (1 + np.exp(-(theta - b)))
        information += p * (1 - p)  # Fisher information for 1PL
    
    if information > 0:
        se = 1 / np.sqrt(information)
    else:
        se = float('inf')
    
    return se
```


***

## 🗂️ API Endpoints (v1.2 Final)

### 1. Next Item (Adaptive Selection)

```
POST /api/v1/session/{session_id}/next_item

Request:
{
  "mode": "ctt" | "irt" | "hybrid",
  "current_responses": [
    {"item_id": 1, "correct": 1},
    {"item_id": 2, "correct": 0}
  ]
}

Response:
{
  "item_id": 45,
  "slot": 3,
  "level": "Sedang",
  "stem": "...",
  "options": {"A": "...", "B": "...", "C": "...", "D": "...", "E": "..."},
  "item_source": "admin" | "ai",
  "selection_method": "fixed_order" | "adaptive_ctt" | "adaptive_irt"
}
```


### 2. Complete Session (Scoring)

```
POST /api/v1/session/{session_id}/complete

Response:
{
  "status": "completed",
  "primary_score": {
    "mode": "ctt",
    "total_benar": 15,
    "total_bobot_earned": 12.5,
    "total_bobot_max": 18.3,
    "nm": 683.0,
    "nn": 618.2,
    "rataan_used": 483.5,
    "sb_used": 112.3,
    "normalization_mode": "dynamic"
  },
  "secondary_score": {
    "mode": "irt",
    "theta": 0.85,
    "theta_se": 0.42,
    "nn_equivalent": 592.5
  },
  "comparison": {
    "nn_difference": 25.7,
    "agreement": "moderate"
  }
}
```


### 3. Get Tryout Config (with Normalization)

```
GET /api/v1/tryout/{tryout_id}/config

Response:
{
  "tryout_id": 123,
  "scoring_mode": "ctt",
  "normalization_mode": "dynamic",
  "static_rataan": 500,
  "static_sb": 100,
  "current_stats": {
    "participant_count": 245,
    "current_rataan": 483.5,
    "current_sb": 112.3,
    "min_nm": 125.0,
    "max_nm": 892.0
  },
  "calibration_status": {
    "total_items": 20,
    "calibrated_items": 8,
    "calibration_percentage": 40
  }
}
```


### 4. Update Normalization Settings

```
PUT /api/v1/tryout/{tryout_id}/normalization

Request:
{
  "normalization_mode": "hybrid",
  "static_rataan": 500,
  "static_sb": 100,
  "min_sample_for_dynamic": 100
}

Response:
{
  "status": "updated",
  "normalization_mode": "hybrid",
  "current_participant_count": 45,
  "will_switch_to_dynamic_at": 100,
  "using_mode": "static"
}
```


***

## 📥 Excel Import (OpenCode Ready)

```python
import pandas as pd
import openpyxl
from models import Item, TryoutConfig

def import_excel_tryout(
    excel_file: str,
    website_id: int,
    tryout_id: int,
    sheet_name: str = "CONTOH",
    db: Session
) -> Dict:
    """
    Import from client Excel exactly like PERHITUNGAN-SKOR-TO-3.xlsx
    
    Excel structure:
    - Row 1: Headers
    - Row 2: Answer key (KUNCI)
    - Row 4: TK (p values) formulas
    - Row 5: BOBOT formulas
    - Row 6+: Student responses
    """
    
    wb = openpyxl.load_workbook(excel_file, data_only=False)
    ws = wb[sheet_name]
    
    # Extract answer key from Row 2
    answer_key = {}
    for col in range(4, ws.max_column + 1):
        key_cell = ws.cell(2, col).value
        if key_cell and key_cell != "KUNCI":
            slot_num = col - 3
            answer_key[slot_num] = key_cell.strip().upper()
    
    # Extract TK (p values) from Row 4 - get CALCULATED values
    wb_data = openpyxl.load_workbook(excel_file, data_only=True)
    ws_data = wb_data[sheet_name]
    
    p_values = {}
    for col in range(4, ws.max_column + 1):
        slot_num = col - 3
        if slot_num in answer_key:
            p_cell = ws_data.cell(4, col).value
            if p_cell and isinstance(p_cell, (int, float)):
                p_values[slot_num] = float(p_cell)
    
    # Calculate bobot (1 - p)
    bobot_values = {slot: 1 - p for slot, p in p_values.items()}
    
    # Categorize difficulty
    def categorize_difficulty(p: float) -> tuple[str, str]:
        if p < 0.30:
            return ("Sukar", "Sulit")
        elif p > 0.70:
            return ("Mudah", "Mudah")
        else:
            return ("Sedang", "Sedang")
    
    # Create items
    items_created = 0
    for slot_num, correct_ans in answer_key.items():
        p = p_values.get(slot_num, 0.5)
        bobot = bobot_values.get(slot_num, 0.5)
        ctt_cat, level = categorize_difficulty(p)
        
        # Convert p to IRT b
        b = ctt_p_to_irt_b(p)
        
        item = Item(
            website_id=website_id,
            tryout_id=tryout_id,
            slot=slot_num,
            level=level,
            stem=f"[Import dari Excel - Soal {slot_num}]",
            options={"A": "[Option A]", "B": "[Option B]", "C": "[Option C]", "D": "[Option D]", "E": "[Option E]"},
            correct=correct_ans,
            explanation="",
            ctt_p=p,
            ctt_bobot=bobot,
            ctt_category=ctt_cat,
            irt_b=b,
            calibrated=False,
            calibration_sample_size=0,
            generated_by='admin',
            category_id=None
        )
        db.add(item)
        items_created += 1
    
    db.commit()
    
    # Configure tryout normalization
    config = TryoutConfig(
        website_id=website_id,
        tryout_id=tryout_id,
        scoring_mode='ctt',
        selection_mode='fixed',
        normalization_mode='static',
        static_rataan=500,
        static_sb=100,
        min_sample_for_dynamic=100
    )
    db.add(config)
    db.commit()
    
    return {
        "items_created": items_created,
        "normalization_configured": "static (rataan=500, SB=100)"
    }


def ctt_p_to_irt_b(p: float) -> float:
    """
    Convert CTT p-value to IRT b parameter
    Linear approximation: b ≈ -ln((1-p)/p)
    """
    if p <= 0 or p >= 1:
        p = 0.5
    b = -np.log((1 - p) / p)
    return float(b)
```


***

## 🚀 Migration Path (Non-Destructive)

### Phase 1: Import Existing Data (Week 1)

```
1. Export current Sejoli Tryout data to Excel
2. Run import script:
   python manage.py import_excel_tryout \
     --file="PERHITUNGAN-SKOR-TO-3.xlsx" \
     --sheet="CONTOH" \
     --website_id=1 \
     --tryout_id=123
   
3. Verify:
   - All items have ctt_p, ctt_bobot
   - IRT b auto-calculated from p
   - calibrated=False for all
   
4. Configure tryout:
   - scoring_mode='ctt'
   - selection_mode='fixed'
   - normalization_mode='static' (like client now)
```


### Phase 2: Collect Calibration Data (Week 2-4)

```
1. Students use tryout normally (CTT mode, static normalization)
2. Backend logs all responses
3. Monitor calibration progress
4. Collect running statistics for dynamic normalization
```


### Phase 3: Enable Dynamic Normalization (Week 5)

```
1. Check participant count: 100+ completed?
2. Update tryout_config:
   - normalization_mode='hybrid'
   - min_sample_for_dynamic=100
3. Test with 10-20 new students
4. Verify distribution normalized to mean=500, sd=100
```


### Phase 4: Enable IRT Adaptive (Week 6+)

```
1. After 90%+ items calibrated + 1000+ total responses
2. Update to full IRT:
   - scoring_mode='irt'
   - selection_mode='adaptive'
   - normalization_mode='dynamic'
3. Enable AI generation for Mudah/Sulit variants
```


***

## ✅ Success Metrics

### Technical KPIs

1. **Formula Accuracy**: CTT scores match client Excel 100%
2. **Normalization Stability**: SB within 5% of expected after 100 users
3. **Calibration Coverage**: >80% items calibrated
4. **Score Agreement**: CTT vs IRT NN difference <20 points
5. **Fallback Rate**: <5% IRT→CTT fallbacks per session

### Educational KPIs

1. **Measurement Precision**: IRT SE <0.5 after 15 items
2. **Normalization Quality**: Distribution skewness <0.5
3. **Adaptive Efficiency**: 30% reduction in test length (IRT vs CTT)
4. **Student Satisfaction**: >80% prefer adaptive mode
5. **Admin Adoption**: >70% tryouts use hybrid within 3 months

***

## 📋 Complexity Estimation

| Komponen | Effort (Days) | Notes |
| :-- | :-- | :-- |
| Setup FastAPI + PG + Alembic | 3 | Boilerplate |
| Core scoring (CTT/IRT hybrid) | 10 | Math-heavy |
| Dynamic normalization | 5 | Running stats |
| AI generation (OpenRouter) | 5 | API integration |
| Reuse logic + item selection | 8 | Algorithm |
| Admin UI (FastAPI Admin) | 5 | Auto-generated |
| Excel import | 3 | Formula parsing |
| WP integration | 4 | REST API |
| Testing + docs | 7 | Quality |
| Buffer | 5 | Contingency |
| **TOTAL** | **45 days** | **0.8x Sejoli Rebuild** |


***

## 📚 Glossary

- **p (TK)**: Proportion correct / Tingkat Kesukaran (CTT difficulty)
- **Bobot**: 1-p weight (CTT scoring weight)
- **NM**: Nilai Mentah (raw score 0-1000)
- **NN**: Nilai Nasional (normalized 500±100)
- **Rataan**: Mean of NM scores
- **SB**: Simpangan Baku (standard deviation of NM)
- **θ (theta)**: IRT ability (-3 to +3)
- **b**: IRT difficulty (-3 to +3)
- **SE**: Standard error (precision)
- **CAT**: Computerized Adaptive Testing
- **EM**: Expectation-Maximization (calibration method)
- **MLE**: Maximum Likelihood Estimation

***

## 🔗 File References

- **Excel Client:** `PERHITUNGAN-SKOR-TO-3.xlsx` (screenshot reference for formulas)
- **DB Schema:** PostgreSQL with Alembic migrations
- **API:** FastAPI with OpenAPI docs
- **Admin:** FastAPI Admin (auto-generated CRUD)

***

## 📝 Key Guarantees

✅ Existing CTT data safe, IRT adoption gradual, reversible anytime
✅ 100% compatible with client Excel formulas
✅ Dynamic normalization optional (can keep static mode)
✅ Zero data loss during transitions
✅ Non-destructive (Sejoli Tryout tetap jalan, external enhance)

***

**Document Version:** 1.2.0 Final
**Last Updated:** March 21, 2026, 9:31 AM WIB
**Status:** Ready for Implementation via OpenCode 🚀

**By:** Dwindi Ramadhana
**For:** Sejoli Tryout Multi-Website Platform