# IRT-Powered Adaptive Question Bank System

## Final Project Brief & Technical Specification

**Project Name:** IRT Bank Soal (Adaptive Question Bank with AI Generation)
**Client:** Sejoli Tryout Multi-Website Platform
**Tech Stack:** FastAPI + PostgreSQL + SQLAlchemy + FastAPI Admin + OpenRouter AI
**Deployment:** aaPanel VPS (Python Manager + PgSQL Manager)
**Version:** 1.2.0 Final (Hybrid CTT+IRT + Dynamic Normalization)
**Last Updated:** March 21, 2026, 9:31 AM WIB

***

## 🎯 Executive Summary

A **hybrid** adaptive question bank system that is fully compatible with the client's existing Excel workbook, with enhancements for:

- **Classical Test Theory (CTT)** - the exact formulas from the client's Excel screenshot (p, bobot, NM, NN)
- **Item Response Theory (IRT)** - modern adaptive testing with theta estimation
- **AI Generation** - auto-generate Mudah/Sulit (easy/hard) question variants via OpenRouter (Qwen3 Coder 480B)
- **Dynamic Normalization** - Rataan/SB (mean/standard deviation) calculated in real time or entered manually
- **Multi-Website Support** - one backend for N WordPress sites (Mat SD, Bahasa SMA, etc.)
- **Non-Destructive** - 100% backward compatible with the client's current workflow

**Core Capabilities:**

1. Dual Scoring Mode: CTT (p, bobot) & IRT (θ, b) run in parallel
2. Screenshot Compatible: import directly from the client's Excel (p=140/458)
3. Exact Formula Match: implements the client's Excel formulas exactly
4. Dynamic Normalization: auto-calculated Rataan/SB or static mode
5. AI Question Generation: generate Mudah/Sulit variants from a Sedang (medium) basis item (CTT)
6. Full Audit Trail: track the CTT→IRT transition per item

***

## 📋 Exact Client Formulas (From Excel Analysis)

### STEP 1: Tingkat Kesukaran (TK, Item Difficulty) per Item

```
Formula: p = Σ Correct / Total Participants

Excel: =D464/$A$463
├─ D464 = Number of students who answered item 1 correctly
└─ A463 = Total participants (e.g., 458)

Example: p = 140/458 = 0.3057 → "Sedang"
```

### STEP 2: Bobot (Weight) per Item

```
Formula: Bobot = 1 - p

Excel: =1-D4

Example: Bobot = 1 - 0.3057 = 0.6943

Interpretation:
- Easy item (p=0.8) → bobot=0.2 (low value)
- Hard item (p=0.1) → bobot=0.9 (high value)
```

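Steps 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, assuming the responses are available as a students × items matrix of 0/1 scores; the function name `item_difficulty_and_weight` and the toy data are hypothetical and not part of the client workbook.

```python
from typing import List, Tuple


def item_difficulty_and_weight(answers: List[List[int]]) -> List[Tuple[float, float]]:
    """Return (p, bobot) per item from a students x items matrix of 0/1 scores."""
    n_students = len(answers)
    if n_students == 0:
        return []
    n_items = len(answers[0])
    stats = []
    for j in range(n_items):
        n_correct = sum(row[j] for row in answers)  # Excel: D464
        p = n_correct / n_students                  # Excel: =D464/$A$463
        stats.append((p, 1 - p))                    # Excel: =1-D4
    return stats


# Hypothetical data reproducing the client example: 140 of 458 students correct.
answers = [[1]] * 140 + [[0]] * 318
p, bobot = item_difficulty_and_weight(answers)[0]
print(round(p, 4), round(bobot, 4))  # 0.3057 0.6943
```

Keeping `p` and `bobot` together per item mirrors rows 4 and 5 of the spreadsheet, where the weight is always derived from the difficulty.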
### STEP 3: Total Benar (Total Correct) per Student

```
Formula: Total_Benar = COUNT(correct answers)

Excel: =SUM(D454:W454) [20 items]

Example: Student answers 15 items correctly → Total_Benar = 15
```

### STEP 4: Total Bobot Earned per Student

```
Formula: Total_Bobot = Σ (bobot_soal × jawaban_siswa)

Excel: =SUMPRODUCT($D$5:$W$5, D454:W454)
├─ $D$5:$W$5 = Bobot array [0.69, 0.85, 0.42, ...]
└─ D454:W454 = Answers [1, 1, 0, 1, ...]

Example:
Item 1: bobot=0.69 × answer=1 → 0.69
Item 2: bobot=0.85 × answer=1 → 0.85
Item 3: bobot=0.42 × answer=0 → 0.00
...
Total_Bobot = 12.5
```

### STEP 5: Nilai Mentah (NM, Raw Score) [0-1000 scale]

```
Formula: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000

Excel: =(Y454/$X$5)*1000
├─ Y454 = Student's total bobot (e.g., 12.5)
└─ $X$5 = Maximum total bobot (sum of all bobot, e.g., 18.3)

Example: NM = (12.5 / 18.3) × 1000 = 683
Range: 0-1000 (percentage-like scale)
```

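Steps 4 and 5 amount to a weighted sum followed by a rescale. The sketch below assumes the weights and the student's 0/1 answers arrive as two equal-length sequences; `nilai_mentah` is a hypothetical helper name, and the three-item example reuses only the weights shown above.

```python
from typing import Sequence


def nilai_mentah(bobot: Sequence[float], jawaban: Sequence[int]) -> float:
    """NM on the 0-1000 scale: SUMPRODUCT(bobot, jawaban) / SUM(bobot) * 1000."""
    if len(bobot) != len(jawaban):
        raise ValueError("bobot and jawaban must have the same length")
    total_bobot_max = sum(bobot)                               # Excel: $X$5
    if total_bobot_max == 0:
        return 0.0                                             # avoid division by zero
    total_bobot = sum(w * a for w, a in zip(bobot, jawaban))   # Excel: SUMPRODUCT
    return total_bobot / total_bobot_max * 1000


# First three items of the example: two correct, one wrong.
nm = nilai_mentah([0.69, 0.85, 0.42], [1, 1, 0])
print(round(nm, 1))  # 785.7
```

Because a wrong answer contributes zero, a student who misses only easy items (low bobot) keeps a higher NM than one who misses only hard items.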
### STEP 6: Nilai Nasional (NN, National Score) - Z-Score Normalized

```
Formula: NN = 500 + 100 × ((NM - Rataan) / SB)

Excel: =500+(100*((Z454-500)/100))

Components:
- 500 = Target mean (center point)
- 100 = Target standard deviation
- Rataan = Actual mean of NM from all participants
- SB = Actual standard deviation of NM

⚠️ CURRENT CLIENT ISSUE:
Rataan = 500 (hardcoded) → NN = 500 + (NM - 500) = NM
SB = 100 (hardcoded)
Result: NO actual normalization (NN always equals NM)

✅ OUR FIX: Dynamic calculation with 3 modes
```

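The effect of the hardcoded values can be checked numerically. The sketch below is illustrative only: the NM scores are made up, `nilai_nasional` is a hypothetical helper, and the population standard deviation (`statistics.pstdev`) is an assumption; Excel's `STDEV` corresponds to the sample version (`statistics.stdev`).

```python
from statistics import mean, pstdev


def nilai_nasional(nm: float, rataan: float, sb: float) -> float:
    """NN = 500 + 100 * (NM - Rataan) / SB, falling back to 500 when SB is 0."""
    if sb == 0:
        return 500.0
    return 500 + 100 * (nm - rataan) / sb


nm_scores = [400.0, 550.0, 683.0, 720.0, 810.0]  # hypothetical cohort

# Static mode with the hardcoded 500/100: NN is just NM again.
static_nn = [nilai_nasional(x, 500, 100) for x in nm_scores]

# Dynamic mode: Rataan and SB come from the cohort itself.
rataan, sb = mean(nm_scores), pstdev(nm_scores)
dynamic_nn = [nilai_nasional(x, rataan, sb) for x in nm_scores]

print(static_nn == nm_scores)  # True
print(round(mean(dynamic_nn)), round(pstdev(dynamic_nn)))  # 500 100
```

Only the dynamic variant recentres the distribution on 500 with a spread of 100, which is the behaviour the formula is meant to deliver.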
### Kategori Kesulitan (Difficulty Categories, CTT Standard)

```
Tingkat Kesukaran (p):
p < 0.30        → Sukar (Difficult)
0.30 ≤ p ≤ 0.70 → Sedang (Medium)
p > 0.70        → Mudah (Easy)

Bobot Implications:
p=0.09 → Bobot=0.91 (Sukar, high weight)
p=0.50 → Bobot=0.50 (Sedang, medium weight)
p=0.85 → Bobot=0.15 (Mudah, low weight)
```

***

## 🔄 CTT vs IRT: Understanding Both Approaches

### Classical Test Theory (CTT) - Client Method

**Strengths of CTT:**

- Easy for admins and teachers to understand
- Does not need much data (minimum 100 students)
- Compatible with the existing system
- Fast to compute
- Transparent formulas (visible in Excel)

**Limitations of CTT:**

- Sample-dependent (p changes with each group)
- Not adaptive (items are served in a fixed order)
- Needs new items for each test (items cannot be reused efficiently)
- Normalization issue (when Rataan/SB are hardcoded)

### Item Response Theory (IRT) - Modern Adaptive

**Core Formula (1PL Rasch):**

```
P(θ) = 1 / (1 + e^-(θ - b))

θ = User ability (-3 to +3)
b = Item difficulty (-3 to +3)

θ = -2 (weak)    → P(correct) at b=-1 = 27%
θ =  0 (average) → P(correct) at b=0  = 50%
θ = +2 (strong)  → P(correct) at b=+2 = 50%
```

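The 1PL formula is easy to evaluate directly, which helps when checking example values. A minimal sketch, using only the standard library; `rasch_probability` is a hypothetical name.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """1PL (Rasch) probability of a correct response: 1 / (1 + e^-(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# The probability depends only on the gap between ability and difficulty.
for theta, b in [(-2, -1), (0, 0), (2, 2), (0, -1)]:
    print(f"theta={theta:+d}, b={b:+d} -> P(correct)={rasch_probability(theta, b):.2f}")
```

When θ equals b the probability is exactly 0.5; a student one unit below the item difficulty has about a 27% chance, and one unit above about 73%.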
**Strengths of IRT:**

- Sample-invariant item parameters (b stays the same across different groups)
- Adaptive (selects items that match the student's ability in real time)
- Efficient reuse (for 1000 users, 3 variants per slot are enough)
- Reaches comparable accuracy faster (roughly 15 IRT items ≈ 30 CTT items)

**Limitations of IRT:**

- Needs calibration (min. 100-500 responses per item)
- Complex for admins without a psychometrics background
- Needs an adaptive delivery system (cannot be paper-based)

### Hybrid Solution (This System)

| Aspect | CTT Mode (Start) | Hybrid Mode (Transition) | IRT Mode (Goal) |
| :-- | :-- | :-- | :-- |
| **Admin Input** | p-value from screenshot | Edit p or b, synced automatically | Edit b, p calculated |
| **Item Selection** | Fixed order, slots 1-30 | Mixed (CTT fixed + IRT adaptive) | Fully adaptive CAT |
| **Scoring** | NM → NN (screenshot) | Parallel CTT & IRT scores | θ → NN mapped |
| **Normalization** | Static or Dynamic | Choose per tryout | Dynamic recommended |
| **AI Generation** | From p basis | From p or b | From calibrated b |
| **Reuse** | Minimal | Moderate (cache variants) | Maximum (infinite pool) |

***

## 🏗️ System Architecture

### High-Level Flow (Hybrid + Dynamic Normalization)

```
┌──────────────────────┐   ┌────────────────────────┐
│ WP Site 1 (Mat SD)   │   │ WP Site 2 (Bahasa SMA) │
│ Sejoli Tryout        │   │ Sejoli Tryout          │
│ CTT Mode: Fixed      │   │ IRT Mode: Adaptive     │
│ website_id=1         │   │ website_id=2           │
└──────────┬───────────┘   └───────────┬────────────┘
           │                           │
           └─────────────┬─────────────┘
                         │ REST API
                         │ POST /next_item
                         │ {mode: "ctt"|"irt"|"hybrid"}
                         ▼
      ┌────────────────────────────────────┐
      │ FastAPI Backend (aaPanel)          │
      ├────────────────────────────────────┤
      │ Hybrid Scoring Engine              │
      │ ├─ CTT: NM from p-bobot            │
      │ ├─ IRT: θ from responses           │
      │ ├─ Normalization: Dynamic          │
      │ └─ Return primary + secondary      │
      │                                    │
      │ Dynamic Normalization Engine       │
      │ ├─ Rataan = AVG(all NM)            │
      │ ├─ SB = STDEV(all NM)              │
      │ ├─ Mode switch: Static→Dynamic     │
      │ └─ Real-time update per user       │
      │                                    │
      │ Item Selection Strategy            │
      │ ├─ CTT: Slot order (1→2→3)         │
      │ ├─ IRT: CAT (b ≈ θ)                │
      │ └─ Hybrid: First 10 CTT, then IRT  │
      └──────────────────┬─────────────────┘
                         │
                         ▼
      ┌────────────────────────────────────┐
      │ PostgreSQL Database                │
      ├────────────────────────────────────┤
      │ items (ADDED: ctt_p, bobot)        │
      │ user_answers (ADDED: nm, nn)       │
      │ tryout_config (ADDED: modes)       │
      │ tryout_stats (NEW: stats)          │
      └────────────────────────────────────┘
```

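The "CAT (b ≈ θ)" strategy in the diagram can be illustrated with a small selection routine. This is a sketch under stated assumptions, not the backend's actual code: items are plain dicts with hypothetical `id` and `irt_b` keys, and the next item is the unanswered one with the highest 1PL Fisher information at the current θ, which is equivalent to picking the item whose b is closest to θ.

```python
import math
from typing import Dict, List, Optional, Set


def item_information(theta: float, b: float) -> float:
    """Fisher information of a 1PL item at ability theta: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def select_next_item(theta: float, pool: List[Dict], answered: Set[int]) -> Optional[Dict]:
    """Pick the unanswered item that is most informative at the current theta."""
    candidates = [item for item in pool if item["id"] not in answered]
    if not candidates:
        return None  # pool exhausted; the caller decides what happens next
    return max(candidates, key=lambda item: item_information(theta, item["irt_b"]))


# Hypothetical pool of calibrated items.
pool = [
    {"id": 1, "irt_b": -1.5},
    {"id": 2, "irt_b": 0.1},
    {"id": 3, "irt_b": 0.9},
    {"id": 4, "irt_b": 2.0},
]
print(select_next_item(0.8, pool, answered=set())["id"])  # 3
print(select_next_item(0.8, pool, answered={3})["id"])    # 2
```

In hybrid mode, a routine like this would only take over once the fixed-order slots have been served.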
***

## 💾 Database Schema (v1.2 Final)

### Table: tryout_config

```sql
CREATE TABLE tryout_config (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,

    -- Mode Control
    scoring_mode VARCHAR(20) DEFAULT 'ctt',           -- 'ctt', 'irt', 'hybrid'
    selection_mode VARCHAR(20) DEFAULT 'fixed',       -- 'fixed', 'adaptive', 'hybrid'

    -- CTT Settings
    min_peserta_for_ctt INTEGER DEFAULT 100,

    -- Normalization Settings
    normalization_mode VARCHAR(20) DEFAULT 'static',  -- 'static', 'dynamic', 'hybrid'
    static_rataan FLOAT DEFAULT 500,
    static_sb FLOAT DEFAULT 100,
    min_sample_for_dynamic INTEGER DEFAULT 100,

    -- IRT Settings
    enable_irt_when_calibrated BOOLEAN DEFAULT FALSE,
    min_calibration_sample INTEGER DEFAULT 200,
    theta_estimation_method VARCHAR(20) DEFAULT 'mle', -- 'mle', 'eap', 'map'

    -- Transition Settings
    hybrid_transition_slot INTEGER DEFAULT 10,
    fallback_to_ctt_on_error BOOLEAN DEFAULT TRUE,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    UNIQUE(website_id, tryout_id)
);
```

### Table: tryout_stats

```sql
CREATE TABLE tryout_stats (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,

    -- Running Statistics
    participant_count INTEGER DEFAULT 0,
    total_nm_sum FLOAT DEFAULT 0,       -- Σ all NM scores
    total_nm_sq_sum FLOAT DEFAULT 0,    -- Σ (NM^2) for variance calc

    -- Calculated Values (updated on each new participant)
    current_rataan FLOAT,               -- AVG(all NM)
    current_sb FLOAT,                   -- STDEV(all NM)
    min_nm FLOAT,
    max_nm FLOAT,

    -- Metadata
    last_calculated_at TIMESTAMPTZ,
    last_participant_id INTEGER,
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    UNIQUE(website_id, tryout_id)
);

-- Note: PostgreSQL already creates an index for the UNIQUE constraint above,
-- so this explicit index is redundant and can be dropped.
CREATE INDEX idx_tryout_stats_lookup ON tryout_stats(website_id, tryout_id);
```

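The `total_nm_sum` and `total_nm_sq_sum` columns are enough to recover the mean and standard deviation without rescanning every NM score. The sketch below mirrors those column names in a hypothetical in-memory `RunningStats` class and assumes the population standard deviation; it is an illustration of the bookkeeping, not the ORM model.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RunningStats:
    participant_count: int = 0
    total_nm_sum: float = 0.0
    total_nm_sq_sum: float = 0.0

    def add(self, nm: float) -> None:
        """Fold one participant's NM into the running totals."""
        self.participant_count += 1
        self.total_nm_sum += nm
        self.total_nm_sq_sum += nm * nm

    @property
    def rataan(self) -> float:
        if self.participant_count == 0:
            return 0.0
        return self.total_nm_sum / self.participant_count

    @property
    def sb(self) -> float:
        """Population SD via E[x^2] - (E[x])^2, clamped at 0 against rounding error."""
        if self.participant_count == 0:
            return 0.0
        variance = self.total_nm_sq_sum / self.participant_count - self.rataan ** 2
        return math.sqrt(max(0.0, variance))


scores = [400.0, 550.0, 683.0, 720.0, 810.0]  # hypothetical NM values
stats = RunningStats()
for nm in scores:
    stats.add(nm)

print(round(stats.rataan, 1), round(stats.sb, 1))
print(round(mean(scores), 1), round(pstdev(scores), 1))  # same values from a full pass
```

The sum-of-squares shortcut can lose precision when the variance is tiny relative to the mean; Welford's online algorithm is a common alternative if that ever becomes a concern.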
### Table: user_answers

```sql
-- Requires the items table (defined below) to exist first because of the foreign key.
CREATE TABLE user_answers (
    id SERIAL PRIMARY KEY,
    wp_user_id INTEGER NOT NULL,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    level VARCHAR(20) NOT NULL,
    item_id INTEGER NOT NULL,

    -- Response Data
    response INTEGER NOT NULL,            -- 0=incorrect, 1=correct
    time_spent INTEGER,

    -- CTT Scoring
    ctt_bobot_earned FLOAT,               -- Bobot if correct, 0 if wrong
    ctt_total_bobot_cumulative FLOAT,     -- Running Σ bobot earned
    ctt_nm FLOAT,                         -- Nilai Mentah (0-1000)
    ctt_nn FLOAT,                         -- Nilai Nasional (normalized)

    -- Normalization Applied
    rataan_used FLOAT,                    -- Rataan value at this calculation
    sb_used FLOAT,                        -- SB value at this calculation
    normalization_mode_used VARCHAR(20),  -- 'static', 'dynamic', 'hybrid'

    -- IRT Scoring
    irt_theta FLOAT,                      -- Ability estimate at this point
    irt_theta_se FLOAT,                   -- Standard error
    irt_information FLOAT,                -- Information value at this item

    -- Metadata
    scoring_mode_used VARCHAR(20),        -- 'ctt', 'irt', 'hybrid'
    answered_at TIMESTAMPTZ DEFAULT NOW(),

    FOREIGN KEY (item_id) REFERENCES items(id) ON DELETE CASCADE,
    UNIQUE(wp_user_id, website_id, tryout_id, slot, level)
);

CREATE INDEX idx_user_answers_lookup ON user_answers(wp_user_id, website_id, tryout_id);
CREATE INDEX idx_user_answers_scoring ON user_answers(scoring_mode_used, ctt_nn, irt_theta);
```

### Table: items

```sql
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    level VARCHAR(20) NOT NULL,         -- 'Mudah', 'Sedang', 'Sulit'
    stem TEXT NOT NULL,
    options JSONB NOT NULL,
    correct CHAR(1) NOT NULL,
    explanation TEXT,

    -- CTT Parameters (Screenshot Compatible)
    ctt_p FLOAT,                        -- Proportion correct (0.09 from screenshot)
    ctt_bobot FLOAT,                    -- 1 - p (0.91)
    ctt_category VARCHAR(20),           -- 'Sukar', 'Sedang', 'Mudah'

    -- IRT Parameters (Adaptive)
    irt_b FLOAT DEFAULT 0.0,            -- Difficulty (-3 to +3)
    irt_a FLOAT DEFAULT 1.0,            -- Discrimination (optional)
    irt_c FLOAT DEFAULT 0.25,           -- Guessing (optional)

    -- Calibration Status
    calibrated BOOLEAN DEFAULT FALSE,   -- TRUE once enough responses are analyzed
                                        -- (see tryout_config.min_calibration_sample)
    calibration_sample_size INTEGER DEFAULT 0,
    calibration_date TIMESTAMPTZ,

    -- Legacy Fields
    generated_by VARCHAR(10) NOT NULL,  -- 'admin' or 'ai'
    ai_model VARCHAR(50),
    basis_item_id INTEGER,
    category_id INTEGER,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    FOREIGN KEY (basis_item_id) REFERENCES items(id) ON DELETE SET NULL
);

CREATE INDEX idx_items_lookup ON items(website_id, tryout_id, slot, level);
CREATE INDEX idx_items_calibrated ON items(calibrated, calibration_sample_size);
CREATE INDEX idx_items_ctt ON items(ctt_p, ctt_category);
```

***

## 🎯 AI Question Generation (OpenRouter)

### Recommended Models (OpenRouter Free Tier)

| Model | Why It Fits | Cost |
| :-- | :-- | :-- |
| **Qwen3 Coder 480B** | Strong at math/reasoning, generates items with accurate solutions, controllable difficulty | Free |
| **Llama 3.3 70B Instruct** | Multilingual (Indonesian), covers Bloom's Taxonomy levels from recall to analyze | Free |
| **DeepSeek R1/Math** | Math specialist (algebra/geometry), reported to outperform frontier models | Low (\$0.1/1M tokens) |

### AI Generation Workflow

**Context:** User 123, Tryout A, Slot 2 (Attempt 2)

1. The Python API estimates θ → a "Sulit" (hard) item is needed
2. Check DB: is there a Sulit item for slot 2? ❌
3. AI Generate:

   ```
   POST OpenRouter {
     model: 'qwen3-coder-480b',
     prompt: "Generate 1 soal Mat SD level Sulit mirip [basis_soal]..."
   }
   ```

4. Parse response → INSERT items (website_id=1, level=Sulit, generated_by='ai')
5. Serve the new item to the frontend

### Prompt Template (Standardized)

```
Context: Tryout {tryout_id} slot {slot} level {Sulit/Mudah}.
Basis soal: {basis_stem}.
Generate: 1 soal baru {level} dengan:
- Stem: 1 kalimat jelas
- Options: A B C D, 1 benar, 3 distractor logis
- Jawaban: huruf + penjelasan singkat
Bahasa: Indonesia, topik: {category}
```

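Filling the template is plain string formatting. The sketch below is one possible shape: `build_prompt`, `PROMPT_TEMPLATE`, and the sample arguments are hypothetical, the Indonesian wording is copied from the template above, and the actual call to OpenRouter is deliberately left out.

```python
PROMPT_TEMPLATE = (
    "Context: Tryout {tryout_id} slot {slot} level {level}.\n"
    "Basis soal: {basis_stem}.\n"
    "Generate: 1 soal baru {level} dengan:\n"
    "- Stem: 1 kalimat jelas\n"
    "- Options: A B C D, 1 benar, 3 distractor logis\n"
    "- Jawaban: huruf + penjelasan singkat\n"
    "Bahasa: Indonesia, topik: {category}"
)

ALLOWED_LEVELS = ("Mudah", "Sulit")  # Sedang items are the admin-written basis


def build_prompt(tryout_id: int, slot: int, level: str, basis_stem: str, category: str) -> str:
    """Fill the standardized template; only Mudah/Sulit variants are generated."""
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"level must be one of {ALLOWED_LEVELS}, got {level!r}")
    return PROMPT_TEMPLATE.format(
        tryout_id=tryout_id,
        slot=slot,
        level=level,
        basis_stem=basis_stem,
        category=category,
    )


prompt = build_prompt(123, 2, "Sulit", "Berapakah hasil dari 12 x 8?", "Perkalian")
print(prompt)
```

Validating the level before formatting keeps a malformed request from ever reaching the model.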
### Reuse Strategy (Perfect for Scale)

```
User123, Tryout A, Slot 2, Attempt 1: Sedang item (static)
User123, Tryout A, Slot 2, Attempt 2: AI generate → Sulit item (saved to DB)

User456, Tryout A, Slot 2, Attempt 2: Check if one exists
  IF a Sulit item exists → REUSE (cache hit!)
  ELSE → AI generates a new one

Scenario 1000 users × 3 attempts:
- Static: 1000 × 30 × 3 = 90,000 unique items (impossible)
- With AI + Reuse: ~30 static + 60 AI variants = 90 total (99.9% reuse!)
```

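The reuse rule is a get-or-generate cache keyed by tryout, slot, and level. The sketch below simulates it in memory under stated assumptions: `VariantCache` is a hypothetical class, the generator is a stand-in for the AI call, and a dict replaces the `items` table.

```python
from typing import Callable, Dict, Tuple

ItemKey = Tuple[int, int, str]  # (tryout_id, slot, level)


class VariantCache:
    """Generate a variant once per (tryout, slot, level) and reuse it afterwards."""

    def __init__(self, generate: Callable[[int, int, str], str]) -> None:
        self._items: Dict[ItemKey, str] = {}
        self._generate = generate
        self.generated = 0

    def get_or_generate(self, tryout_id: int, slot: int, level: str) -> str:
        key = (tryout_id, slot, level)
        if key not in self._items:      # cache miss: pay for one generation
            self._items[key] = self._generate(tryout_id, slot, level)
            self.generated += 1
        return self._items[key]         # cache hit: reuse the stored variant


def fake_ai(tryout_id: int, slot: int, level: str) -> str:
    # Stand-in for the AI request; returns a placeholder stem.
    return f"[AI item: tryout {tryout_id}, slot {slot}, {level}]"


cache = VariantCache(fake_ai)
served = 0
for user in range(1000):                # 1000 users
    for slot in range(1, 31):           # 30 slots
        for level in ("Mudah", "Sulit"):
            cache.get_or_generate(1, slot, level)
            served += 1

print(served, cache.generated)  # 60000 60
```

Only the first request for each slot and level triggers a generation, which is where the 60 AI variants in the scenario come from.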
***

## 🔧 CTT Scoring Engine Implementation

```python
import numpy as np
from datetime import datetime, timezone
from typing import Dict, List

from sqlalchemy.orm import Session

from models import Item, TryoutConfig, TryoutStats


def calculate_ctt_score_exact(
    responses: List[Dict],
    items: List[Item],
    config: TryoutConfig,
    db: Session
) -> Dict:
    """
    Calculate CTT score using the EXACT client Excel formula.

    Formula breakdown:
    1. p = Σ Correct / Total Participants (per item)
    2. Bobot = 1 - p
    3. Total_Bobot_Siswa = SUMPRODUCT(bobot_array, jawaban_array)
    4. NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
    5. NN = 500 + 100 × ((NM - Rataan) / SB)
    """

    # STEP 1: Calculate total bobot earned (SUMPRODUCT equivalent)
    total_bobot_earned = 0.0
    total_bobot_max = 0.0
    total_benar = 0

    for response, item in zip(responses, items):
        bobot = item.ctt_bobot  # Pre-calculated as 1 - p
        total_bobot_max += bobot

        if response['correct'] == 1:
            total_bobot_earned += bobot
            total_benar += 1

    # STEP 2: Calculate NM (Nilai Mentah)
    if total_bobot_max == 0:
        nm = 0.0
    else:
        nm = (total_bobot_earned / total_bobot_max) * 1000

    # STEP 3: Get Rataan and SB based on normalization mode
    rataan, sb, norm_mode = get_normalization_params(
        config,
        db,
        nm  # Current NM to add to stats
    )

    # STEP 4: Calculate NN (Nilai Nasional)
    if sb is None or sb == 0:
        nn = 500.0  # No spread available yet: report the target mean
    else:
        nn = 500 + 100 * ((nm - rataan) / sb)

    # Clip NN to reasonable range
    nn = float(np.clip(nn, 0, 1000))

    return {
        "mode": "ctt",
        "total_benar": total_benar,
        "total_bobot_earned": round(total_bobot_earned, 2),
        "total_bobot_max": round(total_bobot_max, 2),
        "nm": round(nm, 1),
        "nn": round(nn, 1),
        "rataan_used": round(rataan, 2),
        "sb_used": round(sb, 2) if sb is not None else None,
        "normalization_mode": norm_mode,
        "breakdown": {
            "percentage": round((total_bobot_earned / total_bobot_max) * 100, 1) if total_bobot_max > 0 else 0
        }
    }


def get_normalization_params(
    config: TryoutConfig,
    db: Session,
    current_nm: float
) -> tuple[float, float, str]:
    """
    Get rataan and SB based on normalization mode.
    Returns: (rataan, sb, mode_used)
    """

    # Get or create stats
    stats = db.query(TryoutStats).filter_by(
        website_id=config.website_id,
        tryout_id=config.tryout_id
    ).first()

    if not stats:
        stats = TryoutStats(
            website_id=config.website_id,
            tryout_id=config.tryout_id,
            participant_count=0,
            total_nm_sum=0,
            total_nm_sq_sum=0
        )
        db.add(stats)
        db.commit()

    # Update running stats with current NM
    stats.participant_count += 1
    stats.total_nm_sum += current_nm
    stats.total_nm_sq_sum += (current_nm ** 2)

    # Calculate dynamic rataan and SB
    n = stats.participant_count
    if n > 1:
        mean = stats.total_nm_sum / n
        # Population variance (divides by n, like Excel's STDEV.P);
        # Excel's STDEV divides by n - 1 instead.
        variance = (stats.total_nm_sq_sum / n) - (mean ** 2)
        std_dev = float(np.sqrt(max(0, variance)))

        stats.current_rataan = mean
        stats.current_sb = std_dev
        stats.last_calculated_at = datetime.now(timezone.utc)
    else:
        # First participant, use static
        stats.current_rataan = config.static_rataan
        stats.current_sb = config.static_sb

    db.commit()

    # Determine which values to use based on mode
    if config.normalization_mode == 'static':
        return (config.static_rataan, config.static_sb, 'static')

    elif config.normalization_mode == 'dynamic':
        if stats.participant_count >= 2:
            return (stats.current_rataan, stats.current_sb, 'dynamic')
        else:
            return (config.static_rataan, config.static_sb, 'static_fallback')

    elif config.normalization_mode == 'hybrid':
        if stats.participant_count >= config.min_sample_for_dynamic:
            return (stats.current_rataan, stats.current_sb, 'hybrid_dynamic')
        else:
            return (config.static_rataan, config.static_sb, 'hybrid_static')

    else:
        # Unknown mode: behave like static
        return (config.static_rataan, config.static_sb, 'static')
```

***

## 📊 IRT Theta Estimation (MLE)

```python
from typing import List

import numpy as np
from scipy.optimize import minimize

from models import Item


def estimate_theta_mle(responses: List[int], items: List[Item]) -> float:
    """
    Estimate ability (theta) using Maximum Likelihood Estimation.

    1PL Rasch Model: P(θ) = 1 / (1 + e^-(θ - b))

    Args:
        responses: [1, 0, 1, 1, 0, ...] correct/incorrect
        items: [Item(irt_b=-0.5), Item(irt_b=0.2), ...]

    Returns:
        theta estimate
    """

    def neg_log_likelihood(theta_val):
        theta = float(theta_val[0])  # minimize() passes a 1-element array
        ll = 0.0
        for response, item in zip(responses, items):
            b = item.irt_b if item.irt_b is not None else 0.0
            # P(θ) = 1 / (1 + e^-(θ - b))
            p = 1 / (1 + np.exp(-(theta - b)))
            # Log-likelihood
            if response == 1:
                ll += np.log(max(p, 1e-10))  # Avoid log(0)
            else:
                ll += np.log(max(1 - p, 1e-10))
        return -ll  # Negative for minimization

    # Initial guess: middle of scale
    theta_init = 0

    # Optimize. The bounds also keep the estimate finite when every answer is
    # correct (or every answer is wrong), where the unbounded MLE diverges.
    result = minimize(
        neg_log_likelihood,
        x0=[theta_init],
        method='L-BFGS-B',
        bounds=[(-3, 3)]  # Reasonable theta range
    )

    theta_estimate = float(result.x[0])
    return theta_estimate


def estimate_theta_se(theta: float, items: List[Item]) -> float:
    """
    Calculate the standard error of the theta estimate
    using Fisher information.
    """
    information = 0.0
    for item in items:
        b = item.irt_b if item.irt_b is not None else 0.0
        p = 1 / (1 + np.exp(-(theta - b)))
        information += p * (1 - p)  # Fisher information for 1PL

    if information > 0:
        se = float(1 / np.sqrt(information))
    else:
        se = float('inf')

    return se
```

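The schema also lists `'eap'` as a `theta_estimation_method`. A minimal sketch of Expected A Posteriori estimation follows, assuming a standard normal prior on θ and a fixed quadrature grid from -4 to +4; the function name, the grid, and the use of plain difficulty lists instead of `Item` objects are all assumptions made for illustration.

```python
import math
from typing import List, Tuple


def estimate_theta_eap(
    responses: List[int],
    difficulties: List[float],
    n_points: int = 81,
) -> Tuple[float, float]:
    """Return (theta, posterior SD) under the 1PL model with a N(0, 1) prior."""
    grid = [-4.0 + 8.0 * i / (n_points - 1) for i in range(n_points)]
    weights = []
    for theta in grid:
        log_post = -0.5 * theta * theta  # log of the N(0, 1) prior, up to a constant
        for u, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            log_post += math.log(p) if u == 1 else math.log(1.0 - p)
        weights.append(math.exp(log_post))

    total = sum(weights)
    theta_hat = sum(t * w for t, w in zip(grid, weights)) / total
    variance = sum((t - theta_hat) ** 2 * w for t, w in zip(grid, weights)) / total
    return theta_hat, math.sqrt(variance)


# Hypothetical short session: five items, all answered correctly.
theta, sd = estimate_theta_eap([1, 1, 1, 1, 1], [-1.0, -0.5, 0.0, 0.5, 1.0])
print(round(theta, 2), round(sd, 2))
```

Unlike MLE, the prior keeps the estimate finite for all-correct or all-wrong response patterns, which matters early in a session when only a few items have been answered.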
***

## 🗂️ API Endpoints (v1.2 Final)

### 1. Next Item (Adaptive Selection)

```
POST /api/v1/session/{session_id}/next_item

Request:
{
  "mode": "ctt" | "irt" | "hybrid",
  "current_responses": [
    {"item_id": 1, "correct": 1},
    {"item_id": 2, "correct": 0}
  ]
}

Response:
{
  "item_id": 45,
  "slot": 3,
  "level": "Sedang",
  "stem": "...",
  "options": {"A": "...", "B": "...", "C": "...", "D": "...", "E": "..."},
  "item_source": "admin" | "ai",
  "selection_method": "fixed_order" | "adaptive_ctt" | "adaptive_irt"
}
```

### 2. Complete Session (Scoring)

```
POST /api/v1/session/{session_id}/complete

Response:
{
  "status": "completed",
  "primary_score": {
    "mode": "ctt",
    "total_benar": 15,
    "total_bobot_earned": 12.5,
    "total_bobot_max": 18.3,
    "nm": 683.0,
    "nn": 677.6,
    "rataan_used": 483.5,
    "sb_used": 112.3,
    "normalization_mode": "dynamic"
  },
  "secondary_score": {
    "mode": "irt",
    "theta": 0.85,
    "theta_se": 0.42,
    "nn_equivalent": 592.5
  },
  "comparison": {
    "nn_difference": 85.1,
    "agreement": "low"
  }
}
```

### 3. Get Tryout Config (with Normalization)

```
GET /api/v1/tryout/{tryout_id}/config

Response:
{
  "tryout_id": 123,
  "scoring_mode": "ctt",
  "normalization_mode": "dynamic",
  "static_rataan": 500,
  "static_sb": 100,
  "current_stats": {
    "participant_count": 245,
    "current_rataan": 483.5,
    "current_sb": 112.3,
    "min_nm": 125.0,
    "max_nm": 892.0
  },
  "calibration_status": {
    "total_items": 20,
    "calibrated_items": 8,
    "calibration_percentage": 40
  }
}
```

### 4. Update Normalization Settings

```
PUT /api/v1/tryout/{tryout_id}/normalization

Request:
{
  "normalization_mode": "hybrid",
  "static_rataan": 500,
  "static_sb": 100,
  "min_sample_for_dynamic": 100
}

Response:
{
  "status": "updated",
  "normalization_mode": "hybrid",
  "current_participant_count": 45,
  "will_switch_to_dynamic_at": 100,
  "using_mode": "static"
}
```

***

## 📥 Excel Import (OpenCode Ready)

```python
from typing import Dict

import numpy as np
import openpyxl
from sqlalchemy.orm import Session

from models import Item, TryoutConfig


def import_excel_tryout(
    excel_file: str,
    website_id: int,
    tryout_id: int,
    db: Session,
    sheet_name: str = "CONTOH"  # Defaulted parameters must come last
) -> Dict:
    """
    Import from client Excel exactly like PERHITUNGAN-SKOR-TO-3.xlsx

    Excel structure:
    - Row 1: Headers
    - Row 2: Answer key (KUNCI)
    - Row 4: TK (p values) formulas
    - Row 5: BOBOT formulas
    - Row 6+: Student responses
    """

    wb = openpyxl.load_workbook(excel_file, data_only=False)
    ws = wb[sheet_name]

    # Extract answer key from Row 2 (item columns start at column D = 4)
    answer_key = {}
    for col in range(4, ws.max_column + 1):
        key_cell = ws.cell(2, col).value
        if key_cell and key_cell != "KUNCI":
            slot_num = col - 3
            answer_key[slot_num] = str(key_cell).strip().upper()

    # Extract TK (p values) from Row 4 - get CALCULATED values
    wb_data = openpyxl.load_workbook(excel_file, data_only=True)
    ws_data = wb_data[sheet_name]

    p_values = {}
    for col in range(4, ws.max_column + 1):
        slot_num = col - 3
        if slot_num in answer_key:
            p_cell = ws_data.cell(4, col).value
            # Check the type only, so that a legitimate p of 0 is kept
            if isinstance(p_cell, (int, float)):
                p_values[slot_num] = float(p_cell)

    # Calculate bobot (1 - p)
    bobot_values = {slot: 1 - p for slot, p in p_values.items()}

    # Categorize difficulty: returns (ctt_category, level)
    def categorize_difficulty(p: float) -> tuple[str, str]:
        if p < 0.30:
            return ("Sukar", "Sulit")
        elif p > 0.70:
            return ("Mudah", "Mudah")
        else:
            return ("Sedang", "Sedang")

    # Create items
    items_created = 0
    for slot_num, correct_ans in answer_key.items():
        p = p_values.get(slot_num, 0.5)
        bobot = bobot_values.get(slot_num, 0.5)
        ctt_cat, level = categorize_difficulty(p)

        # Convert p to IRT b
        b = ctt_p_to_irt_b(p)

        item = Item(
            website_id=website_id,
            tryout_id=tryout_id,
            slot=slot_num,
            level=level,
            stem=f"[Import dari Excel - Soal {slot_num}]",
            options={"A": "[Option A]", "B": "[Option B]", "C": "[Option C]", "D": "[Option D]", "E": "[Option E]"},
            correct=correct_ans,
            explanation="",
            ctt_p=p,
            ctt_bobot=bobot,
            ctt_category=ctt_cat,
            irt_b=b,
            calibrated=False,
            calibration_sample_size=0,
            generated_by='admin',
            category_id=None
        )
        db.add(item)
        items_created += 1

    db.commit()

    # Configure tryout normalization
    config = TryoutConfig(
        website_id=website_id,
        tryout_id=tryout_id,
        scoring_mode='ctt',
        selection_mode='fixed',
        normalization_mode='static',
        static_rataan=500,
        static_sb=100,
        min_sample_for_dynamic=100
    )
    db.add(config)
    db.commit()

    return {
        "items_created": items_created,
        "normalization_configured": "static (rataan=500, SB=100)"
    }


def ctt_p_to_irt_b(p: float) -> float:
    """
    Convert CTT p-value to IRT b parameter.

    Logit approximation for the Rasch model at reference ability θ = 0:
    p = 1 / (1 + e^b), so b ≈ ln((1 - p) / p).
    Easy items (high p) get a negative b, hard items (low p) a positive b.
    """
    if p <= 0 or p >= 1:
        p = 0.5  # Degenerate p: fall back to average difficulty (b = 0)
    b = np.log((1 - p) / p)
    return float(b)
```

***

## 🚀 Migration Path (Non-Destructive)

### Phase 1: Import Existing Data (Week 1)

```
1. Export current Sejoli Tryout data to Excel
2. Run import script:
   python manage.py import_excel_tryout \
     --file="PERHITUNGAN-SKOR-TO-3.xlsx" \
     --sheet="CONTOH" \
     --website_id=1 \
     --tryout_id=123

3. Verify:
   - All items have ctt_p, ctt_bobot
   - IRT b auto-calculated from p
   - calibrated=False for all

4. Configure tryout:
   - scoring_mode='ctt'
   - selection_mode='fixed'
   - normalization_mode='static' (like client now)
```

### Phase 2: Collect Calibration Data (Week 2-4)

```
1. Students use tryout normally (CTT mode, static normalization)
2. Backend logs all responses
3. Monitor calibration progress
4. Collect running statistics for dynamic normalization
```

### Phase 3: Enable Dynamic Normalization (Week 5)

```
1. Check participant count: 100+ completed?
2. Update tryout_config:
   - normalization_mode='hybrid'
   - min_sample_for_dynamic=100
3. Test with 10-20 new students
4. Verify distribution normalized to mean=500, sd=100
```

### Phase 4: Enable IRT Adaptive (Week 6+)

```
1. After 90%+ items calibrated + 1000+ total responses
2. Update to full IRT:
   - scoring_mode='irt'
   - selection_mode='adaptive'
   - normalization_mode='dynamic'
3. Enable AI generation for Mudah/Sulit variants
```

***

## ✅ Success Metrics

### Technical KPIs

1. **Formula Accuracy**: CTT scores match client Excel 100%
2. **Normalization Stability**: SB within 5% of expected after 100 users
3. **Calibration Coverage**: >80% items calibrated
4. **Score Agreement**: CTT vs IRT NN difference <20 points
5. **Fallback Rate**: <5% IRT→CTT fallbacks per session

### Educational KPIs

1. **Measurement Precision**: IRT SE ≤0.55 after 15 items (under the 1PL model each item contributes at most 0.25 information, so 15 items cannot push SE below 2/√15 ≈ 0.52; SE <0.5 needs at least 17 items)
2. **Normalization Quality**: Distribution skewness <0.5
3. **Adaptive Efficiency**: 30% reduction in test length (IRT vs CTT)
4. **Student Satisfaction**: >80% prefer adaptive mode
5. **Admin Adoption**: >70% tryouts use hybrid within 3 months

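The measurement-precision target can be sanity-checked against the 1PL information bound: an item contributes at most p(1 - p) = 0.25 information, reached when b equals θ. The sketch below computes the best achievable SE for a given test length; both function names are hypothetical.

```python
import math


def min_se_1pl(n_items: int) -> float:
    """Best-case SE after n_items under 1PL, with every item perfectly targeted (b = theta)."""
    return 1.0 / math.sqrt(0.25 * n_items)


def items_needed_for_se(target_se: float) -> int:
    """Smallest number of perfectly targeted items that reaches SE <= target_se."""
    return math.ceil(4.0 / target_se ** 2)


for n in (10, 15, 17, 20, 30):
    print(n, round(min_se_1pl(n), 3))

print(items_needed_for_se(0.5))  # 16
```

Real sessions fall short of this bound because items are never perfectly targeted, so the lengths printed here are lower limits rather than forecasts.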
***

## 📋 Complexity Estimation

| Component | Effort (Days) | Notes |
| :-- | :-- | :-- |
| Setup FastAPI + PG + Alembic | 3 | Boilerplate |
| Core scoring (CTT/IRT hybrid) | 10 | Math-heavy |
| Dynamic normalization | 5 | Running stats |
| AI generation (OpenRouter) | 5 | API integration |
| Reuse logic + item selection | 8 | Algorithm |
| Admin UI (FastAPI Admin) | 5 | Auto-generated |
| Excel import | 3 | Formula parsing |
| WP integration | 4 | REST API |
| Testing + docs | 7 | Quality |
| Buffer | 5 | Contingency |
| **TOTAL** | **55 days** | **0.8x Sejoli Rebuild** |

***

## 📚 Glossary

- **p (TK)**: Proportion correct / Tingkat Kesukaran (CTT difficulty)
- **Bobot**: 1-p weight (CTT scoring weight)
- **NM**: Nilai Mentah (raw score 0-1000)
- **NN**: Nilai Nasional (normalized 500±100)
- **Rataan**: Mean of NM scores
- **SB**: Simpangan Baku (standard deviation of NM)
- **θ (theta)**: IRT ability (-3 to +3)
- **b**: IRT difficulty (-3 to +3)
- **SE**: Standard error (precision)
- **CAT**: Computerized Adaptive Testing
- **EM**: Expectation-Maximization (calibration method)
- **MLE**: Maximum Likelihood Estimation

***

## 🔗 File References

- **Excel Client:** `PERHITUNGAN-SKOR-TO-3.xlsx` (screenshot reference for formulas)
- **DB Schema:** PostgreSQL with Alembic migrations
- **API:** FastAPI with OpenAPI docs
- **Admin:** FastAPI Admin (auto-generated CRUD)

***

## 📝 Key Guarantees

✅ Existing CTT data safe, IRT adoption gradual, reversible anytime
✅ 100% compatible with client Excel formulas
✅ Dynamic normalization optional (can keep static mode)
✅ Zero data loss during transitions
✅ Non-destructive (Sejoli Tryout keeps running; enhancements are external)

***

**Document Version:** 1.2.0 Final
**Last Updated:** March 21, 2026, 9:31 AM WIB
**Status:** Ready for Implementation via OpenCode 🚀

**By:** Dwindi Ramadhana
**For:** Sejoli Tryout Multi-Website Platform