Files
yellow-bank-soal/project-brief.md
Dwindi Ramadhana cf193d7ea0 first commit
2026-03-21 23:32:59 +07:00

1110 lines
30 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# IRT-Powered Adaptive Question Bank System
## Final Project Brief \& Technical Specification
**Project Name:** IRT Bank Soal (Adaptive Question Bank with AI Generation)
**Client:** Sejoli Tryout Multi-Website Platform
**Tech Stack:** FastAPI + PostgreSQL + SQLAlchemy + FastAPI Admin + OpenRouter AI
**Deployment:** aaPanel VPS (Python Manager + PgSQL Manager)
**Version:** 1.2.0 Final (Hybrid CTT+IRT + Dynamic Normalization)
**Last Updated:** March 21, 2026, 9:31 AM WIB
***
## 🎯 Executive Summary
Sistem bank soal adaptif **hybrid** yang FULLY COMPATIBLE dengan Excel klien existing, dengan enhancement untuk:
- **Classical Test Theory (CTT)** - EXACT formula dari screenshot Excel klien (p, bobot, NM, NN)
- **Item Response Theory (IRT)** - Modern adaptive testing dengan theta estimation
- **AI Generation** - Auto-generate soal variants Mudah/Sulit via OpenRouter (Qwen3 Coder 480B)
- **Dynamic Normalization** - Rataan/SB calculated real-time atau manual input
- **Multi-Website Support** - 1 backend untuk N WordPress sites (Mat SD, Bahasa SMA, dll)
- **Non-Destructive** - 100% backward compatible dengan cara kerja klien sekarang
**Core Capabilities:**
1. Dual Scoring Mode: CTT (p, bobot) \& IRT (θ, b) berjalan paralel
2. Screenshot Compatible: Import langsung dari Excel klien (p=140/458)
3. Exact Formula Match: Implementasi persis formula Excel klien
4. Dynamic Normalization: Auto-calculate rataan/SB atau static mode
5. AI Question Generation: Generate Mudah/Sulit dari basis Sedang (CTT)
6. Full Audit Trail: Track CTT→IRT transition per item
***
## 📋 Exact Client Formulas (From Excel Analysis)
### STEP 1: Tingkat Kesukaran (TK) per Soal
```
Formula: p = Σ Benar / Total Peserta
Excel: =D464/$A$463
├─ D464 = Jumlah siswa yang jawab benar soal 1
└─ A463 = Total peserta (e.g., 458)
Example: p = 140/458 = 0.3057 → "Sedang"
```
### STEP 2: Bobot per Soal
```
Formula: Bobot = 1 - p
Excel: =1-D4
Example: Bobot = 1 - 0.3057 = 0.6943
Interpretation:
- Soal mudah (p=0.8) → bobot=0.2 (nilai rendah)
- Soal sulit (p=0.1) → bobot=0.9 (nilai tinggi)
```
### STEP 3: Total Benar per Siswa
```
Formula: Total_Benar = COUNT(jawaban benar)
Excel: =SUM(D454:W454) [20 soal]
Example: Siswa benar 15 soal → Total_Benar = 15
```
### STEP 4: Total Bobot Earned per Siswa
```
Formula: Total_Bobot = Σ (bobot_soal × jawaban_siswa)
Excel: =SUMPRODUCT($D$5:$W$5, D454:W454)
├─ $D$5:$W$5 = Array bobot [0.69, 0.85, 0.42, ...]
└─ D454:W454 = Jawaban [1, 1, 0, 1, ...]
Example:
Soal 1: bobot=0.69 × jawaban=1 → 0.69
Soal 2: bobot=0.85 × jawaban=1 → 0.85
Soal 3: bobot=0.42 × jawaban=0 → 0.00
...
Total_Bobot = 12.5
```
### STEP 5: Nilai Mentah (NM) [0-1000 scale]
```
Formula: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
Excel: =(Y454/$X$5)*1000
├─ Y454 = Total bobot siswa (e.g., 12.5)
└─ $X$5 = Total bobot maksimum (sum semua bobot, 18.3)
Example: NM = (12.5 / 18.3) × 1000 = 683
Range: 0-1000 (percentage-like scale)
```
### STEP 6: Nilai Nasional (NN) - Z-Score Normalized
```
Formula: NN = 500 + 100 × ((NM - Rataan) / SB)
Excel: =500+(100*((Z454-500)/100))
Components:
- 500 = Target mean (center point)
- 100 = Target standard deviation
- Rataan = Actual mean of NM from all participants
- SB = Actual standard deviation of NM
⚠️ CURRENT CLIENT ISSUE:
Rataan = 500 (hardcoded) → NN = 500 + (NM - 500) = NM
SB = 100 (hardcoded)
Result: NO actual normalization (NN always equals NM)
✅ OUR FIX: Dynamic calculation with 3 modes
```
### Kategori Kesulitan (CTT Standard)
```
Tingkat Kesukaran (p):
p < 0.30 → Sukar (Difficult)
0.30 ≤ p ≤ 0.70 → Sedang (Medium)
p > 0.70 → Mudah (Easy)
Bobot Implications:
p=0.09 → Bobot=0.91 (Sukar, high weight)
p=0.50 → Bobot=0.50 (Sedang, medium weight)
p=0.85 → Bobot=0.15 (Mudah, low weight)
```
***
## 🔄 CTT vs IRT: Understanding Both Approaches
### Classical Test Theory (CTT) - Client Method
**Kelebihan CTT:**
- Mudah dipahami admin/guru
- Tidak butuh banyak data (minimal 100 siswa)
- Compatible dengan sistem existing
- Cepat dihitung
- Formula transparent (visible in Excel)
**Keterbatasan CTT:**
- Sample-dependent (p berubah tiap kelompok)
- Tidak adaptive (soal fixed order)
- Butuh soal baru tiap tes (tidak bisa reuse efisien)
- Normalization issue (jika rataan/SB hardcoded)
### Item Response Theory (IRT) - Modern Adaptive
**Core Formula (1PL Rasch):**
```
P(θ) = 1 / (1 + e^-(θ - b))
θ = Kemampuan user (-3 to +3)
b = Kesulitan item (-3 to +3)
θ = -2 (lemah) → P(correct) di b=-1 = 73%
θ = 0 (average) → P(correct) di b=0 = 50%
θ = +2 (kuat) → P(correct) di b=+2 = 50%
```
**Kelebihan IRT:**
- Item-invariant (b tetap meski kelompok berbeda)
- Adaptive (pilih soal sesuai kemampuan real-time)
- Reuse efficient (1000 user, tiap slot 3 variant cukup)
- Akurat lebih cepat (15 soal IRT = 30 soal CTT)
**Keterbatasan IRT:**
- Butuh kalibrasi (min 100-500 responses per item)
- Kompleks untuk admin non-psikometri
- Butuh sistem adaptive (tidak bisa paper-based)
### Hybrid Solution (This System)
| Aspek | CTT Mode (Start) | Hybrid Mode (Transition) | IRT Mode (Goal) |
| :-- | :-- | :-- | :-- |
| **Admin Input** | p-value dari screenshot | Edit p atau b, sync otomatis | Edit b, p calculated |
| **Item Selection** | Fixed order slot 1-30 | Mixed (CTT fixed + IRT adaptive) | Fully adaptive CAT |
| **Scoring** | NM → NN (screenshot) | Paralel CTT \& IRT scores | θ → NN mapped |
| **Normalization** | Static atau Dynamic | Choose per tryout | Dynamic recommended |
| **AI Generation** | Dari p basis | Dari p atau b | Dari b calibrated |
| **Reuse** | Minimal | Moderate (cache variants) | Maximum (infinite pool) |
***
## 🏗️ System Architecture
### High-Level Flow (Hybrid + Dynamic Normalization)
```
┌─────────────────────────────────────────┐
│ WP Site 1 (Mat SD) │ WP Site 2 (Bahasa SMA)
│ Sejoli Tryout │ Sejoli Tryout
│ CTT Mode: Fixed │ IRT Mode: Adaptive
│ website_id=1 │ website_id=2
└─────────────────────────────────────────┘
│ │
└────────┬───────────┘
│ REST API
│ POST /next_item
│ {mode: "ctt"|"irt"|"hybrid"}
┌──────────────────────────────┐
│ FastAPI Backend (aaPanel) │
├──────────────────────────────┤
│ Hybrid Scoring Engine │
│ ├─ CTT: NM from p-bobot │
│ ├─ IRT: θ from responses │
│ ├─ Normalization: Dynamic │
│ └─ Return primary + secondary│
│ │
│ Dynamic Normalization Engine │
│ ├─ Rataan = AVG(all NM) │
│ ├─ SB = STDEV(all NM) │
│ ├─ Mode switch: Static→Dynamic
│ └─ Real-time update per user │
│ │
│ Item Selection Strategy │
│ ├─ CTT: Slot order (1→2→3) │
│ ├─ IRT: CAT (b ≈ θ) │
│ └─ Hybrid: First 10 CTT, IRT │
└────────────┬─────────────────┘
┌──────────────────────────────┐
│ PostgreSQL Database │
├──────────────────────────────┤
│ items (ADDED: ctt_p, bobot) │
│ user_answers (ADDED: nm, nn) │
│ tryout_config (ADDED: modes) │
│ tryout_stats (NEW: stats) │
└──────────────────────────────┘
```
***
## 💾 Database Schema (v1.2 Final)
### Table: tryout_config
```sql
CREATE TABLE tryout_config (
id SERIAL PRIMARY KEY,
website_id INTEGER NOT NULL,
tryout_id INTEGER NOT NULL,
-- Mode Control
scoring_mode VARCHAR(20) DEFAULT 'ctt', -- 'ctt', 'irt', 'hybrid'
selection_mode VARCHAR(20) DEFAULT 'fixed', -- 'fixed', 'adaptive', 'hybrid'
-- CTT Settings
min_peserta_for_ctt INTEGER DEFAULT 100,
-- Normalization Settings
normalization_mode VARCHAR(20) DEFAULT 'static', -- 'static', 'dynamic', 'hybrid'
static_rataan FLOAT DEFAULT 500,
static_sb FLOAT DEFAULT 100,
min_sample_for_dynamic INTEGER DEFAULT 100,
-- IRT Settings
enable_irt_when_calibrated BOOLEAN DEFAULT FALSE,
min_calibration_sample INTEGER DEFAULT 200,
theta_estimation_method VARCHAR(20) DEFAULT 'mle', -- 'mle', 'eap', 'map'
-- Transition Settings
hybrid_transition_slot INTEGER DEFAULT 10,
fallback_to_ctt_on_error BOOLEAN DEFAULT TRUE,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(website_id, tryout_id)
);
```
### Table: tryout_stats
```sql
CREATE TABLE tryout_stats (
id SERIAL PRIMARY KEY,
website_id INTEGER NOT NULL,
tryout_id INTEGER NOT NULL,
-- Running Statistics
participant_count INTEGER DEFAULT 0,
total_nm_sum FLOAT DEFAULT 0, -- Σ all NM scores
total_nm_sq_sum FLOAT DEFAULT 0, -- Σ (NM^2) for variance calc
-- Calculated Values (updated on each new participant)
current_rataan FLOAT, -- AVG(all NM)
current_sb FLOAT, -- STDEV(all NM)
min_nm FLOAT,
max_nm FLOAT,
-- Metadata
last_calculated_at TIMESTAMPTZ,
last_participant_id INTEGER,
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(website_id, tryout_id)
);
CREATE INDEX idx_tryout_stats_lookup ON tryout_stats(website_id, tryout_id);
```
### Table: user_answers
```sql
CREATE TABLE user_answers (
id SERIAL PRIMARY KEY,
wp_user_id INTEGER NOT NULL,
website_id INTEGER NOT NULL,
tryout_id INTEGER NOT NULL,
slot INTEGER NOT NULL,
level VARCHAR(20) NOT NULL,
item_id INTEGER NOT NULL,
-- Response Data
response INTEGER NOT NULL, -- 0=incorrect, 1=correct
time_spent INTEGER,
-- CTT Scoring
ctt_bobot_earned FLOAT, -- Bobot if correct, 0 if wrong
ctt_total_bobot_cumulative FLOAT, -- Running Σ bobot earned
ctt_nm FLOAT, -- Nilai Mentah (0-1000)
ctt_nn FLOAT, -- Nilai Nasional (normalized)
-- Normalization Applied
rataan_used FLOAT, -- Rataan value at this calculation
sb_used FLOAT, -- SB value at this calculation
normalization_mode_used VARCHAR(20), -- 'static', 'dynamic', 'hybrid'
-- IRT Scoring
irt_theta FLOAT, -- Ability estimate at this point
irt_theta_se FLOAT, -- Standard error
irt_information FLOAT, -- Information value at this item
-- Metadata
scoring_mode_used VARCHAR(20), -- 'ctt', 'irt', 'hybrid'
answered_at TIMESTAMPTZ DEFAULT NOW(),
FOREIGN KEY (item_id) REFERENCES items(id) ON DELETE CASCADE,
UNIQUE(wp_user_id, website_id, tryout_id, slot, level)
);
CREATE INDEX idx_user_answers_lookup ON user_answers(wp_user_id, website_id, tryout_id);
CREATE INDEX idx_user_answers_scoring ON user_answers(scoring_mode_used, ctt_nn, irt_theta);
```
### Table: items
```sql
CREATE TABLE items (
id SERIAL PRIMARY KEY,
website_id INTEGER NOT NULL,
tryout_id INTEGER NOT NULL,
slot INTEGER NOT NULL,
level VARCHAR(20) NOT NULL, -- 'Mudah', 'Sedang', 'Sulit'
stem TEXT NOT NULL,
options JSONB NOT NULL,
correct CHAR(1) NOT NULL,
explanation TEXT,
-- CTT Parameters (Screenshot Compatible)
ctt_p FLOAT, -- Proportion correct (0.09 from screenshot)
ctt_bobot FLOAT, -- 1 - p (0.91)
ctt_category VARCHAR(20), -- 'Sukar', 'Sedang', 'Mudah'
-- IRT Parameters (Adaptive)
irt_b FLOAT DEFAULT 0.0, -- Difficulty (-3 to +3)
irt_a FLOAT DEFAULT 1.0, -- Discrimination (optional)
irt_c FLOAT DEFAULT 0.25, -- Guessing (optional)
-- Calibration Status
calibrated BOOLEAN DEFAULT FALSE, -- TRUE when 100+ responses analyzed
calibration_sample_size INTEGER DEFAULT 0,
calibration_date TIMESTAMPTZ,
-- Legacy Fields
generated_by VARCHAR(10) NOT NULL, -- 'admin' or 'ai'
ai_model VARCHAR(50),
basis_item_id INTEGER,
category_id INTEGER,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
FOREIGN KEY (basis_item_id) REFERENCES items(id) ON DELETE SET NULL
);
CREATE INDEX idx_items_lookup ON items(website_id, tryout_id, slot, level);
CREATE INDEX idx_items_calibrated ON items(calibrated, calibration_sample_size);
CREATE INDEX idx_items_ctt ON items(ctt_p, ctt_category);
```
***
## 🎯 AI Question Generation (OpenRouter)
### Recommended Models (OpenRouter Free Tier)
| Model | Kenapa Cocok | Cost |
| :-- | :-- | :-- |
| **Qwen3 Coder 480B** | Math/reasoning expert, generate soal + solusi akurat, control difficulty | Free |
| **Llama 3.3 70B Instruct** | Multilingual (Indonesia), Bloom's Taxonomy, recall→analyze | Free |
| **DeepSeek R1/Math** | Math specialist (algebra/geo), outperform frontier models | Low (\$0.1/1M tokens) |
### AI Generation Workflow
**Context:** User 123, Tryout A, Slot 2 (Attempt 2)
1. Python API hitung θ → perlu "Sulit"
2. Check DB: Ada soal Sulit slot 2? ❌
3. AI Generate:
```
POST OpenRouter {
model: 'qwen3-coder-480b',
prompt: "Generate 1 soal Mat SD level Sulit mirip [basis_soal]..."
}
```
4. Parse response → INSERT items (website_id=1, level=Sulit, generated_by='ai')
5. Serve soal baru ke frontend
### Prompt Template (Standardized)
```
Context: Tryout {tryout_id} slot {slot} level {Sulit/Mudah}.
Basis soal: {basis_stem}.
Generate: 1 soal baru {level} dengan:
- Stem: 1 kalimat jelas
- Options: A B C D, 1 benar, 3 distractor logis
- Jawaban: huruf + penjelasan singkat
Bahasa: Indonesia, topik: {category}
```
### Reuse Strategy (Perfect for Scale)
```
User123, Tryout A, Slot 2, Attempt 1: Soal Sedang (statik)
User123, Tryout A, Slot 2, Attempt 2: AI generate → Soal Sulit (simpan DB)
User456, Tryout A, Slot 2, Attempt 2: Check if exist
IF ada Soal Sulit → REUSE (cache hit!)
ELSE → AI generate baru
Scenario 1000 users × 3 attempts:
- Static: 1000 × 30 × 3 = 90,000 soal unik (impossible)
- With AI + Reuse: ~30 static + 60 AI variants = 90 total (99.9% reuse!)
```
***
## 🔧 CTT Scoring Engine Implementation
```python
import numpy as np
from typing import List, Dict
from models import Item, TryoutConfig, TryoutStats
from datetime import datetime
def calculate_ctt_score_exact(
responses: List[Dict],
items: List[Item],
config: TryoutConfig,
db: Session
) -> Dict:
"""
Calculate CTT score using EXACT client Excel formula
Formula breakdown:
1. p = Σ Benar / Total Peserta (per soal)
2. Bobot = 1 - p
3. Total_Bobot_Siswa = SUMPRODUCT(bobot_array, jawaban_array)
4. NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
5. NN = 500 + 100 × ((NM - Rataan) / SB)
"""
# STEP 1: Calculate total bobot earned (SUMPRODUCT equivalent)
total_bobot_earned = 0.0
total_bobot_max = 0.0
total_benar = 0
for response, item in zip(responses, items):
bobot = item.ctt_bobot # Pre-calculated as 1 - p
total_bobot_max += bobot
if response['correct'] == 1:
total_bobot_earned += bobot
total_benar += 1
# STEP 2: Calculate NM (Nilai Mentah)
if total_bobot_max == 0:
nm = 0.0
else:
nm = (total_bobot_earned / total_bobot_max) * 1000
# STEP 3: Get Rataan and SB based on normalization mode
rataan, sb, norm_mode = get_normalization_params(
config,
db,
nm # Current NM to add to stats
)
# STEP 4: Calculate NN (Nilai Nasional)
if sb == 0 or sb is None:
nn = 500.0
else:
nn = 500 + 100 * ((nm - rataan) / sb)
# Clip NN to reasonable range
nn = float(np.clip(nn, 0, 1000))
return {
"mode": "ctt",
"total_benar": total_benar,
"total_bobot_earned": round(total_bobot_earned, 2),
"total_bobot_max": round(total_bobot_max, 2),
"nm": round(nm, 1),
"nn": round(nn, 1),
"rataan_used": round(rataan, 2),
"sb_used": round(sb, 2),
"normalization_mode": norm_mode,
"breakdown": {
"percentage": round((total_bobot_earned / total_bobot_max) * 100, 1) if total_bobot_max > 0 else 0
}
}
def get_normalization_params(
config: TryoutConfig,
db: Session,
current_nm: float
) -> tuple[float, float, str]:
"""
Get rataan and SB based on normalization mode
Returns: (rataan, sb, mode_used)
"""
# Get or create stats
stats = db.query(TryoutStats).filter_by(
website_id=config.website_id,
tryout_id=config.tryout_id
).first()
if not stats:
stats = TryoutStats(
website_id=config.website_id,
tryout_id=config.tryout_id,
participant_count=0,
total_nm_sum=0,
total_nm_sq_sum=0
)
db.add(stats)
db.commit()
# Update running stats with current NM
stats.participant_count += 1
stats.total_nm_sum += current_nm
stats.total_nm_sq_sum += (current_nm ** 2)
# Calculate dynamic rataan and SB
n = stats.participant_count
if n > 1:
mean = stats.total_nm_sum / n
variance = (stats.total_nm_sq_sum / n) - (mean ** 2)
std_dev = np.sqrt(max(0, variance))
stats.current_rataan = mean
stats.current_sb = std_dev
stats.last_calculated_at = datetime.utcnow()
else:
# First participant, use static
stats.current_rataan = config.static_rataan
stats.current_sb = config.static_sb
db.commit()
# Determine which values to use based on mode
if config.normalization_mode == 'static':
return (
config.static_rataan,
config.static_sb,
'static'
)
elif config.normalization_mode == 'dynamic':
if stats.participant_count >= 2:
return (
stats.current_rataan,
stats.current_sb,
'dynamic'
)
else:
return (
config.static_rataan,
config.static_sb,
'static_fallback'
)
elif config.normalization_mode == 'hybrid':
if stats.participant_count >= config.min_sample_for_dynamic:
return (
stats.current_rataan,
stats.current_sb,
'hybrid_dynamic'
)
else:
return (
config.static_rataan,
config.static_sb,
'hybrid_static'
)
else:
return (config.static_rataan, config.static_sb, 'static')
```
***
## 📊 IRT Theta Estimation (MLE)
```python
from scipy.optimize import minimize
import numpy as np
def estimate_theta_mle(responses: List[int], items: List[Item]) -> float:
"""
Estimate ability (theta) using Maximum Likelihood Estimation
1PL Rasch Model: P(θ) = 1 / (1 + e^-(θ - b))
Args:
responses: [1, 0, 1, 1, 0, ...] correct/incorrect
items: [Item(irt_b=-0.5), Item(irt_b=0.2), ...]
Returns:
theta estimate
"""
def neg_log_likelihood(theta_val):
ll = 0
for response, item in zip(responses, items):
b = item.irt_b if item.irt_b else 0
# P(θ) = 1 / (1 + e^-(θ - b))
p = 1 / (1 + np.exp(-(theta_val - b)))
# Log-likelihood
if response == 1:
ll += np.log(max(p, 1e-10)) # Avoid log(0)
else:
ll += np.log(max(1 - p, 1e-10))
return -ll # Negative for minimization
# Initial guess: middle of scale
theta_init = 0
# Optimize
result = minimize(
neg_log_likelihood,
x0=[theta_init],
method='L-BFGS-B',
bounds=[(-3, 3)] # Reasonable theta range
)
theta_estimate = float(result.x[0])
return theta_estimate
def estimate_theta_se(theta: float, items: List[Item]) -> float:
"""
Calculate standard error of theta estimate
Using Fisher information
"""
information = 0
for item in items:
b = item.irt_b if item.irt_b else 0
p = 1 / (1 + np.exp(-(theta - b)))
information += p * (1 - p) # Fisher information for 1PL
if information > 0:
se = 1 / np.sqrt(information)
else:
se = float('inf')
return se
```
***
## 🗂️ API Endpoints (v1.2 Final)
### 1. Next Item (Adaptive Selection)
```
POST /api/v1/session/{session_id}/next_item
Request:
{
"mode": "ctt" | "irt" | "hybrid",
"current_responses": [
{"item_id": 1, "correct": 1},
{"item_id": 2, "correct": 0}
]
}
Response:
{
"item_id": 45,
"slot": 3,
"level": "Sedang",
"stem": "...",
"options": {"A": "...", "B": "...", "C": "...", "D": "...", "E": "..."},
"item_source": "admin" | "ai",
"selection_method": "fixed_order" | "adaptive_ctt" | "adaptive_irt"
}
```
### 2. Complete Session (Scoring)
```
POST /api/v1/session/{session_id}/complete
Response:
{
"status": "completed",
"primary_score": {
"mode": "ctt",
"total_benar": 15,
"total_bobot_earned": 12.5,
"total_bobot_max": 18.3,
"nm": 683.0,
"nn": 618.2,
"rataan_used": 483.5,
"sb_used": 112.3,
"normalization_mode": "dynamic"
},
"secondary_score": {
"mode": "irt",
"theta": 0.85,
"theta_se": 0.42,
"nn_equivalent": 592.5
},
"comparison": {
"nn_difference": 25.7,
"agreement": "moderate"
}
}
```
### 3. Get Tryout Config (with Normalization)
```
GET /api/v1/tryout/{tryout_id}/config
Response:
{
"tryout_id": 123,
"scoring_mode": "ctt",
"normalization_mode": "dynamic",
"static_rataan": 500,
"static_sb": 100,
"current_stats": {
"participant_count": 245,
"current_rataan": 483.5,
"current_sb": 112.3,
"min_nm": 125.0,
"max_nm": 892.0
},
"calibration_status": {
"total_items": 20,
"calibrated_items": 8,
"calibration_percentage": 40
}
}
```
### 4. Update Normalization Settings
```
PUT /api/v1/tryout/{tryout_id}/normalization
Request:
{
"normalization_mode": "hybrid",
"static_rataan": 500,
"static_sb": 100,
"min_sample_for_dynamic": 100
}
Response:
{
"status": "updated",
"normalization_mode": "hybrid",
"current_participant_count": 45,
"will_switch_to_dynamic_at": 100,
"using_mode": "static"
}
```
***
## 📥 Excel Import (OpenCode Ready)
```python
import pandas as pd
import openpyxl
from models import Item, TryoutConfig
def import_excel_tryout(
excel_file: str,
website_id: int,
tryout_id: int,
sheet_name: str = "CONTOH",
db: Session
) -> Dict:
"""
Import from client Excel exactly like PERHITUNGAN-SKOR-TO-3.xlsx
Excel structure:
- Row 1: Headers
- Row 2: Answer key (KUNCI)
- Row 4: TK (p values) formulas
- Row 5: BOBOT formulas
- Row 6+: Student responses
"""
wb = openpyxl.load_workbook(excel_file, data_only=False)
ws = wb[sheet_name]
# Extract answer key from Row 2
answer_key = {}
for col in range(4, ws.max_column + 1):
key_cell = ws.cell(2, col).value
if key_cell and key_cell != "KUNCI":
slot_num = col - 3
answer_key[slot_num] = key_cell.strip().upper()
# Extract TK (p values) from Row 4 - get CALCULATED values
wb_data = openpyxl.load_workbook(excel_file, data_only=True)
ws_data = wb_data[sheet_name]
p_values = {}
for col in range(4, ws.max_column + 1):
slot_num = col - 3
if slot_num in answer_key:
p_cell = ws_data.cell(4, col).value
if p_cell and isinstance(p_cell, (int, float)):
p_values[slot_num] = float(p_cell)
# Calculate bobot (1 - p)
bobot_values = {slot: 1 - p for slot, p in p_values.items()}
# Categorize difficulty
def categorize_difficulty(p: float) -> tuple[str, str]:
if p < 0.30:
return ("Sukar", "Sulit")
elif p > 0.70:
return ("Mudah", "Mudah")
else:
return ("Sedang", "Sedang")
# Create items
items_created = 0
for slot_num, correct_ans in answer_key.items():
p = p_values.get(slot_num, 0.5)
bobot = bobot_values.get(slot_num, 0.5)
ctt_cat, level = categorize_difficulty(p)
# Convert p to IRT b
b = ctt_p_to_irt_b(p)
item = Item(
website_id=website_id,
tryout_id=tryout_id,
slot=slot_num,
level=level,
stem=f"[Import dari Excel - Soal {slot_num}]",
options={"A": "[Option A]", "B": "[Option B]", "C": "[Option C]", "D": "[Option D]", "E": "[Option E]"},
correct=correct_ans,
explanation="",
ctt_p=p,
ctt_bobot=bobot,
ctt_category=ctt_cat,
irt_b=b,
calibrated=False,
calibration_sample_size=0,
generated_by='admin',
category_id=None
)
db.add(item)
items_created += 1
db.commit()
# Configure tryout normalization
config = TryoutConfig(
website_id=website_id,
tryout_id=tryout_id,
scoring_mode='ctt',
selection_mode='fixed',
normalization_mode='static',
static_rataan=500,
static_sb=100,
min_sample_for_dynamic=100
)
db.add(config)
db.commit()
return {
"items_created": items_created,
"normalization_configured": "static (rataan=500, SB=100)"
}
def ctt_p_to_irt_b(p: float) -> float:
"""
Convert CTT p-value to IRT b parameter
Linear approximation: b ≈ -ln((1-p)/p)
"""
if p <= 0 or p >= 1:
p = 0.5
b = -np.log((1 - p) / p)
return float(b)
```
***
## 🚀 Migration Path (Non-Destructive)
### Phase 1: Import Existing Data (Week 1)
```
1. Export current Sejoli Tryout data to Excel
2. Run import script:
python manage.py import_excel_tryout \
--file="PERHITUNGAN-SKOR-TO-3.xlsx" \
--sheet="CONTOH" \
--website_id=1 \
--tryout_id=123
3. Verify:
- All items have ctt_p, ctt_bobot
- IRT b auto-calculated from p
- calibrated=False for all
4. Configure tryout:
- scoring_mode='ctt'
- selection_mode='fixed'
- normalization_mode='static' (like client now)
```
### Phase 2: Collect Calibration Data (Week 2-4)
```
1. Students use tryout normally (CTT mode, static normalization)
2. Backend logs all responses
3. Monitor calibration progress
4. Collect running statistics for dynamic normalization
```
### Phase 3: Enable Dynamic Normalization (Week 5)
```
1. Check participant count: 100+ completed?
2. Update tryout_config:
- normalization_mode='hybrid'
- min_sample_for_dynamic=100
3. Test with 10-20 new students
4. Verify distribution normalized to mean=500, sd=100
```
### Phase 4: Enable IRT Adaptive (Week 6+)
```
1. After 90%+ items calibrated + 1000+ total responses
2. Update to full IRT:
- scoring_mode='irt'
- selection_mode='adaptive'
- normalization_mode='dynamic'
3. Enable AI generation for Mudah/Sulit variants
```
***
## ✅ Success Metrics
### Technical KPIs
1. **Formula Accuracy**: CTT scores match client Excel 100%
2. **Normalization Stability**: SB within 5% of expected after 100 users
3. **Calibration Coverage**: >80% items calibrated
4. **Score Agreement**: CTT vs IRT NN difference <20 points
5. **Fallback Rate**: <5% IRT→CTT fallbacks per session
### Educational KPIs
1. **Measurement Precision**: IRT SE <0.5 after 15 items
2. **Normalization Quality**: Distribution skewness <0.5
3. **Adaptive Efficiency**: 30% reduction in test length (IRT vs CTT)
4. **Student Satisfaction**: >80% prefer adaptive mode
5. **Admin Adoption**: >70% tryouts use hybrid within 3 months
***
## 📋 Complexity Estimation
| Komponen | Effort (Days) | Notes |
| :-- | :-- | :-- |
| Setup FastAPI + PG + Alembic | 3 | Boilerplate |
| Core scoring (CTT/IRT hybrid) | 10 | Math-heavy |
| Dynamic normalization | 5 | Running stats |
| AI generation (OpenRouter) | 5 | API integration |
| Reuse logic + item selection | 8 | Algorithm |
| Admin UI (FastAPI Admin) | 5 | Auto-generated |
| Excel import | 3 | Formula parsing |
| WP integration | 4 | REST API |
| Testing + docs | 7 | Quality |
| Buffer | 5 | Contingency |
| **TOTAL** | **45 days** | **0.8x Sejoli Rebuild** |
***
## 📚 Glossary
- **p (TK)**: Proportion correct / Tingkat Kesukaran (CTT difficulty)
- **Bobot**: 1-p weight (CTT scoring weight)
- **NM**: Nilai Mentah (raw score 0-1000)
- **NN**: Nilai Nasional (normalized 500±100)
- **Rataan**: Mean of NM scores
- **SB**: Simpangan Baku (standard deviation of NM)
- **θ (theta)**: IRT ability (-3 to +3)
- **b**: IRT difficulty (-3 to +3)
- **SE**: Standard error (precision)
- **CAT**: Computerized Adaptive Testing
- **EM**: Expectation-Maximization (calibration method)
- **MLE**: Maximum Likelihood Estimation
***
## 🔗 File References
- **Excel Client:** `PERHITUNGAN-SKOR-TO-3.xlsx` (screenshot reference for formulas)
- **DB Schema:** PostgreSQL with Alembic migrations
- **API:** FastAPI with OpenAPI docs
- **Admin:** FastAPI Admin (auto-generated CRUD)
***
## 📝 Key Guarantees
✅ Existing CTT data safe, IRT adoption gradual, reversible anytime
✅ 100% compatible with client Excel formulas
✅ Dynamic normalization optional (can keep static mode)
✅ Zero data loss during transitions
✅ Non-destructive (Sejoli Tryout tetap jalan, external enhance)
***
**Document Version:** 1.2.0 Final
**Last Updated:** March 21, 2026, 9:31 AM WIB
**Status:** Ready for Implementation via OpenCode 🚀
**By:** Dwindi Ramadhana
**For:** Sejoli Tryout Multi-Website Platform