30 KiB
IRT-Powered Adaptive Question Bank System
Final Project Brief & Technical Specification
Project Name: IRT Bank Soal (Adaptive Question Bank with AI Generation) Client: Sejoli Tryout Multi-Website Platform Tech Stack: FastAPI + PostgreSQL + SQLAlchemy + FastAPI Admin + OpenRouter AI Deployment: aaPanel VPS (Python Manager + PgSQL Manager) Version: 1.2.0 Final (Hybrid CTT+IRT + Dynamic Normalization) Last Updated: March 21, 2026, 9:31 AM WIB
🎯 Executive Summary
Sistem bank soal adaptif hybrid yang FULLY COMPATIBLE dengan Excel klien existing, dengan enhancement untuk:
- Classical Test Theory (CTT) - EXACT formula dari screenshot Excel klien (p, bobot, NM, NN)
- Item Response Theory (IRT) - Modern adaptive testing dengan theta estimation
- AI Generation - Auto-generate soal variants Mudah/Sulit via OpenRouter (Qwen3 Coder 480B)
- Dynamic Normalization - Rataan/SB calculated real-time atau manual input
- Multi-Website Support - 1 backend untuk N WordPress sites (Mat SD, Bahasa SMA, dll)
- Non-Destructive - 100% backward compatible dengan cara kerja klien sekarang
Core Capabilities:
- Dual Scoring Mode: CTT (p, bobot) & IRT (θ, b) berjalan paralel
- Screenshot Compatible: Import langsung dari Excel klien (p=140/458)
- Exact Formula Match: Implementasi persis formula Excel klien
- Dynamic Normalization: Auto-calculate rataan/SB atau static mode
- AI Question Generation: Generate Mudah/Sulit dari basis Sedang (CTT)
- Full Audit Trail: Track CTT→IRT transition per item
📋 Exact Client Formulas (From Excel Analysis)
STEP 1: Tingkat Kesukaran (TK) per Soal
Formula: p = Σ Benar / Total Peserta
Excel: =D464/$A$463
├─ D464 = Jumlah siswa yang jawab benar soal 1
└─ A463 = Total peserta (e.g., 458)
Example: p = 140/458 = 0.3057 → "Sedang"
STEP 2: Bobot per Soal
Formula: Bobot = 1 - p
Excel: =1-D4
Example: Bobot = 1 - 0.3057 = 0.6943
Interpretation:
- Soal mudah (p=0.8) → bobot=0.2 (nilai rendah)
- Soal sulit (p=0.1) → bobot=0.9 (nilai tinggi)
STEP 3: Total Benar per Siswa
Formula: Total_Benar = COUNT(jawaban benar)
Excel: =SUM(D454:W454) [20 soal]
Example: Siswa benar 15 soal → Total_Benar = 15
STEP 4: Total Bobot Earned per Siswa
Formula: Total_Bobot = Σ (bobot_soal × jawaban_siswa)
Excel: =SUMPRODUCT($D$5:$W$5, D454:W454)
├─ $D$5:$W$5 = Array bobot [0.69, 0.85, 0.42, ...]
└─ D454:W454 = Jawaban [1, 1, 0, 1, ...]
Example:
Soal 1: bobot=0.69 × jawaban=1 → 0.69
Soal 2: bobot=0.85 × jawaban=1 → 0.85
Soal 3: bobot=0.42 × jawaban=0 → 0.00
...
Total_Bobot = 12.5
STEP 5: Nilai Mentah (NM) [0-1000 scale]
Formula: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
Excel: =(Y454/$X$5)*1000
├─ Y454 = Total bobot siswa (e.g., 12.5)
└─ $X$5 = Total bobot maksimum (sum semua bobot, 18.3)
Example: NM = (12.5 / 18.3) × 1000 = 683
Range: 0-1000 (percentage-like scale)
STEP 6: Nilai Nasional (NN) - Z-Score Normalized
Formula: NN = 500 + 100 × ((NM - Rataan) / SB)
Excel: =500+(100*((Z454-500)/100))
Components:
- 500 = Target mean (center point)
- 100 = Target standard deviation
- Rataan = Actual mean of NM from all participants
- SB = Actual standard deviation of NM
⚠️ CURRENT CLIENT ISSUE:
Rataan = 500 (hardcoded) → NN = 500 + (NM - 500) = NM
SB = 100 (hardcoded)
Result: NO actual normalization (NN always equals NM)
✅ OUR FIX: Dynamic calculation with 3 modes
Kategori Kesulitan (CTT Standard)
Tingkat Kesukaran (p):
p < 0.30 → Sukar (Difficult)
0.30 ≤ p ≤ 0.70 → Sedang (Medium)
p > 0.70 → Mudah (Easy)
Bobot Implications:
p=0.09 → Bobot=0.91 (Sukar, high weight)
p=0.50 → Bobot=0.50 (Sedang, medium weight)
p=0.85 → Bobot=0.15 (Mudah, low weight)
🔄 CTT vs IRT: Understanding Both Approaches
Classical Test Theory (CTT) - Client Method
Kelebihan CTT:
- Mudah dipahami admin/guru
- Tidak butuh banyak data (minimal 100 siswa)
- Compatible dengan sistem existing
- Cepat dihitung
- Formula transparent (visible in Excel)
Keterbatasan CTT:
- Sample-dependent (p berubah tiap kelompok)
- Tidak adaptive (soal fixed order)
- Butuh soal baru tiap tes (tidak bisa reuse efisien)
- Normalization issue (jika rataan/SB hardcoded)
Item Response Theory (IRT) - Modern Adaptive
Core Formula (1PL Rasch):
P(θ) = 1 / (1 + e^-(θ - b))
θ = Kemampuan user (-3 to +3)
b = Kesulitan item (-3 to +3)
θ = -2 (lemah) → P(correct) di b=-1 = 73%
θ = 0 (average) → P(correct) di b=0 = 50%
θ = +2 (kuat) → P(correct) di b=+2 = 50%
Kelebihan IRT:
- Item-invariant (b tetap meski kelompok berbeda)
- Adaptive (pilih soal sesuai kemampuan real-time)
- Reuse efficient (1000 user, tiap slot 3 variant cukup)
- Akurat lebih cepat (15 soal IRT = 30 soal CTT)
Keterbatasan IRT:
- Butuh kalibrasi (min 100-500 responses per item)
- Kompleks untuk admin non-psikometri
- Butuh sistem adaptive (tidak bisa paper-based)
Hybrid Solution (This System)
| Aspek | CTT Mode (Start) | Hybrid Mode (Transition) | IRT Mode (Goal) |
|---|---|---|---|
| Admin Input | p-value dari screenshot | Edit p atau b, sync otomatis | Edit b, p calculated |
| Item Selection | Fixed order slot 1-30 | Mixed (CTT fixed + IRT adaptive) | Fully adaptive CAT |
| Scoring | NM → NN (screenshot) | Paralel CTT & IRT scores | θ → NN mapped |
| Normalization | Static atau Dynamic | Choose per tryout | Dynamic recommended |
| AI Generation | Dari p basis | Dari p atau b | Dari b calibrated |
| Reuse | Minimal | Moderate (cache variants) | Maximum (infinite pool) |
🏗️ System Architecture
High-Level Flow (Hybrid + Dynamic Normalization)
┌─────────────────────────────────────────┐
│ WP Site 1 (Mat SD) │ WP Site 2 (Bahasa SMA)
│ Sejoli Tryout │ Sejoli Tryout
│ CTT Mode: Fixed │ IRT Mode: Adaptive
│ website_id=1 │ website_id=2
└─────────────────────────────────────────┘
│ │
└────────┬───────────┘
│ REST API
│ POST /next_item
│ {mode: "ctt"|"irt"|"hybrid"}
▼
┌──────────────────────────────┐
│ FastAPI Backend (aaPanel) │
├──────────────────────────────┤
│ Hybrid Scoring Engine │
│ ├─ CTT: NM from p-bobot │
│ ├─ IRT: θ from responses │
│ ├─ Normalization: Dynamic │
│ └─ Return primary + secondary│
│ │
│ Dynamic Normalization Engine │
│ ├─ Rataan = AVG(all NM) │
│ ├─ SB = STDEV(all NM) │
│ ├─ Mode switch: Static→Dynamic
│ └─ Real-time update per user │
│ │
│ Item Selection Strategy │
│ ├─ CTT: Slot order (1→2→3) │
│ ├─ IRT: CAT (b ≈ θ) │
│ └─ Hybrid: First 10 CTT, IRT │
└────────────┬─────────────────┘
│
▼
┌──────────────────────────────┐
│ PostgreSQL Database │
├──────────────────────────────┤
│ items (ADDED: ctt_p, bobot) │
│ user_answers (ADDED: nm, nn) │
│ tryout_config (ADDED: modes) │
│ tryout_stats (NEW: stats) │
└──────────────────────────────┘
💾 Database Schema (v1.2 Final)
Table: tryout_config
CREATE TABLE tryout_config (
id SERIAL PRIMARY KEY,
website_id INTEGER NOT NULL,
tryout_id INTEGER NOT NULL,
-- Mode Control
scoring_mode VARCHAR(20) DEFAULT 'ctt', -- 'ctt', 'irt', 'hybrid'
selection_mode VARCHAR(20) DEFAULT 'fixed', -- 'fixed', 'adaptive', 'hybrid'
-- CTT Settings
min_peserta_for_ctt INTEGER DEFAULT 100,
-- Normalization Settings
normalization_mode VARCHAR(20) DEFAULT 'static', -- 'static', 'dynamic', 'hybrid'
static_rataan FLOAT DEFAULT 500,
static_sb FLOAT DEFAULT 100,
min_sample_for_dynamic INTEGER DEFAULT 100,
-- IRT Settings
enable_irt_when_calibrated BOOLEAN DEFAULT FALSE,
min_calibration_sample INTEGER DEFAULT 200,
theta_estimation_method VARCHAR(20) DEFAULT 'mle', -- 'mle', 'eap', 'map'
-- Transition Settings
hybrid_transition_slot INTEGER DEFAULT 10,
fallback_to_ctt_on_error BOOLEAN DEFAULT TRUE,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(website_id, tryout_id)
);
Table: tryout_stats
CREATE TABLE tryout_stats (
id SERIAL PRIMARY KEY,
website_id INTEGER NOT NULL,
tryout_id INTEGER NOT NULL,
-- Running Statistics
participant_count INTEGER DEFAULT 0,
total_nm_sum FLOAT DEFAULT 0, -- Σ all NM scores
total_nm_sq_sum FLOAT DEFAULT 0, -- Σ (NM^2) for variance calc
-- Calculated Values (updated on each new participant)
current_rataan FLOAT, -- AVG(all NM)
current_sb FLOAT, -- STDEV(all NM)
min_nm FLOAT,
max_nm FLOAT,
-- Metadata
last_calculated_at TIMESTAMPTZ,
last_participant_id INTEGER,
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(website_id, tryout_id)
);
CREATE INDEX idx_tryout_stats_lookup ON tryout_stats(website_id, tryout_id);
Table: user_answers
CREATE TABLE user_answers (
id SERIAL PRIMARY KEY,
wp_user_id INTEGER NOT NULL,
website_id INTEGER NOT NULL,
tryout_id INTEGER NOT NULL,
slot INTEGER NOT NULL,
level VARCHAR(20) NOT NULL,
item_id INTEGER NOT NULL,
-- Response Data
response INTEGER NOT NULL, -- 0=incorrect, 1=correct
time_spent INTEGER,
-- CTT Scoring
ctt_bobot_earned FLOAT, -- Bobot if correct, 0 if wrong
ctt_total_bobot_cumulative FLOAT, -- Running Σ bobot earned
ctt_nm FLOAT, -- Nilai Mentah (0-1000)
ctt_nn FLOAT, -- Nilai Nasional (normalized)
-- Normalization Applied
rataan_used FLOAT, -- Rataan value at this calculation
sb_used FLOAT, -- SB value at this calculation
normalization_mode_used VARCHAR(20), -- 'static', 'dynamic', 'hybrid'
-- IRT Scoring
irt_theta FLOAT, -- Ability estimate at this point
irt_theta_se FLOAT, -- Standard error
irt_information FLOAT, -- Information value at this item
-- Metadata
scoring_mode_used VARCHAR(20), -- 'ctt', 'irt', 'hybrid'
answered_at TIMESTAMPTZ DEFAULT NOW(),
FOREIGN KEY (item_id) REFERENCES items(id) ON DELETE CASCADE,
UNIQUE(wp_user_id, website_id, tryout_id, slot, level)
);
CREATE INDEX idx_user_answers_lookup ON user_answers(wp_user_id, website_id, tryout_id);
CREATE INDEX idx_user_answers_scoring ON user_answers(scoring_mode_used, ctt_nn, irt_theta);
Table: items
CREATE TABLE items (
id SERIAL PRIMARY KEY,
website_id INTEGER NOT NULL,
tryout_id INTEGER NOT NULL,
slot INTEGER NOT NULL,
level VARCHAR(20) NOT NULL, -- 'Mudah', 'Sedang', 'Sulit'
stem TEXT NOT NULL,
options JSONB NOT NULL,
correct CHAR(1) NOT NULL,
explanation TEXT,
-- CTT Parameters (Screenshot Compatible)
ctt_p FLOAT, -- Proportion correct (0.09 from screenshot)
ctt_bobot FLOAT, -- 1 - p (0.91)
ctt_category VARCHAR(20), -- 'Sukar', 'Sedang', 'Mudah'
-- IRT Parameters (Adaptive)
irt_b FLOAT DEFAULT 0.0, -- Difficulty (-3 to +3)
irt_a FLOAT DEFAULT 1.0, -- Discrimination (optional)
irt_c FLOAT DEFAULT 0.25, -- Guessing (optional)
-- Calibration Status
calibrated BOOLEAN DEFAULT FALSE, -- TRUE when 100+ responses analyzed
calibration_sample_size INTEGER DEFAULT 0,
calibration_date TIMESTAMPTZ,
-- Legacy Fields
generated_by VARCHAR(10) NOT NULL, -- 'admin' or 'ai'
ai_model VARCHAR(50),
basis_item_id INTEGER,
category_id INTEGER,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
FOREIGN KEY (basis_item_id) REFERENCES items(id) ON DELETE SET NULL
);
CREATE INDEX idx_items_lookup ON items(website_id, tryout_id, slot, level);
CREATE INDEX idx_items_calibrated ON items(calibrated, calibration_sample_size);
CREATE INDEX idx_items_ctt ON items(ctt_p, ctt_category);
🎯 AI Question Generation (OpenRouter)
Recommended Models (OpenRouter Free Tier)
| Model | Kenapa Cocok | Cost |
|---|---|---|
| Qwen3 Coder 480B | Math/reasoning expert, generate soal + solusi akurat, control difficulty | Free |
| Llama 3.3 70B Instruct | Multilingual (Indonesia), Bloom's Taxonomy, recall→analyze | Free |
| DeepSeek R1/Math | Math specialist (algebra/geo), outperform frontier models | Low ($0.1/1M tokens) |
AI Generation Workflow
Context: User 123, Tryout A, Slot 2 (Attempt 2)
- Python API hitung θ → perlu "Sulit"
- Check DB: Ada soal Sulit slot 2? ❌
- AI Generate:
POST OpenRouter {
model: 'qwen3-coder-480b',
prompt: "Generate 1 soal Mat SD level Sulit mirip [basis_soal]..."
}
- Parse response → INSERT items (website_id=1, level=Sulit, generated_by='ai')
- Serve soal baru ke frontend
Prompt Template (Standardized)
Context: Tryout {tryout_id} slot {slot} level {Sulit/Mudah}.
Basis soal: {basis_stem}.
Generate: 1 soal baru {level} dengan:
- Stem: 1 kalimat jelas
- Options: A B C D, 1 benar, 3 distractor logis
- Jawaban: huruf + penjelasan singkat
Bahasa: Indonesia, topik: {category}
Reuse Strategy (Perfect for Scale)
User123, Tryout A, Slot 2, Attempt 1: Soal Sedang (statik)
User123, Tryout A, Slot 2, Attempt 2: AI generate → Soal Sulit (simpan DB)
User456, Tryout A, Slot 2, Attempt 2: Check if exist
IF ada Soal Sulit → REUSE (cache hit!)
ELSE → AI generate baru
Scenario 1000 users × 3 attempts:
- Static: 1000 × 30 × 3 = 90,000 soal unik (impossible)
- With AI + Reuse: ~30 static + 60 AI variants = 90 total (99.9% reuse!)
🔧 CTT Scoring Engine Implementation
import numpy as np
from typing import List, Dict
from models import Item, TryoutConfig, TryoutStats
from datetime import datetime
def calculate_ctt_score_exact(
responses: List[Dict],
items: List[Item],
config: TryoutConfig,
db: Session
) -> Dict:
"""
Calculate CTT score using EXACT client Excel formula
Formula breakdown:
1. p = Σ Benar / Total Peserta (per soal)
2. Bobot = 1 - p
3. Total_Bobot_Siswa = SUMPRODUCT(bobot_array, jawaban_array)
4. NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
5. NN = 500 + 100 × ((NM - Rataan) / SB)
"""
# STEP 1: Calculate total bobot earned (SUMPRODUCT equivalent)
total_bobot_earned = 0.0
total_bobot_max = 0.0
total_benar = 0
for response, item in zip(responses, items):
bobot = item.ctt_bobot # Pre-calculated as 1 - p
total_bobot_max += bobot
if response['correct'] == 1:
total_bobot_earned += bobot
total_benar += 1
# STEP 2: Calculate NM (Nilai Mentah)
if total_bobot_max == 0:
nm = 0.0
else:
nm = (total_bobot_earned / total_bobot_max) * 1000
# STEP 3: Get Rataan and SB based on normalization mode
rataan, sb, norm_mode = get_normalization_params(
config,
db,
nm # Current NM to add to stats
)
# STEP 4: Calculate NN (Nilai Nasional)
if sb == 0 or sb is None:
nn = 500.0
else:
nn = 500 + 100 * ((nm - rataan) / sb)
# Clip NN to reasonable range
nn = float(np.clip(nn, 0, 1000))
return {
"mode": "ctt",
"total_benar": total_benar,
"total_bobot_earned": round(total_bobot_earned, 2),
"total_bobot_max": round(total_bobot_max, 2),
"nm": round(nm, 1),
"nn": round(nn, 1),
"rataan_used": round(rataan, 2),
"sb_used": round(sb, 2),
"normalization_mode": norm_mode,
"breakdown": {
"percentage": round((total_bobot_earned / total_bobot_max) * 100, 1) if total_bobot_max > 0 else 0
}
}
def get_normalization_params(
config: TryoutConfig,
db: Session,
current_nm: float
) -> tuple[float, float, str]:
"""
Get rataan and SB based on normalization mode
Returns: (rataan, sb, mode_used)
"""
# Get or create stats
stats = db.query(TryoutStats).filter_by(
website_id=config.website_id,
tryout_id=config.tryout_id
).first()
if not stats:
stats = TryoutStats(
website_id=config.website_id,
tryout_id=config.tryout_id,
participant_count=0,
total_nm_sum=0,
total_nm_sq_sum=0
)
db.add(stats)
db.commit()
# Update running stats with current NM
stats.participant_count += 1
stats.total_nm_sum += current_nm
stats.total_nm_sq_sum += (current_nm ** 2)
# Calculate dynamic rataan and SB
n = stats.participant_count
if n > 1:
mean = stats.total_nm_sum / n
variance = (stats.total_nm_sq_sum / n) - (mean ** 2)
std_dev = np.sqrt(max(0, variance))
stats.current_rataan = mean
stats.current_sb = std_dev
stats.last_calculated_at = datetime.utcnow()
else:
# First participant, use static
stats.current_rataan = config.static_rataan
stats.current_sb = config.static_sb
db.commit()
# Determine which values to use based on mode
if config.normalization_mode == 'static':
return (
config.static_rataan,
config.static_sb,
'static'
)
elif config.normalization_mode == 'dynamic':
if stats.participant_count >= 2:
return (
stats.current_rataan,
stats.current_sb,
'dynamic'
)
else:
return (
config.static_rataan,
config.static_sb,
'static_fallback'
)
elif config.normalization_mode == 'hybrid':
if stats.participant_count >= config.min_sample_for_dynamic:
return (
stats.current_rataan,
stats.current_sb,
'hybrid_dynamic'
)
else:
return (
config.static_rataan,
config.static_sb,
'hybrid_static'
)
else:
return (config.static_rataan, config.static_sb, 'static')
📊 IRT Theta Estimation (MLE)
from scipy.optimize import minimize
import numpy as np
def estimate_theta_mle(responses: List[int], items: List[Item]) -> float:
"""
Estimate ability (theta) using Maximum Likelihood Estimation
1PL Rasch Model: P(θ) = 1 / (1 + e^-(θ - b))
Args:
responses: [1, 0, 1, 1, 0, ...] correct/incorrect
items: [Item(irt_b=-0.5), Item(irt_b=0.2), ...]
Returns:
theta estimate
"""
def neg_log_likelihood(theta_val):
ll = 0
for response, item in zip(responses, items):
b = item.irt_b if item.irt_b else 0
# P(θ) = 1 / (1 + e^-(θ - b))
p = 1 / (1 + np.exp(-(theta_val - b)))
# Log-likelihood
if response == 1:
ll += np.log(max(p, 1e-10)) # Avoid log(0)
else:
ll += np.log(max(1 - p, 1e-10))
return -ll # Negative for minimization
# Initial guess: middle of scale
theta_init = 0
# Optimize
result = minimize(
neg_log_likelihood,
x0=[theta_init],
method='L-BFGS-B',
bounds=[(-3, 3)] # Reasonable theta range
)
theta_estimate = float(result.x[0])
return theta_estimate
def estimate_theta_se(theta: float, items: List[Item]) -> float:
"""
Calculate standard error of theta estimate
Using Fisher information
"""
information = 0
for item in items:
b = item.irt_b if item.irt_b else 0
p = 1 / (1 + np.exp(-(theta - b)))
information += p * (1 - p) # Fisher information for 1PL
if information > 0:
se = 1 / np.sqrt(information)
else:
se = float('inf')
return se
🗂️ API Endpoints (v1.2 Final)
1. Next Item (Adaptive Selection)
POST /api/v1/session/{session_id}/next_item
Request:
{
"mode": "ctt" | "irt" | "hybrid",
"current_responses": [
{"item_id": 1, "correct": 1},
{"item_id": 2, "correct": 0}
]
}
Response:
{
"item_id": 45,
"slot": 3,
"level": "Sedang",
"stem": "...",
"options": {"A": "...", "B": "...", "C": "...", "D": "...", "E": "..."},
"item_source": "admin" | "ai",
"selection_method": "fixed_order" | "adaptive_ctt" | "adaptive_irt"
}
2. Complete Session (Scoring)
POST /api/v1/session/{session_id}/complete
Response:
{
"status": "completed",
"primary_score": {
"mode": "ctt",
"total_benar": 15,
"total_bobot_earned": 12.5,
"total_bobot_max": 18.3,
"nm": 683.0,
"nn": 618.2,
"rataan_used": 483.5,
"sb_used": 112.3,
"normalization_mode": "dynamic"
},
"secondary_score": {
"mode": "irt",
"theta": 0.85,
"theta_se": 0.42,
"nn_equivalent": 592.5
},
"comparison": {
"nn_difference": 25.7,
"agreement": "moderate"
}
}
3. Get Tryout Config (with Normalization)
GET /api/v1/tryout/{tryout_id}/config
Response:
{
"tryout_id": 123,
"scoring_mode": "ctt",
"normalization_mode": "dynamic",
"static_rataan": 500,
"static_sb": 100,
"current_stats": {
"participant_count": 245,
"current_rataan": 483.5,
"current_sb": 112.3,
"min_nm": 125.0,
"max_nm": 892.0
},
"calibration_status": {
"total_items": 20,
"calibrated_items": 8,
"calibration_percentage": 40
}
}
4. Update Normalization Settings
PUT /api/v1/tryout/{tryout_id}/normalization
Request:
{
"normalization_mode": "hybrid",
"static_rataan": 500,
"static_sb": 100,
"min_sample_for_dynamic": 100
}
Response:
{
"status": "updated",
"normalization_mode": "hybrid",
"current_participant_count": 45,
"will_switch_to_dynamic_at": 100,
"using_mode": "static"
}
📥 Excel Import (OpenCode Ready)
import pandas as pd
import openpyxl
from models import Item, TryoutConfig
def import_excel_tryout(
excel_file: str,
website_id: int,
tryout_id: int,
sheet_name: str = "CONTOH",
db: Session
) -> Dict:
"""
Import from client Excel exactly like PERHITUNGAN-SKOR-TO-3.xlsx
Excel structure:
- Row 1: Headers
- Row 2: Answer key (KUNCI)
- Row 4: TK (p values) formulas
- Row 5: BOBOT formulas
- Row 6+: Student responses
"""
wb = openpyxl.load_workbook(excel_file, data_only=False)
ws = wb[sheet_name]
# Extract answer key from Row 2
answer_key = {}
for col in range(4, ws.max_column + 1):
key_cell = ws.cell(2, col).value
if key_cell and key_cell != "KUNCI":
slot_num = col - 3
answer_key[slot_num] = key_cell.strip().upper()
# Extract TK (p values) from Row 4 - get CALCULATED values
wb_data = openpyxl.load_workbook(excel_file, data_only=True)
ws_data = wb_data[sheet_name]
p_values = {}
for col in range(4, ws.max_column + 1):
slot_num = col - 3
if slot_num in answer_key:
p_cell = ws_data.cell(4, col).value
if p_cell and isinstance(p_cell, (int, float)):
p_values[slot_num] = float(p_cell)
# Calculate bobot (1 - p)
bobot_values = {slot: 1 - p for slot, p in p_values.items()}
# Categorize difficulty
def categorize_difficulty(p: float) -> tuple[str, str]:
if p < 0.30:
return ("Sukar", "Sulit")
elif p > 0.70:
return ("Mudah", "Mudah")
else:
return ("Sedang", "Sedang")
# Create items
items_created = 0
for slot_num, correct_ans in answer_key.items():
p = p_values.get(slot_num, 0.5)
bobot = bobot_values.get(slot_num, 0.5)
ctt_cat, level = categorize_difficulty(p)
# Convert p to IRT b
b = ctt_p_to_irt_b(p)
item = Item(
website_id=website_id,
tryout_id=tryout_id,
slot=slot_num,
level=level,
stem=f"[Import dari Excel - Soal {slot_num}]",
options={"A": "[Option A]", "B": "[Option B]", "C": "[Option C]", "D": "[Option D]", "E": "[Option E]"},
correct=correct_ans,
explanation="",
ctt_p=p,
ctt_bobot=bobot,
ctt_category=ctt_cat,
irt_b=b,
calibrated=False,
calibration_sample_size=0,
generated_by='admin',
category_id=None
)
db.add(item)
items_created += 1
db.commit()
# Configure tryout normalization
config = TryoutConfig(
website_id=website_id,
tryout_id=tryout_id,
scoring_mode='ctt',
selection_mode='fixed',
normalization_mode='static',
static_rataan=500,
static_sb=100,
min_sample_for_dynamic=100
)
db.add(config)
db.commit()
return {
"items_created": items_created,
"normalization_configured": "static (rataan=500, SB=100)"
}
def ctt_p_to_irt_b(p: float) -> float:
"""
Convert CTT p-value to IRT b parameter
Linear approximation: b ≈ -ln((1-p)/p)
"""
if p <= 0 or p >= 1:
p = 0.5
b = -np.log((1 - p) / p)
return float(b)
🚀 Migration Path (Non-Destructive)
Phase 1: Import Existing Data (Week 1)
1. Export current Sejoli Tryout data to Excel
2. Run import script:
python manage.py import_excel_tryout \
--file="PERHITUNGAN-SKOR-TO-3.xlsx" \
--sheet="CONTOH" \
--website_id=1 \
--tryout_id=123
3. Verify:
- All items have ctt_p, ctt_bobot
- IRT b auto-calculated from p
- calibrated=False for all
4. Configure tryout:
- scoring_mode='ctt'
- selection_mode='fixed'
- normalization_mode='static' (like client now)
Phase 2: Collect Calibration Data (Week 2-4)
1. Students use tryout normally (CTT mode, static normalization)
2. Backend logs all responses
3. Monitor calibration progress
4. Collect running statistics for dynamic normalization
Phase 3: Enable Dynamic Normalization (Week 5)
1. Check participant count: 100+ completed?
2. Update tryout_config:
- normalization_mode='hybrid'
- min_sample_for_dynamic=100
3. Test with 10-20 new students
4. Verify distribution normalized to mean=500, sd=100
Phase 4: Enable IRT Adaptive (Week 6+)
1. After 90%+ items calibrated + 1000+ total responses
2. Update to full IRT:
- scoring_mode='irt'
- selection_mode='adaptive'
- normalization_mode='dynamic'
3. Enable AI generation for Mudah/Sulit variants
✅ Success Metrics
Technical KPIs
- Formula Accuracy: CTT scores match client Excel 100%
- Normalization Stability: SB within 5% of expected after 100 users
- Calibration Coverage: >80% items calibrated
- Score Agreement: CTT vs IRT NN difference <20 points
- Fallback Rate: <5% IRT→CTT fallbacks per session
Educational KPIs
- Measurement Precision: IRT SE <0.5 after 15 items
- Normalization Quality: Distribution skewness <0.5
- Adaptive Efficiency: 30% reduction in test length (IRT vs CTT)
- Student Satisfaction: >80% prefer adaptive mode
- Admin Adoption: >70% tryouts use hybrid within 3 months
📋 Complexity Estimation
| Komponen | Effort (Days) | Notes |
|---|---|---|
| Setup FastAPI + PG + Alembic | 3 | Boilerplate |
| Core scoring (CTT/IRT hybrid) | 10 | Math-heavy |
| Dynamic normalization | 5 | Running stats |
| AI generation (OpenRouter) | 5 | API integration |
| Reuse logic + item selection | 8 | Algorithm |
| Admin UI (FastAPI Admin) | 5 | Auto-generated |
| Excel import | 3 | Formula parsing |
| WP integration | 4 | REST API |
| Testing + docs | 7 | Quality |
| Buffer | 5 | Contingency |
| TOTAL | 45 days | 0.8x Sejoli Rebuild |
📚 Glossary
- p (TK): Proportion correct / Tingkat Kesukaran (CTT difficulty)
- Bobot: 1-p weight (CTT scoring weight)
- NM: Nilai Mentah (raw score 0-1000)
- NN: Nilai Nasional (normalized 500±100)
- Rataan: Mean of NM scores
- SB: Simpangan Baku (standard deviation of NM)
- θ (theta): IRT ability (-3 to +3)
- b: IRT difficulty (-3 to +3)
- SE: Standard error (precision)
- CAT: Computerized Adaptive Testing
- EM: Expectation-Maximization (calibration method)
- MLE: Maximum Likelihood Estimation
🔗 File References
- Excel Client:
PERHITUNGAN-SKOR-TO-3.xlsx(screenshot reference for formulas) - DB Schema: PostgreSQL with Alembic migrations
- API: FastAPI with OpenAPI docs
- Admin: FastAPI Admin (auto-generated CRUD)
📝 Key Guarantees
✅ Existing CTT data safe, IRT adoption gradual, reversible anytime ✅ 100% compatible with client Excel formulas ✅ Dynamic normalization optional (can keep static mode) ✅ Zero data loss during transitions ✅ Non-destructive (Sejoli Tryout tetap jalan, external enhance)
Document Version: 1.2.0 Final Last Updated: March 21, 2026, 9:31 AM WIB Status: Ready for Implementation via OpenCode 🚀
By: Dwindi Ramadhana For: Sejoli Tryout Multi-Website Platform