# IRT-Powered Adaptive Question Bank System

## Final Project Brief & Technical Specification

**Project Name:** IRT Bank Soal (Adaptive Question Bank with AI Generation)

**Client:** Sejoli Tryout Multi-Website Platform

**Tech Stack:** FastAPI + PostgreSQL + SQLAlchemy + FastAPI Admin + OpenRouter AI

**Deployment:** aaPanel VPS (Python Manager + PgSQL Manager)

**Version:** 1.2.0 Final (Hybrid CTT+IRT + Dynamic Normalization)

**Last Updated:** March 21, 2026, 9:31 AM WIB

***

## 🎯 Executive Summary

A **hybrid** adaptive question bank system that is fully compatible with the client's existing Excel workbook, with enhancements for:

- **Classical Test Theory (CTT)** - exact formulas from the client's Excel screenshot (p, bobot, NM, NN)
- **Item Response Theory (IRT)** - modern adaptive testing with theta estimation
- **AI Generation** - auto-generate Mudah/Sulit (easy/hard) item variants via OpenRouter (Qwen3 Coder 480B)
- **Dynamic Normalization** - Rataan/SB (mean/standard deviation) calculated in real time or entered manually
- **Multi-Website Support** - one backend for N WordPress sites (Mat SD, Bahasa SMA, etc.)
- **Non-Destructive** - 100% backward compatible with the client's current workflow

**Core Capabilities:**

1. Dual Scoring Mode: CTT (p, bobot) & IRT (θ, b) run in parallel
2. Screenshot Compatible: import directly from the client's Excel (p=140/458)
3. Exact Formula Match: implements the client's Excel formulas exactly
4. Dynamic Normalization: auto-calculated rataan/SB, or static mode
5. AI Question Generation: generate Mudah/Sulit variants from a Sedang (medium) basis item (CTT)
6. Full Audit Trail: track the CTT→IRT transition per item

***

## 📋 Exact Client Formulas (From Excel Analysis)

### STEP 1: Difficulty Index (Tingkat Kesukaran, TK) per Item

```
Formula: p = Σ Benar / Total Peserta   (correct answers / total participants)
Excel:   =D464/$A$463
         ├─ D464 = Number of students who answered item 1 correctly
         └─ A463 = Total participants (e.g., 458)

Example: p = 140/458 = 0.3057 → "Sedang"
```

### STEP 2: Weight (Bobot) per Item

```
Formula: Bobot = 1 - p
Excel:   =1-D4

Example: Bobot = 1 - 0.3057 = 0.6943

Interpretation:
- Easy item (p=0.8) → bobot=0.2 (low value)
- Hard item (p=0.1) → bobot=0.9 (high value)
```

### STEP 3: Total Correct (Total Benar) per Student

```
Formula: Total_Benar = COUNT(correct answers)
Excel:   =SUM(D454:W454)   [20 items]

Example: A student answers 15 items correctly → Total_Benar = 15
```

### STEP 4: Total Bobot Earned per Student

```
Formula: Total_Bobot = Σ (bobot_soal × jawaban_siswa)
Excel:   =SUMPRODUCT($D$5:$W$5, D454:W454)
         ├─ $D$5:$W$5 = Bobot array [0.69, 0.85, 0.42, ...]
         └─ D454:W454 = Answers [1, 1, 0, 1, ...]

Example:
  Item 1: bobot=0.69 × answer=1 → 0.69
  Item 2: bobot=0.85 × answer=1 → 0.85
  Item 3: bobot=0.42 × answer=0 → 0.00
  ...
  Total_Bobot = 12.5
```

### STEP 5: Raw Score (Nilai Mentah, NM) [0-1000 scale]

```
Formula: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
Excel:   =(Y454/$X$5)*1000
         ├─ Y454 = Student's total bobot (e.g., 12.5)
         └─ $X$5 = Maximum total bobot (sum of all bobot, 18.3)

Example: NM = (12.5 / 18.3) × 1000 = 683

Range: 0-1000 (percentage-like scale)
```

### STEP 6: National Score (Nilai Nasional, NN) - Z-Score Normalized

```
Formula: NN = 500 + 100 × ((NM - Rataan) / SB)
Excel:   =500+(100*((Z454-500)/100))

Components:
- 500    = Target mean (center point)
- 100    = Target standard deviation
- Rataan = Actual mean of NM from all participants
- SB     = Actual standard deviation of NM

⚠️ CURRENT CLIENT ISSUE:
  Rataan = 500 (hardcoded)
  SB     = 100 (hardcoded)
  → NN = 500 + (NM - 500) = NM
  Result: NO actual normalization (NN always equals NM)

✅ OUR FIX: Dynamic calculation with 3 modes (static, dynamic, hybrid)
```

### Difficulty Categories (CTT Standard)

```
Tingkat Kesukaran (p):
  p < 0.30         → Sukar (Difficult)
  0.30 ≤ p ≤ 0.70  → Sedang (Medium)
  p > 0.70         → Mudah (Easy)

Bobot Implications:
  p=0.09 → Bobot=0.91 (Sukar, high weight)
  p=0.50 → Bobot=0.50 (Sedang, medium weight)
  p=0.85 → Bobot=0.15 (Mudah, low weight)
```

***

## 🔄 CTT vs IRT: Understanding Both Approaches

### Classical Test Theory (CTT) - Client Method

**Strengths of CTT:**

- Easy for admins and teachers to understand
- Does not need much data (minimum 100 students)
- Compatible with the existing system
- Fast to compute
- Transparent formulas (visible in Excel)

**Limitations of CTT:**

- Sample-dependent (p changes with every group)
- Not adaptive (items are served in a fixed order)
- Needs new items for every test (no efficient reuse)
- Normalization issue (when rataan/SB are hardcoded)

### Item Response Theory (IRT) - Modern Adaptive

**Core Formula (1PL Rasch):**

```
P(θ) = 1 / (1 + e^-(θ - b))

θ = User ability (-3 to +3)
b = Item difficulty (-3 to +3)

θ = -2 (weak)    → P(correct) at b=-1 = 27%
θ =  0 (average) → P(correct) at b=0  = 50%
θ = +2 (strong)  → P(correct) at b=+2 = 50%
```

**Strengths of IRT:**

- Sample-invariant item parameters (b stays stable even when the group changes)
- Adaptive (selects items matching the user's ability in real time)
- Efficient reuse (for 1000 users, 3 variants per slot are enough)
- Reaches comparable accuracy faster (about 15 IRT items ≈ 30 CTT items)

**Limitations of IRT:**

- Requires calibration (minimum 100-500 responses per item)
- Complex for admins without a psychometrics background
- Requires an adaptive delivery system (cannot be paper-based)

### Hybrid Solution (This System)

| Aspect | CTT Mode (Start) | Hybrid Mode (Transition) | IRT Mode (Goal) |
| :-- | :-- | :-- | :-- |
| **Admin Input** | p-value from screenshot | Edit p or b, synced automatically | Edit b, p calculated |
| **Item Selection** | Fixed order slot 1-30 | Mixed (CTT fixed + IRT adaptive) | Fully adaptive CAT |
| **Scoring** | NM → NN (screenshot) | Parallel CTT & IRT scores | θ → NN mapped |
| **Normalization** | Static or Dynamic | Choose per tryout | Dynamic recommended |
| **AI Generation** | From basis p | From p or b | From calibrated b |
| **Reuse** | Minimal | Moderate (cache variants) | Maximum (infinite pool) |

***

## 🏗️ System Architecture

### High-Level Flow (Hybrid + Dynamic Normalization)

```
┌──────────────────────────┬──────────────────────────┐
│ WP Site 1 (Mat SD)       │ WP Site 2 (Bahasa SMA)   │
│ Sejoli Tryout            │ Sejoli Tryout            │
│ CTT Mode: Fixed          │ IRT Mode: Adaptive       │
│ website_id=1             │ website_id=2             │
└────────────┬─────────────┴─────────────┬────────────┘
             │                           │
             └─────────────┬─────────────┘
                           │ REST API
                           │ POST /next_item
                           │ {mode: "ctt"|"irt"|"hybrid"}
                           ▼
         ┌───────────────────────────────────┐
         │ FastAPI Backend (aaPanel)         │
         ├───────────────────────────────────┤
         │ Hybrid Scoring Engine             │
         │ ├─ CTT: NM from p-bobot           │
         │ ├─ IRT: θ from responses          │
         │ ├─ Normalization: Dynamic         │
         │ └─ Return primary + secondary     │
         │                                   │
         │ Dynamic Normalization Engine      │
         │ ├─ Rataan = AVG(all NM)           │
         │ ├─ SB = STDEV(all NM)             │
         │ ├─ Mode switch: Static→Dynamic    │
         │ └─ Real-time update per user      │
         │                                   │
         │ Item Selection Strategy           │
         │ ├─ CTT: Slot order (1→2→3)        │
         │ ├─ IRT: CAT (b ≈ θ)               │
         │ └─ Hybrid: First 10 CTT, then IRT │
         └─────────────────┬─────────────────┘
                           │
                           ▼
         ┌───────────────────────────────────┐
         │ PostgreSQL Database               │
         ├───────────────────────────────────┤
         │ items (ADDED: ctt_p, bobot)       │
         │ user_answers (ADDED: nm, nn)      │
         │ tryout_config (ADDED: modes)      │
         │ tryout_stats (NEW: stats)         │
         └───────────────────────────────────┘
```

***

## 💾 Database Schema (v1.2 Final)

### Table: tryout_config

```sql
CREATE TABLE tryout_config (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,

    -- Mode Control
    scoring_mode VARCHAR(20) DEFAULT 'ctt',             -- 'ctt', 'irt', 'hybrid'
    selection_mode VARCHAR(20) DEFAULT 'fixed',         -- 'fixed', 'adaptive', 'hybrid'

    -- CTT Settings
    min_peserta_for_ctt INTEGER DEFAULT 100,

    -- Normalization Settings
    normalization_mode VARCHAR(20) DEFAULT 'static',    -- 'static', 'dynamic', 'hybrid'
    static_rataan FLOAT DEFAULT 500,
    static_sb FLOAT DEFAULT 100,
    min_sample_for_dynamic INTEGER DEFAULT 100,

    -- IRT Settings
    enable_irt_when_calibrated BOOLEAN DEFAULT FALSE,
    min_calibration_sample INTEGER DEFAULT 200,
    theta_estimation_method VARCHAR(20) DEFAULT 'mle',  -- 'mle', 'eap', 'map'

    -- Transition Settings
    hybrid_transition_slot INTEGER DEFAULT 10,
    fallback_to_ctt_on_error BOOLEAN DEFAULT TRUE,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(website_id, tryout_id)
);
```

### Table: tryout_stats

```sql
CREATE TABLE tryout_stats (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,

    -- Running Statistics
    participant_count INTEGER DEFAULT 0,
    total_nm_sum FLOAT DEFAULT 0,        -- Σ all NM scores
    total_nm_sq_sum FLOAT DEFAULT 0,     -- Σ (NM^2) for variance calc

    -- Calculated Values (updated on each new participant)
    current_rataan FLOAT,                -- AVG(all NM)
    current_sb FLOAT,                    -- STDEV(all NM)
    min_nm FLOAT,
    max_nm FLOAT,

    -- Metadata
    last_calculated_at TIMESTAMPTZ,
    last_participant_id INTEGER,
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    UNIQUE(website_id, tryout_id)
);

CREATE INDEX idx_tryout_stats_lookup ON tryout_stats(website_id, tryout_id);
```

### Table: user_answers

```sql
CREATE TABLE user_answers (
    id SERIAL PRIMARY KEY,
    wp_user_id INTEGER NOT NULL,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    level VARCHAR(20) NOT NULL,
    item_id INTEGER NOT NULL,

    -- Response Data
    response INTEGER NOT NULL,            -- 0=incorrect, 1=correct
    time_spent INTEGER,

    -- CTT Scoring
    ctt_bobot_earned FLOAT,               -- Bobot if correct, 0 if wrong
    ctt_total_bobot_cumulative FLOAT,     -- Running Σ bobot earned
    ctt_nm FLOAT,                         -- Nilai Mentah (0-1000)
    ctt_nn FLOAT,                         -- Nilai Nasional (normalized)

    -- Normalization Applied
    rataan_used FLOAT,                    -- Rataan value at this calculation
    sb_used FLOAT,                        -- SB value at this calculation
    normalization_mode_used VARCHAR(20),  -- 'static', 'dynamic', 'hybrid'

    -- IRT Scoring
    irt_theta FLOAT,                      -- Ability estimate at this point
    irt_theta_se FLOAT,                   -- Standard error
    irt_information FLOAT,                -- Information value at this item

    -- Metadata
    scoring_mode_used VARCHAR(20),        -- 'ctt', 'irt', 'hybrid'
    answered_at TIMESTAMPTZ DEFAULT NOW(),

    FOREIGN KEY (item_id) REFERENCES items(id) ON DELETE CASCADE,
    UNIQUE(wp_user_id, website_id, tryout_id, slot, level)
);

CREATE INDEX idx_user_answers_lookup ON user_answers(wp_user_id, website_id, tryout_id);
CREATE INDEX idx_user_answers_scoring ON
    user_answers(scoring_mode_used, ctt_nn, irt_theta);
```

### Table: items

```sql
CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL,
    slot INTEGER NOT NULL,
    level VARCHAR(20) NOT NULL,           -- 'Mudah', 'Sedang', 'Sulit'
    stem TEXT NOT NULL,
    options JSONB NOT NULL,
    correct CHAR(1) NOT NULL,
    explanation TEXT,

    -- CTT Parameters (Screenshot Compatible)
    ctt_p FLOAT,                          -- Proportion correct (0.09 from screenshot)
    ctt_bobot FLOAT,                      -- 1 - p (0.91)
    ctt_category VARCHAR(20),             -- 'Sukar', 'Sedang', 'Mudah'

    -- IRT Parameters (Adaptive)
    irt_b FLOAT DEFAULT 0.0,              -- Difficulty (-3 to +3)
    irt_a FLOAT DEFAULT 1.0,              -- Discrimination (optional)
    irt_c FLOAT DEFAULT 0.25,             -- Guessing (optional)

    -- Calibration Status
    calibrated BOOLEAN DEFAULT FALSE,     -- TRUE when 100+ responses analyzed
    calibration_sample_size INTEGER DEFAULT 0,
    calibration_date TIMESTAMPTZ,

    -- Legacy Fields
    generated_by VARCHAR(10) NOT NULL,    -- 'admin' or 'ai'
    ai_model VARCHAR(50),
    basis_item_id INTEGER,
    category_id INTEGER,

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),

    FOREIGN KEY (basis_item_id) REFERENCES items(id) ON DELETE SET NULL
);

CREATE INDEX idx_items_lookup ON items(website_id, tryout_id, slot, level);
CREATE INDEX idx_items_calibrated ON items(calibrated, calibration_sample_size);
CREATE INDEX idx_items_ctt ON items(ctt_p, ctt_category);
```

***

## 🎯 AI Question Generation (OpenRouter)

### Recommended Models (OpenRouter Free Tier)

| Model | Why It Fits | Cost |
| :-- | :-- | :-- |
| **Qwen3 Coder 480B** | Strong at math/reasoning; generates accurate items with solutions; controllable difficulty | Free |
| **Llama 3.3 70B Instruct** | Multilingual (Indonesian); covers Bloom's Taxonomy levels from recall to analyze | Free |
| **DeepSeek R1/Math** | Math specialist (algebra/geometry); claimed to outperform frontier models on math | Low ($0.1/1M tokens) |

### AI Generation Workflow

**Context:** User 123, Tryout A, Slot 2 (Attempt 2)

1. The Python API computes θ → a "Sulit" item is needed
2. Check DB: is there a Sulit item for slot 2? ❌
3. AI Generate:

   ```
   POST OpenRouter
   {
     model: 'qwen3-coder-480b',
     prompt: "Generate 1 soal Mat SD level Sulit mirip [basis_soal]..."
   }
   ```

4. Parse response → INSERT items (website_id=1, level=Sulit, generated_by='ai')
5. Serve the new item to the frontend

### Prompt Template (Standardized)

```
Context: Tryout {tryout_id} slot {slot} level {Sulit/Mudah}.
Basis soal: {basis_stem}.
Generate: 1 soal baru {level} dengan:
- Stem: 1 kalimat jelas
- Options: A B C D, 1 benar, 3 distractor logis
- Jawaban: huruf + penjelasan singkat
Bahasa: Indonesia, topik: {category}
```

### Reuse Strategy (Perfect for Scale)

```
User123, Tryout A, Slot 2, Attempt 1: Sedang item (static)
User123, Tryout A, Slot 2, Attempt 2: AI generate → Sulit item (saved to DB)
User456, Tryout A, Slot 2, Attempt 2: Check whether one exists
  IF a Sulit item exists → REUSE (cache hit!)
  ELSE → AI generates a new one

Scenario: 1000 users × 3 attempts
- Static: 1000 × 30 × 3 = 90,000 unique items (impossible)
- With AI + Reuse: ~30 static + 60 AI variants = 90 total (99.9% reuse!)
```

***

## 🔧 CTT Scoring Engine Implementation

```python
from datetime import datetime, timezone
from typing import Dict, List

import numpy as np
from sqlalchemy.orm import Session

from models import Item, TryoutConfig, TryoutStats


def calculate_ctt_score_exact(
    responses: List[Dict],
    items: List[Item],
    config: TryoutConfig,
    db: Session
) -> Dict:
    """
    Calculate CTT score using the EXACT client Excel formula.

    Formula breakdown:
    1. p = Σ Benar / Total Peserta (per item)
    2. Bobot = 1 - p
    3. Total_Bobot_Siswa = SUMPRODUCT(bobot_array, jawaban_array)
    4. NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
    5. NN = 500 + 100 × ((NM - Rataan) / SB)
    """
    # STEP 1: Calculate total bobot earned (SUMPRODUCT equivalent)
    total_bobot_earned = 0.0
    total_bobot_max = 0.0
    total_benar = 0

    for response, item in zip(responses, items):
        bobot = item.ctt_bobot  # Pre-calculated as 1 - p
        total_bobot_max += bobot
        if response['correct'] == 1:
            total_bobot_earned += bobot
            total_benar += 1

    # STEP 2: Calculate NM (Nilai Mentah)
    if total_bobot_max == 0:
        nm = 0.0
    else:
        nm = (total_bobot_earned / total_bobot_max) * 1000

    # STEP 3: Get Rataan and SB based on normalization mode.
    # Note: this call also adds the current NM to the running stats.
    rataan, sb, norm_mode = get_normalization_params(config, db, nm)

    # STEP 4: Calculate NN (Nilai Nasional); guard against a missing or zero SB
    if sb is None or sb == 0:
        nn = 500.0
    else:
        nn = 500 + 100 * ((nm - rataan) / sb)

    # Clip NN to reasonable range
    nn = float(np.clip(nn, 0, 1000))

    return {
        "mode": "ctt",
        "total_benar": total_benar,
        "total_bobot_earned": round(total_bobot_earned, 2),
        "total_bobot_max": round(total_bobot_max, 2),
        "nm": round(nm, 1),
        "nn": round(nn, 1),
        "rataan_used": round(rataan, 2),
        "sb_used": round(sb, 2),
        "normalization_mode": norm_mode,
        "breakdown": {
            "percentage": round((total_bobot_earned / total_bobot_max) * 100, 1)
            if total_bobot_max > 0 else 0
        }
    }


def get_normalization_params(
    config: TryoutConfig,
    db: Session,
    current_nm: float
) -> tuple[float, float, str]:
    """
    Get rataan and SB based on normalization mode.

    Returns: (rataan, sb, mode_used)
    """
    # Get or create stats
    stats = db.query(TryoutStats).filter_by(
        website_id=config.website_id,
        tryout_id=config.tryout_id
    ).first()

    if not stats:
        stats = TryoutStats(
            website_id=config.website_id,
            tryout_id=config.tryout_id,
            participant_count=0,
            total_nm_sum=0,
            total_nm_sq_sum=0
        )
        db.add(stats)
        db.commit()

    # Update running stats with current NM
    stats.participant_count += 1
    stats.total_nm_sum += current_nm
    stats.total_nm_sq_sum += (current_nm ** 2)

    # Calculate dynamic rataan and SB
    n = stats.participant_count
    if n > 1:
        mean = stats.total_nm_sum / n
        # Population variance via E[X^2] - E[X]^2 (Excel STDEV.P, not the sample STDEV)
        variance = (stats.total_nm_sq_sum / n) - (mean ** 2)
        std_dev = np.sqrt(max(0, variance))  # max() absorbs tiny negative rounding errors

        stats.current_rataan = mean
        stats.current_sb = std_dev
        stats.last_calculated_at = datetime.now(timezone.utc)
    else:
        # First participant, use static
        stats.current_rataan = config.static_rataan
        stats.current_sb = config.static_sb

    db.commit()

    # Determine which values to use based on mode
    if config.normalization_mode == 'static':
        return (config.static_rataan, config.static_sb, 'static')

    elif config.normalization_mode == 'dynamic':
        if stats.participant_count >= 2:
            return (stats.current_rataan, stats.current_sb, 'dynamic')
        else:
            return (config.static_rataan, config.static_sb, 'static_fallback')

    elif config.normalization_mode == 'hybrid':
        if stats.participant_count >= config.min_sample_for_dynamic:
            return (stats.current_rataan, stats.current_sb, 'hybrid_dynamic')
        else:
            return (config.static_rataan, config.static_sb, 'hybrid_static')

    else:
        return (config.static_rataan, config.static_sb, 'static')
```

***

## 📊 IRT Theta Estimation (MLE)

```python
from typing import List

import numpy as np
from scipy.optimize import minimize

from models import Item


def estimate_theta_mle(responses: List[int], items: List[Item]) -> float:
    """
    Estimate ability (theta) using Maximum Likelihood Estimation.

    1PL Rasch Model: P(θ) = 1 / (1 + e^-(θ - b))

    Args:
        responses: [1, 0, 1, 1, 0, ...] correct/incorrect
        items: [Item(irt_b=-0.5), Item(irt_b=0.2), ...]
    Returns:
        theta estimate
    """
    def neg_log_likelihood(theta_val):
        # scipy passes a 1-element array; unpack it so a plain scalar is returned
        theta = float(theta_val[0])
        ll = 0.0
        for response, item in zip(responses, items):
            b = item.irt_b if item.irt_b else 0

            # P(θ) = 1 / (1 + e^-(θ - b))
            p = 1 / (1 + np.exp(-(theta - b)))

            # Log-likelihood
            if response == 1:
                ll += np.log(max(p, 1e-10))  # Avoid log(0)
            else:
                ll += np.log(max(1 - p, 1e-10))

        return -ll  # Negative for minimization

    # Initial guess: middle of scale
    theta_init = 0

    # Optimize; the bounds also keep all-correct / all-wrong patterns finite
    result = minimize(
        neg_log_likelihood,
        x0=[theta_init],
        method='L-BFGS-B',
        bounds=[(-3, 3)]  # Reasonable theta range
    )

    theta_estimate = float(result.x[0])
    return theta_estimate


def estimate_theta_se(theta: float, items: List[Item]) -> float:
    """
    Calculate standard error of theta estimate
    using Fisher information.
    """
    information = 0
    for item in items:
        b = item.irt_b if item.irt_b else 0
        p = 1 / (1 + np.exp(-(theta - b)))
        information += p * (1 - p)  # Fisher information for 1PL

    if information > 0:
        se = 1 / np.sqrt(information)
    else:
        se = float('inf')

    return se
```

***

## 🗂️ API Endpoints (v1.2 Final)

### 1. Next Item (Adaptive Selection)

```
POST /api/v1/session/{session_id}/next_item

Request:
{
  "mode": "ctt" | "irt" | "hybrid",
  "current_responses": [
    {"item_id": 1, "correct": 1},
    {"item_id": 2, "correct": 0}
  ]
}

Response:
{
  "item_id": 45,
  "slot": 3,
  "level": "Sedang",
  "stem": "...",
  "options": {"A": "...", "B": "...", "C": "...", "D": "...", "E": "..."},
  "item_source": "admin" | "ai",
  "selection_method": "fixed_order" | "adaptive_ctt" | "adaptive_irt"
}
```

### 2. Complete Session (Scoring)

```
POST /api/v1/session/{session_id}/complete

Response:
{
  "status": "completed",
  "primary_score": {
    "mode": "ctt",
    "total_benar": 15,
    "total_bobot_earned": 12.5,
    "total_bobot_max": 18.3,
    "nm": 683.0,
    "nn": 677.6,
    "rataan_used": 483.5,
    "sb_used": 112.3,
    "normalization_mode": "dynamic"
  },
  "secondary_score": {
    "mode": "irt",
    "theta": 0.85,
    "theta_se": 0.42,
    "nn_equivalent": 592.5
  },
  "comparison": {
    "nn_difference": 85.1,
    "agreement": "moderate"
  }
}
```

### 3. Get Tryout Config (with Normalization)

```
GET /api/v1/tryout/{tryout_id}/config

Response:
{
  "tryout_id": 123,
  "scoring_mode": "ctt",
  "normalization_mode": "dynamic",
  "static_rataan": 500,
  "static_sb": 100,
  "current_stats": {
    "participant_count": 245,
    "current_rataan": 483.5,
    "current_sb": 112.3,
    "min_nm": 125.0,
    "max_nm": 892.0
  },
  "calibration_status": {
    "total_items": 20,
    "calibrated_items": 8,
    "calibration_percentage": 40
  }
}
```

### 4. Update Normalization Settings

```
PUT /api/v1/tryout/{tryout_id}/normalization

Request:
{
  "normalization_mode": "hybrid",
  "static_rataan": 500,
  "static_sb": 100,
  "min_sample_for_dynamic": 100
}

Response:
{
  "status": "updated",
  "normalization_mode": "hybrid",
  "current_participant_count": 45,
  "will_switch_to_dynamic_at": 100,
  "using_mode": "static"
}
```

***

## 📥 Excel Import (OpenCode Ready)

```python
from typing import Dict

import numpy as np
import openpyxl
from sqlalchemy.orm import Session

from models import Item, TryoutConfig


def import_excel_tryout(
    excel_file: str,
    website_id: int,
    tryout_id: int,
    db: Session,  # Moved before sheet_name: required args cannot follow defaulted ones
    sheet_name: str = "CONTOH"
) -> Dict:
    """
    Import from client Excel exactly like PERHITUNGAN-SKOR-TO-3.xlsx

    Excel structure:
    - Row 1: Headers
    - Row 2: Answer key (KUNCI)
    - Row 4: TK (p values) formulas
    - Row 5: BOBOT formulas
    - Row 6+: Student responses
    """
    wb = openpyxl.load_workbook(excel_file, data_only=False)
    ws = wb[sheet_name]

    # Extract answer key from Row 2
    answer_key = {}
    for col in range(4, ws.max_column + 1):
        key_cell = ws.cell(2, col).value
        if key_cell and key_cell != "KUNCI":
            # Column D (index 4) holds slot 1, so slot = column - 3
            slot_num = col - 3
            answer_key[slot_num] = key_cell.strip().upper()

    # Extract TK (p values) from Row 4 - get CALCULATED values
    wb_data = openpyxl.load_workbook(excel_file, data_only=True)
    ws_data = wb_data[sheet_name]

    p_values = {}
    for col in range(4, ws.max_column + 1):
        slot_num = col - 3
        if slot_num in answer_key:
            p_cell = ws_data.cell(4, col).value
            if p_cell and isinstance(p_cell, (int, float)):
                p_values[slot_num] = float(p_cell)

    # Calculate bobot (1 - p)
    bobot_values = {slot: 1 - p for slot, p in p_values.items()}

    # Categorize difficulty: returns (CTT category, item level)
    def categorize_difficulty(p: float) -> tuple[str, str]:
        if p < 0.30:
            return ("Sukar", "Sulit")
        elif p > 0.70:
            return ("Mudah", "Mudah")
        else:
            return ("Sedang", "Sedang")

    # Create items
    items_created = 0
    for slot_num, correct_ans in answer_key.items():
        p = p_values.get(slot_num, 0.5)
        bobot = bobot_values.get(slot_num, 0.5)
        ctt_cat, level = categorize_difficulty(p)

        # Convert p to IRT b
        b = ctt_p_to_irt_b(p)

        item = Item(
            website_id=website_id,
            tryout_id=tryout_id,
            slot=slot_num,
            level=level,
            stem=f"[Import dari Excel - Soal {slot_num}]",
            options={"A": "[Option A]", "B": "[Option B]", "C": "[Option C]",
                     "D": "[Option D]", "E": "[Option E]"},
            correct=correct_ans,
            explanation="",
            ctt_p=p,
            ctt_bobot=bobot,
            ctt_category=ctt_cat,
            irt_b=b,
            calibrated=False,
            calibration_sample_size=0,
            generated_by='admin',
            category_id=None
        )
        db.add(item)
        items_created += 1

    db.commit()

    # Configure tryout normalization
    config = TryoutConfig(
        website_id=website_id,
        tryout_id=tryout_id,
        scoring_mode='ctt',
        selection_mode='fixed',
        normalization_mode='static',
        static_rataan=500,
        static_sb=100,
        min_sample_for_dynamic=100
    )
    db.add(config)
    db.commit()

    return {
        "items_created": items_created,
        "normalization_configured": "static (rataan=500, SB=100)"
    }


def ctt_p_to_irt_b(p: float) -> float:
    """
    Convert a CTT p-value to an initial IRT b parameter.

    Logit approximation (assumes mean ability θ = 0): b ≈ ln((1 - p) / p),
    so easy items (high p) get a negative b and hard items a positive b.
    """
    if p <= 0 or p >= 1:
        p = 0.5  # Degenerate p: fall back to average difficulty (b = 0)

    b = np.log((1 - p) / p)
    return float(b)
```

***

## 🚀 Migration Path (Non-Destructive)

### Phase 1: Import Existing Data (Week 1)

```
1. Export current Sejoli Tryout data to Excel

2. Run import script:
   python manage.py import_excel_tryout \
     --file="PERHITUNGAN-SKOR-TO-3.xlsx" \
     --sheet="CONTOH" \
     --website_id=1 \
     --tryout_id=123

3. Verify:
   - All items have ctt_p, ctt_bobot
   - IRT b auto-calculated from p
   - calibrated=False for all

4. Configure tryout:
   - scoring_mode='ctt'
   - selection_mode='fixed'
   - normalization_mode='static' (like client now)
```

### Phase 2: Collect Calibration Data (Week 2-4)

```
1. Students use tryout normally (CTT mode, static normalization)
2. Backend logs all responses
3. Monitor calibration progress
4. Collect running statistics for dynamic normalization
```

### Phase 3: Enable Dynamic Normalization (Week 5)

```
1. Check participant count: 100+ completed?
2. Update tryout_config:
   - normalization_mode='hybrid'
   - min_sample_for_dynamic=100
3. Test with 10-20 new students
4. Verify distribution normalized to mean=500, sd=100
```

### Phase 4: Enable IRT Adaptive (Week 6+)

```
1. After 90%+ items calibrated + 1000+ total responses
2. Update to full IRT:
   - scoring_mode='irt'
   - selection_mode='adaptive'
   - normalization_mode='dynamic'
3. Enable AI generation for Mudah/Sulit variants
```

***

## ✅ Success Metrics

### Technical KPIs

1. **Formula Accuracy**: CTT scores match client Excel 100%
2. **Normalization Stability**: SB within 5% of expected after 100 users
3. **Calibration Coverage**: >80% items calibrated
4. **Score Agreement**: CTT vs IRT NN difference <20 points
5. **Fallback Rate**: <5% IRT→CTT fallbacks per session

### Educational KPIs

1. **Measurement Precision**: IRT SE <0.5 after 15 items
2. **Normalization Quality**: Distribution skewness <0.5
3. **Adaptive Efficiency**: 30% reduction in test length (IRT vs CTT)
4. **Student Satisfaction**: >80% prefer adaptive mode
5. **Admin Adoption**: >70% tryouts use hybrid within 3 months

***

## 📋 Complexity Estimation

| Component | Effort (Days) | Notes |
| :-- | :-- | :-- |
| Setup FastAPI + PG + Alembic | 3 | Boilerplate |
| Core scoring (CTT/IRT hybrid) | 10 | Math-heavy |
| Dynamic normalization | 5 | Running stats |
| AI generation (OpenRouter) | 5 | API integration |
| Reuse logic + item selection | 8 | Algorithm |
| Admin UI (FastAPI Admin) | 5 | Auto-generated |
| Excel import | 3 | Formula parsing |
| WP integration | 4 | REST API |
| Testing + docs | 7 | Quality |
| Buffer | 5 | Contingency |
| **TOTAL** | **55 days** | **0.8x Sejoli Rebuild** |

***

## 📚 Glossary

- **p (TK)**: Proportion correct / Tingkat Kesukaran (CTT difficulty)
- **Bobot**: 1-p weight (CTT scoring weight)
- **NM**: Nilai Mentah (raw score 0-1000)
- **NN**: Nilai Nasional (normalized 500±100)
- **Rataan**: Mean of NM scores
- **SB**: Simpangan Baku (standard deviation of NM)
- **θ (theta)**: IRT ability (-3 to +3)
- **b**: IRT difficulty (-3 to +3)
- **SE**: Standard error (precision)
- **CAT**: Computerized Adaptive Testing
- **EM**: Expectation-Maximization (calibration method)
- **MLE**: Maximum Likelihood Estimation

***

## 🔗 File References

- **Excel Client:** `PERHITUNGAN-SKOR-TO-3.xlsx` (screenshot reference for formulas)
- **DB Schema:** PostgreSQL with Alembic migrations
- **API:** FastAPI with OpenAPI docs
- **Admin:** FastAPI Admin (auto-generated CRUD)

***

## 📝 Key Guarantees

- ✅ Existing CTT data stays safe; IRT adoption is gradual and reversible at any time
- ✅ 100% compatible with client Excel formulas
- ✅ Dynamic normalization is optional (static mode can be kept)
- ✅ Zero data loss during transitions
- ✅ Non-destructive (Sejoli Tryout keeps running; enhancements are external)

***

**Document Version:** 1.2.0 Final

**Last Updated:** March 21, 2026, 9:31 AM WIB

**Status:** Ready for Implementation via OpenCode 🚀

**By:** Dwindi Ramadhana

**For:** Sejoli Tryout Multi-Website Platform
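
***

## Appendix: Worked CTT Scoring Example

The six CTT steps under "Exact Client Formulas" can be traced end to end on a tiny dataset. What follows is a minimal sketch, not the production scoring engine: the 4×3 response matrix and the helper name `ctt_pipeline` are made up for illustration, NumPy is assumed, and SB is taken as the population standard deviation (the same form as the running-stats code in the scoring engine; Excel's plain `STDEV` uses the sample form instead).

```python
import numpy as np


def ctt_pipeline(responses: np.ndarray, rataan=None, sb=None) -> dict:
    """Run STEP 1-6 on a (students x items) matrix of 0/1 answers.

    Hypothetical helper for illustration only. Pass rataan/sb to emulate
    static mode; leave them as None to derive them from the data (dynamic).
    Assumes at least one item has p < 1 and that NM scores are not all equal,
    otherwise the divisions below are undefined.
    """
    p = responses.mean(axis=0)              # STEP 1: proportion correct per item
    bobot = 1.0 - p                         # STEP 2: weight per item
    total_benar = responses.sum(axis=1)     # STEP 3: correct count per student
    total_bobot = responses @ bobot         # STEP 4: SUMPRODUCT per student
    nm = total_bobot / bobot.sum() * 1000   # STEP 5: raw score on 0-1000
    if rataan is None:
        rataan = nm.mean()                  # dynamic mean of NM
    if sb is None:
        sb = nm.std()                       # dynamic population SD (ddof=0)
    nn = 500 + 100 * (nm - rataan) / sb     # STEP 6: normalized score
    return {"p": p, "bobot": bobot, "total_benar": total_benar,
            "nm": nm, "nn": nn, "rataan": rataan, "sb": sb}


# Made-up data: 4 students (rows) x 3 items (columns), 1 = correct.
responses = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
])

dynamic = ctt_pipeline(responses)                     # rataan/SB from the data
static = ctt_pipeline(responses, rataan=500, sb=100)  # client's hardcoded values

# p = 0.75, 0.50, 0.25 and bobot = 0.25, 0.50, 0.75 for the three items.
# NM = 500, 166.67, 1000, 0 for the four students.
print(dynamic["nm"])
print(static["nn"])    # identical to NM: static 500/100 does not normalize
print(dynamic["nn"])   # rescaled so the group has mean 500 and SD 100
```

With the hardcoded values (Rataan=500, SB=100) the NN column reproduces NM exactly, which is the client issue described in STEP 6. In dynamic mode the same NM values are rescaled so the group mean is 500 and the standard deviation is 100.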