# Product Requirements Document (PRD)

## IRT-Powered Adaptive Question Bank System

**Document Version:** 1.1
**Date:** March 21, 2026 (Updated)
**Product Name:** IRT Bank Soal (Adaptive Question Bank with AI Generation)
**Client:** Sejoli Tryout Multi-Website Platform
**Status:** Draft - Clarifications Incorporated

---

## Changelog

### v1.1 (March 21, 2026)

- Added **AI Generation**: 1 request = 1 question, no approval workflow
- Added **Admin Playground**: Admins can test AI generation without saving to the DB
- Updated **Normalization Control**: Optional manual/automatic mode; the system handles normalization automatically once sufficient data is available
- Updated **IRT → CTT Rollback**: Historical IRT scores preserved; CTT applied to new sessions only
- Removed **Admin Permissions/Role-based Access**: Not needed (each admin is scoped per site via WordPress)
- Updated **Custom Dashboards**: Use FastAPI Admin only (no custom dashboards)
- Added **AI Generation Toggle**: Global on/off switch for cost control
- Added **User-level Question Reuse**: Check whether a student has already answered a question at a given difficulty level
- Updated **Student UX**: Admins see internal metrics; students see only the primary score
- Added **Data Retention**: Keep all data (no policy yet)
- Added **Reporting Section**: Student performance, item analysis, calibration status, tryout comparison
- Updated **Admin Persona Note**: This project is a backend tool for IRT/CTT calculation; WordPress handles static questions

---

## 1. Product Vision

### 1.1 Vision Statement

To provide an adaptive, intelligent question bank system that seamlessly integrates with Sejoli's existing Excel-based workflow while introducing modern Item Response Theory (IRT) capabilities and AI-powered question generation, enabling more accurate and efficient student assessment.
### 1.1.1 Primary Goals

- **100% Excel Compatibility**: Maintain exact formula compatibility with the client's existing Excel workflow (CTT scoring with p, bobot, NM, NN)
- **Gradual Modernization**: Enable a smooth transition from Classical Test Theory (CTT) to Item Response Theory (IRT)
- **Adaptive Assessment**: Provide Computerized Adaptive Testing (CAT) capabilities for more efficient and accurate measurement
- **AI-Enhanced Content**: Automatically generate question variants (Mudah/Sulit) from base Sedang questions
- **Multi-Site Support**: Single backend serving multiple WordPress-powered educational sites
- **Non-Destructive**: Zero disruption to existing operations - all enhancements are additive

### 1.1.2 Success Metrics

- **Technical**: CTT scores match client Excel 100%; IRT calibration >80% coverage
- **Educational**: 30% reduction in test length with IRT vs CTT; measurement precision SE < 0.5 after 15 items
- **Adoption**: >70% of tryouts use hybrid mode within 3 months; >80% student satisfaction with adaptive mode
- **Efficiency**: 99.9% question reuse rate via AI-generated variants

---

## 2. User Personas

### 2.1 Administrators (School/Guru)

**Profile:** Non-technical education professionals managing tryouts

**Pain Points:**
- Excel-based scoring is manual and time-consuming
- Static questions require constant new content creation
- Difficulty normalization requires manual calculation
- Limited ability to compare student performance across groups

**Needs:**
- Simple, transparent scoring formulas (CTT mode)
- Easy Excel import/export workflow
- Clear visualizations of student performance
- Configurable normalization (static vs dynamic)
- Optional advanced features (IRT) without added complexity

### 2.2 Students

**Profile:** Students taking tryouts for assessment

**Pain Points:**
- Fixed-length tests regardless of ability level
- Question difficulty may not match their skill
- Long testing sessions with low-value questions

**Needs:**
- Adaptive tests that match their ability level
- Shorter, more efficient assessment
- Clear feedback on strengths/weaknesses
- Consistent scoring across attempts

### 2.3 Content Creators

**Profile:** Staff creating and managing question banks

**Pain Points:**
- Creating 3 difficulty variants per question is time-consuming
- Limited question pool for repeated assessments
- Manual categorization of difficulty levels

**Needs:**
- AI-assisted question generation
- Easy difficulty level adjustment
- Reuse of base questions with variant generation
- Bulk question management tools

### 2.4 Technical Administrators

**Profile:** IT staff managing the platform

**Pain Points:**
- Multiple WordPress sites with separate databases
- Difficulty scaling question pools
- Maintenance of complex scoring systems

**Needs:**
- Centralized backend for multiple sites
- Scalable architecture (aaPanel VPS)
- REST API for WordPress integration
- Automated calibration and normalization
- **Note**: Each admin manages static questions within WordPress; this project provides the backend tool for IRT/CTT calculation and dynamic question selection

---
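For orientation before the formal requirements, the Excel-compatible CTT chain these personas rely on (p → bobot → NM → NN) can be sketched in a few lines of Python. This is a hypothetical illustration: the function names and sample data are invented here, while the formulas themselves are the exact client formulas restated in Section 3.1 and Appendix 13.1.

```python
# Hypothetical sketch of the Excel-compatible CTT chain: p -> bobot -> NM -> NN.
# Names and sample data are invented; formulas follow Section 3.1 / Appendix 13.1.

def ctt_p(correct_count, total_participants):
    """Tingkat kesukaran: p = Σ Benar / Total Peserta."""
    return correct_count / total_participants

def ctt_bobot(p):
    """Bobot = 1 - p (harder questions carry more weight)."""
    return 1 - p

def nilai_mentah(bobot_earned, bobot_max):
    """NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000."""
    return bobot_earned / bobot_max * 1000

def nilai_nasional(nm, rataan, sb):
    """NN = 500 + 100 × ((NM - Rataan) / SB), clipped to [0, 1000]."""
    return max(0.0, min(1000.0, 500 + 100 * (nm - rataan) / sb))

# Three items answered by 10 students; one student got items 1 and 3 correct.
ps = [ctt_p(c, 10) for c in (8, 5, 2)]                 # [0.8, 0.5, 0.2]
bobots = [ctt_bobot(p) for p in ps]                    # [0.2, 0.5, 0.8]
nm = nilai_mentah(bobots[0] + bobots[2], sum(bobots))  # earned ~1.0 of ~1.5
nn = nilai_nasional(nm, rataan=500, sb=100)            # static defaults (FR-7.3)
```

With the static defaults (rataan=500, SB=100), NN and NM coincide by construction, which is why the static mode is a safe starting point before dynamic normalization accumulates data.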
## 3. Functional Requirements

### 3.1 CTT Scoring (Classical Test Theory)

**FR-1.1** System must calculate tingkat kesukaran (p) per question using the exact client Excel formula:
```
p = Σ Benar / Total Peserta
```
**Acceptance Criteria:**
- p-value calculated per question for each tryout
- Values stored in database (items.ctt_p)
- Results match client Excel to 4 decimal places

**FR-1.2** System must calculate bobot (weight) per question:
```
Bobot = 1 - p
```
**Acceptance Criteria:**
- Bobot calculated and stored (items.ctt_bobot)
- Easy questions (p > 0.70) have low bobot (< 0.30)
- Difficult questions (p < 0.30) have high bobot (> 0.70)

**FR-1.3** System must calculate Nilai Mentah (NM) per student:
```
NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
```
**Acceptance Criteria:**
- NM ranges 0-1000
- SUMPRODUCT equivalent implemented correctly
- Results stored per response (user_answers.ctt_nm)

**FR-1.4** System must calculate Nilai Nasional (NN) with normalization:
```
NN = 500 + 100 × ((NM - Rataan) / SB)
```
**Acceptance Criteria:**
- NN normalized to mean=500, SD=100
- Support static (hardcoded rataan/SB) and dynamic (real-time) modes
- NN clipped to 0-1000 range

**FR-1.5** System must categorize question difficulty per CTT standards:
- p < 0.30 → Sukar (Sulit)
- 0.30 ≤ p ≤ 0.70 → Sedang
- p > 0.70 → Mudah

**Acceptance Criteria:**
- Category assigned (items.ctt_category)
- Used for level field (items.level)

### 3.2 IRT Scoring (Item Response Theory)

**FR-2.1** System must implement the 1PL Rasch model:
```
P(θ) = 1 / (1 + e^-(θ - b))
```
**Acceptance Criteria:**
- θ (ability) estimated per student
- b (difficulty) calibrated per question
- Ranges: θ, b ∈ [-3, +3]

**FR-2.2** System must estimate θ using Maximum Likelihood Estimation (MLE)

**Acceptance Criteria:**
- Initial guess θ = 0
- Optimization bounds [-3, +3]
- Standard error (SE) calculated using Fisher information

**FR-2.3** System must calibrate b parameters from response data

**Acceptance Criteria:**
- Minimum of 100-500 responses per item for calibration
- Calibration status tracked (items.calibrated)
- Auto-convert CTT p to an initial b: `b ≈ ln((1-p)/p)` (easy items, p > 0.70, map to negative b, consistent with Appendix 13.2)

**FR-2.4** System must map θ to NN for CTT comparison

**Acceptance Criteria:**
- θ ∈ [-3, +3] mapped to NN ∈ [0, 1000]
- Formula: `NN = 500 + (θ / 3) × 500`
- Secondary score returned in API responses

### 3.3 Hybrid Mode

**FR-3.1** System must support dual scoring (CTT + IRT in parallel)

**Acceptance Criteria:**
- Both scores calculated per response
- Primary/secondary score returned
- Admin can choose which to display

**FR-3.2** System must support hybrid item selection

**Acceptance Criteria:**
- First N items: fixed order (CTT mode)
- Remaining items: adaptive (IRT mode)
- Configurable transition point (tryout_config.hybrid_transition_slot)

**FR-3.3** System must support hybrid normalization

**Acceptance Criteria:**
- Static mode for small samples (< threshold)
- Dynamic mode for large samples (≥ threshold)
- Configurable threshold (tryout_config.min_sample_for_dynamic)

### 3.4 Dynamic Normalization

**FR-4.1** System must maintain running statistics per tryout

**Acceptance Criteria:**
- Track: participant_count, total_nm_sum, total_nm_sq_sum
- Update on each completed session
- Stored in tryout_stats table

**FR-4.2** System must calculate real-time rataan and SB

**Acceptance Criteria:**
- Rataan = mean(all NM)
- SB = sqrt(variance(all NM))
- Updated incrementally (no full recalculation)

**FR-4.3** System must support optional normalization control (manual vs automatic)

**Acceptance Criteria:**
- Admin can choose manual mode (static normalization with hardcoded values)
- Admin can choose automatic mode (dynamic normalization when sufficient data is available)
- When automatic is selected and sufficient data is reached, the system handles normalization automatically
- Configurable threshold: min_sample_for_dynamic (default: 100)
- Admin can switch between manual/automatic at any time
- System displays current data readiness (participant count vs threshold)

### 3.5 AI Question Generation

**FR-5.1** System must generate question variants via the OpenRouter API

**Acceptance Criteria:**
- Generate Mudah variant from Sedang base
- Generate Sulit variant from Sedang base
- Generate same-level variant from Sedang base
- Use Qwen3 Coder 480B or Llama 3.3 70B
- **1 request = 1 question** (not batch generation)

**FR-5.2** System must use a standardized prompt template

**Acceptance Criteria:**
- Include context (tryout_id, slot, level)
- Include basis soal for reference (provides topic/context)
- Request 1 question with 4 options
- Include explanation
- Maintain the same context, varying only the difficulty level

**FR-5.3** System must implement question reuse/caching with user-level tracking

**Acceptance Criteria:**
- Check DB for existing variant before generating
- Check whether the student (user_id) has already answered the question at the specific difficulty level
- Reuse if found (same tryout_id, slot, level)
- Generate only on a cache miss, or when the student has already answered the cached variant at this difficulty level

**FR-5.4** System must provide an admin playground for AI testing

**Acceptance Criteria:**
- Admin can request AI generation without saving to the database
- Admin can re-request unlimited times until satisfied (no approval workflow)
- Preview mode shows the generated question before saving
- Admin can edit content before saving
- Purpose: build admin trust in AI quality before enabling it for students

**FR-5.5** System must parse and store AI-generated questions

**Acceptance Criteria:**
- Parse stem, options, correct answer, explanation
- Store in items table with generated_by='ai'
- Link to basis_item_id
- No approval workflow required for student tests

**FR-5.6** System must support an AI generation toggle

**Acceptance Criteria:**
- Global toggle to enable/disable AI generation (config.AI_generation_enabled)
- When disabled: reuse DB questions regardless of repetition
- When enabled: generate new variants on cache miss
- Admin can toggle on/off based on cost/budget
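The reuse-before-generate flow of FR-5.3 and FR-5.6 can be sketched as follows. This is a minimal illustration, not the real implementation: the dictionaries stand in for the items and user_answers tables, and `generate_via_openrouter` is a hypothetical stub rather than an actual OpenRouter client.

```python
# Sketch of reuse-before-generate (FR-5.3, FR-5.6). In-memory stand-ins only;
# all names here are hypothetical, not the production schema or client.
AI_GENERATION_ENABLED = True  # mirrors config.AI_generation_enabled

variant_cache = {}        # (tryout_id, slot, level) -> question dict
already_answered = set()  # (user_id, tryout_id, slot, level)

def generate_via_openrouter(tryout_id, slot, level):
    # Hypothetical stub: the real call would send the standardized prompt
    # (FR-5.2) and return one parsed question (1 request = 1 question).
    return {"slot": slot, "level": level, "generated_by": "ai"}

def next_variant(user_id, tryout_id, slot, level):
    key = (tryout_id, slot, level)
    cached = variant_cache.get(key)
    # Reuse the cached variant unless this student has already answered it
    # at this difficulty level (user-level question reuse).
    if cached is not None and (user_id, *key) not in already_answered:
        return cached
    if not AI_GENERATION_ENABLED:
        return cached  # toggle off: reuse DB questions regardless of repetition
    variant_cache[key] = generate_via_openrouter(tryout_id, slot, level)
    return variant_cache[key]
```

The toggle check comes after the reuse check on purpose: disabling AI generation must never block serving an existing variant, only the creation of new ones.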
### 3.6 Item Selection

**FR-6.1** System must support fixed-order selection (CTT mode)

**Acceptance Criteria:**
- Items delivered in slot order (1, 2, 3, ...)
- No adaptive logic
- Used when selection_mode='fixed'

**FR-6.2** System must support adaptive selection (IRT mode)

**Acceptance Criteria:**
- Select the item where b ≈ current θ
- Prioritize calibrated items
- Use item information to maximize precision

**FR-6.3** System must support level-based selection (hybrid mode)

**Acceptance Criteria:**
- Select from the specified level (Mudah/Sedang/Sulit)
- Check whether the level variant exists in the DB
- Generate via AI if it does not exist

### 3.7 Excel Import

**FR-7.1** System must import from the client Excel format

**Acceptance Criteria:**
- Parse answer key (Row 2, KUNCI)
- Extract calculated p-values (Row 4, data_only=True)
- Extract bobot values (Row 5)
- Import student responses (Row 6+)

**FR-7.2** System must create items from Excel import

**Acceptance Criteria:**
- Create an item per question slot
- Set ctt_p, ctt_bobot, ctt_category
- Auto-calculate irt_b from ctt_p
- Set calibrated=False

**FR-7.3** System must configure the tryout from Excel import

**Acceptance Criteria:**
- Create tryout_config with CTT settings
- Set normalization_mode='static' (default)
- Set static_rataan=500, static_sb=100

### 3.8 API Endpoints

**FR-8.1** System must provide a Next Item endpoint

**Acceptance Criteria:**
- POST /api/v1/session/{session_id}/next_item
- Accept mode (ctt/irt/hybrid)
- Accept current_responses array
- Return item with selection_method metadata

**FR-8.2** System must provide a Complete Session endpoint

**Acceptance Criteria:**
- POST /api/v1/session/{session_id}/complete
- Return primary_score (CTT or IRT)
- Return secondary_score (parallel calculation)
- Return comparison (NN difference, agreement)

**FR-8.3** System must provide a Get Tryout Config endpoint

**Acceptance Criteria:**
- GET /api/v1/tryout/{tryout_id}/config
- Return scoring_mode, normalization_mode
- Return current_stats (participant_count, rataan, SB)
- Return calibration_status

**FR-8.4** System must provide an Update Normalization endpoint

**Acceptance Criteria:**
- PUT /api/v1/tryout/{tryout_id}/normalization
- Accept normalization_mode update
- Accept static_rataan, static_sb overrides
- Return will_switch_to_dynamic_at threshold

### 3.9 Multi-Site Support

**FR-9.1** System must support multiple WordPress sites

**Acceptance Criteria:**
- Each site has a unique website_id
- Shared backend, isolated data per site
- API responses scoped to website_id

**FR-9.2** System must support per-site configuration

**Acceptance Criteria:**
- Each (website_id, tryout_id) pair is unique
- Independent tryout_config per tryout
- Independent tryout_stats per tryout

---

## 4. Non-Functional Requirements

### 4.1 Performance

- **NFR-4.1.1** Next Item API response time < 500 ms
- **NFR-4.1.2** Complete Session API response time < 2 s
- **NFR-4.1.3** AI question generation < 10 s (OpenRouter timeout)
- **NFR-4.1.4** Support 1000 concurrent students

### 4.2 Scalability

- **NFR-4.2.1** Support 10,000+ items in the database
- **NFR-4.2.2** Support 100,000+ student responses
- **NFR-4.2.3** Question reuse: 99.9% cache hit rate after initial generation
- **NFR-4.2.4** Horizontal scaling via PostgreSQL read replicas

### 4.3 Reliability

- **NFR-4.3.1** 99.9% uptime during tryout periods
- **NFR-4.3.2** Automatic fallback to CTT if IRT fails
- **NFR-4.3.3** Database transaction consistency
- **NFR-4.3.4** Graceful degradation if the AI API is unavailable

### 4.4 Security

- **NFR-4.4.1** API authentication via WordPress tokens
- **NFR-4.4.2** website_id isolation (no cross-site data access)
- **NFR-4.4.3** Rate limiting per API key
- **NFR-4.4.4** Audit trail for all scoring changes

### 4.5 Compatibility

- **NFR-4.5.1** 100% formula match with client Excel
- **NFR-4.5.2** Non-destructive: zero data loss during transitions
- **NFR-4.5.3** Reversible: IRT features can be disabled at any time
- **NFR-4.5.4** WordPress REST API integration

### 4.6 Maintainability

- **NFR-4.6.1** FastAPI Admin auto-generated UI for CRUD
- **NFR-4.6.2** Alembic migrations for schema changes
- **NFR-4.6.3** Comprehensive API documentation (OpenAPI)
- **NFR-4.6.4** Logging for debugging scoring calculations

---

## 5. Data Requirements

### 5.1 Core Entities

#### Items
- **id**: Primary key
- **website_id, tryout_id**: Composite key for multi-site
- **slot, level**: Position and difficulty
- **stem, options, correct, explanation**: Question content
- **ctt_p, ctt_bobot, ctt_category**: CTT parameters
- **irt_b, irt_a, irt_c**: IRT parameters
- **calibrated, calibration_sample_size**: Calibration status
- **generated_by, ai_model, basis_item_id**: AI generation metadata

#### User Answers
- **id**: Primary key
- **wp_user_id, website_id, tryout_id, slot, level**: Composite key
- **item_id, response**: Question and answer
- **ctt_bobot_earned, ctt_total_bobot_cumulative, ctt_nm, ctt_nn**: CTT scores
- **rataan_used, sb_used, normalization_mode_used**: Normalization metadata
- **irt_theta, irt_theta_se, irt_information**: IRT scores
- **scoring_mode_used**: Which mode was used

#### Tryout Config
- **id**: Primary key
- **website_id, tryout_id**: Composite key
- **scoring_mode**: 'ctt', 'irt', 'hybrid'
- **selection_mode**: 'fixed', 'adaptive', 'hybrid'
- **normalization_mode**: 'static', 'dynamic', 'hybrid'
- **static_rataan, static_sb, min_sample_for_dynamic**: Normalization settings
- **min_calibration_sample, theta_estimation_method**: IRT settings
- **hybrid_transition_slot, fallback_to_ctt_on_error**: Transition settings

#### Tryout Stats
- **id**: Primary key
- **website_id, tryout_id**: Composite key
- **participant_count**: Number of completed sessions
- **total_nm_sum, total_nm_sq_sum**: Running sums for mean/SD calculation
- **current_rataan, current_sb**: Calculated values
- **min_nm, max_nm**: Score range
- **last_calculated_at, last_participant_id**: Metadata

### 5.2 Data Relationships

- Items → User Answers (1:N, CASCADE delete)
- Items → Items (self-reference via basis_item_id for AI generation)
- Tryout Config → User Answers (1:N via website_id, tryout_id)
- Tryout Stats → User Answers (1:N via website_id, tryout_id)

---

## 6. Technical Constraints

### 6.1 Tech Stack (Fixed)

- **Backend**: FastAPI (Python)
- **Database**: PostgreSQL (via aaPanel PgSQL Manager)
- **ORM**: SQLAlchemy
- **Admin**: FastAPI Admin
- **AI**: OpenRouter API (Qwen3 Coder 480B, Llama 3.3 70B)
- **Deployment**: aaPanel VPS (Python Manager)

### 6.2 External Dependencies

- **OpenRouter API**: Must handle rate limits, timeouts, and errors
- **WordPress**: REST API integration, authentication
- **Excel**: openpyxl for import, pandas for data processing

### 6.3 Mathematical Constraints

- **CTT**: Must use the EXACT client formulas (p, bobot, NM, NN)
- **IRT**: 1PL Rasch model only (no a, c parameters initially)
- **Normalization**: Mean=500, SD=100 target
- **Ranges**: θ, b ∈ [-3, +3]; NM, NN ∈ [0, 1000]

---

## 7. User Stories

### 7.1 Administrator Stories

**US-7.1.1** As an administrator, I want to import questions from Excel so that I can migrate existing content without manual entry.
- Priority: High
- Acceptance: FR-7.1, FR-7.2, FR-7.3

**US-7.1.2** As an administrator, I want to configure the normalization mode (static/dynamic/hybrid) so that I can control how scores are normalized.
- Priority: High
- Acceptance: FR-4.3, FR-8.4

**US-7.1.3** As an administrator, I want to view calibration status so that I know when IRT is ready for production.
- Priority: Medium
- Acceptance: FR-8.3

**US-7.1.4** As an administrator, I want to choose the scoring mode (CTT/IRT/hybrid) so that I can gradually adopt advanced features.
- Priority: High
- Acceptance: FR-3.1, FR-3.2, FR-3.3

### 7.2 Student Stories

**US-7.2.1** As a student, I want to take adaptive tests so that I get questions matching my ability level.
- Priority: High
- Acceptance: FR-6.2, FR-2.1, FR-2.2

**US-7.2.2** As a student, I want to see my normalized score (NN) so that I can compare my performance with others.
- Priority: High
- Acceptance: FR-1.4, FR-4.2

**US-7.2.3** As a student, I want a seamless experience where any technical issues (IRT fallback, AI generation failures) are handled without interrupting my test.
- Priority: High
- Acceptance: Seamless fallback (student unaware of internal mode switching); no error messages visible to students

### 7.3 Content Creator Stories

**US-7.3.1** As a content creator, I want to generate question variants via AI so that I don't have to manually create 3 difficulty levels.
- Priority: High
- Acceptance: FR-5.1, FR-5.2, FR-5.3, FR-5.4

**US-7.3.2** As a content creator, I want to reuse existing questions with different difficulty levels so that I can maximize question pool efficiency.
- Priority: Medium
- Acceptance: FR-5.3, FR-6.3

### 7.4 Technical Administrator Stories

**US-7.4.1** As a technical administrator, I want to manage multiple WordPress sites from one backend so that I don't have to duplicate infrastructure.
- Priority: High
- Acceptance: FR-9.1, FR-9.2

**US-7.4.2** As a technical administrator, I want to monitor calibration progress so that I can plan the IRT rollout.
- Priority: Medium
- Acceptance: FR-2.3, FR-8.3

**US-7.4.3** As a technical administrator, I want access to internal scoring details (CTT vs IRT comparison, normalization metrics) for debugging and monitoring, while students only see primary scores.
- Priority: Medium
- Acceptance: Admin visibility of all internal metrics; student visibility limited to the final NN score only

---

## 8. Success Criteria

### 8.1 Technical Validation

- ✅ CTT scores match client Excel to 4 decimal places (100% formula accuracy)
- ✅ Dynamic normalization produces mean=500±5, SD=100±5 after 100 users
- ✅ IRT calibration covers >80% of items with 500+ responses per item
- ✅ CTT vs IRT NN difference <20 points (moderate agreement)
- ✅ Fallback rate <5% (IRT → CTT on error)

### 8.2 Educational Validation

- ✅ IRT measurement precision: SE <0.5 after 15 items
- ✅ Normalization quality: distribution skewness <0.5
- ✅ Adaptive efficiency: 30% reduction in test length (15 IRT items ≈ 30 CTT items for the same precision)
- ✅ Student satisfaction: >80% prefer adaptive mode in surveys
- ✅ Admin adoption: >70% of tryouts use hybrid mode within 3 months

### 8.3 Business Validation

- ✅ Zero data loss during the CTT→IRT transition
- ✅ Reversible: IRT can be disabled and CTT restored at any time
- ✅ Non-destructive: the existing Excel workflow remains functional
- ✅ Cost efficiency: 99.9% question reuse vs 90,000 unique questions for 1000 users
- ✅ Multi-site scalability: one backend supports unlimited WordPress sites

---
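The dynamic-normalization criterion (mean=500±5, SD=100±5) rests on the running sums of FR-4.1/FR-4.2: rataan and SB are derived from `total_nm_sum` and `total_nm_sq_sum` without a full recalculation. A minimal sketch, with `TryoutStats` as a hypothetical in-memory stand-in for the tryout_stats table:

```python
# Incremental running statistics for dynamic normalization (FR-4.1/FR-4.2).
# TryoutStats is a hypothetical stand-in for the tryout_stats table.
import math

class TryoutStats:
    def __init__(self):
        self.participant_count = 0
        self.total_nm_sum = 0.0
        self.total_nm_sq_sum = 0.0

    def record(self, nm):
        """Update the running sums when a session completes."""
        self.participant_count += 1
        self.total_nm_sum += nm
        self.total_nm_sq_sum += nm * nm

    @property
    def rataan(self):  # mean of all NM
        return self.total_nm_sum / self.participant_count

    @property
    def sb(self):      # population SD via E[X^2] - E[X]^2
        var = self.total_nm_sq_sum / self.participant_count - self.rataan ** 2
        return math.sqrt(max(var, 0.0))  # guard against tiny negative drift

stats = TryoutStats()
for nm in (400.0, 500.0, 600.0):
    stats.record(nm)
# rataan is 500.0; sb is sqrt(20000/3), roughly 81.65
```

Note the sum-of-squares formula can lose precision for large counts; a production version might prefer Welford's online algorithm, but the stored columns (`total_nm_sum`, `total_nm_sq_sum`) map directly onto this simpler form.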
## 9. Risk Mitigation

### 9.1 Technical Risks

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| IRT calibration fails (insufficient data) | High | Medium | Fall back to CTT mode, enable hybrid transition |
| OpenRouter API down/unavailable | Medium | Low | Cache questions, serve static variants |
| Excel formula mismatch | High | Low | Unit tests with client Excel data |
| Database performance degradation | Medium | Low | Indexing, read replicas, query optimization |

### 9.2 Business Risks

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Administrators refuse to use IRT (too complex) | High | Medium | Hybrid mode with a CTT-first UI |
| Students dislike adaptive tests | Medium | Low | A/B testing, optional mode |
| Excel workflow changes (client updates) | High | Low | Version control, flexible import parser |
| Multi-site data isolation failure | Critical | Low | website_id validation, RBAC |

---

## 10. Migration Strategy

### 10.1 Phase 1: Import Existing Data (Week 1)

- Export current Sejoli Tryout data to Excel
- Run the import script to load items and configurations
- Configure CTT mode with static normalization
- Validate: CTT scores match Excel 100%

### 10.2 Phase 2: Collect Calibration Data (Weeks 2-4)

- Students use the tryout normally (CTT mode)
- Backend logs all responses
- Monitor calibration progress (items.calibrated status)
- Collect running statistics (tryout_stats)

### 10.3 Phase 3: Enable Dynamic Normalization (Week 5)

- Check participant count ≥ 100
- Update normalization_mode='hybrid'
- Test with 10-20 new students
- Verify: normalized distribution has mean≈500, SD≈100

### 10.4 Phase 4: Enable IRT Adaptive (Week 6+)

- After 90% of items are calibrated and 1000+ responses collected
- Update scoring_mode='irt', selection_mode='adaptive'
- Enable AI generation for Mudah/Sulit variants
- Monitor fallback rate and measurement precision

### 10.5 Rollback Plan

- Any phase is reversible
- Revert to CTT mode if IRT issues occur
- **Score preservation**: historical IRT scores are kept as-is; CTT applies only to new sessions after rollback
- Disable AI generation if costs run too high
- Revert to static normalization if dynamic proves unstable

---

## 11. Future Enhancements

### 11.1 Short-term (3-6 months)

- **2PL/3PL IRT**: Add discrimination (a) and guessing (c) parameters
- **Item Response Categorization**: Bloom's Taxonomy, cognitive domains
- **Advanced AI Models**: Fine-tune models for specific subjects
- **Data Retention Policy**: Define an archival and anonymization strategy (currently: keep all data)

### 11.2 Long-term (6-12 months)

- **Multi-dimensional IRT**: Measure multiple skills per question
- **Automatic Item Difficulty Adjustment**: AI calibrates b parameters
- **Predictive Analytics**: Student performance forecasting
- **Integration with LMS**: Moodle, Canvas API support

---

## 12. Glossary

| Term | Definition |
|------|------------|
| **p (TK)** | Proportion correct / Tingkat Kesukaran (CTT difficulty) |
| **Bobot** | 1-p weight (CTT scoring weight) |
| **NM** | Nilai Mentah (raw score, 0-1000) |
| **NN** | Nilai Nasional (normalized, 500±100) |
| **Rataan** | Mean of NM scores |
| **SB** | Simpangan Baku (standard deviation of NM) |
| **θ (theta)** | IRT ability (-3 to +3) |
| **b** | IRT difficulty (-3 to +3) |
| **SE** | Standard error (precision) |
| **CAT** | Computerized Adaptive Testing |
| **MLE** | Maximum Likelihood Estimation |
| **CTT** | Classical Test Theory |
| **IRT** | Item Response Theory |

---
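The IRT pieces defined in the glossary (θ, b, SE, MLE) fit together in a short sketch: the 1PL model (FR-2.1), an MLE θ estimate with Fisher-information SE (FR-2.2), and the θ→NN mapping (FR-2.4). Newton-Raphson is an illustrative optimizer choice here, not necessarily what production uses.

```python
# Illustrative sketch of 1PL scoring: model, MLE theta, Fisher-information SE,
# and the theta -> NN mapping. Newton-Raphson is an assumed optimizer choice.
import math

def p_correct(theta, b):
    """1PL Rasch: P(theta) = 1 / (1 + e^-(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, bs, iters=25):
    """MLE via Newton-Raphson: initial guess 0, bounded to [-3, +3]."""
    theta = 0.0
    for _ in range(iters):
        ps = [p_correct(theta, b) for b in bs]
        grad = sum(u - p for u, p in zip(responses, ps))  # score function
        info = sum(p * (1 - p) for p in ps)               # Fisher information
        theta = max(-3.0, min(3.0, theta + grad / info))
    info = sum(p * (1 - p) for p in (p_correct(theta, b) for b in bs))
    return theta, 1.0 / math.sqrt(info)                   # SE = 1/sqrt(I)

def theta_to_nn(theta):
    """NN = 500 + (theta / 3) × 500, so [-3, +3] maps onto [0, 1000]."""
    return 500 + theta / 3 * 500

# A student answering mostly harder items correctly gets theta > 0, NN > 500.
theta, se = estimate_theta([1, 1, 0, 1], bs=[-1.0, 0.0, 0.5, 1.0])
```

The [-3, +3] clamp also handles the degenerate all-correct/all-wrong response patterns, for which the unconstrained MLE diverges.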
## 13. Appendices

### 13.1 Formula Reference

- **CTT p**: `p = Σ Benar / Total Peserta`
- **CTT Bobot**: `Bobot = 1 - p`
- **CTT NM**: `NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000`
- **CTT NN**: `NN = 500 + 100 × ((NM - Rataan) / SB)`
- **IRT 1PL**: `P(θ) = 1 / (1 + e^-(θ - b))`
- **CTT→IRT conversion**: `b ≈ ln((1-p)/p)`
- **θ→NN mapping**: `NN = 500 + (θ / 3) × 500`

### 13.2 Difficulty Categories

| CTT p | CTT Category | Level | IRT b Range |
|-------|--------------|-------|-------------|
| p < 0.30 | Sukar | Sulit | b > 0.85 |
| 0.30 ≤ p ≤ 0.70 | Sedang | Sedang | -0.85 ≤ b ≤ 0.85 |
| p > 0.70 | Mudah | Mudah | b < -0.85 |

### 13.3 API Quick Reference

- `POST /api/v1/session/{session_id}/next_item` - Get next question
- `POST /api/v1/session/{session_id}/complete` - Submit and score
- `GET /api/v1/tryout/{tryout_id}/config` - Get configuration
- `PUT /api/v1/tryout/{tryout_id}/normalization` - Update normalization

---

## 14. Reporting Requirements

### 14.1 Student Performance Reports

**FR-14.1.1** System must provide individual student performance reports

**Acceptance Criteria:**
- Report all student sessions (CTT, IRT, hybrid)
- Include NM, NN scores per session
- Include time spent per question
- Include total_benar, total_bobot_earned
- Export to CSV/Excel

**FR-14.1.2** System must provide aggregate student performance reports

**Acceptance Criteria:**
- Group by tryout, website_id, date range
- Show average NM, NN, theta per group
- Show distribution (min, max, median, std dev)
- Show pass/fail rates
- Export to CSV/Excel

### 14.2 Item Analysis Reports

**FR-14.2.1** System must provide item difficulty reports

**Acceptance Criteria:**
- Show CTT p-value per item
- Show IRT b-parameter per item
- Show calibration status
- Show discrimination index (if available)
- Filter by difficulty category (Mudah/Sedang/Sulit)

**FR-14.2.2** System must provide item information function reports

**Acceptance Criteria:**
- Show item information values at different theta levels
- Visualize item characteristic curves (optional)
- Show the optimal theta range for each item

### 14.3 Calibration Status Reports

**FR-14.3.1** System must provide calibration progress reports

**Acceptance Criteria:**
- Show total items per tryout
- Show calibrated item count and percentage
- Show items awaiting calibration
- Show average calibration sample size
- Show estimated time to reach the calibration threshold
- Highlight ready-for-IRT-rollout status (≥90% calibrated)

### 14.4 Tryout Comparison Reports

**FR-14.4.1** System must provide tryout comparison across dates

**Acceptance Criteria:**
- Compare NM/NN distributions across different tryout dates
- Show trends over time (e.g., monthly averages)
- Show the impact of normalization changes (static → dynamic)

**FR-14.4.2** System must provide tryout comparison across subjects

**Acceptance Criteria:**
- Compare performance across different subjects (e.g., Mat SD vs Bahasa SMA)
- Show subject-specific calibration status
- Show IRT accuracy differences per subject

### 14.5 Reporting Infrastructure

**FR-14.5.1** System must provide report scheduling

**Acceptance Criteria:**
- Admin can schedule daily/weekly/monthly reports
- Reports emailed to admins on schedule
- Report templates configurable (e.g., calibration status every Monday)

**FR-14.5.2** System must provide report export formats

**Acceptance Criteria:**
- Export to CSV
- Export to Excel (.xlsx)
- Export to PDF (with charts if available)

---

**Document End**

**Document Version:** 1.1
**Created:** March 21, 2026
**Updated:** March 21, 2026 (Clarifications Incorporated)
**Author:** Product Team (based on Technical Specification v1.2.0)
**Status:** Draft - Ready for Implementation