Product Requirements Document (PRD)
IRT-Powered Adaptive Question Bank System
Document Version: 1.1
Date: March 21, 2026 (Updated)
Product Name: IRT Bank Soal (Adaptive Question Bank with AI Generation)
Client: Sejoli Tryout Multi-Website Platform
Status: Draft - Clarifications Incorporated
Changelog
v1.1 (March 21, 2026)
- Added AI Generation: 1 request = 1 question, no approval workflow
- Added Admin Playground: Admin can test AI generation without saving to DB
- Updated Normalization Control: Optional manual/automatic mode, system handles auto when sufficient data
- Updated IRT → CTT Rollback: Historical IRT scores preserved, CTT applied to new sessions only
- Removed Admin Permissions/Role-based Access: Not needed (each admin per site via WordPress)
- Updated Custom Dashboards: Use FastAPI Admin only (no custom dashboards)
- Added AI Generation Toggle: Global on/off switch for cost control
- Added User-level Question Reuse: Check if student already answered at difficulty level
- Updated Student UX: Admin sees internal metrics, students see only primary score
- Added Data Retention: Keep all data (no policy yet)
- Added Reporting Section: Student performance, Item analysis, Calibration status, Tryout comparison
- Updated Admin Persona Note: This project is backend tool for IRT/CTT calculation; WordPress handles static questions
1. Product Vision
1.1 Vision Statement
To provide an adaptive, intelligent question bank system that seamlessly integrates with Sejoli's existing Excel-based workflow while introducing modern Item Response Theory (IRT) capabilities and AI-powered question generation, enabling more accurate and efficient student assessment.
1.1.1 Primary Goals
- 100% Excel Compatibility: Maintain exact formula compatibility with client's existing Excel workflow (CTT scoring with p, bobot, NM, NN)
- Gradual Modernization: Enable smooth transition from Classical Test Theory (CTT) to Item Response Theory (IRT)
- Adaptive Assessment: Provide Computerized Adaptive Testing (CAT) capabilities for more efficient and accurate measurement
- AI-Enhanced Content: Automatically generate question variants (Mudah/Sulit) from base Sedang questions
- Multi-Site Support: Single backend serving multiple WordPress-powered educational sites
- Non-Destructive: Zero disruption to existing operations - all enhancements are additive
1.1.2 Success Metrics
- Technical: CTT scores match client Excel 100%, IRT calibration >80% coverage
- Educational: 30% reduction in test length with IRT vs CTT, measurement precision (SE < 0.5 after 15 items)
- Adoption: >70% tryouts use hybrid mode within 3 months, >80% student satisfaction with adaptive mode
- Efficiency: 99.9% question reuse rate via AI-generated variants
2. User Personas
2.1 Administrators (School/Guru)
Profile: Non-technical education professionals managing tryouts
Pain Points:
- Excel-based scoring is manual and time-consuming
- Static questions require constant new content creation
- Difficulty normalization requires manual calculation
- Limited ability to compare student performance across groups
Needs:
- Simple, transparent scoring formulas (CTT mode)
- Easy Excel import/export workflow
- Clear visualizations of student performance
- Configurable normalization (static vs dynamic)
- Optional advanced features (IRT) without complexity
2.2 Students
Profile: Students taking tryouts for assessment
Pain Points:
- Fixed-length tests regardless of ability level
- Question difficulty may not match their skill
- Long testing sessions with low-value questions
Needs:
- Adaptive tests that match their ability level
- Shorter, more efficient assessment
- Clear feedback on strengths/weaknesses
- Consistent scoring across attempts
2.3 Content Creators
Profile: Staff creating and managing question banks
Pain Points:
- Creating 3 difficulty variants per question is time-consuming
- Limited question pool for repeated assessments
- Manual categorization of difficulty levels
Needs:
- AI-assisted question generation
- Easy difficulty level adjustment
- Reuse of base questions with variant generation
- Bulk question management tools
2.4 Technical Administrators
Profile: IT staff managing the platform
Pain Points:
- Multiple WordPress sites with separate databases
- Difficulty in scaling question pools
- Maintenance of complex scoring systems
Needs:
- Centralized backend for multiple sites
- Scalable architecture (AA-panel VPS)
- REST API for WordPress integration
- Automated calibration and normalization
- Note: Each admin manages static questions within WordPress; this project provides the backend tool for IRT/CTT calculation and dynamic question selection
3. Functional Requirements
3.1 CTT Scoring (Classical Test Theory)
FR-1.1 System must calculate tingkat kesukaran (p) per question using exact client Excel formula:
p = Σ Benar / Total Peserta
Acceptance Criteria:
- p-value calculated per question for each tryout
- Values stored in database (items.ctt_p)
- Results match client Excel to 4 decimal places
FR-1.2 System must calculate bobot (weight) per question:
Bobot = 1 - p
Acceptance Criteria:
- Bobot calculated and stored (items.ctt_bobot)
- Easy questions (p > 0.70) have low bobot (< 0.30)
- Difficult questions (p < 0.30) have high bobot (> 0.70)
FR-1.3 System must calculate Nilai Mentah (NM) per student:
NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
Acceptance Criteria:
- NM ranges 0-1000
- SUMPRODUCT equivalent implemented correctly
- Results stored per response (user_answers.ctt_nm)
FR-1.4 System must calculate Nilai Nasional (NN) with normalization:
NN = 500 + 100 × ((NM - Rataan) / SB)
Acceptance Criteria:
- NN normalized to mean=500, SD=100
- Support static (hardcoded rataan/SB) and dynamic (real-time) modes
- NN clipped to 0-1000 range
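The four CTT formulas (FR-1.1 through FR-1.4) can be sketched as plain Python functions. This is a minimal illustration of the scoring pipeline, not the implementation; function names are illustrative, and the clipping in `ctt_nn` follows the 0-1000 acceptance criterion above.

```python
def ctt_p(correct_counts, total_participants):
    """FR-1.1 — tingkat kesukaran per question: p = sum(Benar) / Total Peserta."""
    return [c / total_participants for c in correct_counts]

def ctt_bobot(p_values):
    """FR-1.2 — weight per question: bobot = 1 - p."""
    return [1 - p for p in p_values]

def ctt_nm(earned_bobot, total_bobot_max):
    """FR-1.3 — Nilai Mentah: SUMPRODUCT of earned weights, scaled to 0-1000."""
    return (sum(earned_bobot) / total_bobot_max) * 1000

def ctt_nn(nm, rataan, sb):
    """FR-1.4 — Nilai Nasional: normalize to mean 500, SD 100, clipped to 0-1000."""
    nn = 500 + 100 * ((nm - rataan) / sb)
    return max(0.0, min(1000.0, nn))
```

For example, a question answered correctly by 70 of 100 participants gets p = 0.70 and bobot = 0.30, so easy questions contribute little weight to NM, exactly as in the Excel workbook.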
FR-1.5 System must categorize question difficulty per CTT standards:
- p < 0.30 → Sukar (Sulit)
- 0.30 ≤ p ≤ 0.70 → Sedang
- p > 0.70 → Mudah
Acceptance Criteria:
- Category assigned (items.ctt_category)
- Used for level field (items.level)
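The FR-1.5 thresholds reduce to a small mapping function; a sketch, with return values matching the category/level pairing in the table at §13.2:

```python
def ctt_category(p):
    """Map a CTT p-value to its difficulty category (FR-1.5 thresholds)."""
    if p < 0.30:
        return "Sukar"    # stored level: Sulit
    if p <= 0.70:
        return "Sedang"
    return "Mudah"
```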
3.2 IRT Scoring (Item Response Theory)
FR-2.1 System must implement 1PL Rasch model:
P(θ) = 1 / (1 + e^-(θ - b))
Acceptance Criteria:
- θ (ability) estimated per student
- b (difficulty) calibrated per question
- Ranges: θ, b ∈ [-3, +3]
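The 1PL response probability is a one-liner; this sketch mirrors the FR-2.1 formula directly:

```python
import math

def rasch_p(theta, b):
    """1PL Rasch model: P(theta) = 1 / (1 + e^-(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))
```

When ability equals difficulty (theta = b) the probability of a correct answer is exactly 0.5, which is what makes b directly interpretable on the same scale as theta.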
FR-2.2 System must estimate θ using Maximum Likelihood Estimation (MLE) Acceptance Criteria:
- Initial guess θ = 0
- Optimization bounds [-3, +3]
- Standard error (SE) calculated using Fisher information
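One way to satisfy FR-2.2 with the stated initial guess, bounds, and Fisher-information SE is Newton-Raphson on the 1PL log-likelihood. This is a self-contained sketch, not the mandated optimizer; a production system might use scipy instead.

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, bounds=(-3.0, 3.0), iters=50):
    """MLE for the 1PL model. responses: list of (b, correct) pairs with
    correct in {0, 1}. Returns (theta, standard_error)."""
    theta = 0.0                 # initial guess per FR-2.2
    lo, hi = bounds
    info = 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood and Fisher information at theta.
        grad = sum(u - rasch_p(theta, b) for b, u in responses)
        info = sum(rasch_p(theta, b) * (1 - rasch_p(theta, b)) for b, u in responses)
        if info == 0:
            break
        theta = min(hi, max(lo, theta + grad / info))  # clamp to bounds
    se = 1.0 / math.sqrt(info) if info > 0 else float("inf")
    return theta, se
```

Note that an all-correct (or all-wrong) response pattern has no interior maximum, which is why the estimate is clamped to the [-3, +3] bounds.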
FR-2.3 System must calibrate b parameters from response data Acceptance Criteria:
- Minimum 100-500 responses per item for calibration
- Calibration status tracked (items.calibrated)
- Auto-convert CTT p to initial b:
b ≈ ln((1-p)/p)
FR-2.4 System must map θ to NN for CTT comparison Acceptance Criteria:
- θ ∈ [-3, +3] mapped to NN ∈ [0, 1000]
- Formula:
NN = 500 + (θ / 3) × 500
- Secondary score returned in API responses
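Both conversions (FR-2.3's initial-b seeding and FR-2.4's θ→NN mapping) are sketched below. The sign convention in `p_to_b` follows the difficulty table in §13.2, where hard items (p < 0.30) must land at b > 0.85.

```python
import math

def p_to_b(p):
    """Seed an initial IRT difficulty from a CTT p-value (FR-2.3).
    Low p (hard item) -> positive b; high p (easy item) -> negative b."""
    return math.log((1 - p) / p)

def theta_to_nn(theta):
    """Map theta in [-3, +3] onto the 0-1000 NN scale (FR-2.4)."""
    return 500 + (theta / 3) * 500
```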
3.3 Hybrid Mode
FR-3.1 System must support dual scoring (CTT + IRT parallel) Acceptance Criteria:
- Both scores calculated per response
- Primary/secondary score returned
- Admin can choose which to display
FR-3.2 System must support hybrid item selection Acceptance Criteria:
- First N items: fixed order (CTT mode)
- Remaining items: adaptive (IRT mode)
- Configurable transition point (tryout_config.hybrid_transition_slot)
FR-3.3 System must support hybrid normalization Acceptance Criteria:
- Static mode for small samples (< threshold)
- Dynamic mode for large samples (≥ threshold)
- Configurable threshold (tryout_config.min_sample_for_dynamic)
3.4 Dynamic Normalization
FR-4.1 System must maintain running statistics per tryout Acceptance Criteria:
- Track: participant_count, total_nm_sum, total_nm_sq_sum
- Update on each completed session
- Stored in tryout_stats table
FR-4.2 System must calculate real-time rataan and SB Acceptance Criteria:
- Rataan = mean(all NM)
- SB = sqrt(variance(all NM))
- Updated incrementally (no full recalc)
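The running sums in FR-4.1 give rataan and SB in O(1) per completed session, with no full recalculation. A sketch over a plain dict mirroring the `tryout_stats` columns; this uses the population SD, so swap in an n-1 denominator if the client's Excel uses sample SD (STDEV).

```python
import math

def update_stats(stats, nm):
    """Incrementally fold one completed session's NM into the running
    statistics (FR-4.1/FR-4.2). Mutates and returns stats."""
    stats["participant_count"] += 1
    stats["total_nm_sum"] += nm
    stats["total_nm_sq_sum"] += nm * nm
    n = stats["participant_count"]
    rataan = stats["total_nm_sum"] / n
    # Population variance from running sums: E[X^2] - (E[X])^2.
    variance = stats["total_nm_sq_sum"] / n - rataan ** 2
    stats["current_rataan"] = rataan
    stats["current_sb"] = math.sqrt(max(variance, 0.0))  # guard rounding
    return stats
```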
FR-4.3 System must support optional normalization control (manual vs automatic) Acceptance Criteria:
- Admin can choose manual mode (static normalization with hardcoded values)
- Admin can choose automatic mode (dynamic normalization when sufficient data)
- When automatic selected and sufficient data reached: system handles normalization automatically
- Configurable threshold: min_sample_for_dynamic (default: 100)
- Admin can switch between manual/automatic at any time
- System displays current data readiness (participant count vs threshold)
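The manual/automatic decision in FR-4.3 boils down to one threshold check. An illustrative helper (the parameter names are assumptions, loosely following the `tryout_config`/`tryout_stats` fields):

```python
def resolve_normalization(mode, participant_count, threshold,
                          static_rataan, static_sb, dyn_rataan, dyn_sb):
    """FR-4.3: in automatic mode, switch to dynamic statistics once the
    participant count reaches min_sample_for_dynamic; otherwise stay static."""
    if mode == "automatic" and participant_count >= threshold:
        return {"source": "dynamic", "rataan": dyn_rataan, "sb": dyn_sb}
    return {"source": "static", "rataan": static_rataan, "sb": static_sb}
```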
3.5 AI Question Generation
FR-5.1 System must generate question variants via OpenRouter API Acceptance Criteria:
- Generate Mudah variant from Sedang base
- Generate Sulit variant from Sedang base
- Generate same-level variant from Sedang base
- Use Qwen3 Coder 480B or Llama 3.3 70B
- 1 request = 1 question (not batch generation)
FR-5.2 System must use standardized prompt template Acceptance Criteria:
- Include context (tryout_id, slot, level)
- Include basis soal for reference (provides topic/context)
- Request 1 question with 4 options
- Include explanation
- Maintain same context, vary only difficulty level
FR-5.3 System must implement question reuse/caching with user-level tracking Acceptance Criteria:
- Check DB for existing variant before generating
- Check if student user_id already answered question at specific difficulty level
- Reuse if found (same tryout_id, slot, level)
- Generate only if cache miss OR user hasn't answered at this difficulty
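One plausible reading of FR-5.3 is: reuse a cached variant the student has not yet answered, and generate only when no unseen variant exists. A pure-Python sketch with in-memory stand-ins for the DB lookups (the real check would be a SQLAlchemy query keyed on tryout_id/slot/level and user_answers):

```python
def select_or_generate(variants, answered, user_id, tryout_id, slot, level):
    """Return ('reuse', item_id) or ('generate', None).
    variants: {(tryout_id, slot, level): [item_id, ...]}  (cache)
    answered: set of (user_id, item_id) pairs the student has answered."""
    for item_id in variants.get((tryout_id, slot, level), []):
        if (user_id, item_id) not in answered:
            return ("reuse", item_id)   # cache hit, unseen by this student
    return ("generate", None)           # cache miss, or all variants seen
```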
FR-5.4 System must provide admin playground for AI testing Acceptance Criteria:
- Admin can request AI generation without saving to database
- Admin can re-request unlimited times until satisfied (no approval workflow)
- Preview mode shows generated question before saving
- Admin can edit content before saving
- Purpose: Build admin trust in AI quality before enabling for students
FR-5.5 System must parse and store AI-generated questions Acceptance Criteria:
- Parse stem, options, correct answer, explanation
- Store in items table with generated_by='ai'
- Link to basis_item_id
- No approval workflow required for student tests
FR-5.6 System must support AI generation toggle Acceptance Criteria:
- Global toggle to enable/disable AI generation (config.AI_generation_enabled)
- When disabled: reuse DB questions regardless of repetition
- When enabled: generate new variants if cache miss
- Admin can toggle on/off based on cost/budget
3.6 Item Selection
FR-6.1 System must support fixed order selection (CTT mode) Acceptance Criteria:
- Items delivered in slot order (1, 2, 3, ...)
- No adaptive logic
- Used when selection_mode='fixed'
FR-6.2 System must support adaptive selection (IRT mode) Acceptance Criteria:
- Select item where b ≈ current θ
- Prioritize calibrated items
- Use item information to maximize precision
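For the 1PL model, item information is I(θ) = P(1 − P), which peaks when b ≈ θ, so "select item where b ≈ current θ" and "maximize information" coincide. A sketch of FR-6.2's selection rule (the tuple layout is an assumption):

```python
import math

def item_information(theta, b):
    """1PL item information I(theta) = P(1 - P); maximal when b == theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1 - p)

def select_next_item(theta, items):
    """Pick the item most informative at the current theta estimate.
    items: list of (item_id, b, calibrated) tuples; calibrated items
    are preferred, falling back to the full pool if none exist."""
    pool = [it for it in items if it[2]] or items
    return max(pool, key=lambda it: item_information(theta, it[1]))[0]
```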
FR-6.3 System must support level-based selection (hybrid mode) Acceptance Criteria:
- Select from specified level (Mudah/Sedang/Sulit)
- Check if level variant exists in DB
- Generate via AI if not exists
3.7 Excel Import
FR-7.1 System must import from client Excel format Acceptance Criteria:
- Parse answer key (Row 2, KUNCI)
- Extract calculated p-values (Row 4, data_only=True)
- Extract bobot values (Row 5)
- Import student responses (Row 6+)
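A minimal parsing sketch for the FR-7.1 layout, operating on worksheet rows as plain lists (with openpyxl one would load the workbook with `data_only=True` and feed `ws.iter_rows(values_only=True)` into it). Excel Row 2 is index 1, Row 4 is index 3, Row 5 is index 4, Row 6+ is index 5+; the first column is assumed to hold the row label / student id.

```python
def parse_excel_rows(rows):
    """Parse the client Excel layout: Row 2 = KUNCI answer key,
    Row 4 = calculated p-values, Row 5 = bobot, Row 6+ = responses."""
    kunci = list(rows[1][1:])
    p_values = [float(v) for v in rows[3][1:]]
    bobot = [float(v) for v in rows[4][1:]]
    responses = {row[0]: list(row[1:]) for row in rows[5:]}
    return {"kunci": kunci, "p": p_values, "bobot": bobot,
            "responses": responses}
```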
FR-7.2 System must create items from Excel import Acceptance Criteria:
- Create item per question slot
- Set ctt_p, ctt_bobot, ctt_category
- Auto-calculate irt_b from ctt_p
- Set calibrated=False
FR-7.3 System must configure tryout from Excel import Acceptance Criteria:
- Create tryout_config with CTT settings
- Set normalization_mode='static' (default)
- Set static_rataan=500, static_sb=100
3.8 API Endpoints
FR-8.1 System must provide Next Item endpoint Acceptance Criteria:
- POST /api/v1/session/{session_id}/next_item
- Accept mode (ctt/irt/hybrid)
- Accept current_responses array
- Return item with selection_method metadata
FR-8.2 System must provide Complete Session endpoint Acceptance Criteria:
- POST /api/v1/session/{session_id}/complete
- Return primary_score (CTT or IRT)
- Return secondary_score (parallel calculation)
- Return comparison (NN difference, agreement)
FR-8.3 System must provide Get Tryout Config endpoint Acceptance Criteria:
- GET /api/v1/tryout/{tryout_id}/config
- Return scoring_mode, normalization_mode
- Return current_stats (participant_count, rataan, SB)
- Return calibration_status
FR-8.4 System must provide Update Normalization endpoint Acceptance Criteria:
- PUT /api/v1/tryout/{tryout_id}/normalization
- Accept normalization_mode update
- Accept static_rataan, static_sb overrides
- Return will_switch_to_dynamic_at threshold
3.9 Multi-Site Support
FR-9.1 System must support multiple WordPress sites Acceptance Criteria:
- Each site has unique website_id
- Shared backend, isolated data per site
- API responses scoped to website_id
FR-9.2 System must support per-site configuration Acceptance Criteria:
- Each (website_id, tryout_id) pair unique
- Independent tryout_config per tryout
- Independent tryout_stats per tryout
4. Non-Functional Requirements
4.1 Performance
NFR-4.1.1 Next Item API response time < 500ms
NFR-4.1.2 Complete Session API response time < 2s
NFR-4.1.3 AI question generation < 10s (OpenRouter timeout)
NFR-4.1.4 Support 1000 concurrent students
4.2 Scalability
NFR-4.2.1 Support 10,000+ items in database
NFR-4.2.2 Support 100,000+ student responses
NFR-4.2.3 Question reuse: 99.9% cache hit rate after initial generation
NFR-4.2.4 Horizontal scaling via PostgreSQL read replicas
4.3 Reliability
NFR-4.3.1 99.9% uptime for tryout periods
NFR-4.3.2 Automatic fallback to CTT if IRT fails
NFR-4.3.3 Database transaction consistency
NFR-4.3.4 Graceful degradation if AI API unavailable
4.4 Security
NFR-4.4.1 API authentication via WordPress tokens
NFR-4.4.2 Website_id isolation (no cross-site data access)
NFR-4.4.3 Rate limiting per API key
NFR-4.4.4 Audit trail for all scoring changes
4.5 Compatibility
NFR-4.5.1 100% formula match with client Excel
NFR-4.5.2 Non-destructive: zero data loss during transitions
NFR-4.5.3 Reversible: can disable IRT features anytime
NFR-4.5.4 WordPress REST API integration
4.6 Maintainability
NFR-4.6.1 FastAPI Admin auto-generated UI for CRUD
NFR-4.6.2 Alembic migrations for schema changes
NFR-4.6.3 Comprehensive API documentation (OpenAPI)
NFR-4.6.4 Logging for debugging scoring calculations
5. Data Requirements
5.1 Core Entities
Items
- id: Primary key
- website_id, tryout_id: Composite key for multi-site
- slot, level: Position and difficulty
- stem, options, correct, explanation: Question content
- ctt_p, ctt_bobot, ctt_category: CTT parameters
- irt_b, irt_a, irt_c: IRT parameters
- calibrated, calibration_sample_size: Calibration status
- generated_by, ai_model, basis_item_id: AI generation metadata
User Answers
- id: Primary key
- wp_user_id, website_id, tryout_id, slot, level: Composite key
- item_id, response: Question and answer
- ctt_bobot_earned, ctt_total_bobot_cumulative, ctt_nm, ctt_nn: CTT scores
- rataan_used, sb_used, normalization_mode_used: Normalization metadata
- irt_theta, irt_theta_se, irt_information: IRT scores
- scoring_mode_used: Which mode was used
Tryout Config
- id: Primary key
- website_id, tryout_id: Composite key
- scoring_mode: 'ctt', 'irt', 'hybrid'
- selection_mode: 'fixed', 'adaptive', 'hybrid'
- normalization_mode: 'static', 'dynamic', 'hybrid'
- static_rataan, static_sb, min_sample_for_dynamic: Normalization settings
- min_calibration_sample, theta_estimation_method: IRT settings
- hybrid_transition_slot, fallback_to_ctt_on_error: Transition settings
Tryout Stats
- id: Primary key
- website_id, tryout_id: Composite key
- participant_count: Number of completed sessions
- total_nm_sum, total_nm_sq_sum: Running sums for mean/SD calc
- current_rataan, current_sb: Calculated values
- min_nm, max_nm: Score range
- last_calculated_at, last_participant_id: Metadata
5.2 Data Relationships
- Items → User Answers (1:N, CASCADE delete)
- Items → Items (self-reference via basis_item_id for AI generation)
- Tryout Config → User Answers (1:N via website_id, tryout_id)
- Tryout Stats → User Answers (1:N via website_id, tryout_id)
6. Technical Constraints
6.1 Tech Stack (Fixed)
- Backend: FastAPI (Python)
- Database: PostgreSQL (via aaPanel PgSQL Manager)
- ORM: SQLAlchemy
- Admin: FastAPI Admin
- AI: OpenRouter API (Qwen3 Coder 480B, Llama 3.3 70B)
- Deployment: aaPanel VPS (Python Manager)
6.2 External Dependencies
- OpenRouter API: Must handle rate limits, timeouts, errors
- WordPress: REST API integration, authentication
- Excel: openpyxl for import, pandas for data processing
6.3 Mathematical Constraints
- CTT: Must use EXACT client formulas (p, bobot, NM, NN)
- IRT: 1PL Rasch model only (no a, c parameters initially)
- Normalization: Mean=500, SD=100 target
- Ranges: θ, b ∈ [-3, +3], NM, NN ∈ [0, 1000]
7. User Stories
7.1 Administrator Stories
US-7.1.1 As an administrator, I want to import questions from Excel so that I can migrate existing content without manual entry.
- Priority: High
- Acceptance: FR-7.1, FR-7.2, FR-7.3
US-7.1.2 As an administrator, I want to configure normalization mode (static/dynamic/hybrid) so that I can control how scores are normalized.
- Priority: High
- Acceptance: FR-4.3, FR-8.4
US-7.1.3 As an administrator, I want to view calibration status so that I can know when IRT is ready for production.
- Priority: Medium
- Acceptance: FR-8.3
US-7.1.4 As an administrator, I want to choose scoring mode (CTT/IRT/hybrid) so that I can gradually adopt advanced features.
- Priority: High
- Acceptance: FR-3.1, FR-3.2, FR-3.3
7.2 Student Stories
US-7.2.1 As a student, I want to take adaptive tests so that I get questions matching my ability level.
- Priority: High
- Acceptance: FR-6.2, FR-2.1, FR-2.2
US-7.2.2 As a student, I want to see my normalized score (NN) so that I can compare my performance with others.
- Priority: High
- Acceptance: FR-1.4, FR-4.2
US-7.2.3 As a student, I want a seamless experience where any technical issues (IRT fallback, AI generation failures) are handled without interrupting my test.
- Priority: High
- Acceptance: Seamless fallback (student unaware of internal mode switching), no error messages visible to students
7.3 Content Creator Stories
US-7.3.1 As a content creator, I want to generate question variants via AI so that I don't have to manually create 3 difficulty levels.
- Priority: High
- Acceptance: FR-5.1, FR-5.2, FR-5.3, FR-5.4
US-7.3.2 As a content creator, I want to reuse existing questions with different difficulty levels so that I can maximize question pool efficiency.
- Priority: Medium
- Acceptance: FR-5.3, FR-6.3
7.4 Technical Administrator Stories
US-7.4.1 As a technical administrator, I want to manage multiple WordPress sites from one backend so that I don't have to duplicate infrastructure.
- Priority: High
- Acceptance: FR-9.1, FR-9.2
US-7.4.2 As a technical administrator, I want to monitor calibration progress so that I can plan IRT rollout.
- Priority: Medium
- Acceptance: FR-2.3, FR-8.3
US-7.4.3 As a technical administrator, I want access to internal scoring details (CTT vs IRT comparison, normalization metrics) for debugging and monitoring, while students only see primary scores.
- Priority: Medium
- Acceptance: Admin visibility of all internal metrics, student visibility limited to final NN score only
8. Success Criteria
8.1 Technical Validation
- ✅ CTT scores match client Excel to 4 decimal places (100% formula accuracy)
- ✅ Dynamic normalization produces mean=500±5, SD=100±5 after 100 users
- ✅ IRT calibration covers >80% items with 500+ responses per item
- ✅ CTT vs IRT NN difference <20 points (moderate agreement)
- ✅ Fallback rate <5% (IRT → CTT on error)
8.2 Educational Validation
- ✅ IRT measurement precision: SE <0.5 after 15 items
- ✅ Normalization quality: Distribution skewness <0.5
- ✅ Adaptive efficiency: 30% reduction in test length (15 IRT = 30 CTT items for same precision)
- ✅ Student satisfaction: >80% prefer adaptive mode in surveys
- ✅ Admin adoption: >70% tryouts use hybrid mode within 3 months
8.3 Business Validation
- ✅ Zero data loss during CTT→IRT transition
- ✅ Reversible: Can disable IRT and revert to CTT anytime
- ✅ Non-destructive: Existing Excel workflow remains functional
- ✅ Cost efficiency: 99.9% question reuse vs 90,000 unique questions for 1000 users
- ✅ Multi-site scalability: One backend supports unlimited WordPress sites
9. Risk Mitigation
9.1 Technical Risks
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| IRT calibration fails (insufficient data) | High | Medium | Fallback to CTT mode, enable hybrid transition |
| OpenRouter API down/unavailable | Medium | Low | Cache questions, serve static variants |
| Excel formula mismatch | High | Low | Unit tests with client Excel data |
| Database performance degradation | Medium | Low | Indexing, read replicas, query optimization |
9.2 Business Risks
| Risk | Impact | Probability | Mitigation |
|---|---|---|---|
| Administrators refuse to use IRT (too complex) | High | Medium | Hybrid mode with CTT-first UI |
| Students dislike adaptive tests | Medium | Low | A/B testing, optional mode |
| Excel workflow changes (client updates) | High | Low | Version control, flexible import parser |
| Multi-site data isolation failure | Critical | Low | Strict website_id validation on every query |
10. Migration Strategy
10.1 Phase 1: Import Existing Data (Week 1)
- Export current Sejoli Tryout data to Excel
- Run import script to load items and configurations
- Configure CTT mode with static normalization
- Validate: CTT scores match Excel 100%
10.2 Phase 2: Collect Calibration Data (Week 2-4)
- Students use tryout normally (CTT mode)
- Backend logs all responses
- Monitor calibration progress (items.calibrated status)
- Collect running statistics (tryout_stats)
10.3 Phase 3: Enable Dynamic Normalization (Week 5)
- Check participant count ≥ 100
- Update normalization_mode='hybrid'
- Test with 10-20 new students
- Verify: Normalized distribution has mean≈500, SD≈100
10.4 Phase 4: Enable IRT Adaptive (Week 6+)
- After 90% items calibrated + 1000+ responses
- Update scoring_mode='irt', selection_mode='adaptive'
- Enable AI generation for Mudah/Sulit variants
- Monitor fallback rate, measurement precision
10.5 Rollback Plan
- Any phase is reversible
- Revert to CTT mode if IRT issues occur
- Score preservation: Historical IRT scores kept as-is; CTT applied only to new sessions after rollback
- Disable AI generation if costs too high
- Revert to static normalization if dynamic unstable
11. Future Enhancements
11.1 Short-term (3-6 months)
- 2PL/3PL IRT: Add discrimination (a) and guessing (c) parameters
- Item Response Categorization: Bloom's Taxonomy, cognitive domains
- Advanced AI Models: Fine-tune models for specific subjects
- Data Retention Policy: Define archival and anonymization strategy (currently: keep all data)
11.2 Long-term (6-12 months)
- Multi-dimensional IRT: Measure multiple skills per question
- Automatic Item Difficulty Adjustment: AI calibrates b parameters
- Predictive Analytics: Student performance forecasting
- Integration with LMS: Moodle, Canvas API support
12. Glossary
| Term | Definition |
|---|---|
| p (TK) | Proportion correct / Tingkat Kesukaran (CTT difficulty) |
| Bobot | 1-p weight (CTT scoring weight) |
| NM | Nilai Mentah (raw score 0-1000) |
| NN | Nilai Nasional (normalized 500±100) |
| Rataan | Mean of NM scores |
| SB | Simpangan Baku (standard deviation of NM) |
| θ (theta) | IRT ability (-3 to +3) |
| b | IRT difficulty (-3 to +3) |
| SE | Standard error (precision) |
| CAT | Computerized Adaptive Testing |
| MLE | Maximum Likelihood Estimation |
| CTT | Classical Test Theory |
| IRT | Item Response Theory |
13. Appendices
13.1 Formula Reference
- CTT p: p = Σ Benar / Total Peserta
- CTT Bobot: Bobot = 1 - p
- CTT NM: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
- CTT NN: NN = 500 + 100 × ((NM - Rataan) / SB)
- IRT 1PL: P(θ) = 1 / (1 + e^-(θ - b))
- CTT→IRT conversion: b ≈ ln((1-p)/p)
- θ→NN mapping: NN = 500 + (θ / 3) × 500
13.2 Difficulty Categories
| CTT p | CTT Category | Level | IRT b Range |
|---|---|---|---|
| p < 0.30 | Sukar | Sulit | b > 0.85 |
| 0.30 ≤ p ≤ 0.70 | Sedang | Sedang | -0.85 ≤ b ≤ 0.85 |
| p > 0.70 | Mudah | Mudah | b < -0.85 |
13.3 API Quick Reference
- POST /api/v1/session/{session_id}/next_item - Get next question
- POST /api/v1/session/{session_id}/complete - Submit and score
- GET /api/v1/tryout/{tryout_id}/config - Get configuration
- PUT /api/v1/tryout/{tryout_id}/normalization - Update normalization
14. Reporting Requirements
14.1 Student Performance Reports
FR-14.1.1 System must provide individual student performance reports Acceptance Criteria:
- Report all student sessions (CTT, IRT, hybrid)
- Include NM, NN scores per session
- Include time spent per question
- Include total_benar, total_bobot_earned
- Export to CSV/Excel
FR-14.1.2 System must provide aggregate student performance reports Acceptance Criteria:
- Group by tryout, website_id, date range
- Show average NM, NN, theta per group
- Show distribution (min, max, median, std dev)
- Show pass/fail rates
- Export to CSV/Excel
14.2 Item Analysis Reports
FR-14.2.1 System must provide item difficulty reports Acceptance Criteria:
- Show CTT p-value per item
- Show IRT b-parameter per item
- Show calibration status
- Show discrimination index (if available)
- Filter by difficulty category (Mudah/Sedang/Sulit)
FR-14.2.2 System must provide item information function reports Acceptance Criteria:
- Show item information value at different theta levels
- Visualize item characteristic curves (optional)
- Show optimal theta range for each item
14.3 Calibration Status Reports
FR-14.3.1 System must provide calibration progress reports Acceptance Criteria:
- Show total items per tryout
- Show calibrated items count and percentage
- Show items awaiting calibration
- Show average calibration sample size
- Show estimated time to reach calibration threshold
- Highlight ready-for-IRT rollout status (≥90% calibrated)
14.4 Tryout Comparison Reports
FR-14.4.1 System must provide tryout comparison across dates Acceptance Criteria:
- Compare NM/NN distributions across different tryout dates
- Show trends over time (e.g., monthly averages)
- Show normalization changes impact (static → dynamic)
FR-14.4.2 System must provide tryout comparison across subjects Acceptance Criteria:
- Compare performance across different subjects (Mat SD vs Bahasa SMA)
- Show subject-specific calibration status
- Show IRT accuracy differences per subject
14.5 Reporting Infrastructure
FR-14.5.1 System must provide report scheduling Acceptance Criteria:
- Admin can schedule daily/weekly/monthly reports
- Reports emailed to admin on schedule
- Report templates configurable (e.g., calibration status every Monday)
FR-14.5.2 System must provide report export formats Acceptance Criteria:
- Export to CSV
- Export to Excel (.xlsx)
- Export to PDF (with charts if available)
Document End
Document Version: 1.1
Created: March 21, 2026
Updated: March 21, 2026 (Clarifications Incorporated)
Author: Product Team (based on Technical Specification v1.2.0)
Status: Draft - Ready for Implementation