Product Requirements Document (PRD)

IRT-Powered Adaptive Question Bank System

Document Version: 1.1
Date: March 21, 2026 (Updated)
Product Name: IRT Bank Soal (Adaptive Question Bank with AI Generation)
Client: Sejoli Tryout Multi-Website Platform
Status: Draft - Clarifications Incorporated


Changelog

v1.1 (March 21, 2026)

  • Added AI Generation: 1 request = 1 question, no approval workflow
  • Added Admin Playground: Admin can test AI generation without saving to DB
  • Updated Normalization Control: Optional manual/automatic mode, system handles auto when sufficient data
  • Updated IRT → CTT Rollback: Historical IRT scores preserved, CTT applied to new sessions only
  • Removed Admin Permissions/Role-based Access: Not needed (each admin per site via WordPress)
  • Updated Custom Dashboards: Use FastAPI Admin only (no custom dashboards)
  • Added AI Generation Toggle: Global on/off switch for cost control
  • Added User-level Question Reuse: Check if student already answered at difficulty level
  • Updated Student UX: Admin sees internal metrics, students see only primary score
  • Added Data Retention: Keep all data (no policy yet)
  • Added Reporting Section: Student performance, Item analysis, Calibration status, Tryout comparison
  • Updated Admin Persona Note: This project is backend tool for IRT/CTT calculation; WordPress handles static questions

1. Product Vision

1.1 Vision Statement

To provide an adaptive, intelligent question bank system that seamlessly integrates with Sejoli's existing Excel-based workflow while introducing modern Item Response Theory (IRT) capabilities and AI-powered question generation, enabling more accurate and efficient student assessment.

1.1.1 Primary Goals

  • 100% Excel Compatibility: Maintain exact formula compatibility with client's existing Excel workflow (CTT scoring with p, bobot, NM, NN)
  • Gradual Modernization: Enable smooth transition from Classical Test Theory (CTT) to Item Response Theory (IRT)
  • Adaptive Assessment: Provide Computerized Adaptive Testing (CAT) capabilities for more efficient and accurate measurement
  • AI-Enhanced Content: Automatically generate question variants (Mudah/Sulit) from base Sedang questions
  • Multi-Site Support: Single backend serving multiple WordPress-powered educational sites
  • Non-Destructive: Zero disruption to existing operations - all enhancements are additive

1.1.2 Success Metrics

  • Technical: CTT scores match client Excel 100%, IRT calibration >80% coverage
  • Educational: 30% reduction in test length with IRT vs CTT, measurement precision (SE < 0.5 after 15 items)
  • Adoption: >70% tryouts use hybrid mode within 3 months, >80% student satisfaction with adaptive mode
  • Efficiency: 99.9% question reuse rate via AI-generated variants

2. User Personas

2.1 Administrators (School/Guru)

Profile: Non-technical education professionals managing tryouts

Pain Points:

  • Excel-based scoring is manual and time-consuming
  • Static questions require constant new content creation
  • Difficulty normalization requires manual calculation
  • Limited ability to compare student performance across groups

Needs:

  • Simple, transparent scoring formulas (CTT mode)
  • Easy Excel import/export workflow
  • Clear visualizations of student performance
  • Configurable normalization (static vs dynamic)
  • Optional advanced features (IRT) without complexity

2.2 Students

Profile: Students taking tryouts for assessment

Pain Points:

  • Fixed-length tests regardless of ability level
  • Question difficulty may not match their skill
  • Long testing sessions with low-value questions

Needs:

  • Adaptive tests that match their ability level
  • Shorter, more efficient assessment
  • Clear feedback on strengths/weaknesses
  • Consistent scoring across attempts

2.3 Content Creators

Profile: Staff creating and managing question banks

Pain Points:

  • Creating 3 difficulty variants per question is time-consuming
  • Limited question pool for repeated assessments
  • Manual categorization of difficulty levels

Needs:

  • AI-assisted question generation
  • Easy difficulty level adjustment
  • Reuse of base questions with variant generation
  • Bulk question management tools

2.4 Technical Administrators

Profile: IT staff managing the platform

Pain Points:

  • Multiple WordPress sites with separate databases
  • Difficulty scaling question pools
  • Maintenance of complex scoring systems

Needs:

  • Centralized backend for multiple sites
  • Scalable architecture (AA-panel VPS)
  • REST API for WordPress integration
  • Automated calibration and normalization
  • Note: Each admin manages static questions within WordPress; this project provides the backend tool for IRT/CTT calculation and dynamic question selection

3. Functional Requirements

3.1 CTT Scoring (Classical Test Theory)

FR-1.1 System must calculate tingkat kesukaran (p) per question using the exact client Excel formula:

p = Σ Benar / Total Peserta

Acceptance Criteria:

  • p-value calculated per question for each tryout
  • Values stored in database (items.ctt_p)
  • Results match client Excel to 4 decimal places

FR-1.2 System must calculate bobot (weight) per question:

Bobot = 1 - p

Acceptance Criteria:

  • Bobot calculated and stored (items.ctt_bobot)
  • Easy questions (p > 0.70) have low bobot (< 0.30)
  • Difficult questions (p < 0.30) have high bobot (> 0.70)

FR-1.3 System must calculate Nilai Mentah (NM) per student:

NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000

Acceptance Criteria:

  • NM ranges 0-1000
  • SUMPRODUCT equivalent implemented correctly
  • Results stored per response (user_answers.ctt_nm)

FR-1.4 System must calculate Nilai Nasional (NN) with normalization:

NN = 500 + 100 × ((NM - Rataan) / SB)

Acceptance Criteria:

  • NN normalized to mean=500, SD=100
  • Support static (hardcoded rataan/SB) and dynamic (real-time) modes
  • NN clipped to 0-1000 range

FR-1.5 System must categorize question difficulty per CTT standards:

  • p < 0.30 → Sukar (Sulit)
  • 0.30 ≤ p ≤ 0.70 → Sedang
  • p > 0.70 → Mudah

Acceptance Criteria:

  • Category assigned (items.ctt_category)
  • Used for level field (items.level)
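Taken together, the CTT requirements FR-1.1 through FR-1.5 can be sketched as plain functions. This is a minimal illustration, not the implementation; the function names are hypothetical.

```python
def ctt_p(correct_count, total_participants):
    """FR-1.1: tingkat kesukaran p = Σ Benar / Total Peserta."""
    return correct_count / total_participants

def ctt_bobot(p):
    """FR-1.2: bobot = 1 - p."""
    return 1 - p

def ctt_nm(total_bobot_siswa, total_bobot_max):
    """FR-1.3: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000."""
    return total_bobot_siswa / total_bobot_max * 1000

def ctt_nn(nm, rataan, sb):
    """FR-1.4: NN = 500 + 100 × ((NM - Rataan) / SB), clipped to 0-1000."""
    nn = 500 + 100 * ((nm - rataan) / sb)
    return max(0.0, min(1000.0, nn))

def ctt_category(p):
    """FR-1.5: difficulty category from p."""
    if p < 0.30:
        return "Sukar"
    if p <= 0.70:
        return "Sedang"
    return "Mudah"

# Example: 7 of 10 participants answered correctly
p = ctt_p(7, 10)       # 0.7 → Mudah
bobot = ctt_bobot(p)   # ≈ 0.3 (easy question, low weight)
```

Results should be validated against the client Excel to 4 decimal places, as FR-1.1's acceptance criteria require.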

3.2 IRT Scoring (Item Response Theory)

FR-2.1 System must implement 1PL Rasch model:

P(θ) = 1 / (1 + e^-(θ - b))

Acceptance Criteria:

  • θ (ability) estimated per student
  • b (difficulty) calibrated per question
  • Ranges: θ, b ∈ [-3, +3]

FR-2.2 System must estimate θ using Maximum Likelihood Estimation (MLE)

Acceptance Criteria:

  • Initial guess θ = 0
  • Optimization bounds [-3, +3]
  • Standard error (SE) calculated using Fisher information

FR-2.3 System must calibrate b parameters from response data

Acceptance Criteria:

  • Minimum 100-500 responses per item for calibration
  • Calibration status tracked (items.calibrated)
  • Auto-convert CTT p to initial b: b ≈ ln((1-p)/p)

FR-2.4 System must map θ to NN for CTT comparison

Acceptance Criteria:

  • θ ∈ [-3, +3] mapped to NN ∈ [0, 1000]
  • Formula: NN = 500 + (θ / 3) × 500
  • Secondary score returned in API responses
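The 1PL requirements above (FR-2.1 through FR-2.4) can be sketched as follows. This is an illustrative implementation, not the production code; note the p→b conversion is written as b = ln((1-p)/p), the form consistent with the b ranges in Appendix 13.2 (easy items, high p, get negative b).

```python
import math

def p_correct(theta, b):
    """FR-2.1, 1PL Rasch model: P(θ) = 1 / (1 + e^-(θ - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def b_from_p(p):
    """FR-2.3: initial b from CTT p, so that high-p (easy) items get
    negative b, matching the difficulty table in Appendix 13.2."""
    return math.log((1 - p) / p)

def estimate_theta(responses, max_iter=50):
    """FR-2.2: MLE of θ via Newton-Raphson.
    responses: list of (b, correct) pairs with correct in {0, 1}."""
    theta = 0.0  # initial guess per FR-2.2
    info = 1.0
    for _ in range(max_iter):
        # Score (gradient of log-likelihood) and Fisher information
        score = sum(u - p_correct(theta, b) for b, u in responses)
        info = sum(p_correct(theta, b) * (1 - p_correct(theta, b))
                   for b, _ in responses)
        theta = max(-3.0, min(3.0, theta + score / info))  # bounds [-3, +3]
    se = 1.0 / math.sqrt(info)  # standard error from Fisher information
    return theta, se

def theta_to_nn(theta):
    """FR-2.4: map θ ∈ [-3, +3] to NN ∈ [0, 1000]."""
    return 500 + (theta / 3) * 500
```

For example, two correct and one incorrect response on items with b = 0 yields θ ≈ ln 2 ≈ 0.69, which maps to NN ≈ 615.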

3.3 Hybrid Mode

FR-3.1 System must support dual scoring (CTT + IRT parallel)

Acceptance Criteria:

  • Both scores calculated per response
  • Primary/secondary score returned
  • Admin can choose which to display

FR-3.2 System must support hybrid item selection

Acceptance Criteria:

  • First N items: fixed order (CTT mode)
  • Remaining items: adaptive (IRT mode)
  • Configurable transition point (tryout_config.hybrid_transition_slot)

FR-3.3 System must support hybrid normalization

Acceptance Criteria:

  • Static mode for small samples (< threshold)
  • Dynamic mode for large samples (≥ threshold)
  • Configurable threshold (tryout_config.min_sample_for_dynamic)
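The two hybrid switches above (FR-3.2 and FR-3.3) reduce to simple threshold checks. A sketch, assuming hybrid_transition_slot denotes the last fixed (CTT) slot — the exact semantics of that config field are not pinned down in this document:

```python
def item_selection_mode(slot, hybrid_transition_slot):
    """FR-3.2: first N slots delivered in fixed order, the rest adaptive.
    Assumption: hybrid_transition_slot is the last fixed slot (1-based)."""
    return "fixed" if slot <= hybrid_transition_slot else "adaptive"

def normalization_mode(participant_count, min_sample_for_dynamic):
    """FR-3.3: static for small samples, dynamic once the threshold is met."""
    return "dynamic" if participant_count >= min_sample_for_dynamic else "static"
```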

3.4 Dynamic Normalization

FR-4.1 System must maintain running statistics per tryout

Acceptance Criteria:

  • Track: participant_count, total_nm_sum, total_nm_sq_sum
  • Update on each completed session
  • Stored in tryout_stats table

FR-4.2 System must calculate real-time rataan and SB

Acceptance Criteria:

  • Rataan = mean(all NM)
  • SB = sqrt(variance(all NM))
  • Updated incrementally (no full recalc)
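The incremental update in FR-4.1/FR-4.2 can be sketched with the three running values the tryout_stats table already tracks. This is a minimal illustration assuming population (not sample) standard deviation; the class name is hypothetical.

```python
import math

class TryoutStats:
    """FR-4.1/4.2: running rataan and SB with no full recalculation.
    Mirrors tryout_stats: participant_count, total_nm_sum, total_nm_sq_sum."""

    def __init__(self):
        self.participant_count = 0
        self.total_nm_sum = 0.0
        self.total_nm_sq_sum = 0.0

    def add_session(self, nm):
        """Update running sums when a session completes."""
        self.participant_count += 1
        self.total_nm_sum += nm
        self.total_nm_sq_sum += nm * nm

    @property
    def rataan(self):
        return self.total_nm_sum / self.participant_count

    @property
    def sb(self):
        # Population SD from the two running sums: sqrt(E[x²] - E[x]²)
        mean = self.rataan
        var = self.total_nm_sq_sum / self.participant_count - mean * mean
        return math.sqrt(max(var, 0.0))
```

For NM scores far from zero the two-sums formula can lose precision in floating point; an implementation at scale might prefer Welford's online algorithm.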

FR-4.3 System must support optional normalization control (manual vs automatic)

Acceptance Criteria:

  • Admin can choose manual mode (static normalization with hardcoded values)
  • Admin can choose automatic mode (dynamic normalization when sufficient data)
  • When automatic selected and sufficient data reached: system handles normalization automatically
  • Configurable threshold: min_sample_for_dynamic (default: 100)
  • Admin can switch between manual/automatic at any time
  • System displays current data readiness (participant count vs threshold)

3.5 AI Question Generation

FR-5.1 System must generate question variants via OpenRouter API

Acceptance Criteria:

  • Generate Mudah variant from Sedang base
  • Generate Sulit variant from Sedang base
  • Generate same-level variant from Sedang base
  • Use Qwen3 Coder 480B or Llama 3.3 70B
  • 1 request = 1 question (not batch generation)

FR-5.2 System must use standardized prompt template

Acceptance Criteria:

  • Include context (tryout_id, slot, level)
  • Include basis soal for reference (provides topic/context)
  • Request 1 question with 4 options
  • Include explanation
  • Maintain same context, vary only difficulty level

FR-5.3 System must implement question reuse/caching with user-level tracking

Acceptance Criteria:

  • Check DB for existing variant before generating
  • Check if student user_id already answered question at specific difficulty level
  • Reuse if found (same tryout_id, slot, level)
  • Generate only if cache miss OR user hasn't answered at this difficulty
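The cache-check logic of FR-5.3, combined with the AI toggle of FR-5.6, can be sketched as one decision function. StubDB and generate_fn are hypothetical stand-ins for the real data layer and OpenRouter call, shown only to make the branching explicit.

```python
class StubDB:
    """Minimal in-memory stand-in for the variant cache (illustrative only)."""
    def __init__(self, variants=None, answered=None):
        self.variants = variants or {}     # (tryout_id, slot, level) -> item
        self.answered = answered or set()  # (user_id, item_id, level)

    def find_variant(self, tryout_id, slot, level):
        return self.variants.get((tryout_id, slot, level))

    def user_answered(self, user_id, item_id, level):
        return (user_id, item_id, level) in self.answered

def get_or_generate_variant(db, user_id, tryout_id, slot, level,
                            ai_enabled, generate_fn):
    """FR-5.3: reuse a cached variant unless this user already answered it
    at this difficulty; FR-5.6: when AI is disabled, reuse regardless."""
    cached = db.find_variant(tryout_id, slot, level)
    if cached is not None:
        if not db.user_answered(user_id, cached["id"], level):
            return cached                       # cache hit, fresh for this user
        if not ai_enabled:
            return cached                       # FR-5.6: reuse despite repetition
    elif not ai_enabled:
        return None                             # nothing cached, generation off
    return generate_fn(tryout_id, slot, level)  # cache miss or repeat: generate
```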

FR-5.4 System must provide admin playground for AI testing

Acceptance Criteria:

  • Admin can request AI generation without saving to database
  • Admin can re-request unlimited times until satisfied (no approval workflow)
  • Preview mode shows generated question before saving
  • Admin can edit content before saving
  • Purpose: Build admin trust in AI quality before enabling for students

FR-5.5 System must parse and store AI-generated questions

Acceptance Criteria:

  • Parse stem, options, correct answer, explanation
  • Store in items table with generated_by='ai'
  • Link to basis_item_id
  • No approval workflow required for student tests

FR-5.6 System must support AI generation toggle

Acceptance Criteria:

  • Global toggle to enable/disable AI generation (config.AI_generation_enabled)
  • When disabled: reuse DB questions regardless of repetition
  • When enabled: generate new variants if cache miss
  • Admin can toggle on/off based on cost/budget

3.6 Item Selection

FR-6.1 System must support fixed order selection (CTT mode)

Acceptance Criteria:

  • Items delivered in slot order (1, 2, 3, ...)
  • No adaptive logic
  • Used when selection_mode='fixed'

FR-6.2 System must support adaptive selection (IRT mode)

Acceptance Criteria:

  • Select item where b ≈ current θ
  • Prioritize calibrated items
  • Use item information to maximize precision
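Under the 1PL model, "select item where b ≈ current θ" and "maximize item information" coincide, since information peaks at θ = b. A sketch of FR-6.2; the candidate-item shape (dicts with 'b' and 'calibrated') is an assumption:

```python
def select_next_item(theta, candidate_items):
    """FR-6.2: choose the item whose difficulty b is closest to the
    current θ, preferring calibrated items. For the 1PL model this is
    equivalent to maximizing item information at θ."""
    calibrated = [it for it in candidate_items if it["calibrated"]]
    pool = calibrated or candidate_items  # fall back if none calibrated
    return min(pool, key=lambda it: abs(it["b"] - theta))
```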

FR-6.3 System must support level-based selection (hybrid mode)

Acceptance Criteria:

  • Select from specified level (Mudah/Sedang/Sulit)
  • Check if level variant exists in DB
  • Generate via AI if not exists

3.7 Excel Import

FR-7.1 System must import from client Excel format

Acceptance Criteria:

  • Parse answer key (Row 2, KUNCI)
  • Extract calculated p-values (Row 4, data_only=True)
  • Extract bobot values (Row 5)
  • Import student responses (Row 6+)
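A reading of the FR-7.1 layout with openpyxl might look like the sketch below. The row mapping (row 2 answer key, row 4 p-values, row 5 bobot, row 6+ responses) comes from the acceptance criteria; the column layout — slots starting in column B with labels in column A — is an assumption about the client file.

```python
from openpyxl import load_workbook

def import_tryout_excel(path):
    """FR-7.1: parse the client Excel layout into plain Python lists."""
    wb = load_workbook(path, data_only=True)  # data_only: read computed values
    ws = wb.active
    keys = [c.value for c in ws[2][1:]]        # Row 2: KUNCI answer key
    p_values = [c.value for c in ws[4][1:]]    # Row 4: calculated p per slot
    bobot = [c.value for c in ws[5][1:]]       # Row 5: bobot per slot
    responses = [[c.value for c in row[1:]]    # Row 6+: one row per student
                 for row in ws.iter_rows(min_row=6)]
    return {"keys": keys, "p": p_values, "bobot": bobot, "responses": responses}
```

Note that data_only=True returns cached formula results, so the file must have been saved by Excel at least once for the p/bobot rows to be readable.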

FR-7.2 System must create items from Excel import

Acceptance Criteria:

  • Create item per question slot
  • Set ctt_p, ctt_bobot, ctt_category
  • Auto-calculate irt_b from ctt_p
  • Set calibrated=False

FR-7.3 System must configure tryout from Excel import

Acceptance Criteria:

  • Create tryout_config with CTT settings
  • Set normalization_mode='static' (default)
  • Set static_rataan=500, static_sb=100

3.8 API Endpoints

FR-8.1 System must provide Next Item endpoint

Acceptance Criteria:

  • POST /api/v1/session/{session_id}/next_item
  • Accept mode (ctt/irt/hybrid)
  • Accept current_responses array
  • Return item with selection_method metadata

FR-8.2 System must provide Complete Session endpoint

Acceptance Criteria:

  • POST /api/v1/session/{session_id}/complete
  • Return primary_score (CTT or IRT)
  • Return secondary_score (parallel calculation)
  • Return comparison (NN difference, agreement)

FR-8.3 System must provide Get Tryout Config endpoint

Acceptance Criteria:

  • GET /api/v1/tryout/{tryout_id}/config
  • Return scoring_mode, normalization_mode
  • Return current_stats (participant_count, rataan, SB)
  • Return calibration_status

FR-8.4 System must provide Update Normalization endpoint

Acceptance Criteria:

  • PUT /api/v1/tryout/{tryout_id}/normalization
  • Accept normalization_mode update
  • Accept static_rataan, static_sb overrides
  • Return will_switch_to_dynamic_at threshold
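To make the FR-8.2 response concrete, a sketch of the payload assembly follows. The field names and the primary/secondary swap are taken from the acceptance criteria; the agreement labels use the 20-point "moderate agreement" threshold from section 8.1, and everything else is illustrative rather than a confirmed API contract.

```python
def build_complete_response(ctt_nn, irt_nn, scoring_mode):
    """FR-8.2: primary/secondary scores plus a CTT-vs-IRT comparison block.
    Field names are illustrative, not a confirmed contract."""
    # Primary score follows the tryout's scoring mode; the other is secondary
    primary, secondary = (irt_nn, ctt_nn) if scoring_mode == "irt" else (ctt_nn, irt_nn)
    diff = abs(ctt_nn - irt_nn)
    return {
        "primary_score": primary,
        "secondary_score": secondary,
        "comparison": {
            "nn_difference": diff,
            "agreement": "moderate" if diff < 20 else "low",  # 8.1 threshold
        },
    }
```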

3.9 Multi-Site Support

FR-9.1 System must support multiple WordPress sites

Acceptance Criteria:

  • Each site has unique website_id
  • Shared backend, isolated data per site
  • API responses scoped to website_id

FR-9.2 System must support per-site configuration

Acceptance Criteria:

  • Each (website_id, tryout_id) pair unique
  • Independent tryout_config per tryout
  • Independent tryout_stats per tryout

4. Non-Functional Requirements

4.1 Performance

NFR-4.1.1 Next Item API response time < 500ms
NFR-4.1.2 Complete Session API response time < 2s
NFR-4.1.3 AI question generation < 10s (OpenRouter timeout)
NFR-4.1.4 Support 1000 concurrent students

4.2 Scalability

NFR-4.2.1 Support 10,000+ items in database
NFR-4.2.2 Support 100,000+ student responses
NFR-4.2.3 Question reuse: 99.9% cache hit rate after initial generation
NFR-4.2.4 Horizontal scaling via PostgreSQL read replicas

4.3 Reliability

NFR-4.3.1 99.9% uptime for tryout periods
NFR-4.3.2 Automatic fallback to CTT if IRT fails
NFR-4.3.3 Database transaction consistency
NFR-4.3.4 Graceful degradation if AI API unavailable

4.4 Security

NFR-4.4.1 API authentication via WordPress tokens
NFR-4.4.2 Website_id isolation (no cross-site data access)
NFR-4.4.3 Rate limiting per API key
NFR-4.4.4 Audit trail for all scoring changes

4.5 Compatibility

NFR-4.5.1 100% formula match with client Excel
NFR-4.5.2 Non-destructive: zero data loss during transitions
NFR-4.5.3 Reversible: can disable IRT features anytime
NFR-4.5.4 WordPress REST API integration

4.6 Maintainability

NFR-4.6.1 FastAPI Admin auto-generated UI for CRUD
NFR-4.6.2 Alembic migrations for schema changes
NFR-4.6.3 Comprehensive API documentation (OpenAPI)
NFR-4.6.4 Logging for debugging scoring calculations


5. Data Requirements

5.1 Core Entities

Items

  • id: Primary key
  • website_id, tryout_id: Composite key for multi-site
  • slot, level: Position and difficulty
  • stem, options, correct, explanation: Question content
  • ctt_p, ctt_bobot, ctt_category: CTT parameters
  • irt_b, irt_a, irt_c: IRT parameters
  • calibrated, calibration_sample_size: Calibration status
  • generated_by, ai_model, basis_item_id: AI generation metadata
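Since the fixed stack names SQLAlchemy, the Items entity above could be declared roughly as follows. Column types, lengths, and defaults are assumptions for illustration; only the column names come from this document.

```python
from sqlalchemy import Column, Integer, String, Float, Boolean, JSON
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Item(Base):
    """Sketch of the Items entity from section 5.1 (types are assumptions)."""
    __tablename__ = "items"

    id = Column(Integer, primary_key=True)
    website_id = Column(Integer, nullable=False, index=True)  # multi-site key
    tryout_id = Column(Integer, nullable=False, index=True)
    slot = Column(Integer, nullable=False)
    level = Column(String(10))             # Mudah / Sedang / Sulit
    stem = Column(String)
    options = Column(JSON)                 # 4 answer options
    correct = Column(String(1))
    explanation = Column(String)
    ctt_p = Column(Float)
    ctt_bobot = Column(Float)
    ctt_category = Column(String(10))
    irt_b = Column(Float)
    irt_a = Column(Float)                  # reserved for future 2PL/3PL
    irt_c = Column(Float)
    calibrated = Column(Boolean, default=False)
    calibration_sample_size = Column(Integer, default=0)
    generated_by = Column(String(10))      # e.g. 'manual' or 'ai'
    ai_model = Column(String(50))
    basis_item_id = Column(Integer)        # self-reference for AI variants
```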

User Answers

  • id: Primary key
  • wp_user_id, website_id, tryout_id, slot, level: Composite key
  • item_id, response: Question and answer
  • ctt_bobot_earned, ctt_total_bobot_cumulative, ctt_nm, ctt_nn: CTT scores
  • rataan_used, sb_used, normalization_mode_used: Normalization metadata
  • irt_theta, irt_theta_se, irt_information: IRT scores
  • scoring_mode_used: Which mode was used

Tryout Config

  • id: Primary key
  • website_id, tryout_id: Composite key
  • scoring_mode: 'ctt', 'irt', 'hybrid'
  • selection_mode: 'fixed', 'adaptive', 'hybrid'
  • normalization_mode: 'static', 'dynamic', 'hybrid'
  • static_rataan, static_sb, min_sample_for_dynamic: Normalization settings
  • min_calibration_sample, theta_estimation_method: IRT settings
  • hybrid_transition_slot, fallback_to_ctt_on_error: Transition settings

Tryout Stats

  • id: Primary key
  • website_id, tryout_id: Composite key
  • participant_count: Number of completed sessions
  • total_nm_sum, total_nm_sq_sum: Running sums for mean/SD calc
  • current_rataan, current_sb: Calculated values
  • min_nm, max_nm: Score range
  • last_calculated_at, last_participant_id: Metadata

5.2 Data Relationships

  • Items → User Answers (1:N, CASCADE delete)
  • Items → Items (self-reference via basis_item_id for AI generation)
  • Tryout Config → User Answers (1:N via website_id, tryout_id)
  • Tryout Stats → User Answers (1:N via website_id, tryout_id)

6. Technical Constraints

6.1 Tech Stack (Fixed)

  • Backend: FastAPI (Python)
  • Database: PostgreSQL (via aaPanel PgSQL Manager)
  • ORM: SQLAlchemy
  • Admin: FastAPI Admin
  • AI: OpenRouter API (Qwen3 Coder 480B, Llama 3.3 70B)
  • Deployment: aaPanel VPS (Python Manager)

6.2 External Dependencies

  • OpenRouter API: Must handle rate limits, timeouts, errors
  • WordPress: REST API integration, authentication
  • Excel: openpyxl for import, pandas for data processing

6.3 Mathematical Constraints

  • CTT: Must use EXACT client formulas (p, bobot, NM, NN)
  • IRT: 1PL Rasch model only (no a, c parameters initially)
  • Normalization: Mean=500, SD=100 target
  • Ranges: θ, b ∈ [-3, +3], NM, NN ∈ [0, 1000]

7. User Stories

7.1 Administrator Stories

US-7.1.1 As an administrator, I want to import questions from Excel so that I can migrate existing content without manual entry.

  • Priority: High
  • Acceptance: FR-7.1, FR-7.2, FR-7.3

US-7.1.2 As an administrator, I want to configure normalization mode (static/dynamic/hybrid) so that I can control how scores are normalized.

  • Priority: High
  • Acceptance: FR-4.3, FR-8.4

US-7.1.3 As an administrator, I want to view calibration status so that I can know when IRT is ready for production.

  • Priority: Medium
  • Acceptance: FR-8.3

US-7.1.4 As an administrator, I want to choose scoring mode (CTT/IRT/hybrid) so that I can gradually adopt advanced features.

  • Priority: High
  • Acceptance: FR-3.1, FR-3.2, FR-3.3

7.2 Student Stories

US-7.2.1 As a student, I want to take adaptive tests so that I get questions matching my ability level.

  • Priority: High
  • Acceptance: FR-6.2, FR-2.1, FR-2.2

US-7.2.2 As a student, I want to see my normalized score (NN) so that I can compare my performance with others.

  • Priority: High
  • Acceptance: FR-1.4, FR-4.2

US-7.2.3 As a student, I want a seamless experience where any technical issues (IRT fallback, AI generation failures) are handled without interrupting my test.

  • Priority: High
  • Acceptance: Seamless fallback (student unaware of internal mode switching), no error messages visible to students

7.3 Content Creator Stories

US-7.3.1 As a content creator, I want to generate question variants via AI so that I don't have to manually create 3 difficulty levels.

  • Priority: High
  • Acceptance: FR-5.1, FR-5.2, FR-5.3, FR-5.4

US-7.3.2 As a content creator, I want to reuse existing questions with different difficulty levels so that I can maximize question pool efficiency.

  • Priority: Medium
  • Acceptance: FR-5.3, FR-6.3

7.4 Technical Administrator Stories

US-7.4.1 As a technical administrator, I want to manage multiple WordPress sites from one backend so that I don't have to duplicate infrastructure.

  • Priority: High
  • Acceptance: FR-9.1, FR-9.2

US-7.4.2 As a technical administrator, I want to monitor calibration progress so that I can plan IRT rollout.

  • Priority: Medium
  • Acceptance: FR-2.3, FR-8.3

US-7.4.3 As a technical administrator, I want access to internal scoring details (CTT vs IRT comparison, normalization metrics) for debugging and monitoring, while students only see primary scores.

  • Priority: Medium
  • Acceptance: Admin visibility of all internal metrics, student visibility limited to final NN score only

8. Success Criteria

8.1 Technical Validation

  • CTT scores match client Excel to 4 decimal places (100% formula accuracy)
  • Dynamic normalization produces mean=500±5, SD=100±5 after 100 users
  • IRT calibration covers >80% items with 500+ responses per item
  • CTT vs IRT NN difference <20 points (moderate agreement)
  • Fallback rate <5% (IRT → CTT on error)

8.2 Educational Validation

  • IRT measurement precision: SE <0.5 after 15 items
  • Normalization quality: Distribution skewness <0.5
  • Adaptive efficiency: 30% reduction in test length (15 IRT = 30 CTT items for same precision)
  • Student satisfaction: >80% prefer adaptive mode in surveys
  • Admin adoption: >70% tryouts use hybrid mode within 3 months

8.3 Business Validation

  • Zero data loss during CTT→IRT transition
  • Reversible: Can disable IRT and revert to CTT anytime
  • Non-destructive: Existing Excel workflow remains functional
  • Cost efficiency: 99.9% question reuse vs 90,000 unique questions for 1000 users
  • Multi-site scalability: One backend supports unlimited WordPress sites

9. Risk Mitigation

9.1 Technical Risks

| Risk | Impact | Probability | Mitigation |
| --- | --- | --- | --- |
| IRT calibration fails (insufficient data) | High | Medium | Fallback to CTT mode, enable hybrid transition |
| OpenRouter API down/unavailable | Medium | Low | Cache questions, serve static variants |
| Excel formula mismatch | High | Low | Unit tests with client Excel data |
| Database performance degradation | Medium | Low | Indexing, read replicas, query optimization |

9.2 Business Risks

| Risk | Impact | Probability | Mitigation |
| --- | --- | --- | --- |
| Administrators refuse to use IRT (too complex) | High | Medium | Hybrid mode with CTT-first UI |
| Students dislike adaptive tests | Medium | Low | A/B testing, optional mode |
| Excel workflow changes (client updates) | High | Low | Version control, flexible import parser |
| Multi-site data isolation failure | Critical | Low | Website_id validation, RBAC |

10. Migration Strategy

10.1 Phase 1: Import Existing Data (Week 1)

  • Export current Sejoli Tryout data to Excel
  • Run import script to load items and configurations
  • Configure CTT mode with static normalization
  • Validate: CTT scores match Excel 100%

10.2 Phase 2: Collect Calibration Data (Week 2-4)

  • Students use tryout normally (CTT mode)
  • Backend logs all responses
  • Monitor calibration progress (items.calibrated status)
  • Collect running statistics (tryout_stats)

10.3 Phase 3: Enable Dynamic Normalization (Week 5)

  • Check participant count ≥ 100
  • Update normalization_mode='hybrid'
  • Test with 10-20 new students
  • Verify: Normalized distribution has mean≈500, SD≈100

10.4 Phase 4: Enable IRT Adaptive (Week 6+)

  • After 90% items calibrated + 1000+ responses
  • Update scoring_mode='irt', selection_mode='adaptive'
  • Enable AI generation for Mudah/Sulit variants
  • Monitor fallback rate, measurement precision

10.5 Rollback Plan

  • Any phase is reversible
  • Revert to CTT mode if IRT issues occur
  • Score preservation: Historical IRT scores kept as-is; CTT applied only to new sessions after rollback
  • Disable AI generation if costs too high
  • Revert to static normalization if dynamic unstable

11. Future Enhancements

11.1 Short-term (3-6 months)

  • 2PL/3PL IRT: Add discrimination (a) and guessing (c) parameters
  • Item Response Categorization: Bloom's Taxonomy, cognitive domains
  • Advanced AI Models: Fine-tune models for specific subjects
  • Data Retention Policy: Define archival and anonymization strategy (currently: keep all data)

11.2 Long-term (6-12 months)

  • Multi-dimensional IRT: Measure multiple skills per question
  • Automatic Item Difficulty Adjustment: AI calibrates b parameters
  • Predictive Analytics: Student performance forecasting
  • Integration with LMS: Moodle, Canvas API support

12. Glossary

| Term | Definition |
| --- | --- |
| p (TK) | Proportion correct / Tingkat Kesukaran (CTT difficulty) |
| Bobot | 1 - p weight (CTT scoring weight) |
| NM | Nilai Mentah (raw score 0-1000) |
| NN | Nilai Nasional (normalized 500±100) |
| Rataan | Mean of NM scores |
| SB | Simpangan Baku (standard deviation of NM) |
| θ (theta) | IRT ability (-3 to +3) |
| b | IRT difficulty (-3 to +3) |
| SE | Standard error (precision) |
| CAT | Computerized Adaptive Testing |
| MLE | Maximum Likelihood Estimation |
| CTT | Classical Test Theory |
| IRT | Item Response Theory |

13. Appendices

13.1 Formula Reference

  • CTT p: p = Σ Benar / Total Peserta
  • CTT Bobot: Bobot = 1 - p
  • CTT NM: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
  • CTT NN: NN = 500 + 100 × ((NM - Rataan) / SB)
  • IRT 1PL: P(θ) = 1 / (1 + e^-(θ - b))
  • CTT→IRT conversion: b ≈ ln((1-p)/p)
  • θ→NN mapping: NN = 500 + (θ / 3) × 500

13.2 Difficulty Categories

| CTT p | CTT Category | Level | IRT b Range |
| --- | --- | --- | --- |
| p < 0.30 | Sukar | Sulit | b > 0.85 |
| 0.30 ≤ p ≤ 0.70 | Sedang | Sedang | -0.85 ≤ b ≤ 0.85 |
| p > 0.70 | Mudah | Mudah | b < -0.85 |

13.3 API Quick Reference

  • POST /api/v1/session/{session_id}/next_item - Get next question
  • POST /api/v1/session/{session_id}/complete - Submit and score
  • GET /api/v1/tryout/{tryout_id}/config - Get configuration
  • PUT /api/v1/tryout/{tryout_id}/normalization - Update normalization

14. Reporting Requirements

14.1 Student Performance Reports

FR-14.1.1 System must provide individual student performance reports

Acceptance Criteria:

  • Report all student sessions (CTT, IRT, hybrid)
  • Include NM, NN scores per session
  • Include time spent per question
  • Include total_benar, total_bobot_earned
  • Export to CSV/Excel

FR-14.1.2 System must provide aggregate student performance reports

Acceptance Criteria:

  • Group by tryout, website_id, date range
  • Show average NM, NN, theta per group
  • Show distribution (min, max, median, std dev)
  • Show pass/fail rates
  • Export to CSV/Excel

14.2 Item Analysis Reports

FR-14.2.1 System must provide item difficulty reports

Acceptance Criteria:

  • Show CTT p-value per item
  • Show IRT b-parameter per item
  • Show calibration status
  • Show discrimination index (if available)
  • Filter by difficulty category (Mudah/Sedang/Sulit)

FR-14.2.2 System must provide item information function reports

Acceptance Criteria:

  • Show item information value at different theta levels
  • Visualize item characteristic curves (optional)
  • Show optimal theta range for each item
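For the 1PL model used here, the item information function has a closed form, I(θ) = P(θ)(1 − P(θ)), which peaks at 0.25 when θ = b. A sketch of the per-item report values; the grid of θ levels is an arbitrary illustrative choice:

```python
import math

def rasch_information(theta, b):
    """1PL item information: I(θ) = P(θ)(1 - P(θ)), maximal at θ = b."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1 - p)

def information_table(b, thetas=(-3, -2, -1, 0, 1, 2, 3)):
    """FR-14.2.2: information values for one item across a θ grid;
    the grid point with the largest value marks the optimal θ range."""
    return {t: round(rasch_information(t, b), 4) for t in thetas}
```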

14.3 Calibration Status Reports

FR-14.3.1 System must provide calibration progress reports

Acceptance Criteria:

  • Show total items per tryout
  • Show calibrated items count and percentage
  • Show items awaiting calibration
  • Show average calibration sample size
  • Show estimated time to reach calibration threshold
  • Highlight ready-for-IRT rollout status (≥90% calibrated)

14.4 Tryout Comparison Reports

FR-14.4.1 System must provide tryout comparison across dates

Acceptance Criteria:

  • Compare NM/NN distributions across different tryout dates
  • Show trends over time (e.g., monthly averages)
  • Show normalization changes impact (static → dynamic)

FR-14.4.2 System must provide tryout comparison across subjects

Acceptance Criteria:

  • Compare performance across different subjects (Mat SD vs Bahasa SMA)
  • Show subject-specific calibration status
  • Show IRT accuracy differences per subject

14.5 Reporting Infrastructure

FR-14.5.1 System must provide report scheduling

Acceptance Criteria:

  • Admin can schedule daily/weekly/monthly reports
  • Reports emailed to admin on schedule
  • Report templates configurable (e.g., calibration status every Monday)

FR-14.5.2 System must provide report export formats

Acceptance Criteria:

  • Export to CSV
  • Export to Excel (.xlsx)
  • Export to PDF (with charts if available)

Document End

Document Version: 1.1
Created: March 21, 2026
Updated: March 21, 2026 (Clarifications Incorporated)
Author: Product Team (based on Technical Specification v1.2.0)
Status: Draft - Ready for Implementation