# Product Requirements Document (PRD)

## IRT-Powered Adaptive Question Bank System

**Document Version:** 1.1
**Date:** March 21, 2026 (Updated)
**Product Name:** IRT Bank Soal (Adaptive Question Bank with AI Generation)
**Client:** Sejoli Tryout Multi-Website Platform
**Status:** Draft - Clarifications Incorporated

---

## Changelog

### v1.1 (March 21, 2026)
- Added **AI Generation**: 1 request = 1 question, no approval workflow
- Added **Admin Playground**: Admin can test AI generation without saving to DB
- Updated **Normalization Control**: Optional manual/automatic mode; system handles normalization automatically when sufficient data is available
- Updated **IRT → CTT Rollback**: Historical IRT scores preserved, CTT applied to new sessions only
- Removed **Admin Permissions/Role-based Access**: Not needed (each admin manages one site via WordPress)
- Updated **Custom Dashboards**: Use FastAPI Admin only (no custom dashboards)
- Added **AI Generation Toggle**: Global on/off switch for cost control
- Added **User-level Question Reuse**: Check if student already answered at difficulty level
- Updated **Student UX**: Admin sees internal metrics, students see only primary score
- Added **Data Retention**: Keep all data (no policy yet)
- Added **Reporting Section**: Student performance, item analysis, calibration status, tryout comparison
- Updated **Admin Persona Note**: This project is the backend tool for IRT/CTT calculation; WordPress handles static questions

---
## 1. Product Vision

### 1.1 Vision Statement
To provide an adaptive, intelligent question bank system that integrates seamlessly with Sejoli's existing Excel-based workflow while introducing modern Item Response Theory (IRT) capabilities and AI-powered question generation, enabling more accurate and efficient student assessment.

### 1.1.1 Primary Goals
- **100% Excel Compatibility**: Maintain exact formula compatibility with the client's existing Excel workflow (CTT scoring with p, bobot, NM, NN)
- **Gradual Modernization**: Enable a smooth transition from Classical Test Theory (CTT) to Item Response Theory (IRT)
- **Adaptive Assessment**: Provide Computerized Adaptive Testing (CAT) capabilities for more efficient and accurate measurement
- **AI-Enhanced Content**: Automatically generate question variants (Mudah/Sulit) from base Sedang questions
- **Multi-Site Support**: Single backend serving multiple WordPress-powered educational sites
- **Non-Destructive**: Zero disruption to existing operations; all enhancements are additive

### 1.1.2 Success Metrics
- **Technical**: CTT scores match client Excel 100%; IRT calibration >80% coverage
- **Educational**: 30% reduction in test length with IRT vs CTT; measurement precision SE < 0.5 after 15 items
- **Adoption**: >70% of tryouts use hybrid mode within 3 months; >80% student satisfaction with adaptive mode
- **Efficiency**: 99.9% question reuse rate via AI-generated variants

---
## 2. User Personas

### 2.1 Administrators (School/Guru)
**Profile:** Non-technical education professionals managing tryouts

**Pain Points:**
- Excel-based scoring is manual and time-consuming
- Static questions require constant new content creation
- Difficulty normalization requires manual calculation
- Limited ability to compare student performance across groups

**Needs:**
- Simple, transparent scoring formulas (CTT mode)
- Easy Excel import/export workflow
- Clear visualizations of student performance
- Configurable normalization (static vs dynamic)
- Optional advanced features (IRT) without added complexity

### 2.2 Students
**Profile:** Students taking tryouts for assessment

**Pain Points:**
- Fixed-length tests regardless of ability level
- Question difficulty may not match their skill
- Long testing sessions with low-value questions

**Needs:**
- Adaptive tests that match their ability level
- Shorter, more efficient assessment
- Clear feedback on strengths and weaknesses
- Consistent scoring across attempts

### 2.3 Content Creators
**Profile:** Staff creating and managing question banks

**Pain Points:**
- Creating 3 difficulty variants per question is time-consuming
- Limited question pool for repeated assessments
- Manual categorization of difficulty levels

**Needs:**
- AI-assisted question generation
- Easy difficulty level adjustment
- Reuse of base questions with variant generation
- Bulk question management tools

### 2.4 Technical Administrators
**Profile:** IT staff managing the platform

**Pain Points:**
- Multiple WordPress sites with separate databases
- Difficulty scaling question pools
- Maintenance of complex scoring systems

**Needs:**
- Centralized backend for multiple sites
- Scalable architecture (aaPanel VPS)
- REST API for WordPress integration
- Automated calibration and normalization
- **Note**: Each admin manages static questions within WordPress; this project provides the backend tool for IRT/CTT calculation and dynamic question selection

---
## 3. Functional Requirements

### 3.1 CTT Scoring (Classical Test Theory)
**FR-1.1** System must calculate tingkat kesukaran (p) per question using the exact client Excel formula:
```
p = Σ Benar / Total Peserta
```
**Acceptance Criteria:**
- p-value calculated per question for each tryout
- Values stored in database (items.ctt_p)
- Results match client Excel to 4 decimal places

**FR-1.2** System must calculate bobot (weight) per question:
```
Bobot = 1 - p
```
**Acceptance Criteria:**
- Bobot calculated and stored (items.ctt_bobot)
- Easy questions (p > 0.70) have low bobot (< 0.30)
- Difficult questions (p < 0.30) have high bobot (> 0.70)

**FR-1.3** System must calculate Nilai Mentah (NM) per student:
```
NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
```
**Acceptance Criteria:**
- NM ranges 0-1000
- SUMPRODUCT equivalent implemented correctly
- Results stored per response (user_answers.ctt_nm)

**FR-1.4** System must calculate Nilai Nasional (NN) with normalization:
```
NN = 500 + 100 × ((NM - Rataan) / SB)
```
**Acceptance Criteria:**
- NN normalized to mean=500, SD=100
- Support static (hardcoded rataan/SB) and dynamic (real-time) modes
- NN clipped to 0-1000 range

**FR-1.5** System must categorize question difficulty per CTT standards:
- p < 0.30 → Sukar (Sulit)
- 0.30 ≤ p ≤ 0.70 → Sedang
- p > 0.70 → Mudah

**Acceptance Criteria:**
- Category assigned (items.ctt_category)
- Used for level field (items.level)
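
The four CTT formulas above chain together; a minimal Python sketch of FR-1.1 through FR-1.4, including the 0-1000 clipping (function and variable names are illustrative, not the production schema):

```python
def ctt_scores(answers, key, rataan=500.0, sb=100.0):
    """answers: one response list per student; key: correct options."""
    n_students, n_items = len(answers), len(key)
    # FR-1.1: p = proportion of students answering each item correctly
    p = [sum(a[i] == key[i] for a in answers) / n_students
         for i in range(n_items)]
    # FR-1.2: bobot = 1 - p, so harder items earn more weight
    bobot = [1.0 - pi for pi in p]
    total_bobot_max = sum(bobot)
    results = []
    for a in answers:
        # FR-1.3: sum of bobot over correct answers, scaled to 0-1000
        earned = sum(b for b, k, r in zip(bobot, key, a) if r == k)
        nm = earned / total_bobot_max * 1000.0
        # FR-1.4: normalize to mean 500, SD 100, then clip to 0-1000
        nn = 500.0 + 100.0 * (nm - rataan) / sb
        results.append({"nm": nm, "nn": max(0.0, min(1000.0, nn))})
    return p, bobot, results
```

In static mode rataan/sb would come from tryout_config; in dynamic mode from tryout_stats.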

### 3.2 IRT Scoring (Item Response Theory)
**FR-2.1** System must implement the 1PL Rasch model:
```
P(θ) = 1 / (1 + e^-(θ - b))
```
**Acceptance Criteria:**
- θ (ability) estimated per student
- b (difficulty) calibrated per question
- Ranges: θ, b ∈ [-3, +3]

**FR-2.2** System must estimate θ using Maximum Likelihood Estimation (MLE)

**Acceptance Criteria:**
- Initial guess θ = 0
- Optimization bounds [-3, +3]
- Standard error (SE) calculated using Fisher information
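
A Newton-Raphson sketch of MLE under these criteria (θ starts at 0, bounded to [-3, +3], SE from Fisher information); names are illustrative, not the production API:

```python
import math

def estimate_theta(responses, max_iter=50, tol=1e-6):
    """responses: (b, u) pairs, u = 1 for correct, 0 for incorrect."""
    theta = 0.0                                        # initial guess
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b, _ in responses]
        # 1PL log-likelihood gradient and Fisher information
        grad = sum(u - p for (_, u), p in zip(responses, probs))
        info = sum(p * (1.0 - p) for p in probs)
        if info == 0.0:
            break
        step = grad / info                             # Newton-Raphson step
        theta = max(-3.0, min(3.0, theta + step))      # enforce bounds
        if abs(step) < tol:
            break
    # SE = 1 / sqrt(information) at the final theta
    info = sum(p * (1.0 - p)
               for p in (1.0 / (1.0 + math.exp(-(theta - b)))
                         for b, _ in responses))
    se = 1.0 / math.sqrt(info) if info > 0 else float("inf")
    return theta, se
```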

**FR-2.3** System must calibrate b parameters from response data

**Acceptance Criteria:**
- Minimum 100-500 responses per item for calibration
- Calibration status tracked (items.calibrated)
- Auto-convert CTT p to an initial b: `b ≈ ln((1-p)/p)` (hard items, with low p, get positive b; easy items get negative b, consistent with the b ranges in §13.2)

**FR-2.4** System must map θ to NN for CTT comparison

**Acceptance Criteria:**
- θ ∈ [-3, +3] mapped to NN ∈ [0, 1000]
- Formula: `NN = 500 + (θ / 3) × 500`
- Secondary score returned in API responses
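
Both conversions are small enough to sketch directly. Here b is written as `ln((1-p)/p)` so that hard items (low p) map to positive difficulty, matching the category ranges in Appendix 13.2; the clamp on p and the helper names are illustrative assumptions:

```python
import math

def p_to_b(p):
    """Seed an initial IRT difficulty from a CTT p-value (FR-2.3)."""
    p = min(max(p, 0.01), 0.99)        # guard the log near p = 0 or 1
    b = math.log((1.0 - p) / p)        # low p (hard) -> positive b
    return max(-3.0, min(3.0, b))      # clip to the working range

def theta_to_nn(theta):
    """Map theta in [-3, +3] linearly onto NN in [0, 1000] (FR-2.4)."""
    return 500.0 + (theta / 3.0) * 500.0
```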

### 3.3 Hybrid Mode
**FR-3.1** System must support dual scoring (CTT + IRT in parallel)

**Acceptance Criteria:**
- Both scores calculated per response
- Primary and secondary scores returned
- Admin can choose which to display

**FR-3.2** System must support hybrid item selection

**Acceptance Criteria:**
- First N items: fixed order (CTT mode)
- Remaining items: adaptive (IRT mode)
- Configurable transition point (tryout_config.hybrid_transition_slot)

**FR-3.3** System must support hybrid normalization

**Acceptance Criteria:**
- Static mode for small samples (< threshold)
- Dynamic mode for large samples (≥ threshold)
- Configurable threshold (tryout_config.min_sample_for_dynamic)
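
The two hybrid decisions above reduce to simple predicates; a sketch in which the dict keys mirror the tryout_config/tryout_stats fields named in this document, while the functions themselves are illustrative:

```python
def selection_method(slot, transition_slot):
    """FR-3.2: fixed order up to the transition slot, adaptive after it."""
    return "fixed" if slot <= transition_slot else "adaptive"

def normalization_params(config, stats):
    """FR-3.3: static rataan/SB for small samples, dynamic once the
    participant count reaches min_sample_for_dynamic."""
    if stats["participant_count"] >= config["min_sample_for_dynamic"]:
        return stats["current_rataan"], stats["current_sb"]
    return config["static_rataan"], config["static_sb"]
```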

### 3.4 Dynamic Normalization
**FR-4.1** System must maintain running statistics per tryout

**Acceptance Criteria:**
- Track: participant_count, total_nm_sum, total_nm_sq_sum
- Update on each completed session
- Stored in tryout_stats table

**FR-4.2** System must calculate real-time rataan and SB

**Acceptance Criteria:**
- Rataan = mean(all NM)
- SB = sqrt(variance(all NM))
- Updated incrementally (no full recalc)
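
The running sums named in FR-4.1 are enough to recover rataan and SB without a full recalculation; a sketch where the field names follow the tryout_stats table and the function name is illustrative:

```python
import math

def update_tryout_stats(stats, nm):
    """Fold one completed session's NM into the running statistics."""
    stats["participant_count"] += 1
    stats["total_nm_sum"] += nm
    stats["total_nm_sq_sum"] += nm * nm
    n = stats["participant_count"]
    mean = stats["total_nm_sum"] / n
    # Population variance from running sums: E[x^2] - (E[x])^2.
    # max(0, ...) guards tiny negative values from floating-point error.
    variance = max(0.0, stats["total_nm_sq_sum"] / n - mean * mean)
    stats["current_rataan"] = mean
    stats["current_sb"] = math.sqrt(variance)
    return stats
```

For very large participant counts, Welford's online algorithm is the numerically safer variant of the same incremental idea.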

**FR-4.3** System must support optional normalization control (manual vs automatic)

**Acceptance Criteria:**
- Admin can choose manual mode (static normalization with hardcoded values)
- Admin can choose automatic mode (dynamic normalization when sufficient data is available)
- When automatic is selected and sufficient data is reached, the system handles normalization automatically
- Configurable threshold: min_sample_for_dynamic (default: 100)
- Admin can switch between manual and automatic at any time
- System displays current data readiness (participant count vs threshold)

### 3.5 AI Question Generation
**FR-5.1** System must generate question variants via the OpenRouter API

**Acceptance Criteria:**
- Generate Mudah variant from a Sedang base
- Generate Sulit variant from a Sedang base
- Generate same-level variant from a Sedang base
- Use Qwen3 Coder 480B or Llama 3.3 70B
- **1 request = 1 question** (not batch generation)

**FR-5.2** System must use a standardized prompt template

**Acceptance Criteria:**
- Include context (tryout_id, slot, level)
- Include the base question (basis soal) for reference (provides topic/context)
- Request 1 question with 4 options
- Include an explanation
- Maintain the same context, varying only the difficulty level

**FR-5.3** System must implement question reuse/caching with user-level tracking

**Acceptance Criteria:**
- Check DB for an existing variant before generating
- Check if the student (user_id) already answered the question at the specific difficulty level
- Reuse if found (same tryout_id, slot, level)
- Generate only on a cache miss OR when the user has already answered the cached variant at this difficulty
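
One way to read this reuse policy as a single predicate: reuse a cached variant unless the student has already answered it, and never generate while the global toggle (FR-5.6) is off. Set-based lookups stand in for the real DB queries; all names here are illustrative:

```python
def should_generate(cached_variants, answered, tryout_id, slot, level,
                    user_id, ai_enabled=True):
    """cached_variants: set of (tryout_id, slot, level) already in the DB.
    answered: set of (user_id, slot, level) this user has answered."""
    if not ai_enabled:
        return False   # toggle off: always reuse, regardless of repetition
    cache_miss = (tryout_id, slot, level) not in cached_variants
    seen_before = (user_id, slot, level) in answered
    return cache_miss or seen_before
```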

**FR-5.4** System must provide an admin playground for AI testing

**Acceptance Criteria:**
- Admin can request AI generation without saving to the database
- Admin can re-request unlimited times until satisfied (no approval workflow)
- Preview mode shows the generated question before saving
- Admin can edit content before saving
- Purpose: build admin trust in AI quality before enabling it for students

**FR-5.5** System must parse and store AI-generated questions

**Acceptance Criteria:**
- Parse stem, options, correct answer, explanation
- Store in items table with generated_by='ai'
- Link to basis_item_id
- No approval workflow required for student tests

**FR-5.6** System must support an AI generation toggle

**Acceptance Criteria:**
- Global toggle to enable/disable AI generation (config.AI_generation_enabled)
- When disabled: reuse DB questions regardless of repetition
- When enabled: generate new variants on cache miss
- Admin can toggle on/off based on cost/budget

### 3.6 Item Selection
**FR-6.1** System must support fixed-order selection (CTT mode)

**Acceptance Criteria:**
- Items delivered in slot order (1, 2, 3, ...)
- No adaptive logic
- Used when selection_mode='fixed'

**FR-6.2** System must support adaptive selection (IRT mode)

**Acceptance Criteria:**
- Select the item where b ≈ current θ
- Prioritize calibrated items
- Use item information to maximize precision
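
For the 1PL model, item information P(1-P) peaks exactly when b = θ, so "b ≈ current θ" and "maximize information" are the same rule; a sketch in which the item dicts are illustrative, not the production schema:

```python
import math

def item_information(theta, b):
    """Fisher information of a 1PL item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def next_item(theta, items, answered_ids):
    """Pick the most informative unanswered item, preferring calibrated
    ones; fall back to uncalibrated items only if none are calibrated."""
    pool = [it for it in items if it["id"] not in answered_ids]
    calibrated = [it for it in pool if it["calibrated"]] or pool
    return max(calibrated, key=lambda it: item_information(theta, it["irt_b"]))
```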

**FR-6.3** System must support level-based selection (hybrid mode)

**Acceptance Criteria:**
- Select from the specified level (Mudah/Sedang/Sulit)
- Check if the level variant exists in the DB
- Generate via AI if it does not exist

### 3.7 Excel Import
**FR-7.1** System must import from the client Excel format

**Acceptance Criteria:**
- Parse answer key (Row 2, KUNCI)
- Extract calculated p-values (Row 4, data_only=True)
- Extract bobot values (Row 5)
- Import student responses (Row 6+)
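
A sketch of this row layout, written against plain row tuples such as those produced by openpyxl's `ws.iter_rows(values_only=True)` on a workbook opened with `data_only=True`; the assumption that column A holds labels/student IDs is illustrative:

```python
def parse_tryout_sheet(rows):
    """rows: sequence of row tuples; 1-based sheet rows map to 0-based
    indices here (Row 2 -> rows[1], and so on)."""
    kunci = list(rows[1][1:])                  # Row 2: answer key (KUNCI)
    p_values = list(rows[3][1:])               # Row 4: calculated p-values
    bobot = list(rows[4][1:])                  # Row 5: bobot values
    responses = {r[0]: list(r[1:]) for r in rows[5:]}  # Row 6+: per student
    return {"kunci": kunci, "p": p_values, "bobot": bobot,
            "responses": responses}
```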

**FR-7.2** System must create items from Excel import

**Acceptance Criteria:**
- Create an item per question slot
- Set ctt_p, ctt_bobot, ctt_category
- Auto-calculate irt_b from ctt_p
- Set calibrated=False

**FR-7.3** System must configure the tryout from Excel import

**Acceptance Criteria:**
- Create tryout_config with CTT settings
- Set normalization_mode='static' (default)
- Set static_rataan=500, static_sb=100

### 3.8 API Endpoints
**FR-8.1** System must provide a Next Item endpoint

**Acceptance Criteria:**
- POST /api/v1/session/{session_id}/next_item
- Accept mode (ctt/irt/hybrid)
- Accept current_responses array
- Return item with selection_method metadata

**FR-8.2** System must provide a Complete Session endpoint

**Acceptance Criteria:**
- POST /api/v1/session/{session_id}/complete
- Return primary_score (CTT or IRT)
- Return secondary_score (parallel calculation)
- Return comparison (NN difference, agreement)
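
The comparison payload can be as small as a difference plus an agreement flag; the 20-point threshold below mirrors the success criterion in §8.1, and the function name is illustrative:

```python
def score_comparison(ctt_nn, irt_nn, threshold=20.0):
    """Build the FR-8.2 comparison block from the two NN scores."""
    diff = abs(ctt_nn - irt_nn)
    return {"nn_difference": round(diff, 2), "agreement": diff < threshold}
```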

**FR-8.3** System must provide a Get Tryout Config endpoint

**Acceptance Criteria:**
- GET /api/v1/tryout/{tryout_id}/config
- Return scoring_mode, normalization_mode
- Return current_stats (participant_count, rataan, SB)
- Return calibration_status

**FR-8.4** System must provide an Update Normalization endpoint

**Acceptance Criteria:**
- PUT /api/v1/tryout/{tryout_id}/normalization
- Accept normalization_mode updates
- Accept static_rataan, static_sb overrides
- Return the will_switch_to_dynamic_at threshold

### 3.9 Multi-Site Support
**FR-9.1** System must support multiple WordPress sites

**Acceptance Criteria:**
- Each site has a unique website_id
- Shared backend, isolated data per site
- API responses scoped to website_id

**FR-9.2** System must support per-site configuration

**Acceptance Criteria:**
- Each (website_id, tryout_id) pair is unique
- Independent tryout_config per tryout
- Independent tryout_stats per tryout

---
## 4. Non-Functional Requirements

### 4.1 Performance
**NFR-4.1.1** Next Item API response time < 500ms
**NFR-4.1.2** Complete Session API response time < 2s
**NFR-4.1.3** AI question generation < 10s (OpenRouter timeout)
**NFR-4.1.4** Support 1000 concurrent students

### 4.2 Scalability
**NFR-4.2.1** Support 10,000+ items in the database
**NFR-4.2.2** Support 100,000+ student responses
**NFR-4.2.3** Question reuse: 99.9% cache hit rate after initial generation
**NFR-4.2.4** Horizontal scaling via PostgreSQL read replicas

### 4.3 Reliability
**NFR-4.3.1** 99.9% uptime during tryout periods
**NFR-4.3.2** Automatic fallback to CTT if IRT fails
**NFR-4.3.3** Database transaction consistency
**NFR-4.3.4** Graceful degradation if the AI API is unavailable

### 4.4 Security
**NFR-4.4.1** API authentication via WordPress tokens
**NFR-4.4.2** website_id isolation (no cross-site data access)
**NFR-4.4.3** Rate limiting per API key
**NFR-4.4.4** Audit trail for all scoring changes

### 4.5 Compatibility
**NFR-4.5.1** 100% formula match with the client Excel
**NFR-4.5.2** Non-destructive: zero data loss during transitions
**NFR-4.5.3** Reversible: IRT features can be disabled at any time
**NFR-4.5.4** WordPress REST API integration

### 4.6 Maintainability
**NFR-4.6.1** FastAPI Admin auto-generated UI for CRUD
**NFR-4.6.2** Alembic migrations for schema changes
**NFR-4.6.3** Comprehensive API documentation (OpenAPI)
**NFR-4.6.4** Logging for debugging scoring calculations

---
## 5. Data Requirements

### 5.1 Core Entities

#### Items
- **id**: Primary key
- **website_id, tryout_id**: Composite key for multi-site
- **slot, level**: Position and difficulty
- **stem, options, correct, explanation**: Question content
- **ctt_p, ctt_bobot, ctt_category**: CTT parameters
- **irt_b, irt_a, irt_c**: IRT parameters
- **calibrated, calibration_sample_size**: Calibration status
- **generated_by, ai_model, basis_item_id**: AI generation metadata

#### User Answers
- **id**: Primary key
- **wp_user_id, website_id, tryout_id, slot, level**: Composite key
- **item_id, response**: Question and answer
- **ctt_bobot_earned, ctt_total_bobot_cumulative, ctt_nm, ctt_nn**: CTT scores
- **rataan_used, sb_used, normalization_mode_used**: Normalization metadata
- **irt_theta, irt_theta_se, irt_information**: IRT scores
- **scoring_mode_used**: Which scoring mode was used

#### Tryout Config
- **id**: Primary key
- **website_id, tryout_id**: Composite key
- **scoring_mode**: 'ctt', 'irt', 'hybrid'
- **selection_mode**: 'fixed', 'adaptive', 'hybrid'
- **normalization_mode**: 'static', 'dynamic', 'hybrid'
- **static_rataan, static_sb, min_sample_for_dynamic**: Normalization settings
- **min_calibration_sample, theta_estimation_method**: IRT settings
- **hybrid_transition_slot, fallback_to_ctt_on_error**: Transition settings

#### Tryout Stats
- **id**: Primary key
- **website_id, tryout_id**: Composite key
- **participant_count**: Number of completed sessions
- **total_nm_sum, total_nm_sq_sum**: Running sums for mean/SD calculation
- **current_rataan, current_sb**: Calculated values
- **min_nm, max_nm**: Score range
- **last_calculated_at, last_participant_id**: Metadata

### 5.2 Data Relationships
- Items → User Answers (1:N, CASCADE delete)
- Items → Items (self-reference via basis_item_id for AI generation)
- Tryout Config → User Answers (1:N via website_id, tryout_id)
- Tryout Stats → User Answers (1:N via website_id, tryout_id)

---
## 6. Technical Constraints

### 6.1 Tech Stack (Fixed)
- **Backend**: FastAPI (Python)
- **Database**: PostgreSQL (via aaPanel PgSQL Manager)
- **ORM**: SQLAlchemy
- **Admin**: FastAPI Admin
- **AI**: OpenRouter API (Qwen3 Coder 480B, Llama 3.3 70B)
- **Deployment**: aaPanel VPS (Python Manager)

### 6.2 External Dependencies
- **OpenRouter API**: Must handle rate limits, timeouts, errors
- **WordPress**: REST API integration, authentication
- **Excel**: openpyxl for import, pandas for data processing

### 6.3 Mathematical Constraints
- **CTT**: Must use the EXACT client formulas (p, bobot, NM, NN)
- **IRT**: 1PL Rasch model only (no a, c parameters initially)
- **Normalization**: Mean=500, SD=100 target
- **Ranges**: θ, b ∈ [-3, +3]; NM, NN ∈ [0, 1000]

---
## 7. User Stories

### 7.1 Administrator Stories
**US-7.1.1** As an administrator, I want to import questions from Excel so that I can migrate existing content without manual entry.
- Priority: High
- Acceptance: FR-7.1, FR-7.2, FR-7.3

**US-7.1.2** As an administrator, I want to configure the normalization mode (static/dynamic/hybrid) so that I can control how scores are normalized.
- Priority: High
- Acceptance: FR-4.3, FR-8.4

**US-7.1.3** As an administrator, I want to view calibration status so that I know when IRT is ready for production.
- Priority: Medium
- Acceptance: FR-8.3

**US-7.1.4** As an administrator, I want to choose the scoring mode (CTT/IRT/hybrid) so that I can gradually adopt advanced features.
- Priority: High
- Acceptance: FR-3.1, FR-3.2, FR-3.3

### 7.2 Student Stories
**US-7.2.1** As a student, I want to take adaptive tests so that I get questions matching my ability level.
- Priority: High
- Acceptance: FR-6.2, FR-2.1, FR-2.2

**US-7.2.2** As a student, I want to see my normalized score (NN) so that I can compare my performance with others.
- Priority: High
- Acceptance: FR-1.4, FR-4.2

**US-7.2.3** As a student, I want a seamless experience in which any technical issues (IRT fallback, AI generation failures) are handled without interrupting my test.
- Priority: High
- Acceptance: Seamless fallback (students unaware of internal mode switching); no error messages visible to students

### 7.3 Content Creator Stories
**US-7.3.1** As a content creator, I want to generate question variants via AI so that I don't have to manually create 3 difficulty levels.
- Priority: High
- Acceptance: FR-5.1, FR-5.2, FR-5.3, FR-5.4

**US-7.3.2** As a content creator, I want to reuse existing questions at different difficulty levels so that I can maximize question pool efficiency.
- Priority: Medium
- Acceptance: FR-5.3, FR-6.3

### 7.4 Technical Administrator Stories
**US-7.4.1** As a technical administrator, I want to manage multiple WordPress sites from one backend so that I don't have to duplicate infrastructure.
- Priority: High
- Acceptance: FR-9.1, FR-9.2

**US-7.4.2** As a technical administrator, I want to monitor calibration progress so that I can plan the IRT rollout.
- Priority: Medium
- Acceptance: FR-2.3, FR-8.3

**US-7.4.3** As a technical administrator, I want access to internal scoring details (CTT vs IRT comparison, normalization metrics) for debugging and monitoring, while students only see primary scores.
- Priority: Medium
- Acceptance: Admin visibility of all internal metrics; student visibility limited to the final NN score only

---
## 8. Success Criteria

### 8.1 Technical Validation
- ✅ CTT scores match the client Excel to 4 decimal places (100% formula accuracy)
- ✅ Dynamic normalization produces mean=500±5, SD=100±5 after 100 users
- ✅ IRT calibration covers >80% of items with 500+ responses per item
- ✅ CTT vs IRT NN difference <20 points (moderate agreement)
- ✅ Fallback rate <5% (IRT → CTT on error)

### 8.2 Educational Validation
- ✅ IRT measurement precision: SE <0.5 after 15 items
- ✅ Normalization quality: distribution skewness <0.5
- ✅ Adaptive efficiency: 30% reduction in test length (15 IRT items ≈ 30 CTT items for the same precision)
- ✅ Student satisfaction: >80% prefer adaptive mode in surveys
- ✅ Admin adoption: >70% of tryouts use hybrid mode within 3 months

### 8.3 Business Validation
- ✅ Zero data loss during the CTT→IRT transition
- ✅ Reversible: can disable IRT and revert to CTT at any time
- ✅ Non-destructive: the existing Excel workflow remains functional
- ✅ Cost efficiency: 99.9% question reuse vs 90,000 unique questions for 1000 users
- ✅ Multi-site scalability: one backend supports unlimited WordPress sites

---
## 9. Risk Mitigation

### 9.1 Technical Risks
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| IRT calibration fails (insufficient data) | High | Medium | Fall back to CTT mode, enable hybrid transition |
| OpenRouter API down/unavailable | Medium | Low | Cache questions, serve static variants |
| Excel formula mismatch | High | Low | Unit tests with client Excel data |
| Database performance degradation | Medium | Low | Indexing, read replicas, query optimization |

### 9.2 Business Risks
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| Administrators refuse to use IRT (too complex) | High | Medium | Hybrid mode with CTT-first UI |
| Students dislike adaptive tests | Medium | Low | A/B testing, optional mode |
| Excel workflow changes (client updates) | High | Low | Version control, flexible import parser |
| Multi-site data isolation failure | Critical | Low | website_id validation enforced on every query |

---
## 10. Migration Strategy

### 10.1 Phase 1: Import Existing Data (Week 1)
- Export current Sejoli Tryout data to Excel
- Run the import script to load items and configurations
- Configure CTT mode with static normalization
- Validate: CTT scores match Excel 100%

### 10.2 Phase 2: Collect Calibration Data (Weeks 2-4)
- Students use tryouts normally (CTT mode)
- Backend logs all responses
- Monitor calibration progress (items.calibrated status)
- Collect running statistics (tryout_stats)

### 10.3 Phase 3: Enable Dynamic Normalization (Week 5)
- Check participant count ≥ 100
- Update normalization_mode='hybrid'
- Test with 10-20 new students
- Verify: normalized distribution has mean≈500, SD≈100

### 10.4 Phase 4: Enable IRT Adaptive (Week 6+)
- After ≥90% of items are calibrated and 1000+ responses collected
- Update scoring_mode='irt', selection_mode='adaptive'
- Enable AI generation for Mudah/Sulit variants
- Monitor fallback rate and measurement precision

### 10.5 Rollback Plan
- Any phase is reversible
- Revert to CTT mode if IRT issues occur
- **Score preservation**: historical IRT scores are kept as-is; CTT applies only to new sessions after rollback
- Disable AI generation if costs run too high
- Revert to static normalization if dynamic proves unstable

---
## 11. Future Enhancements

### 11.1 Short-term (3-6 months)
- **2PL/3PL IRT**: Add discrimination (a) and guessing (c) parameters
- **Item Response Categorization**: Bloom's Taxonomy, cognitive domains
- **Advanced AI Models**: Fine-tune models for specific subjects
- **Data Retention Policy**: Define an archival and anonymization strategy (currently: keep all data)

### 11.2 Long-term (6-12 months)
- **Multi-dimensional IRT**: Measure multiple skills per question
- **Automatic Item Difficulty Adjustment**: AI calibrates b parameters
- **Predictive Analytics**: Student performance forecasting
- **Integration with LMS**: Moodle, Canvas API support

---
## 12. Glossary

| Term | Definition |
|------|------------|
| **p (TK)** | Proportion correct / Tingkat Kesukaran (CTT difficulty) |
| **Bobot** | 1 - p weight (CTT scoring weight) |
| **NM** | Nilai Mentah (raw score, 0-1000) |
| **NN** | Nilai Nasional (normalized, 500±100) |
| **Rataan** | Mean of NM scores |
| **SB** | Simpangan Baku (standard deviation of NM) |
| **θ (theta)** | IRT ability (-3 to +3) |
| **b** | IRT difficulty (-3 to +3) |
| **SE** | Standard error (precision) |
| **CAT** | Computerized Adaptive Testing |
| **MLE** | Maximum Likelihood Estimation |
| **CTT** | Classical Test Theory |
| **IRT** | Item Response Theory |

---
## 13. Appendices

### 13.1 Formula Reference
- **CTT p**: `p = Σ Benar / Total Peserta`
- **CTT Bobot**: `Bobot = 1 - p`
- **CTT NM**: `NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000`
- **CTT NN**: `NN = 500 + 100 × ((NM - Rataan) / SB)`
- **IRT 1PL**: `P(θ) = 1 / (1 + e^-(θ - b))`
- **CTT→IRT conversion**: `b ≈ ln((1-p)/p)`
- **θ→NN mapping**: `NN = 500 + (θ / 3) × 500`

### 13.2 Difficulty Categories
| CTT p | CTT Category | Level | IRT b Range |
|-------|--------------|-------|-------------|
| p < 0.30 | Sukar | Sulit | b > 0.85 |
| 0.30 ≤ p ≤ 0.70 | Sedang | Sedang | -0.85 ≤ b ≤ 0.85 |
| p > 0.70 | Mudah | Mudah | b < -0.85 |

### 13.3 API Quick Reference
- `POST /api/v1/session/{session_id}/next_item` - Get next question
- `POST /api/v1/session/{session_id}/complete` - Submit and score
- `GET /api/v1/tryout/{tryout_id}/config` - Get configuration
- `PUT /api/v1/tryout/{tryout_id}/normalization` - Update normalization

---
## 14. Reporting Requirements

### 14.1 Student Performance Reports
**FR-14.1.1** System must provide individual student performance reports

**Acceptance Criteria:**
- Report all student sessions (CTT, IRT, hybrid)
- Include NM and NN scores per session
- Include time spent per question
- Include total_benar, total_bobot_earned
- Export to CSV/Excel

**FR-14.1.2** System must provide aggregate student performance reports

**Acceptance Criteria:**
- Group by tryout, website_id, date range
- Show average NM, NN, theta per group
- Show distribution (min, max, median, std dev)
- Show pass/fail rates
- Export to CSV/Excel

### 14.2 Item Analysis Reports
**FR-14.2.1** System must provide item difficulty reports

**Acceptance Criteria:**
- Show CTT p-value per item
- Show IRT b-parameter per item
- Show calibration status
- Show discrimination index (if available)
- Filter by difficulty category (Mudah/Sedang/Sulit)

**FR-14.2.2** System must provide item information function reports

**Acceptance Criteria:**
- Show item information values at different theta levels
- Visualize item characteristic curves (optional)
- Show the optimal theta range for each item

### 14.3 Calibration Status Reports
**FR-14.3.1** System must provide calibration progress reports

**Acceptance Criteria:**
- Show total items per tryout
- Show calibrated item count and percentage
- Show items awaiting calibration
- Show average calibration sample size
- Show estimated time to reach the calibration threshold
- Highlight ready-for-IRT-rollout status (≥90% calibrated)

### 14.4 Tryout Comparison Reports
**FR-14.4.1** System must provide tryout comparisons across dates

**Acceptance Criteria:**
- Compare NM/NN distributions across different tryout dates
- Show trends over time (e.g., monthly averages)
- Show the impact of normalization changes (static → dynamic)

**FR-14.4.2** System must provide tryout comparisons across subjects

**Acceptance Criteria:**
- Compare performance across different subjects (e.g., Mat SD vs Bahasa SMA)
- Show subject-specific calibration status
- Show IRT accuracy differences per subject

### 14.5 Reporting Infrastructure
**FR-14.5.1** System must provide report scheduling

**Acceptance Criteria:**
- Admin can schedule daily/weekly/monthly reports
- Reports emailed to admins on schedule
- Report templates configurable (e.g., calibration status every Monday)

**FR-14.5.2** System must provide report export formats

**Acceptance Criteria:**
- Export to CSV
- Export to Excel (.xlsx)
- Export to PDF (with charts if available)

---

**Document End**

**Document Version:** 1.1
**Created:** March 21, 2026
**Updated:** March 21, 2026 (Clarifications Incorporated)
**Author:** Product Team (based on Technical Specification v1.2.0)
**Status:** Draft - Clarifications Incorporated