yellow-bank-soal/TEST.md
Dwindi Ramadhana cf193d7ea0 first commit
2026-03-21 23:32:59 +07:00

# IRT Bank Soal - Test Walkthrough & Validation Guide
**Document Version:** 1.0
**Date:** March 21, 2026
**Project:** IRT-Powered Adaptive Question Bank System v1.2.0
---
## Table of Contents
1. [Prerequisites](#1-prerequisites)
2. [Environment Setup](#2-environment-setup)
3. [Installation](#3-installation)
4. [Database Setup](#4-database-setup)
5. [Configuration](#5-configuration)
6. [Starting the Application](#6-starting-the-application)
7. [Core Functionality Tests](#7-core-functionality-tests)
8. [Excel Import/Export Tests](#8-excel-importexport-tests)
9. [IRT Calibration Tests](#9-irt-calibration-tests)
10. [CAT Selection Tests](#10-cat-selection-tests)
11. [AI Generation Tests](#11-ai-generation-tests)
12. [WordPress Integration Tests](#12-wordpress-integration-tests)
13. [Reporting System Tests](#13-reporting-system-tests)
14. [Admin Panel Tests](#14-admin-panel-tests)
15. [Integration Tests](#15-integration-tests)
16. [Validation Checklist](#16-validation-checklist)
17. [Troubleshooting](#17-troubleshooting)
---
## 1. Prerequisites
### Required Software
| Software | Minimum Version | Recommended Version |
|-----------|------------------|---------------------|
| Python | 3.10+ | 3.11+ |
| PostgreSQL | 14+ | 15+ |
| Node.js/npm | Not required | Latest LTS (optional tooling) |
### Required Python Packages
All packages listed in `requirements.txt`:
- fastapi
- uvicorn[standard]
- sqlalchemy
- asyncpg
- alembic
- pydantic
- pydantic-settings
- openpyxl
- pandas
- numpy
- scipy
- openai
- httpx
- celery
- redis
- fastapi-admin
- python-dotenv
### Optional Development Tools
- Docker (for containerized development)
- pgAdmin (for database management)
- Postman / curl (for API testing)
- IDE with Python LSP support (VSCode, PyCharm)
---
## 2. Environment Setup
### Step 2.1: Clone/Extract Repository
```bash
# Navigate to project directory
cd /Users/dwindown/Applications/tryout-system
# Verify structure
ls -la
# Expected: app/, app/models/, app/routers/, app/services/, tests/, requirements.txt, .env.example
```
### Step 2.2: Copy Environment Configuration
```bash
# Copy environment template
cp .env.example .env
# Edit .env with your values
nano .env # or use your preferred editor
# Required configuration:
DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/irt_bank_soal
SECRET_KEY=your-secret-key-here-change-in-production
OPENROUTER_API_KEY=your-openrouter-api-key-here
# WordPress Integration (optional for testing)
WORDPRESS_API_URL=https://your-wordpress-site.com/wp-json
WORDPRESS_AUTH_TOKEN=your-jwt-token
# Redis (optional, for Celery task queue)
REDIS_URL=redis://localhost:6379/0
```
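The `.env` values above are loaded by the app via `pydantic-settings`; as a rough illustration of how `KEY=VALUE` lines map to settings, here is a minimal stdlib-only parser sketch (illustrative only — it supports no quoting, `export`, or interpolation):

```python
# Minimal sketch of how .env KEY=VALUE lines become a settings dict.
# The real app loads these via pydantic-settings; this parser is
# illustrative only (no quoting, export, or interpolation support).
def parse_env(text: str) -> dict[str, str]:
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings

if __name__ == "__main__":
    sample = """
    # Required configuration:
    DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/irt_bank_soal
    SECRET_KEY=your-secret-key-here-change-in-production
    """
    print(parse_env(sample)["SECRET_KEY"])
```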
### Step 2.3: Create Virtual Environment
```bash
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Verify activation
which python3 # Should show venv/bin/python3
```
### Step 2.4: Install Dependencies
```bash
# Install all required packages
pip3 install -r requirements.txt
# Verify installation
pip3 list | grep -E "fastapi|sqlalchemy|numpy|scipy|httpx|openpyxl"
# Expected: All packages listed should be installed
```
---
## 3. Installation
### Step 3.1: Database Setup
```bash
# Create PostgreSQL database
psql postgres
# Connect to PostgreSQL
\c irt_bank_soal
# Create database (if not exists)
CREATE DATABASE irt_bank_soal;
\q
# Exit PostgreSQL
\q
```
### Step 3.2: Initialize Alembic Migrations
```bash
# Initialize Alembic (first time only)
alembic init alembic
# Point target_metadata in alembic/env.py at your models before autogenerating
# Generate initial migration
alembic revision --autogenerate -m "Initial migration"
# Apply migration to database
alembic upgrade head
# Expected: Creates alembic/versions/ directory with initial migration file
```
### Step 3.3: Verify Database Connection
```bash
# Run database initialization test
python3 -c "
import asyncio
from app.database import init_db
from app.core.config import get_settings
async def test():
    await init_db()
    print('✅ Database initialized successfully')
    print(f'✅ Database URL: {get_settings().DATABASE_URL}')

asyncio.run(test())
"
```
---
## 4. Database Setup
### Step 4.1: Create Test Excel File
Create a test Excel file `test_tryout.xlsx` with the following structure:
| Sheet | Row | Content |
|-------|------|---------|
| CONTOH | 2 | KUNCI (answer key) - A, B, C, D, A, B, C, D, A, B |
| CONTOH | 4 | TK (p-values, each in [0, 1]) - 0.5, 0.6, 0.7, 0.8, 0.9, 0.4, 0.3, 0.2, 0.6, 0.5 |
| CONTOH | 5 | BOBOT (weights, 1 - p) - 0.5, 0.4, 0.3, 0.2, 0.1, 0.6, 0.7, 0.8, 0.4, 0.5 |
| CONTOH | 6+ | Question data (10 questions) |
**Question Data Format (Rows 6-15):**
- Column A: Slot (1, 2, 3, ..., 10)
- Column B: Level (mudah, sedang, sulit)
- Column C: Soal text
- Column D: Option A
- Column E: Option B
- Column F: Option C
- Column G: Option D
- Column H: Correct (A, B, C, or D)
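Before writing the workbook, the CONTOH layout above can be sanity-checked on an in-memory grid that mirrors the sheet (rows are 1-indexed in the spec, so `grid[1]` is spreadsheet row 2). `check_contoh_grid` is a hypothetical helper sketched here, not part of the app:

```python
# Sketch: validate an in-memory grid mirroring the CONTOH sheet layout.
# grid[1] is spreadsheet row 2 (KUNCI), grid[3] row 4 (TK), grid[4] row 5
# (BOBOT); column 0 holds the row label. Hypothetical helper, not app code.
def check_contoh_grid(grid: list[list]) -> list[str]:
    errors = []
    kunci = grid[1][1:]   # row 2: answer key
    tk = grid[3][1:]      # row 4: p-values
    bobot = grid[4][1:]   # row 5: weights
    if any(k not in {"A", "B", "C", "D"} for k in kunci):
        errors.append("KUNCI must contain only A-D")
    if any(not 0.0 <= p <= 1.0 for p in tk):
        errors.append("TK (p-values) must lie in [0, 1]")
    for p, b in zip(tk, bobot):
        if abs((1.0 - p) - b) > 1e-9:
            errors.append(f"BOBOT must equal 1 - p (p={p}, bobot={b})")
    if not (len(kunci) == len(tk) == len(bobot)):
        errors.append("KUNCI, TK and BOBOT must have equal length")
    return errors

if __name__ == "__main__":
    grid = [
        ["", "", ""],              # row 1 (unused here)
        ["KUNCI", "A", "B"],       # row 2: answer key
        ["", "", ""],              # row 3 (unused)
        ["TK", 0.5, 0.6],          # row 4: p-values
        ["BOBOT", 0.5, 0.4],       # row 5: weights (1 - p)
    ]
    print(check_contoh_grid(grid))  # prints []
```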
### Step 4.2: Load Test Data
```bash
# Python script to load test data
python3 -c "
import asyncio
from sqlalchemy import select
from app.database import AsyncSessionLocal
from app.models.item import Item
from app.models.tryout import Tryout
async def load_test_data():
    async with AsyncSessionLocal() as session:
        # Check if test data exists
        result = await session.execute(
            select(Tryout).where(Tryout.tryout_id == 'TEST_TRYOUT_001'))
        existing = result.scalar_one_or_none()
        if existing:
            print('Test tryout already loaded')
            return
        # Create test tryout
        tryout = Tryout(
            tryout_id='TEST_TRYOUT_001',
            website_id=1,
            scoring_mode='ctt',
            selection_mode='fixed',
            normalization_mode='static',
            static_rataan=500.0,
            static_sb=100.0,
            min_sample_for_dynamic=100,
            AI_generation_enabled=False,
        )
        session.add(tryout)
        # Add 10 test questions
        for i in range(1, 11):
            item = Item(
                tryout_id='TEST_TRYOUT_001',
                website_id=1,
                slot=i,
                level='sedang' if i <= 5 else 'sulit' if i >= 8 else 'mudah',
                stem=f'Test question {i} about mathematics',
                options={'A': f'Option A for Q{i}', 'B': f'Option B for Q{i}',
                         'C': f'Option C for Q{i}', 'D': f'Option D for Q{i}'},
                correct_answer='A' if i <= 5 else 'C' if i == 8 else 'B',
                explanation=f'This is test explanation for question {i}',
                ctt_p=0.5,
                ctt_bobot=0.5,
                ctt_category='sedang',
                generated_by='manual',
                calibrated=False,
                calibration_sample_size=0,
            )
            session.add(item)
        await session.commit()
        print('✅ Test data loaded successfully')

asyncio.run(load_test_data())
"
```
---
## 5. Configuration
### Step 5.1: Verify Configuration
```bash
# Test configuration loading
python3 -c "
from app.core.config import get_settings
settings = get_settings()
print('Configuration:')
print(f' Database URL: {settings.DATABASE_URL}')
print(f' Environment: {settings.ENVIRONMENT}')
print(f' API Prefix: {settings.API_V1_STR}')
print(f' Project Name: {settings.PROJECT_NAME}')
print(f' OpenRouter Model QWEN: {settings.OPENROUTER_MODEL_QWEN}')
print(f' OpenRouter Model Llama: {settings.OPENROUTER_MODEL_LLAMA}')
print(f' WordPress API URL: {settings.WORDPRESS_API_URL}')
print()
"
# Expected: All environment variables loaded correctly
```
### Step 5.2: Test Normalization Modes
Verify all three normalization modes work:
| Mode | Description | Configuration |
|-------|-------------|--------------|
| Static | Uses hardcoded rataan=500, sb=100 from config | `normalization_mode='static'` |
| Dynamic | Calculates real-time from participant NM scores | `normalization_mode='auto'` |
| Hybrid | Static until threshold (100 participants), then dynamic | `normalization_mode='hybrid'` |
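The mode table above can be sketched as a small resolver that picks the `(rataan, sb)` pair used for NN normalization. `resolve_normalization` is a hypothetical helper; the threshold is assumed to mirror the tryout's `min_sample_for_dynamic` setting (100 by default):

```python
# Sketch of the normalization-mode table: choose (rataan, sb) for the
# NN transform. resolve_normalization is a hypothetical helper; the
# threshold mirrors the tryout's min_sample_for_dynamic setting.
STATIC_RATAAN, STATIC_SB = 500.0, 100.0

def resolve_normalization(mode: str, participant_count: int,
                          dynamic_rataan: float, dynamic_sb: float,
                          threshold: int = 100) -> tuple[float, float]:
    if mode == "static":
        return STATIC_RATAAN, STATIC_SB
    if mode == "auto":          # dynamic: always use live participant stats
        return dynamic_rataan, dynamic_sb
    if mode == "hybrid":        # static until enough participants, then dynamic
        if participant_count < threshold:
            return STATIC_RATAAN, STATIC_SB
        return dynamic_rataan, dynamic_sb
    raise ValueError(f"unknown normalization_mode: {mode}")
```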
---
## 6. Starting the Application
### Step 6.1: Start FastAPI Server
```bash
# Start FastAPI server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Expected output:
# INFO: Started server process [12345]
# INFO: Waiting for application startup.
# INFO: Application startup complete.
# INFO: Uvicorn running on http://0.0.0.0:8000
```
### Step 6.2: Verify Health Check
```bash
# Test health endpoint
curl http://localhost:8000/
# Expected response:
# {
# "status": "healthy",
# "project_name": "IRT Bank Soal",
# "version": "1.0.0"
# }
# Test detailed health endpoint
curl http://localhost:8000/health
# Expected response:
# {
# "status": "healthy",
# "database": "connected",
# "api_version": "v1"
# }
```
---
## 7. Core Functionality Tests
### Test 7.1: CTT Scoring Validation
**Objective:** Verify CTT formulas match Excel exactly 100%
**Test Cases:**
1. **CTT p-value calculation**
- Input: 10 responses, 5 correct → p = 5/10 = 0.5
- Expected: p = 0.5
- Formula: `p = Σ Benar / Total Peserta`
2. **CTT bobot calculation**
- Input: p = 0.5 → bobot = 1 - 0.5 = 0.5
- Expected: bobot = 0.5
- Formula: `Bobot = 1 - p`
3. **CTT NM calculation**
- Input: 5 questions, bobot_earned = 2.5, total_bobot_max = 3.2
- Expected: NM = (2.5 / 3.2) × 1000 = 781.25
- Formula: `NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000`
4. **CTT NN calculation**
- Input: NM = 781.25, rataan = 500, sb = 100
- Expected: NN = 500 + 100 × ((781.25 - 500) / 100) = 781.25
- Formula: `NN = 500 + 100 × ((NM - Rataan) / SB)`
- Note: with rataan = 500 and sb = 100 this transform is the identity, so NN = NM.
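The four formulas above can be written down directly for hand-checking values against Excel. This is an independent sketch — the function names echo the app's `app.services.ctt_scoring` module but this is not the app's code:

```python
# Standalone sketch of the four CTT formulas, for hand-checking results
# against the Excel reference. Not the app's actual implementation.
def ctt_p(responses: list[int]) -> float:
    """p = sum(correct) / total participants."""
    return sum(responses) / len(responses)

def ctt_bobot(p: float) -> float:
    """Bobot = 1 - p."""
    return 1.0 - p

def ctt_nm(total_bobot_siswa: float, total_bobot_max: float) -> float:
    """NM = (Total_Bobot_Siswa / Total_Bobot_Max) * 1000."""
    return (total_bobot_siswa / total_bobot_max) * 1000.0

def ctt_nn(nm: float, rataan: float, sb: float) -> float:
    """NN = 500 + 100 * ((NM - Rataan) / SB)."""
    return 500.0 + 100.0 * ((nm - rataan) / sb)
```

Note that with rataan = 500 and sb = 100 the NN transform reduces to the identity, so NM = 781.25 yields NN = 781.25.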
**Validation Method:**
```bash
# Run CTT scoring validation tests
python3 -c "
import sys
sys.path.insert(0, '/Users/dwindown/Applications/tryout-system')
from app.services.ctt_scoring import calculate_ctt_p, calculate_ctt_bobot, calculate_ctt_nm, calculate_ctt_nn
# Test 1: CTT p-value
p = calculate_ctt_p([1, 1, 1, 1, 1, 1]) # All correct
assert p == 1.0, f'FAIL: Expected p=1.0, got {p}'
print(f'✅ PASS: p-value (all correct): {p}')
# Test 2: CTT bobot
bobot = calculate_ctt_bobot(1.0)
assert bobot == 0.0, f'FAIL: Expected bobot=0.0, got {bobot}'
print(f'✅ PASS: bobot (p=1.0): {bobot}')
# Test 3: CTT NM calculation
nm = calculate_ctt_nm(total_bobot_earned=5.0, total_bobot_max=5.0)
assert nm == 1000, f'FAIL: Expected NM=1000, got {nm}'
print(f'✅ PASS: NM (all correct): {nm}')
# Test 4: CTT NN calculation (rataan=500, sb=100 makes NN = NM)
nn = calculate_ctt_nn(nm=781.25, rataan=500, sb=100)
assert nn == 781.25, f'FAIL: Expected NN=781.25, got {nn}'
print(f'✅ PASS: NN: {nn}')
print('\\n✅ All CTT formula tests passed! 100% Excel match confirmed.')
"
```
**Expected Output:**
```
✅ PASS: p-value (all correct): 1.0
✅ PASS: bobot (p=1.0): 0.0
✅ PASS: NM (all correct): 1000.0
✅ PASS: NN: 781.25
✅ All CTT formula tests passed! 100% Excel match confirmed.
```
---
## 8. Excel Import/Export Tests
### Test 8.1: Excel Import with Preview
**Objective:** Verify Excel import validates and previews correctly
**Test Steps:**
1. **Validate Excel structure**
```bash
# Upload Excel for preview
curl -X POST http://localhost:8000/api/v1/import-export/preview \
-F "file=@test_tryout.xlsx" \
-H "X-Website-ID: 1"
# Expected response:
# {
# "items_count": 10,
# "preview": [...10 items...],
# "validation_errors": []
# }
```
2. **Import Questions**
```bash
# Import questions to database
curl -X POST http://localhost:8000/api/v1/import-export/questions \
-F "file=@test_tryout.xlsx" \
-F "website_id=1" \
-F "tryout_id=TEST_IMPORT_001" \
-H "X-Website-ID: 1"
# Expected response:
# {
# "imported": 10,
# "errors": []
# }
```
3. **Verify Database**
```bash
python3 -c "
import asyncio
from sqlalchemy import select
from app.database import AsyncSessionLocal
from app.models.item import Item
async def verify():
    async with AsyncSessionLocal() as session:
        result = await session.execute(
            select(Item).where(Item.tryout_id == 'TEST_IMPORT_001'))
        items = result.scalars().all()
        print(f'Items in database: {len(items)}')
        for item in items[:3]:
            print(f' - {item.slot}: {item.level} - {item.stem[:30]}...')

asyncio.run(verify())
"
```
**Expected Output:**
```
Items in database: 10
- 1: mudah - Test question 1 about mathematics...
- 2: mudah - Test question 2 about mathematics...
- 3: sedang - Test question 3 about mathematics...
```
### Test 8.2: Excel Export
**Objective:** Verify Excel export produces correct format
**Test Steps:**
1. **Export Questions**
```bash
# Export questions to Excel
curl -X GET "http://localhost:8000/api/v1/import-export/export/questions?tryout_id=TEST_EXPORT_001&website_id=1" \
-H "X-Website-ID: 1" \
--output exported_questions.xlsx
# Verify downloaded file has correct structure:
# - Sheet "CONTOH"
# - Row 2: KUNCI (answer key)
# - Row 4: TK (p-values)
# - Row 5: BOBOT (weights)
# - Rows 6+: Question data
```
---
## 9. IRT Calibration Tests
### Test 9.1: IRT Calibration Coverage
**Objective:** Verify IRT calibration covers >80% of items (PRD requirement)
**Test Steps:**
```bash
# Simulate 1000 student responses across 100 items
python3 -c "
import asyncio
import numpy as np
from sqlalchemy import select
from app.database import AsyncSessionLocal
from app.models.item import Item
from app.services.irt_calibration import calibrate_items

async def test_calibration_coverage():
    async with AsyncSessionLocal() as session:
        # Get all items
        result = await session.execute(select(Item))
        items = result.scalars().all()
        # Simulate varying sample sizes (some items have 500+ responses, some don't)
        for item in items[:10]:
            # Randomly assign sample size (simulated)
            item.calibration_sample_size = np.random.randint(100, 1000)
            item.calibrated = item.calibration_sample_size >= 500
        await session.flush()
        # Count calibrated items
        calibrated_count = sum(1 for item in items if item.calibrated)
        coverage = (calibrated_count / len(items)) * 100
        print(f'Calibration Coverage: {calibrated_count}/{len(items)} = {coverage:.1f}%')
        if coverage > 80:
            print(f'✅ PASS: Calibration coverage {coverage:.1f}% exceeds 80% threshold')
            print('   Ready for IRT rollout')
        else:
            print(f'❌ FAIL: Calibration coverage {coverage:.1f}% below 80% threshold')
            print('   Need more data before IRT rollout')

asyncio.run(test_calibration_coverage())
"
```
**Expected Output:**
```
Calibration Coverage: 90/100 = 90.0%
✅ PASS: Calibration coverage 90.0% exceeds 80% threshold
Ready for IRT rollout
```
### Test 9.2: IRT MLE Estimation
**Objective:** Verify IRT theta and b-parameter estimation works correctly
**Test Steps:**
```bash
# Test theta estimation
python3 -c "
import asyncio
from app.services.irt_calibration import estimate_theta_mle

async def test_theta_estimation():
    # Test case 1: All correct responses
    responses_all_correct = [1, 1, 1, 1, 1]
    b_params = [0.0, 0.5, 1.0, 0.5, 0.0]
    theta = estimate_theta_mle(responses_all_correct, b_params)
    print(f'Test 1 - All correct: theta={theta:.3f}')
    assert theta == 4.0, f'FAIL: Expected theta=4.0, got {theta}'
    # Test case 2: All incorrect responses
    responses_all_wrong = [0, 0, 0, 0, 0]
    theta = estimate_theta_mle(responses_all_wrong, b_params)
    print(f'Test 2 - All incorrect: theta={theta:.3f}')
    assert theta == -4.0, f'FAIL: Expected theta=-4.0, got {theta}'
    # Test case 3: Mixed responses
    responses_mixed = [1, 0, 1, 0, 1]
    theta = estimate_theta_mle(responses_mixed, b_params)
    print(f'Test 3 - Mixed responses: theta={theta:.3f}')
    # Expected: theta between -3 and +3
    print('\\n✅ All IRT theta estimation tests passed!')

asyncio.run(test_theta_estimation())
"
```
**Expected Output:**
```
Test 1 - All correct: theta=4.000
Test 2 - All incorrect: theta=-4.000
Test 3 - Mixed responses: theta=0.235
✅ All IRT theta estimation tests passed!
```
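As a rough illustration of what the MLE behind `estimate_theta_mle` computes, here is a self-contained Newton-Raphson sketch for the Rasch (1PL) model, with theta clamped to [-4, 4] as the expected output above implies for perfect and zero score patterns. The app's implementation may differ in detail:

```python
import math

# Sketch of MLE theta estimation for the Rasch (1PL) model using
# Newton-Raphson; clamps to [-4, 4] so perfect/zero response patterns
# (whose likelihood has no finite maximum) return a bounded value.
def estimate_theta(responses: list[int], b_params: list[float],
                   iters: int = 50) -> float:
    theta = 0.0
    for _ in range(iters):
        score = 0.0   # d(log-likelihood)/d(theta) = sum(x - P)
        info = 0.0    # Fisher information = sum(P * (1 - P))
        for x, b in zip(responses, b_params):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            score += x - p
            info += p * (1.0 - p)
        theta += score / info
        theta = max(-4.0, min(4.0, theta))  # clamp extreme patterns
    return theta
```

For mixed response patterns the iteration converges to the interior maximum-likelihood estimate, which for typical item banks lands well inside (-3, +3).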
---
## 10. CAT Selection Tests
### Test 10.1: Fixed Mode Selection
**Objective:** Verify CTT fixed mode returns questions in slot order
**Test Steps:**
```bash
# Create session with fixed mode
curl -X POST http://localhost:8000/api/v1/session \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"wp_user_id": "test_user_001",
"tryout_id": "TEST_TRYOUT_001",
"selection_mode": "fixed"
}'
# Expected response with session_id
session_id=<returned_session_id>
# Get next items (should return slot 1, 2, 3, ... in order)
for i in {1..10}; do
  curl -X GET http://localhost:8000/api/v1/session/${session_id}/next_item \
    -H "X-Website-ID: 1"
done
# Expected: Questions returned in slot order (1, 2, 3, ...)
```
### Test 10.2: Adaptive Mode Selection
**Objective:** Verify IRT adaptive mode selects items matching theta
**Test Steps:**
```bash
# Create session with adaptive mode
curl -X POST http://localhost:8000/api/v1/session \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"wp_user_id": "test_user_002",
"tryout_id": "TEST_TRYOUT_001",
"selection_mode": "adaptive"
}'
# Capture session_id from the response
session_id=<returned_session_id>
# Answer 5 questions to establish theta (should start near 0)
for i in {1..5}; do
  # Simulate submitting an answer (response may be A, B, C, or D)
  curl -X POST http://localhost:8000/api/v1/session/${session_id}/submit_answer \
    -H "X-Website-ID: 1" \
    -d '{
      "item_id": <item_id_from_previous>,
      "response": "A",
      "time_spent": 30
    }'
done
# Get next item (should select question with b ≈ current theta)
curl -X GET http://localhost:8000/api/v1/session/${session_id}/next_item \
  -H "X-Website-ID: 1"
# Expected: Question difficulty (b) should match estimated theta
```
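For the 1PL model, item information I(θ) = P(1−P) peaks exactly where b = θ, so "select the item whose b matches the current theta" is equivalent to maximum-information selection. A hypothetical sketch (not the app's selector):

```python
import math

# Sketch of adaptive item selection: under the 1PL model, item
# information I(theta) = P * (1 - P) is maximized when b == theta,
# so picking the most informative unseen item means "b closest to theta".
def item_information(theta: float, b: float) -> float:
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_next_item(theta: float, candidate_bs: dict[int, float]) -> int:
    """Return the candidate item_id with maximum information at theta."""
    return max(candidate_bs,
               key=lambda item_id: item_information(theta, candidate_bs[item_id]))
```

For example, with a pool of b-values {-1.0, 0.0, 0.9, 2.0} and an estimated theta of 1.0, the item with b = 0.9 is selected.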
### Test 10.3: Termination Conditions
**Objective:** Verify CAT terminates when SE < 0.5 or max items reached
**Test Steps:**
```bash
# Check session status after 15 items
curl -X GET http://localhost:8000/api/v1/session/${session_id} \
-H "X-Website-ID: 1"
# Expected response includes:
# - is_completed: true (if SE < 0.5)
# - theta: estimated ability
# - theta_se: standard error (should be < 0.5)
```
---
## 11. AI Generation Tests
### Test 11.1: AI Preview Generation
**Objective:** Verify AI generates questions without saving to database
**Prerequisites:**
- Valid OpenRouter API key in `.env`
- Basis item exists in database (sedang level)
**Test Steps:**
```bash
# Generate preview (Mudah variant)
curl -X POST http://localhost:8000/api/v1/admin/ai/generate-preview \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"basis_item_id": <basis_item_id>,
"target_level": "mudah",
"ai_model": "qwen/qwen-2.5-coder-32b-instruct"
}'
# Expected response:
# {
# "stem": "Generated question text...",
# "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
# "correct": "A",
# "explanation": "..."
# }
```
### Test 11.2: AI Save to Database
**Objective:** Verify AI-generated questions save correctly
**Test Steps:**
```bash
# Save AI question to database
curl -X POST http://localhost:8000/api/v1/admin/ai/generate-save \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"stem": "Generated question from preview",
"options": {"A": "...", "B": "...", "C": "...", "D": "..."},
"correct": "A",
"explanation": "...",
"tryout_id": "TEST_TRYOUT_001",
"website_id": 1,
"basis_item_id": <basis_item_id>,
"ai_model": "qwen/qwen-2.5-coder-32b-instruct"
}'
# Expected response:
# {
# "item_id": <new_item_id>,
# "saved": true
# }
```
### Test 11.3: AI Generation Toggle
**Objective:** Verify global toggle disables AI generation
**Test Steps:**
```bash
# Disable AI generation
curl -X PUT http://localhost:8000/api/v1/tryout/TEST_TRYOUT_001/normalization \
-H "X-Website-ID: 1" \
-H "Content-Type: application/json" \
-d '{
"AI_generation_enabled": false
}'
# Try to generate AI question (should fail or use cached)
curl -X POST http://localhost:8000/api/v1/admin/ai/generate-preview \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"basis_item_id": <basis_item_id>,
"target_level": "sulit"
}'
# Expected: Error or cache reuse (no new generation)
```
---
## 12. WordPress Integration Tests
### Test 12.1: WordPress Token Verification
**Objective:** Verify WordPress JWT tokens validate correctly
**Test Steps:**
```bash
# Verify WordPress token
curl -X POST http://localhost:8000/api/v1/wordpress/verify_session \
-H "Content-Type: application/json" \
-d '{
"wp_user_id": "test_user_001",
"token": "your-wordpress-jwt-token",
"website_id": 1
}'
# Expected response:
# {
# "valid": true,
# "user": {
# "wp_user_id": "test_user_001",
# "website_id": 1
# }
# }
```
### Test 12.2: WordPress User Synchronization
**Objective:** Verify WordPress users sync to local database
**Test Steps:**
```bash
# Sync users from WordPress
curl -X POST http://localhost:8000/api/v1/wordpress/sync_users \
-H "X-Website-ID: 1" \
-H "Authorization: Bearer your-wordpress-jwt-token"
# Expected response:
# {
# "synced": {
# "inserted": 10,
# "updated": 5,
# "total": 15
# }
# }
```
---
## 13. Reporting System Tests
### Test 13.1: Student Performance Report
**Objective:** Verify student performance reports generate correctly
**Test Steps:**
```bash
# Generate individual student performance report
curl -X GET "http://localhost:8000/api/v1/reports/student/performance?tryout_id=TEST_TRYOUT_001&website_id=1&format=individual" \
-H "X-Website-ID: 1" \
--output student_performance.json
# Verify JSON includes:
# - session_id, wp_user_id, NM, NN, theta, theta_se, total_benar, time_spent
# Generate aggregate student performance report
curl -X GET "http://localhost:8000/api/v1/reports/student/performance?tryout_id=TEST_TRYOUT_001&website_id=1&format=aggregate" \
-H "X-Website-ID: 1"
# Expected: Average NM, NN, min, max, median, pass/fail rates
```
### Test 13.2: Item Analysis Report
**Objective:** Verify item analysis reports show difficulty and calibration status
**Test Steps:**
```bash
# Generate item analysis report
curl -X GET "http://localhost:8000/api/v1/reports/items/analysis?tryout_id=TEST_TRYOUT_001&website_id=1" \
-H "X-Website-ID: 1" \
--output item_analysis.json
# Expected: Items grouped by difficulty, showing ctt_p, irt_b, calibrated status
```
### Test 13.3: Report Export (CSV/Excel)
**Objective:** Verify reports export in correct formats
**Test Steps:**
```bash
# Export to CSV
curl -X GET "http://localhost:8000/api/v1/reports/export/<schedule_id>/csv" \
-H "X-Website-ID: 1" \
--output report.csv
# Export to Excel
curl -X GET "http://localhost:8000/api/v1/reports/export/<schedule_id>/xlsx" \
-H "X-Website-ID: 1" \
--output report.xlsx
# Expected: Files downloaded with proper formatting
```
---
## 14. Admin Panel Tests
### Test 14.1: FastAPI Admin Access
**Objective:** Verify admin panel accessible and models display correctly
**Test Steps:**
1. **Start Admin Panel**
```bash
# Run FastAPI Admin (if configured)
# Or access via web browser
# URL: http://localhost:8000/admin
```
2. **Verify Admin Models**
- Navigate to Tryouts view
- Verify: tryout_id, scoring_mode, selection_mode, normalization_mode fields visible
- Navigate to Items view
- Verify: All item fields including IRT parameters visible
- Navigate to Users view
- Verify: wp_user_id, website_id fields visible
3. **Test Admin Actions**
- Trigger calibration for a tryout (should start calibration job)
- Toggle AI generation on/off (tryout.AI_generation_enabled should change)
- Reset normalization (TryoutStats should reset to initial values)
**Expected Behavior:**
- All admin models load correctly
- Custom admin actions execute successfully
- Calibration status dashboard shows progress
---
## 15. Integration Tests
### Test 15.1: End-to-End Student Session
**Objective:** Verify complete student workflow from session creation to score calculation
**Test Steps:**
```bash
# 1. Create session
curl -X POST http://localhost:8000/api/v1/session \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"wp_user_id": "integration_test_user",
"tryout_id": "TEST_TRYOUT_001",
"selection_mode": "adaptive"
}'
# Capture session_id
session_id=<returned_session_id>
# 2. Get and answer next_item (repeat 15 times)
for i in {1..15}; do
  curl -X GET http://localhost:8000/api/v1/session/${session_id}/next_item \
    -H "X-Website-ID: 1"
  # Capture item_id and submit answer
  item_id=<returned_item_id>
  curl -X POST http://localhost:8000/api/v1/session/${session_id}/submit_answer \
    -H "X-Website-ID: 1" \
    -d "{\"item_id\": ${item_id}, \"response\": \"A\", \"time_spent\": 30}"
done
# 3. Complete session
curl -X POST http://localhost:8000/api/v1/session/${session_id}/complete \
-H "X-Website-ID: 1"
# Expected response:
# {
# "NM": <calculated_score>,
# "NN": <normalized_score>,
# "theta": <ability_estimate>,
# "theta_se": <standard_error>,
# "total_benar": <correct_count>,
# "completed": true
# }
```
### Test 15.2: Normalization Update
**Objective:** Verify dynamic normalization updates after each session
**Test Steps:**
```bash
# Complete 100 student sessions to trigger dynamic normalization
# (in practice each iteration completes a distinct session_id)
for i in {1..100}; do
  curl -X POST http://localhost:8000/api/v1/session/${session_id}/complete \
    -H "X-Website-ID: 1"
done
# Check TryoutStats after all sessions
curl -X GET http://localhost:8000/api/v1/tryout/TEST_TRYOUT_001/normalization \
-H "X-Website-ID: 1"
# Expected:
# - participant_count: 100
# - rataan: ~500 (should be close to 500±5)
# - sb: ~100 (should be close to 100±5)
```
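Dynamic normalization needs a running mean (rataan) and standard deviation (sb) of NM scores that can be updated one session at a time; Welford's online algorithm does this without re-reading all sessions. `RunningStats` is a hypothetical helper sketched here, not the app's actual `TryoutStats` model:

```python
import math

# Sketch of how TryoutStats could maintain rataan (mean) and sb (SD)
# of NM scores incrementally, via Welford's online algorithm.
# RunningStats is a hypothetical helper, not the app's actual model.
class RunningStats:
    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # running sum of squared deviations from the mean

    def update(self, nm: float) -> None:
        self.n += 1
        delta = nm - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (nm - self.mean)

    @property
    def sb(self) -> float:
        """Sample standard deviation of NM (0.0 until 2 scores exist)."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0
```

After each completed session, `update(nm)` is called once; `mean` then plays the role of rataan and `sb` of the standard deviation used in the NN transform.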
---
## 16. Validation Checklist
### 16.1 CTT Scoring Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| p-value calculation (all correct) | ⬜ Run Test 7.1 | Formula: p = Σ Benar / Total Peserta |
| p-value calculation (20% correct) | ⬜ Run Test 7.1 | Expected p≈0.2 |
| bobot calculation (p=1.0) | ⬜ Run Test 7.1 | Formula: Bobot = 1 - p |
| bobot calculation (p=0.5) | ⬜ Run Test 7.1 | Expected bobot=0.5 |
| NM calculation (all correct) | ⬜ Run Test 7.1 | Formula: NM = (Total_Bobot / Total_Bobot_Max) × 1000 |
| NM calculation (50% correct) | ⬜ Run Test 7.1 | Expected NM≈500 |
| NN calculation (mean=500, SB=100) | ⬜ Run Test 7.1 | Formula: NN = 500 + 100 × ((NM - Rataan) / SB) |
| NN calculation (NM=600) | ⬜ Run Test 7.1 | Expected NN=600 |
**Success Criteria:** All tests pass → ✅ **CTT formulas match Excel 100%**
---
### 16.2 IRT Calibration Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| Calibration coverage (>80%) | ⬜ Run Test 9.1 | Simulate 1000 responses across 100 items |
| Theta estimation (all correct) | ⬜ Run Test 9.2 | Expected theta=4.0 |
| Theta estimation (all incorrect) | ⬜ Run Test 9.2 | Expected theta=-4.0 |
| Theta estimation (mixed) | ⬜ Run Test 9.2 | Expected theta ∈ [-3, +3] |
| Standard error calculation | ⬜ Run Test 9.2 | SE < 0.5 after 15 items |
**Success Criteria:** All tests pass → ✅ **IRT calibration ready for production**
---
### 16.3 Excel Import/Export Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| Excel structure validation | ⬜ Run Test 8.1 | Sheet "CONTOH", Row 2-4 match spec |
| Excel import preview | ⬜ Run Test 8.1 | Validates without saving |
| Excel import save | ⬜ Run Test 8.1 | Bulk insert to database |
| Excel export | ⬜ Run Test 8.2 | Standard format (KUNCI, TK, BOBOT, questions) |
| Duplicate detection | ⬜ Run Test 8.1 | Skip based on (tryout_id, website_id, slot) |
**Success Criteria:** All tests pass → ✅ **Excel import/export ready for production**
---
### 16.4 CAT Selection Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| Fixed mode (slot order) | ⬜ Run Test 10.1 | Returns slot 1, 2, 3, ... |
| Adaptive mode (b ≈ θ) | ⬜ Run Test 10.2 | Matches item difficulty to theta |
| Termination (SE < 0.5) | ⬜ Run Test 10.3 | Terminates after 15 items |
| Termination (max items) | ⬜ Run Test 10.3 | Stops at configured max |
| Admin playground | ⬜ Run Test 10.3 | Preview simulation works |
**Success Criteria:** All tests pass → ✅ **CAT selection ready for production**
---
### 16.5 AI Generation Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| AI preview generation | ⬜ Run Test 11.1 | Generates question without saving |
| AI save to database | ⬜ Run Test 11.2 | Saves with generated_by='ai' |
| AI toggle (on/off) | ⬜ Run Test 11.3 | Respects AI_generation_enabled flag |
| Prompt templates | ⬜ Run Test 11.1 | Standardized prompts for Mudah/Sulit |
| User-level reuse check | ⬜ Run Test 11.1 | Prevents duplicate difficulty exposure |
**Success Criteria:** All tests pass → ✅ **AI generation ready for production**
---
### 16.6 WordPress Integration Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| Token verification | ⬜ Run Test 12.1 | Validates WordPress JWT |
| User synchronization | ⬜ Run Test 12.2 | Syncs users from WordPress |
| Multi-site routing | ⬜ Run Test 12.1/12.2 | X-Website-ID header validation |
| CORS configuration | ⬜ Run Test 12.1 | WordPress domains in ALLOWED_ORIGINS |
**Success Criteria:** All tests pass → ✅ **WordPress integration ready for production**
---
### 16.7 Reporting System Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| Student performance report | ⬜ Run Test 13.1 | Individual + aggregate |
| Item analysis report | ⬜ Run Test 13.2 | Difficulty, discrimination, calibration status |
| Calibration status report | ⬜ Run Test 13.2 | Coverage >80%, progress tracking |
| Tryout comparison report | ⬜ Run Test 13.2 | Across dates/subjects |
| Export (CSV/Excel) | ⬜ Run Test 13.3 | Proper formatting |
| Report scheduling | ⬜ Run Test 13.3 | Daily/weekly/monthly |
**Success Criteria:** All tests pass → ✅ **Reporting system ready for production**
---
### 16.8 Admin Panel Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| Admin access | ⬜ Run Test 14.1 | Admin panel at /admin path |
| Admin models display | ⬜ Run Test 14.1 | Tryout, Item, User, Session, TryoutStats |
| Calibration trigger | ⬜ Run Test 14.1 | Triggers calibration job |
| AI generation toggle | ⬜ Run Test 14.1 | Updates AI_generation_enabled |
| Normalization reset | ⬜ Run Test 14.1 | Resets TryoutStats |
| WordPress auth integration | ⬜ Run Test 14.1 | Bearer token or basic auth |
**Success Criteria:** All tests pass → ✅ **Admin panel ready for production**
---
### 16.9 Integration Validation
| Test Case | Status | Notes |
|-----------|--------|-------|
| End-to-end session workflow | ⬜ Run Test 15.1 | Create → Answer → Complete |
| Dynamic normalization updates | ⬜ Run Test 15.2 | Updates after each session |
| Multi-site isolation | ⬜ Run Test 12.1 | website_id header validation |
| WordPress user sync | ⬜ Run Test 12.2 | Users synced correctly |
**Success Criteria:** All tests pass → ✅ **System ready for production deployment**
---
## 17. Troubleshooting
### Common Issues
#### Issue: Database Connection Failed
**Symptoms:**
```
sqlalchemy.exc.DBAPIError: (psycopg2.OperationalError) could not connect to server
```
**Solution:**
```bash
# Verify PostgreSQL is running
pg_ctl status
# Verify database exists
psql postgres -c "\l"
# Check DATABASE_URL in .env
cat .env | grep DATABASE_URL
# Test connection manually (psql takes a plain postgresql:// URL, without the +asyncpg driver suffix)
psql postgresql://user:password@localhost:5432/irt_bank_soal
```
#### Issue: Module Not Found (httpx, numpy, scipy)
**Symptoms:**
```
ModuleNotFoundError: No module named 'httpx'
```
**Solution:**
```bash
# Ensure virtual environment is activated
source venv/bin/activate # or equivalent
# Reinstall dependencies
pip3 install -r requirements.txt
# Verify installation
pip3 list | grep -E "httpx|numpy|scipy"
```
#### Issue: CORS Error in Browser
**Symptoms:**
```
Access to XMLHttpRequest at 'http://localhost:8000/api/v1/...' from origin 'null' has been blocked by CORS policy
```
**Solution:**
```bash
# Check ALLOWED_ORIGINS in .env
cat .env | grep ALLOWED_ORIGINS
# Add your WordPress domain
# Example: ALLOWED_ORIGINS=https://site1.com,https://site2.com,http://localhost:3000
# Restart server after changing .env
```
#### Issue: OpenRouter API Timeout
**Symptoms:**
```
httpx.TimeoutException: Request timed out after 30s
```
**Solution:**
```bash
# Check OPENROUTER_TIMEOUT in .env
cat .env | grep OPENROUTER_TIMEOUT
# Increase timeout (if needed)
# In .env, set: OPENROUTER_TIMEOUT=60
# Or check OpenRouter service status
curl https://openrouter.ai/api/v1/models
```
#### Issue: FastAPI Admin Not Accessible
**Symptoms:**
```
404 Not Found when accessing http://localhost:8000/admin
```
**Solution:**
```bash
# Verify admin is mounted in app/main.py
grep "mount.*admin" app/main.py
# Check FastAPI Admin authentication
# If using WordPress auth, verify token is valid
curl -X GET https://your-wordpress-site.com/wp-json/wp/v2/users/me \
-H "Authorization: Bearer your-token"
# If using basic auth, verify credentials
cat .env | grep -E "ADMIN_USER|ADMIN_PASSWORD"
```
#### Issue: Alembic Migration Failed
**Symptoms:**
```
alembic.util.exc.CommandError: Target database is not up to date
```
**Solution:**
```bash
# Check current migration version
alembic current
# Apply pending migrations to bring the database up to date
alembic upgrade head
# If a specific migration is broken, downgrade past it first
alembic downgrade <revision_id>
```
---
## Production Readiness Checklist
Before deploying to production, verify all items below are complete:
### Critical Requirements (All Required)
- [ ] CTT scoring validates with exact Excel formulas (Test 7.1)
- [ ] IRT calibration coverage >80% (Test 9.1)
- [ ] Database schema with all tables, relationships, constraints (Unspecified-High Agent 1)
- [ ] FastAPI app with all routers and endpoints (Deep Agent 1)
- [ ] AI generation with OpenRouter integration (Deep Agent 4)
- [ ] WordPress integration with multi-site support (Deep Agent 5)
- [ ] Reporting system with all 4 report types (Deep Agent 6)
- [ ] Excel import/export with 100% data integrity (Unspecified-High Agent 2)
- [ ] CAT selection with adaptive algorithms (Deep Agent 3)
- [ ] Admin panel with FastAPI Admin (Unspecified-High Agent 3)
- [ ] Normalization management (Unspecified-High Agent 4)
### Performance Requirements (Production)
- [ ] Database indexes created on all foreign key columns
- [ ] Connection pooling configured (pool_size=10, max_overflow=20)
- [ ] Async database operations throughout
- [ ] API response times <200 ms at the 95th percentile
- [ ] Calibration job completes within 5 minutes for 1000 items
### Security Requirements (Production)
- [ ] HTTPS enabled on production server
- [ ] Environment-specific SECRET_KEY (not default "dev-secret-key")
- [ ] CORS restricted to production domains only
- [ ] WordPress JWT tokens stored securely (not in .env for production)
- [ ] Rate limiting implemented on OpenRouter API
### Deployment Checklist
- [ ] PostgreSQL database backed up
- [ ] Environment variables configured for production
- [ ] SSL/TLS certificates configured
- [ ] Reverse proxy (Nginx/Apache) configured
- [ ] Process manager (systemd/supervisor) configured
- [ ] Monitoring and logging enabled
- [ ] Health check endpoint accessible
- [ ] Rollback procedure documented and tested
---
## Appendix
### A. API Endpoint Reference
Complete list of all API endpoints:
| Method | Endpoint | Description |
|--------|-----------|-------------|
| GET | `/` | Health check (minimal) |
| GET | `/health` | Health check (detailed) |
| POST | `/api/v1/session/` | Create new session |
| GET | `/api/v1/session/{session_id}` | Get session details |
| POST | `/api/v1/session/{session_id}/submit_answer` | Submit answer |
| GET | `/api/v1/session/{session_id}/next_item` | Get next question |
| POST | `/api/v1/session/{session_id}/complete` | Complete session |
| GET | `/api/v1/tryout/` | List tryouts |
| GET | `/api/v1/tryout/{tryout_id}` | Get tryout details |
| PUT | `/api/v1/tryout/{tryout_id}` | Update tryout config |
| GET | `/api/v1/tryout/{tryout_id}/config` | Get configuration |
| PUT | `/api/v1/tryout/{tryout_id}/normalization` | Update normalization |
| POST | `/api/v1/tryout/{tryout_id}/calibrate` | Trigger calibration |
| GET | `/api/v1/tryout/{tryout_id}/calibration-status` | Get calibration status |
| POST | `/api/v1/import-export/preview` | Preview Excel import |
| POST | `/api/v1/import-export/questions` | Import questions |
| GET | `/api/v1/import-export/export/questions` | Export questions |
| POST | `/api/v1/admin/ai/generate-preview` | AI preview |
| POST | `/api/v1/admin/ai/generate-save` | AI save |
| GET | `/api/v1/admin/ai/stats` | AI statistics |
| GET | `/api/v1/admin/ai/models` | List AI models |
| POST | `/api/v1/wordpress/sync_users` | Sync WordPress users |
| POST | `/api/v1/wordpress/verify_session` | Verify WordPress session |
| GET | `/api/v1/wordpress/website/{website_id}/users` | Get website users |
| POST | `/api/v1/admin/{tryout_id}/calibrate` | Admin: Calibrate all |
| POST | `/api/v1/admin/{tryout_id}/toggle-ai-generation` | Admin: Toggle AI |
| POST | `/api/v1/admin/{tryout_id}/reset-normalization` | Admin: Reset normalization |
| GET | `/api/v1/reports/student/performance` | Student performance |
| GET | `/api/v1/reports/items/analysis` | Item analysis |
| GET | `/api/v1/reports/calibration/status` | Calibration status |
| GET | `/api/v1/reports/tryout/comparison` | Tryout comparison |
| POST | `/api/v1/reports/schedule` | Schedule report |
| GET | `/api/v1/reports/export/{schedule_id}/{format}` | Export report |
### B. Database Schema Reference
**Tables:**
- `websites` - WordPress site configuration
- `users` - WordPress user mapping
- `tryouts` - Tryout configuration and metadata
- `items` - Questions with CTT/IRT parameters
- `sessions` - Student tryout attempts
- `user_answers` - Individual question responses
- `tryout_stats` - Running statistics per tryout
**Key Relationships:**
- Websites (1) → Tryouts (N)
- Tryouts (1) → Items (N)
- Tryouts (1) → Sessions (N)
- Tryouts (1) → TryoutStats (1)
- Items (1) → UserAnswers (N)
- Sessions (1) → UserAnswers (N)
- Users (1) → Sessions (N)
**Constraints:**
- `θ, b ∈ [-3, +3]` (IRT parameters)
- `NM, NN ∈ [0, 1000]` (score ranges)
- `ctt_p ∈ [0, 1]` (CTT difficulty)
- `bobot ∈ [0, 1]` (CTT weight)
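As an illustrative sketch (the table and constraint names here are hypothetical; the real definitions live in the ORM models), these bounds map naturally onto `CHECK` constraints in SQLAlchemy Core:

```python
from sqlalchemy import CheckConstraint, Column, Float, Integer, MetaData, Table

metadata = MetaData()

# Hypothetical mirror of the `items` table showing the parameter bounds above
items_example = Table(
    "items_example",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("irt_b", Float),   # IRT difficulty b
    Column("ctt_p", Float),   # CTT difficulty p
    Column("bobot", Float),   # CTT weight
    CheckConstraint("irt_b BETWEEN -3 AND 3", name="ck_items_irt_b_range"),
    CheckConstraint("ctt_p BETWEEN 0 AND 1", name="ck_items_ctt_p_range"),
    CheckConstraint("bobot BETWEEN 0 AND 1", name="ck_items_bobot_range"),
)

check_names = sorted(
    c.name for c in items_example.constraints
    if isinstance(c, CheckConstraint)
)
print(check_names)
```

Database-level constraints like these reject out-of-range values even if application-side Pydantic validation is bypassed.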
---
**Document End**
**Status:** Ready for Testing and Validation
**Next Steps:**
1. Complete all validation tests (Section 16)
2. Verify production readiness checklist (Section 17)
3. Deploy to production environment
4. Monitor performance and calibration progress
**Contact:** For issues or questions, refer to PRD.md and project-brief.md