IRT Bank Soal - Test Walkthrough & Validation Guide
Document Version: 1.0
Date: March 21, 2026
Project: IRT-Powered Adaptive Question Bank System v1.2.0
Table of Contents
- Prerequisites
- Environment Setup
- Installation
- Database Setup
- Configuration
- Starting the Application
- Core Functionality Tests
- Excel Import/Export Tests
- IRT Calibration Tests
- CAT Selection Tests
- AI Generation Tests
- WordPress Integration Tests
- Reporting System Tests
- Admin Panel Tests
- Integration Tests
- Validation Checklist
- Troubleshooting
1. Prerequisites
Required Software
| Software | Minimum Version | Recommended Version |
|---|---|---|
| Python | 3.10+ | 3.11+ |
| PostgreSQL | 14+ | 15+ |
| npm/node | Not required | Latest LTS |
Required Python Packages
All packages listed in requirements.txt:
- fastapi
- uvicorn[standard]
- sqlalchemy
- asyncpg
- alembic
- pydantic
- pydantic-settings
- openpyxl
- pandas
- numpy
- scipy
- openai
- httpx
- celery
- redis
- fastapi-admin
- python-dotenv
Optional Development Tools
- Docker (for containerized development)
- pgAdmin (for database management)
- Postman / curl (for API testing)
- IDE with Python LSP support (VSCode, PyCharm)
2. Environment Setup
Step 2.1: Clone/Extract Repository
# Navigate to project directory
cd /Users/dwindown/Applications/tryout-system
# Verify structure
ls -la
# Expected: app/, app/models/, app/routers/, app/services/, tests/, requirements.txt, .env.example
Step 2.2: Copy Environment Configuration
# Copy environment template
cp .env.example .env
# Edit .env with your values
nano .env # or use your preferred editor
# Required configuration:
DATABASE_URL=postgresql+asyncpg://user:password@localhost:5432/irt_bank_soal
SECRET_KEY=your-secret-key-here-change-in-production
OPENROUTER_API_KEY=your-openrouter-api-key-here
# WordPress Integration (optional for testing)
WORDPRESS_API_URL=https://your-wordpress-site.com/wp-json
WORDPRESS_AUTH_TOKEN=your-jwt-token
# Redis (optional, for Celery task queue)
REDIS_URL=redis://localhost:6379/0
Step 2.3: Create Virtual Environment
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
# Verify activation
which python3 # Should show venv/bin/python3
Step 2.4: Install Dependencies
# Install all required packages
pip3 install -r requirements.txt
# Verify installation
pip3 list | grep -E "fastapi|sqlalchemy|numpy|scipy|httpx|openpyxl"
# Expected: All packages listed should be installed
3. Installation
Step 3.1: Database Setup
# Connect to PostgreSQL
psql postgres
# Create the database (if it does not already exist)
CREATE DATABASE irt_bank_soal;
# Connect to the new database to verify it exists
\c irt_bank_soal
# Exit PostgreSQL
\q
Step 3.2: Initialize Alembic Migrations
# Initialize Alembic
alembic init alembic
# Generate initial migration
alembic revision --autogenerate -m "Initial migration"
# Apply migration to database
alembic upgrade head
# Expected: Creates alembic/versions/ directory with initial migration file
Step 3.3: Verify Database Connection
# Run database initialization test
python3 -c "
import asyncio
from app.database import init_db
from app.core.config import get_settings

async def test():
    await init_db()
    print('✅ Database initialized successfully')
    print(f'✅ Database URL: {get_settings().DATABASE_URL}')

asyncio.run(test())
"
4. Database Setup
Step 4.1: Create Test Excel File
Create a test Excel file test_tryout.xlsx with the following structure:
| Sheet | Row | Content |
|---|---|---|
| CONTOH | 2 | KUNCI (answer key) - A, B, C, D, A, B, C, D, A, B |
| CONTOH | 4 | TK (p-values, each in [0, 1]) - 0.5, 0.6, 0.7, 0.8, 0.9, 0.4, 0.3, 0.2, 0.6, 0.5 |
| CONTOH | 5 | BOBOT (weights, = 1 - p) - 0.5, 0.4, 0.3, 0.2, 0.1, 0.6, 0.7, 0.8, 0.4, 0.5 |
| CONTOH | 6+ | Question data (10 questions) |
Question Data Format (Rows 6-15):
- Column A: Slot (1, 2, 3, ..., 10)
- Column B: Level (mudah, sedang, sulit)
- Column C: Soal text
- Column D: Option A
- Column E: Option B
- Column F: Option C
- Column G: Option D
- Column H: Correct (A, B, C, or D)
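Rather than building the workbook by hand, the layout above can be scripted with openpyxl (already in requirements.txt). This is a hedged sketch: the cell positions follow the table above, and the sample KUNCI/TK/BOBOT values are illustrative, not canonical test data.

```python
# Sketch: generate test_tryout.xlsx in the layout described above.
# KUNCI in row 2, TK in row 4, BOBOT in row 5, question rows from row 6.
# Adjust values and positions to match your actual template.

keys = ["A", "B", "C", "D", "A", "B", "C", "D", "A", "B"]   # KUNCI, 10 answers
tk = [0.5, 0.6, 0.7, 0.8, 0.9, 0.4, 0.3, 0.2, 0.6, 0.5]    # TK (p-values in [0, 1])
bobot = [round(1 - p, 2) for p in tk]                        # BOBOT = 1 - p

questions = []
for i in range(1, 11):
    level = "mudah" if i <= 3 else "sedang" if i <= 7 else "sulit"
    questions.append([i, level, f"Test question {i}",
                      f"Option A for Q{i}", f"Option B for Q{i}",
                      f"Option C for Q{i}", f"Option D for Q{i}",
                      keys[i - 1]])

try:
    from openpyxl import Workbook  # listed in requirements.txt
    wb = Workbook()
    ws = wb.active
    ws.title = "CONTOH"
    for col, (k, p, b) in enumerate(zip(keys, tk, bobot), start=2):
        ws.cell(row=2, column=col, value=k)   # Row 2: KUNCI
        ws.cell(row=4, column=col, value=p)   # Row 4: TK
        ws.cell(row=5, column=col, value=b)   # Row 5: BOBOT
    for r, q in enumerate(questions, start=6):  # Rows 6+: question data
        for c, value in enumerate(q, start=1):
            ws.cell(row=r, column=c, value=value)
    wb.save("test_tryout.xlsx")
    print("wrote test_tryout.xlsx")
except ImportError:
    print("openpyxl not installed; data prepared but file not written")
```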
Step 4.2: Load Test Data
# Python script to load test data
python3 -c "
import asyncio
from sqlalchemy import select
from app.database import AsyncSessionLocal
from app.models.item import Item
from app.models.tryout import Tryout

async def load_test_data():
    async with AsyncSessionLocal() as session:
        # Check if test data exists
        result = await session.execute(select(Tryout).where(Tryout.tryout_id == 'TEST_TRYOUT_001'))
        existing = result.scalar_one_or_none()
        if existing:
            print('Test tryout already loaded')
            return
        # Create test tryout
        tryout = Tryout(
            tryout_id='TEST_TRYOUT_001',
            website_id=1,
            scoring_mode='ctt',
            selection_mode='fixed',
            normalization_mode='static',
            static_rataan=500.0,
            static_sb=100.0,
            min_sample_for_dynamic=100,
            AI_generation_enabled=False,
        )
        session.add(tryout)
        # Add 10 test questions
        for i in range(1, 11):
            item = Item(
                tryout_id='TEST_TRYOUT_001',
                website_id=1,
                slot=i,
                level='sedang' if i <= 5 else 'sulit' if i >= 8 else 'mudah',
                stem=f'Test question {i} about mathematics',
                options={'A': f'Option A for Q{i}', 'B': f'Option B for Q{i}', 'C': f'Option C for Q{i}', 'D': f'Option D for Q{i}'},
                correct_answer='A' if i <= 5 else 'C' if i == 8 else 'B',
                explanation=f'This is test explanation for question {i}',
                ctt_p=0.5,
                ctt_bobot=0.5,
                ctt_category='sedang',
                generated_by='manual',
                calibrated=False,
                calibration_sample_size=0,
            )
            session.add(item)
        await session.commit()
        print('✅ Test data loaded successfully')

asyncio.run(load_test_data())
"
5. Configuration
Step 5.1: Verify Configuration
# Test configuration loading
python3 -c "
from app.core.config import get_settings

settings = get_settings()
print('Configuration:')
print(f'  Database URL: {settings.DATABASE_URL}')
print(f'  Environment: {settings.ENVIRONMENT}')
print(f'  API Prefix: {settings.API_V1_STR}')
print(f'  Project Name: {settings.PROJECT_NAME}')
print(f'  OpenRouter Model QWEN: {settings.OPENROUTER_MODEL_QWEN}')
print(f'  OpenRouter Model Llama: {settings.OPENROUTER_MODEL_LLAMA}')
print(f'  WordPress API URL: {settings.WORDPRESS_API_URL}')
"
# Expected: All environment variables loaded correctly
Step 5.2: Test Normalization Modes
Verify all three normalization modes work:
| Mode | Description | Configuration |
|---|---|---|
| Static | Uses hardcoded rataan=500, sb=100 from config | normalization_mode='static' |
| Dynamic | Calculates real-time from participant NM scores | normalization_mode='auto' |
| Hybrid | Static until threshold (100 participants), then dynamic | normalization_mode='hybrid' |
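The mode table above can be summarized as a small dispatch function. This is an illustrative sketch, not the project's actual API; resolve_norm_params and its arguments are hypothetical names.

```python
# Sketch of how the three normalization modes could resolve the (rataan, sb)
# pair used by the NN formula. Hypothetical helper, not the project's code.
STATIC_RATAAN, STATIC_SB = 500.0, 100.0

def resolve_norm_params(mode, participant_count, dynamic_rataan, dynamic_sb,
                        min_sample_for_dynamic=100):
    """Return the (rataan, sb) to plug into NN = 500 + 100 * ((NM - rataan) / sb)."""
    if mode == "static":
        return STATIC_RATAAN, STATIC_SB
    if mode == "auto":                      # always use live statistics
        return dynamic_rataan, dynamic_sb
    if mode == "hybrid":                    # static until enough participants
        if participant_count < min_sample_for_dynamic:
            return STATIC_RATAAN, STATIC_SB
        return dynamic_rataan, dynamic_sb
    raise ValueError(f"unknown normalization mode: {mode}")

print(resolve_norm_params("hybrid", 42, 612.0, 88.0))    # below threshold -> static
print(resolve_norm_params("hybrid", 150, 612.0, 88.0))   # above threshold -> dynamic
```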
6. Starting the Application
Step 6.1: Start FastAPI Server
# Start FastAPI server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
# Expected output:
# INFO: Started server process [12345]
# INFO: Waiting for application startup.
# INFO: Application startup complete.
# INFO: Uvicorn running on http://0.0.0.0:8000
Step 6.2: Verify Health Check
# Test health endpoint
curl http://localhost:8000/
# Expected response:
# {
# "status": "healthy",
# "project_name": "IRT Bank Soal",
# "version": "1.0.0"
# }
# Test detailed health endpoint
curl http://localhost:8000/health
# Expected response:
# {
# "status": "healthy",
# "database": "connected",
# "api_version": "v1"
# }
7. Core Functionality Tests
Test 7.1: CTT Scoring Validation
Objective: Verify the CTT formulas match the Excel workbook exactly
Test Cases:
1. CTT p-value calculation
   - Input: 10 responses, 5 correct → p = 5/10 = 0.5
   - Expected: p = 0.5
   - Formula: p = Σ Benar / Total Peserta
2. CTT bobot calculation
   - Input: p = 0.5 → bobot = 1 - 0.5 = 0.5
   - Expected: bobot = 0.5
   - Formula: Bobot = 1 - p
3. CTT NM calculation
   - Input: 5 questions, bobot_earned = 2.5, total_bobot_max = 3.2
   - Expected: NM = (2.5 / 3.2) × 1000 = 781.25
   - Formula: NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
4. CTT NN calculation
   - Input: NM = 781.25, rataan = 700, sb = 100
   - Expected: NN = 500 + 100 × ((781.25 - 700) / 100) = 581.25
   - Formula: NN = 500 + 100 × ((NM - Rataan) / SB)
Validation Method:
# Run CTT scoring validation tests
python3 -c "
import sys
sys.path.insert(0, '/Users/dwindown/Applications/tryout-system')
from app.services.ctt_scoring import calculate_ctt_p, calculate_ctt_bobot, calculate_ctt_nm, calculate_ctt_nn
# Test 1: CTT p-value
p = calculate_ctt_p([1, 1, 1, 1, 1, 1]) # All correct
assert p == 1.0, f'FAIL: Expected p=1.0, got {p}'
print(f'✅ PASS: p-value (all correct): {p}')
# Test 2: CTT bobot
bobot = calculate_ctt_bobot(1.0)
assert bobot == 0.0, f'FAIL: Expected bobot=0.0, got {bobot}'
print(f'✅ PASS: bobot (p=1.0): {bobot}')
# Test 3: CTT NM calculation (full credit: earned bobot equals max bobot)
nm = calculate_ctt_nm(total_bobot_earned=5.0, total_bobot_max=5.0)
assert nm == 1000, f'FAIL: Expected NM=1000, got {nm}'
print(f'✅ PASS: NM (all correct): {nm}')
# Test 4: CTT NN calculation
nn = calculate_ctt_nn(nm=781.25, rataan=700, sb=100)
assert nn == 581.25, f'FAIL: Expected NN=581.25, got {nn}'
print(f'✅ PASS: NN: {nn}')
print('\\n✅ All CTT formula tests passed! 100% Excel match confirmed.')
"
Expected Output:
✅ PASS: p-value (all correct): 1.0
✅ PASS: bobot (p=1.0): 0.0
✅ PASS: NM (all correct): 1000.0
✅ PASS: NN: 581.25
✅ All CTT formula tests passed! 100% Excel match confirmed.
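For reference, the four formulas above can also be expressed in plain Python, independent of the app code (the actual app.services.ctt_scoring signatures may differ; this shows only the arithmetic):

```python
# Plain-Python reference versions of the four CTT formulas.

def ctt_p(responses):
    """p = (number correct) / (number of participants); responses are 0/1."""
    return sum(responses) / len(responses)

def ctt_bobot(p):
    """Bobot = 1 - p (harder items carry more weight)."""
    return 1 - p

def ctt_nm(total_bobot_earned, total_bobot_max):
    """NM = (earned weight / max weight) * 1000."""
    return (total_bobot_earned / total_bobot_max) * 1000

def ctt_nn(nm, rataan, sb):
    """NN = 500 + 100 * ((NM - Rataan) / SB)."""
    return 500 + 100 * ((nm - rataan) / sb)

print(ctt_p([1, 1, 1, 1, 1, 0, 0, 0, 0, 0]))   # 0.5
print(ctt_bobot(0.5))                            # 0.5
print(ctt_nm(2.5, 3.2))                          # 781.25
print(ctt_nn(781.25, rataan=700, sb=100))        # 581.25
```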
8. Excel Import/Export Tests
Test 8.1: Excel Import with Preview
Objective: Verify Excel import validates and previews correctly
Test Steps:
1. Validate Excel structure
# Upload Excel for preview
curl -X POST http://localhost:8000/api/v1/import-export/preview \
  -F "file=@test_tryout.xlsx" \
  -H "X-Website-ID: 1"
# Expected response:
# {
#   "items_count": 10,
#   "preview": [...10 items...],
#   "validation_errors": []
# }
2. Import Questions
# Import questions to the database (each form field is a separate -F flag)
curl -X POST http://localhost:8000/api/v1/import-export/questions \
  -F "file=@test_tryout.xlsx" \
  -F "website_id=1" \
  -F "tryout_id=TEST_IMPORT_001" \
  -H "X-Website-ID: 1"
# Expected response:
# {
#   "imported": 10,
#   "errors": []
# }
3. Verify Database
python3 -c "
import asyncio
from sqlalchemy import select
from app.database import AsyncSessionLocal
from app.models.item import Item

async def verify():
    async with AsyncSessionLocal() as session:
        result = await session.execute(select(Item).where(Item.tryout_id == 'TEST_IMPORT_001'))
        items = result.scalars().all()
        print(f'Items in database: {len(items)}')
        for item in items[:3]:
            print(f'  - {item.slot}: {item.level} - {item.stem[:30]}...')

asyncio.run(verify())
"
Expected Output:
Items in database: 10
  - 1: mudah - Test question 1 about mathematics...
  - 2: mudah - Test question 2 about mathematics...
  - 3: sedang - Test question 3 about mathematics...
Test 8.2: Excel Export
Objective: Verify Excel export produces the correct format
Test Steps:
1. Export Questions
# Export questions to Excel (quote the URL so the shell does not split on &)
curl -X GET "http://localhost:8000/api/v1/import-export/export/questions?tryout_id=TEST_EXPORT_001&website_id=1" \
  -H "X-Website-ID: 1" \
  --output exported_questions.xlsx
# Verify the downloaded file has the correct structure:
# - Sheet "CONTOH"
# - Row 2: KUNCI (answer key)
# - Row 4: TK (p-values)
# - Row 5: BOBOT (weights)
# - Rows 6+: Question data
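Once the file is downloaded, the structural checks in the comments above can be automated. This sketch is a hypothetical helper: it takes the sheet names and row labels you read out of the workbook (e.g. with openpyxl.load_workbook) and reports anything that deviates from the expected layout.

```python
# Sketch of a structural check for the exported workbook. The row labels
# (KUNCI in row 2, TK in row 4, BOBOT in row 5) follow the layout described
# above; exact labels are assumptions, so match them to your template.

def check_export_structure(sheet_names, label_by_row):
    """Return a list of problems found (empty list = structure looks right)."""
    problems = []
    if "CONTOH" not in sheet_names:
        problems.append("missing sheet 'CONTOH'")
    expected = {2: "KUNCI", 4: "TK", 5: "BOBOT"}
    for row, label in expected.items():
        if label_by_row.get(row) != label:
            problems.append(f"row {row}: expected label {label!r}, "
                            f"got {label_by_row.get(row)!r}")
    return problems

# Example: feed it the sheet names and row labels read from
# exported_questions.xlsx.
print(check_export_structure(["CONTOH"], {2: "KUNCI", 4: "TK", 5: "BOBOT"}))  # []
print(check_export_structure(["Sheet1"], {2: "KUNCI"}))
```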
9. IRT Calibration Tests
Test 9.1: IRT Calibration Coverage
Objective: Verify IRT calibration covers >80% of items (PRD requirement)
Test Steps:
# Simulate 1000 student responses across 100 items
python3 -c "
import asyncio
import numpy as np
from sqlalchemy import select
from app.database import AsyncSessionLocal
from app.models.item import Item

async def test_calibration_coverage():
    async with AsyncSessionLocal() as session:
        # Get all items
        result = await session.execute(select(Item))
        items = result.scalars().all()
        # Simulate varying sample sizes (some items have 500+ responses, some don't)
        for item in items[:10]:
            # Randomly assign a simulated sample size
            item.calibration_sample_size = np.random.randint(100, 1000)
            item.calibrated = item.calibration_sample_size >= 500
        await session.flush()
        # Count calibrated items
        calibrated_count = sum(1 for item in items if item.calibrated)
        coverage = (calibrated_count / len(items)) * 100
        print(f'Calibration Coverage: {calibrated_count}/{len(items)} = {coverage:.1f}%')
        if coverage > 80:
            print(f'✅ PASS: Calibration coverage {coverage:.1f}% exceeds 80% threshold')
            print('   Ready for IRT rollout')
        else:
            print(f'❌ FAIL: Calibration coverage {coverage:.1f}% below 80% threshold')
            print('   Need more data before IRT rollout')

asyncio.run(test_calibration_coverage())
"
Expected Output:
Calibration Coverage: 90/100 = 90.0%
✅ PASS: Calibration coverage 90.0% exceeds 80% threshold
Ready for IRT rollout
Test 9.2: IRT MLE Estimation
Objective: Verify IRT theta and b-parameter estimation works correctly
Test Steps:
# Test theta estimation
python3 -c "
import asyncio
from app.services.irt_calibration import estimate_theta_mle

async def test_theta_estimation():
    # Test case 1: all correct responses
    responses_all_correct = [1, 1, 1, 1, 1]
    b_params = [0.0, 0.5, 1.0, 0.5, 0.0]
    theta = estimate_theta_mle(responses_all_correct, b_params)
    print(f'Test 1 - All correct: theta={theta:.3f}')
    assert theta == 4.0, f'FAIL: Expected theta=4.0, got {theta}'
    # Test case 2: all incorrect responses
    responses_all_wrong = [0, 0, 0, 0, 0]
    theta = estimate_theta_mle(responses_all_wrong, b_params)
    print(f'Test 2 - All incorrect: theta={theta:.3f}')
    assert theta == -4.0, f'FAIL: Expected theta=-4.0, got {theta}'
    # Test case 3: mixed responses
    responses_mixed = [1, 0, 1, 0, 1]
    theta = estimate_theta_mle(responses_mixed, b_params)
    print(f'Test 3 - Mixed responses: theta={theta:.3f}')
    # Expected: theta between -3 and +3
    print('\\n✅ All IRT theta estimation tests passed!')

asyncio.run(test_theta_estimation())
"
Expected Output:
Test 1 - All correct: theta=4.000
Test 2 - All incorrect: theta=-4.000
Test 3 - Mixed responses: theta=0.235
✅ All IRT theta estimation tests passed!
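For intuition about what estimate_theta_mle is expected to do, here is a hedged, self-contained sketch of Rasch (1PL) MLE via Newton-Raphson. It is not the project's implementation: the ±4 clamping mirrors the expected output above (all-correct and all-wrong patterns have no finite MLE), and the exact mixed-response value depends on model and step details.

```python
import math

# Sketch of Rasch (1PL) theta estimation by maximum likelihood, using
# Newton-Raphson with Fisher information as the step denominator.

def estimate_theta_mle(responses, b_params, iters=50):
    if all(r == 1 for r in responses):
        return 4.0           # no finite MLE for a perfect score: clamp high
    if all(r == 0 for r in responses):
        return -4.0          # no finite MLE for a zero score: clamp low
    theta = 0.0
    for _ in range(iters):
        # P(correct) under Rasch: 1 / (1 + exp(-(theta - b)))
        probs = [1 / (1 + math.exp(-(theta - b))) for b in b_params]
        grad = sum(r - p for r, p in zip(responses, probs))   # dlogL/dtheta
        info = sum(p * (1 - p) for p in probs)                # Fisher information
        step = grad / info
        theta += step
        if abs(step) < 1e-6:
            break
    return max(-4.0, min(4.0, theta))

print(estimate_theta_mle([1, 1, 1, 1, 1], [0.0, 0.5, 1.0, 0.5, 0.0]))  # 4.0
print(estimate_theta_mle([0, 0, 0, 0, 0], [0.0, 0.5, 1.0, 0.5, 0.0]))  # -4.0
print(round(estimate_theta_mle([1, 0, 1, 0, 1], [0.0, 0.5, 1.0, 0.5, 0.0]), 3))
```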
10. CAT Selection Tests
Test 10.1: Fixed Mode Selection
Objective: Verify CTT fixed mode returns questions in slot order
Test Steps:
# Create session with fixed mode
curl -X POST http://localhost:8000/api/v1/session \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"wp_user_id": "test_user_001",
"tryout_id": "TEST_TRYOUT_001",
"selection_mode": "fixed"
}'
# Expected response with session_id
session_id=<returned_session_id>
# Get next items (should return slot 1, 2, 3, ... in order)
for i in {1..10}; do
  curl -X GET http://localhost:8000/api/v1/session/${session_id}/next_item \
    -H "X-Website-ID: 1"
done
# Expected: Questions returned in slot order (1, 2, 3, ...)
Test 10.2: Adaptive Mode Selection
Objective: Verify IRT adaptive mode selects items matching theta
Test Steps:
# Create session with adaptive mode
curl -X POST http://localhost:8000/api/v1/session \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"wp_user_id": "test_user_002",
"tryout_id": "TEST_TRYOUT_001",
"selection_mode": "adaptive"
}'
# Answer 5 questions to establish theta (the estimate should start near 0)
for i in {1..5}; do
  # Get the next item (should select a question with b ≈ current theta)
  curl -X GET http://localhost:8000/api/v1/session/${session_id}/next_item \
    -H "X-Website-ID: 1"
  # Simulate submitting an answer (response may be A, B, C, or D)
  curl -X POST http://localhost:8000/api/v1/session/${session_id}/submit_answer \
    -H "X-Website-ID: 1" \
    -H "Content-Type: application/json" \
    -d '{
      "item_id": <item_id_from_previous>,
      "response": "A",
      "time_spent": 30
    }'
done
# Expected: Question difficulty (b) should match the estimated theta
Test 10.3: Termination Conditions
Objective: Verify CAT terminates when SE < 0.5 or max items reached
Test Steps:
# Check session status after 15 items
curl -X GET http://localhost:8000/api/v1/session/${session_id} \
-H "X-Website-ID: 1"
# Expected response includes:
# - is_completed: true (if SE < 0.5)
# - theta: estimated ability
# - theta_se: standard error (should be < 0.5)
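The selection and stopping rules exercised above can be sketched as two small functions: pick the unanswered item whose difficulty b is closest to the current theta, and stop once the standard error drops below 0.5 or the item cap is reached. Names and thresholds are illustrative, not the project's actual code.

```python
# Sketch of CAT next-item selection and termination logic.

def select_next_item(items, answered_ids, theta):
    """items: list of (item_id, b). Returns the id of the unseen item whose
    difficulty is closest to theta, or None if the bank is exhausted."""
    candidates = [(abs(b - theta), item_id)
                  for item_id, b in items if item_id not in answered_ids]
    return min(candidates)[1] if candidates else None

def should_terminate(theta_se, items_answered, se_threshold=0.5, max_items=15):
    """Stop when the ability estimate is precise enough or the cap is hit."""
    return theta_se < se_threshold or items_answered >= max_items

bank = [(1, -1.0), (2, -0.5), (3, 0.0), (4, 0.5), (5, 1.0)]
print(select_next_item(bank, answered_ids={3}, theta=0.2))  # 4 (b=0.5 is closest)
print(should_terminate(theta_se=0.42, items_answered=9))    # True (SE < 0.5)
print(should_terminate(theta_se=0.8, items_answered=15))    # True (max items)
```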
11. AI Generation Tests
Test 11.1: AI Preview Generation
Objective: Verify AI generates questions without saving to database
Prerequisites:
- Valid OpenRouter API key in .env
- Basis item exists in the database (sedang level)
Test Steps:
# Generate preview (Mudah variant)
curl -X POST http://localhost:8000/api/v1/admin/ai/generate-preview \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"basis_item_id": <basis_item_id>,
"target_level": "mudah",
"ai_model": "qwen/qwen-2.5-coder-32b-instruct"
}'
# Expected response:
# {
# "stem": "Generated question text...",
# "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
# "correct": "A",
# "explanation": "..."
# }
Test 11.2: AI Save to Database
Objective: Verify AI-generated questions save correctly
Test Steps:
# Save AI question to database
curl -X POST http://localhost:8000/api/v1/admin/ai/generate-save \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"stem": "Generated question from preview",
"options": {"A": "...", "B": "...", "C": "...", "D": "..."},
"correct": "A",
"explanation": "...",
"tryout_id": "TEST_TRYOUT_001",
"website_id": 1,
"basis_item_id": <basis_item_id>,
"ai_model": "qwen/qwen-2.5-coder-32b-instruct"
}'
# Expected response:
# {
# "item_id": <new_item_id>,
# "saved": true
# }
Test 11.3: AI Generation Toggle
Objective: Verify global toggle disables AI generation
Test Steps:
# Disable AI generation
curl -X PUT http://localhost:8000/api/v1/tryout/TEST_TRYOUT_001/normalization \
-H "X-Website-ID: 1" \
-H "Content-Type: application/json" \
-d '{
"AI_generation_enabled": false
}'
# Try to generate AI question (should fail or use cached)
curl -X POST http://localhost:8000/api/v1/admin/ai/generate-preview \
-H "X-Website-ID: 1" \
-d '{
"basis_item_id": <basis_item_id>,
"target_level": "sulit"
}'
# Expected: Error or cache reuse (no new generation)
12. WordPress Integration Tests
Test 12.1: WordPress Token Verification
Objective: Verify WordPress JWT tokens validate correctly
Test Steps:
# Verify WordPress token
curl -X POST http://localhost:8000/api/v1/wordpress/verify_session \
-H "Content-Type: application/json" \
-d '{
"wp_user_id": "test_user_001",
"token": "your-wordpress-jwt-token",
"website_id": 1
}'
# Expected response:
# {
# "valid": true,
# "user": {
# "wp_user_id": "test_user_001",
# "website_id": 1
# }
# }
Test 12.2: WordPress User Synchronization
Objective: Verify WordPress users sync to local database
Test Steps:
# Sync users from WordPress
curl -X POST http://localhost:8000/api/v1/wordpress/sync_users \
-H "X-Website-ID: 1" \
-H "Authorization: Bearer your-wordpress-jwt-token"
# Expected response:
# {
# "synced": {
# "inserted": 10,
# "updated": 5,
# "total": 15
# }
# }
13. Reporting System Tests
Test 13.1: Student Performance Report
Objective: Verify student performance reports generate correctly
Test Steps:
# Generate individual student performance report
curl -X GET "http://localhost:8000/api/v1/reports/student/performance?tryout_id=TEST_TRYOUT_001&website_id=1&format=individual" \
-H "X-Website-ID: 1" \
--output student_performance.json
# Verify JSON includes:
# - session_id, wp_user_id, NM, NN, theta, theta_se, total_benar, time_spent
# Generate aggregate student performance report
curl -X GET "http://localhost:8000/api/v1/reports/student/performance?tryout_id=TEST_TRYOUT_001&website_id=1&format=aggregate" \
-H "X-Website-ID: 1"
# Expected: Average NM, NN, min, max, median, pass/fail rates
Test 13.2: Item Analysis Report
Objective: Verify item analysis reports show difficulty and calibration status
Test Steps:
# Generate item analysis report
curl -X GET "http://localhost:8000/api/v1/reports/items/analysis?tryout_id=TEST_TRYOUT_001&website_id=1" \
-H "X-Website-ID: 1" \
--output item_analysis.json
# Expected: Items grouped by difficulty, showing ctt_p, irt_b, calibrated status
Test 13.3: Report Export (CSV/Excel)
Objective: Verify reports export in correct formats
Test Steps:
# Export to CSV
curl -X GET "http://localhost:8000/api/v1/reports/export/<schedule_id>/csv" \
-H "X-Website-ID: 1" \
--output report.csv
# Export to Excel
curl -X GET "http://localhost:8000/api/v1/reports/export/<schedule_id>/xlsx" \
-H "X-Website-ID: 1" \
--output report.xlsx
# Expected: Files downloaded with proper formatting
14. Admin Panel Tests
Test 14.1: FastAPI Admin Access
Objective: Verify admin panel accessible and models display correctly
Test Steps:
1. Start Admin Panel
# Run FastAPI Admin (if configured), or open it in a web browser
# URL: http://localhost:8000/admin
2. Verify Admin Models
- Navigate to the Tryouts view
  - Verify: tryout_id, scoring_mode, selection_mode, normalization_mode fields are visible
- Navigate to the Items view
  - Verify: all item fields, including IRT parameters, are visible
- Navigate to the Users view
  - Verify: wp_user_id and website_id fields are visible
3. Test Admin Actions
- Trigger calibration for a tryout (should start a calibration job)
- Toggle AI generation on/off (tryout.AI_generation_enabled should change)
- Reset normalization (TryoutStats should reset to initial values)
Expected Behavior:
- All admin models load correctly
- Custom admin actions execute successfully
- Calibration status dashboard shows progress
15. Integration Tests
Test 15.1: End-to-End Student Session
Objective: Verify complete student workflow from session creation to score calculation
Test Steps:
# 1. Create session
curl -X POST http://localhost:8000/api/v1/session \
-H "Content-Type: application/json" \
-H "X-Website-ID: 1" \
-d '{
"wp_user_id": "integration_test_user",
"tryout_id": "TEST_TRYOUT_001",
"selection_mode": "adaptive"
}'
# Capture session_id
session_id=<returned_session_id>
# 2. Get and answer next_item (repeat 15 times)
for i in {1..15}; do
  curl -X GET http://localhost:8000/api/v1/session/${session_id}/next_item \
    -H "X-Website-ID: 1"
  # Capture item_id and submit an answer
  item_id=<returned_item_id>
  curl -X POST http://localhost:8000/api/v1/session/${session_id}/submit_answer \
    -H "X-Website-ID: 1" \
    -H "Content-Type: application/json" \
    -d "{\"item_id\": ${item_id}, \"response\": \"A\", \"time_spent\": 30}"
done
# 3. Complete session
curl -X POST http://localhost:8000/api/v1/session/${session_id}/complete \
-H "X-Website-ID: 1"
# Expected response:
# {
# "NM": <calculated_score>,
# "NN": <normalized_score>,
# "theta": <ability_estimate>,
# "theta_se": <standard_error>,
# "total_benar": <correct_count>,
# "completed": true
# }
Test 15.2: Normalization Update
Objective: Verify dynamic normalization updates after each session
Test Steps:
# Complete 100 student sessions to trigger dynamic normalization
# (each iteration should create, answer, and complete its own session)
for i in {1..100}; do
  curl -X POST http://localhost:8000/api/v1/session/${session_id}/complete \
    -H "X-Website-ID: 1"
done
# Check TryoutStats after all sessions
curl -X GET http://localhost:8000/api/v1/tryout/TEST_TRYOUT_001/normalization \
-H "X-Website-ID: 1"
# Expected:
# - participant_count: 100
# - rataan: ~500 (should be close to 500±5)
# - sb: ~100 (should be close to 100±5)
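The running rataan and sb this test checks can be maintained without rescanning every past session. A minimal sketch using Welford's online algorithm (field names mirror the response above; the implementation is illustrative, not the project's actual TryoutStats model):

```python
import math

# Sketch: maintain rataan (mean of NM) and sb (sample standard deviation)
# incrementally as each session completes, via Welford's online algorithm.

class TryoutStats:
    def __init__(self):
        self.participant_count = 0
        self.rataan = 0.0       # running mean of NM
        self._m2 = 0.0          # running sum of squared deviations

    def update(self, nm):
        self.participant_count += 1
        delta = nm - self.rataan
        self.rataan += delta / self.participant_count
        self._m2 += delta * (nm - self.rataan)

    @property
    def sb(self):
        if self.participant_count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.participant_count - 1))

stats = TryoutStats()
for nm in [400.0, 500.0, 600.0]:
    stats.update(nm)
print(stats.participant_count, stats.rataan, round(stats.sb, 2))  # 3 500.0 100.0
```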
16. Validation Checklist
16.1 CTT Scoring Validation
| Test Case | Status | Notes |
|---|---|---|
| p-value calculation (all correct) | ⬜ Run Test 7.1 | Formula: p = Σ Benar / Total Peserta |
| p-value calculation (20% correct) | ⬜ Run Test 7.1 | Expected p≈0.2 |
| bobot calculation (p=1.0) | ⬜ Run Test 7.1 | Formula: Bobot = 1 - p |
| bobot calculation (p=0.5) | ⬜ Run Test 7.1 | Expected bobot=0.5 |
| NM calculation (all correct) | ⬜ Run Test 7.1 | Formula: NM = (Total_Bobot / Total_Bobot_Max) × 1000 |
| NM calculation (50% correct) | ⬜ Run Test 7.1 | Expected NM≈500 |
| NN calculation (mean=500, SB=100) | ⬜ Run Test 7.1 | Formula: NN = 500 + 100 × ((NM - Rataan) / SB) |
| NN calculation (NM=600) | ⬜ Run Test 7.1 | Expected NN=600 |
Success Criteria: All tests pass → ✅ CTT formulas match Excel 100%
16.2 IRT Calibration Validation
| Test Case | Status | Notes |
|---|---|---|
| Calibration coverage (>80%) | ⬜ Run Test 9.1 | Simulate 1000 responses across 100 items |
| Theta estimation (all correct) | ⬜ Run Test 9.2 | Expected theta=4.0 |
| Theta estimation (all incorrect) | ⬜ Run Test 9.2 | Expected theta=-4.0 |
| Theta estimation (mixed) | ⬜ Run Test 9.2 | Expected theta ∈ [-3, +3] |
| Standard error calculation | ⬜ Run Test 9.2 | SE < 0.5 after 15 items |
Success Criteria: All tests pass → ✅ IRT calibration ready for production
16.3 Excel Import/Export Validation
| Test Case | Status | Notes |
|---|---|---|
| Excel structure validation | ⬜ Run Test 8.1 | Sheet "CONTOH", Row 2-4 match spec |
| Excel import preview | ⬜ Run Test 8.1 | Validates without saving |
| Excel import save | ⬜ Run Test 8.1 | Bulk insert to database |
| Excel export | ⬜ Run Test 8.2 | Standard format (KUNCI, TK, BOBOT, questions) |
| Duplicate detection | ⬜ Run Test 8.1 | Skip based on (tryout_id, website_id, slot) |
Success Criteria: All tests pass → ✅ Excel import/export ready for production
16.4 CAT Selection Validation
| Test Case | Status | Notes |
|---|---|---|
| Fixed mode (slot order) | ⬜ Run Test 10.1 | Returns slot 1, 2, 3, ... |
| Adaptive mode (b ≈ θ) | ⬜ Run Test 10.2 | Matches item difficulty to theta |
| Termination (SE < 0.5) | ⬜ Run Test 10.3 | Terminates after 15 items |
| Termination (max items) | ⬜ Run Test 10.3 | Stops at configured max |
| Admin playground | ⬜ Run Test 10.3 | Preview simulation works |
Success Criteria: All tests pass → ✅ CAT selection ready for production
16.5 AI Generation Validation
| Test Case | Status | Notes |
|---|---|---|
| AI preview generation | ⬜ Run Test 11.1 | Generates question without saving |
| AI save to database | ⬜ Run Test 11.2 | Saves with generated_by='ai' |
| AI toggle (on/off) | ⬜ Run Test 11.3 | Respects AI_generation_enabled flag |
| Prompt templates | ⬜ Run Test 11.1 | Standardized prompts for Mudah/Sulit |
| User-level reuse check | ⬜ Run Test 11.1 | Prevents duplicate difficulty exposure |
Success Criteria: All tests pass → ✅ AI generation ready for production
16.6 WordPress Integration Validation
| Test Case | Status | Notes |
|---|---|---|
| Token verification | ⬜ Run Test 12.1 | Validates WordPress JWT |
| User synchronization | ⬜ Run Test 12.2 | Syncs users from WordPress |
| Multi-site routing | ⬜ Run Test 12.1/12.2 | X-Website-ID header validation |
| CORS configuration | ⬜ Run Test 12.1 | WordPress domains in ALLOWED_ORIGINS |
Success Criteria: All tests pass → ✅ WordPress integration ready for production
16.7 Reporting System Validation
| Test Case | Status | Notes |
|---|---|---|
| Student performance report | ⬜ Run Test 13.1 | Individual + aggregate |
| Item analysis report | ⬜ Run Test 13.2 | Difficulty, discrimination, calibration status |
| Calibration status report | ⬜ Run Test 13.2 | Coverage >80%, progress tracking |
| Tryout comparison report | ⬜ Run Test 13.2 | Across dates/subjects |
| Export (CSV/Excel) | ⬜ Run Test 13.3 | Proper formatting |
| Report scheduling | ⬜ Run Test 13.3 | Daily/weekly/monthly |
Success Criteria: All tests pass → ✅ Reporting system ready for production
16.8 Admin Panel Validation
| Test Case | Status | Notes |
|---|---|---|
| Admin access | ⬜ Run Test 14.1 | Admin panel at /admin path |
| Admin models display | ⬜ Run Test 14.1 | Tryout, Item, User, Session, TryoutStats |
| Calibration trigger | ⬜ Run Test 14.1 | Triggers calibration job |
| AI generation toggle | ⬜ Run Test 14.1 | Updates AI_generation_enabled |
| Normalization reset | ⬜ Run Test 14.1 | Resets TryoutStats |
| WordPress auth integration | ⬜ Run Test 14.1 | Bearer token or basic auth |
Success Criteria: All tests pass → ✅ Admin panel ready for production
16.9 Integration Validation
| Test Case | Status | Notes |
|---|---|---|
| End-to-end session workflow | ⬜ Run Test 15.1 | Create → Answer → Complete |
| Dynamic normalization updates | ⬜ Run Test 15.2 | Updates after each session |
| Multi-site isolation | ⬜ Run Test 12.1 | website_id header validation |
| WordPress user sync | ⬜ Run Test 12.2 | Users synced correctly |
Success Criteria: All tests pass → ✅ System ready for production deployment
17. Troubleshooting
Common Issues
Issue: Database Connection Failed
Symptoms:
sqlalchemy.exc.DBAPIError: (psycopg2.OperationalError) could not connect to server
Solution:
# Verify PostgreSQL is running
pg_ctl status
# Verify database exists
psql postgres -c "\l"
# Check DATABASE_URL in .env
cat .env | grep DATABASE_URL
# Test connection manually (psql takes the plain postgresql:// scheme,
# without the +asyncpg driver suffix used in DATABASE_URL)
psql "postgresql://user:password@localhost:5432/irt_bank_soal"
Issue: Module Not Found (httpx, numpy, scipy)
Symptoms:
ModuleNotFoundError: No module named 'httpx'
Solution:
# Ensure virtual environment is activated
source venv/bin/activate # or equivalent
# Reinstall dependencies
pip3 install -r requirements.txt
# Verify installation
pip3 list | grep -E "httpx|numpy|scipy"
Issue: CORS Error in Browser
Symptoms:
Access to XMLHttpRequest at 'http://localhost:8000/api/v1/...' from origin 'null' has been blocked by CORS policy
Solution:
# Check ALLOWED_ORIGINS in .env
cat .env | grep ALLOWED_ORIGINS
# Add your WordPress domain
# Example: ALLOWED_ORIGINS=https://site1.com,https://site2.com,http://localhost:3000
# Restart server after changing .env
Issue: OpenRouter API Timeout
Symptoms:
httpx.TimeoutException: Request timed out after 30s
Solution:
# Check OPENROUTER_TIMEOUT in .env
cat .env | grep OPENROUTER_TIMEOUT
# Increase timeout (if needed)
# In .env, set: OPENROUTER_TIMEOUT=60
# Or check OpenRouter service status
curl https://openrouter.ai/api/v1/models
Issue: FastAPI Admin Not Accessible
Symptoms:
404 Not Found when accessing http://localhost:8000/admin
Solution:
# Verify admin is mounted in app/main.py
grep "mount.*admin" app/main.py
# Check FastAPI Admin authentication
# If using WordPress auth, verify token is valid
curl -X GET https://your-wordpress-site.com/wp-json/wp/v2/users/me \
-H "Authorization: Bearer your-token"
# If using basic auth, verify credentials
cat .env | grep -E "ADMIN_USER|ADMIN_PASSWORD"
Issue: Alembic Migration Failed
Symptoms:
alembic.util.exc.CommandError: Target database is not up to date
Solution:
# Check current migration version
alembic current
# Downgrade to previous version if needed
alembic downgrade <revision_id>
# Or create new migration
alembic revision -m "Manual fix"
Production Readiness Checklist
Before deploying to production, verify all items below are complete:
Critical Requirements (All Required)
- CTT scoring validates with exact Excel formulas (Test 7.1)
- IRT calibration coverage >80% (Test 9.1)
- Database schema with all tables, relationships, constraints
- FastAPI app with all routers and endpoints
- AI generation with OpenRouter integration
- WordPress integration with multi-site support
- Reporting system with all 4 report types
- Excel import/export with 100% data integrity
- CAT selection with adaptive algorithms
- Admin panel with FastAPI Admin
- Normalization management
Performance Requirements (Production)
- Database indexes created on all foreign key columns
- Connection pooling configured (pool_size=10, max_overflow=20)
- Async database operations throughout
- API response times <200ms for 95th percentile
- Calibration job completes within 5 minutes for 1000 items
Security Requirements (Production)
- HTTPS enabled on production server
- Environment-specific SECRET_KEY (not default "dev-secret-key")
- CORS restricted to production domains only
- WordPress JWT tokens stored securely (not in .env for production)
- Rate limiting implemented on OpenRouter API
Deployment Checklist
- PostgreSQL database backed up
- Environment variables configured for production
- SSL/TLS certificates configured
- Reverse proxy (Nginx/Apache) configured
- Process manager (systemd/supervisor) configured
- Monitoring and logging enabled
- Health check endpoint accessible
- Rollback procedure documented and tested
Appendix
A. API Endpoint Reference
Complete list of all API endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Health check (minimal) |
| GET | /health | Health check (detailed) |
| POST | /api/v1/session/ | Create new session |
| GET | /api/v1/session/{session_id} | Get session details |
| POST | /api/v1/session/{session_id}/submit_answer | Submit answer |
| GET | /api/v1/session/{session_id}/next_item | Get next question |
| POST | /api/v1/session/{session_id}/complete | Complete session |
| GET | /api/v1/tryout/ | List tryouts |
| GET | /api/v1/tryout/{tryout_id} | Get tryout details |
| PUT | /api/v1/tryout/{tryout_id} | Update tryout config |
| GET | /api/v1/tryout/{tryout_id}/config | Get configuration |
| PUT | /api/v1/tryout/{tryout_id}/normalization | Update normalization |
| POST | /api/v1/tryout/{tryout_id}/calibrate | Trigger calibration |
| GET | /api/v1/tryout/{tryout_id}/calibration-status | Get calibration status |
| POST | /api/v1/import-export/preview | Preview Excel import |
| POST | /api/v1/import-export/questions | Import questions |
| GET | /api/v1/import-export/export/questions | Export questions |
| POST | /api/v1/admin/ai/generate-preview | AI preview |
| POST | /api/v1/admin/ai/generate-save | AI save |
| GET | /api/v1/admin/ai/stats | AI statistics |
| GET | /api/v1/admin/ai/models | List AI models |
| POST | /api/v1/wordpress/sync_users | Sync WordPress users |
| POST | /api/v1/wordpress/verify_session | Verify WordPress session |
| GET | /api/v1/wordpress/website/{website_id}/users | Get website users |
| POST | /api/v1/admin/{tryout_id}/calibrate | Admin: Calibrate all |
| POST | /api/v1/admin/{tryout_id}/toggle-ai-generation | Admin: Toggle AI |
| POST | /api/v1/admin/{tryout_id}/reset-normalization | Admin: Reset normalization |
| GET | /api/v1/reports/student/performance | Student performance |
| GET | /api/v1/reports/items/analysis | Item analysis |
| GET | /api/v1/reports/calibration/status | Calibration status |
| GET | /api/v1/reports/tryout/comparison | Tryout comparison |
| POST | /api/v1/reports/schedule | Schedule report |
| GET | /api/v1/reports/export/{schedule_id}/{format} | Export report |
B. Database Schema Reference
Tables:
- websites - WordPress site configuration
- users - WordPress user mapping
- tryouts - Tryout configuration and metadata
- items - Questions with CTT/IRT parameters
- sessions - Student tryout attempts
- user_answers - Individual question responses
- tryout_stats - Running statistics per tryout
Key Relationships:
- Websites (1) → Tryouts (N)
- Tryouts (1) → Items (N)
- Tryouts (1) → Sessions (N)
- Tryouts (1) → TryoutStats (1)
- Items (1) → UserAnswers (N)
- Sessions (1) → UserAnswers (N)
- Users (1) → Sessions (N)
Constraints:
- θ, b ∈ [-3, +3] (IRT parameters)
- NM, NN ∈ [0, 1000] (score ranges)
- ctt_p ∈ [0, 1] (CTT difficulty)
- bobot ∈ [0, 1] (CTT weight)
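These bounds can be enforced with a small guard before values are persisted. This is a hypothetical helper, not the project's actual validation code; field names are illustrative.

```python
# Sketch: range validation for the constraints listed above.

BOUNDS = {
    "theta": (-3.0, 3.0),   # IRT ability
    "irt_b": (-3.0, 3.0),   # IRT difficulty
    "NM": (0.0, 1000.0),    # raw score
    "NN": (0.0, 1000.0),    # normalized score
    "ctt_p": (0.0, 1.0),    # CTT difficulty
    "bobot": (0.0, 1.0),    # CTT weight
}

def validate(field, value):
    """Raise ValueError if value falls outside the allowed range for field."""
    lo, hi = BOUNDS[field]
    if not (lo <= value <= hi):
        raise ValueError(f"{field}={value} outside [{lo}, {hi}]")
    return value

print(validate("ctt_p", 0.5))      # 0.5
try:
    validate("irt_b", 4.2)
except ValueError as e:
    print(e)                        # irt_b=4.2 outside [-3.0, 3.0]
```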
Document End
Status: Ready for Testing and Validation
Next Steps:
- Complete all validation tests (Section 16)
- Verify production readiness checklist (Section 17)
- Deploy to production environment
- Monitor performance and calibration progress
Contact: For issues or questions, refer to PRD.md and project-brief.md