
# Project Understanding: IRT-Powered Adaptive Question Bank System

> **Project Name:** IRT Bank Soal
> **Version:** 1.0.0
> **Last Updated:** 2026-06-15
> **Repository:** https://git.backoffice.biz.id/dwindown/yellow-bank-soal

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [Project Purpose](#project-purpose)
3. [Tech Stack](#tech-stack)
4. [Project Structure](#project-structure)
5. [Core Concepts](#core-concepts)
6. [Data Models](#data-models)
7. [API Endpoints](#api-endpoints)
8. [Key Services](#key-services)
9. [Scoring Formulas](#scoring-formulas)
10. [Configuration](#configuration)
11. [Workflows](#workflows)
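12. [Deployment](#deployment)
13. [Security Considerations](#security-considerations)
14. [Glossary](#glossary)
15. [References](#references)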

---

## Executive Summary

This is a **FastAPI-based backend system** for managing adaptive assessment and tryout exams. It supports both **Classical Test Theory (CTT)** and **Item Response Theory (IRT)** scoring, and a single deployment can serve multiple WordPress sites.

### Key Features

| Feature | Description |
|---------|-------------|
| **CTT Scoring** | Classical Test Theory with exact Excel formula compatibility |
| **IRT Support** | Item Response Theory (1PL Rasch model) for adaptive testing |
| **Multi-Site** | Single backend serving multiple WordPress sites |
| **AI Generation** | Automatic question variant generation via OpenRouter |
| **Excel Import/Export** | Bulk import/export questions from Excel files |
| **Adaptive Testing** | Computer Adaptive Testing (CAT) with theta estimation |
| **Normalization** | Static, dynamic, or hybrid score normalization |

---

## Project Purpose

The system replaces traditional fixed-difficulty exams with an **adaptive question bank** that:

1. **Measures student ability accurately** using IRT theta estimation
2. **Provides comparable scores** across different exam sessions via normalization
3. **Generates new questions** using AI when needed
4. **Integrates with WordPress** LMS platforms for student access
5. **Reduces exam fraud** by delivering different question variants to each student

---

## Tech Stack

### Core Technologies

```
Framework:       FastAPI >= 0.104.1
Server:          Uvicorn >= 0.24.0
Database:        PostgreSQL (async access via SQLAlchemy 2.0 + asyncpg)
ORM:             SQLAlchemy >= 2.0.23
Driver:          asyncpg >= 0.29.0
Migrations:      Alembic >= 1.13.0
Validation:      Pydantic >= 2.5.0
```

### Data Processing

```
Excel:           openpyxl >= 3.1.2, pandas >= 2.1.4
Math/Science:    numpy >= 1.26.2, scipy >= 1.11.4
```

### External Integrations

```
AI:              OpenAI >= 1.6.1 (OpenRouter API)
Task Queue:      Celery >= 5.3.6, Redis >= 5.0.1
Admin Panel:     FastAPI-Admin >= 1.0.0
```

---

## Project Structure

```
yellow-bank-soal/
├── app/
│   ├── __init__.py
│   ├── main.py              # FastAPI app entry point
│   ├── admin.py             # FastAPI Admin configuration
│   ├── admin_web.py         # Admin web interface
│   ├── database.py          # Database configuration & session
│   │
│   ├── api/
│   │   └── v1/
│   │       ├── __init__.py
│   │       └── session.py   # Adaptive session endpoints
│   │
│   ├── core/
│   │   ├── __init__.py
│   │   ├── auth.py          # Authentication & authorization
│   │   ├── config.py        # Settings from environment
│   │   └── rate_limit.py    # Rate limiting
│   │
│   ├── models/
│   │   ├── __init__.py
│   │   ├── ai_generation_run.py
│   │   ├── item.py          # Question items
│   │   ├── report_schedule.py
│   │   ├── session.py       # Student tryout sessions
│   │   ├── tryout.py        # Tryout configurations
│   │   ├── tryout_import_snapshot.py
│   │   ├── tryout_snapshot_question.py
│   │   ├── tryout_stats.py  # Normalization statistics
│   │   ├── user.py
│   │   ├── user_answer.py   # Student responses
│   │   └── website.py
│   │
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── admin.py         # Admin-only endpoints
│   │   ├── ai.py            # AI generation endpoints
│   │   ├── import_export.py # Excel import/export
│   │   ├── reports.py       # Report generation
│   │   ├── sessions.py      # Session management
│   │   ├── tryouts.py       # Tryout configuration
│   │   └── wordpress.py     # WordPress integration
│   │
│   ├── schemas/             # Pydantic request/response models
│   │   ├── __init__.py
│   │   ├── ai.py
│   │   ├── report.py
│   │   ├── session.py
│   │   ├── tryout.py
│   │   └── wordpress.py
│   │
│   └── services/
│       ├── __init__.py
│       ├── ai_generation.py # OpenRouter integration
│       ├── cat_selection.py # Computer Adaptive Testing
│       ├── config_management.py
│       ├── ctt_scoring.py   # CTT scoring engine
│       ├── excel_import.py  # Excel parsing
│       ├── irt_calibration.py # IRT calibration
│       ├── normalization.py
│       ├── reporting.py
│       ├── tryout_json_import.py
│       └── wordpress_auth.py
│
├── alembic/                 # Database migrations
│   ├── env.py
│   ├── script.py.mako
│   └── versions/
│
├── tests/                   # Unit & integration tests
│   ├── test_auth_scope.py
│   ├── test_auth_tokens.py
│   ├── test_model_mappings.py
│   ├── test_normalization.py
│   ├── test_operational_hardening.py
│   ├── test_route_wiring.py
│   ├── test_security_regressions.py
│   └── test_tryout_json_import.py
│
├── requirements.txt
├── alembic.ini
├── irt_1pl_mle.py           # Standalone IRT MLE script
├── PRD.md                   # Product Requirements Document
├── project-brief.md         # Technical specification
└── handoff.md               # Project handoff context
```

---

## Core Concepts

### 1. Tryout (Exam)

A **Tryout** represents a complete exam/test with configurable behavior:

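```python
from typing import Literal

# Tryout configuration fields and their allowed values
scoring_mode: Literal["ctt", "irt", "hybrid"]
selection_mode: Literal["fixed", "adaptive", "hybrid"]
normalization_mode: Literal["static", "dynamic", "hybrid"]
```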

### 2. Item (Question)

An **Item** represents a single question with:

- **Content**: stem (question text), options (A/B/C/D), correct_answer
- **CTT Parameters**: p-value (difficulty), bobot (weight)
- **IRT Parameters**: b (difficulty), se (standard error)
- **Metadata**: slot position, difficulty level, AI generation info

### 3. Session (Student Attempt)

A **Session** tracks a student's attempt:

- Links student (`wp_user_id`) to a Tryout
- Records all answers via `UserAnswer` records
- Stores computed scores: NM, NN, theta

### 4. Website (Multi-Tenant)

The system supports **multiple WordPress websites** from a single backend:

- Each website has isolated data
- Scoped per request via the `X-Website-ID` header
- WordPress JWT tokens for authentication
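
A minimal sketch of the isolation idea, assuming the header carries a numeric website id and that every data access is filtered by `website_id`. `ItemRecord`, `resolve_website_id` and `items_for_website` are hypothetical names, not the project's code; in a FastAPI app this logic would typically sit in a dependency and in a SQLAlchemy `where` clause rather than in list filtering.

```python
from dataclasses import dataclass


@dataclass
class ItemRecord:
    id: int
    website_id: int
    stem: str


def resolve_website_id(headers: dict[str, str]) -> int:
    """Read the tenant id from the X-Website-ID header (header names are case-insensitive)."""
    normalized = {k.lower(): v for k, v in headers.items()}
    raw = normalized.get("x-website-id")
    if raw is None or not raw.strip().isdecimal():
        raise ValueError("missing or invalid X-Website-ID header")
    return int(raw)


def items_for_website(items: list[ItemRecord], website_id: int) -> list[ItemRecord]:
    """Filter every read by website_id so one tenant never sees another tenant's rows."""
    return [item for item in items if item.website_id == website_id]
```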

---

## Data Models

### Entity Relationship Diagram

```mermaid
erDiagram
    Website ||--o{ Tryout : "hosts"
    Website ||--o{ User : "contains"
    Website ||--o{ Session : "serves"
    Website ||--o{ Item : "contains"

    Tryout ||--o{ Item : "contains"
    Tryout ||--o{ Session : "has"
    Tryout ||--o{ TryoutStats : "tracks"

    Session ||--o{ UserAnswer : "contains"
    User ||--o{ Session : "takes"

    Item ||--o{ UserAnswer : "answered by"
    Item ||--o{ Item : "has variants"

    AIGenerationRun ||--o{ Item : "generates"
```

### Model Summary

| Model | Purpose | Key Fields |
|-------|---------|------------|
| `Website` | Multi-tenant isolation | domain, wordpress_url |
| `User` | WordPress user mapping | wp_user_id, website_id |
| `Tryout` | Exam configuration | scoring_mode, selection_mode, normalization_mode |
| `Item` | Question | stem, options, ctt_p, ctt_bobot, irt_b, irt_se |
| `Session` | Student attempt | session_id, NM, NN, theta |
| `UserAnswer` | Single response | response, is_correct, bobot_earned |
| `TryoutStats` | Normalization data | participant_count, rataan, sb |
| `AIGenerationRun` | AI generation batch | model, status, items_generated |

---

## API Endpoints

### Public API (via `/api/v1`)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/tryout/{tryout_id}/config` | Get tryout configuration |
| `PUT` | `/tryout/{tryout_id}/normalization` | Update normalization settings |
| `GET` | `/tryout/` | List tryouts for website |
| `GET` | `/tryout/{tryout_id}/calibration-status` | Get IRT calibration status |
| `POST` | `/tryout/{tryout_id}/calibrate` | Trigger IRT calibration |
| `POST` | `/session/` | Create new session |
| `GET` | `/session/{session_id}` | Get session details |
| `POST` | `/session/{session_id}/complete` | Submit answers, calculate scores |

### Admin API (requires admin role)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/ai/generate` | Generate AI questions |
| `POST` | `/import/excel` | Import questions from Excel |
| `GET` | `/export/excel/{tryout_id}` | Export questions to Excel |
| `GET` | `/reports/*` | Generate various reports |

### Adaptive Session API (via `/api/v1/session`)

| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/adaptive/start` | Start adaptive session |
| `POST` | `/adaptive/respond` | Submit answer, get next item |
| `POST` | `/adaptive/complete` | Complete adaptive session |

---

## Key Services

### 1. CTT Scoring Engine (`ctt_scoring.py`)

Implements Classical Test Theory scoring with exact Excel formulas.

**Key Functions:**
- `calculate_ctt_p()` - Difficulty: p = Σ Benar / Total Peserta
- `calculate_ctt_bobot()` - Weight: Bobot = 1 - p
- `calculate_ctt_nm()` - Raw Score: NM = (Total_Bobot / Total_Bobot_Max) × 1000
- `calculate_ctt_nn()` - Normalized: NN = 500 + 100 × ((NM - Rataan) / SB)
- `categorize_difficulty()` - Categorize by p-value
- `update_tryout_stats()` - Incrementally update normalization stats
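
`update_tryout_stats()` is described as updating normalization statistics incrementally. One standard way to do that without re-reading every past score is Welford's online algorithm, sketched below. The `RunningStats` class and its extra `m2` field are hypothetical (the real `TryoutStats` would need a column for it), and because the document does not say whether `sb` is a sample or population standard deviation, the `n - 1` divisor is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean/SD of NM scores, mirroring TryoutStats (participant_count, rataan, sb)."""
    participant_count: int = 0
    rataan: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, nm: float) -> None:
        # Welford's update: O(1) per new participant and numerically stable
        self.participant_count += 1
        delta = nm - self.rataan
        self.rataan += delta / self.participant_count
        self.m2 += delta * (nm - self.rataan)

    @property
    def sb(self) -> float:
        # Sample standard deviation (n - 1 divisor, assumption); 0.0 until two scores exist
        if self.participant_count < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.participant_count - 1))


def nn_score(nm: float, rataan: float, sb: float) -> float:
    # NN = 500 + 100 × ((NM - Rataan) / SB); caller must ensure sb > 0
    return 500 + 100 * ((nm - rataan) / sb)
```

With scores 400, 500 and 600 this yields `rataan = 500`, `sb = 100`, and an NM of 600 maps to an NN of 600.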

### 2. IRT Calibration (`irt_calibration.py`)

Implements Item Response Theory (1PL Rasch model) for adaptive testing.

**Key Functions:**
- `estimate_theta_mle()` - MLE theta estimation for students
- `estimate_b()` - IRT difficulty calibration for items
- `calibrate_item()` - Calibrate single item from response data
- `calibrate_all()` - Batch calibrate all items in tryout
- `calculate_fisher_information()` - Fisher information for item selection

**Parameters:**
- θ (theta): Student ability [-3, +3]
- b: Item difficulty [-3, +3]
- Probability: P(θ) = 1 / (1 + exp(-(θ - b)))
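
The document names `estimate_theta_mle()` but not its algorithm. A common choice for the Rasch model is Newton-Raphson on the log-likelihood: its gradient is Σ(u_i − P_i) and its negative second derivative is the Fisher information Σ P_i(1 − P_i). The sketch below assumes that approach and clamps θ to the documented [-3, +3] range, which also keeps all-correct or all-wrong response patterns (where the unclamped MLE diverges) at the bounds. The signature and iteration limits are hypothetical.

```python
import math


def p_correct(theta: float, b: float) -> float:
    """1PL Rasch probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta_mle(responses: list[int], bs: list[float],
                       lo: float = -3.0, hi: float = 3.0,
                       max_iter: int = 50, tol: float = 1e-6) -> tuple[float, float]:
    """Newton-Raphson MLE of theta for known item difficulties; returns (theta, SE)."""
    if not bs or len(bs) != len(responses):
        raise ValueError("need one response per item difficulty")
    theta = 0.0
    for _ in range(max_iter):
        probs = [p_correct(theta, b) for b in bs]
        gradient = sum(u - p for u, p in zip(responses, probs))  # dLL/dθ
        information = sum(p * (1 - p) for p in probs)           # -d²LL/dθ² (Fisher information)
        # Clamp so perfect or zero scores settle at the bounds instead of diverging
        new_theta = min(hi, max(lo, theta + gradient / information))
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    information = sum(p_correct(theta, b) * (1 - p_correct(theta, b)) for b in bs)
    return theta, 1.0 / math.sqrt(information)
```

For three correct answers out of four items with b = 0, the estimate converges to θ = ln 3 ≈ 1.10, the value at which P = 0.75.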

### 3. AI Generation (`ai_generation.py`)

Generates question variants using OpenRouter API.

**Key Functions:**
- `generate_question()` - Generate single question via OpenRouter
- `generate_questions_batch()` - Generate multiple questions
- `save_ai_question()` - Save generated question to database
- `check_cache_reuse()` - Check for reusable similar questions

**Models Supported:**
- Qwen 2.5 32B (balanced)
- Mistral Small (low cost)
- Llama 3.3 70B (premium)

### 4. Excel Import/Export (`excel_import.py`)

Bulk import/export questions from Excel files.

**Key Functions:**
- `parse_excel_import()` - Parse Excel file to items
- `bulk_insert_items()` - Insert parsed items to database
- `export_questions_to_excel()` - Export tryout to Excel

### 5. CAT Selection (`cat_selection.py`)

Computer Adaptive Testing item selection algorithm.

**Key Functions:**
- `select_next_item()` - Select next item based on theta estimate
- `calculate_theta_update()` - Update theta after response
- `check_termination()` - Check if test should end
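
Maximum-information selection is the textbook CAT rule for the Rasch model: an item is most informative when its b is closest to the current θ. The sketch below assumes that rule plus a simple stopping rule on the standard error of θ. The function signatures and the `se_target` and `max_items` defaults are hypothetical, not taken from `cat_selection.py`.

```python
import math
from typing import Optional


def fisher_information(theta: float, b: float) -> float:
    """Rasch item information at theta: I(θ) = P(θ)(1 - P(θ))."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)


def select_next_item(theta: float, item_bs: dict[int, float],
                     administered: set[int]) -> Optional[int]:
    """Return the unadministered item id with maximum information, or None if the bank is used up."""
    candidates = {item_id: b for item_id, b in item_bs.items() if item_id not in administered}
    if not candidates:
        return None
    return max(candidates, key=lambda item_id: fisher_information(theta, candidates[item_id]))


def check_termination(standard_error: float, n_administered: int,
                      se_target: float = 0.3, max_items: int = 30) -> bool:
    """Stop once theta is precise enough or the item budget is exhausted (thresholds hypothetical)."""
    return standard_error <= se_target or n_administered >= max_items
```

A production CAT usually adds exposure control and content balancing on top of pure maximum information, so the same few items are not shown to every student.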

---

## Scoring Formulas

### CTT (Classical Test Theory)

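Based on the client's exact Excel formulas, written out as runnable Python:

```python
def ctt_scores(responses: list[list[int]], rataan: float, sb: float):
    """responses[s][i] is 1 if student s answered item i correctly, else 0."""
    total_peserta = len(responses)
    n_items = len(responses[0])

    # STEP 1: Tingkat Kesukaran (p-value) per item: p = Σ Benar / Total Peserta
    p = [sum(row[i] for row in responses) / total_peserta for i in range(n_items)]

    # STEP 2: Bobot (Weight) per item: Bobot = 1 - p
    bobot = [1 - p_i for p_i in p]
    total_bobot_max = sum(bobot)  # maximum attainable: every item answered correctly

    scores = []
    for row in responses:
        # STEP 3: Total Benar per Siswa = count of correct answers
        total_benar = sum(row)
        # STEP 4: Total Bobot earned per Siswa = Σ Bobot for each correct answer
        total_bobot_siswa = sum(w for w, u in zip(bobot, row) if u)
        # STEP 5: Nilai Mentah (Raw Score): NM = (Total_Bobot_Siswa / Total_Bobot_Max) × 1000
        nm = total_bobot_siswa / total_bobot_max * 1000
        # STEP 6: Nilai Nasional (Normalized Score): NN = 500 + 100 × ((NM - Rataan) / SB)
        nn = 500 + 100 * ((nm - rataan) / sb)
        scores.append({"total_benar": total_benar, "nm": nm, "nn": nn})
    return p, bobot, scores
```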

### IRT (Item Response Theory)

1PL Rasch Model:

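```python
import math

def p_correct(theta: float, b: float) -> float:
    # Probability of correct response: P(θ, b) = 1 / (1 + exp(-(θ - b)))
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta: float, responses: list[int], bs: list[float]) -> float:
    # Log-likelihood for MLE: LL = Σ [u_i × log(P_i) + (1 - u_i) × log(1 - P_i)]
    ll = 0.0
    for u, b in zip(responses, bs):
        p = p_correct(theta, b)
        ll += u * math.log(p) + (1 - u) * math.log(1 - p)
    return ll

# Theta estimation via MLE: θ_mle = argmax_θ LL(θ), found numerically
# (e.g. Newton-Raphson or a bounded search over [-3, +3])
```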

### Difficulty Categories (CTT Standard)

| p-value | Category | Description |
|---------|----------|-------------|
| p < 0.30 | Sulit | Difficult |
| 0.30 ≤ p ≤ 0.70 | Sedang | Medium |
| p > 0.70 | Mudah | Easy |
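
The thresholds translate directly into a function. This is a sketch of what `categorize_difficulty()` might look like; the boundaries follow the table, while the returned strings and the range check are assumptions.

```python
def categorize_difficulty(p: float) -> str:
    """Map a CTT p-value (proportion correct) to its difficulty category."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p-value must be in [0, 1]")
    if p < 0.30:
        return "Sulit"   # difficult: fewer than 30% answered correctly
    if p <= 0.70:
        return "Sedang"  # medium: both boundaries 0.30 and 0.70 are inclusive
    return "Mudah"       # easy: more than 70% answered correctly
```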

---

## Configuration

### Environment Variables

```bash
# Database
DATABASE_URL=postgresql+asyncpg://postgres:postgres@localhost:5432/irt_bank_soal

# FastAPI
SECRET_KEY=your-secret-key-here
ENVIRONMENT=development  # development, staging, production
ENABLE_ADMIN=true
ADMIN_USERNAME=admin
ADMIN_PASSWORD=your-password

# OpenRouter (AI)
OPENROUTER_API_KEY=sk-or-v1-xxx
OPENROUTER_MODEL_QWEN=qwen/qwen2.5-32b-instruct
OPENROUTER_MODEL_CHEAP=mistralai/mistral-small-2603
OPENROUTER_MODEL_LLAMA=meta-llama/llama-3.3-70b-instruct

# Redis/Celery
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/0

# CORS
ALLOWED_ORIGINS=http://localhost:3000,https://yourdomain.com
```

### Tryout Configuration Options

```python
# Scoring Mode
scoring_mode = "ctt"        # Classical Test Theory
scoring_mode = "irt"        # Item Response Theory
scoring_mode = "hybrid"     # Both (IRT for calibration, CTT for scoring)

# Selection Mode
selection_mode = "fixed"    # Fixed order questions
selection_mode = "adaptive" # Computer Adaptive Testing
selection_mode = "hybrid"   # Start fixed, switch to adaptive

# Normalization Mode
normalization_mode = "static"   # Use hardcoded rataan/sb
normalization_mode = "dynamic"  # Calculate from participant data
normalization_mode = "hybrid"   # Dynamic once enough participant data exists, otherwise static
```
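
The `hybrid` normalization mode implies a fallback rule. The sketch below assumes a minimum participant count before switching from the hardcoded `rataan`/`sb` to the dynamic statistics in `TryoutStats`; `NormalizationParams`, `resolve_normalization` and the `min_participants` default of 100 are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NormalizationParams:
    rataan: float
    sb: float


def resolve_normalization(mode: str, static: NormalizationParams,
                          dynamic: NormalizationParams, participant_count: int,
                          min_participants: int = 100) -> NormalizationParams:
    """Pick the rataan/sb pair used in NN = 500 + 100 × ((NM - Rataan) / SB)."""
    if mode == "static":
        return static
    if mode == "dynamic":
        return dynamic
    if mode == "hybrid":
        # Trust participant statistics only once the sample is large enough (threshold hypothetical)
        enough_data = participant_count >= min_participants and dynamic.sb > 0
        return dynamic if enough_data else static
    raise ValueError(f"unknown normalization_mode: {mode!r}")
```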

---

## Workflows

### 1. Student Taking a Tryout

```mermaid
sequenceDiagram
    participant S as Student
    participant API as FastAPI
    participant WP as WordPress

    S->>API: POST /session/ (start session)
    API-->>S: session_id

    loop For each question
        S->>API: GET /session/{id}/next-item
        API-->>S: Question data

        S->>API: POST /session/{id}/answer
        API-->>S: Next question or completion
    end

    S->>API: POST /session/{id}/complete
    API-->>S: NM, NN scores
```

### 2. Admin Importing Questions

```mermaid
flowchart TD
    A[Upload Excel File] --> B[Parse Excel]
    B --> C{Validate Structure}
    C -->|Invalid| D[Return Error]
    C -->|Valid| E[Calculate CTT p & bobot]
    E --> F[Bulk Insert Items]
    F --> G[Commit to Database]
    G --> H[Return Import Summary]
```

### 3. AI Question Generation

```mermaid
flowchart TD
    A[Request Generation] --> B{Check Cache}
    B -->|Found similar| C[Return Cached]
    B -->|Not found| D[Call OpenRouter API]
    D --> E{Parse Response}
    E -->|Parse Error| F[Return Error]
    E -->|Success| G[Save to Database]
    G --> H[Return Generated Item]
```
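
The "Parse Response" step is where malformed model output should be rejected before it reaches the database. A minimal validation sketch, assuming the model is prompted to reply with a JSON object whose fields mirror the Item content (`stem`, options A/B/C/D, `correct_answer`); that JSON shape and `parse_generated_question` are hypothetical.

```python
import json

REQUIRED_OPTIONS = ("A", "B", "C", "D")


def parse_generated_question(raw: str) -> dict:
    """Validate a model reply before it is saved as an Item (hypothetical JSON shape)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    stem = data.get("stem")
    options = data.get("options")
    answer = data.get("correct_answer")
    if not isinstance(stem, str) or not stem.strip():
        raise ValueError("missing stem")
    if not isinstance(options, dict) or set(options) != set(REQUIRED_OPTIONS):
        raise ValueError("options must have exactly the keys A, B, C, D")
    if not isinstance(answer, str) or answer not in options:
        raise ValueError("correct_answer must be one of the option keys")
    return {"stem": stem.strip(), "options": options, "correct_answer": answer}
```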

### 4. IRT Calibration

```mermaid
flowchart TD
    A[Collect Responses] --> B{Enough Data?}
    B -->|No| C[Wait for more]
    B -->|Yes| D[For each Item]
    D --> E[Get Response Matrix]
    E --> F[Estimate b via MLE]
    F --> G[Calculate Standard Error]
    G --> H[Update Item]
    H --> D
    D --> I[Mark Items Calibrated]
```
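
The "Estimate b via MLE" and "Calculate Standard Error" steps can be sketched for a single item as below. This assumes respondent θ values are already available and held fixed (joint or marginal procedures would instead alternate with, or integrate over, θ) and reuses the documented [-3, +3] bounds. It is a hypothetical stand-in for `estimate_b()`, not the code in `irt_calibration.py`. The standard error is 1/√(information), and information grows with the number of respondents, which is why calibration waits for enough data.

```python
import math


def estimate_b(thetas: list[float], responses: list[int],
               lo: float = -3.0, hi: float = 3.0,
               max_iter: int = 50, tol: float = 1e-6) -> tuple[float, float]:
    """Newton-Raphson MLE of a Rasch item difficulty with known respondent thetas; returns (b, SE)."""
    if not thetas or len(thetas) != len(responses):
        raise ValueError("need one response per respondent theta")
    b = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(t - b))) for t in thetas]
        residual = sum(u - p for u, p in zip(responses, probs))  # observed minus expected correct
        information = sum(p * (1 - p) for p in probs)
        # More correct answers than expected -> lower b (easier item); clamp to the documented range
        new_b = min(hi, max(lo, b - residual / information))
        converged = abs(new_b - b) < tol
        b = new_b
        if converged:
            break
    probs = [1.0 / (1.0 + math.exp(-(t - b))) for t in thetas]
    se = 1.0 / math.sqrt(sum(p * (1 - p) for p in probs))
    return b, se
```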

---

## Deployment

### Requirements

- Python 3.10+
- PostgreSQL 14+
- Redis 6+ (for Celery)
- Nginx (reverse proxy)
- aaPanel with Python Manager (recommended)

### Running the Application

```bash
# Install dependencies
pip install -r requirements.txt

# Run migrations
alembic upgrade head

# Start server
uvicorn app.main:app --host 0.0.0.0 --port 8000

# Or with reload (development)
uvicorn app.main:app --reload
```

### Running Tests

```bash
pytest tests/ -v
```

### API Documentation

- Swagger UI: `http://localhost:8000/docs`
- ReDoc: `http://localhost:8000/redoc`
- OpenAPI JSON: `http://localhost:8000/openapi.json`

---

## Security Considerations

### Authentication

- WordPress JWT tokens for user authentication
- `X-Website-ID` header for multi-tenant isolation
- Admin routes protected by admin role check

### Production Hardening

1. **SECRET_KEY** must be set to a strong, unique value
2. **ADMIN_PASSWORD** must not be the default
3. **CORS** origins should be explicitly configured
4. **Database** connections should use SSL in production
5. **Rate limiting** should be enabled for AI generation endpoints
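
A startup check can enforce items 1 to 3 above before the app serves traffic. The sketch below reads the variables from the Configuration section out of a plain mapping (for example `dict(os.environ)`); the function name, the placeholder lists and the 32-character minimum are assumptions, not the project's `config.py`.

```python
PLACEHOLDER_SECRETS = {"", "your-secret-key-here", "changeme"}
DEFAULT_ADMIN_PASSWORDS = {"", "admin", "password", "your-password"}


def check_production_settings(env: dict[str, str]) -> list[str]:
    """Return hardening problems for a production config; an empty list means all checks passed."""
    if env.get("ENVIRONMENT", "development") != "production":
        return []
    problems = []
    secret = env.get("SECRET_KEY", "")
    if secret in PLACEHOLDER_SECRETS or len(secret) < 32:  # minimum length is a hypothetical policy
        problems.append("SECRET_KEY is missing, a placeholder, or shorter than 32 characters")
    admin_enabled = env.get("ENABLE_ADMIN", "false").lower() == "true"
    if admin_enabled and env.get("ADMIN_PASSWORD", "") in DEFAULT_ADMIN_PASSWORDS:
        problems.append("ADMIN_PASSWORD is empty or a known default")
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]
    if not origins or "*" in origins:
        problems.append("ALLOWED_ORIGINS must list explicit origins")
    return problems
```

Failing fast on a non-empty list at startup is usually preferable to logging a warning, since a misconfigured production instance would otherwise keep serving requests.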

---

## Glossary

| Term | Definition |
|------|------------|
| **Tryout** | An exam/test assessment |
| **Item** | A single question in a tryout |
| **Session** | A student's attempt at a tryout |
| **CTT** | Classical Test Theory - traditional scoring |
| **IRT** | Item Response Theory - modern adaptive scoring |
| **NM** | Nilai Mentah - raw score [0-1000] |
| **NN** | Nilai Nasional - normalized score, centered on 500 with SD 100 |
| **θ (theta)** | IRT ability estimate [-3 to +3] |
| **b** | IRT item difficulty [-3 to +3] |
| **p-value** | CTT proportion correct [0 to 1] |
| **Bobot** | CTT weight (1 - p) |
| **Rataan** | Mean (Indonesian) |
| **SB** | Simpangan Baku - Standard Deviation |
| **CAT** | Computer Adaptive Testing |
| **MLE** | Maximum Likelihood Estimation |

---

## References

- [PRD.md](./PRD.md) - Complete Product Requirements Document
- [project-brief.md](./project-brief.md) - Original technical specification
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [SQLAlchemy 2.0](https://docs.sqlalchemy.org/en/20/)
- [Item Response Theory](https://en.wikipedia.org/wiki/Item_response_theory)