# Yellow Bank Soal Perfection Tasklist

Date: 2026-04-29

Purpose: hands-off development guide for hardening the system, improving correctness, and polishing the admin/user experience.

## 1. Security and Auth

- [x] Add centralized authentication dependencies for student, website admin, and system admin roles.
- [x] Replace raw `X-Website-ID` trust with token-derived website access.
- [x] Require authorization on reports, tryout configuration updates, imports, calibration, and session endpoints.
- [x] Add session ownership checks using verified WordPress identity.
- [x] Add rate limiting for admin login, AI generation, imports, and WordPress verification.
- [x] Add admin login rate limiting (IP-based, Redis-backed attempt window).
- [x] Add CSRF tokens to all admin POST forms.
- [x] Mark admin session cookies `secure` in production.
- [x] Fail production startup when default or empty secrets are used.
- [x] Add tests proving cross-website access is blocked.
- [x] Add token integrity tests (issue/decode, tamper rejection, expiry rejection).

## 2. Session Integrity

- [ ] Verify `submit_answer` item belongs to the session's `website_id` and `tryout_id`.
- [ ] Prevent answer submission for items not issued by `next_item`.
- [ ] Stop returning `correct_answer` during live adaptive sessions.
- [ ] Decide whether explanations should be shown only after completion or never during an active session.
- [ ] Add duplicate-answer validation before DB commit.
- [ ] Make repeated submissions return `409 Conflict` instead of DB errors.
- [ ] Validate or auto-create WordPress users before creating sessions.
- [ ] Add tests for invalid item IDs, foreign-tryout items, repeated answers, and completed sessions.

## 3. Scoring Correctness

- [ ] Revisit CTT `total_bobot_max` logic so earned and max weights use the same item set.
- [ ] Define scoring behavior for mixed-level tryouts.
- [ ] Confirm whether fixed tryouts should require every item to be answered before completion.
- [ ] Add tests for all-correct, all-wrong, partial, mixed-level, missing-bobot, and duplicate-answer cases.
- [ ] Add regression tests for static, dynamic, and hybrid normalization switching.
- [ ] Confirm NM, NN, theta, and report formulas against PRD examples.
- [ ] Add explicit handling for zero/near-zero standard deviation in reporting and normalization.

## 4. Database and Migrations

- [ ] Resolve model/migration drift for item uniqueness indexes.
- [ ] Decide whether items are unique by `(website_id, tryout_id, slot)` or `(website_id, tryout_id, slot, level)`.
- [ ] Align Excel import duplicate detection with the final uniqueness rule.
- [ ] Remove production `create_all` startup behavior or gate it to development only.
- [ ] Add migration smoke tests for fresh database upgrade to head.
- [ ] Add DB constraint tests for FK failures and uniqueness conflicts.
- [ ] Create seed/dev fixtures for websites, users, tryouts, items, and sessions.
- [ ] Document migration rollback expectations.

## 5. API Reliability

- [ ] Standardize error response shape across routers.
- [ ] Convert expected DB constraint failures into clear `400`, `404`, or `409` responses.
- [ ] Add request size limits for Excel and JSON imports.
- [ ] Add structured logging with request IDs.
- [ ] Add health checks that distinguish DB, Redis, WordPress, and OpenRouter status.
- [ ] Add OpenAPI examples for core workflows.
- [ ] Add pagination to list/report endpoints that can grow large.
- [ ] Add timeout and retry policy for external service calls.

## 6. Import and Export

- [ ] Validate website existence before Excel preview and import.
- [ ] Validate tryout existence before Excel question import.
- [ ] Add downloadable validation error reports.
- [ ] Add import preview diff for new records, skipped duplicates, and updates.
- [ ] Clean up generated export temp files after response lifecycle.
- [ ] Add tests for malformed Excel, duplicate slots, invalid p-values, invalid bobot values, and missing tryout.
- [x] Add tests for JSON snapshot import edge cases.
- [ ] Add file size/type hardening beyond extension checks.

## 7. Reporting

- [ ] Persist report schedules in the database instead of process memory.
- [ ] Add real scheduler/worker execution for scheduled reports.
- [ ] Add email delivery or remove recipient fields until delivery is implemented.
- [ ] Add report permission checks.
- [ ] Add tests for empty reports, partial data, and multi-tryout comparisons.
- [ ] Add pagination/export limits for large report datasets.
- [ ] Verify `avg_nn`, pass rate, medians, and standard deviations against fixture data.
- [ ] Add user-facing messages when report data is incomplete.

## 8. Admin UI and UX

- [ ] Add responsive mobile/tablet layout.
- [ ] Add active navigation state and breadcrumbs.
- [ ] Add pagination, sorting, and search to admin tables.
- [ ] Replace destructive browser confirms with safer confirmation modals.
- [ ] Add inline validation and success/error banners that persist after redirects.
- [ ] Add import progress indicators and clearer preview screens.
- [ ] Add empty states with recommended next actions.
- [ ] Improve visual hierarchy for dashboard stats and high-risk actions.
- [ ] Add accessibility pass: labels, focus states, contrast, keyboard navigation.

## 9. Testing and Tooling

- [ ] Add `pyproject.toml` or `pytest.ini` with test config.
- [ ] Add pinned dependency lock workflow.
- [ ] Add `make test`, `make lint`, `make migrate`, and `make dev` commands.
- [ ] Add CI for lint, tests, mapper config, Alembic upgrade, and import smoke tests.
- [ ] Add integration tests using a test database.
- [ ] Add auth boundary tests for every tenant-scoped endpoint.
- [x] Add regression tests for previously found defects.
- [ ] Document the canonical local setup path.

## 10. Production Readiness

- [ ] Validate required secrets in production startup.
- [ ] Document deployment environment variables.
- [ ] Add backup and restore guidance for PostgreSQL.
- [ ] Add observability: logs, metrics, traces, and error monitoring.
- [ ] Add operational runbooks for import failures, calibration failures, WordPress API outages, and AI provider outages.
- [ ] Add Redis availability checks when admin or background jobs are enabled.
- [ ] Add deployment checklist for migrations, admin credentials, CORS, HTTPS, and rollback.

## Suggested Execution Order

1. Security and auth hardening.
2. Session integrity and scoring correctness.
3. Database/migration alignment.
4. Test and tooling foundation.
5. Import/export and reporting reliability.
6. Admin UI/UX polish.
7. Production readiness and operations.

## Definition of Perfect Enough

- [ ] Every tenant-scoped endpoint has an authorization test.
- [ ] Every scoring path has deterministic fixture tests.
- [ ] Fresh database migration to head succeeds in CI.
- [ ] Admin destructive actions are CSRF-protected.
- [ ] Live sessions cannot reveal answers before completion.
- [ ] Imports fail safely with actionable validation output.
- [ ] Reports are reproducible, permissioned, and persisted where scheduled.
- [ ] The app can be installed, tested, migrated, and run from documented commands.
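
As a minimal sketch of the "zero/near-zero standard deviation" item in section 3, the helper below guards a z-score computation against dividing by a vanishing spread. The function name `safe_z_scores`, the `EPSILON` threshold, and the choice to return zeros for a degenerate cohort are all assumptions for illustration, not the project's actual normalization code; the real NN formula and tolerance should come from the PRD.

```python
import math
from statistics import mean, pstdev

# Hypothetical tolerance; the real threshold is a project decision.
EPSILON = 1e-9


def safe_z_scores(scores, epsilon=EPSILON):
    """Return population z-scores, or all zeros when the spread is effectively zero."""
    if not scores:
        return []
    mu = mean(scores)
    sigma = pstdev(scores)
    if not math.isfinite(sigma) or sigma < epsilon:
        # Every score is (nearly) identical, so avoid dividing by ~0.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]
```

A deterministic fixture test for this path would cover an empty cohort, an all-identical cohort, and one small cohort whose mean and spread can be checked by hand.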