# Yellow Bank Soal Perfection Tasklist

Date: 2026-04-29
Purpose: hands-off development guide for hardening the system, improving correctness, and polishing the admin/user experience.

## 1. Security and Auth

- [x] Add centralized authentication dependencies for student, website admin, and system admin roles.
- [x] Replace raw `X-Website-ID` trust with token-derived website access.
- [x] Require authorization on reports, tryout configuration updates, imports, calibration, and session endpoints.
- [x] Add session ownership checks using verified WordPress identity.
- [x] Add rate limiting for admin login, AI generation, imports, and WordPress verification.
- [x] Add admin login rate limiting (IP-based, Redis-backed attempt window).
- [x] Add CSRF tokens to all admin POST forms.
- [x] Mark admin session cookies `secure` in production.
- [x] Fail production startup when default or empty secrets are used.
- [x] Add tests proving cross-website access is blocked.
- [x] Add token integrity tests (issue/decode, tamper rejection, expiry rejection).

## 2. Session Integrity

- [ ] Verify `submit_answer` item belongs to the session's `website_id` and `tryout_id`.
- [ ] Prevent answer submission for items not issued by `next_item`.
- [ ] Stop returning `correct_answer` during live adaptive sessions.
- [ ] Decide whether explanations may be shown during an active session or only after completion.
- [ ] Add duplicate-answer validation before DB commit.
- [ ] Make repeated submissions return `409 Conflict` instead of DB errors.
- [ ] Validate or auto-create WordPress users before creating sessions.
- [ ] Add tests for invalid item IDs, foreign-tryout items, repeated answers, and completed sessions.
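
The checks above could be combined into a single pre-commit validation step, sketched below. `SessionState`, `Item`, and `check_submission` are hypothetical names, and the specific status codes other than `409 Conflict` for repeated submissions are assumptions rather than decisions recorded in this document.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class SessionState:
    website_id: int
    tryout_id: int
    completed: bool = False
    issued_item_id: Optional[int] = None  # last item handed out by next_item
    answered_item_ids: Set[int] = field(default_factory=set)


@dataclass(frozen=True)
class Item:
    item_id: int
    website_id: int
    tryout_id: int


def check_submission(session: SessionState, item: Optional[Item]) -> Tuple[int, str]:
    """Return (status_code, reason) for a submit_answer attempt, before any DB write."""
    if item is None:
        return 404, "item not found"
    if session.completed:
        return 409, "session already completed"
    if (item.website_id, item.tryout_id) != (session.website_id, session.tryout_id):
        return 400, "item does not belong to this session's tryout"
    # Checked before the issued-item test so a replayed answer yields 409, not 400.
    if item.item_id in session.answered_item_ids:
        return 409, "item already answered"
    if item.item_id != session.issued_item_id:
        return 400, "item was not issued by next_item"
    return 200, "ok"
```

Running this check before the insert means the database uniqueness constraint becomes a backstop rather than the primary source of errors.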

## 3. Scoring Correctness

- [ ] Revisit CTT `total_bobot_max` logic so earned and max weights use the same item set.
- [ ] Define scoring behavior for mixed-level tryouts.
- [ ] Confirm whether fixed tryouts should require every item to be answered before completion.
- [ ] Add tests for all-correct, all-wrong, partial, mixed-level, missing-`bobot` (item weight), and duplicate-answer cases.
- [ ] Add regression tests for static, dynamic, and hybrid normalization switching.
- [ ] Confirm NM, NN, theta, and report formulas against PRD examples.
- [ ] Add explicit handling for zero/near-zero standard deviation in reporting and normalization.
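
For the zero/near-zero standard deviation item, one possible guard is sketched below. It uses a generic z-score, not the PRD's NM or NN formulas (which are not reproduced in this document), and the choice to return `0.0` for every value when the spread collapses is an assumption that should be confirmed against the PRD.

```python
import math
from statistics import fmean, pstdev
from typing import List, Sequence


def safe_z_scores(values: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Standardize values, falling back to 0.0 when the spread is effectively zero."""
    if not values:
        return []
    mean = fmean(values)
    sd = pstdev(values)  # population SD; a sample SD may be the intended choice
    if not math.isfinite(sd) or sd < eps:
        # Every score sits at the mean, so dividing by sd would blow up.
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]
```

A cohort where every student has the same raw score, or a cohort of one, would otherwise trigger a division by zero in any formula that divides by the standard deviation.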

## 4. Database and Migrations

- [ ] Resolve model/migration drift for item uniqueness indexes.
- [ ] Decide whether items are unique by `(website_id, tryout_id, slot)` or `(website_id, tryout_id, slot, level)`.
- [ ] Align Excel import duplicate detection with the final uniqueness rule.
- [ ] Remove production `create_all` startup behavior or gate it to development only.
- [ ] Add migration smoke tests for fresh database upgrade to head.
- [ ] Add DB constraint tests for FK failures and uniqueness conflicts.
- [ ] Create seed/dev fixtures for websites, users, tryouts, items, and sessions.
- [ ] Document migration rollback expectations.
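
The two candidate uniqueness rules behave differently on mixed-level data, which is why import duplicate detection has to follow whichever rule is chosen. The sketch below makes the difference concrete; `find_duplicates` and the row dictionaries are hypothetical, and only the key fields come from this tasklist.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

KEY_WITHOUT_LEVEL = ("website_id", "tryout_id", "slot")
KEY_WITH_LEVEL = ("website_id", "tryout_id", "slot", "level")


def find_duplicates(rows: List[Dict], key_fields: Sequence[str]) -> List[Tuple]:
    """Return the key tuples that occur more than once under the given rule."""
    counts = Counter(tuple(row[name] for name in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = [
    {"website_id": 1, "tryout_id": 10, "slot": 1, "level": "easy"},
    {"website_id": 1, "tryout_id": 10, "slot": 1, "level": "hard"},
    {"website_id": 1, "tryout_id": 10, "slot": 2, "level": "easy"},
]
```

With these rows, `find_duplicates(rows, KEY_WITHOUT_LEVEL)` reports slot 1 as a duplicate, while `find_duplicates(rows, KEY_WITH_LEVEL)` reports nothing. Sharing one key definition between the model, the migration, and the importer is one way to keep them from drifting apart again.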

## 5. API Reliability

- [ ] Standardize error response shape across routers.
- [ ] Convert expected DB constraint failures into clear `400`, `404`, or `409` responses.
- [ ] Add request size limits for Excel and JSON imports.
- [ ] Add structured logging with request IDs.
- [ ] Add health checks that distinguish DB, Redis, WordPress, and OpenRouter status.
- [ ] Add OpenAPI examples for core workflows.
- [ ] Add pagination to list/report endpoints that can grow large.
- [ ] Add timeout and retry policy for external service calls.
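
A retry policy for external calls (WordPress, OpenRouter) might look like the sketch below. `call_with_retry` is a hypothetical helper; the attempt count, the exponential backoff schedule, and the set of retryable exceptions are assumptions. The per-request timeout itself is expected to be set on whichever HTTP client the project uses, which this sketch does not cover.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Call func, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1.0s, 2.0s, ...
    raise AssertionError("unreachable")
```

Only errors listed in `retry_on` are retried, so validation or authorization failures propagate immediately instead of being repeated. Injecting `sleep` keeps tests fast.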

## 6. Import and Export

- [ ] Validate website existence before Excel preview and import.
- [ ] Validate tryout existence before Excel question import.
- [ ] Add downloadable validation error reports.
- [ ] Add import preview diff for new records, skipped duplicates, and updates.
- [ ] Clean up generated export temp files after response lifecycle.
- [ ] Add tests for malformed Excel, duplicate slots, invalid p-values, invalid `bobot` (weight) values, and missing tryout.
- [x] Add tests for JSON snapshot import edge cases.
- [ ] Add file size/type hardening beyond extension checks.
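
As one way to harden uploads beyond extension checks, the sketch below inspects the bytes themselves: an `.xlsx` file is a ZIP archive, so it should start with the ZIP local-file signature and contain the usual workbook parts. `looks_like_xlsx` and `MAX_UPLOAD_BYTES` are hypothetical, the 5 MiB cap is an assumed value, and the check for `xl/workbook.xml` relies on the conventional part name rather than resolving the package relationships.

```python
import io
import zipfile

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # assumption: pick a limit that fits real imports


def looks_like_xlsx(data: bytes, max_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Cheap structural check that uploaded bytes are plausibly an .xlsx workbook."""
    if len(data) == 0 or len(data) > max_bytes:
        return False
    if not data.startswith(b"PK\x03\x04"):  # ZIP local file header signature
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            names = set(archive.namelist())
    except zipfile.BadZipFile:
        return False
    return "[Content_Types].xml" in names and "xl/workbook.xml" in names
```

This rejects renamed files cheaply before a spreadsheet parser is invoked; it does not replace the row-level validation described above, and it does not guard against archives that expand to a very large size.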

## 7. Reporting

- [ ] Persist report schedules in the database instead of process memory.
- [ ] Add real scheduler/worker execution for scheduled reports.
- [ ] Add email delivery or remove recipient fields until delivery is implemented.
- [ ] Add report permission checks.
- [ ] Add tests for empty reports, partial data, and multi-tryout comparisons.
- [ ] Add pagination/export limits for large report datasets.
- [ ] Verify `avg_nn`, pass rate, medians, and standard deviations against fixture data.
- [ ] Add user-facing messages when report data is incomplete.
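
Fixture-based verification of the report statistics could start from a small reference implementation such as the one below. `summarize_nn` is a hypothetical helper, the `pass_mark` threshold and the use of population standard deviation are assumptions, and missing scores are represented as `None` purely for illustration.

```python
from statistics import fmean, median, pstdev
from typing import Dict, Optional, Sequence


def summarize_nn(scores: Sequence[Optional[float]], pass_mark: float) -> Dict:
    """Summarize NN scores and flag the result when some scores are missing."""
    present = [s for s in scores if s is not None]
    summary = {
        "count": len(present),
        "missing": len(scores) - len(present),
        "incomplete": len(present) < len(scores) or not present,
        "avg_nn": None,
        "median_nn": None,
        "std_nn": None,
        "pass_rate": None,
    }
    if present:
        summary["avg_nn"] = fmean(present)
        summary["median_nn"] = median(present)
        summary["std_nn"] = pstdev(present)
        summary["pass_rate"] = sum(s >= pass_mark for s in present) / len(present)
    return summary
```

The `incomplete` flag gives the UI something concrete to base a user-facing message on, and an empty report returns `None` values instead of raising.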

## 8. Admin UI and UX

- [ ] Add responsive mobile/tablet layout.
- [ ] Add active navigation state and breadcrumbs.
- [ ] Add pagination, sorting, and search to admin tables.
- [ ] Replace destructive browser confirms with safer confirmation modals.
- [ ] Add inline validation and success/error banners that persist after redirects.
- [ ] Add import progress indicators and clearer preview screens.
- [ ] Add empty states with recommended next actions.
- [ ] Improve visual hierarchy for dashboard stats and high-risk actions.
- [ ] Add accessibility pass: labels, focus states, contrast, keyboard navigation.

## 9. Testing and Tooling

- [ ] Add `pyproject.toml` or `pytest.ini` with test config.
- [ ] Add pinned dependency lock workflow.
- [ ] Add `make test`, `make lint`, `make migrate`, and `make dev` commands.
- [ ] Add CI for lint, tests, mapper config, Alembic upgrade, and import smoke tests.
- [ ] Add integration tests using a test database.
- [ ] Add auth boundary tests for every tenant-scoped endpoint.
- [x] Add regression tests for previously found defects.
- [ ] Document the canonical local setup path.

## 10. Production Readiness

- [ ] Validate required secrets in production startup.
- [ ] Document deployment environment variables.
- [ ] Add backup and restore guidance for PostgreSQL.
- [ ] Add observability: logs, metrics, traces, and error monitoring.
- [ ] Add operational runbooks for import failures, calibration failures, WordPress API outages, and AI provider outages.
- [ ] Add Redis availability checks when admin or background jobs are enabled.
- [ ] Add deployment checklist for migrations, admin credentials, CORS, HTTPS, and rollback.
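
Secret validation at production startup can be a small, testable function, as in the sketch below. The variable names in `REQUIRED_SECRETS` and the placeholder values in `PLACEHOLDER_VALUES` are hypothetical; the real list should come from the documented deployment environment variables.

```python
from typing import List, Mapping

REQUIRED_SECRETS = ("SECRET_KEY", "ADMIN_PASSWORD")          # hypothetical names
PLACEHOLDER_VALUES = {"", "changeme", "secret", "password"}  # hypothetical defaults


def find_unsafe_secrets(env: Mapping[str, str], environment: str) -> List[str]:
    """Return the names of required secrets that are missing or left at a default."""
    if environment != "production":
        return []
    return [
        name
        for name in REQUIRED_SECRETS
        if env.get(name, "").strip().lower() in PLACEHOLDER_VALUES
    ]


def assert_startup_allowed(env: Mapping[str, str], environment: str) -> None:
    """Raise before the app starts serving if any production secret is unsafe."""
    unsafe = find_unsafe_secrets(env, environment)
    if unsafe:
        raise RuntimeError("refusing to start; unsafe secrets: " + ", ".join(unsafe))
```

In practice `env` would be `os.environ`; passing it in as a mapping keeps the check easy to cover in tests without touching the real environment.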

## Suggested Execution Order

1. Security and auth hardening.
2. Session integrity and scoring correctness.
3. Database/migration alignment.
4. Test and tooling foundation.
5. Import/export and reporting reliability.
6. Admin UI/UX polish.
7. Production readiness and operations.

## Definition of Perfect Enough

- [ ] Every tenant-scoped endpoint has an authorization test.
- [ ] Every scoring path has deterministic fixture tests.
- [ ] Fresh database migration to head succeeds in CI.
- [ ] Admin destructive actions are CSRF-protected.
- [ ] Live sessions cannot reveal answers before completion.
- [ ] Imports fail safely with actionable validation output.
- [ ] Reports are reproducible, permissioned, and persisted where scheduled.
- [ ] The app can be installed, tested, migrated, and run from documented commands.