Complete Section 1 security/auth hardening

2026-04-30 11:35:56 +07:00
parent 432ffbcdb9
commit 12d2d9458f
15 changed files with 863 additions and 232 deletions
--- a/hands-off.md
+++ b/hands-off.md
@@ -0,0 +1,137 @@
+# Yellow Bank Soal Perfection Tasklist
+
+Date: 2026-04-29  
+Purpose: hands-off development guide for hardening the system, improving correctness, and polishing the admin/user experience.
+
+## 1. Security and Auth
+
+- [x] Add centralized authentication dependencies for student, website admin, and system admin roles.
+- [x] Replace raw `X-Website-ID` trust with token-derived website access.
+- [x] Require authorization on reports, tryout configuration updates, imports, calibration, and session endpoints.
+- [x] Add session ownership checks using verified WordPress identity.
+- [x] Add rate limiting for AI generation, imports, and WordPress verification.
+- [x] Add admin login rate limiting (IP-based, Redis-backed attempt window).
+- [x] Add CSRF tokens to all admin POST forms.
+- [x] Mark admin session cookies `secure` in production.
+- [x] Fail production startup when default or empty secrets are used.
+- [x] Add tests proving cross-website access is blocked.
+- [x] Add token integrity tests (issue/decode, tamper rejection, expiry rejection).
+
+## 2. Session Integrity
+
+- [ ] Verify that the item passed to `submit_answer` belongs to the session's `website_id` and `tryout_id`.
+- [ ] Prevent answer submission for items not issued by `next_item`.
+- [ ] Stop returning `correct_answer` during live adaptive sessions.
+- [ ] Decide when explanations are revealed: only after the session is completed, or never to students at all.
+- [ ] Add duplicate-answer validation before DB commit.
+- [ ] Make repeated submissions return `409 Conflict` instead of DB errors.
+- [ ] Validate or auto-create WordPress users before creating sessions.
+- [ ] Add tests for invalid item IDs, foreign-tryout items, repeated answers, and completed sessions.
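
The session-integrity checks above could be concentrated in one validation step that runs before anything is written. This sketch assumes in-memory `Session` and `Item` records and a hypothetical `ApiError` that carries an HTTP status. The `409` for repeated answers comes from this list. The other status codes are assumed choices.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


class ApiError(Exception):
    """Error carrying an HTTP status code for the router to translate."""

    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status
        self.detail = detail


@dataclass
class Item:
    id: int
    website_id: int
    tryout_id: int


@dataclass
class Session:
    website_id: int
    tryout_id: int
    completed: bool = False
    issued_item_ids: Set[int] = field(default_factory=set)    # recorded by next_item
    answered_item_ids: Set[int] = field(default_factory=set)


def validate_submission(session: Session, item: Optional[Item]) -> None:
    """Raise ApiError if the answer must be rejected; return None if it may be stored."""
    if session.completed:
        raise ApiError(409, "session already completed")
    if item is None:
        raise ApiError(404, "item not found")
    if (item.website_id, item.tryout_id) != (session.website_id, session.tryout_id):
        # Report foreign items as missing so other tenants' item IDs are not confirmed.
        raise ApiError(404, "item not found")
    if item.id not in session.issued_item_ids:
        raise ApiError(409, "item was not issued by next_item")
    if item.id in session.answered_item_ids:
        raise ApiError(409, "item already answered")
```

A pre-commit check cannot rule out two concurrent requests for the same item. The database unique constraint should therefore stay in place, and its integrity error should also be mapped to `409`.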
+
+## 3. Scoring Correctness
+
+- [ ] Revisit CTT `total_bobot_max` logic so earned and max weights use the same item set.
+- [ ] Define scoring behavior for mixed-level tryouts.
+- [ ] Confirm whether fixed tryouts should require every item to be answered before completion.
+- [ ] Add tests for all-correct, all-wrong, partial, mixed-level, missing-bobot, and duplicate-answer cases.
+- [ ] Add regression tests for static, dynamic, and hybrid normalization switching.
+- [ ] Confirm NM, NN, theta, and report formulas against PRD examples.
+- [ ] Add explicit handling for zero/near-zero standard deviation in reporting and normalization.
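
To keep `total_bobot_max` consistent with the earned weight, both sums can be computed over a single, explicit set of scored item IDs. Normalization can likewise guard against a zero or near-zero standard deviation. The function names, the `eps` threshold, and the choice to return `0.0` when the spread is degenerate are illustrative assumptions. The actual NM/NN formulas still need to be checked against the PRD.

```python
import math
from typing import AbstractSet, Mapping


def ctt_weight_ratio(
    scored_ids: AbstractSet[int],
    bobot: Mapping[int, float],
    correct_ids: AbstractSet[int],
) -> float:
    """Earned weight divided by total_bobot_max, both summed over the same scored_ids."""
    missing = sorted(i for i in scored_ids if i not in bobot)
    if missing:
        raise ValueError(f"missing bobot for items {missing}")
    total_bobot_max = sum(bobot[i] for i in scored_ids)
    if total_bobot_max <= 0:
        raise ValueError("total_bobot_max must be positive")
    # Correct answers outside scored_ids are ignored, so the ratio never exceeds 1.
    earned = sum(bobot[i] for i in scored_ids if i in correct_ids)
    return earned / total_bobot_max


def safe_z(x: float, mean: float, sd: float, eps: float = 1e-9) -> float:
    """Standard score with an explicit policy for a degenerate spread."""
    if not math.isfinite(sd) or sd < eps:
        # Hypothetical policy: when everyone scored the same, place them at the mean.
        return 0.0
    return (x - mean) / sd
```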
+
+## 4. Database and Migrations
+
+- [ ] Resolve model/migration drift for item uniqueness indexes.
+- [ ] Decide whether items are unique by `(website_id, tryout_id, slot)` or `(website_id, tryout_id, slot, level)`.
+- [ ] Align Excel import duplicate detection with the final uniqueness rule.
+- [ ] Remove production `create_all` startup behavior or gate it to development only.
+- [ ] Add migration smoke tests for fresh database upgrade to head.
+- [ ] Add DB constraint tests for FK failures and uniqueness conflicts.
+- [ ] Create seed/dev fixtures for websites, users, tryouts, items, and sessions.
+- [ ] Document migration rollback expectations.
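
Whichever uniqueness rule is chosen, import duplicate detection can reuse the exact same key tuple so the preview and the DB index cannot disagree. In this sketch, `ITEM_UNIQUE_KEY`, `classify_rows`, and the row dictionaries are hypothetical. `existing_keys` stands for the keys already stored for the target tryout.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Set, Tuple

# Hypothetical: set this to whichever tuple the final DB unique index uses.
ITEM_UNIQUE_KEY: Tuple[str, ...] = ("website_id", "tryout_id", "slot", "level")

Row = Dict[str, Hashable]


def row_key(row: Row, key_fields: Sequence[str] = ITEM_UNIQUE_KEY) -> Tuple[Hashable, ...]:
    return tuple(row[f] for f in key_fields)


def classify_rows(
    rows: Iterable[Row],
    existing_keys: Set[Tuple[Hashable, ...]],
    key_fields: Sequence[str] = ITEM_UNIQUE_KEY,
) -> Tuple[List[Row], List[Row]]:
    """Split import rows into (new, skipped) using the same key as the DB index."""
    seen: Set[Tuple[Hashable, ...]] = set()
    new: List[Row] = []
    skipped: List[Row] = []
    for row in rows:
        key = row_key(row, key_fields)
        if key in existing_keys or key in seen:
            skipped.append(row)  # duplicate of a stored item or of an earlier row
        else:
            seen.add(key)
            new.append(row)
    return new, skipped
```

Switching `key_fields` between the three-field and four-field tuples changes whether two rows that share a slot but differ in level count as duplicates. That is the decision this section still has open.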
+
+## 5. API Reliability
+
+- [ ] Standardize error response shape across routers.
+- [ ] Convert expected DB constraint failures into clear `400`, `404`, or `409` responses.
+- [ ] Add request size limits for Excel and JSON imports.
+- [ ] Add structured logging with request IDs.
+- [ ] Add health checks that distinguish DB, Redis, WordPress, and OpenRouter status.
+- [ ] Add OpenAPI examples for core workflows.
+- [ ] Add pagination to list/report endpoints that can grow large.
+- [ ] Add timeout and retry policy for external service calls.
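
A timeout and retry policy for WordPress and OpenRouter calls could take the form of a small wrapper. It retries only transient errors, backs off exponentially with a cap, and re-raises once attempts run out. The defaults and the `call_with_retry` name are assumptions. The per-request timeout itself belongs on the HTTP client call that `fn` wraps. Only idempotent requests should be retried. Production policies also commonly add random jitter to the delay so that many clients do not retry in lockstep.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    *,
    attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with capped exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last transient error
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

Errors that are not listed in `retry_on`, such as validation failures, propagate immediately and are never retried.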
+
+## 6. Import and Export
+
+- [ ] Validate website existence before Excel preview and import.
+- [ ] Validate tryout existence before Excel question import.
+- [ ] Add downloadable validation error reports.
+- [ ] Add import preview diff for new records, skipped duplicates, and updates.
+- [ ] Clean up generated export temp files once the response has been sent.
+- [ ] Add tests for malformed Excel, duplicate slots, invalid p-values, invalid bobot values, and missing tryout.
+- [x] Add tests for JSON snapshot import edge cases.
+- [ ] Add file size/type hardening beyond extension checks.
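
Hardening uploads beyond extension checks can build on the fact that an `.xlsx` file is a ZIP (OOXML) container. The check can reject oversized payloads, require the ZIP local file header signature `PK\x03\x04`, and confirm that the archive contains workbook parts. The size caps and the function name are assumptions. Checking for `xl/workbook.xml` is a heuristic that matches what Excel and openpyxl typically write; the package relationship files are the authoritative way to locate the workbook part.

```python
import io
import zipfile

MAX_IMPORT_BYTES = 5 * 1024 * 1024   # hypothetical 5 MiB upload cap
MAX_EXPANSION = 20                   # hypothetical cap on total uncompressed size, as a multiple of max_bytes
ZIP_MAGIC = b"PK\x03\x04"            # ZIP local file header signature


def check_xlsx_upload(data: bytes, max_bytes: int = MAX_IMPORT_BYTES) -> None:
    """Raise ValueError unless data plausibly is a reasonably sized .xlsx workbook."""
    if len(data) > max_bytes:
        raise ValueError("file too large")
    if not data.startswith(ZIP_MAGIC):
        raise ValueError("not a ZIP container, so not an .xlsx file")
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
            declared_size = sum(info.file_size for info in zf.infolist())
    except zipfile.BadZipFile:
        raise ValueError("corrupt ZIP container") from None
    if "[Content_Types].xml" not in names or "xl/workbook.xml" not in names:
        raise ValueError("ZIP does not contain Excel workbook parts")
    if declared_size > MAX_EXPANSION * max_bytes:
        raise ValueError("declared uncompressed size too large")
```

The uncompressed sizes in a ZIP directory are self-declared and can be falsified, so the reader that actually extracts the sheets should still enforce limits while streaming. Likewise, a real endpoint would stop reading the request body once `max_bytes` is exceeded instead of buffering it all first.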
+
+## 7. Reporting
+
+- [ ] Persist report schedules in the database instead of process memory.
+- [ ] Add real scheduler/worker execution for scheduled reports.
+- [ ] Add email delivery or remove recipient fields until delivery is implemented.
+- [ ] Add report permission checks.
+- [ ] Add tests for empty reports, partial data, and multi-tryout comparisons.
+- [ ] Add pagination/export limits for large report datasets.
+- [ ] Verify `avg_nn`, pass rate, medians, and standard deviations against fixture data.
+- [ ] Add user-facing messages when report data is incomplete.
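
Verifying `avg_nn`, pass rate, medians, and standard deviations against fixture data is easier when the aggregation is a pure function. This sketch uses Python's `statistics` module. The name `summarize_scores`, the `>=` pass rule, and the use of the sample standard deviation (`stdev` rather than `pstdev`) are assumptions to be matched against the PRD. Returning `None` instead of raising gives the UI a clear signal for an incomplete-data message.

```python
import statistics
from typing import Dict, Optional, Sequence


def summarize_scores(scores: Sequence[float], pass_mark: float) -> Dict[str, Optional[float]]:
    """Report aggregates; None marks a value the data cannot support."""
    n = len(scores)
    if n == 0:
        return {"count": 0, "avg_nn": None, "median": None, "stdev": None, "pass_rate": None}
    return {
        "count": n,
        "avg_nn": statistics.fmean(scores),
        "median": statistics.median(scores),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(scores) if n > 1 else None,
        "pass_rate": sum(1 for s in scores if s >= pass_mark) / n,
    }
```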
+
+## 8. Admin UI and UX
+
+- [ ] Add responsive mobile/tablet layout.
+- [ ] Add active navigation state and breadcrumbs.
+- [ ] Add pagination, sorting, and search to admin tables.
+- [ ] Replace destructive browser confirms with safer confirmation modals.
+- [ ] Add inline validation and success/error banners that persist after redirects.
+- [ ] Add import progress indicators and clearer preview screens.
+- [ ] Add empty states with recommended next actions.
+- [ ] Improve visual hierarchy for dashboard stats and high-risk actions.
+- [ ] Add accessibility pass: labels, focus states, contrast, keyboard navigation.
+
+## 9. Testing and Tooling
+
+- [ ] Add `pyproject.toml` or `pytest.ini` with test config.
+- [ ] Add pinned dependency lock workflow.
+- [ ] Add `make test`, `make lint`, `make migrate`, and `make dev` commands.
+- [ ] Add CI for lint, tests, mapper config, Alembic upgrade, and import smoke tests.
+- [ ] Add integration tests using a test database.
+- [ ] Add auth boundary tests for every tenant-scoped endpoint.
+- [x] Add regression tests for previously found defects.
+- [ ] Document the canonical local setup path.
+
+## 10. Production Readiness
+
+- [ ] Validate required secrets in production startup.
+- [ ] Document deployment environment variables.
+- [ ] Add backup and restore guidance for PostgreSQL.
+- [ ] Add observability: logs, metrics, traces, and error monitoring.
+- [ ] Add operational runbooks for import failures, calibration failures, WordPress API outages, and AI provider outages.
+- [ ] Add Redis availability checks when admin or background jobs are enabled.
+- [ ] Add deployment checklist for migrations, admin credentials, CORS, HTTPS, and rollback.
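
Section 1 reports that production startup already fails on default or empty secrets. The open item here extends that to every required secret, and one possible shape for such a check is sketched below. The environment variable names, the `APP_ENV` switch, and the list of placeholder values are all hypothetical.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical variable names; substitute the app's real settings keys.
REQUIRED_SECRETS = ("SECRET_KEY", "ADMIN_PASSWORD", "WORDPRESS_API_TOKEN")
# Hypothetical placeholder values that must never reach production.
KNOWN_DEFAULTS = {"", "changeme", "change-me", "secret", "admin", "password"}


def secret_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems; an empty list means the secrets look acceptable."""
    env = os.environ if env is None else env
    if env.get("APP_ENV", "development") != "production":
        return []  # development keeps working with local defaults
    problems = []
    for name in REQUIRED_SECRETS:
        value = env.get(name, "").strip()
        if value.lower() in KNOWN_DEFAULTS:
            problems.append(f"{name} is missing or uses a default value")
    return problems


def enforce_production_secrets(env: Optional[Mapping[str, str]] = None) -> None:
    problems = secret_problems(env)
    if problems:
        # Refuse to start rather than run with guessable credentials.
        raise RuntimeError("; ".join(problems))
```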
+
+## Suggested Execution Order
+
+1. Security and auth hardening.
+2. Session integrity and scoring correctness.
+3. Database/migration alignment.
+4. Test and tooling foundation.
+5. Import/export and reporting reliability.
+6. Admin UI/UX polish.
+7. Production readiness and operations.
+
+## Definition of Perfect Enough
+
+- [ ] Every tenant-scoped endpoint has an authorization test.
+- [ ] Every scoring path has deterministic fixture tests.
+- [ ] Fresh database migration to head succeeds in CI.
+- [ ] Admin destructive actions are CSRF-protected.
+- [ ] Live sessions cannot reveal answers before completion.
+- [ ] Imports fail safely with actionable validation output.
+- [ ] Reports are reproducible, permissioned, and persisted where scheduled.
+- [ ] The app can be installed, tested, migrated, and run from documented commands.