Yellow Bank Soal Perfection Tasklist

Date: 2026-04-29
Purpose: hands-off development guide for hardening the system, improving correctness, and polishing the admin/user experience.

1. Security and Auth

  • Add centralized authentication dependencies for student, website admin, and system admin roles.
  • Replace raw X-Website-ID trust with token-derived website access.
  • Require authorization on reports, tryout configuration updates, imports, calibration, and session endpoints.
  • Add session ownership checks using verified WordPress identity.
  • Add rate limiting for AI generation, imports, and WordPress verification.
  • Add admin login rate limiting (IP-based, Redis-backed attempt window).
  • Add CSRF tokens to all admin POST forms.
  • Set the Secure flag on admin session cookies in production.
  • Fail production startup when default or empty secrets are used.
  • Add tests proving cross-website access is blocked.
  • Add token integrity tests (issue/decode, tamper rejection, expiry rejection).
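
The token items above (token-derived website access, tamper rejection, expiry rejection) could be exercised against a sketch like the following. It is a minimal standard-library illustration, not the project's actual token format: `issue_token`, `decode_token`, `TokenError`, and the `uid`/`wid`/`exp` payload keys are hypothetical names, and a real deployment may prefer an established JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


class TokenError(Exception):
    """Raised when a token is malformed, tampered with, or expired."""


def _sign(secret: bytes, body: str) -> str:
    return hmac.new(secret, body.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(secret: bytes, user_id: int, website_id: int,
                ttl_seconds: int, now: Optional[float] = None) -> str:
    """Sign a payload that names the website the caller may access."""
    issued_at = time.time() if now is None else now
    payload = {"uid": user_id, "wid": website_id, "exp": issued_at + ttl_seconds}
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    body = base64.urlsafe_b64encode(raw).decode("ascii")
    return f"{body}.{_sign(secret, body)}"


def decode_token(secret: bytes, token: str, now: Optional[float] = None) -> dict:
    """Return the payload, or raise TokenError if the token cannot be trusted."""
    try:
        body, signature = token.split(".")
    except ValueError as exc:
        raise TokenError("malformed token") from exc
    expected = _sign(secret, body)
    # Constant-time comparison; the signature is checked before the body is parsed.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        raise TokenError("bad signature")
    payload = json.loads(base64.urlsafe_b64decode(body.encode("ascii")))
    current = time.time() if now is None else now
    if current >= payload["exp"]:
        raise TokenError("token expired")
    return payload
```

With this shape, an endpoint would read the website from `decode_token(...)["wid"]` and ignore any client-supplied X-Website-ID value.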

2. Session Integrity

  • Verify submit_answer item belongs to the session's website_id and tryout_id.
  • Prevent answer submission for items not issued by next_item.
  • Stop returning correct_answer during live adaptive sessions.
  • Decide whether explanations should be shown only after completion or never during an active session.
  • Add duplicate-answer validation before DB commit.
  • Make repeated submissions return 409 Conflict instead of DB errors.
  • Validate or auto-create WordPress users before creating sessions.
  • Add tests for invalid item IDs, foreign-tryout items, repeated answers, and completed sessions.
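
The submission checks listed above can be sketched as a single guard that runs before any database write. This is an in-memory illustration under assumptions: `Session`, `Item`, `ApiError`, and `validate_submission` are hypothetical names, and every status code other than the 409 for repeated answers is a suggestion rather than something the tasklist specifies.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


class ApiError(Exception):
    """Carries the HTTP status an API layer would return."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass(frozen=True)
class Item:
    id: int
    website_id: int
    tryout_id: int


@dataclass
class Session:
    website_id: int
    tryout_id: int
    completed: bool = False
    issued_item_ids: Set[int] = field(default_factory=set)    # filled by next_item
    answered_item_ids: Set[int] = field(default_factory=set)  # filled by submit_answer


def validate_submission(session: Session, item: Optional[Item]) -> None:
    """Raise ApiError unless the item may be answered in this session."""
    if item is None:
        raise ApiError(404, "item not found")
    if session.completed:
        raise ApiError(409, "session already completed")
    if (item.website_id, item.tryout_id) != (session.website_id, session.tryout_id):
        # Reported as 404 so foreign-tenant items are not confirmed to exist.
        raise ApiError(404, "item not found")
    if item.id not in session.issued_item_ids:
        raise ApiError(400, "item was not issued to this session")
    if item.id in session.answered_item_ids:
        raise ApiError(409, "item already answered")
```

Running the duplicate check before the commit keeps a uniqueness constraint as a backstop rather than the primary error path.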

3. Scoring Correctness

  • Revisit CTT total_bobot_max logic so earned and max weights use the same item set.
  • Define scoring behavior for mixed-level tryouts.
  • Confirm whether fixed tryouts should require every item to be answered before completion.
  • Add tests for all-correct, all-wrong, partial, mixed-level, missing-bobot, and duplicate-answer cases.
  • Add regression tests for static, dynamic, and hybrid normalization switching.
  • Confirm NM, NN, theta, and report formulas against PRD examples.
  • Add explicit handling for zero/near-zero standard deviation in reporting and normalization.
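
Two of the items above lend themselves to a small sketch: computing earned and maximum weights over one shared item set, and guarding a standardization step against zero or near-zero standard deviation. The function names, the choice to raise on a missing bobot, and the 0.0 fallback are all assumptions for illustration. The sketch does not reproduce the PRD's NM or NN formulas.

```python
import math
from typing import Iterable, Mapping, Optional, Set, Tuple


def ctt_totals(item_ids: Iterable[int],
               bobot_by_item: Mapping[int, Optional[float]],
               correct_item_ids: Set[int]) -> Tuple[float, float]:
    """Return (earned, maximum) weight, both summed over the same item_ids."""
    earned = 0.0
    maximum = 0.0
    for item_id in item_ids:
        bobot = bobot_by_item.get(item_id)
        if bobot is None:
            # Assumption: a missing weight is a data error, not a silent zero.
            raise ValueError(f"item {item_id} has no bobot")
        maximum += bobot
        if item_id in correct_item_ids:
            earned += bobot
    return earned, maximum


def safe_z_score(value: float, mean: float, std: float, eps: float = 1e-9) -> float:
    """Standardize value, returning 0.0 when the spread is zero or unusable."""
    if not math.isfinite(std) or std < eps:
        return 0.0
    return (value - mean) / std
```

Because both totals iterate the same `item_ids`, a correct answer to an item outside the scored set cannot push the earned weight above the maximum.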

4. Database and Migrations

  • Resolve model/migration drift for item uniqueness indexes.
  • Decide whether items are unique by (website_id, tryout_id, slot) or (website_id, tryout_id, slot, level).
  • Align Excel import duplicate detection with the final uniqueness rule.
  • Remove the create_all call from production startup, or gate it to development only.
  • Add migration smoke tests for fresh database upgrade to head.
  • Add DB constraint tests for FK failures and uniqueness conflicts.
  • Create seed/dev fixtures for websites, users, tryouts, items, and sessions.
  • Document migration rollback expectations.
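
The uniqueness decision and the constraint tests above can be prototyped quickly before touching the real migrations. The sketch below uses in-memory SQLite purely as a stand-in for the production database, with a hypothetical two-table schema and the four-column variant of the rule. Swapping the index definition for the three-column variant shows how the same inserts behave under the other choice.

```python
import sqlite3

# Hypothetical minimal schema; only the column names in the index come from the tasklist.
SCHEMA = """
CREATE TABLE tryouts (
    id INTEGER PRIMARY KEY,
    website_id INTEGER NOT NULL
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    website_id INTEGER NOT NULL,
    tryout_id INTEGER NOT NULL REFERENCES tryouts(id),
    slot INTEGER NOT NULL,
    level TEXT NOT NULL
);
CREATE UNIQUE INDEX uq_items_slot_level
    ON items (website_id, tryout_id, slot, level);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
    conn.executescript(SCHEMA)
    return conn


def try_insert_item(conn: sqlite3.Connection, website_id: int, tryout_id: int,
                    slot: int, level: str) -> bool:
    """Insert one item; return False if a FK or uniqueness constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO items (website_id, tryout_id, slot, level) "
                "VALUES (?, ?, ?, ?)",
                (website_id, tryout_id, slot, level),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Whichever rule is chosen, the Excel import's duplicate detection should key on the same columns as the index.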

5. API Reliability

  • Standardize error response shape across routers.
  • Convert expected DB constraint failures into clear 400, 404, or 409 responses.
  • Add request size limits for Excel and JSON imports.
  • Add structured logging with request IDs.
  • Add health checks that distinguish DB, Redis, WordPress, and OpenRouter status.
  • Add OpenAPI examples for core workflows.
  • Add pagination to list/report endpoints that can grow large.
  • Add timeout and retry policy for external service calls.
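
For the timeout and retry item above, one possible policy is a bounded retry with exponential backoff around each external call. The helper below is a sketch under assumptions: `call_with_retry` is a hypothetical name, the defaults are placeholders, and the set of retryable exceptions would depend on the HTTP client actually used for WordPress and OpenRouter.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[Exception], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on transient errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

The per-call timeout itself belongs in `func` (for example, the HTTP client's timeout argument). Injecting `sleep` keeps tests fast and deterministic.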

6. Import and Export

  • Validate website existence before Excel preview and import.
  • Validate tryout existence before Excel question import.
  • Add downloadable validation error reports.
  • Add import preview diff for new records, skipped duplicates, and updates.
  • Delete generated export temp files once the response has been sent.
  • Add tests for malformed Excel, duplicate slots, invalid p-values, invalid bobot values, and missing tryout.
  • Add tests for JSON snapshot import edge cases.
  • Add file size/type hardening beyond extension checks.
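
The validation items above suggest collecting every row-level problem into one report instead of stopping at the first error. The sketch below assumes rows have already been parsed into dictionaries. The column names `p_value` and `bobot`, the 0 to 1 range for p-values, and the positive-weight rule are assumptions that should be checked against the real import template.

```python
import math
from typing import Any, Dict, List, Mapping, Sequence


def _is_number(value: Any) -> bool:
    """True for finite ints and floats; bools, strings, and None are rejected."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )


def validate_question_rows(rows: Sequence[Mapping[str, Any]]) -> List[Dict[str, Any]]:
    """Return a list of {row, field, message} errors; an empty list means valid."""
    errors: List[Dict[str, Any]] = []
    first_seen: Dict[Any, int] = {}
    # Spreadsheet row 1 is assumed to be the header, so data starts at row 2.
    for row_number, row in enumerate(rows, start=2):
        slot = row.get("slot")
        if slot is None:
            errors.append({"row": row_number, "field": "slot",
                           "message": "slot is required"})
        elif slot in first_seen:
            errors.append({"row": row_number, "field": "slot",
                           "message": f"duplicate of row {first_seen[slot]}"})
        else:
            first_seen[slot] = row_number

        p_value = row.get("p_value")
        if not _is_number(p_value) or not 0.0 <= p_value <= 1.0:
            errors.append({"row": row_number, "field": "p_value",
                           "message": "must be a number between 0 and 1"})

        bobot = row.get("bobot")
        if not _is_number(bobot) or bobot <= 0:
            errors.append({"row": row_number, "field": "bobot",
                           "message": "must be a positive number"})
    return errors
```

The returned list maps directly onto a downloadable error report, and the same function can back both the preview and the import step so they cannot disagree.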

7. Reporting

  • Persist report schedules in the database instead of process memory.
  • Add real scheduler/worker execution for scheduled reports.
  • Add email delivery or remove recipient fields until delivery is implemented.
  • Add report permission checks.
  • Add tests for empty reports, partial data, and multi-tryout comparisons.
  • Add pagination/export limits for large report datasets.
  • Verify avg_nn, pass rate, medians, and standard deviations against fixture data.
  • Add user-facing messages when report data is incomplete.
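
To verify report aggregates against fixture data, it helps to have one small reference calculation that also defines what an empty report looks like. The sketch below is illustrative only: `summarize_nn` is a hypothetical name, the pass mark is a caller-supplied assumption, and it uses the population standard deviation, which may or may not match the PRD.

```python
import statistics
from typing import Dict, Optional, Sequence


def summarize_nn(scores: Sequence[float], pass_mark: float) -> Dict[str, Optional[float]]:
    """Summarize NN scores; every statistic is None when there is no data."""
    count = len(scores)
    if count == 0:
        return {"count": 0, "avg_nn": None, "median_nn": None,
                "std_nn": None, "pass_rate": None}
    passed = sum(1 for score in scores if score >= pass_mark)
    return {
        "count": count,
        "avg_nn": statistics.fmean(scores),
        "median_nn": statistics.median(scores),
        "std_nn": statistics.pstdev(scores),  # population SD; an assumption
        "pass_rate": passed / count,
    }
```

Returning `None` for missing statistics, rather than zero, lets the UI show an incomplete-data message instead of a misleading number.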

8. Admin UI and UX

  • Add responsive mobile/tablet layout.
  • Add active navigation state and breadcrumbs.
  • Add pagination, sorting, and search to admin tables.
  • Replace destructive browser confirms with safer confirmation modals.
  • Add inline validation and success/error banners that persist after redirects.
  • Add import progress indicators and clearer preview screens.
  • Add empty states with recommended next actions.
  • Improve visual hierarchy for dashboard stats and high-risk actions.
  • Run an accessibility pass: labels, focus states, contrast, keyboard navigation.

9. Testing and Tooling

  • Add pyproject.toml or pytest.ini with test config.
  • Add pinned dependency lock workflow.
  • Add make test, make lint, make migrate, and make dev commands.
  • Add CI for lint, tests, mapper config, Alembic upgrade, and import smoke tests.
  • Add integration tests using a test database.
  • Add auth boundary tests for every tenant-scoped endpoint.
  • Add regression tests for previously found defects.
  • Document the canonical local setup path.

10. Production Readiness

  • Validate required secrets in production startup.
  • Document deployment environment variables.
  • Add backup and restore guidance for PostgreSQL.
  • Add observability: logs, metrics, traces, and error monitoring.
  • Add operational runbooks for import failures, calibration failures, WordPress API outages, and AI provider outages.
  • Add Redis availability checks when admin or background jobs are enabled.
  • Add deployment checklist for migrations, admin credentials, CORS, HTTPS, and rollback.
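
The secret validation item above can be sketched as a check that runs once at startup and refuses to continue in production when a required secret is missing or left at a placeholder. The variable names `SECRET_KEY`, `ADMIN_PASSWORD`, and `ENVIRONMENT`, and the list of insecure placeholder values, are hypothetical. The real names should come from the documented deployment environment variables.

```python
from typing import List, Mapping, Sequence

# Hypothetical names; replace with the project's documented variables.
REQUIRED_SECRETS = ("SECRET_KEY", "ADMIN_PASSWORD")
INSECURE_VALUES = {"", "changeme", "secret", "default", "password"}


def find_insecure_secrets(env: Mapping[str, str],
                          required: Sequence[str] = REQUIRED_SECRETS) -> List[str]:
    """Return the names of required secrets that are missing or placeholders."""
    return [
        name for name in required
        if env.get(name, "").strip().lower() in INSECURE_VALUES
    ]


def validate_startup(env: Mapping[str, str]) -> None:
    """Raise RuntimeError in production if any required secret is unsafe."""
    if env.get("ENVIRONMENT", "development") != "production":
        return  # development and test runs may keep defaults
    bad = find_insecure_secrets(env)
    if bad:
        raise RuntimeError(
            "refusing to start: insecure or missing secrets: " + ", ".join(bad)
        )
```

In the application entry point this would be called as `validate_startup(os.environ)` before the server begins accepting requests.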

Suggested Execution Order

  1. Security and auth hardening.
  2. Session integrity and scoring correctness.
  3. Database/migration alignment.
  4. Test and tooling foundation.
  5. Import/export and reporting reliability.
  6. Admin UI/UX polish.
  7. Production readiness and operations.

Definition of Perfect Enough

  • Every tenant-scoped endpoint has an authorization test.
  • Every scoring path has deterministic fixture tests.
  • Fresh database migration to head succeeds in CI.
  • Admin destructive actions are CSRF-protected.
  • Live sessions cannot reveal answers before completion.
  • Imports fail safely with actionable validation output.
  • Reports are reproducible, permissioned, and persisted where scheduled.
  • The app can be installed, tested, migrated, and run from documented commands.