Files
wp-agentic-writer/docs/architecture/PLUGIN_AUDIT_REPORT_2026-05-22.md

23 KiB

WP Agentic Writer Plugin Audit Report

Status: COMPLETE / SUPERSEDED
Completion marker date: 2026-05-24
Follow-up trace audit: docs/architecture/PLUGIN_AUDIT_FOLLOWUP_2026-05-24.md

This report is retained as the historical baseline. Its implementation has been traced in the 2026-05-24 follow-up audit, so remaining work should be tracked from the follow-up report instead of reopening duplicate jobs from this file.

Audit date: 2026-05-22
Plugin version observed: 0.1.3
Scope: UI, UX, admin settings, Gutenberg sidebar workflow, conversation context/history, cost tracking, provider/model routing, image generation, local backend, security, data lifecycle, maintainability.

Executive Summary

WP Agentic Writer has a strong product direction: a plan-first writing assistant inside Gutenberg with chat, planning, writing, refinement, research, image suggestions, SEO/GEO helpers, provider routing, local backend support, and cost visibility. The problem is not lack of ambition. The problem is that too many responsibilities are packed into a few files without stable contracts between state, persistence, providers, and UI.

The highest risk pattern is this: the plugin now has two overlapping persistence models for conversation history. Older post meta storage (_wpaw_chat_history, _wpaw_plan, _wpaw_memory) still exists, while newer session storage (wpaw_conversations) was added but is not reliably migrated or permission-scoped. That creates the exact failure mode you described: fixing one flow can silently break another because different screens and endpoints read from different truth sources.

Overall readiness assessment: beta/prototype with several production blockers. Syntax checks pass for key PHP and JS files, but there are serious runtime, migration, security, and model-cache defects.

Critical Findings

P0: Conversation Table Migration Is Not Wired

The new conversation manager expects a wpaw_conversations table, but activation only creates cost and image tables. wpaw_run_migrations() exists but is not called. Worse, the general DB version is set to 1.1.0, while the conversation migration checks for < 0.1.4, so sites can be marked upgraded without the conversation table ever being created.

Evidence:

  • wp-agentic-writer.php:136-180 creates default options, custom models, cost table, and image tables, but not conversations.
  • wp-agentic-writer.php:219-231 updates wpaw_db_version to 1.1.0.
  • includes/class-conversation-migration.php:17-44 defines conversation table creation.
  • includes/class-conversation-migration.php:64-69 defines migration runner, but it is not hooked or called.

Impact:

  • New chat/session UX can fail with DB insert/read errors on clean installs or upgraded installs.
  • Fixing frontend session behavior may appear broken because the database contract is missing.

Recommendation:

  • Split DB versions per table/domain, for example wpaw_cost_db_version, wpaw_image_db_version, wpaw_conversation_db_version.
  • Call conversation migrations on activation and on plugins_loaded idempotently.
  • Add a visible admin health check that verifies required tables exist.

P0: OpenRouter Model Cache Has Conflicting Shapes

get_cached_models() stores the full OpenRouter model objects in transient wpaw_openrouter_models. validate_model_availability() uses the same transient key but expects a flat list of model IDs. If the settings page has already cached full model objects, streaming and image validation will reject valid models because in_array($model_id, $available_models, true) compares a string to arrays.

Evidence:

  • includes/class-openrouter-provider.php:105-172 caches full model objects under wpaw_openrouter_models.
  • includes/class-openrouter-provider.php:239-255 reads the same transient key as if it contains IDs.
  • Validation is used before streaming and image generation at includes/class-openrouter-provider.php:548 and includes/class-openrouter-provider.php:755.

Impact:

  • Valid models can fail as "not available".
  • Refreshing the model list in settings can break generation.
  • This creates a brittle A/B loop: model UI fixes can break streaming/image execution.

Recommendation:

  • Use separate cache keys, e.g. wpaw_openrouter_model_objects and wpaw_openrouter_model_ids.
  • Normalize model validation to accept both canonical IDs and suffix variants like :online, without poisoning the settings model cache.
  • Add a regression test around cached full model objects plus streaming validation.

P0: PHP Requirement Is 7.4 But Code Uses PHP 8 Functions

The plugin header declares PHP 7.4 support, but the provider streaming parsers call str_starts_with(), which requires PHP 8.

Evidence:

  • wp-agentic-writer.php:13-14 declares Requires PHP: 7.4.
  • includes/class-openrouter-provider.php:642, includes/class-local-backend-provider.php:208, and includes/class-codex-provider.php:207 call str_starts_with().

Impact:

  • Fatal errors on PHP 7.4 sites when streaming code paths load.

Recommendation:

  • Either raise Requires PHP to 8.0+ or replace with 0 === strpos($line, 'data: ').

P0: Conversation Endpoints Lack Per-Session Ownership Checks

REST permission is only current_user_can('edit_posts'). The conversation handlers read, update, delete, and overwrite messages by session_id without checking that the session belongs to the current user or that the user can edit the linked post.

Evidence:

  • includes/class-gutenberg-sidebar.php:847-849 grants all REST routes to anyone who can edit posts.
  • includes/class-gutenberg-sidebar.php:6977-6991 returns any session by session_id.
  • includes/class-gutenberg-sidebar.php:7001-7033 updates any session by session_id.
  • includes/class-gutenberg-sidebar.php:7043-7060 deletes any session by session_id.
  • includes/class-gutenberg-sidebar.php:7070-7098 overwrites messages for any session by session_id.

Impact:

  • Any editor-level user who obtains or guesses a session ID can read or modify another user's conversation.
  • Stored article prompts, SEO keywords, unpublished plans, and drafts can leak.

Recommendation:

  • Add Conversation_Manager::current_user_can_access($session_id) and enforce it on all session routes.
  • For linked post sessions, also require current_user_can('edit_post', $post_id).
  • Increase session IDs to a stronger token, e.g. wp_generate_uuid4() or bin2hex(random_bytes(16)).

High Priority Findings

P1: Two Context Stores Compete Instead Of Cooperating

Current code keeps post meta chat history and new session messages at the same time.

Evidence:

  • handle_chat_request() updates post meta chat history at includes/class-gutenberg-sidebar.php:924-930 and later.
  • Frontend saves every message array to /conversations/{session_id}/messages at assets/js/sidebar.js:287-318.
  • Frontend initializes sessions through /conversations/post/{postId} and /conversations?uncompleted=1 at assets/js/sidebar.js:192-267.

Impact:

  • Chat mode, planning mode, writing mode, and resume mode can see different histories.
  • Clearing context deletes post meta but not necessarily the session messages.
  • "Continue conversation" can restore messages while _wpaw_plan or _wpaw_memory remains stale.

Recommendation:

  • Pick one source of truth for conversational history. Prefer wpaw_conversations for messages and context, with post meta only storing the current plan and lightweight indexes.
  • Define a single context assembly service used by chat, plan, write, refine, SEO, and image flows.
  • Make "clear context" clear both the active session messages/context and legacy post meta during migration.

P1: Provider Routing Falls Back Silently To OpenRouter

If a configured local backend is unreachable or unsupported, provider manager silently falls back to OpenRouter.

Evidence:

  • includes/class-provider-manager.php:33-45 returns OpenRouter fallback if selected provider is not configured or local connection test fails.

Impact:

  • A user choosing local/private/free generation may unknowingly send prompts to OpenRouter.
  • Cost expectations and privacy expectations can be violated.
  • Debugging provider behavior becomes confusing because UI selection is not guaranteed execution.

Recommendation:

  • Make fallback behavior explicit and configurable: "fail closed" vs "fallback to OpenRouter".
  • Return provider metadata in each API response so the UI can show the actual provider used.
  • Add a preflight provider health state in settings and sidebar.

P1: Cost Tracking Setting Does Not Stop Tracking Or Enforce Budget

cost_tracking_enabled controls parts of the frontend display, but the backend cost hook always writes records. Monthly budget is display-only and does not prevent expensive calls.

Evidence:

  • Cost tracker always registers the hook at includes/class-cost-tracker.php:42-44.
  • add_request() inserts every event without checking settings at includes/class-cost-tracker.php:58-75.
  • Frontend skips fetching if disabled at assets/js/sidebar.js:501-505, but backend still records.

Impact:

  • The setting name implies disabling tracking, but data is still stored.
  • Budget UI can be misleading because it is not a guardrail.

Recommendation:

  • Decide whether the setting means "hide UI" or "do not store usage"; rename or implement accordingly.
  • Add optional soft and hard budget policies before provider calls.
  • Track actual provider, request ID, session ID, and failure state for reconciliation.

P1: API Route Contracts Are Too Loose

Most REST routes accept raw JSON and manually read fields. Routes do not declare args schemas or sanitize/validate centrally.

Evidence:

  • Routes are registered without args schemas beginning at includes/class-gutenberg-sidebar.php:287-365.
  • Handler code manually reads arbitrary payloads, e.g. handle_chat_request() at includes/class-gutenberg-sidebar.php:858-914.

Impact:

  • Small frontend changes can break backend assumptions.
  • Security review becomes harder because validation is spread across handlers.
  • No machine-readable contract exists for tests.

Recommendation:

  • Add route args definitions for all simple endpoints.
  • Introduce request DTO/helper methods for complex generation/refinement requests.
  • Add contract tests for each endpoint with valid, missing, malformed, and unauthorized payloads.

P1: Main Backend Class Is Too Large To Change Safely

includes/class-gutenberg-sidebar.php is roughly 7,200 lines and owns asset enqueueing, route registration, request validation, prompt assembly, streaming, SEO, GEO, research, image routes, conversation routes, and persistence.

Impact:

  • Any change has a large blast radius.
  • Prompt changes, UI changes, and persistence changes are tangled.
  • This directly contributes to "fix A, lose B" cycles.

Recommendation:

  • Split by ownership:
    • Rest_Routes registers routes only.
    • Context_Service assembles messages/context/history.
    • Workflow_Service handles planning/writing/refinement state.
    • Provider_Service wraps provider selection and fallback.
    • Cost_Service handles usage policies.
    • Conversation_Rest_Controller, Image_Rest_Controller, Seo_Rest_Controller.

Medium Priority Findings

P2: Admin Settings Depend On External CDNs

The settings page enqueues Bootstrap and Select2 from CDN.

Evidence:

  • includes/class-settings-v2.php:67-75 loads CDN CSS/JS.

Impact:

  • Settings UI can break offline or in restricted admin environments.
  • Supply-chain and privacy expectations are weaker for a plugin admin page.

Recommendation:

  • Bundle vendor assets locally or use WordPress-native components where possible.

P2: Uninstall Is Incomplete And Duplicated

There is both register_uninstall_hook() in the main plugin file and an uninstall.php. Cleanup differs between them and neither fully cleans new data.

Evidence:

  • Main uninstall deletes settings and cost/image tables at wp-agentic-writer.php:259-267.
  • uninstall.php deletes settings, _wpaw_plan, and cost table only.
  • Neither path deletes wp_agentic_writer_custom_models, wpaw_db_version, wpaw_conversations, _wpaw_chat_history, _wpaw_memory, _wpaw_post_config, _wpaw_detected_language, writing state meta, or image-related post meta.

Impact:

  • Reinstall behavior is unpredictable.
  • Old settings and tables can affect fresh testing.

Recommendation:

  • Use one uninstall path.
  • Add a documented "delete all data on uninstall" option.
  • Clean all plugin options, transients, tables, upload temp files, scheduled events, and post meta.

P2: Image Generation Is Partially Integrated

The image manager has tables, recommendations, variants, commit flow, and temp cleanup, but cost tracking and error handling are incomplete.

Risks:

  • Image generation costs are not consistently inserted into the cost tracking table.
  • Temp files are written with file_put_contents() without checking result or validating MIME/content length.
  • Committed variants use media_handle_sideload() from the temp path, so failure modes can delete/move temp files unexpectedly.

Recommendation:

  • Add wp_aw_after_api_request events for image generation.
  • Validate downloaded image type and size before writing.
  • Add image state transitions: pending -> generating -> temp_ready -> committed -> failed.

P2: Settings Defaults And Model Labels Are Inconsistent

Defaults differ across activation, settings V2, OpenRouter provider, settings fallback, and UI copy.

Examples:

  • Activation uses execution_model but current code uses writing_model.
  • Activation default planning model is google/gemini-2.0-flash-exp, while settings/provider defaults use google/gemini-2.5-flash.
  • Refinement defaults vary between Haiku and Sonnet.

Impact:

  • Fresh install, upgraded install, and settings save can select different models.
  • Model bugs are hard to reproduce because initial state depends on install path.

Recommendation:

  • Create a single model preset registry in PHP and expose it to JS.
  • Run one migration that maps execution_model to writing_model and removes stale defaults.
  • Add "current saved model is unavailable" UI with fallback choice.

P2: Debug Logging Is Too Noisy For Production

Several error_log() and console.log() calls are unconditional or reveal request behavior and settings.

Examples:

  • Asset enqueue logs at includes/class-gutenberg-sidebar.php:73-74.
  • Provider routing logs at includes/class-provider-manager.php:28.
  • Streaming provider settings logs at includes/class-gutenberg-sidebar.php:3041-3042.
  • Frontend session logs at assets/js/sidebar.js:5119-5130.

Impact:

  • Logs can expose topics, model choices, local backend status, and partial AI responses.
  • Debug noise hides real defects.

Recommendation:

  • Add wpaw_debug_log() gated behind WP_DEBUG && SCRIPT_DEBUG or a plugin debug setting.
  • Never log API keys, full prompts, full responses, or private drafts by default.

UI/UX Assessment

What Works

  • The product concept is coherent: chat -> clarify -> plan -> write -> refine.
  • Gutenberg-side integration is stronger than a typical "AI text box" plugin.
  • @mentions and block toolbar actions are a strong foundation for an IDE-like writing workflow.
  • The admin settings V2 layout gives a clearer mental model for model selection, local backend, cost analytics, and docs.

UX Gaps

  • The sidebar has too many implicit modes. Users can be in chat, planning, writing, sessions list, welcome screen, empty writing state, cost tab, SEO tab, and clarification mode, but those states do not share a single state machine.
  • "Writing mode" can behave like discussion-only in some paths, while actual writing requires a plan. This is easy to misunderstand.
  • Context status is not transparent enough. Users cannot easily see "what the agent remembers", "which session is active", "which provider will run", or "what will be sent".
  • Cost UI shows spend, but not clear preflight estimates or post-call reconciliation by provider.
  • There is no review/accept/reject safety layer for high-impact article edits. Generated blocks can be inserted directly.

Replace mode ambiguity with a visible workflow state:

  1. Context: topic, keyword, language, audience, source material.
  2. Plan: outline draft, editable sections, approve plan.
  3. Write: section-by-section generation with pause/resume.
  4. Review: diff, SEO/GEO checks, image recommendations.
  5. Publish assist: metadata, schema, final checklist.

Each state should expose the active provider, cost estimate, context source, and next best action.

System Architecture Assessment

Current Shape

flowchart TD
  UI["assets/js/sidebar.js"]
  Routes["class-gutenberg-sidebar.php"]
  OR["OpenRouter Provider"]
  Local["Local Backend Provider"]
  Codex["Codex Provider"]
  Cost["Cost Tracker"]
  Meta["Post Meta"]
  Conv["wpaw_conversations"]
  Images["Image Manager"]

  UI --> Routes
  Routes --> OR
  Routes --> Local
  Routes --> Codex
  Routes --> Cost
  Routes --> Meta
  Routes --> Conv
  Routes --> Images
  UI --> Conv
  UI --> Meta

The core issue is that both UI and backend understand too much about everything. The architecture needs boundaries more than it needs new features.

Target Shape

flowchart TD
  UI["Sidebar UI"]
  REST["REST Controllers"]
  Workflow["Workflow Service"]
  Context["Context Service"]
  Provider["Provider Gateway"]
  Cost["Cost Policy + Ledger"]
  Store["Conversation + Post State Store"]

  UI --> REST
  REST --> Workflow
  Workflow --> Context
  Workflow --> Provider
  Workflow --> Cost
  Context --> Store
  Cost --> Store

The important change is that every generation path asks the same Context_Service for context and the same Provider_Gateway for provider execution. That gives you one place to fix context bugs and one place to fix provider/cost bugs.

Context And History Audit

Current context layers:

  • Frontend React state: immediate but volatile.
  • localStorage: agent mode only.
  • Post meta: _wpaw_chat_history, _wpaw_plan, _wpaw_memory, _wpaw_post_config, _wpaw_detected_language, writing state.
  • Conversation table: session messages/context/status/title/focus keyword.

Key gaps:

  • Session context field exists but frontend mostly saves messages, not a normalized workflow context.
  • Post-linked and uncompleted sessions are mixed into the same UI without a clear transition.
  • Auto-save of every messages array can overwrite richer backend state with stale frontend state.
  • There is no schema/version for message objects, so plan cards, timeline entries, assistant messages, and system info live in the same array.

Recommended contract:

{
  "session_id": "uuid",
  "post_id": 123,
  "workflow_state": "context|planning|writing|review|done",
  "messages": [],
  "context_summary": "",
  "plan_id": "uuid",
  "active_provider": "openrouter|local_backend|codex",
  "cost_session_id": "uuid",
  "updated_at": "datetime"
}

Cost Tracking Audit

Current strengths:

  • Central cost hook exists.
  • Sidebar and settings cost views exist.
  • Cost log grouping by post is useful.

Current gaps:

  • No session ID in cost records.
  • No provider column.
  • No request status or error records.
  • No distinction between estimated and actual cost.
  • No hard budget stop.
  • Disabled tracking does not stop backend inserts.
  • Local backend and Codex cost semantics differ from OpenRouter but share the same table model.

Recommended table changes:

  • provider
  • session_id
  • request_id
  • status
  • estimated_cost
  • actual_cost
  • currency
  • metadata_json

Models And Provider Audit

Current strengths:

  • Per-task model selection is directionally right.
  • OpenRouter model refresh exists.
  • Custom models can be added.
  • Provider routing supports OpenRouter, local backend, and Codex.

Current gaps:

  • Model cache bug is production-blocking.
  • Provider fallback is silent.
  • Codex provider uses older Chat Completions assumptions and hardcoded stale pricing.
  • Local backend test runs an inference call, which may be unexpectedly slow/costly for a "test connection".
  • Image model selection trusts OpenRouter modalities but custom models bypass capability validation.

Recommended provider contract:

ProviderResult {
  provider: string,
  model: string,
  content: string,
  usage: Usage,
  cost: Cost,
  capabilities: string[],
  warnings: string[]
}

Test And Verification Gaps

Checks run during this audit:

  • php -l wp-agentic-writer.php
  • php -l includes/class-gutenberg-sidebar.php
  • php -l includes/class-settings-v2.php
  • php -l includes/class-openrouter-provider.php
  • php -l includes/class-image-manager.php
  • php -l includes/class-conversation-migration.php
  • node --check assets/js/sidebar.js
  • node --check assets/js/settings-v2.js
  • node --check assets/js/sidebar-utils.js
  • node --check assets/js/block-refine.js
  • node --check assets/js/block-image-generate.js

All checked files passed syntax checks.

Missing test coverage:

  • Activation/migration tests for clean install and upgrade.
  • REST permission tests for conversations and post config.
  • Provider model-cache regression tests.
  • Context assembly snapshots per mode.
  • Streaming parser tests for OpenRouter, local backend, and Codex.
  • Cost ledger tests with tracking disabled, zero-cost local calls, and failed requests.
  • Gutenberg e2e tests for chat -> plan -> write -> refresh -> resume.

Stabilization Roadmap

Phase 1: Stop Runtime Breakage

  1. Fix PHP 7.4 compatibility or raise PHP requirement.
  2. Fix OpenRouter model cache shape conflict.
  3. Wire conversation migrations correctly.
  4. Add ownership checks on all conversation endpoints.
  5. Gate debug logging.

Phase 2: Stabilize State

  1. Declare one source of truth for conversation messages.
  2. Create a context service used by all generation paths.
  3. Migrate legacy post meta chat history into sessions.
  4. Make clear context/session/post behavior explicit.
  5. Add workflow state to session context.

Phase 3: Stabilize Cost And Provider Behavior

  1. Add provider metadata to all AI responses.
  2. Make provider fallback explicit.
  3. Add budget preflight and optional hard limit.
  4. Expand cost table with provider/session/request fields.
  5. Track image and failed request costs consistently.

Phase 4: Reduce Blast Radius

  1. Split class-gutenberg-sidebar.php into controllers and services.
  2. Add REST schemas and shared request validators.
  3. Build integration tests around the main workflows.
  4. Add a small internal fixture suite for model/provider responses.
  5. Remove backup files and duplicate settings/documentation paths after confirming they are unused.

Highest Leverage Opportunities

  • Make the plugin feel safer: add preview/diff/accept/reject for refinements and article-wide edits.
  • Make the agent feel smarter: show "current context" and let users edit what the agent remembers.
  • Make costs trustworthy: show preflight estimate, actual cost, provider, and model after every operation.
  • Make local backend trustworthy: no silent cloud fallback unless the user explicitly opts in.
  • Make model selection resilient: capability badges, availability checks, and clear fallbacks.
  • Make the codebase easier to evolve: services plus tests around the workflows that matter.

Suggested Definition Of Done For Future Fixes

For any feature or bug fix touching chat, planning, writing, refinement, context, provider, or cost:

  1. It must state which storage layer is authoritative.
  2. It must include the provider/model actually used in the response.
  3. It must update or preserve cost records intentionally.
  4. It must pass at least one workflow test from chat to final editor state.
  5. It must not add another source of truth for the same state.

This is the guardrail that prevents losing A while fixing B.