OpenRouter BYOK Context and Streaming Spec

Date: 2026-06-05
Status: Proposed implementation direction
Goal: Replace local bash/proxy-first text generation with an OpenRouter BYOK-first API path while preserving article continuity and improving streamed editor UX.

Decision

Use OpenRouter as the primary text transport for chat, clarity, planning, writing, and refinement, with the user's OpenRouter workspace configured for BYOK provider keys.

The plugin should continue to store conversation and article memory in WordPress. OpenRouter should be treated as a stateless model gateway: it streams model output, returns usage metadata, applies provider routing, and can cache identical responses. It does not own article continuity.

Local Backend should become optional or legacy. It is useful for experiments, but it should not be the recommended default because it asks users to run local scripts/proxy tooling and creates trust friction.

Current Implementation Snapshot

The current plugin already has most of the foundation:

  • includes/interface-ai-provider.php defines chat(), chat_stream(), generate_image(), is_configured(), test_connection(), and supports_task_type().
  • includes/class-provider-manager.php routes each task through configured providers and already prevents silent OpenRouter spend when fallback is disabled.
  • includes/class-openrouter-provider.php supports non-streaming and streaming chat completions through OpenRouter.
  • includes/class-local-backend-provider.php supports a local proxy at /v1/messages, including a cURL streaming parser and plain JSON fallback.
  • includes/class-conversation-manager.php stores sessions in {$wpdb->prefix}wpaw_conversations with messages and context JSON fields.
  • includes/class-context-service.php is already documented as the single source of truth for messages, _wpaw_plan, _wpaw_post_config, and legacy chat migration.
  • includes/class-gutenberg-sidebar.php exposes the main REST routes: /chat, /generate-plan, /revise-plan, /execute-article, /refine-block, /refine-from-chat, /summarize-context, /detect-intent, /writing-state/{post_id}, and conversation routes.
  • Cost tracking already records post_id, session_id, model, provider, action, input tokens, output tokens, cost, and status.

The main gap is not a lack of streaming. It is that several routes still accept the full chatHistory from the browser and inject it into prompts. Continuity then depends on whatever the browser sends, and the same context can be re-sent on every request.

Product Positioning

Recommended provider settings:

'task_providers' => array(
	'chat'       => 'openrouter',
	'clarity'    => 'openrouter',
	'planning'   => 'openrouter',
	'writing'    => 'openrouter',
	'refinement' => 'openrouter',
	'image'      => 'openrouter',
),
'allow_openrouter_fallback' => false,
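
As a rough illustration of how these settings could be honored at routing time, the sketch below resolves a provider for a task and refuses to fall back to OpenRouter unless allow_openrouter_fallback is set. The function name and the 'local_backend' slug are hypothetical; the real logic lives in includes/class-provider-manager.php and may differ.

```php
<?php
/**
 * Resolve which provider should handle a task (illustrative sketch only).
 *
 * @param array  $settings   Plugin settings, including 'task_providers'.
 * @param string $task       Task type, e.g. 'writing'.
 * @param array  $configured Slugs of providers that are currently configured.
 * @return string|null Provider slug, or null when no provider may be used.
 */
function wpaw_sketch_resolve_provider( array $settings, $task, array $configured ) {
	$preferred = $settings['task_providers'][ $task ] ?? 'openrouter';

	if ( in_array( $preferred, $configured, true ) ) {
		return $preferred;
	}

	// Only fall back to OpenRouter when the user explicitly allowed it.
	$fallback_allowed = ! empty( $settings['allow_openrouter_fallback'] );
	if ( 'openrouter' !== $preferred && $fallback_allowed && in_array( 'openrouter', $configured, true ) ) {
		return 'openrouter';
	}

	// Caller should surface a configuration error instead of spending silently.
	return null;
}
```

Returning null rather than a default keeps the "no silent spend" guarantee visible to the caller.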

The UI copy should present this as:

  • Connect OpenRouter API key.
  • Configure BYOK provider keys inside OpenRouter.
  • Stream directly into WordPress.
  • Keep all article memory in WordPress.
  • Local Backend is advanced or legacy.

OpenRouter BYOK details to reflect in docs:

  • BYOK lets users route requests through their own provider keys while still using OpenRouter's API surface.
  • BYOK provider keys are encrypted and used for requests routed through the selected provider.
  • OpenRouter's BYOK fee is documented as 5 percent of the normal OpenRouter model/provider cost, waived for the first 1M BYOK requests per month.
  • Users can prevent fallback to OpenRouter shared endpoints by enabling the provider key's "Always use for this provider" behavior in OpenRouter.
  • OpenRouter usage data is returned in normal responses and in the last SSE message for streamed responses.
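
To make the fee bullet concrete, here is a small worked calculation. It assumes the figures quoted above (5 percent, first 1M BYOK requests per month waived) and a hypothetical helper name; OpenRouter's current pricing documentation should be checked before showing any number in the UI.

```php
<?php
/**
 * Estimate the BYOK fee for one request (sketch using the figures quoted in this spec).
 *
 * @param float $normal_cost          What the request would cost at normal OpenRouter pricing.
 * @param int   $byok_requests_so_far BYOK requests already made this month.
 * @param float $fee_rate             Assumed fee rate (0.05 = 5 percent).
 * @param int   $waived_requests      Assumed number of fee-free requests per month.
 * @return float Estimated fee in the same currency as $normal_cost.
 */
function wpaw_sketch_byok_fee( $normal_cost, $byok_requests_so_far, $fee_rate = 0.05, $waived_requests = 1000000 ) {
	// Requests inside the monthly waiver carry no fee.
	if ( $byok_requests_so_far < $waived_requests ) {
		return 0.0;
	}
	return (float) $normal_cost * (float) $fee_rate;
}
```

For example, a request that would normally cost 0.02 carries an estimated fee of 0.001 once the waiver is used up, and 0 before that.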

Continuity Ownership

Continuity is owned by WordPress, not OpenRouter.

Persisted state:

| State | Current storage | Keep or change |
| --- | --- | --- |
| Conversation messages | wpaw_conversations.messages | Keep |
| Session context | wpaw_conversations.context | Extend |
| Article plan | _wpaw_plan post meta | Keep |
| Post config | _wpaw_post_config post meta | Keep |
| Writing state | _wpaw_writing_status, _wpaw_current_section, _wpaw_sections_written, _wpaw_resume_token | Keep |
| Section to block mapping | _wpaw_section_blocks | Keep |
| Lightweight post memory | _wpaw_memory | Extend or migrate into context |
| Cost and token usage | wpaw_cost_tracking | Extend |

Recommended new session context shape:

{
  "working_summary": {
    "text": "The article is about ...",
    "updated_at": "2026-06-05T10:30:00+07:00",
    "source_message_count": 14
  },
  "decisions": [
    {
      "type": "accept",
      "target": "outline.section.2",
      "summary": "Keep the practical checklist framing.",
      "created_at": "2026-06-05T10:31:00+07:00"
    }
  ],
  "rejections": [
    {
      "target": "outline.section.4",
      "summary": "Too generic; needs concrete WordPress examples.",
      "created_at": "2026-06-05T10:32:00+07:00"
    }
  ],
  "research_notes": [
    {
      "source": "manual",
      "title": "User supplied constraint",
      "excerpt": "Avoid local bash instructions in the default UX.",
      "tags": ["trust", "onboarding"]
    }
  ],
  "token_policy": {
    "max_recent_messages": 6,
    "max_summary_tokens": 600,
    "max_research_snippets": 5
  }
}

Store this in wpaw_conversations.context first. Avoid adding a new custom table until context becomes too large or needs relational querying.
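
Because existing rows will not have the new keys, reads probably need to tolerate empty or partial JSON. The sketch below shows one way to fill in defaults when decoding the context column. The function names are hypothetical, and the default values simply mirror the example shape above.

```php
<?php
/**
 * Default session context, mirroring the shape proposed in this spec.
 */
function wpaw_sketch_default_context() {
	return array(
		'working_summary' => array(
			'text'                 => '',
			'updated_at'           => null,
			'source_message_count' => 0,
		),
		'decisions'       => array(),
		'rejections'      => array(),
		'research_notes'  => array(),
		'token_policy'    => array(
			'max_recent_messages'   => 6,
			'max_summary_tokens'    => 600,
			'max_research_snippets' => 5,
		),
	);
}

/**
 * Decode the stored context JSON and fill in any missing keys.
 *
 * @param string|null $raw_json Value of wpaw_conversations.context.
 * @return array Context with every expected top-level key present.
 */
function wpaw_sketch_normalize_context( $raw_json ) {
	$decoded = json_decode( (string) $raw_json, true );
	if ( ! is_array( $decoded ) ) {
		return wpaw_sketch_default_context(); // Empty or invalid JSON.
	}
	// Stored values win; defaults only fill the gaps.
	return array_replace_recursive( wpaw_sketch_default_context(), $decoded );
}
```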

Context Builder

Add a dedicated builder instead of assembling continuity inside each REST handler.

New file:

includes/class-context-builder.php

Primary API:

class WP_Agentic_Writer_Context_Builder {
	public function build_for_task( $task, $session_id, $post_id, $request_params = array() ) {
		// Returns normalized prompt parts for chat, planning, writing, refinement, SEO.
	}
}

Return shape:

array(
	'system_context'   => 'Stable task and policy instructions.',
	'working_context'  => 'Compact summary, decisions, plan, selected post config.',
	'active_content'   => 'The exact section/block/article slice being edited.',
	'research_context' => 'Only relevant excerpts.',
	'audit'            => array(
		'included_recent_messages' => 6,
		'included_research_items'  => 3,
		'estimated_input_tokens'   => 2200,
		'used_full_history'        => false,
	),
)

Context assembly rules:

  • Always include the task system prompt and language instruction.
  • Always include post config summary: audience, tone, language, article length, SEO fields, web search preference.
  • Include _wpaw_plan for planning, writing, and outline refinement.
  • Include only the active block or section for block refinement.
  • Include recent raw messages only up to max_recent_messages.
  • Include working_summary when message history is long.
  • Include decisions and rejections as compact bullet points.
  • Include post content only when the task requires whole-article awareness, such as final polish or article-wide refinement.
  • Never trust browser-provided chatHistory as authoritative if sessionId is available.
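
The message-related rules above (cap raw messages at max_recent_messages, include working_summary only when history is long) could be sketched as follows. This is a minimal illustration with a hypothetical function name, not the builder itself; it assumes messages are stored oldest first.

```php
<?php
/**
 * Pick which parts of the stored history go into a prompt (sketch).
 *
 * @param array $messages Authoritative session messages, oldest first.
 * @param array $context  Decoded wpaw_conversations.context.
 * @return array Keys: 'summary' (string|null), 'recent' (array), 'used_full_history' (bool).
 */
function wpaw_sketch_select_history( array $messages, array $context ) {
	$max = max( 0, (int) ( $context['token_policy']['max_recent_messages'] ?? 6 ) );

	// Short history: send it as-is, no summary needed.
	if ( count( $messages ) <= $max ) {
		return array(
			'summary'           => null,
			'recent'            => array_values( $messages ),
			'used_full_history' => true,
		);
	}

	// Long history: compact summary plus only the newest raw messages.
	$summary = $context['working_summary']['text'] ?? '';
	return array(
		'summary'           => '' !== $summary ? $summary : null,
		'recent'            => $max > 0 ? array_slice( $messages, -$max ) : array(),
		'used_full_history' => false,
	);
}
```

The used_full_history flag maps directly onto the audit field in the return shape, which makes it easy to verify in tests that long sessions are no longer sent in full.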

Endpoint Changes

/chat

Current behavior:

  • Receives messages from the browser.
  • Prepends a system prompt.
  • Streams or returns a chat response.
  • Persists user and assistant messages.

Required change:

  • Use browser messages only to identify the latest user message.
  • Load authoritative session context from WP_Agentic_Writer_Context_Service.
  • Build final messages through WP_Agentic_Writer_Context_Builder.
  • Persist the raw user message and assistant response after completion.
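
The first required change can be illustrated with a small helper that reads only the newest user turn from the browser payload and ignores everything else. The helper name is hypothetical, and the role/content message shape is an assumption based on the OpenAI-style format OpenRouter accepts; a real handler would also sanitize the text.

```php
<?php
/**
 * Extract the latest user message from a browser-supplied messages array (sketch).
 *
 * @param array $browser_messages Untrusted messages from the request body.
 * @return string|null Trimmed text of the newest user message, or null if none.
 */
function wpaw_sketch_latest_user_message( array $browser_messages ) {
	foreach ( array_reverse( $browser_messages ) as $message ) {
		if ( ! is_array( $message ) || 'user' !== ( $message['role'] ?? '' ) ) {
			continue;
		}
		// Only plain string content is handled in this sketch.
		$content = is_string( $message['content'] ?? null ) ? trim( $message['content'] ) : '';
		if ( '' !== $content ) {
			return $content;
		}
	}
	return null;
}
```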

/generate-plan

Current behavior:

  • Accepts topic, context, chatHistory, and other config.
  • Serializes full chatHistory into the planning prompt.
  • Stores _wpaw_plan and _wpaw_memory.

Required change:

  • Keep topic, context, clarificationAnswers, and post_config.
  • Replace full chatHistory injection with a context package from the builder.
  • Save generated plan to _wpaw_plan.
  • Update wpaw_conversations.context.working_summary after plan generation.

/revise-plan

Required behavior:

  • Include current _wpaw_plan.
  • Include latest user instruction.
  • Include accepted/rejected outline decisions.
  • Ask for raw JSON plan only.
  • Save previous plan as a version entry inside wpaw_conversations.context.plan_versions before overwriting _wpaw_plan.
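
A minimal sketch of the versioning step, assuming plan_versions is a simple list inside the context. The function name, the entry shape, and the cap of 10 versions are assumptions added here to keep the context JSON bounded; the spec itself does not fix them.

```php
<?php
/**
 * Append the outgoing plan to context.plan_versions before it is overwritten (sketch).
 *
 * @param array  $context       Decoded session context.
 * @param array  $previous_plan Plan currently stored in _wpaw_plan.
 * @param string $saved_at      ISO 8601 timestamp supplied by the caller.
 * @param int    $max_versions  Hypothetical cap on retained versions.
 * @return array Updated context.
 */
function wpaw_sketch_push_plan_version( array $context, array $previous_plan, $saved_at, $max_versions = 10 ) {
	$versions = isset( $context['plan_versions'] ) && is_array( $context['plan_versions'] )
		? $context['plan_versions']
		: array();

	$versions[] = array(
		'saved_at' => $saved_at,
		'plan'     => $previous_plan,
	);

	// Keep only the newest entries so the context column stays small.
	$context['plan_versions'] = array_slice( $versions, -max( 1, (int) $max_versions ) );

	return $context;
}
```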

/execute-article

Current behavior:

  • Writes sections from the plan.
  • Streams section content and block events.
  • Updates _wpaw_plan section statuses.

Required change:

  • For each section, send the section brief, global article summary, relevant decisions, and relevant research.
  • Do not send the full conversation for every section.
  • After each section completes, update writing state and append a section summary to session context.
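
One way to pick the "relevant decisions" for a section is to filter by the target string. The sketch below assumes targets follow the outline.section.N pattern used in the example context earlier in this spec; the helper name is hypothetical.

```php
<?php
/**
 * Return only the decisions (or rejections) that target a given section (sketch).
 *
 * @param array $decisions      Entries with a 'target' such as "outline.section.2".
 * @param int   $section_number Section number as used in the target string.
 * @return array Matching entries, reindexed from zero.
 */
function wpaw_sketch_decisions_for_section( array $decisions, $section_number ) {
	$prefix  = 'outline.section.' . (int) $section_number;
	$matches = array_filter(
		$decisions,
		function ( $decision ) use ( $prefix ) {
			$target = isset( $decision['target'] ) ? (string) $decision['target'] : '';
			// Match the section itself or a nested target below it.
			return $target === $prefix || 0 === strpos( $target, $prefix . '.' );
		}
	);
	return array_values( $matches );
}
```

Matching on the prefix plus a dot avoids section 2 accidentally picking up decisions for section 20.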

/refine-block and /refine-from-chat

Required behavior:

  • Send active block content, neighboring heading/section context, relevant plan entry, and latest instruction.
  • Include compact working summary and decisions.
  • Do not include the full draft unless the requested operation is article-wide.

/summarize-context

Current behavior:

  • Summarizes browser-provided chatHistory.
  • Returns a summary, but does not appear to persist it anywhere authoritative.

Required change:

  • Accept sessionId.
  • Load authoritative session messages.
  • Save the resulting summary into wpaw_conversations.context.working_summary.
  • Return summary, message_count, source_message_count, tokens_saved, and provider metadata.
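
The tokens_saved field needs some estimate of how many tokens the summary replaces. The sketch below uses a rough heuristic of about four bytes per token, which is an assumption and not a real tokenizer; both function names are hypothetical. It is only meant to show the arithmetic.

```php
<?php
/**
 * Very rough token estimate: about 4 bytes per token (heuristic, not a tokenizer).
 */
function wpaw_sketch_estimate_tokens( $text ) {
	return (int) ceil( strlen( (string) $text ) / 4 );
}

/**
 * Estimate how many input tokens a summary saves compared with the raw messages.
 *
 * @param array  $messages Session messages that the summary replaces.
 * @param string $summary  Generated summary text.
 * @return int Non-negative estimate of tokens saved.
 */
function wpaw_sketch_tokens_saved( array $messages, $summary ) {
	$original = 0;
	foreach ( $messages as $message ) {
		$original += wpaw_sketch_estimate_tokens( $message['content'] ?? '' );
	}
	// Never report a negative saving if the summary is longer than the source.
	return max( 0, $original - wpaw_sketch_estimate_tokens( $summary ) );
}
```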

Streaming Transport

OpenRouter streaming is already implemented in WP_Agentic_Writer_OpenRouter_Provider::chat_stream().

Keep this transport shape:

$body = array(
	'model'    => $model,
	'messages' => $messages,
	'stream'   => true,
);

Modernize usage handling:

  • OpenRouter now returns full usage metadata automatically.
  • usage: { include: true } and stream_options: { include_usage: true } are documented as deprecated and no longer required.
  • Keep parsing the final usage object from streamed chunks.
  • Extend cost tracking to store cache metadata when available.
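
Parsing the final usage object could look roughly like the sketch below. It assumes OpenAI-style chunks (choices[0].delta.content plus a usage object on a late chunk) and, for simplicity, a buffer that already contains whole lines; a cURL write callback would need to carry incomplete lines over to the next call. The function name is hypothetical.

```php
<?php
/**
 * Extract accumulated text and the final usage object from an SSE buffer (sketch).
 *
 * @param string $buffer Complete SSE lines received from the provider.
 * @return array Keys: 'text' (string) and 'usage' (array|null).
 */
function wpaw_sketch_parse_sse( $buffer ) {
	$text  = '';
	$usage = null;

	foreach ( preg_split( '/\r\n|\n|\r/', $buffer ) as $line ) {
		// Skip blank separators and SSE comment lines (those starting with ":").
		if ( 0 !== strpos( $line, 'data:' ) ) {
			continue;
		}
		$payload = trim( substr( $line, 5 ) );
		if ( '' === $payload || '[DONE]' === $payload ) {
			continue;
		}
		$chunk = json_decode( $payload, true );
		if ( ! is_array( $chunk ) ) {
			continue; // Ignore malformed JSON.
		}
		$delta = $chunk['choices'][0]['delta']['content'] ?? null;
		if ( is_string( $delta ) ) {
			$text .= $delta;
		}
		if ( isset( $chunk['usage'] ) && is_array( $chunk['usage'] ) ) {
			$usage = $chunk['usage']; // The last usage object seen wins.
		}
	}

	return array(
		'text'  => $text,
		'usage' => $usage,
	);
}
```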

Recommended emitted SSE events:

{"type":"provider","provider":"openrouter","model":"openai/gpt-4o-mini","byok_expected":true}
{"type":"conversational_stream","content":"partial accumulated text"}
{"type":"usage","input_tokens":1200,"output_tokens":360,"cached_tokens":0,"cost":0.0012}
{"type":"complete","session_id":"abc123","totalCost":0.0012}

Use the existing browser parsing path in assets/js/sidebar.js and add support for the optional provider and usage event types.
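
On the server side, the new usage event could be built and framed roughly as follows. Both helpers are hypothetical, and the "data: " framing is an assumption about how the plugin writes its stream; inside WordPress, wp_json_encode() would normally replace json_encode().

```php
<?php
/**
 * Build the proposed "usage" event from a provider usage object (sketch).
 *
 * @param array $usage Usage object as returned by the provider.
 * @param float $cost  Cost already computed by the cost tracker.
 * @return array Event payload matching the shape proposed in this spec.
 */
function wpaw_sketch_usage_event( array $usage, $cost ) {
	return array(
		'type'          => 'usage',
		'input_tokens'  => (int) ( $usage['prompt_tokens'] ?? 0 ),
		'output_tokens' => (int) ( $usage['completion_tokens'] ?? 0 ),
		'cached_tokens' => (int) ( $usage['prompt_tokens_details']['cached_tokens'] ?? 0 ),
		'cost'          => (float) $cost,
	);
}

/**
 * Frame one event as a server-sent event line.
 *
 * @param array $event Event payload.
 * @return string SSE frame ending in a blank line.
 */
function wpaw_sketch_format_sse( array $event ) {
	// Leave slashes unescaped so model IDs like "openai/gpt-4o-mini" stay readable.
	return 'data: ' . json_encode( $event, JSON_UNESCAPED_SLASHES ) . "\n\n";
}
```

Since both event types are optional, the browser parser can ignore unknown type values and older clients keep working.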

Response Caching Policy

OpenRouter response caching should be used for deterministic, duplicate-safe operations only. It is not article memory.

Recommended use:

  • detect_intent
  • summarize_context retry
  • connection test
  • repeated model capability lookups if routed through completion calls

Avoid by default:

  • article draft generation
  • outline revision
  • refinement requests
  • image prompt generation

Provider implementation change:

if ( ! empty( $options['openrouter_response_cache'] ) ) {
	$headers[] = 'X-OpenRouter-Cache: true';
	$headers[] = 'X-OpenRouter-Cache-TTL: ' . (int) ( $options['openrouter_cache_ttl'] ?? 300 );
}

Important limitations:

  • Cache hits only happen for identical requests.
  • Streaming and non-streaming requests are cached separately.
  • Cache hit usage counters are zeroed.
  • Response caching is beta and requires OpenRouter to store response data temporarily.
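
The recommended-use and avoid lists above amount to an allowlist. A minimal sketch, assuming hypothetical action slugs and a hypothetical function name, and reusing the openrouter_response_cache option from the provider change above as the global switch:

```php
<?php
/**
 * Decide whether a request may opt in to response caching (sketch).
 *
 * @param string $action  Action slug for the current request.
 * @param array  $options Plugin options.
 * @return bool True only for duplicate-safe actions when caching is enabled.
 */
function wpaw_sketch_should_cache( $action, array $options ) {
	if ( empty( $options['openrouter_response_cache'] ) ) {
		return false;
	}

	// Hypothetical slugs for the duplicate-safe operations listed in this spec.
	$cacheable = array( 'detect_intent', 'summarize_context', 'test_connection' );

	// Everything else (drafts, revisions, refinements) is never cached by default.
	return in_array( $action, $cacheable, true );
}
```

An allowlist fails closed: any new action added later is uncached until someone decides it is duplicate-safe.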

Usage and Budget Tracking

Extend wpaw_cost_tracking with optional cache and upstream fields:

ALTER TABLE {$wpdb->prefix}wpaw_cost_tracking
	ADD COLUMN cached_tokens int(11) DEFAULT 0 AFTER output_tokens,
	ADD COLUMN cache_write_tokens int(11) DEFAULT 0 AFTER cached_tokens,
	ADD COLUMN upstream_inference_cost decimal(10,6) DEFAULT NULL AFTER cost,
	ADD COLUMN generation_id varchar(64) DEFAULT '' AFTER status;

Implementation notes:

  • Gate this migration on a schema version bump rather than on the plugin version alone.
  • Keep existing maybe_upgrade_table() pattern in WP_Agentic_Writer_Cost_Tracker.
  • Parse usage.prompt_tokens_details.cached_tokens.
  • Parse usage.prompt_tokens_details.cache_write_tokens.
  • Parse usage.cost_details.upstream_inference_cost for BYOK requests.
  • Include a monthly token budget view alongside the existing cost view.
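
The parsing notes above could be collected into one mapping step before the row is written. The sketch below uses the usage paths named in the bullets and a hypothetical function name; fields that the provider does not return fall back to 0 or null so older responses still record cleanly.

```php
<?php
/**
 * Map a provider usage object onto the proposed cost-tracking columns (sketch).
 *
 * @param array $usage Usage object from a completed response or final stream chunk.
 * @return array Column values for wpaw_cost_tracking.
 */
function wpaw_sketch_usage_to_fields( array $usage ) {
	$details = isset( $usage['prompt_tokens_details'] ) && is_array( $usage['prompt_tokens_details'] )
		? $usage['prompt_tokens_details']
		: array();

	$upstream = $usage['cost_details']['upstream_inference_cost'] ?? null;

	return array(
		'input_tokens'            => (int) ( $usage['prompt_tokens'] ?? 0 ),
		'output_tokens'           => (int) ( $usage['completion_tokens'] ?? 0 ),
		'cached_tokens'           => (int) ( $details['cached_tokens'] ?? 0 ),
		'cache_write_tokens'      => (int) ( $details['cache_write_tokens'] ?? 0 ),
		// NULL distinguishes "not reported" from a genuine zero cost.
		'upstream_inference_cost' => is_numeric( $upstream ) ? (float) $upstream : null,
	);
}
```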

Budget metric examples:

billable_input_tokens = max( 0, input_tokens - cached_tokens );
total_monthly_tokens = sum( input_tokens + output_tokens );
byok_free_request_counter = count( provider = 'openrouter' and status = 'success' );
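
The same metrics as a runnable sketch over rows already filtered to the current month. The function name and row shape are assumptions. Note that counting successful OpenRouter rows gives an upper bound on BYOK requests unless the tracker also records whether a given request actually used a BYOK key.

```php
<?php
/**
 * Aggregate monthly budget metrics from cost-tracking rows (sketch).
 *
 * @param array $rows Rows for the current month, each an associative array.
 * @return array Aggregated metrics.
 */
function wpaw_sketch_budget_metrics( array $rows ) {
	$metrics = array(
		'billable_input_tokens'       => 0,
		'total_monthly_tokens'        => 0,
		'openrouter_success_requests' => 0,
	);

	foreach ( $rows as $row ) {
		$input  = (int) ( $row['input_tokens'] ?? 0 );
		$output = (int) ( $row['output_tokens'] ?? 0 );
		$cached = (int) ( $row['cached_tokens'] ?? 0 );

		// Cached tokens are subtracted per row and never pushed below zero.
		$metrics['billable_input_tokens'] += max( 0, $input - $cached );
		$metrics['total_monthly_tokens']  += $input + $output;

		if ( 'openrouter' === ( $row['provider'] ?? '' ) && 'success' === ( $row['status'] ?? '' ) ) {
			$metrics['openrouter_success_requests']++;
		}
	}

	return $metrics;
}
```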

Note: OpenRouter documents the BYOK waiver as first 1M BYOK requests per month, not first 1M tokens. Keep UI wording precise.

Settings UI Changes

Update Settings V2:

  • Rename default cloud path to OpenRouter BYOK / API.
  • Keep API key storage in wp_agentic_writer_settings.openrouter_api_key.
  • Add a help panel explaining that provider BYOK keys are configured in OpenRouter, not in WordPress.
  • Add a "Prevent shared fallback" checklist item that links users to OpenRouter BYOK provider settings.
  • Move Local Backend to an Advanced or Legacy Local Backend section.
  • Make provider routing default all text tasks to openrouter.
  • Keep image task on openrouter.
  • Show a trust note: WordPress streams directly to OpenRouter; no local shell or CLI process is required.

Do not collect provider keys directly in WordPress unless there is a deliberate product decision to bypass OpenRouter BYOK management. The safer default is only storing the OpenRouter API key.

Migration Plan

Phase 1: Documentation and defaults

  • Add this spec.
  • Update user-facing Local Backend docs to say local backend is optional/advanced.
  • Default new installs to OpenRouter for all tasks.
  • Keep existing installs unchanged unless the user opts in.

Phase 2: Context builder

  • Add includes/class-context-builder.php.
  • Load it from wp-agentic-writer.php.
  • Move repeated context assembly out of class-gutenberg-sidebar.php.
  • Make /chat, /generate-plan, /revise-plan, and refinement endpoints use the builder.

Phase 3: Authoritative summaries

  • Extend WP_Agentic_Writer_Context_Service with:
    • get_session_context( $session_id )
    • update_session_context( $session_id, $patch )
    • summarize_session_if_needed( $session_id, $post_id )
  • Make /summarize-context persist summaries to wpaw_conversations.context.
  • Store plan versions and section summaries in context.
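
The spec does not define how $patch is applied in update_session_context(). One possible semantics, shown here purely as an assumption, is to append to list-like keys and replace everything else. The function name and the section_summaries key are hypothetical.

```php
<?php
/**
 * Apply a patch to a decoded session context (sketch of one possible semantics).
 *
 * List-like keys are appended to; all other keys are replaced.
 *
 * @param array $context Current decoded context.
 * @param array $patch   Keys to update.
 * @return array Updated context, ready to be JSON-encoded and saved.
 */
function wpaw_sketch_apply_context_patch( array $context, array $patch ) {
	// Keys treated as append-only logs in this sketch.
	$append_keys = array( 'decisions', 'rejections', 'research_notes', 'plan_versions', 'section_summaries' );

	foreach ( $patch as $key => $value ) {
		if ( in_array( $key, $append_keys, true ) && is_array( $value ) ) {
			$existing        = isset( $context[ $key ] ) && is_array( $context[ $key ] ) ? $context[ $key ] : array();
			$context[ $key ] = array_merge( $existing, array_values( $value ) );
		} else {
			$context[ $key ] = $value; // Plain replacement, e.g. working_summary.
		}
	}

	return $context;
}
```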

Phase 4: Streaming and usage polish

  • Remove deprecated OpenRouter usage request parameters.
  • Emit optional provider and usage SSE events.
  • Extend cost tracking schema for cached tokens and BYOK upstream cost.
  • Add UI display for monthly token usage.

Phase 5: Local backend repositioning

  • Move local backend downloads and setup UI to advanced/legacy.
  • Keep WP_Agentic_Writer_Local_Backend_Provider for existing users.
  • Disable automatic local backend recommendation in onboarding.

Acceptance Criteria

  • A new article can be planned and written through OpenRouter streaming without any local bash/proxy setup.
  • Existing conversation history persists through wpaw_conversations.
  • Plan generation no longer sends full browser chatHistory when sessionId is available.
  • Refining a block sends the active block, the relevant plan entry, compact decisions, and recent messages, not the full raw history.
  • Streaming responses show partial text in the editor and finish with usage metadata.
  • Cost tracking records provider, model, action, session, tokens, and cost as it does today.
  • New cache fields are recorded when OpenRouter returns them.
  • Local Backend still works for users who already configured it, but it is no longer the default recommendation.

Implementation Risks

  • Some existing frontend flows rely on messages as the full source of truth. Those flows need to pass sessionId reliably before backend context can become authoritative.
  • wpaw_conversations.context is LONGTEXT, so it can hold rich JSON, but large contexts should still be summarized to keep admin queries fast.
  • OpenRouter response caching is beta and should not be presented as durable memory.
  • BYOK provider fallback behavior is configured in OpenRouter, so the WordPress UI can guide and detect symptoms but cannot fully enforce provider-key policy from this plugin alone.