OpenRouter BYOK Context and Streaming Spec
Date: 2026-06-05
Status: Proposed implementation direction
Goal: Replace local bash/proxy-first text generation with an OpenRouter BYOK-first API path while preserving article continuity and improving streamed editor UX.
Decision
Use OpenRouter as the primary text transport for chat, clarity, planning, writing, and refinement, with the user's OpenRouter workspace configured for BYOK provider keys.
The plugin should continue to store conversation and article memory in WordPress. OpenRouter should be treated as a stateless model gateway: it streams model output, returns usage metadata, applies provider routing, and can cache identical responses. It does not own article continuity.
Local Backend should become optional or legacy. It is useful for experiments, but it should not be the recommended default because it asks users to run local scripts/proxy tooling and creates trust friction.
Current Implementation Snapshot
The current plugin already has most of the foundation:
- includes/interface-ai-provider.php defines chat(), chat_stream(), generate_image(), is_configured(), test_connection(), and supports_task_type().
- includes/class-provider-manager.php routes each task through configured providers and already prevents silent OpenRouter spend when fallback is disabled.
- includes/class-openrouter-provider.php supports non-streaming and streaming chat completions through OpenRouter.
- includes/class-local-backend-provider.php supports a local proxy at /v1/messages, including a cURL streaming parser and plain JSON fallback.
- includes/class-conversation-manager.php stores sessions in {$wpdb->prefix}wpaw_conversations with messages and context JSON fields.
- includes/class-context-service.php is already documented as the single source of truth for messages, _wpaw_plan, _wpaw_post_config, and legacy chat migration.
- includes/class-gutenberg-sidebar.php exposes the main REST routes: /chat, /generate-plan, /revise-plan, /execute-article, /refine-block, /refine-from-chat, /summarize-context, /detect-intent, /writing-state/{post_id}, and conversation routes.
- Cost tracking already records post_id, session_id, model, provider, action, input tokens, output tokens, cost, and status.
The main gap is not lack of streaming. The gap is that several routes still accept full chatHistory from the browser and inject it into prompts. That makes continuity depend on the browser payload and can re-send too much context.
Product Positioning
Recommended provider settings:
'task_providers' => array(
    'chat'       => 'openrouter',
    'clarity'    => 'openrouter',
    'planning'   => 'openrouter',
    'writing'    => 'openrouter',
    'refinement' => 'openrouter',
    'image'      => 'openrouter',
),
'allow_openrouter_fallback' => false,
The UI copy should present this as:
- Connect OpenRouter API key.
- Configure BYOK provider keys inside OpenRouter.
- Stream directly into WordPress.
- Keep all article memory in WordPress.
- Local Backend is advanced or legacy.
OpenRouter BYOK details to reflect in docs:
- BYOK lets users route requests through their own provider keys while still using OpenRouter's API surface.
- BYOK provider keys are encrypted and used for requests routed through the selected provider.
- OpenRouter's BYOK fee is documented as 5 percent of the normal OpenRouter model/provider cost, waived for the first 1M BYOK requests per month.
- Users can prevent fallback to OpenRouter shared endpoints by enabling the provider key's "Always use for this provider" behavior in OpenRouter.
- OpenRouter usage data is returned in normal responses and in the last SSE message for streamed responses.
Sources:
- https://openrouter.ai/docs/guides/overview/auth/byok
- https://openrouter.ai/docs/cookbook/administration/usage-accounting
- https://openrouter.ai/docs/guides/features/response-caching/
Continuity Ownership
Continuity is owned by WordPress, not OpenRouter.
Persisted state:
| State | Current storage | Keep or change |
|---|---|---|
| Conversation messages | wpaw_conversations.messages | Keep |
| Session context | wpaw_conversations.context | Extend |
| Article plan | _wpaw_plan post meta | Keep |
| Post config | _wpaw_post_config post meta | Keep |
| Writing state | _wpaw_writing_status, _wpaw_current_section, _wpaw_sections_written, _wpaw_resume_token | Keep |
| Section to block mapping | _wpaw_section_blocks | Keep |
| Lightweight post memory | _wpaw_memory | Extend or migrate into context |
| Cost and token usage | wpaw_cost_tracking | Extend |
Recommended new session context shape:
{
  "working_summary": {
    "text": "The article is about ...",
    "updated_at": "2026-06-05T10:30:00+07:00",
    "source_message_count": 14
  },
  "decisions": [
    {
      "type": "accept",
      "target": "outline.section.2",
      "summary": "Keep the practical checklist framing.",
      "created_at": "2026-06-05T10:31:00+07:00"
    }
  ],
  "rejections": [
    {
      "target": "outline.section.4",
      "summary": "Too generic; needs concrete WordPress examples.",
      "created_at": "2026-06-05T10:32:00+07:00"
    }
  ],
  "research_notes": [
    {
      "source": "manual",
      "title": "User supplied constraint",
      "excerpt": "Avoid local bash instructions in the default UX.",
      "tags": ["trust", "onboarding"]
    }
  ],
  "token_policy": {
    "max_recent_messages": 6,
    "max_summary_tokens": 600,
    "max_research_snippets": 5
  }
}
Store this in wpaw_conversations.context first. Avoid adding a new custom table until context becomes too large or needs relational querying.
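As an illustration of how partial updates to this shape could be applied, the following is a minimal sketch that works on the decoded JSON only. The function name wpaw_merge_session_context() and the rule that list-like keys are appended rather than overwritten are assumptions for illustration, not existing plugin behavior. The default token_policy values are copied from the example above.

```php
<?php
// Hypothetical helper (not part of the plugin): merges a partial update
// into a decoded session context array.
function wpaw_merge_session_context( array $context, array $patch ): array {
    $defaults = array(
        'working_summary' => null,
        'decisions'       => array(),
        'rejections'      => array(),
        'research_notes'  => array(),
        'token_policy'    => array(
            'max_recent_messages'   => 6,
            'max_summary_tokens'    => 600,
            'max_research_snippets' => 5,
        ),
    );
    $context     = array_merge( $defaults, $context );
    $append_keys = array( 'decisions', 'rejections', 'research_notes' );

    foreach ( $patch as $key => $value ) {
        if ( in_array( $key, $append_keys, true ) && is_array( $value ) ) {
            // Assumption: list-like keys grow over time, so append instead of overwrite.
            $context[ $key ] = array_merge( (array) $context[ $key ], array_values( $value ) );
        } elseif ( 'token_policy' === $key && is_array( $value ) ) {
            // Policy values are overridden key by key.
            $context[ $key ] = array_merge( (array) $context[ $key ], $value );
        } else {
            $context[ $key ] = $value;
        }
    }

    return $context;
}
```

The caller would decode wpaw_conversations.context, apply the patch, and encode the result back, so no schema change is needed for this step.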
Context Builder
Add a dedicated builder instead of assembling continuity inside each REST handler.
New file:
includes/class-context-builder.php
Primary API:
class WP_Agentic_Writer_Context_Builder {
    public function build_for_task( $task, $session_id, $post_id, $request_params = array() ) {
        // Returns normalized prompt parts for chat, planning, writing, refinement, SEO.
    }
}
Return shape:
array(
    'system_context'   => 'Stable task and policy instructions.',
    'working_context'  => 'Compact summary, decisions, plan, selected post config.',
    'active_content'   => 'The exact section/block/article slice being edited.',
    'research_context' => 'Only relevant excerpts.',
    'audit'            => array(
        'included_recent_messages' => 6,
        'included_research_items'  => 3,
        'estimated_input_tokens'   => 2200,
        'used_full_history'        => false,
    ),
)
Context assembly rules:
- Always include the task system prompt and language instruction.
- Always include post config summary: audience, tone, language, article length, SEO fields, web search preference.
- Include _wpaw_plan for planning, writing, and outline refinement.
- Include only the active block or section for block refinement.
- Include recent raw messages only up to max_recent_messages.
- Include working_summary when message history is long.
- Include decisions and rejections as compact bullet points.
- Include post content only when the task requires whole-article awareness, such as final polish or article-wide refinement.
- Never trust browser-provided chatHistory as authoritative if sessionId is available.
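The message-related rules above can be sketched as a small pure function. This is an illustration under assumptions: wpaw_select_history() is a hypothetical name, and "history is long" is interpreted here as "more stored messages than max_recent_messages"; the spec does not fix that threshold.

```php
<?php
// Hypothetical helper illustrating the recent-message and summary rules.
// $messages is the authoritative stored history; $context is the decoded session context.
function wpaw_select_history( array $messages, array $context ): array {
    $policy = $context['token_policy'] ?? array();
    $limit  = max( 0, (int) ( $policy['max_recent_messages'] ?? 6 ) );

    // Keep only the newest $limit raw messages, in their original order.
    $recent = $limit > 0 ? array_slice( $messages, -$limit ) : array();

    // Assumption: older messages are represented by the stored summary, if any.
    $has_older = count( $messages ) > count( $recent );
    $summary   = $has_older ? (string) ( $context['working_summary']['text'] ?? '' ) : '';

    return array(
        'recent_messages'          => $recent,
        'working_summary'          => $summary,
        'included_recent_messages' => count( $recent ),
    );
}
```

The included_recent_messages value maps onto the audit entry of the same name in the return shape, which makes it easy to verify in logs that full history was not sent.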
Endpoint Changes
/chat
Current behavior:
- Receives messages from the browser.
- Prepends a system prompt.
- Streams or returns a chat response.
- Persists user and assistant messages.
Required change:
- Use browser messages only to identify the latest user message.
- Load authoritative session context from WP_Agentic_Writer_Context_Service.
- Build final messages through WP_Agentic_Writer_Context_Builder.
- Persist the raw user message and assistant response after completion.
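A minimal sketch of the first step, assuming the browser payload is an array of role/content pairs. The helper name wpaw_latest_user_message() is hypothetical; everything else in the payload is deliberately ignored.

```php
<?php
// Hypothetical helper: returns the newest non-empty user message from the
// browser payload, or null when none is present.
function wpaw_latest_user_message( array $browser_messages ): ?string {
    $browser_messages = array_values( $browser_messages );

    for ( $i = count( $browser_messages ) - 1; $i >= 0; $i-- ) {
        $message = $browser_messages[ $i ];
        if ( is_array( $message )
            && ( $message['role'] ?? '' ) === 'user'
            && is_string( $message['content'] ?? null )
            && '' !== trim( $message['content'] )
        ) {
            return trim( $message['content'] );
        }
    }

    return null;
}
```

Returning null lets the route reject a request that carries no user message instead of silently sending stored context alone.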
/generate-plan
Current behavior:
- Accepts topic, context, chatHistory, and other config.
- Serializes full chatHistory into the planning prompt.
- Stores _wpaw_plan and _wpaw_memory.
Required change:
- Keep topic, context, clarificationAnswers, and post_config.
- Replace full chatHistory injection with a context package from the builder.
- Save generated plan to _wpaw_plan.
- Update wpaw_conversations.context.working_summary after plan generation.
/revise-plan
Required behavior:
- Include current _wpaw_plan.
- Include latest user instruction.
- Include accepted/rejected outline decisions.
- Ask for raw JSON plan only.
- Save previous plan as a version entry inside wpaw_conversations.context.plan_versions before overwriting _wpaw_plan.
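The versioning step could look like the sketch below. The helper name, the saved_at/plan entry layout, and the cap on retained versions are all assumptions for illustration; the spec only requires that the previous plan is stored under plan_versions before the overwrite.

```php
<?php
// Hypothetical helper: records the current plan as a version entry before
// _wpaw_plan is overwritten. Operates on the decoded session context.
function wpaw_push_plan_version( array $context, array $current_plan, string $timestamp, int $max_versions = 10 ): array {
    $versions = is_array( $context['plan_versions'] ?? null ) ? $context['plan_versions'] : array();

    $versions[] = array(
        'saved_at' => $timestamp,
        'plan'     => $current_plan,
    );

    // Assumption: cap stored versions so the context JSON stays small.
    $context['plan_versions'] = array_slice( $versions, -max( 1, $max_versions ) );

    return $context;
}
```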
/execute-article
Current behavior:
- Writes sections from the plan.
- Streams section content and block events.
- Updates _wpaw_plan section statuses.
Required change:
- For each section, send the section brief, global article summary, relevant decisions, and relevant research.
- Do not send the full conversation for every section.
- After each section completes, update writing state and append a section summary to session context.
/refine-block and /refine-from-chat
Required behavior:
- Send active block content, neighboring heading/section context, relevant plan entry, and latest instruction.
- Include compact working summary and decisions.
- Do not include the full draft unless the requested operation is article-wide.
/summarize-context
Current behavior:
- Summarizes browser-provided chatHistory.
- Returns summary but does not appear to be the authoritative persistence mechanism.
Required change:
- Accept sessionId.
- Load authoritative session messages.
- Save the resulting summary into wpaw_conversations.context.working_summary.
- Return summary, message_count, source_message_count, tokens_saved, and provider metadata.
Streaming Transport
OpenRouter streaming is already implemented in WP_Agentic_Writer_OpenRouter_Provider::chat_stream().
Keep this transport shape:
$body = array(
    'model'    => $model,
    'messages' => $messages,
    'stream'   => true,
);
Modernize usage handling:
- OpenRouter now returns full usage metadata automatically.
- usage: { include: true } and stream_options: { include_usage: true } are documented as deprecated and no longer required.
- Keep parsing the final usage object from streamed chunks.
- Extend cost tracking to store cache metadata when available.
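A minimal sketch of pulling the final usage object out of streamed text, assuming standard SSE framing with data: lines and a [DONE] terminator. The function name is hypothetical, and the sketch takes already-buffered text; a real cURL write callback would also need to carry partial lines across chunks, which is omitted here.

```php
<?php
// Hypothetical helper: scans buffered SSE text and returns the last usage
// object seen, or null if the stream carried none.
function wpaw_extract_stream_usage( string $sse_text ): ?array {
    $usage = null;

    foreach ( preg_split( '/\r\n|\n|\r/', $sse_text ) as $line ) {
        if ( strncmp( $line, 'data:', 5 ) !== 0 ) {
            continue; // Skips blank lines and ": comment" keep-alive lines.
        }

        $payload = trim( substr( $line, 5 ) );
        if ( '[DONE]' === $payload ) {
            break;
        }

        $chunk = json_decode( $payload, true );
        if ( is_array( $chunk ) && isset( $chunk['usage'] ) && is_array( $chunk['usage'] ) ) {
            $usage = $chunk['usage']; // The last usage object wins.
        }
    }

    return $usage;
}
```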
Recommended emitted SSE events:
{"type":"provider","provider":"openrouter","model":"openai/gpt-4o-mini","byok_expected":true}
{"type":"conversational_stream","content":"partial accumulated text"}
{"type":"usage","input_tokens":1200,"output_tokens":360,"cached_tokens":0,"cost":0.0012}
{"type":"complete","session_id":"abc123","totalCost":0.0012}
Use the existing browser parsing path in assets/js/sidebar.js and add support for the optional provider and usage event types.
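On the server side, the new event types could be framed with a helper like the one below. This is a sketch that assumes the plugin sends each event as a single SSE data: line followed by a blank line; the function name is hypothetical and the actual framing used by the existing stream handlers should take precedence.

```php
<?php
// Hypothetical helper: frames one event array as an SSE message.
// Assumes "data: <json>" followed by a blank line as the wire format.
function wpaw_format_sse_event( array $event ): string {
    return 'data: ' . json_encode( $event, JSON_UNESCAPED_SLASHES ) . "\n\n";
}

// Example: announce the provider before the first content chunk.
$provider_event = wpaw_format_sse_event(
    array(
        'type'          => 'provider',
        'provider'      => 'openrouter',
        'model'         => 'openai/gpt-4o-mini',
        'byok_expected' => true,
    )
);
```

Because both new event types are optional, older copies of sidebar.js that do not recognize them can ignore them without breaking the stream.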
Response Caching Policy
OpenRouter response caching should be used for deterministic, duplicate-safe operations only. It is not article memory.
Recommended use:
- detect_intent
- summarize_context retry
- connection test
- repeated model capability lookups if routed through completion calls
Avoid by default:
- article draft generation
- outline revision
- refinement requests
- image prompt generation
Provider implementation change:
if ( ! empty( $options['openrouter_response_cache'] ) ) {
    $headers[] = 'X-OpenRouter-Cache: true';
    $headers[] = 'X-OpenRouter-Cache-TTL: ' . (int) ( $options['openrouter_cache_ttl'] ?? 300 );
}
Important limitations:
- Cache hits only happen for identical requests.
- Streaming and non-streaming requests are cached separately.
- Cache hit usage counters are zeroed.
- Response caching is beta and requires OpenRouter to store response data temporarily.
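One way to keep caching limited to duplicate-safe operations is an explicit allowlist, sketched below. The function name and the action identifiers are hypothetical; the header names are the ones used in the provider snippet above.

```php
<?php
// Hypothetical helper: returns cache headers only for duplicate-safe actions.
// The action identifiers are illustrative, not existing plugin constants.
function wpaw_cache_headers_for_action( string $action, int $ttl = 300 ): array {
    $cacheable = array( 'detect_intent', 'summarize_context_retry', 'connection_test' );

    if ( ! in_array( $action, $cacheable, true ) ) {
        return array(); // Drafting, revision, and refinement are never cached.
    }

    return array(
        'X-OpenRouter-Cache: true',
        'X-OpenRouter-Cache-TTL: ' . max( 1, $ttl ),
    );
}
```

An allowlist fails closed: a new action added later is uncached until someone decides it is safe to repeat.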
Usage and Budget Tracking
Extend wpaw_cost_tracking with optional cache and upstream fields:
ALTER TABLE {$wpdb->prefix}wpaw_cost_tracking
    ADD COLUMN cached_tokens int(11) DEFAULT 0 AFTER output_tokens,
    ADD COLUMN cache_write_tokens int(11) DEFAULT 0 AFTER cached_tokens,
    ADD COLUMN upstream_inference_cost decimal(10,6) DEFAULT NULL AFTER cost,
    ADD COLUMN generation_id varchar(64) DEFAULT '' AFTER status;
Implementation notes:
- Put this behind a schema version bump, not plugin version alone.
- Keep existing maybe_upgrade_table() pattern in WP_Agentic_Writer_Cost_Tracker.
- Parse usage.prompt_tokens_details.cached_tokens.
- Parse usage.prompt_tokens_details.cache_write_tokens.
- Parse usage.cost_details.upstream_inference_cost for BYOK requests.
- Include a monthly token budget view alongside the existing cost view.
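The parsing notes above can be sketched as a single normalization step that maps a usage object onto the tracking columns. The function name is hypothetical. The nested field paths are the ones listed above, and every field is treated as optional so that responses without cache or BYOK details still produce a valid row.

```php
<?php
// Hypothetical helper: maps a decoded usage object to cost-tracking fields.
// Missing fields fall back to 0, or null for the upstream cost.
function wpaw_usage_to_tracking_fields( array $usage ): array {
    $prompt_details = is_array( $usage['prompt_tokens_details'] ?? null ) ? $usage['prompt_tokens_details'] : array();
    $cost_details   = is_array( $usage['cost_details'] ?? null ) ? $usage['cost_details'] : array();
    $upstream       = $cost_details['upstream_inference_cost'] ?? null;

    return array(
        'input_tokens'            => (int) ( $usage['prompt_tokens'] ?? 0 ),
        'output_tokens'           => (int) ( $usage['completion_tokens'] ?? 0 ),
        'cached_tokens'           => (int) ( $prompt_details['cached_tokens'] ?? 0 ),
        'cache_write_tokens'      => (int) ( $prompt_details['cache_write_tokens'] ?? 0 ),
        'cost'                    => (float) ( $usage['cost'] ?? 0 ),
        // Stays null when the field is absent, matching the DEFAULT NULL column.
        'upstream_inference_cost' => is_numeric( $upstream ) ? (float) $upstream : null,
    );
}
```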
Budget metric examples:
billable_input_tokens = max( 0, input_tokens - cached_tokens );
total_monthly_tokens = sum( input_tokens + output_tokens );
byok_free_request_counter = count( provider = 'openrouter' and status = 'success' );
Note: OpenRouter documents the BYOK waiver as first 1M BYOK requests per month, not first 1M tokens. Keep UI wording precise.
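The budget metrics can be sketched as an aggregation over one month of tracking rows. The function name and the row array layout are assumptions; in the plugin these values would more likely come from SQL aggregates over wpaw_cost_tracking. The request counter counts rows, not tokens, in line with the note above.

```php
<?php
// Hypothetical helper: aggregates one month of tracking rows into budget metrics.
// Each row is an associative array using the wpaw_cost_tracking column names.
function wpaw_monthly_budget_metrics( array $rows ): array {
    $metrics = array(
        'billable_input_tokens' => 0,
        'total_monthly_tokens'  => 0,
        'byok_request_count'    => 0,
    );

    foreach ( $rows as $row ) {
        $input  = (int) ( $row['input_tokens'] ?? 0 );
        $output = (int) ( $row['output_tokens'] ?? 0 );
        $cached = (int) ( $row['cached_tokens'] ?? 0 );

        // Cached tokens are subtracted per request and never go below zero.
        $metrics['billable_input_tokens'] += max( 0, $input - $cached );
        $metrics['total_monthly_tokens']  += $input + $output;

        // Requests, not tokens, count toward the BYOK waiver threshold.
        if ( ( $row['provider'] ?? '' ) === 'openrouter' && ( $row['status'] ?? '' ) === 'success' ) {
            $metrics['byok_request_count']++;
        }
    }

    return $metrics;
}
```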
Settings UI Changes
Update Settings V2:
- Rename default cloud path to OpenRouter BYOK / API.
- Keep API key storage in wp_agentic_writer_settings.openrouter_api_key.
- Add a help panel explaining that provider BYOK keys are configured in OpenRouter, not in WordPress.
- Add a "Prevent shared fallback" checklist item that links users to OpenRouter BYOK provider settings.
- Move Local Backend to an Advanced or Legacy Local Backend section.
- Make provider routing default all text tasks to openrouter.
- Keep image task on openrouter.
- Show a trust note: WordPress streams directly to OpenRouter; no local shell or CLI process is required.
Do not collect provider keys directly in WordPress unless there is a deliberate product decision to bypass OpenRouter BYOK management. The safer default is only storing the OpenRouter API key.
Migration Plan
Phase 1: Documentation and defaults
- Add this spec.
- Update user-facing Local Backend docs to say local backend is optional/advanced.
- Default new installs to OpenRouter for all tasks.
- Keep existing installs unchanged unless the user opts in.
Phase 2: Context builder
- Add includes/class-context-builder.php.
- Load it from wp-agentic-writer.php.
- Move repeated context assembly out of class-gutenberg-sidebar.php.
- Make /chat, /generate-plan, /revise-plan, and refinement endpoints use the builder.
Phase 3: Authoritative summaries
- Extend WP_Agentic_Writer_Context_Service with:
  - get_session_context( $session_id )
  - update_session_context( $session_id, $patch )
  - summarize_session_if_needed( $session_id, $post_id )
- Make /summarize-context persist summaries to wpaw_conversations.context.
- Store plan versions and section summaries in context.
Phase 4: Streaming and usage polish
- Remove deprecated OpenRouter usage request parameters.
- Emit optional provider and usage SSE events.
- Extend cost tracking schema for cached tokens and BYOK upstream cost.
- Add UI display for monthly token usage.
Phase 5: Local backend repositioning
- Move local backend downloads and setup UI to advanced/legacy.
- Keep WP_Agentic_Writer_Local_Backend_Provider for existing users.
- Disable automatic local backend recommendation in onboarding.
Acceptance Criteria
- A new article can be planned and written through OpenRouter streaming without any local bash/proxy setup.
- Existing conversation history persists through wpaw_conversations.
- Plan generation no longer sends full browser chatHistory when sessionId is available.
- Refining a block includes active block, relevant plan, compact decisions, and recent messages, not full raw history.
- Streaming responses show partial text in the editor and finish with usage metadata.
- Cost tracking records provider, model, action, session, tokens, and cost as it does today.
- New cache fields are recorded when OpenRouter returns them.
- Local Backend still works for users who already configured it, but it is no longer the default recommendation.
Implementation Risks
- Some existing frontend flows rely on messages as the full source of truth. Those flows need to pass sessionId reliably before backend context can become authoritative.
- wpaw_conversations.context is LONGTEXT, so it can hold rich JSON, but large contexts should still be summarized to keep admin queries fast.
- OpenRouter response caching is beta and should not be presented as durable memory.
- BYOK provider fallback behavior is configured in OpenRouter, so the WordPress UI can guide and detect symptoms but cannot fully enforce provider-key policy from this plugin alone.