# OpenRouter BYOK Context and Streaming Spec
**Date:** 2026-06-05
**Status:** Proposed implementation direction
**Goal:** Replace local bash/proxy-first text generation with an OpenRouter BYOK-first API path while preserving article continuity and improving streamed editor UX.
## Decision
Use OpenRouter as the primary text transport for `chat`, `clarity`, `planning`, `writing`, and `refinement`, with the user's OpenRouter workspace configured for BYOK provider keys.
The plugin should continue to store conversation and article memory in WordPress. OpenRouter should be treated as a stateless model gateway: it streams model output, returns usage metadata, applies provider routing, and can cache identical responses. It does not own article continuity.
Local Backend should become optional or legacy. It is useful for experiments, but it should not be the recommended default because it asks users to run local scripts/proxy tooling and creates trust friction.
## Current Implementation Snapshot
The current plugin already has most of the foundation:
- `includes/interface-ai-provider.php` defines `chat()`, `chat_stream()`, `generate_image()`, `is_configured()`, `test_connection()`, and `supports_task_type()`.
- `includes/class-provider-manager.php` routes each task through configured providers and already prevents silent OpenRouter spend when fallback is disabled.
- `includes/class-openrouter-provider.php` supports non-streaming and streaming chat completions through OpenRouter.
- `includes/class-local-backend-provider.php` supports a local proxy at `/v1/messages`, including a cURL streaming parser and plain JSON fallback.
- `includes/class-conversation-manager.php` stores sessions in `{$wpdb->prefix}wpaw_conversations` with `messages` and `context` JSON fields.
- `includes/class-context-service.php` is already documented as the single source of truth for messages, `_wpaw_plan`, `_wpaw_post_config`, and legacy chat migration.
- `includes/class-gutenberg-sidebar.php` exposes the main REST routes: `/chat`, `/generate-plan`, `/revise-plan`, `/execute-article`, `/refine-block`, `/refine-from-chat`, `/summarize-context`, `/detect-intent`, `/writing-state/{post_id}`, and conversation routes.
- Cost tracking already records `post_id`, `session_id`, `model`, `provider`, `action`, input tokens, output tokens, cost, and status.
The main gap is not a lack of streaming. It is that several routes still accept the full `chatHistory` from the browser and inject it into prompts. That makes continuity depend on the browser payload and can re-send more context than the task needs.
## Product Positioning
Recommended provider settings:
```php
'task_providers'            => array(
    'chat'       => 'openrouter',
    'clarity'    => 'openrouter',
    'planning'   => 'openrouter',
    'writing'    => 'openrouter',
    'refinement' => 'openrouter',
    'image'      => 'openrouter',
),
'allow_openrouter_fallback' => false,
```
The UI copy should present this as:
- Connect OpenRouter API key.
- Configure BYOK provider keys inside OpenRouter.
- Stream directly into WordPress.
- Keep all article memory in WordPress.
- Local Backend is advanced or legacy.
OpenRouter BYOK details to reflect in docs:
- BYOK lets users route requests through their own provider keys while still using OpenRouter's API surface.
- BYOK provider keys are encrypted and used for requests routed through the selected provider.
- OpenRouter's BYOK fee is documented as 5 percent of the normal OpenRouter model/provider cost, waived for the first 1M BYOK requests per month.
- Users can prevent fallback to OpenRouter shared endpoints by enabling the provider key's "Always use for this provider" behavior in OpenRouter.
- OpenRouter usage data is returned in normal responses and in the last SSE message for streamed responses.
Sources:
- https://openrouter.ai/docs/guides/overview/auth/byok
- https://openrouter.ai/docs/cookbook/administration/usage-accounting
- https://openrouter.ai/docs/guides/features/response-caching/
## Continuity Ownership
Continuity is owned by WordPress, not OpenRouter.
Persisted state:
| State | Current storage | Keep or change |
| --- | --- | --- |
| Conversation messages | `wpaw_conversations.messages` | Keep |
| Session context | `wpaw_conversations.context` | Extend |
| Article plan | `_wpaw_plan` post meta | Keep |
| Post config | `_wpaw_post_config` post meta | Keep |
| Writing state | `_wpaw_writing_status`, `_wpaw_current_section`, `_wpaw_sections_written`, `_wpaw_resume_token` | Keep |
| Section to block mapping | `_wpaw_section_blocks` | Keep |
| Lightweight post memory | `_wpaw_memory` | Extend or migrate into `context` |
| Cost and token usage | `wpaw_cost_tracking` | Extend |
Recommended new session context shape:
```json
{
  "working_summary": {
    "text": "The article is about ...",
    "updated_at": "2026-06-05T10:30:00+07:00",
    "source_message_count": 14
  },
  "decisions": [
    {
      "type": "accept",
      "target": "outline.section.2",
      "summary": "Keep the practical checklist framing.",
      "created_at": "2026-06-05T10:31:00+07:00"
    }
  ],
  "rejections": [
    {
      "target": "outline.section.4",
      "summary": "Too generic; needs concrete WordPress examples.",
      "created_at": "2026-06-05T10:32:00+07:00"
    }
  ],
  "research_notes": [
    {
      "source": "manual",
      "title": "User supplied constraint",
      "excerpt": "Avoid local bash instructions in the default UX.",
      "tags": ["trust", "onboarding"]
    }
  ],
  "token_policy": {
    "max_recent_messages": 6,
    "max_summary_tokens": 600,
    "max_research_snippets": 5
  }
}
```
Store this in `wpaw_conversations.context` first. Avoid adding a new custom table until `context` becomes too large or needs relational querying.
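Because older sessions will have an empty or partial `context` column, readers of this shape need defaults. The following is a minimal sketch of that normalization, not the plugin's actual code: the function name `wpaw_sketch_normalize_context()` is hypothetical, and the default values simply mirror the example above.

```php
<?php
/**
 * Sketch: decode the stored context JSON and fill in any missing keys.
 * The function name and defaults are illustrative assumptions.
 */
function wpaw_sketch_normalize_context( string $raw_json ): array {
    $defaults = array(
        'working_summary' => array(
            'text'                 => '',
            'updated_at'           => null,
            'source_message_count' => 0,
        ),
        'decisions'       => array(),
        'rejections'      => array(),
        'research_notes'  => array(),
        'token_policy'    => array(
            'max_recent_messages'   => 6,
            'max_summary_tokens'    => 600,
            'max_research_snippets' => 5,
        ),
    );

    $decoded = json_decode( $raw_json, true );
    if ( ! is_array( $decoded ) ) {
        $decoded = array(); // Empty or corrupt column: fall back to defaults.
    }

    // Stored values win; defaults only fill the gaps.
    return array_replace_recursive( $defaults, $decoded );
}
```

Normalizing on read keeps the stored JSON small and lets the `token_policy` defaults change in code without a data migration.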
## Context Builder
Add a dedicated builder instead of assembling continuity inside each REST handler.
New file:
```text
includes/class-context-builder.php
```
Primary API:
```php
class WP_Agentic_Writer_Context_Builder {
    public function build_for_task( $task, $session_id, $post_id, $request_params = array() ) {
        // Returns normalized prompt parts for chat, planning, writing, refinement, SEO.
    }
}
```
Return shape:
```php
array(
    'system_context'   => 'Stable task and policy instructions.',
    'working_context'  => 'Compact summary, decisions, plan, selected post config.',
    'active_content'   => 'The exact section/block/article slice being edited.',
    'research_context' => 'Only relevant excerpts.',
    'audit'            => array(
        'included_recent_messages' => 6,
        'included_research_items'  => 3,
        'estimated_input_tokens'   => 2200,
        'used_full_history'        => false,
    ),
)
```
Context assembly rules:
- Always include the task system prompt and language instruction.
- Always include post config summary: audience, tone, language, article length, SEO fields, web search preference.
- Include `_wpaw_plan` for planning, writing, and outline refinement.
- Include only the active block or section for block refinement.
- Include recent raw messages only up to `max_recent_messages`.
- Include `working_summary` when message history is long.
- Include decisions and rejections as compact bullet points.
- Include post content only when the task requires whole-article awareness, such as final polish or article-wide refinement.
- Never trust browser-provided `chatHistory` as authoritative if `sessionId` is available.
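The message-related rules above can be sketched as a pure function. This is an illustration under assumptions, not the builder's real implementation: `wpaw_sketch_select_messages()` is a hypothetical name, messages are assumed to be `role`/`content` arrays loaded from the session, and `used_full_history` is interpreted here as "every stored message was sent".

```php
<?php
/**
 * Sketch: choose which stored messages and context notes reach the model.
 * Assumes $stored_messages comes from the session, not from the browser.
 */
function wpaw_sketch_select_messages( array $stored_messages, array $context ): array {
    $limit   = max( 1, (int) ( $context['token_policy']['max_recent_messages'] ?? 6 ) );
    $summary = trim( (string) ( $context['working_summary']['text'] ?? '' ) );

    $recent       = array_slice( $stored_messages, -$limit );
    $history_long = count( $stored_messages ) > $limit;

    $parts = array();
    if ( $history_long && '' !== $summary ) {
        // The summary stands in for the messages that were cut.
        $parts[] = "Summary of earlier conversation:\n" . $summary;
    }
    foreach ( $context['decisions'] ?? array() as $decision ) {
        $parts[] = '- Decision (' . ( $decision['target'] ?? '' ) . '): ' . ( $decision['summary'] ?? '' );
    }
    foreach ( $context['rejections'] ?? array() as $rejection ) {
        $parts[] = '- Rejected (' . ( $rejection['target'] ?? '' ) . '): ' . ( $rejection['summary'] ?? '' );
    }

    return array(
        'working_context' => implode( "\n", $parts ),
        'recent_messages' => $recent,
        'audit'           => array(
            'included_recent_messages' => count( $recent ),
            'used_full_history'        => ! $history_long,
        ),
    );
}
```

Keeping the selection free of database and HTTP calls makes the token policy easy to unit test before it is wired into the REST handlers.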
## Endpoint Changes
### `/chat`
Current behavior:
- Receives `messages` from the browser.
- Prepends a system prompt.
- Streams or returns a chat response.
- Persists user and assistant messages.
Required change:
- Use browser `messages` only to identify the latest user message.
- Load authoritative session context from `WP_Agentic_Writer_Context_Service`.
- Build final messages through `WP_Agentic_Writer_Context_Builder`.
- Persist the raw user message and assistant response after completion.
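As a rough sketch of the first requirement, the handler could read only the newest user entry from the browser payload and ignore everything else. The helper name `wpaw_sketch_latest_user_message()` is hypothetical, and the payload is assumed to be a list of `role`/`content` arrays.

```php
<?php
/**
 * Sketch: pull the latest user message out of the browser payload.
 * Everything else in the payload is ignored; history comes from the session.
 */
function wpaw_sketch_latest_user_message( array $browser_messages ): ?string {
    foreach ( array_reverse( $browser_messages ) as $message ) {
        if ( ! is_array( $message ) || 'user' !== ( $message['role'] ?? '' ) ) {
            continue;
        }
        $content = $message['content'] ?? '';
        if ( is_string( $content ) && '' !== trim( $content ) ) {
            return trim( $content );
        }
        return null; // The newest user entry is empty or not plain text.
    }
    return null; // No user message in the payload at all.
}
```

A `null` result would be a reasonable point for the handler to reject the request before any provider call is made.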
### `/generate-plan`
Current behavior:
- Accepts `topic`, `context`, `chatHistory`, and other config.
- Serializes full `chatHistory` into the planning prompt.
- Stores `_wpaw_plan` and `_wpaw_memory`.
Required change:
- Keep `topic`, `context`, `clarificationAnswers`, and `post_config`.
- Replace full `chatHistory` injection with a context package from the builder.
- Save generated plan to `_wpaw_plan`.
- Update `wpaw_conversations.context.working_summary` after plan generation.
### `/revise-plan`
Required behavior:
- Include current `_wpaw_plan`.
- Include latest user instruction.
- Include accepted/rejected outline decisions.
- Ask for raw JSON plan only.
- Save previous plan as a version entry inside `wpaw_conversations.context.plan_versions` before overwriting `_wpaw_plan`.
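One possible shape for the version entry is sketched below. The function name, the `plan`/`saved_at` keys, and the cap of 10 retained versions are all assumptions made for illustration; the spec only requires that the previous plan is saved before `_wpaw_plan` is overwritten.

```php
<?php
/**
 * Sketch: append the current plan to context.plan_versions before overwriting it.
 * The entry keys and the retention cap are illustrative assumptions.
 */
function wpaw_sketch_push_plan_version( array $context, array $current_plan, string $now, int $max_versions = 10 ): array {
    $versions   = $context['plan_versions'] ?? array();
    $versions[] = array(
        'plan'     => $current_plan,
        'saved_at' => $now,
    );

    // Keep only the newest entries so the context column stays small.
    $context['plan_versions'] = array_slice( $versions, -max( 1, $max_versions ) );

    return $context;
}
```

Capping the list is a design choice that trades unlimited undo history for a bounded `context` size.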
### `/execute-article`
Current behavior:
- Writes sections from the plan.
- Streams section content and block events.
- Updates `_wpaw_plan` section statuses.
Required change:
- For each section, send the section brief, global article summary, relevant decisions, and relevant research.
- Do not send the full conversation for every section.
- After each section completes, update writing state and append a section summary to session context.
### `/refine-block` and `/refine-from-chat`
Required behavior:
- Send active block content, neighboring heading/section context, relevant plan entry, and latest instruction.
- Include compact working summary and decisions.
- Do not include the full draft unless the requested operation is article-wide.
### `/summarize-context`
Current behavior:
- Summarizes browser-provided `chatHistory`.
- Returns a summary, but does not appear to persist it as the authoritative record.
Required change:
- Accept `sessionId`.
- Load authoritative session messages.
- Save the resulting summary into `wpaw_conversations.context.working_summary`.
- Return `summary`, `message_count`, `source_message_count`, `tokens_saved`, and provider metadata.
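The spec does not define how `tokens_saved` is computed. One rough option, sketched here, is to compare estimated token counts before and after summarization. The four-characters-per-token ratio is a coarse heuristic chosen for illustration, not a tokenizer, and both function names are hypothetical.

```php
<?php
/**
 * Sketch: coarse token estimate, assuming roughly 4 bytes per token.
 * This is a heuristic, not a model tokenizer.
 */
function wpaw_sketch_estimate_tokens( string $text ): int {
    return (int) ceil( strlen( $text ) / 4 );
}

/**
 * Sketch: estimated tokens saved by sending the summary instead of raw messages.
 */
function wpaw_sketch_tokens_saved( array $messages, string $summary ): int {
    $raw_tokens = 0;
    foreach ( $messages as $message ) {
        $raw_tokens += wpaw_sketch_estimate_tokens( (string) ( $message['content'] ?? '' ) );
    }

    // Never report a negative saving when the summary is longer than the source.
    return max( 0, $raw_tokens - wpaw_sketch_estimate_tokens( $summary ) );
}
```

If exact numbers matter, the `usage` values returned by the provider for the summarization call are a better source than any local estimate.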
## Streaming Transport
OpenRouter streaming is already implemented in `WP_Agentic_Writer_OpenRouter_Provider::chat_stream()`.
Keep this transport shape:
```php
$body = array(
    'model'    => $model,
    'messages' => $messages,
    'stream'   => true,
);
```
Modernize usage handling:
- OpenRouter now returns full usage metadata automatically.
- `usage: { include: true }` and `stream_options: { include_usage: true }` are documented as deprecated and no longer required.
- Keep parsing the final `usage` object from streamed chunks.
- Extend cost tracking to store cache metadata when available.
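To illustrate "keep parsing the final `usage` object", here is a minimal sketch of folding streamed lines into a running state. It is not the existing parser in `chat_stream()`: the function name is hypothetical, and it assumes the caller passes only complete lines and holds back any partial trailing line between cURL write callbacks.

```php
<?php
/**
 * Sketch: fold complete SSE lines from a chat-completions stream into state.
 * Assumes $buffer contains whole lines; partial lines must be buffered by the caller.
 */
function wpaw_sketch_consume_sse( string $buffer, array $state ): array {
    $state += array( 'text' => '', 'usage' => null, 'done' => false );

    foreach ( preg_split( '/\r?\n/', $buffer ) as $line ) {
        if ( 0 !== strpos( $line, 'data:' ) ) {
            continue; // Skips blank lines and ":" comment or keep-alive lines.
        }
        $payload = trim( substr( $line, 5 ) );
        if ( '[DONE]' === $payload ) {
            $state['done'] = true;
            continue;
        }
        $chunk = json_decode( $payload, true );
        if ( ! is_array( $chunk ) ) {
            continue; // Ignore malformed JSON rather than abort the stream.
        }
        $state['text'] .= (string) ( $chunk['choices'][0]['delta']['content'] ?? '' );
        if ( isset( $chunk['usage'] ) && is_array( $chunk['usage'] ) ) {
            $state['usage'] = $chunk['usage']; // Normally arrives in the last data chunk.
        }
    }

    return $state;
}
```

Because the usage object is captured whenever it appears, the parser needs no request-side usage flags and keeps working if the provider changes which chunk carries it.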
Recommended emitted SSE events:
```json
{"type":"provider","provider":"openrouter","model":"openai/gpt-4o-mini","byok_expected":true}
{"type":"conversational_stream","content":"partial accumulated text"}
{"type":"usage","input_tokens":1200,"output_tokens":360,"cached_tokens":0,"cost":0.0012}
{"type":"complete","session_id":"abc123","totalCost":0.0012}
```
Use the existing browser parsing path in `assets/js/sidebar.js` and add support for the optional `provider` and `usage` event types.
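On the server side, the optional events could be produced by one small formatter. This is a sketch only: `wpaw_sketch_format_sse_event()` is a hypothetical helper, and the `data: ` framing with a blank-line terminator is assumed from the SSE format rather than taken from the plugin's current output code.

```php
<?php
/**
 * Sketch: serialize one event for the browser as a single SSE data frame.
 * The 'type' key always comes from the first argument, never from $fields.
 */
function wpaw_sketch_format_sse_event( string $type, array $fields = array() ): string {
    $payload = array( 'type' => $type ) + $fields;

    // json_encode() escapes newlines, so the frame always fits on one data line.
    $json = json_encode( $payload, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE );

    return 'data: ' . $json . "\n\n";
}

// Example: announce the provider before the first text chunk.
echo wpaw_sketch_format_sse_event(
    'provider',
    array(
        'provider'      => 'openrouter',
        'model'         => 'openai/gpt-4o-mini',
        'byok_expected' => true,
    )
);
```

Routing every event through one formatter keeps the `type` field consistent, so the browser parser can ignore types it does not recognize.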
## Response Caching Policy
OpenRouter response caching should be used for deterministic, duplicate-safe operations only. It is not article memory.
Recommended use:
- `detect_intent`
- `summarize_context` retry
- connection test
- repeated model capability lookups if routed through completion calls
Avoid by default:
- article draft generation
- outline revision
- refinement requests
- image prompt generation
Provider implementation change:
```php
if ( ! empty( $options['openrouter_response_cache'] ) ) {
    $headers[] = 'X-OpenRouter-Cache: true';
    $headers[] = 'X-OpenRouter-Cache-TTL: ' . (int) ( $options['openrouter_cache_ttl'] ?? 300 );
}
```
Important limitations:
- Cache hits only happen for identical requests.
- Streaming and non-streaming requests are cached separately.
- Cache hit usage counters are zeroed.
- Response caching is beta and requires OpenRouter to store response data temporarily.
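The recommended-use and avoid-by-default lists can be enforced with an allowlist, so caching stays opt-in per action. The sketch below reuses the header names from the snippet above without verifying them independently; the function name and the action slugs (`detect_intent`, `summarize_context`, `test_connection`) are assumptions about how the plugin labels these calls.

```php
<?php
/**
 * Sketch: return cache headers only for duplicate-safe actions.
 * Action slugs are illustrative; header names are copied from this spec.
 */
function wpaw_sketch_cache_headers( string $action, array $options ): array {
    $cacheable_actions = array( 'detect_intent', 'summarize_context', 'test_connection' );

    $enabled = ! empty( $options['openrouter_response_cache'] );
    if ( ! $enabled || ! in_array( $action, $cacheable_actions, true ) ) {
        return array(); // Drafting, revision, and refinement never opt in.
    }

    $ttl = max( 1, (int) ( $options['openrouter_cache_ttl'] ?? 300 ) );

    return array(
        'X-OpenRouter-Cache: true',
        'X-OpenRouter-Cache-TTL: ' . $ttl,
    );
}
```

An allowlist fails closed: a new action added later is uncached until someone decides it is safe to repeat.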
## Usage and Budget Tracking
Extend `wpaw_cost_tracking` with optional cache and upstream fields:
```sql
ALTER TABLE {$wpdb->prefix}wpaw_cost_tracking
    ADD COLUMN cached_tokens int(11) DEFAULT 0 AFTER output_tokens,
    ADD COLUMN cache_write_tokens int(11) DEFAULT 0 AFTER cached_tokens,
    ADD COLUMN upstream_inference_cost decimal(10,6) DEFAULT NULL AFTER cost,
    ADD COLUMN generation_id varchar(64) DEFAULT '' AFTER status;
```
Implementation notes:
- Put this behind a schema version bump, not plugin version alone.
- Keep existing `maybe_upgrade_table()` pattern in `WP_Agentic_Writer_Cost_Tracker`.
- Parse `usage.prompt_tokens_details.cached_tokens`.
- Parse `usage.prompt_tokens_details.cache_write_tokens`.
- Parse `usage.cost_details.upstream_inference_cost` for BYOK requests.
- Include a monthly token budget view alongside the existing cost view.
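The parsing notes above amount to a small mapping from the `usage` object to the new columns. The following sketch uses the field paths listed in this section plus the standard `prompt_tokens`, `completion_tokens`, and `cost` fields; the function name is hypothetical, and absent fields fall back to the column defaults.

```php
<?php
/**
 * Sketch: map an OpenRouter usage object onto wpaw_cost_tracking columns.
 * Missing fields fall back to the column defaults from the ALTER TABLE above.
 */
function wpaw_sketch_usage_to_row( array $usage ): array {
    $details  = $usage['prompt_tokens_details'] ?? array();
    $upstream = $usage['cost_details']['upstream_inference_cost'] ?? null;

    return array(
        'input_tokens'            => (int) ( $usage['prompt_tokens'] ?? 0 ),
        'output_tokens'           => (int) ( $usage['completion_tokens'] ?? 0 ),
        'cached_tokens'           => (int) ( $details['cached_tokens'] ?? 0 ),
        'cache_write_tokens'      => (int) ( $details['cache_write_tokens'] ?? 0 ),
        'cost'                    => (float) ( $usage['cost'] ?? 0 ),
        // Stays NULL when the request did not report an upstream cost.
        'upstream_inference_cost' => null === $upstream ? null : (float) $upstream,
    );
}
```

Keeping `upstream_inference_cost` nullable distinguishes "not reported" from a genuine zero cost in later budget queries.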
Budget metric examples:
```php
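// Per request: input tokens that were not served from the prompt cache.
$billable_input_tokens = max( 0, $input_tokens - $cached_tokens );
// Monthly aggregates over this month's rows in wpaw_cost_tracking:
//   total_monthly_tokens      = SUM( input_tokens + output_tokens )
//   byok_free_request_counter = COUNT(*) WHERE provider = 'openrouter' AND status = 'success'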
```
Note: OpenRouter documents the BYOK waiver as the first 1M BYOK requests per month, not the first 1M tokens. Keep UI wording precise.
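For the monthly view, the three metrics can be aggregated in one pass over the month's rows. This is a sketch under assumptions: the function name is hypothetical, rows are assumed to be associative arrays keyed by column name, and the request counter follows the definition above, so it counts every successful OpenRouter row whether or not a BYOK key served it.

```php
<?php
/**
 * Sketch: aggregate one month of cost-tracking rows into budget metrics.
 * The counter tallies requests, not tokens, to match the BYOK waiver wording.
 */
function wpaw_sketch_monthly_budget( array $rows ): array {
    $totals = array(
        'billable_input_tokens'     => 0,
        'total_monthly_tokens'      => 0,
        'byok_free_request_counter' => 0,
    );

    foreach ( $rows as $row ) {
        $input  = (int) ( $row['input_tokens'] ?? 0 );
        $output = (int) ( $row['output_tokens'] ?? 0 );
        $cached = (int) ( $row['cached_tokens'] ?? 0 );

        $totals['billable_input_tokens'] += max( 0, $input - $cached );
        $totals['total_monthly_tokens']  += $input + $output;

        $is_openrouter = 'openrouter' === ( $row['provider'] ?? '' );
        $is_success    = 'success' === ( $row['status'] ?? '' );
        if ( $is_openrouter && $is_success ) {
            $totals['byok_free_request_counter']++;
        }
    }

    return $totals;
}
```

In production the same totals would more likely come from a single SQL aggregate; the loop form is shown because it is easier to test in isolation.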
## Settings UI Changes
Update Settings V2:
- Rename default cloud path to `OpenRouter BYOK / API`.
- Keep API key storage in `wp_agentic_writer_settings.openrouter_api_key`.
- Add a help panel explaining that provider BYOK keys are configured in OpenRouter, not in WordPress.
- Add a "Prevent shared fallback" checklist item that links users to OpenRouter BYOK provider settings.
- Move Local Backend to an `Advanced` or `Legacy Local Backend` section.
- Make provider routing default all text tasks to `openrouter`.
- Keep image task on `openrouter`.
- Show a trust note: WordPress streams directly to OpenRouter; no local shell or CLI process is required.
Do not collect provider keys directly in WordPress unless there is a deliberate product decision to bypass OpenRouter BYOK management. The safer default is only storing the OpenRouter API key.
## Migration Plan
### Phase 1: Documentation and defaults
- Add this spec.
- Update user-facing Local Backend docs to say local backend is optional/advanced.
- Default new installs to OpenRouter for all tasks.
- Keep existing installs unchanged unless the user opts in.
### Phase 2: Context builder
- Add `includes/class-context-builder.php`.
- Load it from `wp-agentic-writer.php`.
- Move repeated context assembly out of `class-gutenberg-sidebar.php`.
- Make `/chat`, `/generate-plan`, `/revise-plan`, and refinement endpoints use the builder.
### Phase 3: Authoritative summaries
- Extend `WP_Agentic_Writer_Context_Service` with:
  - `get_session_context( $session_id )`
  - `update_session_context( $session_id, $patch )`
  - `summarize_session_if_needed( $session_id, $post_id )`
- Make `/summarize-context` persist summaries to `wpaw_conversations.context`.
- Store plan versions and section summaries in context.
### Phase 4: Streaming and usage polish
- Remove deprecated OpenRouter usage request parameters.
- Emit optional `provider` and `usage` SSE events.
- Extend cost tracking schema for cached tokens and BYOK upstream cost.
- Add UI display for monthly token usage.
### Phase 5: Local backend repositioning
- Move local backend downloads and setup UI to advanced/legacy.
- Keep `WP_Agentic_Writer_Local_Backend_Provider` for existing users.
- Disable automatic local backend recommendation in onboarding.
## Acceptance Criteria
- A new article can be planned and written through OpenRouter streaming without any local bash/proxy setup.
- Existing conversation history persists through `wpaw_conversations`.
- Plan generation no longer sends full browser `chatHistory` when `sessionId` is available.
- Block refinement sends the active block, the relevant plan entry, compact decisions, and recent messages, not the full raw history.
- Streaming responses show partial text in the editor and finish with usage metadata.
- Cost tracking records provider, model, action, session, tokens, and cost as it does today.
- New cache fields are recorded when OpenRouter returns them.
- Local Backend still works for users who already configured it, but it is no longer the default recommendation.
## Implementation Risks
- Some existing frontend flows rely on `messages` as the full source of truth. Those flows need to pass `sessionId` reliably before backend context can become authoritative.
- `wpaw_conversations.context` is `LONGTEXT`, so it can hold rich JSON, but large contexts should still be summarized to keep admin queries fast.
- OpenRouter response caching is beta and should not be presented as durable memory.
- BYOK provider fallback behavior is configured in OpenRouter, so the WordPress UI can guide and detect symptoms but cannot fully enforce provider-key policy from this plugin alone.