# OpenRouter BYOK Context and Streaming Spec

**Date:** 2026-06-05
**Status:** Proposed implementation direction
**Goal:** Replace local bash/proxy-first text generation with an OpenRouter BYOK-first API path while preserving article continuity and improving streamed editor UX.

## Decision

Use OpenRouter as the primary text transport for `chat`, `clarity`, `planning`, `writing`, and `refinement`, with the user's OpenRouter workspace configured for BYOK provider keys.

The plugin should continue to store conversation and article memory in WordPress. OpenRouter should be treated as a stateless model gateway: it streams model output, returns usage metadata, applies provider routing, and can cache responses to identical requests. It does not own article continuity.

Local Backend should become optional or legacy. It is useful for experiments, but it should not be the recommended default because it asks users to run local scripts/proxy tooling and creates trust friction.

## Current Implementation Snapshot

The current plugin already has most of the foundation:

- `includes/interface-ai-provider.php` defines `chat()`, `chat_stream()`, `generate_image()`, `is_configured()`, `test_connection()`, and `supports_task_type()`.
- `includes/class-provider-manager.php` routes each task through configured providers and already prevents silent OpenRouter spend when fallback is disabled.
- `includes/class-openrouter-provider.php` supports non-streaming and streaming chat completions through OpenRouter.
- `includes/class-local-backend-provider.php` supports a local proxy at `/v1/messages`, including a cURL streaming parser and plain JSON fallback.
- `includes/class-conversation-manager.php` stores sessions in `{$wpdb->prefix}wpaw_conversations` with `messages` and `context` JSON fields.
- `includes/class-context-service.php` is already documented as the single source of truth for messages, `_wpaw_plan`, `_wpaw_post_config`, and legacy chat migration.
- `includes/class-gutenberg-sidebar.php` exposes the main REST routes: `/chat`, `/generate-plan`, `/revise-plan`, `/execute-article`, `/refine-block`, `/refine-from-chat`, `/summarize-context`, `/detect-intent`, `/writing-state/{post_id}`, and conversation routes.
- Cost tracking already records `post_id`, `session_id`, `model`, `provider`, `action`, input tokens, output tokens, cost, and status.

The main gap is not lack of streaming. The gap is that several routes still accept full `chatHistory` from the browser and inject it into prompts. That makes continuity depend on the browser payload and can re-send too much context.

## Product Positioning

Recommended provider settings:

```php
'task_providers' => array(
    'chat'       => 'openrouter',
    'clarity'    => 'openrouter',
    'planning'   => 'openrouter',
    'writing'    => 'openrouter',
    'refinement' => 'openrouter',
    'image'      => 'openrouter',
),
'allow_openrouter_fallback' => false,
```

The UI copy should present this as:

- Connect OpenRouter API key.
- Configure BYOK provider keys inside OpenRouter.
- Stream directly into WordPress.
- Keep all article memory in WordPress.
- Local Backend is advanced or legacy.

OpenRouter BYOK details to reflect in docs:

- BYOK lets users route requests through their own provider keys while still using OpenRouter's API surface.
- BYOK provider keys are encrypted and used for requests routed through the selected provider.
- OpenRouter's BYOK fee is documented as 5 percent of the normal OpenRouter model/provider cost, waived for the first 1M BYOK requests per month.
- Users can prevent fallback to OpenRouter shared endpoints by enabling the provider key's "Always use for this provider" behavior in OpenRouter.
- OpenRouter usage data is returned in normal responses and in the last SSE message for streamed responses.

Sources:

- https://openrouter.ai/docs/guides/overview/auth/byok
- https://openrouter.ai/docs/cookbook/administration/usage-accounting
- https://openrouter.ai/docs/guides/features/response-caching/

## Continuity Ownership

Continuity is owned by WordPress, not OpenRouter.

Persisted state:

| State | Current storage | Keep or change |
| --- | --- | --- |
| Conversation messages | `wpaw_conversations.messages` | Keep |
| Session context | `wpaw_conversations.context` | Extend |
| Article plan | `_wpaw_plan` post meta | Keep |
| Post config | `_wpaw_post_config` post meta | Keep |
| Writing state | `_wpaw_writing_status`, `_wpaw_current_section`, `_wpaw_sections_written`, `_wpaw_resume_token` | Keep |
| Section to block mapping | `_wpaw_section_blocks` | Keep |
| Lightweight post memory | `_wpaw_memory` | Extend or migrate into `context` |
| Cost and token usage | `wpaw_cost_tracking` | Extend |

Recommended new session context shape:

```json
{
  "working_summary": {
    "text": "The article is about ...",
    "updated_at": "2026-06-05T10:30:00+07:00",
    "source_message_count": 14
  },
  "decisions": [
    {
      "type": "accept",
      "target": "outline.section.2",
      "summary": "Keep the practical checklist framing.",
      "created_at": "2026-06-05T10:31:00+07:00"
    }
  ],
  "rejections": [
    {
      "target": "outline.section.4",
      "summary": "Too generic; needs concrete WordPress examples.",
      "created_at": "2026-06-05T10:32:00+07:00"
    }
  ],
  "research_notes": [
    {
      "source": "manual",
      "title": "User supplied constraint",
      "excerpt": "Avoid local bash instructions in the default UX.",
      "tags": ["trust", "onboarding"]
    }
  ],
  "token_policy": {
    "max_recent_messages": 6,
    "max_summary_tokens": 600,
    "max_research_snippets": 5
  }
}
```

Store this in `wpaw_conversations.context` first. Avoid adding a new custom table until `context` becomes too large or needs relational querying.

## Context Builder

Add a dedicated builder instead of assembling continuity inside each REST handler.

New file:

```text
includes/class-context-builder.php
```

Primary API:

```php
class WP_Agentic_Writer_Context_Builder {
    public function build_for_task( $task, $session_id, $post_id, $request_params = array() ) {
        // Returns normalized prompt parts for chat, planning, writing, refinement, SEO.
    }
}
```

Return shape:

```php
array(
    'system_context'   => 'Stable task and policy instructions.',
    'working_context'  => 'Compact summary, decisions, plan, selected post config.',
    'active_content'   => 'The exact section/block/article slice being edited.',
    'research_context' => 'Only relevant excerpts.',
    'audit'            => array(
        'included_recent_messages' => 6,
        'included_research_items'  => 3,
        'estimated_input_tokens'   => 2200,
        'used_full_history'        => false,
    ),
)
```

Context assembly rules:

- Always include the task system prompt and language instruction.
- Always include post config summary: audience, tone, language, article length, SEO fields, web search preference.
- Include `_wpaw_plan` for planning, writing, and outline refinement.
- Include only the active block or section for block refinement.
- Include recent raw messages only up to `max_recent_messages`.
- Include `working_summary` when message history is long.
- Include decisions and rejections as compact bullet points.
- Include post content only when the task requires whole-article awareness, such as final polish or article-wide refinement.
- Never trust browser-provided `chatHistory` as authoritative if `sessionId` is available.

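The message, summary, and research rules above could be sketched as a pure selection step that the builder runs before formatting prompt text. This is a minimal sketch, not the plugin's implementation: the function name `wpaw_sketch_select_context()` is hypothetical, the defaults are taken from the `token_policy` example, and the rule "use the summary only when raw history was truncated" is one assumed reading of "when message history is long".

```php
<?php
/**
 * Hypothetical helper: choose which stored items enter the prompt.
 *
 * @param array $messages Stored session messages, oldest first.
 * @param array $context  Decoded wpaw_conversations.context JSON.
 * @return array Selected items plus an audit trail.
 */
function wpaw_sketch_select_context( array $messages, array $context ) {
    // Defaults mirror the token_policy example; stored values override them.
    $policy = array_merge(
        array( 'max_recent_messages' => 6, 'max_research_snippets' => 5 ),
        isset( $context['token_policy'] ) && is_array( $context['token_policy'] ) ? $context['token_policy'] : array()
    );

    $max_recent = max( 0, (int) $policy['max_recent_messages'] );
    // A negative offset keeps the newest entries. Zero is handled separately
    // because array_slice( $a, -0 ) would return the whole array.
    $recent = $max_recent > 0 ? array_slice( $messages, -$max_recent ) : array();

    // Assumption: the summary is only needed when raw history was truncated.
    $summary = '';
    if ( count( $messages ) > count( $recent ) && ! empty( $context['working_summary']['text'] ) ) {
        $summary = (string) $context['working_summary']['text'];
    }

    $notes    = isset( $context['research_notes'] ) && is_array( $context['research_notes'] ) ? $context['research_notes'] : array();
    $research = array_slice( $notes, 0, max( 0, (int) $policy['max_research_snippets'] ) );

    return array(
        'recent_messages' => $recent,
        'working_summary' => $summary,
        'research_notes'  => $research,
        'audit'           => array(
            'included_recent_messages' => count( $recent ),
            'included_research_items'  => count( $research ),
            'used_full_history'        => count( $recent ) === count( $messages ),
        ),
    );
}
```

Keeping selection separate from string formatting makes the `audit` values easy to unit test without a WordPress runtime.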
## Endpoint Changes

### `/chat`

Current behavior:

- Receives `messages` from the browser.
- Prepends a system prompt.
- Streams or returns a chat response.
- Persists user and assistant messages.

Required change:

- Use browser `messages` only to identify the latest user message.
- Load authoritative session context from `WP_Agentic_Writer_Context_Service`.
- Build final messages through `WP_Agentic_Writer_Context_Builder`.
- Persist the raw user message and assistant response after completion.

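The first required change, reading only the latest user message from the browser payload, might look like the sketch below. The function name is hypothetical, and the `role`/`content` message shape is assumed from the chat-completions format the plugin already sends to OpenRouter.

```php
<?php
/**
 * Hypothetical helper: return the newest non-empty user message text,
 * ignoring everything else the browser sent.
 *
 * @param array $browser_messages Untrusted messages array from the request.
 * @return string|null Trimmed message text, or null if none was found.
 */
function wpaw_sketch_latest_user_message( array $browser_messages ) {
    // Reindex so the backwards loop works even if keys are not sequential.
    $browser_messages = array_values( $browser_messages );

    for ( $i = count( $browser_messages ) - 1; $i >= 0; $i-- ) {
        $message = $browser_messages[ $i ];
        if ( is_array( $message )
            && isset( $message['role'], $message['content'] )
            && 'user' === $message['role']
            && is_string( $message['content'] )
            && '' !== trim( $message['content'] )
        ) {
            return trim( $message['content'] );
        }
    }

    return null;
}
```

Everything older than that message would then come from the stored session, so a stale or tampered browser history cannot change what the model sees.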
### `/generate-plan`

Current behavior:

- Accepts `topic`, `context`, `chatHistory`, and other config.
- Serializes full `chatHistory` into the planning prompt.
- Stores `_wpaw_plan` and `_wpaw_memory`.

Required change:

- Keep `topic`, `context`, `clarificationAnswers`, and `post_config`.
- Replace full `chatHistory` injection with a context package from the builder.
- Save generated plan to `_wpaw_plan`.
- Update `wpaw_conversations.context.working_summary` after plan generation.

### `/revise-plan`

Required behavior:

- Include current `_wpaw_plan`.
- Include latest user instruction.
- Include accepted/rejected outline decisions.
- Ask for raw JSON plan only.
- Save previous plan as a version entry inside `wpaw_conversations.context.plan_versions` before overwriting `_wpaw_plan`.

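The plan versioning step for `/revise-plan` could be sketched as follows. The function name, the `saved_at`/`plan` entry shape, and the cap of 10 retained versions are assumptions for illustration; the spec only requires that the previous plan is stored under `plan_versions` before `_wpaw_plan` is overwritten.

```php
<?php
/**
 * Hypothetical helper: append the outgoing plan to context.plan_versions.
 *
 * @param array $context       Decoded session context.
 * @param array $previous_plan The plan about to be replaced.
 * @param int   $max_versions  Assumed retention cap, not defined by the spec.
 * @return array Updated context, ready to be re-encoded and saved.
 */
function wpaw_sketch_push_plan_version( array $context, array $previous_plan, $max_versions = 10 ) {
    $versions = isset( $context['plan_versions'] ) && is_array( $context['plan_versions'] )
        ? $context['plan_versions']
        : array();

    $versions[] = array(
        'saved_at' => gmdate( 'c' ), // ISO 8601 timestamp in UTC.
        'plan'     => $previous_plan,
    );

    // Keep only the newest entries so the context JSON stays bounded.
    $context['plan_versions'] = array_slice( $versions, -max( 1, (int) $max_versions ) );

    return $context;
}
```

Capping the list is a design choice to keep `context` small; an uncapped history would work too but grows with every revision.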
### `/execute-article`

Current behavior:

- Writes sections from the plan.
- Streams section content and block events.
- Updates `_wpaw_plan` section statuses.

Required change:

- For each section, send the section brief, global article summary, relevant decisions, and relevant research.
- Do not send the full conversation for every section.
- After each section completes, update writing state and append a section summary to session context.

### `/refine-block` and `/refine-from-chat`

Required behavior:

- Send active block content, neighboring heading/section context, relevant plan entry, and latest instruction.
- Include compact working summary and decisions.
- Do not include the full draft unless the requested operation is article-wide.

### `/summarize-context`

Current behavior:

- Summarizes browser-provided `chatHistory`.
- Returns summary but does not appear to be the authoritative persistence mechanism.

Required change:

- Accept `sessionId`.
- Load authoritative session messages.
- Save the resulting summary into `wpaw_conversations.context.working_summary`.
- Return `summary`, `message_count`, `source_message_count`, `tokens_saved`, and provider metadata.

## Streaming Transport

OpenRouter streaming is already implemented in `WP_Agentic_Writer_OpenRouter_Provider::chat_stream()`.

Keep this transport shape:

```php
$body = array(
    'model'    => $model,
    'messages' => $messages,
    'stream'   => true,
);
```

Modernize usage handling:

- OpenRouter now returns full usage metadata automatically.
- `usage: { include: true }` and `stream_options: { include_usage: true }` are documented as deprecated and no longer required.
- Keep parsing the final `usage` object from streamed chunks.
- Extend cost tracking to store cache metadata when available.

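Parsing the final `usage` object from streamed chunks could be sketched like this. It is a simplified illustration rather than the existing parser in `chat_stream()`: the function name is hypothetical, and it assumes the input has already been split into complete SSE lines, whereas a real cURL write callback must buffer partial lines across reads.

```php
<?php
/**
 * Hypothetical helper: accumulate streamed text and capture usage.
 *
 * @param string[] $lines Complete SSE lines from the response body.
 * @return array { text: string, usage: array|null }
 */
function wpaw_sketch_parse_sse_lines( array $lines ) {
    $text  = '';
    $usage = null;

    foreach ( $lines as $line ) {
        $line = trim( $line );
        // Skip blank separators and SSE comment lines, which start with ":".
        if ( '' === $line || 0 !== strpos( $line, 'data:' ) ) {
            continue;
        }

        $payload = trim( substr( $line, 5 ) );
        if ( '[DONE]' === $payload ) {
            break;
        }

        $chunk = json_decode( $payload, true );
        if ( ! is_array( $chunk ) ) {
            continue; // Ignore malformed chunks instead of aborting the stream.
        }

        if ( isset( $chunk['choices'][0]['delta']['content'] ) && is_string( $chunk['choices'][0]['delta']['content'] ) ) {
            $text .= $chunk['choices'][0]['delta']['content'];
        }

        // Usage arrives near the end of the stream; keep the last one seen.
        if ( isset( $chunk['usage'] ) && is_array( $chunk['usage'] ) ) {
            $usage = $chunk['usage'];
        }
    }

    return array( 'text' => $text, 'usage' => $usage );
}
```

Because usage is read from whichever chunk carries it, the request body no longer needs the deprecated usage parameters.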
Recommended emitted SSE events:

```json
{"type":"provider","provider":"openrouter","model":"openai/gpt-4o-mini","byok_expected":true}
{"type":"conversational_stream","content":"partial accumulated text"}
{"type":"usage","input_tokens":1200,"output_tokens":360,"cached_tokens":0,"cost":0.0012}
{"type":"complete","session_id":"abc123","totalCost":0.0012}
```

Use the existing browser parsing path in `assets/js/sidebar.js` and add support for the optional `provider` and `usage` event types.

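On the server side, the events listed above could be serialized with a small formatter. This sketch assumes the plugin frames each event as a single `data:` line followed by a blank line, which is standard SSE framing but has not been checked against the existing emitter; the function name is hypothetical, and inside WordPress `wp_json_encode()` would normally replace `json_encode()`.

```php
<?php
/**
 * Hypothetical helper: format one SSE frame for the editor sidebar.
 *
 * @param string $type   Event type, e.g. "provider" or "usage".
 * @param array  $fields Additional payload fields.
 * @return string A complete SSE frame ending in a blank line.
 */
function wpaw_sketch_format_sse_event( $type, array $fields = array() ) {
    // Put "type" first so the payload matches the event examples in this spec.
    $event = array_merge( array( 'type' => (string) $type ), $fields );
    // Callers must not be able to override the event type through $fields.
    $event['type'] = (string) $type;

    return 'data: ' . json_encode( $event, JSON_UNESCAPED_SLASHES ) . "\n\n";
}
```

A streaming handler would `echo` the returned string and flush the output buffer after each event so the browser receives it immediately.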
## Response Caching Policy

OpenRouter response caching should be used for deterministic, duplicate-safe operations only. It is not article memory.

Recommended use:

- `detect_intent`
- `summarize_context` retry
- connection test
- repeated model capability lookups if routed through completion calls

Avoid by default:

- article draft generation
- outline revision
- refinement requests
- image prompt generation

Provider implementation change:

```php
if ( ! empty( $options['openrouter_response_cache'] ) ) {
    $headers[] = 'X-OpenRouter-Cache: true';
    $headers[] = 'X-OpenRouter-Cache-TTL: ' . (int) ( $options['openrouter_cache_ttl'] ?? 300 );
}
```

Important limitations:

- Cache hits only happen for identical requests.
- Streaming and non-streaming requests are cached separately.
- Cache hit usage counters are zeroed.
- Response caching is beta and requires OpenRouter to store response data temporarily.

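The "recommended" and "avoid by default" lists above suggest gating the cache headers on an action allowlist as well as on the setting. A minimal sketch, assuming a hypothetical function name and action keys (`detect_intent` and `summarize_context` come from this spec, `test_connection` is a guessed key for the connection test); it reuses the header names from the provider snippet above and does not distinguish a `summarize_context` retry from a first attempt.

```php
<?php
/**
 * Hypothetical helper: return cache headers only for duplicate-safe actions.
 *
 * @param string $action  Action key used for cost tracking.
 * @param array  $options Plugin settings.
 * @return string[] Header lines to append, or an empty array.
 */
function wpaw_sketch_cache_headers( $action, array $options ) {
    // Assumed allowlist; drafting, revision, and refinement stay uncached.
    $cacheable_actions = array( 'detect_intent', 'summarize_context', 'test_connection' );

    if ( empty( $options['openrouter_response_cache'] ) || ! in_array( $action, $cacheable_actions, true ) ) {
        return array();
    }

    $ttl = isset( $options['openrouter_cache_ttl'] ) ? (int) $options['openrouter_cache_ttl'] : 300;

    return array(
        'X-OpenRouter-Cache: true',
        'X-OpenRouter-Cache-TTL: ' . max( 1, $ttl ),
    );
}
```

An allowlist fails closed: a new action added later is uncached until someone decides it is safe to repeat.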
## Usage and Budget Tracking

Extend `wpaw_cost_tracking` with optional cache and upstream fields:

```sql
ALTER TABLE {$wpdb->prefix}wpaw_cost_tracking
    ADD COLUMN cached_tokens int(11) DEFAULT 0 AFTER output_tokens,
    ADD COLUMN cache_write_tokens int(11) DEFAULT 0 AFTER cached_tokens,
    ADD COLUMN upstream_inference_cost decimal(10,6) DEFAULT NULL AFTER cost,
    ADD COLUMN generation_id varchar(64) DEFAULT '' AFTER status;
```

Implementation notes:

- Put this behind a schema version bump, not plugin version alone.
- Keep existing `maybe_upgrade_table()` pattern in `WP_Agentic_Writer_Cost_Tracker`.
- Parse `usage.prompt_tokens_details.cached_tokens`.
- Parse `usage.prompt_tokens_details.cache_write_tokens`.
- Parse `usage.cost_details.upstream_inference_cost` for BYOK requests.
- Include a monthly token budget view alongside the existing cost view.

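The three parsing notes above could be combined into one normalization step that maps a decoded `usage` object onto the new columns. This is a sketch under assumptions: the function name is hypothetical, the top-level `prompt_tokens`, `completion_tokens`, and `cost` keys follow the OpenAI-style usage shape, and missing fields fall back to the column defaults from the schema change.

```php
<?php
/**
 * Hypothetical helper: flatten an OpenRouter usage object for cost tracking.
 *
 * @param array $usage Decoded "usage" object from a response or final SSE chunk.
 * @return array Values keyed by cost-tracking column name.
 */
function wpaw_sketch_normalize_usage( array $usage ) {
    $prompt_details = isset( $usage['prompt_tokens_details'] ) && is_array( $usage['prompt_tokens_details'] )
        ? $usage['prompt_tokens_details']
        : array();
    $cost_details   = isset( $usage['cost_details'] ) && is_array( $usage['cost_details'] )
        ? $usage['cost_details']
        : array();

    return array(
        'input_tokens'            => isset( $usage['prompt_tokens'] ) ? (int) $usage['prompt_tokens'] : 0,
        'output_tokens'           => isset( $usage['completion_tokens'] ) ? (int) $usage['completion_tokens'] : 0,
        'cached_tokens'           => isset( $prompt_details['cached_tokens'] ) ? (int) $prompt_details['cached_tokens'] : 0,
        'cache_write_tokens'      => isset( $prompt_details['cache_write_tokens'] ) ? (int) $prompt_details['cache_write_tokens'] : 0,
        'cost'                    => isset( $usage['cost'] ) ? (float) $usage['cost'] : 0.0,
        // Stays null when absent so the column keeps its DEFAULT NULL meaning.
        'upstream_inference_cost' => isset( $cost_details['upstream_inference_cost'] )
            ? (float) $cost_details['upstream_inference_cost']
            : null,
    );
}
```

Returning defaults instead of failing keeps older responses and non-BYOK requests recordable with the same code path.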
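Budget metric examples:

```php
// Assumes $rows holds one month of wpaw_cost_tracking rows as associative arrays.
$billable_input_tokens = max( 0, $input_tokens - $cached_tokens );
$total_monthly_tokens  = array_sum( array_column( $rows, 'input_tokens' ) ) + array_sum( array_column( $rows, 'output_tokens' ) );
// Upper bound on BYOK requests: this counts every successful OpenRouter call.
$ok = function ( $row ) { return 'openrouter' === $row['provider'] && 'success' === $row['status']; };
$byok_free_request_counter = count( array_filter( $rows, $ok ) );
```

Note: OpenRouter documents the BYOK waiver as the first 1M BYOK requests per month, not the first 1M tokens. Keep UI wording precise.
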
## Settings UI Changes

Update Settings V2:

- Rename default cloud path to `OpenRouter BYOK / API`.
- Keep API key storage in `wp_agentic_writer_settings.openrouter_api_key`.
- Add a help panel explaining that provider BYOK keys are configured in OpenRouter, not in WordPress.
- Add a "Prevent shared fallback" checklist item that links users to OpenRouter BYOK provider settings.
- Move Local Backend to an `Advanced` or `Legacy Local Backend` section.
- Make provider routing default all text tasks to `openrouter`.
- Keep image task on `openrouter`.
- Show a trust note: WordPress streams directly to OpenRouter; no local shell or CLI process is required.

Do not collect provider keys directly in WordPress unless there is a deliberate product decision to bypass OpenRouter BYOK management. The safer default is only storing the OpenRouter API key.

## Migration Plan

### Phase 1: Documentation and defaults

- Add this spec.
- Update user-facing Local Backend docs to say local backend is optional/advanced.
- Default new installs to OpenRouter for all tasks.
- Keep existing installs unchanged unless the user opts in.

### Phase 2: Context builder

- Add `includes/class-context-builder.php`.
- Load it from `wp-agentic-writer.php`.
- Move repeated context assembly out of `class-gutenberg-sidebar.php`.
- Make `/chat`, `/generate-plan`, `/revise-plan`, and refinement endpoints use the builder.

### Phase 3: Authoritative summaries

- Extend `WP_Agentic_Writer_Context_Service` with:
  - `get_session_context( $session_id )`
  - `update_session_context( $session_id, $patch )`
  - `summarize_session_if_needed( $session_id, $post_id )`
- Make `/summarize-context` persist summaries to `wpaw_conversations.context`.
- Store plan versions and section summaries in context.

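One way `update_session_context( $session_id, $patch )` might apply a patch is sketched below as a pure merge function, leaving the database read and write to the service. The merge semantics are an assumption, since the spec does not define them: list-like keys (`decisions`, `rejections`, `research_notes`) are appended, and every other key is replaced wholesale. The function name is hypothetical.

```php
<?php
/**
 * Hypothetical helper: merge a patch into a decoded session context.
 *
 * @param array $context Current decoded context.
 * @param array $patch   Keys to change.
 * @return array Merged context.
 */
function wpaw_sketch_apply_context_patch( array $context, array $patch ) {
    // Assumed append-only keys; all other keys are overwritten by the patch.
    $append_keys = array( 'decisions', 'rejections', 'research_notes' );

    foreach ( $patch as $key => $value ) {
        if ( in_array( $key, $append_keys, true ) && is_array( $value ) ) {
            $existing        = isset( $context[ $key ] ) && is_array( $context[ $key ] ) ? $context[ $key ] : array();
            $context[ $key ] = array_merge( $existing, array_values( $value ) );
        } else {
            $context[ $key ] = $value;
        }
    }

    return $context;
}
```

Under this sketch, a call that records a new decision and refreshes the summary can pass both keys in one patch, and earlier decisions are preserved.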
### Phase 4: Streaming and usage polish

- Remove deprecated OpenRouter usage request parameters.
- Emit optional `provider` and `usage` SSE events.
- Extend cost tracking schema for cached tokens and BYOK upstream cost.
- Add UI display for monthly token usage.

### Phase 5: Local backend repositioning

- Move local backend downloads and setup UI to advanced/legacy.
- Keep `WP_Agentic_Writer_Local_Backend_Provider` for existing users.
- Disable automatic local backend recommendation in onboarding.

## Acceptance Criteria

- A new article can be planned and written through OpenRouter streaming without any local bash/proxy setup.
- Existing conversation history persists through `wpaw_conversations`.
- Plan generation no longer sends full browser `chatHistory` when `sessionId` is available.
- Refining a block includes active block, relevant plan, compact decisions, and recent messages, not full raw history.
- Streaming responses show partial text in the editor and finish with usage metadata.
- Cost tracking records provider, model, action, session, tokens, and cost as it does today.
- New cache fields are recorded when OpenRouter returns them.
- Local Backend still works for users who already configured it, but it is no longer the default recommendation.

## Implementation Risks

- Some existing frontend flows rely on `messages` as the full source of truth. Those flows need to pass `sessionId` reliably before backend context can become authoritative.
- `wpaw_conversations.context` is `LONGTEXT`, so it can hold rich JSON, but large contexts should still be summarized to keep admin queries fast.
- OpenRouter response caching is beta and should not be presented as durable memory.
- BYOK provider fallback behavior is configured in OpenRouter, so the WordPress UI can guide and detect symptoms but cannot fully enforce provider-key policy from this plugin alone.