checkpoint: pre-audit baseline state
docs/architecture/OPENROUTER_BYOK_CONTEXT_STREAMING_SPEC.md
@@ -0,0 +1,421 @@
# OpenRouter BYOK Context and Streaming Spec

**Date:** 2026-06-05
**Status:** Proposed implementation direction
**Goal:** Replace local bash/proxy-first text generation with an OpenRouter BYOK-first API path while preserving article continuity and improving streamed editor UX.

## Decision

Use OpenRouter as the primary text transport for `chat`, `clarity`, `planning`, `writing`, and `refinement`, with the user's OpenRouter workspace configured for BYOK provider keys.

The plugin should continue to store conversation and article memory in WordPress. OpenRouter should be treated as a stateless model gateway: it streams model output, returns usage metadata, applies provider routing, and can cache identical responses. It does not own article continuity.

Local Backend should become optional or legacy. It is useful for experiments, but it should not be the recommended default because it asks users to run local scripts/proxy tooling and creates trust friction.

## Current Implementation Snapshot

The current plugin already has most of the foundation:

- `includes/interface-ai-provider.php` defines `chat()`, `chat_stream()`, `generate_image()`, `is_configured()`, `test_connection()`, and `supports_task_type()`.
- `includes/class-provider-manager.php` routes each task through configured providers and already prevents silent OpenRouter spend when fallback is disabled.
- `includes/class-openrouter-provider.php` supports non-streaming and streaming chat completions through OpenRouter.
- `includes/class-local-backend-provider.php` supports a local proxy at `/v1/messages`, including a cURL streaming parser and plain JSON fallback.
- `includes/class-conversation-manager.php` stores sessions in `{$wpdb->prefix}wpaw_conversations` with `messages` and `context` JSON fields.
- `includes/class-context-service.php` is already documented as the single source of truth for messages, `_wpaw_plan`, `_wpaw_post_config`, and legacy chat migration.
- `includes/class-gutenberg-sidebar.php` exposes the main REST routes: `/chat`, `/generate-plan`, `/revise-plan`, `/execute-article`, `/refine-block`, `/refine-from-chat`, `/summarize-context`, `/detect-intent`, `/writing-state/{post_id}`, and conversation routes.
- Cost tracking already records `post_id`, `session_id`, `model`, `provider`, `action`, input tokens, output tokens, cost, and status.

The main gap is not a lack of streaming. The gap is that several routes still accept the full `chatHistory` from the browser and inject it into prompts. That makes continuity depend on the browser payload and can re-send more context than the task needs.

## Product Positioning

Recommended provider settings:

```php
'task_providers' => array(
    'chat'       => 'openrouter',
    'clarity'    => 'openrouter',
    'planning'   => 'openrouter',
    'writing'    => 'openrouter',
    'refinement' => 'openrouter',
    'image'      => 'openrouter',
),
'allow_openrouter_fallback' => false,
```
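
As a minimal sketch of how these settings might be consumed, the function below resolves the provider for a task and refuses to fall back to OpenRouter when `allow_openrouter_fallback` is false. The function name `wpaw_resolve_provider()`, the `$configured` availability map, and the `local_backend` slug are hypothetical; the real routing lives in `includes/class-provider-manager.php` and may differ. With fallback disabled, a task whose preferred provider is unavailable resolves to `null` rather than silently spending through OpenRouter.

```php
<?php
/**
 * Hypothetical helper: pick the provider for a task from plugin settings.
 *
 * @param array  $settings   Settings with 'task_providers' and 'allow_openrouter_fallback'.
 * @param string $task       Task type, e.g. 'chat' or 'writing'.
 * @param array  $configured Map of provider slug => bool (assumed availability check result).
 * @return string|null Provider slug, or null when nothing usable is allowed.
 */
function wpaw_resolve_provider( array $settings, string $task, array $configured ): ?string {
    $preferred = $settings['task_providers'][ $task ] ?? 'openrouter';

    if ( ! empty( $configured[ $preferred ] ) ) {
        return $preferred;
    }

    // The preferred provider is unavailable. Only fall back to OpenRouter
    // when the user explicitly allowed it, so no silent spend occurs.
    $fallback_allowed = ! empty( $settings['allow_openrouter_fallback'] );
    if ( $fallback_allowed && ! empty( $configured['openrouter'] ) ) {
        return 'openrouter';
    }

    return null;
}

// Example: 'local_backend' is a hypothetical slug used only for illustration.
$settings = array(
    'task_providers'            => array(
        'chat'    => 'openrouter',
        'writing' => 'local_backend',
    ),
    'allow_openrouter_fallback' => false,
);
$configured = array(
    'openrouter'    => true,
    'local_backend' => false,
);

$chat_provider    = wpaw_resolve_provider( $settings, 'chat', $configured );
$writing_provider = wpaw_resolve_provider( $settings, 'writing', $configured );
```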

The UI copy should present this as:

- Connect OpenRouter API key.
- Configure BYOK provider keys inside OpenRouter.
- Stream directly into WordPress.
- Keep all article memory in WordPress.
- Local Backend is advanced or legacy.

OpenRouter BYOK details to reflect in docs:

- BYOK lets users route requests through their own provider keys while still using OpenRouter's API surface.
- BYOK provider keys are encrypted and used for requests routed through the selected provider.
- OpenRouter's BYOK fee is documented as 5 percent of the normal OpenRouter model/provider cost, waived for the first 1M BYOK requests per month.
- Users can prevent fallback to OpenRouter shared endpoints by enabling the provider key's "Always use for this provider" behavior in OpenRouter.
- OpenRouter usage data is returned in normal responses and in the last SSE message for streamed responses.

Sources:

- https://openrouter.ai/docs/guides/overview/auth/byok
- https://openrouter.ai/docs/cookbook/administration/usage-accounting
- https://openrouter.ai/docs/guides/features/response-caching/

## Continuity Ownership

Continuity is owned by WordPress, not OpenRouter.

Persisted state:

| State | Current storage | Keep or change |
| --- | --- | --- |
| Conversation messages | `wpaw_conversations.messages` | Keep |
| Session context | `wpaw_conversations.context` | Extend |
| Article plan | `_wpaw_plan` post meta | Keep |
| Post config | `_wpaw_post_config` post meta | Keep |
| Writing state | `_wpaw_writing_status`, `_wpaw_current_section`, `_wpaw_sections_written`, `_wpaw_resume_token` | Keep |
| Section to block mapping | `_wpaw_section_blocks` | Keep |
| Lightweight post memory | `_wpaw_memory` | Extend or migrate into `context` |
| Cost and token usage | `wpaw_cost_tracking` | Extend |

Recommended new session context shape:

```json
{
  "working_summary": {
    "text": "The article is about ...",
    "updated_at": "2026-06-05T10:30:00+07:00",
    "source_message_count": 14
  },
  "decisions": [
    {
      "type": "accept",
      "target": "outline.section.2",
      "summary": "Keep the practical checklist framing.",
      "created_at": "2026-06-05T10:31:00+07:00"
    }
  ],
  "rejections": [
    {
      "target": "outline.section.4",
      "summary": "Too generic; needs concrete WordPress examples.",
      "created_at": "2026-06-05T10:32:00+07:00"
    }
  ],
  "research_notes": [
    {
      "source": "manual",
      "title": "User supplied constraint",
      "excerpt": "Avoid local bash instructions in the default UX.",
      "tags": ["trust", "onboarding"]
    }
  ],
  "token_policy": {
    "max_recent_messages": 6,
    "max_summary_tokens": 600,
    "max_research_snippets": 5
  }
}
```

Store this in `wpaw_conversations.context` first. Avoid adding a new custom table until `context` becomes too large or needs relational querying.
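
Because existing rows may hold an empty, older, or partial `context` value, readers of that column probably need a normalization step. The sketch below shows one way to decode the stored JSON and fill in missing keys; `wpaw_normalize_session_context()` is a hypothetical name, and the default `token_policy` values are simply copied from the example shape above.

```php
<?php
/**
 * Hypothetical helper: decode the stored context JSON and fill in missing keys.
 *
 * @param string|null $raw_json Value of wpaw_conversations.context (may be empty or invalid).
 * @return array Context array with every documented top-level key present.
 */
function wpaw_normalize_session_context( ?string $raw_json ): array {
    $defaults = array(
        'working_summary' => array(
            'text'                 => '',
            'updated_at'           => null,
            'source_message_count' => 0,
        ),
        'decisions'       => array(),
        'rejections'      => array(),
        'research_notes'  => array(),
        'token_policy'    => array(
            'max_recent_messages'   => 6,
            'max_summary_tokens'    => 600,
            'max_research_snippets' => 5,
        ),
    );

    $decoded = json_decode( (string) $raw_json, true );
    if ( ! is_array( $decoded ) ) {
        $decoded = array(); // Empty, null, or corrupt JSON: start from defaults.
    }

    // Stored values win; nested defaults are filled one level deep.
    $context = array_merge( $defaults, $decoded );
    foreach ( array( 'working_summary', 'token_policy' ) as $key ) {
        $context[ $key ] = array_merge( $defaults[ $key ], (array) $context[ $key ] );
    }

    return $context;
}

// Example: a partial stored value keeps its own keys and gains the defaults.
$context = wpaw_normalize_session_context( '{"token_policy":{"max_recent_messages":4},"legacy_key":true}' );
```

Unknown keys such as `legacy_key` are preserved in this sketch, so data written by older plugin versions is not dropped on read.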

## Context Builder

Add a dedicated builder instead of assembling continuity inside each REST handler.

New file:

```text
includes/class-context-builder.php
```

Primary API:

```php
class WP_Agentic_Writer_Context_Builder {
    public function build_for_task( $task, $session_id, $post_id, $request_params = array() ) {
        // Returns normalized prompt parts for chat, planning, writing, refinement, SEO.
    }
}
```

Return shape:

```php
array(
    'system_context'   => 'Stable task and policy instructions.',
    'working_context'  => 'Compact summary, decisions, plan, selected post config.',
    'active_content'   => 'The exact section/block/article slice being edited.',
    'research_context' => 'Only relevant excerpts.',
    'audit'            => array(
        'included_recent_messages' => 6,
        'included_research_items'  => 3,
        'estimated_input_tokens'   => 2200,
        'used_full_history'        => false,
    ),
)
```

Context assembly rules:

- Always include the task system prompt and language instruction.
- Always include post config summary: audience, tone, language, article length, SEO fields, web search preference.
- Include `_wpaw_plan` for planning, writing, and outline refinement.
- Include only the active block or section for block refinement.
- Include recent raw messages only up to `max_recent_messages`.
- Include `working_summary` when message history is long.
- Include decisions and rejections as compact bullet points.
- Include post content only when the task requires whole-article awareness, such as final polish or article-wide refinement.
- Never trust browser-provided `chatHistory` as authoritative if `sessionId` is available.
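
The two message-related rules above (cap raw messages at `max_recent_messages`, and include `working_summary` only when history is long) can be sketched as follows. This is an illustration under assumptions, not the builder's actual code: `wpaw_select_conversation_context()` is a hypothetical name, and "long" is interpreted here as "more stored messages than the cap allows".

```php
<?php
/**
 * Hypothetical helper: choose which conversation parts go into a prompt.
 *
 * @param array $messages Stored session messages, oldest first.
 * @param array $context  Session context with 'working_summary' and 'token_policy'.
 * @return array Keys: 'summary', 'recent_messages', 'audit'.
 */
function wpaw_select_conversation_context( array $messages, array $context ): array {
    $limit = max( 0, (int) ( $context['token_policy']['max_recent_messages'] ?? 6 ) );

    // array_slice() with offset -0 would return everything, so guard the zero case.
    $recent = $limit > 0 ? array_slice( $messages, -$limit ) : array();

    // Only lean on the summary when older messages were actually dropped.
    $history_is_long = count( $messages ) > $limit;
    $summary         = $history_is_long ? (string) ( $context['working_summary']['text'] ?? '' ) : '';

    return array(
        'summary'         => $summary,
        'recent_messages' => $recent,
        'audit'           => array(
            'included_recent_messages' => count( $recent ),
            'used_full_history'        => ! $history_is_long,
        ),
    );
}

// Example: eight stored messages with a cap of six.
$messages = array();
for ( $i = 1; $i <= 8; $i++ ) {
    $messages[] = array(
        'role'    => ( $i % 2 ) ? 'user' : 'assistant',
        'content' => 'Message ' . $i,
    );
}
$parts = wpaw_select_conversation_context(
    $messages,
    array(
        'working_summary' => array( 'text' => 'The article is about ...' ),
        'token_policy'    => array( 'max_recent_messages' => 6 ),
    )
);
```

The `audit` keys mirror the return shape proposed for the builder, which should make it easy to verify in logs that full history was not sent.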

## Endpoint Changes

### `/chat`

Current behavior:

- Receives `messages` from the browser.
- Prepends a system prompt.
- Streams or returns a chat response.
- Persists user and assistant messages.

Required change:

- Use browser `messages` only to identify the latest user message.
- Load authoritative session context from `WP_Agentic_Writer_Context_Service`.
- Build final messages through `WP_Agentic_Writer_Context_Builder`.
- Persist the raw user message and assistant response after completion.
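
The first required change, using the browser payload only to find the latest user message, could look roughly like this. `wpaw_latest_user_message()` is a hypothetical helper, and the `role`/`content` message keys are assumed to follow the usual chat-completions shape.

```php
<?php
/**
 * Hypothetical helper: extract only the newest user message from the browser payload.
 *
 * @param mixed $browser_messages Untrusted 'messages' value from the REST request.
 * @return string|null Trimmed message text, or null when none is present.
 */
function wpaw_latest_user_message( $browser_messages ): ?string {
    if ( ! is_array( $browser_messages ) ) {
        return null;
    }

    // Walk backwards so earlier (possibly stale or tampered) history is ignored.
    foreach ( array_reverse( $browser_messages ) as $message ) {
        if ( ! is_array( $message ) || 'user' !== ( $message['role'] ?? '' ) ) {
            continue;
        }
        if ( ! is_string( $message['content'] ?? null ) ) {
            continue; // Non-text content is out of scope for this sketch.
        }
        $content = trim( $message['content'] );
        if ( '' !== $content ) {
            return $content;
        }
    }

    return null;
}

// Example payload as the browser might send it.
$payload = array(
    array( 'role' => 'user', 'content' => 'Old question' ),
    array( 'role' => 'assistant', 'content' => 'Old answer' ),
    array( 'role' => 'user', 'content' => '  Newest question  ' ),
);
$latest = wpaw_latest_user_message( $payload );
```

Everything else in the payload is discarded; prior turns would come from the session stored in WordPress.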

### `/generate-plan`

Current behavior:

- Accepts `topic`, `context`, `chatHistory`, and other config.
- Serializes full `chatHistory` into the planning prompt.
- Stores `_wpaw_plan` and `_wpaw_memory`.

Required change:

- Keep `topic`, `context`, `clarificationAnswers`, and `post_config`.
- Replace full `chatHistory` injection with a context package from the builder.
- Save generated plan to `_wpaw_plan`.
- Update `wpaw_conversations.context.working_summary` after plan generation.

### `/revise-plan`

Required behavior:

- Include current `_wpaw_plan`.
- Include latest user instruction.
- Include accepted/rejected outline decisions.
- Ask for raw JSON plan only.
- Save previous plan as a version entry inside `wpaw_conversations.context.plan_versions` before overwriting `_wpaw_plan`.
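
The versioning step can be sketched as a pure function over the decoded context. The entry fields (`version`, `saved_at`, `plan`) and the `$max_versions` cap are assumptions added for illustration; the spec only requires that the previous plan be saved under `plan_versions` before `_wpaw_plan` is overwritten.

```php
<?php
/**
 * Hypothetical helper: archive the current plan in context before it is overwritten.
 *
 * @param array  $context      Decoded wpaw_conversations.context.
 * @param array  $current_plan Plan currently stored in _wpaw_plan.
 * @param string $saved_at     ISO 8601 timestamp for the version entry.
 * @param int    $max_versions Assumed cap so context does not grow without bound.
 * @return array Updated context.
 */
function wpaw_archive_plan_version( array $context, array $current_plan, string $saved_at, int $max_versions = 5 ): array {
    $versions = $context['plan_versions'] ?? array();

    // Continue numbering from the newest stored entry, even after trimming.
    $last = end( $versions );
    $next = is_array( $last ) ? (int) ( $last['version'] ?? 0 ) + 1 : 1;

    $versions[] = array(
        'version'  => $next,
        'saved_at' => $saved_at,
        'plan'     => $current_plan,
    );

    // Keep only the newest entries.
    $context['plan_versions'] = array_slice( $versions, -max( 1, $max_versions ) );

    return $context;
}

// Example: archive two plans, keeping only the newest one the second time.
$first  = wpaw_archive_plan_version( array(), array( 'title' => 'Draft A' ), '2026-06-05T10:40:00+07:00' );
$second = wpaw_archive_plan_version( $first, array( 'title' => 'Draft B' ), '2026-06-05T10:45:00+07:00', 1 );
```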

### `/execute-article`

Current behavior:

- Writes sections from the plan.
- Streams section content and block events.
- Updates `_wpaw_plan` section statuses.

Required change:

- For each section, send the section brief, global article summary, relevant decisions, and relevant research.
- Do not send the full conversation for every section.
- After each section completes, update writing state and append a section summary to session context.
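
One possible reading of "relevant decisions" is to match on the `target` field from the session context shape, for example `outline.section.2`. The sketch below assumes that naming convention and a 1-based section number; both are assumptions, as is the helper name `wpaw_build_section_context()`. Research filtering is omitted.

```php
<?php
/**
 * Hypothetical helper: build the per-section context payload.
 *
 * @param array $section        Section brief taken from the plan.
 * @param int   $section_number 1-based section position (assumed to match 'outline.section.N').
 * @param array $context        Decoded session context.
 * @return array Payload with the brief, summary, and section-specific decisions.
 */
function wpaw_build_section_context( array $section, int $section_number, array $context ): array {
    $target = 'outline.section.' . $section_number;

    // Keep only entries recorded against this section.
    $matches_section = static function ( array $entry ) use ( $target ): bool {
        return ( $entry['target'] ?? '' ) === $target;
    };

    return array(
        'section_brief'   => $section,
        'article_summary' => (string) ( $context['working_summary']['text'] ?? '' ),
        'decisions'       => array_values( array_filter( $context['decisions'] ?? array(), $matches_section ) ),
        'rejections'      => array_values( array_filter( $context['rejections'] ?? array(), $matches_section ) ),
    );
}

// Example using entries from the recommended context shape.
$context = array(
    'working_summary' => array( 'text' => 'The article is about ...' ),
    'decisions'       => array(
        array( 'type' => 'accept', 'target' => 'outline.section.2', 'summary' => 'Keep the practical checklist framing.' ),
    ),
    'rejections'      => array(
        array( 'target' => 'outline.section.4', 'summary' => 'Too generic; needs concrete WordPress examples.' ),
    ),
);
$section_two  = wpaw_build_section_context( array( 'heading' => 'Checklist' ), 2, $context );
$section_four = wpaw_build_section_context( array( 'heading' => 'Examples' ), 4, $context );
```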

### `/refine-block` and `/refine-from-chat`

Required behavior:

- Send active block content, neighboring heading/section context, relevant plan entry, and latest instruction.
- Include compact working summary and decisions.
- Do not include the full draft unless the requested operation is article-wide.

### `/summarize-context`

Current behavior:

- Summarizes browser-provided `chatHistory`.
- Returns a summary, but does not appear to persist it as the authoritative record.

Required change:

- Accept `sessionId`.
- Load authoritative session messages.
- Save the resulting summary into `wpaw_conversations.context.working_summary`.
- Return `summary`, `message_count`, `source_message_count`, `tokens_saved`, and provider metadata.
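
The response fields above are not defined further in this spec, so the following is only one plausible interpretation: `message_count` is the total number of stored messages, `source_message_count` is how many of the oldest messages the summary covers, and `tokens_saved` is an estimate of the covered messages' size minus the summary's size. The 4-characters-per-token estimate is a rough heuristic, not a tokenizer, and both function names are hypothetical. Provider metadata is left out.

```php
<?php
/**
 * Rough token estimate. Assumes about 4 characters per token,
 * which varies by model and language.
 */
function wpaw_estimate_tokens( string $text ): int {
    return (int) ceil( strlen( $text ) / 4 );
}

/**
 * Hypothetical helper: assemble the /summarize-context response body.
 *
 * @param array  $all_messages         Stored session messages, oldest first.
 * @param int    $source_message_count Number of oldest messages covered by the summary.
 * @param string $summary              Summary text returned by the model.
 * @return array Response fields (provider metadata omitted in this sketch).
 */
function wpaw_build_summary_response( array $all_messages, int $source_message_count, string $summary ): array {
    $summarized      = array_slice( $all_messages, 0, max( 0, $source_message_count ) );
    $original_tokens = 0;
    foreach ( $summarized as $message ) {
        $original_tokens += wpaw_estimate_tokens( (string) ( $message['content'] ?? '' ) );
    }

    return array(
        'summary'              => $summary,
        'message_count'        => count( $all_messages ),
        'source_message_count' => count( $summarized ),
        'tokens_saved'         => max( 0, $original_tokens - wpaw_estimate_tokens( $summary ) ),
    );
}

// Example: two 40-character messages summarized into 20 characters.
$response = wpaw_build_summary_response(
    array(
        array( 'role' => 'user', 'content' => str_repeat( 'a', 40 ) ),
        array( 'role' => 'assistant', 'content' => str_repeat( 'b', 40 ) ),
        array( 'role' => 'user', 'content' => 'short me' ),
    ),
    2,
    str_repeat( 's', 20 )
);
```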

## Streaming Transport

OpenRouter streaming is already implemented in `WP_Agentic_Writer_OpenRouter_Provider::chat_stream()`.

Keep this transport shape:

```php
$body = array(
    'model'    => $model,
    'messages' => $messages,
    'stream'   => true,
);
```

Modernize usage handling:

- OpenRouter now returns full usage metadata automatically.
- `usage: { include: true }` and `stream_options: { include_usage: true }` are documented as deprecated and no longer required.
- Keep parsing the final `usage` object from streamed chunks.
- Extend cost tracking to store cache metadata when available.

Recommended emitted SSE events:

```json
{"type":"provider","provider":"openrouter","model":"openai/gpt-4o-mini","byok_expected":true}
{"type":"conversational_stream","content":"partial accumulated text"}
{"type":"usage","input_tokens":1200,"output_tokens":360,"cached_tokens":0,"cost":0.0012}
{"type":"complete","session_id":"abc123","totalCost":0.0012}
```

Use the existing browser parsing path in `assets/js/sidebar.js` and add support for the optional `provider` and `usage` event types.
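
On the server side, "keep parsing the final `usage` object from streamed chunks" might look like the line-oriented sketch below. It assumes the upstream stream uses the OpenAI-compatible SSE shape (`data: {json}` lines, `choices[0].delta.content` deltas, a terminal `data: [DONE]`); the function name `wpaw_consume_sse_line()` and the `$state` array are hypothetical and are not taken from the existing `chat_stream()` implementation.

```php
<?php
/**
 * Hypothetical helper: fold one SSE line from the upstream stream into a running state.
 *
 * @param string $line  One complete line from the response body.
 * @param array  $state Running state with 'text' and 'usage' keys.
 * @return array Updated state.
 */
function wpaw_consume_sse_line( string $line, array $state ): array {
    $line = trim( $line );

    // Skip blank separators, SSE comments (':' prefix), and non-data fields.
    if ( '' === $line || ':' === $line[0] || 0 !== strpos( $line, 'data:' ) ) {
        return $state;
    }

    $payload = trim( substr( $line, 5 ) );
    if ( '[DONE]' === $payload ) {
        return $state;
    }

    $chunk = json_decode( $payload, true );
    if ( ! is_array( $chunk ) ) {
        return $state; // Malformed JSON is ignored in this sketch.
    }

    $state['text'] .= (string) ( $chunk['choices'][0]['delta']['content'] ?? '' );

    // The usage object arrives on the final chunk; keep the latest one seen.
    if ( isset( $chunk['usage'] ) && is_array( $chunk['usage'] ) ) {
        $state['usage'] = $chunk['usage'];
    }

    return $state;
}

// Example: a short simulated stream.
$state = array( 'text' => '', 'usage' => null );
$lines = array(
    ': OPENROUTER PROCESSING',
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"choices":[{"delta":{"content":" world"}}]}',
    'data: {"choices":[],"usage":{"prompt_tokens":1200,"completion_tokens":360}}',
    'data: [DONE]',
);
foreach ( $lines as $line ) {
    $state = wpaw_consume_sse_line( $line, $state );
}
```

A real cURL write callback receives arbitrary byte chunks, so the caller would still need to buffer input and split it into complete lines before calling a function like this.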

## Response Caching Policy

OpenRouter response caching should be used for deterministic, duplicate-safe operations only. It is not article memory.

Recommended use:

- `detect_intent`
- `summarize_context` retry
- connection test
- repeated model capability lookups if routed through completion calls

Avoid by default:

- article draft generation
- outline revision
- refinement requests
- image prompt generation

Provider implementation change:

```php
if ( ! empty( $options['openrouter_response_cache'] ) ) {
    $headers[] = 'X-OpenRouter-Cache: true';
    $headers[] = 'X-OpenRouter-Cache-TTL: ' . (int) ( $options['openrouter_cache_ttl'] ?? 300 );
}
```

Important limitations:

- Cache hits only happen for identical requests.
- Streaming and non-streaming requests are cached separately.
- Cache hit usage counters are zeroed.
- Response caching is beta and requires OpenRouter to store response data temporarily.
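
To keep caching limited to the "Recommended use" list, the header logic could be gated by action. The sketch below is an assumption-laden illustration: `wpaw_cache_headers_for_action()` and the action slugs (`detect_intent`, `summarize_context`, `test_connection`) are hypothetical, while the two header names are the ones used in the provider snippet above.

```php
<?php
/**
 * Hypothetical helper: return cache headers only for duplicate-safe actions.
 *
 * @param string $action  Action slug for the current request (assumed naming).
 * @param array  $options Plugin options with the cache toggle and TTL.
 * @return array List of raw header lines, empty when caching should not apply.
 */
function wpaw_cache_headers_for_action( string $action, array $options ): array {
    // Assumed allow-list mirroring the "Recommended use" list.
    $cacheable = array( 'detect_intent', 'summarize_context', 'test_connection' );

    if ( empty( $options['openrouter_response_cache'] ) || ! in_array( $action, $cacheable, true ) ) {
        return array();
    }

    // Guard against zero or negative TTL values from misconfigured options.
    $ttl = max( 1, (int) ( $options['openrouter_cache_ttl'] ?? 300 ) );

    return array(
        'X-OpenRouter-Cache: true',
        'X-OpenRouter-Cache-TTL: ' . $ttl,
    );
}

// Example: caching enabled globally, but only applied to an allow-listed action.
$options         = array( 'openrouter_response_cache' => true );
$intent_headers  = wpaw_cache_headers_for_action( 'detect_intent', $options );
$writing_headers = wpaw_cache_headers_for_action( 'execute_article', $options );
```

An allow-list keeps new actions uncached by default, which matches the "avoid by default" stance for drafting and refinement.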

## Usage and Budget Tracking

Extend `wpaw_cost_tracking` with optional cache and upstream fields:

```sql
ALTER TABLE {$wpdb->prefix}wpaw_cost_tracking
    ADD COLUMN cached_tokens int(11) DEFAULT 0 AFTER output_tokens,
    ADD COLUMN cache_write_tokens int(11) DEFAULT 0 AFTER cached_tokens,
    ADD COLUMN upstream_inference_cost decimal(10,6) DEFAULT NULL AFTER cost,
    ADD COLUMN generation_id varchar(64) DEFAULT '' AFTER status;
```

Implementation notes:

- Put this behind a schema version bump, not plugin version alone.
- Keep existing `maybe_upgrade_table()` pattern in `WP_Agentic_Writer_Cost_Tracker`.
- Parse `usage.prompt_tokens_details.cached_tokens`.
- Parse `usage.prompt_tokens_details.cache_write_tokens`.
- Parse `usage.cost_details.upstream_inference_cost` for BYOK requests.
- Include a monthly token budget view alongside the existing cost view.

Budget metric examples:

```php
// $rows is assumed to hold the current month's cost-tracking rows.
$billable_input_tokens     = max( 0, $input_tokens - $cached_tokens );
$total_monthly_tokens      = array_sum( array_column( $rows, 'input_tokens' ) ) + array_sum( array_column( $rows, 'output_tokens' ) );
$byok_free_request_counter = count( array_filter( $rows, fn( $r ) => 'openrouter' === $r['provider'] && 'success' === $r['status'] ) );
```

Note: OpenRouter documents the BYOK waiver as first 1M BYOK requests per month, not first 1M tokens. Keep UI wording precise.
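
The parsing notes in this section can be combined into a single normalization step before a row is written. The sketch below maps the nested `usage` fields named above onto the proposed columns; `wpaw_normalize_usage()` is a hypothetical name, and the top-level `prompt_tokens`, `completion_tokens`, and `cost` keys are assumed to be present in OpenRouter's usage object.

```php
<?php
/**
 * Hypothetical helper: flatten an OpenRouter usage object into cost-tracking fields.
 *
 * @param array $usage Decoded 'usage' object from a response or final stream chunk.
 * @return array Values keyed by the (proposed) wpaw_cost_tracking column names.
 */
function wpaw_normalize_usage( array $usage ): array {
    $input    = (int) ( $usage['prompt_tokens'] ?? 0 );
    $cached   = (int) ( $usage['prompt_tokens_details']['cached_tokens'] ?? 0 );
    $upstream = $usage['cost_details']['upstream_inference_cost'] ?? null;

    return array(
        'input_tokens'            => $input,
        'output_tokens'           => (int) ( $usage['completion_tokens'] ?? 0 ),
        'cached_tokens'           => $cached,
        'cache_write_tokens'      => (int) ( $usage['prompt_tokens_details']['cache_write_tokens'] ?? 0 ),
        'billable_input_tokens'   => max( 0, $input - $cached ),
        'cost'                    => (float) ( $usage['cost'] ?? 0 ),
        // Stays null when absent so it maps to the column's DEFAULT NULL.
        'upstream_inference_cost' => null === $upstream ? null : (float) $upstream,
    );
}

// Example: a BYOK-style usage object with partial cache data.
$row = wpaw_normalize_usage(
    array(
        'prompt_tokens'         => 1200,
        'completion_tokens'     => 360,
        'cost'                  => 0.0012,
        'prompt_tokens_details' => array( 'cached_tokens' => 200 ),
        'cost_details'          => array( 'upstream_inference_cost' => 0.001 ),
    )
);
```

Missing fields fall back to zero or `null`, so responses from providers that omit cache details still produce a valid row.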

## Settings UI Changes

Update Settings V2:

- Rename default cloud path to `OpenRouter BYOK / API`.
- Keep API key storage in `wp_agentic_writer_settings.openrouter_api_key`.
- Add a help panel explaining that provider BYOK keys are configured in OpenRouter, not in WordPress.
- Add a "Prevent shared fallback" checklist item that links users to OpenRouter BYOK provider settings.
- Move Local Backend to an `Advanced` or `Legacy Local Backend` section.
- Make provider routing default all text tasks to `openrouter`.
- Keep image task on `openrouter`.
- Show a trust note: WordPress streams directly to OpenRouter; no local shell or CLI process is required.

Do not collect provider keys directly in WordPress unless there is a deliberate product decision to bypass OpenRouter BYOK management. The safer default is only storing the OpenRouter API key.

## Migration Plan

### Phase 1: Documentation and defaults

- Add this spec.
- Update user-facing Local Backend docs to say local backend is optional/advanced.
- Default new installs to OpenRouter for all tasks.
- Keep existing installs unchanged unless the user opts in.

### Phase 2: Context builder

- Add `includes/class-context-builder.php`.
- Load it from `wp-agentic-writer.php`.
- Move repeated context assembly out of `class-gutenberg-sidebar.php`.
- Make `/chat`, `/generate-plan`, `/revise-plan`, and refinement endpoints use the builder.

### Phase 3: Authoritative summaries

- Extend `WP_Agentic_Writer_Context_Service` with:
  - `get_session_context( $session_id )`
  - `update_session_context( $session_id, $patch )`
  - `summarize_session_if_needed( $session_id, $post_id )`
- Make `/summarize-context` persist summaries to `wpaw_conversations.context`.
- Store plan versions and section summaries in context.
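
The spec does not define how `update_session_context( $session_id, $patch )` should merge a patch, so the sketch below picks one plausible rule set and labels it as an assumption: list values (such as `decisions` or `research_notes`) are appended, keyed objects (such as `working_summary`) are merged key by key, and anything else is replaced. `wpaw_apply_context_patch()` is a hypothetical pure helper that such a method could call after loading the stored context.

```php
<?php
/**
 * Hypothetical helper: apply a patch to a decoded session context.
 *
 * Assumed merge rules: lists append, keyed arrays merge, scalars replace.
 *
 * @param array $context Current decoded context.
 * @param array $patch   Partial context with the changes to apply.
 * @return array Patched context.
 */
function wpaw_apply_context_patch( array $context, array $patch ): array {
    foreach ( $patch as $key => $value ) {
        $existing_is_array = isset( $context[ $key ] ) && is_array( $context[ $key ] );
        $value_is_list     = is_array( $value ) && array_values( $value ) === $value;

        if ( $value_is_list && $existing_is_array ) {
            // Lists such as decisions or research_notes are appended to.
            $context[ $key ] = array_merge( $context[ $key ], $value );
        } elseif ( is_array( $value ) && $existing_is_array ) {
            // Keyed objects such as working_summary are merged key by key.
            $context[ $key ] = array_replace( $context[ $key ], $value );
        } else {
            $context[ $key ] = $value;
        }
    }

    return $context;
}

// Example: refresh the summary text and record one more decision.
$patched = wpaw_apply_context_patch(
    array(
        'working_summary' => array( 'text' => 'Old summary', 'source_message_count' => 4 ),
        'decisions'       => array( array( 'target' => 'outline.section.2' ) ),
    ),
    array(
        'working_summary' => array( 'text' => 'New summary' ),
        'decisions'       => array( array( 'target' => 'outline.section.3' ) ),
        'plan_versions'   => array(),
    )
);
```

Appending lists means a caller cannot remove entries through a patch; if deletion is needed, a separate explicit operation would be clearer than overloading the merge.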

### Phase 4: Streaming and usage polish

- Remove deprecated OpenRouter usage request parameters.
- Emit optional `provider` and `usage` SSE events.
- Extend cost tracking schema for cached tokens and BYOK upstream cost.
- Add UI display for monthly token usage.

### Phase 5: Local backend repositioning

- Move local backend downloads and setup UI to advanced/legacy.
- Keep `WP_Agentic_Writer_Local_Backend_Provider` for existing users.
- Disable automatic local backend recommendation in onboarding.

## Acceptance Criteria

- A new article can be planned and written through OpenRouter streaming without any local bash/proxy setup.
- Existing conversation history persists through `wpaw_conversations`.
- Plan generation no longer sends full browser `chatHistory` when `sessionId` is available.
- Refining a block includes active block, relevant plan, compact decisions, and recent messages, not full raw history.
- Streaming responses show partial text in the editor and finish with usage metadata.
- Cost tracking records provider, model, action, session, tokens, and cost as it does today.
- New cache fields are recorded when OpenRouter returns them.
- Local Backend still works for users who already configured it, but it is no longer the default recommendation.

## Implementation Risks

- Some existing frontend flows rely on `messages` as the full source of truth. Those flows need to pass `sessionId` reliably before backend context can become authoritative.
- `wpaw_conversations.context` is `LONGTEXT`, so it can hold rich JSON, but large contexts should still be summarized to keep admin queries fast.
- OpenRouter response caching is beta and should not be presented as durable memory.
- BYOK provider fallback behavior is configured in OpenRouter, so the WordPress UI can guide and detect symptoms but cannot fully enforce provider-key policy from this plugin alone.