Sliding-window context management
Subtitle: A technical whitepaper on bounded live context, preserved transcript history, and recall-backed continuity
Status: Technical whitepaper draft
Date: 2026-06-21
Author: Clint Bodungen
Project context: MindStone Agent, MindStone-Agent, MS4CC, MS4PI
Companion to: Layered Continuity Architecture for Persistent LLM Agents
Executive summary
Long-running LLM agents eventually hit context pressure. Conversation history grows, tool results accumulate, project state expands, and the model’s prompt window remains finite. A system has to decide what stays in the live prompt and what leaves.
The common answer is compaction: summarize older context and continue from the summary. Compaction can be useful, especially when the underlying harness controls the session and gives the agent no better option. But compaction is lossy. It can flatten uncertainty, omit causality, lose source detail, and allow a generated summary to start acting like history.
MindStone’s preferred answer, when it owns the session and prompt assembly path, is sliding-window context management.
A sliding window treats the live prompt as a bounded working set, not as the historical record. Older entries may leave the active prompt when context pressure rises, but the authoritative transcript remains append-only and complete. Removed material can be indexed, recalled, audited, replayed, or later consolidated into durable memory.
The core principle is simple:
```text
The transcript is history.
The prompt is a working set.
```

Sliding-window context management is one layer in MindStone’s broader Layered Continuity Architecture. It does not replace memory, recall, handoff, or consolidation. It makes those systems safer by preventing prompt overflow without pretending that pruned context has been deleted from the agent’s operational history.
1. Problem statement
LLM agents are stateless at the model level. Each inference is produced from the prompt supplied at that moment. If important context is not in the prompt, the model does not have direct access to it.
For short sessions, the solution is straightforward: include recent conversation. For long-running agents, this breaks down.
A persistent agent may need to preserve:
- standing identity and operating rules;
- user or organization preferences;
- recent conversation;
- tool calls and tool results;
- project files and source excerpts;
- retrieved memory;
- unresolved decisions;
- active task state;
- prior mistakes and corrections;
- safety constraints;
- handoff or recovery context.
All of that competes for the same prompt window.
When the prompt grows too large, the system needs a context-management policy. Without one, the agent eventually fails with context overflow or relies on ad hoc truncation.
The design question is not whether context must be removed from the prompt. It must. The question is what kind of removal preserves continuity.
2. Design principle: prompt removal is not historical deletion
MindStone separates two concepts that are often collapsed:
- Authoritative transcript — the durable record of what happened.
- Live prompt window — the bounded context sent to the model for the next response.
The transcript should be append-only. It records user messages, assistant messages, tool activity, system events, source metadata, checkpoint markers, and other continuity-relevant events.
The live prompt is temporary. It is assembled from selected pieces of the transcript, standing context, structured memory, recall results, source excerpts, and task state.
Sliding-window pruning only changes the live prompt.
It must not:
- delete transcript entries;
- rewrite transcript history;
- silently replace history with a summary;
- split continuity by channel;
- promote a pruning artifact into durable memory;
- pretend old context never happened.
In operational terms:
```text
Pruned from prompt does not mean forgotten.
```

3. What sliding-window context management does
Sliding-window context management keeps the model’s active prompt below a configured context budget by removing older eligible prompt entries when utilization crosses a threshold.
At a high level:
1. Build the candidate prompt.
2. Estimate context utilization.
3. If utilization is below the ceiling, do nothing.
4. If utilization crosses the ceiling, prune older eligible entries.
5. Continue pruning until utilization reaches the target floor.
6. Preserve transcript history.
7. Record what was pruned.
8. Let recall bring older material back when relevant.

The window “slides” because newer context remains active while older eligible context falls off the back of the live prompt.
The mechanism is intentionally simple. It does not try to interpret every message semantically at pruning time. It does not summarize. It does not destructively classify tool results. It applies a bounded working-set policy over an authoritative history.
4. Why not just compact?
Compaction and sliding-window pruning both respond to context pressure, but they preserve continuity differently.
4.1 Compaction
Compaction compresses older context into a summary:
```text
large prompt history
→ model-generated summary
→ smaller prompt containing summary + recent context
```

This is useful when:
- the harness controls context and forces compaction;
- the agent cannot directly manage prompt assembly;
- an emergency continuity bridge is needed;
- a compact handoff is required before a context reset.
But compaction introduces summary risk.
A compaction summary can:
- omit user corrections;
- lose source details;
- flatten unresolved uncertainty;
- compress tool results too aggressively;
- misstate causal order;
- overemphasize what seemed important at summary time;
- underrepresent facts needed later;
- accumulate drift across repeated compactions.
The problem is not that summaries are useless. The problem is treating a summary as if it were the record.
4.2 Sliding window
Sliding-window pruning does not summarize older context. It removes older eligible entries from the prompt while preserving the transcript and recall substrate.
```text
append-only transcript remains intact
→ live prompt exceeds budget
→ oldest eligible prompt units leave the active window
→ pruned material remains preserved and recallable
```

This avoids summary drift at the pruning boundary.
The tradeoff is that older context is no longer visible by default. The system must rely on recall, structured memory, source reads, or consolidation to bring back what matters.
That is why sliding-window context management is not a complete continuity system by itself. It must be paired with preserved transcripts and recall.
4.3 Practical comparison
| Capability | Compaction | Sliding window |
|---|---|---|
| Reduces live prompt size | Yes | Yes |
| Preserves full transcript | Only if implemented separately | Yes, by design |
| Replaces old context with model summary | Yes | No |
| Risk of summary drift | High | Low at pruning boundary |
| Requires good recall | Helpful | Essential |
| Best fit | Substrate fallback, emergency continuity, handoff bridge | Primary long-running context policy when prompt assembly is owned |
| Failure mode | Summary becomes false history | Relevant old context is not recalled |
MindStone uses both patterns, but for different roles.
Sliding window is the preferred primary context-management strategy when MindStone owns the session and prompt assembly.
Compaction is a fallback or substrate-adapter strategy when the underlying harness requires it.
5. Core architecture
A MindStone-style sliding-window system has six main parts.
5.1 Authoritative transcript
The transcript is the source of truth. It is append-only and retained outside the prompt window.
It should include enough metadata to support replay, recall, audit, and consolidation:
- entry id;
- session key;
- timestamp;
- role;
- source/channel;
- content or event type;
- metadata.

Prompt pruning must never remove entries from this transcript.
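As an illustration of that metadata, the sketch below models a transcript entry and an append-only log in TypeScript. The field names, the TranscriptLog class, and the freeze-on-append strategy are assumptions made for this example, not MindStone-Agent’s actual schema.

```typescript
// Hypothetical transcript entry shape mirroring the metadata listed above.
type Role = "system" | "user" | "assistant" | "tool" | "event";

interface TranscriptEntry {
  readonly id: string;
  readonly sessionKey: string;
  readonly timestamp: string; // ISO-8601
  readonly role: Role;
  readonly channel: string;   // source surface, e.g. "cli" or "webchat"
  readonly kind: string;      // content or event type
  readonly content: string;
  readonly metadata: Readonly<Record<string, unknown>>;
}

// Append-only log: entries can be added and read, never edited or removed.
class TranscriptLog {
  private readonly entries: TranscriptEntry[] = [];

  append(entry: TranscriptEntry): void {
    if (this.entries.some((e) => e.id === entry.id)) {
      throw new Error(`duplicate transcript id: ${entry.id}`);
    }
    // Store a frozen copy so later code cannot mutate recorded history in place.
    this.entries.push(Object.freeze({ ...entry, metadata: Object.freeze({ ...entry.metadata }) }));
  }

  // Readers get a copy of the list; prompt building works on this, not on the log.
  all(): readonly TranscriptEntry[] {
    return this.entries.slice();
  }

  get length(): number {
    return this.entries.length;
  }
}
```

Because the log exposes no update or delete method, a pruning step that only receives all() cannot rewrite history.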
5.2 Prompt entries
The prompt builder converts transcript entries and other context sources into prompt entries.
Prompt entries may include:
- system instructions;
- identity/user context;
- recent user messages;
- recent assistant messages;
- tool messages;
- retrieved memory;
- handoff context;
- source excerpts.

Not all transcript events become prompt entries. Some events are durable metadata only.
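The conversion can be sketched as a filter that maps transcript entries to prompt entries and skips anything kept only as metadata. The entry shapes and the set of metadata-only kinds (checkpoint, context_window_pruned, route_planned) are hypothetical, chosen for illustration.

```typescript
// Minimal stand-in for a transcript entry; field names are illustrative.
interface TranscriptEntry {
  id: string;
  role: "system" | "user" | "assistant" | "tool" | "event";
  kind: string;
  content: string;
}

interface PromptEntry {
  sourceId: string; // points back to the transcript entry it was built from
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Hypothetical kinds that are recorded in the transcript but never sent to the model.
const METADATA_ONLY_KINDS = new Set(["checkpoint", "context_window_pruned", "route_planned"]);

function toPromptEntries(transcript: readonly TranscriptEntry[]): PromptEntry[] {
  const out: PromptEntry[] = [];
  for (const entry of transcript) {
    if (entry.role === "event") continue;                // system events stay in history only
    if (METADATA_ONLY_KINDS.has(entry.kind)) continue;   // so do metadata-only kinds
    out.push({ sourceId: entry.id, role: entry.role, content: entry.content });
  }
  return out;
}
```

Keeping sourceId on every prompt entry is what later lets a pruning event name exactly which transcript entries left the window.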
5.3 Budgeting
The prompt builder estimates token usage before sending a request to the model.
A typical utilization calculation is:
```text
utilization = estimatedPromptTokens / maxContextTokens
```

The model context window may come from:
- explicit route configuration;
- provider model metadata;
- model registry defaults;
- a conservative fallback.
If exact tokenization is unavailable, a conservative approximation can be used. MindStone-Agent currently relies on conservative estimation; not every path uses provider-exact token counting.
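The sketch below resolves maxContextTokens in the order listed above and computes utilization with a deliberately conservative estimate. The WindowSources shape, the roughly three-characters-per-token ratio, and the 8,192-token fallback are assumptions for illustration, not MindStone-Agent defaults.

```typescript
interface WindowSources {
  routeConfig?: number;      // explicit route configuration
  providerMetadata?: number; // provider model metadata
  registryDefault?: number;  // model registry default
}

// Assumed conservative fallback when no source knows the window size.
const FALLBACK_CONTEXT_TOKENS = 8_192;

function resolveMaxContextTokens(src: WindowSources): number {
  // The first source that is present and positive wins, in priority order.
  for (const v of [src.routeConfig, src.providerMetadata, src.registryDefault]) {
    if (v !== undefined && v > 0) return v;
  }
  return FALLBACK_CONTEXT_TOKENS;
}

// Assumed heuristic: about 3 characters per token overcounts typical English
// text, so it errs toward pruning early rather than overflowing.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 3);
}

function utilization(promptTexts: readonly string[], maxContextTokens: number): number {
  const estimated = promptTexts.reduce((sum, t) => sum + estimateTokens(t), 0);
  return estimated / maxContextTokens;
}
```

Where a provider tokenizer is available, it would replace estimateTokens; the resolution order and the ratio stay the same.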
5.4 Protection rules
Some prompt entries should not be pruned.
Usually protected:
- system instructions;
- identity and user context;
- current in-flight user message;
- assistant/tool messages paired with the current turn;
- active task constraints;
- safety-critical instructions;
- minimum recent message floor.
Usually eligible:
- older ordinary conversation;
- older resolved tool results;
- stale intermediate discussion;
- context already preserved in transcript and recall index.
Protection rules should be structural and predictable. A sliding-window policy should not rely on the model improvising what is safe to drop.
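A sketch of structural protection, assuming each prompt entry carries a role, an in-flight flag, and an optional pin for constraints such as active task or safety instructions. The field names and exact rules are illustrative; the point is that protection follows fixed rules rather than model judgment.

```typescript
interface PromptEntry {
  id: string;
  role: "system" | "user" | "assistant" | "tool";
  inFlight: boolean; // part of the current turn
  pinned?: boolean;  // e.g. active task constraints or safety-critical instructions
}

// Returns ids that must not be pruned. Entries are ordered oldest to newest.
function protectedIds(entries: readonly PromptEntry[], minRecentMessages: number): Set<string> {
  const keep = new Set<string>();
  for (const e of entries) {
    if (e.role === "system" || e.inFlight || e.pinned) keep.add(e.id);
  }
  // Minimum recent-message floor: the newest N non-system messages are always kept.
  const messages = entries.filter((e) => e.role !== "system");
  for (const e of messages.slice(Math.max(0, messages.length - minRecentMessages))) {
    keep.add(e.id);
  }
  return keep;
}

function eligibleForPruning(entries: readonly PromptEntry[], minRecentMessages: number): PromptEntry[] {
  const keep = protectedIds(entries, minRecentMessages);
  return entries.filter((e) => !keep.has(e.id));
}
```

Identity and user context arrive as system entries in this sketch, so the role rule covers them.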
5.5 Pruning unit
The pruning unit should preserve local coherence.
MindStone proper treats a pruning unit as a conversation exchange:
```text
one user message
+ assistant/tool messages that follow it
+ stop before the next user message
```

This prevents the system from deleting a user question while keeping the answer, or deleting a tool result while keeping the assistant message that depends on it.
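A minimal sketch of exchange grouping, assuming system entries have already been set aside by the protection rules and the remaining messages arrive oldest first. Messages that precede the first user message are kept together as one leading unit; that edge-case handling is this example’s own choice.

```typescript
type Role = "user" | "assistant" | "tool";

interface ChatMessage {
  id: string;
  role: Role;
}

// Split messages into exchanges: one user message plus every assistant/tool
// message after it, stopping before the next user message.
function groupIntoExchanges(messages: readonly ChatMessage[]): ChatMessage[][] {
  const units: ChatMessage[][] = [];
  let current: ChatMessage[] = [];
  for (const m of messages) {
    if (m.role === "user" && current.length > 0) {
      units.push(current); // close the previous exchange
      current = [];
    }
    current.push(m);
  }
  if (current.length > 0) units.push(current);
  return units;
}
```

Because pruning then removes whole units, a tool result never leaves the window without the assistant turn that requested it.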
MindStone-Agent’s current implementation groups prompt entries approximately by turn, with protected system entries and a minimum recent-message floor. Tool-call/tool-result grouping is expected to become more important as more live tool transcripts flow through the system.
5.6 Recall bridge
A sliding window needs a recall bridge. Otherwise, it is just truncation.
When older material leaves the prompt, it should remain available through:
- structured memory;
- transcript indexing;
- vector or lexical search;
- source-aware recall;
- manual source reads;
- checkpoint/consolidation review.
MindStone proper’s strongest form is vectorize-before-prune:
```text
Before an exchange is removed from the live window, index/vectorize it.
If indexing fails, do not prune that exchange.
```

That rule makes pruning safety-first. The system does not remove what it cannot preserve in a recallable form.
6. Configuration model
A practical sliding-window configuration needs at least:
```json
{
  "contextManagement": {
    "mode": "sliding_window",
    "ceilingPercent": 92,
    "floorPercent": 70,
    "minRecentMessages": 24,
    "preserveTranscript": true
  }
}
```

mode

Selects the context-management strategy.
MindStone-Agent supports:
- sliding_window;
- auto_compact.

The preferred default for MindStone-Agent is sliding_window.
ceilingPercent
The utilization level where pruning begins.
Example:
92% of model context window

floorPercent
The target utilization after pruning.
Example:
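70% of model context window

The gap between ceiling and floor acts as hysteresis: after a pruning pass, the prompt can grow by that difference before pruning triggers again, which prevents constant micro-pruning.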
minRecentMessages
A recent-message floor. Even if the target floor would prune more aggressively, the system retains at least this many recent prompt messages, unless that is impossible (for example, when fewer messages exist).
preserveTranscript
For normal MindStone behavior, this should remain true.
If transcript preservation is disabled, the system no longer has MindStone-style continuity guarantees.
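A sketch of how these settings might be typed, validated, and converted into token thresholds. The defaults mirror the example configuration; the validation rules (such as requiring floorPercent below ceilingPercent) and the warnings list are assumptions of this sketch.

```typescript
type ContextMode = "sliding_window" | "auto_compact";

interface ContextManagementConfig {
  mode: ContextMode;
  ceilingPercent: number;
  floorPercent: number;
  minRecentMessages: number;
  preserveTranscript: boolean;
}

// Defaults copied from the example configuration above.
const DEFAULTS: ContextManagementConfig = {
  mode: "sliding_window",
  ceilingPercent: 92,
  floorPercent: 70,
  minRecentMessages: 24,
  preserveTranscript: true,
};

// Merge user settings over the defaults; reject inconsistent values, warn on risky ones.
function loadConfig(partial: Partial<ContextManagementConfig>): { config: ContextManagementConfig; warnings: string[] } {
  const config = { ...DEFAULTS, ...partial };
  const warnings: string[] = [];
  if (!(config.floorPercent > 0 && config.floorPercent < config.ceilingPercent && config.ceilingPercent <= 100)) {
    throw new Error("expected 0 < floorPercent < ceilingPercent <= 100");
  }
  if (!Number.isInteger(config.minRecentMessages) || config.minRecentMessages < 0) {
    throw new Error("minRecentMessages must be a non-negative integer");
  }
  if (!config.preserveTranscript) {
    warnings.push("preserveTranscript is false: pruning is now plain prompt trimming");
  }
  return { config, warnings };
}

// Percentages become absolute token thresholds for one model window.
function thresholds(config: ContextManagementConfig, maxContextTokens: number): { pruneAt: number; pruneTo: number } {
  return {
    pruneAt: Math.floor((maxContextTokens * config.ceilingPercent) / 100),
    pruneTo: Math.floor((maxContextTokens * config.floorPercent) / 100),
  };
}
```

With a 128,000-token window, the defaults start pruning at 117,760 estimated tokens and prune down to 89,600, leaving 28,160 tokens of growth before the next pass.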
7. Reference flow
A typical MindStone-style sliding-window route looks like this:
1. Receive user input.
2. Append input to canonical transcript.
3. Load standing identity/user context.
4. Run Auto Recall or prepare recall budget.
5. Build candidate prompt entries.
6. Estimate prompt utilization.
7. If utilization exceeds ceiling:
   a. identify oldest eligible pruning units;
   b. preserve/index if required;
   c. remove units from live prompt;
   d. continue until floor target or eligibility limit.
8. Record context_window_pruned event if pruning occurred.
9. Send bounded prompt to model/provider.
10. Append assistant output and events to transcript.

A simplified algorithm:
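```typescript
// Simplified sketch; names are illustrative. Protection (system entries, the
// in-flight turn, the minRecentMessages floor) is precomputed into `pinned`,
// and estimation, indexing, and event logging are injected as dependencies.
interface PruningUnit { ids: string[]; pinned: boolean } // ids: transcript entry ids
interface WindowConfig { ceilingPercent: number; floorPercent: number; vectorizeBeforePrune: boolean }
interface WindowDeps {
  estimateTokens: (live: readonly PruningUnit[]) => number;
  vectorize: (unit: PruningUnit) => boolean; // true once the unit is indexed
  appendTranscriptEvent: (type: string, data: object) => void;
}

function buildPromptWindow(units: readonly PruningUnit[], config: WindowConfig,
                           maxContextTokens: number, deps: WindowDeps): PruningUnit[] {
  const live = [...units]; // oldest first; a working copy, never the transcript
  const usage = () => deps.estimateTokens(live) / maxContextTokens;
  if (usage() < config.ceilingPercent / 100) return live; // config values are percents

  const pruned: PruningUnit[] = [];
  const unindexable = new Set<PruningUnit>();
  while (usage() > config.floorPercent / 100) {
    const unit = live.find((u) => !u.pinned && !unindexable.has(u)); // oldest eligible
    if (unit === undefined) break; // eligibility limit reached
    if (config.vectorizeBeforePrune && !deps.vectorize(unit)) {
      unindexable.add(unit); // vectorize-before-prune: keep what could not be indexed
      continue;
    }
    live.splice(live.indexOf(unit), 1);
    pruned.push(unit);
  }

  if (pruned.length > 0) { // record the event only when pruning occurred
    deps.appendTranscriptEvent("context_window_pruned", {
      prunedIds: pruned.flatMap((u) => u.ids),
      keptIds: live.flatMap((u) => u.ids),
      usageAfter: usage(),
    });
  }
  return live;
}
```

Real implementations also reserve budget for provider overhead, tool schemas, recall injection, handoff replay, and model-specific formatting.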
8. Integration with MindStone
Sliding-window context management fits into MindStone’s Layered Continuity Architecture as the live-context layer.
It interacts with the other layers rather than replacing them.
8.1 Identity and standing context
MindStone agents use identity and user context as standing orientation. In current MindStone-Agent work, configured IDENTITY.md and USER.md are injected into routed calls as system context.
This context is budgeted before transcript windowing and recall selection.
Design guidance:
```text
Keep standing context thin.
Do not use permanent prompt stuffing as a substitute for memory.
```

8.2 Canonical transcript
MindStone-Agent uses a canonical session shape:
```text
agent:<agentId>:<mainKey>
```

Default:

```text
agent:default:main
```

The early alias `mindstone` canonicalizes to `agent:default:main` for compatibility.
This is important because sliding-window pruning should operate over one shared continuity substrate. CLI, TUI, Gateway, WebChat, OpenAI-compatible chat, and OpenResponses-style surfaces should not create disconnected histories by default.
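A sketch of that canonicalization. Only the mindstone alias and the agent:default:main default come from the text; treating an empty key as the default and rejecting keys that do not match agent:<agentId>:<mainKey> are assumptions of this example.

```typescript
const DEFAULT_SESSION_KEY = "agent:default:main";
const LEGACY_ALIASES: ReadonlyMap<string, string> = new Map([["mindstone", DEFAULT_SESSION_KEY]]);
const SESSION_KEY_SHAPE = /^agent:[^:\s]+:[^:\s]+$/;

// Normalize a requested session key: unset keys and legacy aliases resolve to
// the shared default; well-formed keys pass through; anything else is rejected.
function canonicalSessionKey(raw?: string): string {
  const key = (raw ?? "").trim();
  if (key === "") return DEFAULT_SESSION_KEY;
  const aliased = LEGACY_ALIASES.get(key);
  if (aliased !== undefined) return aliased;
  if (SESSION_KEY_SHAPE.test(key)) return key;
  throw new Error(`not a canonical session key: ${raw}`);
}
```

If every surface resolves its key through one function like this before loading history, CLI and WebChat requests without an explicit key land in the same transcript.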
8.3 Auto Recall
Auto Recall is the companion mechanism that prevents sliding-window pruning from becoming amnesia.
When older material leaves the prompt, it can return if it becomes relevant. MindStone recall can use structured memory files, LOG entries, journals, transcript chunks, and source metadata. In the broader resonance-weighted model, candidates are ranked by more than text similarity.
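As a rough sketch of the recall side, the function below takes already-scored candidates (how they are scored, including any resonance weighting, is outside this example), skips those whose source is still in the live window, and fills a fixed token budget greedily. The RecallCandidate shape and the greedy strategy are assumptions. The selected text would be injected for the current turn only, not appended to the transcript as new history.

```typescript
interface RecallCandidate {
  sourceId: string; // transcript entry or memory file the text came from
  text: string;
  score: number;    // produced by the recall ranker; higher is more relevant
  tokens: number;   // estimated size
}

function selectRecall(
  candidates: readonly RecallCandidate[],
  liveSourceIds: ReadonlySet<string>,
  budgetTokens: number,
): RecallCandidate[] {
  const chosen: RecallCandidate[] = [];
  let used = 0;
  // Highest score first; skip anything the live window already shows.
  const ranked = candidates
    .filter((c) => !liveSourceIds.has(c.sourceId))
    .sort((a, b) => b.score - a.score);
  for (const c of ranked) {
    if (used + c.tokens > budgetTokens) continue; // too big; try the next candidate
    chosen.push(c);
    used += c.tokens;
  }
  return chosen;
}
```

Returning the candidates rather than bare text keeps sourceId attached, so injected recall stays traceable to where it came from.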
The relationship is:
```text
Sliding window decides what stays active by default.
Recall decides what returns when relevant.
```

8.4 Structured memory
Sliding-window pruning does not decide what becomes durable memory.
Durable memory is governed separately through checkpoint/consolidation flows. A pruned exchange may eventually produce a memory update, but pruning itself is not a memory-writing event.
This prevents a dangerous shortcut:
```text
old prompt context disappeared → therefore summarize it into memory
```

MindStone treats durable memory as curated, source-aware, and approval-controlled where appropriate.
8.5 Handoff and compaction
Handoff and compaction still matter.
In constrained substrates such as MS4PI or MS4CC, the agent may not own the live prompt window. The practical flow may be:
```text
checkpoint → rich handoff → substrate compaction → handoff replay → archive/backfill
```

MindStone-Agent keeps auto_compact as a secondary/fallback mode for those cases.
But when MindStone-Agent owns routing and prompt assembly, sliding-window context management is preferred.
8.6 Consolidation cycle
The consolidation cycle, sometimes informally called a dream cycle, is where accumulated operational history is reviewed and integrated.
Sliding-window pruning keeps the prompt healthy. Consolidation keeps durable continuity healthy.
The consolidation cycle can:
- review transcript history;
- identify durable decisions and lessons;
- update structured memory;
- compact bloated memory sources;
- rebuild or maintain indexes;
- verify recall health;
- create handoffs for future agent states.
Sliding windows reduce emergency pressure, but they do not eliminate the need for consolidation.
9. Implementation status in MindStone-Agent
Current MindStone-Agent context-management work includes:
- selectable `contextManagement.mode`;
- `sliding_window` as the preferred default;
- `auto_compact` as a secondary/fallback mode;
- `buildPromptWindow()`;
- ceiling/floor pruning behavior;
- `minRecentMessages` floor;
- protected system entries;
- transcript preservation;
- `context_window_pruned` transcript events;
- canonical shared session default `agent:default:main`;
- identity/user context injection;
- route-planned events exposing prompt-window decisions;
- ephemeral Auto Recall context injection;
- Pi-session inline context-pruning factory derived from `contextManagement.mode = "sliding_window"`.
Verified non-live smoke coverage has included:
```shell
npm run smoke:context-window
npm run smoke:sliding-window
npm run smoke:unified-session
```

Important current caveats:
- Token estimation is conservative, not provider-tokenizer-perfect in all paths.
- Tool-call/tool-result grouping needs richer handling as real tool transcripts mature.
- Native sqlite-vec nearest-neighbor search is not active locally yet.
- Live authenticated Pi-session prompt/stream validation remains an MVP proof gate.
- Live authenticated compaction validation remains separate and should not be claimed complete until tested.
10. Failure modes and safeguards
10.1 Sliding window without recall becomes truncation
If pruned material cannot be found again, the agent can lose access to important history.
Safeguards:
- preserve transcript;
- index transcript chunks;
- run Auto Recall before inference;
- expose recall diagnostics;
- allow deliberate source reads;
- checkpoint important decisions into structured memory.
10.2 Bad pruning boundaries distort meaning
If pruning splits related messages, the live prompt can become misleading.
Example failure:
```text
assistant message references a tool result
but the tool result was pruned
```

Safeguards:
- prune coherent exchanges or turn groups;
- protect in-flight turns;
- keep tool-call/tool-result groups together;
- record pruned IDs for audit.
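One way to check these safeguards is to audit the final window for broken tool pairs. The sketch assumes assistant messages list the tool-call ids they issued and tool messages carry the id they answer, similar to common chat-completion message formats; the field names are illustrative. A call still waiting for its result in the current turn would also be flagged, so a real check would exclude the in-flight turn.

```typescript
interface WindowMessage {
  id: string;
  role: "user" | "assistant" | "tool";
  toolCallIds?: string[]; // on assistant messages: calls it issued
  toolCallId?: string;    // on tool messages: the call this result answers
}

// Returns ids of messages whose tool partner is missing from the live window.
function findBrokenToolPairs(window: readonly WindowMessage[]): string[] {
  const issued = new Set<string>();
  const answered = new Set<string>();
  for (const m of window) {
    if (m.role === "assistant") m.toolCallIds?.forEach((id) => issued.add(id));
    if (m.role === "tool" && m.toolCallId) answered.add(m.toolCallId);
  }
  const broken: string[] = [];
  for (const m of window) {
    const callWithoutResult = m.role === "assistant" && (m.toolCallIds ?? []).some((id) => !answered.has(id));
    const resultWithoutCall = m.role === "tool" && m.toolCallId !== undefined && !issued.has(m.toolCallId);
    if (callWithoutResult || resultWithoutCall) broken.push(m.id);
  }
  return broken;
}
```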
10.3 Over-pinning prevents pruning
If too much context is protected, the system cannot create headroom.
Safeguards:
- keep standing context small;
- avoid pinning large memory bodies;
- reserve budget explicitly;
- report protected-context pressure;
- use recall instead of permanent prompt stuffing.
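To report protected-context pressure, a prompt builder can compare the size of everything it refuses to prune with the floor it is trying to reach. The sketch below is illustrative; the PressureReport fields are hypothetical.

```typescript
interface PressureReport {
  protectedTokens: number;
  floorTokens: number;
  protectedShareOfWindow: number; // 0..1
  floorReachable: boolean;        // false means pinning alone keeps usage above the floor
}

function protectedPressure(
  protectedTokenCounts: readonly number[],
  maxContextTokens: number,
  floorPercent: number,
): PressureReport {
  const protectedTokens = protectedTokenCounts.reduce((a, b) => a + b, 0);
  const floorTokens = Math.floor((maxContextTokens * floorPercent) / 100);
  return {
    protectedTokens,
    floorTokens,
    protectedShareOfWindow: protectedTokens / maxContextTokens,
    floorReachable: protectedTokens <= floorTokens,
  };
}
```

A report with floorReachable set to false is a signal to thin standing context or move pinned material into recall, not to raise the floor.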
10.4 Token estimation errors cause overflow
If the estimator undercounts, the prompt may still exceed provider limits.
Safeguards:
- conservative estimates;
- model-specific context caps;
- provider tokenizers where available;
- lower ceilings for uncertain providers;
- reserved overhead for tool schemas and response buffers.
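A sketch of explicit reservation: the usable prompt budget is the model window minus reserved overhead, shrunk further when token counts come from an approximation. The reserve categories and the 10% safety margin are assumptions chosen for illustration.

```typescript
interface Reserve {
  toolSchemaTokens: number;
  responseTokens: number;         // headroom for the model's reply
  providerOverheadTokens: number; // message framing, special tokens, and similar
}

// Usable prompt budget = window minus explicit reserves, reduced again when
// the estimate is approximate rather than from the provider's tokenizer.
function usablePromptBudget(maxContextTokens: number, reserve: Reserve, tokenizerIsExact: boolean): number {
  const reserved = reserve.toolSchemaTokens + reserve.responseTokens + reserve.providerOverheadTokens;
  const remaining = maxContextTokens - reserved;
  if (remaining <= 0) throw new Error("reserves exceed the model context window");
  const safetyPercent = tokenizerIsExact ? 100 : 90; // assumed 10% margin for estimates
  return Math.floor((remaining * safetyPercent) / 100);
}
```

The ceiling and floor percentages would then apply to this usable budget rather than to the raw window.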
10.5 Transcript mutation breaks continuity
The most serious failure is calling something a sliding window while deleting or rewriting history.
Safeguard:
```text
The authoritative transcript must remain append-only.
```

If history is destroyed, the system no longer has MindStone-style continuity. It has prompt trimming.
11. Operational guidance
Use sliding window when
- the agent owns prompt assembly;
- the transcript is preserved independently;
- recall is available;
- long-running continuity matters;
- summary drift would be risky;
- multiple surfaces need one shared continuity substrate.
Use compaction or handoff when
- the harness controls compaction;
- the agent cannot manage the live prompt directly;
- a context reset is unavoidable;
- a future agent state needs a compact operational bridge;
- emergency continuity is more important than perfect context texture.
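The two lists above can be reduced to a small decision helper. The input flags are hypothetical restatements of the listed conditions, and falling back to compaction when recall or transcript preservation is missing is this sketch’s own choice.

```typescript
type ContextMode = "sliding_window" | "auto_compact";

interface SessionCapabilities {
  ownsPromptAssembly: boolean;
  transcriptPreserved: boolean;
  recallAvailable: boolean;
  harnessForcesCompaction: boolean;
}

function chooseContextMode(c: SessionCapabilities): ContextMode {
  // The harness compacts on its own, or the agent cannot shape the prompt.
  if (c.harnessForcesCompaction || !c.ownsPromptAssembly) return "auto_compact";
  // Sliding window is only safe when pruned material stays preserved and recallable.
  if (c.transcriptPreserved && c.recallAvailable) return "sliding_window";
  return "auto_compact";
}
```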
Do not use sliding window as an excuse to skip memory
Sliding-window pruning answers:
```text
What can leave the live prompt now?
```

It does not answer:

```text
What should the agent remember permanently?
```

That is the job of structured memory and consolidation.
Do not use compaction summaries as authoritative history
A compaction summary is a recovery aid. It is not the transcript.
If a summary conflicts with the transcript, the transcript wins.
12. Evaluation checklist
A sliding-window implementation should be tested against at least these questions.
Context pressure
- Does pruning trigger at the configured ceiling?
- Does it prune toward the configured floor?
- Does it avoid constant micro-pruning?
- Does it leave enough response headroom?
Transcript preservation
- Are transcript entries unchanged after pruning?
- Are pruned prompt entries still present in durable history?
- Can the transcript be re-indexed after pruning?
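A test for these questions can snapshot the transcript before pruning and check afterwards that the old entries survive as an unchanged prefix and that every pruned id is still in history. The record shape and helper names are stand-ins for whatever the real builder exposes.

```typescript
interface TranscriptRecord {
  id: string;
  kind: string;
  content: string;
}

interface TranscriptSnapshot {
  json: string;  // serialized copy, immune to later in-place edits
  count: number;
}

function takeSnapshot(transcript: readonly TranscriptRecord[]): TranscriptSnapshot {
  return { json: JSON.stringify(transcript), count: transcript.length };
}

function checkPreservation(
  snap: TranscriptSnapshot,
  after: readonly TranscriptRecord[],
  prunedIds: readonly string[], // e.g. from a context_window_pruned event
): { prefixIntact: boolean; missingPrunedIds: string[] } {
  // Append-only: the earlier transcript must survive as an unchanged prefix.
  // Pruning may append a context_window_pruned event but must not edit or delete.
  const prefixIntact =
    after.length >= snap.count && JSON.stringify(after.slice(0, snap.count)) === snap.json;
  // Everything that left the prompt must still be present in durable history.
  const historyIds = new Set(after.map((e) => e.id));
  const missingPrunedIds = prunedIds.filter((id) => !historyIds.has(id));
  return { prefixIntact, missingPrunedIds };
}
```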
Recall recovery
- Can an important early correction leave the prompt and later return through recall?
- Does recall preserve source pointers?
- Are recall injections bounded and auditable?
Tool coherence
- Are in-flight tool results protected?
- Are older tool results pruned only with their surrounding exchange?
- Are assistant messages never left referring to tool context that is missing from the window?
Multi-surface continuity
- Do CLI, TUI, Gateway, WebChat, and API calls converge on one canonical transcript?
- Does prompt pruning avoid channel-specific history forks?
Compaction comparison
Run the same long session under compaction and sliding-window modes.
Compare:
- factual retention;
- source traceability;
- user correction retention;
- hallucinated continuity;
- ability to explain why the agent believes something.
13. Conclusion
Sliding-window context management is a practical answer to a basic constraint: the model’s prompt is finite, but a persistent agent’s operational history is not.
MindStone’s approach is to stop treating the prompt as the memory system. The prompt is only the current working set. The transcript is the historical record. Structured memory is curated. Recall is selective. Handoff is a continuity bridge. Consolidation is where experience becomes durable.
Sliding-window pruning keeps the live prompt bounded without converting old context into a lossy summary. It works because older material is still preserved, indexed, recallable, and available for later review.
That is the key difference from compaction-first designs:
```text
Compaction compresses history into a summary.
Sliding window preserves history and bounds the working set.
```

For persistent agents, that distinction matters. It lets the system manage context pressure without historical amnesia.
Short version
Sliding-window context management is MindStone’s preferred live-context policy when it owns the session and prompt assembly.
It works like this:
```text
keep the transcript append-only
build a bounded live prompt
prune oldest eligible prompt entries under pressure
preserve and index what leaves
use recall to bring back what matters
use consolidation to turn durable experience into memory
```

It is not a memory system by itself. It is the live-context layer that lets memory, recall, handoff, and consolidation work without confusing the temporary prompt with the agent’s actual history.