Sliding-window context management

Watch video

Subtitle: A technical whitepaper on bounded live context, preserved transcript history, and recall-backed continuity

Status: Technical whitepaper draft
Date: 2026-06-21
Author: Clint Bodungen
Project context: MindStone Agent, MindStone-Agent, MS4CC, MS4PI
Companion to: Layered Continuity Architecture for Persistent LLM Agents

Executive summary

Long-running LLM agents eventually hit context pressure. Conversation history grows, tool results accumulate, project state expands, and the model’s prompt window remains finite. A system has to decide what stays in the live prompt and what leaves.

The common answer is compaction: summarize older context and continue from the summary. Compaction can be useful, especially when the underlying harness controls the session and gives the agent no better option. But compaction is lossy. It can flatten uncertainty, omit causality, lose source detail, and allow a generated summary to start acting like history.

MindStone’s preferred answer, when it owns the session and prompt assembly path, is sliding-window context management.

A sliding window treats the live prompt as a bounded working set, not as the historical record. Older entries may leave the active prompt when context pressure rises, but the authoritative transcript remains append-only and complete. Removed material can be indexed, recalled, audited, replayed, or later consolidated into durable memory.

The core principle is simple:

The transcript is history.
The prompt is a working set.

Sliding-window context management is one layer in MindStone’s broader Layered Continuity Architecture. It does not replace memory, recall, handoff, or consolidation. It makes those systems safer by preventing prompt overflow without pretending that pruned context has been deleted from the agent’s operational history.

1. Problem statement

LLM agents are stateless at the model level. Each inference is produced from the prompt supplied at that moment. If important context is not in the prompt, the model does not have direct access to it.

For short sessions, the solution is straightforward: include recent conversation. For long-running agents, this breaks down.

A persistent agent may need to preserve:

standing identity and operating rules;
user or organization preferences;
recent conversation;
tool calls and tool results;
project files and source excerpts;
retrieved memory;
unresolved decisions;
active task state;
prior mistakes and corrections;
safety constraints;
handoff or recovery context.

All of that competes for the same prompt window.

When the prompt grows too large, the system needs a context-management policy. Without one, the agent eventually fails with context overflow or relies on ad hoc truncation.

The design question is not whether context must be removed from the prompt. It must. The question is what kind of removal preserves continuity.

2. Design principle: prompt removal is not historical deletion

MindStone separates two concepts that are often collapsed:

Authoritative transcript — the durable record of what happened.
Live prompt window — the bounded context sent to the model for the next response.

The transcript should be append-only. It records user messages, assistant messages, tool activity, system events, source metadata, checkpoint markers, and other continuity-relevant events.

The live prompt is temporary. It is assembled from selected pieces of the transcript, standing context, structured memory, recall results, source excerpts, and task state.

Sliding-window pruning only changes the live prompt.

It must not:

delete transcript entries;
rewrite transcript history;
silently replace history with a summary;
split continuity by channel;
promote a pruning artifact into durable memory;
pretend old context never happened.

In operational terms:

Pruned from prompt does not mean forgotten.

3. What sliding-window context management does

Sliding-window context management keeps the model’s active prompt below a configured context budget by removing older eligible prompt entries when utilization crosses a threshold.

At a high level:

1. Build the candidate prompt.
2. Estimate context utilization.
3. If utilization is below the ceiling, do nothing.
4. If utilization crosses the ceiling, prune older eligible entries.
5. Continue pruning until utilization reaches the target floor.
6. Preserve transcript history.
7. Record what was pruned.
8. Let recall bring older material back when relevant.

The window “slides” because newer context remains active while older eligible context falls off the back of the live prompt.

The mechanism is intentionally simple. It does not try to interpret every message semantically at pruning time. It does not summarize. It does not destructively classify tool results. It applies a bounded working-set policy over an authoritative history.

4. Why not just compact?

Compaction and sliding-window pruning both respond to context pressure, but they preserve continuity differently.

4.1 Compaction

Compaction compresses older context into a summary:

large prompt history
→ model-generated summary
→ smaller prompt containing summary + recent context

This is useful when:

the harness controls context and forces compaction;
the agent cannot directly manage prompt assembly;
an emergency continuity bridge is needed;
a compact handoff is required before a context reset.

But compaction introduces summary risk.

A compaction summary can:

omit user corrections;
lose source details;
flatten unresolved uncertainty;
compress tool results too aggressively;
misstate causal order;
overemphasize what seemed important at summary time;
underrepresent facts needed later;
accumulate drift across repeated compactions.

The problem is not that summaries are useless. The problem is treating a summary as if it were the record.

4.2 Sliding window

Sliding-window pruning does not summarize older context. It removes older eligible entries from the prompt while preserving the transcript and recall substrate.

append-only transcript remains intact
→ live prompt exceeds budget
→ oldest eligible prompt units leave the active window
→ pruned material remains preserved and recallable

This avoids summary drift at the pruning boundary.

The tradeoff is that older context is no longer visible by default. The system must rely on recall, structured memory, source reads, or consolidation to bring back what matters.

That is why sliding-window context management is not a complete continuity system by itself. It must be paired with preserved transcripts and recall.

4.3 Practical comparison

Capability	Compaction	Sliding window
Reduces live prompt size	Yes	Yes
Preserves full transcript	Only if implemented separately	Yes, by design
Replaces old context with model summary	Yes	No
Risk of summary drift	High	Low at pruning boundary
Requires good recall	Helpful	Essential
Best fit	Substrate fallback, emergency continuity, handoff bridge	Primary long-running context policy when prompt assembly is owned
Failure mode	Summary becomes false history	Relevant old context is not recalled

MindStone uses both patterns, but for different roles.

Sliding window is the preferred primary context-management strategy when MindStone owns the session and prompt assembly.

Compaction is a fallback or substrate-adapter strategy when the underlying harness requires it.

5. Core architecture

A MindStone-style sliding-window system has six main parts.

5.1 Authoritative transcript

The transcript is the source of truth. It is append-only and retained outside the prompt window.

It should include enough metadata to support replay, recall, audit, and consolidation:

entry id
session key
timestamp
role
source/channel
content or event type
metadata

Prompt pruning must never remove entries from this transcript.

5.2 Prompt entries

The prompt builder converts transcript entries and other context sources into prompt entries.

Prompt entries may include:

system instructions
identity/user context
recent user messages
recent assistant messages
tool messages
retrieved memory
handoff context
source excerpts

Not all transcript events become prompt entries. Some events are durable metadata only.

5.3 Budgeting

The prompt builder estimates token usage before sending a request to the model.

A typical utilization calculation is:

utilization = estimatedPromptTokens / maxContextTokens

The model context window may come from:

explicit route configuration;
provider model metadata;
model registry defaults;
a conservative fallback.

If exact tokenization is unavailable, a conservative approximation can be used. MindStone-Agent currently uses conservative estimation rather than provider-perfect token counting in all paths.

5.4 Protection rules

Some prompt entries should not be pruned.

Usually protected:

system instructions;
identity and user context;
current in-flight user message;
assistant/tool messages paired with the current turn;
active task constraints;
safety-critical instructions;
minimum recent message floor.

Usually eligible:

older ordinary conversation;
older resolved tool results;
stale intermediate discussion;
context already preserved in transcript and recall index.

Protection rules should be structural and predictable. A sliding-window policy should not rely on the model improvising what is safe to drop.

5.5 Pruning unit

The pruning unit should preserve local coherence.

MindStone proper treats a pruning unit as a conversation exchange:

one user message
+ assistant/tool messages that follow it
+ stop before the next user message

This prevents the system from deleting a user question while keeping the answer, or deleting a tool result while keeping the assistant message that depends on it.

MindStone-Agent’s current implementation uses prompt entries and turn-ish grouping, with protected system entries and a minimum recent-message floor. Tool-call/tool-result grouping is expected to become more important as more live tool transcripts flow through the system.

5.6 Recall bridge

A sliding window needs a recall bridge. Otherwise, it is just truncation.

When older material leaves the prompt, it should remain available through:

structured memory;
transcript indexing;
vector or lexical search;
source-aware recall;
manual source reads;
checkpoint/consolidation review.

MindStone proper’s strongest form is vectorize-before-prune:

Before an exchange is removed from the live window, index/vectorize it.
If indexing fails, do not prune that exchange.

That rule makes pruning safety-first. The system does not remove what it cannot preserve in a recallable form.

6. Configuration model

A practical sliding-window configuration needs at least:

{
  "contextManagement": {
    "mode": "sliding_window",
    "ceilingPercent": 92,
    "floorPercent": 70,
    "minRecentMessages": 24,
    "preserveTranscript": true
  }
}

`mode`

Selects the context-management strategy.

MindStone-Agent supports:

sliding_window
auto_compact

The preferred default for MindStone-Agent is sliding_window.

`ceilingPercent`

The utilization level where pruning begins.

Example:

92% of model context window

`floorPercent`

The target utilization after pruning.

Example:

70% of model context window

The gap between ceiling and floor prevents constant micro-pruning.

`minRecentMessages`

A recent-message floor. Even if the target floor would prune more aggressively, the system retains at least this many recent prompt messages unless impossible.

`preserveTranscript`

For normal MindStone behavior, this should remain true.

If transcript preservation is disabled, the system no longer has MindStone-style continuity guarantees.

7. Reference flow

A typical MindStone-style sliding-window route looks like this:

1. Receive user input.
2. Append input to canonical transcript.
3. Load standing identity/user context.
4. Run Auto Recall or prepare recall budget.
5. Build candidate prompt entries.
6. Estimate prompt utilization.
7. If utilization exceeds ceiling:
   a. identify oldest eligible pruning units;
   b. preserve/index if required;
   c. remove units from live prompt;
   d. continue until floor target or eligibility limit.
8. Record context_window_pruned event if pruning occurred.
9. Send bounded prompt to model/provider.
10. Append assistant output and events to transcript.

A simplified algorithm:

function buildPromptWindow(entries, config, maxContextTokens):
    pinned = selectProtected(entries)
    eligible = selectEligible(entries)
    window = pinned + eligible

    usage = estimate(window) / maxContextTokens

    if usage < config.ceilingPercent:
        return window

    pruned = []

    while usage > config.floorPercent:
        unit = oldestEligibleUnit(eligible)
        if unit is null:
            break

        if config.vectorizeBeforePrune:
            if not vectorize(unit):
                markIneligible(unit)
                continue

        remove unit from window
        pruned.append(unit)
        usage = estimate(window) / maxContextTokens

    appendTranscriptEvent("context_window_pruned", {
        prunedIds: ids(pruned),
        keptIds: ids(window),
        usageAfter: usage
    })

    return window

Real implementations also reserve budget for provider overhead, tool schemas, recall injection, handoff replay, and model-specific formatting.

8. Integration with MindStone

Sliding-window context management fits into MindStone’s Layered Continuity Architecture as the live-context layer.

It interacts with the other layers rather than replacing them.

8.1 Identity and standing context

MindStone agents use identity and user context as standing orientation. In current MindStone-Agent work, configured IDENTITY.md and USER.md are injected into routed calls as system context.

This context is budgeted before transcript windowing and recall selection.

Design guidance:

Keep standing context thin.
Do not use permanent prompt stuffing as a substitute for memory.

8.2 Canonical transcript

MindStone-Agent uses a canonical session shape:

agent:<agentId>:<mainKey>

Default:

agent:default:main

The early alias:

mindstone

canonicalizes to agent:default:main for compatibility.

This is important because sliding-window pruning should operate over one shared continuity substrate. CLI, TUI, Gateway, WebChat, OpenAI-compatible chat, and OpenResponses-style surfaces should not create disconnected histories by default.

8.3 Auto Recall

Auto Recall is the companion mechanism that prevents sliding-window pruning from becoming amnesia.

When older material leaves the prompt, it can return if it becomes relevant. MindStone recall can use structured memory files, LOG entries, journals, transcript chunks, and source metadata. Candidate material is ranked by more than text similarity in the broader resonance-weighted model.

The relationship is:

Sliding window decides what stays active by default.
Recall decides what returns when relevant.

8.4 Structured memory

Sliding-window pruning does not decide what becomes durable memory.

Durable memory is governed separately through checkpoint/consolidation flows. A pruned exchange may eventually produce a memory update, but pruning itself is not a memory-writing event.

This prevents a dangerous shortcut:

old prompt context disappeared → therefore summarize it into memory

MindStone treats durable memory as curated, source-aware, and approval-controlled where appropriate.

8.5 Handoff and compaction

Handoff and compaction still matter.

In constrained substrates such as MS4PI or MS4CC, the agent may not own the live prompt window. The practical flow may be:

checkpoint → rich handoff → substrate compaction → handoff replay → archive/backfill

MindStone-Agent keeps auto_compact as a secondary/fallback mode for those cases.

But when MindStone-Agent owns routing and prompt assembly, sliding-window context management is preferred.

8.6 Consolidation cycle

The consolidation cycle, sometimes informally called a dream cycle, is where accumulated operational history is reviewed and integrated.

Sliding-window pruning keeps the prompt healthy. Consolidation keeps durable continuity healthy.

The consolidation cycle can:

review transcript history;
identify durable decisions and lessons;
update structured memory;
compact bloated memory sources;
rebuild or maintain indexes;
verify recall health;
create handoffs for future agent states.

Sliding windows reduce emergency pressure, but they do not eliminate the need for consolidation.

9. Implementation status in MindStone-Agent

Current MindStone-Agent context-management work includes:

selectable contextManagement.mode;
sliding_window as the preferred default;
auto_compact as a secondary/fallback mode;
buildPromptWindow();
ceiling/floor pruning behavior;
minRecentMessages floor;
protected system entries;
transcript preservation;
context_window_pruned transcript events;
canonical shared session default agent:default:main;
identity/user context injection;
route-planned events exposing prompt-window decisions;
ephemeral Auto Recall context injection;
Pi-session inline context-pruning factory derived from contextManagement.mode = "sliding_window".

Verified non-live smoke coverage has included:

npm run smoke:context-window
npm run smoke:sliding-window
npm run smoke:unified-session

Important current caveats:

Token estimation is conservative, not provider-tokenizer-perfect in all paths.
Tool-call/tool-result grouping needs richer handling as real tool transcripts mature.
Native sqlite-vec nearest-neighbor search is not active locally yet.
Live authenticated Pi-session prompt/stream validation remains an MVP proof gate.
Live authenticated compaction validation remains separate and should not be claimed complete until tested.

10. Failure modes and safeguards

10.1 Sliding window without recall becomes truncation

If pruned material cannot be found again, the agent can lose access to important history.

Safeguards:

preserve transcript;
index transcript chunks;
run Auto Recall before inference;
expose recall diagnostics;
allow deliberate source reads;
checkpoint important decisions into structured memory.

10.2 Bad pruning boundaries distort meaning

If pruning splits related messages, the live prompt can become misleading.

Example failure:

assistant message references a tool result
but the tool result was pruned

Safeguards:

prune coherent exchanges or turn groups;
protect in-flight turns;
keep tool-call/tool-result groups together;
record pruned IDs for audit.

10.3 Over-pinning prevents pruning

If too much context is protected, the system cannot create headroom.

Safeguards:

keep standing context small;
avoid pinning large memory bodies;
reserve budget explicitly;
report protected-context pressure;
use recall instead of permanent prompt stuffing.

10.4 Token estimation errors cause overflow

If the estimator undercounts, the prompt may still exceed provider limits.

Safeguards:

conservative estimates;
model-specific context caps;
provider tokenizers where available;
lower ceilings for uncertain providers;
reserved overhead for tool schemas and response buffers.

10.5 Transcript mutation breaks continuity

The most serious failure is calling something a sliding window while deleting or rewriting history.

Safeguard:

The authoritative transcript must remain append-only.

If history is destroyed, the system no longer has MindStone-style continuity. It has prompt trimming.

11. Operational guidance

Use sliding window when

the agent owns prompt assembly;
the transcript is preserved independently;
recall is available;
long-running continuity matters;
summary drift would be risky;
multiple surfaces need one shared continuity substrate.

Use compaction or handoff when

the harness controls compaction;
the agent cannot manage the live prompt directly;
a context reset is unavoidable;
a future agent state needs a compact operational bridge;
emergency continuity is more important than perfect context texture.

Do not use sliding window as an excuse to skip memory

Sliding-window pruning answers:

What can leave the live prompt now?

It does not answer:

What should the agent remember permanently?

That is the job of structured memory and consolidation.

Do not use compaction summaries as authoritative history

A compaction summary is a recovery aid. It is not the transcript.

If a summary conflicts with the transcript, the transcript wins.

12. Evaluation checklist

A sliding-window implementation should be tested against at least these questions.

Context pressure

Does pruning trigger at the configured ceiling?
Does it prune toward the configured floor?
Does it avoid constant micro-pruning?
Does it leave enough response headroom?

Transcript preservation

Are transcript entries unchanged after pruning?
Are pruned prompt entries still present in durable history?
Can the transcript be re-indexed after pruning?

Recall recovery

Can an important early correction leave the prompt and later return through recall?
Does recall preserve source pointers?
Are recall injections bounded and auditable?

Tool coherence

Are in-flight tool results protected?
Are older tool results pruned only with their surrounding exchange?
Are assistant messages not left referring to missing local tool context?

Multi-surface continuity

Do CLI, TUI, Gateway, WebChat, and API calls converge on one canonical transcript?
Does prompt pruning avoid channel-specific history forks?

Compaction comparison

Run the same long session under compaction and sliding-window modes.

Compare:

factual retention;
source traceability;
user correction retention;
hallucinated continuity;
ability to explain why the agent believes something.

13. Conclusion

Sliding-window context management is a practical answer to a basic constraint: the model’s prompt is finite, but a persistent agent’s operational history is not.

MindStone’s approach is to stop treating the prompt as the memory system. The prompt is only the current working set. The transcript is the historical record. Structured memory is curated. Recall is selective. Handoff is a continuity bridge. Consolidation is where experience becomes durable.

Sliding-window pruning keeps the live prompt bounded without converting old context into a lossy summary. It works because older material is still preserved, indexed, recallable, and available for later review.

That is the key difference from compaction-first designs:

Compaction compresses history into a summary.
Sliding window preserves history and bounds the working set.

For persistent agents, that distinction matters. It lets the system manage context pressure without historical amnesia.

Short version

Sliding-window context management is MindStone’s preferred live-context policy when it owns the session and prompt assembly.

It works like this:

keep the transcript append-only
build a bounded live prompt
prune oldest eligible prompt entries under pressure
preserve and index what leaves
use recall to bring back what matters
use consolidation to turn durable experience into memory

It is not a memory system by itself. It is the live-context layer that lets memory, recall, handoff, and consolidation work without confusing the temporary prompt with the agent’s actual history.