Checkpointing And Recovery

Checkpoints give your application a durable "safe point" inside a partition's log.

In Kommander, a checkpoint is replicated through Raft just like a normal write. Once it is committed, the WAL can eventually compact entries older than that checkpoint.

This guide explains when to write checkpoints, what they do during restore, and how they relate to compaction.

What A Checkpoint Means

Think of a checkpoint as a marker that says:

"Everything before this point has been applied, and the partition can recover starting from here."

That does not mean Kommander serializes your domain state for you. Your application still owns its state machine and restore logic.

What Kommander provides is:

replicated checkpoint entries
restore that starts from the latest committed checkpoint boundary in the WAL
automatic compaction eligibility for older log history.

When To Write A Checkpoint

Good times to write a checkpoint:

after a meaningful batch of committed work
after rebuilding or refreshing a derived local snapshot
after a workflow phase where replaying older history is no longer useful
before expecting a partition to accumulate a large amount of additional traffic.

Less useful patterns:

writing a checkpoint after every single command
never writing checkpoints at all
treating checkpoints as a replacement for application restore logic.

If you never write checkpoints, compaction has little or nothing to reclaim.

Basic Flow

The usual sequence is:

replicate normal application entries
apply them through your state machine
periodically replicate a checkpoint
let automatic compaction remove older WAL history over time.

Example:

RaftReplicationResult write = await raft.ReplicateLogs(
    partitionId: 2,
    type: "OrderPlaced",
    data: payload,
    cancellationToken: cancellationToken
);

if (write.Status != RaftOperationStatus.Success)
    return;

RaftReplicationResult checkpoint = await raft.ReplicateCheckpoint(
    partitionId: 2,
    cancellationToken: cancellationToken
);

ReplicateCheckpoint uses the same quorum path as regular replication. That means it still needs the partition leader and follower acknowledgements to commit.

What Happens During Restore

On restore, Kommander replays the WAL from the latest committed checkpoint boundary forward.

From the application's point of view, the important implication is simple:

newer checkpoints reduce how much history may need to be replayed
older history may disappear after compaction
your restore code must still be correct from the retained checkpoint boundary onward.

If you need deterministic rebuild behavior, keep your restore path compatible with starting from the newest retained checkpoint and replaying the remaining committed entries.

Compaction Relationship

Automatic compaction is checkpoint-driven.

The main settings are:

CompactEveryOperations
CompactNumberEntries
MaxEntriesPerCompaction

When compaction runs, Kommander:

finds the last committed checkpoint for the partition
removes entries older than that checkpoint in batches
stops when there is no more eligible work or the pass limit is reached.

This means checkpoints influence how much old WAL can be removed, while the compaction settings influence how fast that removal happens.

A Practical Strategy

For most applications, start with this approach:

write normal commands freely
add checkpoints at stable milestones rather than every write
observe WAL growth and restore time
increase checkpoint frequency only if replay or storage growth becomes a problem.

Examples of stable milestones:

every few hundred or few thousand applied operations
after closing an accounting period
after finishing a tenant import
after completing a durable workflow stage.

What Your Application Still Owns

Kommander does not automatically create a business snapshot file or serialize your in-memory domain objects.

Your application still decides:

what state is reconstructed during OnLogRestored
whether you maintain your own local snapshot representation
when a checkpoint is meaningful for your domain
whether restore time or WAL growth is acceptable.

What A Checkpoint Means​

When To Write A Checkpoint​

Basic Flow​

What Happens During Restore​

Compaction Relationship​

A Practical Strategy​

What Your Application Still Owns​

Related Reading​