Why Kahuna?
Many applications combine a fast key/value cache with a database. The cache makes reads fast, while the database remains the durable source of truth. It is natural to reuse that combination for locks, leader election, shared configuration, counters, and ID generation.
That works while failures are simple. The difficult cases begin when a server fails, the network splits, a request times out, or a worker pauses long enough to lose ownership without knowing it.
Kahuna is built for those cases. It provides a distributed key/value store, locks with fencing tokens, and retry-safe sequences through one replicated cluster.
A Cache and a Coordinator Solve Different Problems
A cache answers:
Where can I quickly store and retrieve this value?
A coordination system answers:
Which node is allowed to act, what is the accepted value, and what happens when nodes disagree?
The APIs can look similar. Both may have GET, SET, expiration, and atomic counters. The guarantees during failure are what make them different.
| Requirement | A fast single-node store may be enough | Kahuna is the safer fit |
|---|---|---|
| Rebuildable cache | Yes | Usually unnecessary |
| Temporary session data | Often | When lost or conflicting state is unacceptable |
| Shared configuration | Only if brief inconsistency is acceptable | When every service must observe an ordered value |
| One worker running a job | Not with a basic expiring key alone | Use a leased lock with a fencing token |
| Unique ordered IDs | A counter works until retries become ambiguous | Use an idempotent distributed sequence |
| Several keys changing together | Separate commands can expose partial state | Use a distributed transaction |
| Surviving a node failure | Depends on replication and failover semantics | Persistent writes are committed by a quorum |
One System Instead of a Cache and Database
The usual cache-plus-database pattern makes the application coordinate two copies of the same value:
- Write the durable value to the database.
- Update or invalidate the cached value.
- Handle failures between those two operations.
- Decide what to do when the cache and database disagree.
This creates familiar problems: stale reads, cache invalidation bugs, dual-write failures, cold-cache latency, and several application instances rebuilding the same missing value at once.
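The dual-write failure can be reproduced with a few lines of Python. This is a toy model only: `FlakyCache`, `update`, `read`, and the `plan:42` key are hypothetical names invented for the illustration, and plain dictionaries stand in for the real cache and database.

```python
class FlakyCache(dict):
    """A dictionary-backed cache whose next invalidation can be made to fail."""

    fail_next = False

    def invalidate(self, key):
        if self.fail_next:
            self.fail_next = False
            raise ConnectionError("cache unreachable")
        self.pop(key, None)


database = {"plan:42": "basic"}
cache = FlakyCache({"plan:42": "basic"})


def update(key, value):
    database[key] = value   # step 1: the durable write succeeds
    cache.invalidate(key)   # step 2: may fail, leaving the old value cached


def read(key):
    # Cache-aside read: prefer the cache, fall back to the database.
    if key in cache:
        return cache[key]
    cache[key] = database[key]
    return cache[key]


cache.fail_next = True
try:
    update("plan:42", "premium")
except ConnectionError:
    pass  # the application must now decide how to repair the cache

stale_value = read("plan:42")
print(stale_value, database["plan:42"])
```

The database holds the new value while readers keep receiving the old one from the cache. Nothing in either store reports the disagreement, so the repair logic has to live in the application.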
For coordination and small shared state, Kahuna combines both roles:
- Hot values stay in memory for fast access.
- Persistent values are replicated through Raft and materialized to RocksDB or SQLite.
- Evicted persistent values can be loaded from storage again on the next read.
- One write path updates the accepted cluster state, so application code does not maintain a separate cache copy.
- Ephemeral values use the same API when the data is disposable and does not need replication or restart durability.
This means configuration, service metadata, reservations, rate-limit state, locks, and sequences often do not need a separate cache in front of a separate database. Kahuna provides the in-memory working set and durable storage as one coordinated service.
Kahuna does not replace an application's primary business database. Relational queries, document storage, analytics, large records, and bulk data still belong in systems designed for those workloads. The benefit is removing the cache-plus-database combination where the data exists mainly to coordinate application instances.
The Risks Hidden Behind Simple Commands
Replication Is Not Automatically Consensus
Having replicas does not by itself establish which node may accept a write. During a network failure, two nodes can each appear healthy from different parts of the application.
Questions the application must otherwise answer include:
- Which node is the current authority?
- Was an acknowledged write copied before failover?
- Can an old primary continue accepting writes?
- In what order should conflicting updates be applied?
Kahuna uses Raft consensus. Each partition has one elected leader, and a persistent write succeeds only after a quorum commits it. If a quorum cannot agree, Kahuna rejects or delays the operation instead of accepting conflicting histories.
This favors correctness over availability during a serious partition. For coordination state, returning an error is usually safer than allowing two owners or two accepted values.
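The majority rule behind that behavior is simple arithmetic. The sketch below models only the counting, not Raft itself or Kahuna's implementation; `quorum_size` and `write_commits` are hypothetical helper names.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def write_commits(cluster_size: int, acknowledgements: int) -> bool:
    """A write commits only when a majority of nodes acknowledged it."""
    return acknowledgements >= quorum_size(cluster_size)


# A five-node cluster split 3/2: only the majority side can commit.
majority_side = write_commits(cluster_size=5, acknowledgements=3)
minority_side = write_commits(cluster_size=5, acknowledgements=2)
print(majority_side, minority_side)
```

Any two majorities of the same cluster share at least one node, so the two sides of a partition cannot both commit conflicting writes.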
An Expiring Key Is Not a Complete Lock
A common lock recipe is to create a key only if it does not exist and give it an expiration. This prevents two healthy workers from acquiring the key at the same instant, but it does not protect against a paused worker:
- Worker A acquires the key.
- Worker A pauses because of a long garbage collection, machine suspension, or network delay.
- The key expires and Worker B acquires it.
- Worker A resumes and continues working because it does not know ownership was lost.
Now both workers can modify the same resource.
Kahuna returns a fencing token whenever a lock is acquired, and each new token is larger than the previous one. Worker B receives a newer token, so a protected resource that checks the token on every write can reject later writes from Worker A.
Worker A acquires lock -> token 41
lease expires
Worker B acquires lock -> token 42
resource rejects any later operation using token 41
Expiration releases abandoned ownership. Fencing protects against an old owner that comes back.
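On the resource side, the check can be as small as remembering the highest token seen so far. The following is a minimal sketch under that assumption; `FencedResource` is a hypothetical class written for this example, not part of Kahuna.

```python
class FencedResource:
    """A protected resource that remembers the highest fencing token it has accepted."""

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        # Reject any writer whose token is older than one already accepted.
        if token < self.highest_token:
            return False
        self.highest_token = token
        self.value = value
        return True


resource = FencedResource()
accepted_a = resource.write(41, "from worker A")         # A holds the lock
accepted_b = resource.write(42, "from worker B")         # lease expired, B took over
stale = resource.write(41, "late write from worker A")   # A resumes after its pause
print(accepted_a, accepted_b, stale, resource.value)
```

The lock alone cannot stop Worker A, because Worker A never asks the lock service again. The protection comes from the resource comparing tokens on every write.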
An Atomic Counter Is Not Retry-Safe Allocation
Imagine allocating an invoice number with an atomic increment:
- The server increments the counter from 500 to 501.
- The response is lost.
- The client times out and does not know whether the increment happened.
- Retrying may allocate 502, leaving the original result ambiguous.
This is not a problem with atomicity. It is a problem with retries across a network.
Kahuna sequences accept an idempotency key. Retrying the same request returns the same allocation instead of advancing the sequence again. Applications can also reserve non-overlapping ranges to reduce network calls.
Use sequences for invoice numbers, tickets, offsets, order numbers, and any identifier whose allocation must remain unambiguous after a timeout.
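The idea behind an idempotency key can be modeled in memory. This is a sketch of the concept, not Kahuna's API: `IdempotentSequence`, `next_value`, `reserve_range`, and the request keys are hypothetical names chosen for the example.

```python
class IdempotentSequence:
    """In-memory model of a sequence that remembers allocations by idempotency key."""

    def __init__(self, start=1, increment=1):
        self._next = start
        self._increment = increment
        self._values = {}   # idempotency key -> single allocated value
        self._ranges = {}   # idempotency key -> reserved range

    def next_value(self, idempotency_key):
        # A retry with a known key returns the earlier allocation unchanged.
        if idempotency_key not in self._values:
            self._values[idempotency_key] = self._next
            self._next += self._increment
        return self._values[idempotency_key]

    def reserve_range(self, idempotency_key, count):
        # Reserve `count` values in one call; retries get the same range back.
        if idempotency_key not in self._ranges:
            first = self._next
            self._next += count * self._increment
            self._ranges[idempotency_key] = range(first, self._next, self._increment)
        return self._ranges[idempotency_key]


invoices = IdempotentSequence(start=501)
first = invoices.next_value("request-a")   # suppose this response is lost
retry = invoices.next_value("request-a")   # same key, same allocation
other = invoices.next_value("request-b")   # a different request advances
batch = invoices.reserve_range("request-c", count=3)
print(first, retry, other, list(batch))
```

The retry no longer has to guess whether the first attempt succeeded. It repeats the request with the same key and receives whatever was allocated the first time.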
Atomic Commands Are Not Atomic Workflows
Individual commands may be atomic while the application workflow is not. Consider moving a reservation between two keys:
- Remove it from the old owner.
- Add it to the new owner.
If the process stops between those commands, the reservation disappears. If another client changes one key during the workflow, the final state may be based on stale data.
Kahuna supports distributed transactions with snapshot reads, conflict detection, MVCC, and two-phase commit. Changes to several keys either commit together or roll back together, even when those keys belong to different partitions.
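The all-or-nothing behavior with conflict detection can be sketched in a single process. The model below is deliberately simplified: it checks revisions at commit time in one in-memory store and does not implement MVCC or two-phase commit across partitions. `VersionedStore` and the key names are hypothetical.

```python
class VersionedStore:
    """In-memory store where every key carries a revision number."""

    def __init__(self):
        self._data = {}  # key -> (value, revision)

    def read(self, key):
        # Missing keys read as (None, 0).
        return self._data.get(key, (None, 0))

    def commit(self, reads, writes):
        """Apply all writes only if every key in `reads` still has the revision seen."""
        for key, seen_revision in reads.items():
            if self.read(key)[1] != seen_revision:
                return False  # conflict: abort without changing anything
        for key, value in writes.items():
            _, revision = self.read(key)
            self._data[key] = (value, revision + 1)
        return True


store = VersionedStore()
store.commit({}, {"owner:a": "seat-12"})

# Read both keys, then lose a race with another client before committing.
seat, rev_a = store.read("owner:a")
_, rev_b = store.read("owner:b")
store.commit({}, {"owner:b": "seat-7"})  # concurrent change to owner:b
conflicted = store.commit(
    {"owner:a": rev_a, "owner:b": rev_b},
    {"owner:a": None, "owner:b": seat},
)

# A fresh attempt against an untouched destination commits both changes together.
seat, rev_a = store.read("owner:a")
_, rev_c = store.read("owner:c")
moved = store.commit(
    {"owner:a": rev_a, "owner:c": rev_c},
    {"owner:a": None, "owner:c": seat},
)
print(conflicted, moved, store.read("owner:a")[0], store.read("owner:c")[0])
```

The aborted attempt leaves the reservation with its original owner, so a stopped or conflicting workflow never makes it disappear.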
What Kahuna Provides
Distributed Key/Value Store
Kahuna stores shared state with explicit consistency and durability:
- quorum-backed persistent writes,
- automatic leader election and failover,
- revisions and compare-and-set updates,
- key expiration,
- transactions across keys and partitions,
- historical reads as of an HLC timestamp,
- hash routing for general keys and range routing for ordered key spaces.
This is useful for configuration, service metadata, feature flags, reservations, rate limits, and other small state that controls application behavior.
Distributed Locks
Kahuna locks provide:
- one active owner,
- expiring leases,
- lease extension for long-running work,
- monotonically increasing fencing tokens,
- bounded wait and retry behavior,
- persistent or ephemeral durability.
Use them for singleton jobs, leader election, partition ownership, deployment coordination, and exclusive access to external resources.
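How leases, extension, and fencing tokens interact can be modeled with a small state machine. This sketch is not Kahuna's client API: `LeaseLock`, `acquire`, and `extend` are hypothetical names, and the current time is passed in explicitly so the example is deterministic.

```python
class LeaseLock:
    """In-memory model of one leased lock that issues increasing fencing tokens."""

    def __init__(self):
        self._owner = None
        self._expires_at = 0.0
        self._token = 0

    def acquire(self, owner, ttl, now):
        """Return a new fencing token, or None if another lease is still live."""
        if self._owner is not None and now < self._expires_at:
            return None
        self._owner = owner
        self._expires_at = now + ttl
        self._token += 1
        return self._token

    def extend(self, owner, ttl, now):
        """Extend the lease only if the caller still owns it and it has not expired."""
        if self._owner != owner or now >= self._expires_at:
            return False
        self._expires_at = now + ttl
        return True


lock = LeaseLock()
token_a = lock.acquire("worker-a", ttl=10, now=0)     # A owns the lease until t=10
blocked = lock.acquire("worker-b", ttl=10, now=5)     # lease still live
extended = lock.extend("worker-a", ttl=10, now=8)     # A now owns it until t=18
token_b = lock.acquire("worker-b", ttl=10, now=20)    # lease expired, B takes over
late_extend = lock.extend("worker-a", ttl=10, now=21) # A can no longer extend
print(token_a, blocked, extended, token_b, late_extend)
```

A worker doing long-running work extends before the lease runs out. If it misses the deadline, the next owner gets a larger token, and the old owner's extension attempts fail.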
Distributed Sequencer
Kahuna sequences provide:
- atomic next-value allocation,
- non-overlapping range reservations,
- configurable starting values and increments,
- independent ordering for each sequence name,
- idempotency keys for safe retries.
The application does not need to turn a database row or cache counter into coordination infrastructure.
Why Not Build These Recipes Yourself?
General coordination platforms and low-level key/value APIs provide primitives. Teams then define their own lock format, renewal loop, transaction rules, retry handling, sequence allocation, and failure recovery.
Kahuna exposes these as first-class application operations:
| Kahuna advantage | What it means for an application team |
|---|---|
| Locks, key/value state, and sequences in one system | One failure model and one client instead of several custom recipes. |
| Fencing and idempotency built in | The difficult timeout and stale-owner cases are explicit API concepts. |
| Scripts and interactive transactions | Multi-step decisions can commit atomically. |
| Native .NET client | Async operations, cancellation, transactions, and multiple cluster endpoints fit normal .NET code. |
| REST, gRPC, CLI, and scripts | Services and operators can use the interface that fits the task. |
| RocksDB, SQLite, or memory | Choose production throughput, simple persistence, or fast tests. |
| Persistent and ephemeral state | Durable coordination and disposable high-speed state share one API. |
| MIT license | Use, modify, and redistribute Kahuna without proprietary runtime fees. |
The Tradeoff
Consensus requires communication between nodes. A quorum-backed write costs more than writing to one in-memory server, and a Kahuna cluster has more operational moving parts than a local cache.
Use the simplest system that provides the guarantees the data needs:
- Use a cache for data that can be lost, rebuilt, or briefly inconsistent.
- Use Kahuna when a wrong owner, duplicate allocation, partial update, or lost acknowledged write can cause incorrect behavior.
Kahuna is intentionally designed for coordination and small strongly consistent state. It is not an analytics engine, document database, or bulk object store.
Start With One Problem
You do not need to adopt every feature at once. Start where failure would hurt most:
- protect a scheduled job with a distributed lock,
- move shared control state into the distributed key/value store,
- replace an ambiguous counter with the distributed sequencer,
- make a multi-key workflow atomic with transactions.
Follow the Tutorial to run a standalone node and execute your first commands.