Why Kahuna?

Many applications combine a fast key/value cache with a database. The cache makes reads fast, while the database remains the durable source of truth. It is natural to reuse that combination for locks, leader election, shared configuration, counters, and ID generation.

That works while failures are simple. The difficult cases begin when a server fails, the network splits, a request times out, or a worker pauses long enough to lose ownership without knowing it.

Kahuna is built for those cases. It provides a distributed key/value store, locks with fencing tokens, and retry-safe sequences through one replicated cluster.

A Cache and a Coordinator Solve Different Problems

A cache answers:

Where can I quickly store and retrieve this value?

A coordination system answers:

Which node is allowed to act, what is the accepted value, and what happens when nodes disagree?

The APIs can look similar. Both may have GET, SET, expiration, and atomic counters. The guarantees during failure are what make them different.

| Requirement | A fast single-node store may be enough | Kahuna is the safer fit |
|---|---|---|
| Rebuildable cache | Yes | Usually unnecessary |
| Temporary session data | Often | When losing or conflicting state is unacceptable |
| Shared configuration | Only if brief inconsistency is acceptable | When every service must observe an ordered value |
| One worker running a job | Not with a basic expiring key alone | Use a leased lock with a fencing token |
| Unique ordered IDs | A counter works until retries become ambiguous | Use an idempotent distributed sequence |
| Several keys changing together | Separate commands can expose partial state | Use a distributed transaction |
| Surviving a node failure | Depends on replication and failover semantics | Persistent writes are committed by a quorum |

One System Instead of a Cache and Database

The usual cache-plus-database pattern makes the application coordinate two copies of the same value:

  1. Write the durable value to the database.
  2. Update or invalidate the cached value.
  3. Handle failures between those two operations.
  4. Decide what to do when the cache and database disagree.

This creates familiar problems: stale reads, cache invalidation bugs, dual-write failures, cold-cache latency, and several application instances rebuilding the same missing value at once.
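The dual-write failure in step 3 is easy to reproduce. The sketch below uses two hypothetical in-memory classes, `Database` and `Cache`, as stand-ins (they are not any real client library) and stops the process between the two writes.

```python
class Database:
    """Hypothetical durable store: the source of truth."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value


class Cache:
    """Hypothetical cache holding a second copy of the same value."""

    def __init__(self):
        self.entries = {}

    def set(self, key, value):
        self.entries[key] = value

    def get(self, key):
        return self.entries.get(key)


def dual_write(db, cache, key, value, crash_between=False):
    """Step 1: write the durable value. Step 2: update the cached copy."""
    db.write(key, value)
    if crash_between:
        # Simulates the process stopping after the database write
        # but before the cache update (step 3).
        raise RuntimeError("process stopped between the two writes")
    cache.set(key, value)


db, cache = Database(), Cache()
dual_write(db, cache, "feature:checkout", "v1")
try:
    dual_write(db, cache, "feature:checkout", "v2", crash_between=True)
except RuntimeError:
    pass

print(db.rows["feature:checkout"], cache.get("feature:checkout"))  # v2 v1
```

Until something invalidates or expires the cached entry, every reader sees `v1` even though the durable value is `v2`.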

For coordination and small shared state, Kahuna combines both roles:

  • Hot values stay in memory for fast access.
  • Persistent values are replicated through Raft and materialized to RocksDB or SQLite.
  • Evicted persistent values can be loaded from storage again on the next read.
  • One write path updates the accepted cluster state, so application code does not maintain a separate cache copy.
  • Ephemeral values use the same API when the data is disposable and does not need replication or restart durability.

This means configuration, service metadata, reservations, rate-limit state, locks, and sequences often do not need a separate cache in front of a separate database. Kahuna provides the in-memory working set and durable storage as one coordinated service.
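To illustrate the idea of one write path with an in-memory working set, here is a minimal sketch. It assumes a bounded LRU for hot values and a hypothetical `DurableStore` class standing in for the persisted RocksDB or SQLite copy; it does not model Raft replication or Kahuna's actual eviction policy.

```python
from collections import OrderedDict


class DurableStore:
    """Hypothetical stand-in for the persisted copy (RocksDB or SQLite in Kahuna)."""

    def __init__(self):
        self.data = {}
        self.reads = 0  # counts how often the working set had to go to storage

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


class WorkingSet:
    """Bounded in-memory LRU of hot values backed by a single durable store."""

    def __init__(self, store, capacity):
        self.store = store
        self.capacity = capacity
        self.hot = OrderedDict()

    def _remember(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        if len(self.hot) > self.capacity:
            self.hot.popitem(last=False)  # evict the least recently used value

    def set(self, key, value):
        # One write path: persist first, then refresh the in-memory copy.
        # (In a real cluster the persist step would be a replicated commit.)
        self.store.put(key, value)
        self._remember(key, value)

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.store.get(key)  # evicted or cold: reload from storage
        if value is not None:
            self._remember(key, value)
        return value


store = DurableStore()
working_set = WorkingSet(store, capacity=2)
working_set.set("a", 1)
working_set.set("b", 2)
working_set.set("c", 3)  # "a" is evicted from memory but still persisted
print(working_set.get("a"), store.reads)  # 1 1
```

The application only ever calls `set` and `get`; there is no second copy for it to invalidate.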

Kahuna does not replace an application's primary business database. Relational queries, document storage, analytics, large records, and bulk data still belong in systems designed for those workloads. The benefit is removing the cache-plus-database combination where the data exists mainly to coordinate application instances.

The Risks Hidden Behind Simple Commands

Replication Is Not Automatically Consensus

Having replicas does not by itself establish which node may accept a write. During a network failure, two nodes can each appear healthy from different parts of the application.

Questions the application must otherwise answer include:

  • Which node is the current authority?
  • Was an acknowledged write copied before failover?
  • Can an old primary continue accepting writes?
  • In what order should conflicting updates be applied?

Kahuna uses Raft consensus. Each partition has one elected leader, and a persistent write succeeds only after a quorum commits it. If a quorum cannot agree, Kahuna rejects or delays the operation instead of accepting conflicting histories.

This favors correctness over availability during a serious partition. For coordination state, returning an error is usually safer than allowing two owners or two accepted values.
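The quorum rule behind this is plain majority arithmetic, as in standard Raft. The sketch below uses hypothetical helper names (it is not Kahuna code) to compute the majority for a replica group and to show why the two sides of a split can never both commit: two disjoint groups cannot each hold a majority.

```python
def quorum_size(replicas: int) -> int:
    """Smallest strict majority of a replica group."""
    return replicas // 2 + 1


def can_commit(replicas: int, acknowledgements: int) -> bool:
    """A persistent write commits only once a majority has stored it."""
    return acknowledgements >= quorum_size(replicas)


def split_brain_possible(replicas: int) -> bool:
    """Could the two sides of any network split both reach a quorum?"""
    return any(
        can_commit(replicas, side) and can_commit(replicas, replicas - side)
        for side in range(replicas + 1)
    )


for n in (3, 5):
    q = quorum_size(n)
    print(f"{n} replicas: quorum {q}, tolerates {n - q} failed")
# 3 replicas: quorum 2, tolerates 1 failed
# 5 replicas: quorum 3, tolerates 2 failed
print(split_brain_possible(3), split_brain_possible(5))  # False False
```

A three-replica partition therefore keeps committing persistent writes with one replica down, and a five-replica partition with two down; beyond that, writes wait or fail instead of diverging.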

An Expiring Key Is Not a Complete Lock

A common lock recipe is to create a key only if it does not exist and give it an expiration. This prevents two healthy workers from acquiring the key at the same instant, but it does not protect against a paused worker:

  1. Worker A acquires the key.
  2. Worker A pauses because of a long garbage collection, machine suspension, or network delay.
  3. The key expires and Worker B acquires it.
  4. Worker A resumes and continues working because it does not know ownership was lost.

Now both workers can modify the same resource.
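The four steps can be simulated with a logical clock. This is a minimal sketch of the "set if absent, with expiry" recipe using a hypothetical `ExpiringKeyStore` class; it is not Kahuna's lock API. Worker A's belief that it owns the lock is a local variable that nothing updates when the key expires.

```python
class ExpiringKeyStore:
    """Hypothetical single-node store with 'set if absent, with expiry' semantics."""

    def __init__(self):
        self.entries = {}  # key -> (owner, expires_at)

    def set_if_absent(self, key, owner, now, ttl):
        current = self.entries.get(key)
        if current is not None and current[1] > now:
            return False  # still held by an unexpired owner
        self.entries[key] = (owner, now + ttl)
        return True


store = ExpiringKeyStore()
believes_it_owns = {}

# t=0: Worker A acquires the key with a 10-second expiry (step 1).
believes_it_owns["A"] = store.set_if_absent("job", "A", now=0, ttl=10)
# Worker A pauses and never re-checks the store (step 2).
# t=15: the key has expired, so Worker B acquires it (step 3).
believes_it_owns["B"] = store.set_if_absent("job", "B", now=15, ttl=10)
# t=16: Worker A resumes and still trusts its earlier result (step 4).
print(believes_it_owns)  # {'A': True, 'B': True}
```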

Kahuna returns a fencing token whenever a lock is acquired. Tokens increase with every acquisition, so Worker B receives a newer token than Worker A. A protected resource that remembers the highest token it has accepted can then reject later writes from Worker A.

Worker A acquires lock -> token 41
lease expires
Worker B acquires lock -> token 42
resource rejects any later operation using token 41

Expiration releases abandoned ownership. Fencing protects against an old owner that comes back.
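The resource side of fencing can be sketched in a few lines. The classes here are hypothetical: `LockService` only models token issuance (not leases or exclusivity), and `FencedResource` assumes the protected system can store the highest token it has accepted and compare each incoming token against it.

```python
import itertools


class LockService:
    """Hypothetical lock that hands out a strictly increasing token per acquisition."""

    def __init__(self, start=41):
        self._tokens = itertools.count(start)

    def acquire(self):
        return next(self._tokens)


class FencedResource:
    """A protected resource that remembers the highest token it has accepted."""

    def __init__(self):
        self.highest_token = -1
        self.writes = []

    def write(self, token, value):
        if token < self.highest_token:
            return False  # a newer owner has already written; reject the stale one
        self.highest_token = token
        self.writes.append((token, value))
        return True


locks, resource = LockService(), FencedResource()
token_a = locks.acquire()            # 41
# ... Worker A pauses and its lease expires ...
token_b = locks.acquire()            # 42
print(resource.write(token_b, "B"))  # True
print(resource.write(token_a, "A"))  # False: token 41 is older than 42
```

Equal tokens are accepted so the current owner can write more than once. The comparison has to happen inside the resource: Worker A checking its own token before writing would not help, because the pause can occur between the check and the write.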

An Atomic Counter Is Not Retry-Safe Allocation

Imagine allocating an invoice number with an atomic increment:

  1. The server increments the counter from 500 to 501.
  2. The response is lost.
  3. The client times out and does not know whether the increment happened.
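  4. If the first increment did happen, a retry allocates 502 and 501 is silently skipped; if it did not, the retry allocates 501. The client cannot tell which case occurred.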

This is not a problem with atomicity. It is a problem with retries across a network.
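The ambiguity can be reproduced with a plain counter. In this sketch (a hypothetical `Counter` class and a simulated network failure), the server applies the increment but the response is lost, and a naive retry then allocates a second value.

```python
class Counter:
    """Plain atomic counter: each call increments exactly once."""

    def __init__(self, value):
        self.value = value

    def increment(self):
        self.value += 1
        return self.value


def call_with_lost_response(counter):
    counter.increment()                  # the server applies the increment...
    raise TimeoutError("response lost")  # ...but the client never sees 501


counter = Counter(500)
try:
    invoice = call_with_lost_response(counter)
except TimeoutError:
    invoice = counter.increment()  # naive retry

print(invoice, counter.value)  # 502 502
```

Value 501 was allocated but never delivered, and nothing in the counter's state records that it belonged to the timed-out request.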

Kahuna sequences accept an idempotency key. Retrying the same request returns the same allocation instead of advancing the sequence again. Applications can also reserve non-overlapping ranges to reduce network calls.
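The idempotency idea can be sketched as a map from idempotency key to the value already allocated. `IdempotentSequence` and `next_value` are hypothetical names, not Kahuna's client API, and this in-memory version does not replicate the remembered results the way a distributed sequence would have to.

```python
class IdempotentSequence:
    """Hypothetical sequence that remembers the result for each idempotency key."""

    def __init__(self, start=1, increment=1):
        self._next = start
        self._increment = increment
        self._results = {}  # idempotency key -> value already allocated

    def next_value(self, idempotency_key):
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # retry: same answer, no advance
        value = self._next
        self._next += self._increment
        self._results[idempotency_key] = value
        return value


invoices = IdempotentSequence(start=501)
first = invoices.next_value("order-7781")  # response may be lost in transit
retry = invoices.next_value("order-7781")  # client retries with the same key
other = invoices.next_value("order-7782")
print(first, retry, other)  # 501 501 502
```

This sketch remembers every key indefinitely; a long-running service would need a policy for how long keys are retained.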

Use sequences for invoice numbers, tickets, offsets, order numbers, and any identifier whose allocation must remain unambiguous after a timeout.

Atomic Commands Are Not Atomic Workflows

Individual commands may be atomic while the application workflow is not. Consider moving a reservation between two keys:

  1. Remove it from the old owner.
  2. Add it to the new owner.

If the process stops between those commands, the reservation disappears. If another client changes one key during the workflow, the final state may be based on stale data.

Kahuna supports distributed transactions with snapshot reads, conflict detection, MVCC, and two-phase commit. Changes to several keys either commit together or roll back together, even when those keys belong to different partitions.
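For intuition about how several partitions can commit together, here is a textbook two-phase commit sketch with an optimistic conflict check. It is not Kahuna's implementation: `Participant` and `two_phase_commit` are hypothetical, and the sketch has no MVCC, no durable transaction log, no locking between prepare and commit, and no handling of a coordinator failure, all of which a real system needs.

```python
class Participant:
    """Hypothetical partition that owns some keys and votes during prepare."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.staged = None

    def prepare(self, changes, expected):
        # Conflict check: every value the transaction read must be unchanged.
        if any(self.data.get(key) != value for key, value in expected.items()):
            return False  # vote "no"
        self.staged = dict(changes)
        return True  # vote "yes"

    def commit(self):
        self.data.update(self.staged)
        self.staged = None

    def abort(self):
        self.staged = None


def two_phase_commit(plan):
    """plan: list of (participant, changes, expected_values) tuples."""
    prepared = []
    # Phase 1: ask every participant to validate and stage its changes.
    for participant, changes, expected in plan:
        if not participant.prepare(changes, expected):
            for p in prepared:
                p.abort()  # any "no" vote rolls back everyone already staged
            return False
        prepared.append(participant)
    # Phase 2: every participant voted "yes", so all apply their changes.
    for participant in prepared:
        participant.commit()
    return True


# Move reservation "seat-12" from alice (partition p1) to bob (partition p2).
p1 = Participant("p1", {"alice": ("seat-12",)})
p2 = Participant("p2", {"bob": ()})
ok = two_phase_commit([
    (p1, {"alice": ()}, {"alice": ("seat-12",)}),
    (p2, {"bob": ("seat-12",)}, {"bob": ()}),
])
print(ok, p1.data, p2.data)  # True {'alice': ()} {'bob': ('seat-12',)}

# Another client changes bob's key, so a transaction based on the old value fails as a whole.
p2.data["bob"] = ("seat-12", "seat-14")
ok = two_phase_commit([
    (p1, {"alice": ("seat-12",)}, {"alice": ()}),
    (p2, {"bob": ()}, {"bob": ("seat-12",)}),
])
print(ok, p1.data, p2.data)  # False {'alice': ()} {'bob': ('seat-12', 'seat-14')}
```

In the second transaction p1 had already staged its change when p2 voted "no", so p1 discards it and neither key is modified.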

What Kahuna Provides

Distributed Key/Value Store

Kahuna stores shared state with explicit consistency and durability:

  • quorum-backed persistent writes,
  • automatic leader election and failover,
  • revisions and compare-and-set updates,
  • key expiration,
  • transactions across keys and partitions,
  • historical reads as of a hybrid logical clock (HLC) timestamp,
  • hash routing for general keys and range routing for ordered key spaces.

This is useful for configuration, service metadata, feature flags, reservations, rate limits, and other small state that controls application behavior.
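Revisions make compare-and-set updates possible: a write succeeds only if the key still has the revision the writer read. This minimal sketch uses a hypothetical `VersionedStore` class on one node; it illustrates the semantics, not Kahuna's API or its replication.

```python
class VersionedStore:
    """Hypothetical key/value map where every write bumps a per-key revision."""

    def __init__(self):
        self._items = {}  # key -> (value, revision)

    def get(self, key):
        return self._items.get(key, (None, 0))

    def compare_and_set(self, key, value, expected_revision):
        _, revision = self.get(key)
        if revision != expected_revision:
            return False, revision  # someone else wrote first; caller must re-read
        self._items[key] = (value, revision + 1)
        return True, revision + 1


store = VersionedStore()
store.compare_and_set("rate-limit:api", 100, expected_revision=0)

# Two services read the same revision, then both try to update the key.
_, seen = store.get("rate-limit:api")
print(store.compare_and_set("rate-limit:api", 200, seen))  # (True, 2)
print(store.compare_and_set("rate-limit:api", 50, seen))   # (False, 2)
```

The losing writer receives the current revision, so it can re-read the value and decide whether to retry.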

Distributed Locks

Kahuna locks provide:

  • one active owner,
  • expiring leases,
  • lease extension for long-running work,
  • monotonically increasing fencing tokens,
  • bounded wait and retry behavior,
  • persistent or ephemeral durability.

Use them for singleton jobs, leader election, partition ownership, deployment coordination, and exclusive access to external resources.
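Lease expiry, extension, and bounded waiting fit together as sketched below. `LeasedLock`, `FakeClock`, and `acquire_with_bounded_wait` are hypothetical, run on a simulated clock, and use a fixed retry delay; they illustrate the behavior listed above rather than Kahuna's client API.

```python
class FakeClock:
    """Simulated time so the example runs instantly."""

    def __init__(self):
        self.now = 0.0

    def sleep(self, seconds):
        self.now += seconds


class LeasedLock:
    """Hypothetical lock with an expiring lease and a per-acquisition fencing token."""

    def __init__(self):
        self.owner = None
        self.expires_at = 0.0
        self.token = 0

    def try_acquire(self, owner, now, lease):
        if self.owner is not None and self.expires_at > now:
            return None  # someone holds an unexpired lease
        self.token += 1
        self.owner, self.expires_at = owner, now + lease
        return self.token

    def extend(self, owner, token, now, lease):
        if self.owner != owner or self.token != token or self.expires_at <= now:
            return False  # ownership already lost: the worker must stop
        self.expires_at = now + lease
        return True


def acquire_with_bounded_wait(lock, owner, clock, lease, attempts, delay):
    """Try a fixed number of times, then give up instead of waiting forever."""
    for _ in range(attempts):
        token = lock.try_acquire(owner, clock.now, lease)
        if token is not None:
            return token
        clock.sleep(delay)
    return None


lock, clock = LeasedLock(), FakeClock()
a = acquire_with_bounded_wait(lock, "A", clock, lease=10, attempts=3, delay=1)
b = acquire_with_bounded_wait(lock, "B", clock, lease=10, attempts=3, delay=1)
print(a, b)  # 1 None  (B gave up at t=3)
renewed = lock.extend("A", a, clock.now, lease=10)  # lease now ends at t=13

clock.now = 20  # A stalls past its lease without renewing
c = acquire_with_bounded_wait(lock, "B", clock, lease=10, attempts=3, delay=1)
print(renewed, c, lock.extend("A", a, clock.now, lease=10))  # True 2 False
```

The failed `extend` call at the end is the signal a long-running worker needs: once renewal fails, it should stop touching the protected resource, and fencing covers the case where it cannot stop in time.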

Distributed Sequencer

Kahuna sequences provide:

  • atomic next-value allocation,
  • non-overlapping range reservations,
  • configurable starting values and increments,
  • independent ordering for each sequence name,
  • idempotency keys for safe retries.

The application does not need to turn a database row or cache counter into coordination infrastructure.
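Range reservation amounts to advancing the sequence by a whole block in one operation. The sketch below uses a hypothetical `RangeSequence` class on a single node and returns each block as a Python `range`; a distributed sequence would have to perform the advance as one replicated write.

```python
class RangeSequence:
    """Hypothetical sequence that can hand out whole blocks of values at once."""

    def __init__(self, start=1, increment=1):
        self._next = start
        self._increment = increment

    def reserve(self, count):
        """Reserve `count` values and return them as a range (end exclusive)."""
        first = self._next
        self._next += count * self._increment
        return range(first, self._next, self._increment)


tickets = RangeSequence(start=1000, increment=1)
worker_a = tickets.reserve(100)  # one round trip covers 100 IDs
worker_b = tickets.reserve(100)
print(worker_a[0], worker_a[-1], worker_b[0])  # 1000 1099 1100
```

Each worker can then hand out IDs from its own block locally. In this sketch, values in a block that a worker never uses are skipped, so IDs stay unique but may have gaps.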

Why Not Build These Recipes Yourself?

General coordination platforms and low-level key/value APIs provide primitives. Teams then define their own lock format, renewal loop, transaction rules, retry handling, sequence allocation, and failure recovery.

Kahuna exposes these as first-class application operations:

| Kahuna advantage | What it means for an application team |
|---|---|
| Locks, key/value state, and sequences in one system | One failure model and one client instead of several custom recipes. |
| Fencing and idempotency built in | The difficult timeout and stale-owner cases are explicit API concepts. |
| Scripts and interactive transactions | Multi-step decisions can commit atomically. |
| Native .NET client | Async operations, cancellation, transactions, and multiple cluster endpoints fit normal .NET code. |
| REST, gRPC, CLI, and scripts | Services and operators can use the interface that fits the task. |
| RocksDB, SQLite, or memory | Choose production throughput, simple persistence, or fast tests. |
| Persistent and ephemeral state | Durable coordination and disposable high-speed state share one API. |
| MIT license | Use, modify, and redistribute Kahuna without proprietary runtime fees. |

The Tradeoff

Consensus requires communication between nodes. A quorum-backed write costs more than writing to one in-memory server, and a Kahuna cluster has more operational moving parts than a local cache.

Use the simplest system that provides the guarantees the data needs:

  • Use a cache for data that can be lost, rebuilt, or briefly inconsistent.
  • Use Kahuna when a wrong owner, duplicate allocation, partial update, or lost acknowledged write can cause incorrect behavior.

Kahuna is intentionally designed for coordination and small strongly consistent state. It is not an analytics engine, document database, or bulk object store.

Start With One Problem

You do not need to adopt every feature at once. Start where failure would hurt most.

Follow the Tutorial to run a standalone node and execute your first commands.