
Storage Model

TaiDB stores records in an append-only log. Writes add new records to the end of the file. Reads use an in-memory index to find the latest record for a key and then load the value from disk, using memory-mapped reads by default.

New databases start with a small file header that identifies the TaiDB storage format version and feature flags. Headerless files written by 0.1.x releases still open, and compaction or repack rewrites their live records into the current headered format.
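The following minimal Rust sketch shows one way an opener might tell the two layouts apart. The magic bytes `TAIDB`, the little-endian `u16` version, and the `u32` feature-flag field are all hypothetical, because this page does not document TaiDB's actual header layout. `FileFormat`, `write_header`, and `detect_format` are illustrative names, not TaiDB APIs.

```rust
/// HYPOTHETICAL header layout, for illustration only:
/// [5-byte magic "TAIDB"][u16 LE format version][u32 LE feature flags]
const MAGIC: &[u8; 5] = b"TAIDB";
const HEADER_LEN: usize = MAGIC.len() + 2 + 4;

#[derive(Debug, PartialEq)]
enum FileFormat {
    /// Legacy 0.1.x-style file: records start at byte 0.
    Headerless,
    /// Current-style file: records start right after the header.
    Headered { version: u16, flags: u32 },
}

impl FileFormat {
    /// Byte offset where the record log begins.
    fn body_offset(&self) -> usize {
        match self {
            FileFormat::Headerless => 0,
            FileFormat::Headered { .. } => HEADER_LEN,
        }
    }
}

/// Build the (hypothetical) header written at the start of a new database.
fn write_header(version: u16, flags: u32) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN);
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(&version.to_le_bytes());
    out.extend_from_slice(&flags.to_le_bytes());
    out
}

/// Decide which layout a file uses by checking for the magic bytes.
fn detect_format(file: &[u8]) -> FileFormat {
    if file.len() >= HEADER_LEN && file.starts_with(MAGIC) {
        let version = u16::from_le_bytes([file[5], file[6]]);
        let flags = u32::from_le_bytes([file[7], file[8], file[9], file[10]]);
        FileFormat::Headered { version, flags }
    } else {
        FileFormat::Headerless
    }
}

fn main() {
    let new_file = write_header(2, 0b1);
    let legacy_file = b"\x03\x00\x00\x00key".to_vec();
    for file in [&new_file, &legacy_file] {
        let format = detect_format(file);
        println!("{:?}, records start at byte {}", format, format.body_offset());
    }
}
```

A real format would also have to rule out a headerless file whose first bytes happen to match the magic; this sketch does not.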

  1. A write appends a record.
  2. The in-memory index points the key to the newest record.
  3. A delete appends a tombstone.
  4. Compaction rewrites only live records into a smaller file.
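The four steps above can be modelled in memory. This Rust sketch is a toy, not TaiDB's implementation: `Log` and `Record` are hypothetical names, a tombstone is assumed to be a record whose value is `None`, and the log is a `Vec` rather than a file on disk.

```rust
use std::collections::HashMap;

/// One log entry; `value: None` is a tombstone (hypothetical representation).
#[derive(Clone, Debug)]
struct Record {
    key: Vec<u8>,
    value: Option<Vec<u8>>,
}

/// Toy stand-in for TaiDB's file-backed log (illustrative only).
struct Log {
    records: Vec<Record>,           // append-only: entries are never edited in place
    index: HashMap<Vec<u8>, usize>, // key -> position of its newest record
}

impl Log {
    fn new() -> Self {
        Log { records: Vec::new(), index: HashMap::new() }
    }

    fn append(&mut self, key: &[u8], value: Option<&[u8]>) {
        // Step 1: every write goes to the end of the log.
        self.records.push(Record { key: key.to_vec(), value: value.map(|v| v.to_vec()) });
        // Step 2: repoint the key at the record just written.
        self.index.insert(key.to_vec(), self.records.len() - 1);
    }

    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.append(key, Some(value));
    }

    /// Step 3: a delete appends a tombstone instead of removing anything.
    fn delete(&mut self, key: &[u8]) {
        self.append(key, None);
    }

    fn get(&self, key: &[u8]) -> Option<&[u8]> {
        let pos = *self.index.get(key)?;
        self.records[pos].value.as_deref() // None if the newest record is a tombstone
    }

    /// Step 4: keep only records that are newest for their key and not tombstones.
    fn compact(&mut self) {
        let old_records = std::mem::take(&mut self.records);
        let old_index = std::mem::take(&mut self.index);
        for (pos, rec) in old_records.into_iter().enumerate() {
            let is_newest = old_index.get(&rec.key) == Some(&pos);
            if is_newest && rec.value.is_some() {
                self.index.insert(rec.key.clone(), self.records.len());
                self.records.push(rec);
            }
        }
    }
}

fn main() {
    let mut log = Log::new();
    log.put(b"user:1", b"alice");
    log.put(b"user:1", b"bob");
    log.delete(b"user:1");
    log.put(b"user:2", b"carol");
    println!("records before compaction: {}", log.records.len());
    log.compact();
    println!("records after compaction: {}", log.records.len());
}
```

Stale versions and tombstones stay in the log until `compact` runs, which is why updates and deletes grow the file even when the number of live keys does not change.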

This model keeps writes straightforward and makes crash recovery inspectable. On open, TaiDB scans the file and rebuilds the in-memory index from the log.
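Rebuilding the index on open amounts to a single front-to-back pass in which later records override earlier ones. This sketch assumes a hypothetical record encoding (a little-endian `u32` key length, a `u32` value length with `u32::MAX` marking a tombstone, then the key and value bytes); TaiDB's real encoding is not described on this page. The resulting index maps each key to the offset and length of its newest live value, which is what a reader needs to load that value afterwards.

```rust
use std::collections::HashMap;

/// HYPOTHETICAL tombstone marker stored in the value-length field.
const TOMBSTONE: u32 = u32::MAX;

fn read_u32(buf: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([buf[at], buf[at + 1], buf[at + 2], buf[at + 3]])
}

/// Append one record: [key_len u32][value_len u32][key][value].
fn encode(key: &[u8], value: Option<&[u8]>, out: &mut Vec<u8>) {
    out.extend_from_slice(&(key.len() as u32).to_le_bytes());
    let value_len = value.map_or(TOMBSTONE, |v| v.len() as u32);
    out.extend_from_slice(&value_len.to_le_bytes());
    out.extend_from_slice(key);
    if let Some(v) = value {
        out.extend_from_slice(v);
    }
}

/// Scan the whole log; returns key -> (offset, len) of its newest live value.
fn rebuild_index(log: &[u8]) -> HashMap<Vec<u8>, (usize, usize)> {
    let mut index = HashMap::new();
    let mut pos = 0;
    while pos + 8 <= log.len() {
        let key_len = read_u32(log, pos) as usize;
        let raw_value_len = read_u32(log, pos + 4);
        let value_len = if raw_value_len == TOMBSTONE { 0 } else { raw_value_len as usize };
        let key_start = pos + 8;
        let value_start = key_start + key_len;
        let end = value_start + value_len;
        if end > log.len() {
            break; // incomplete trailing record: stop scanning here
        }
        let key = log[key_start..value_start].to_vec();
        if raw_value_len == TOMBSTONE {
            index.remove(&key); // a delete hides any earlier value
        } else {
            index.insert(key, (value_start, value_len)); // newer value wins
        }
        pos = end;
    }
    index
}

fn main() {
    let mut log = Vec::new();
    encode(b"user:1", Some(b"alice".as_slice()), &mut log);
    encode(b"user:1", Some(b"bob".as_slice()), &mut log);
    encode(b"user:2", Some(b"carol".as_slice()), &mut log);
    encode(b"user:2", None, &mut log);
    let index = rebuild_index(&log);
    for (key, &(offset, len)) in &index {
        let value = &log[offset..offset + len];
        println!("{} -> {}", String::from_utf8_lossy(key), String::from_utf8_lossy(value));
    }
}
```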

Keys and values are byte-oriented at the low level. The high-level TaiDbEngine exposes string-friendly helpers:

  • put_text()
  • get_text()
  • put_bytes()
  • get_bytes()
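A text helper can be a thin UTF-8 layer over the byte API. The sketch below uses a hypothetical `ToyEngine`, not `TaiDbEngine`, and its method signatures are assumptions; it only illustrates the general relationship between the text helpers and the byte helpers listed above.

```rust
use std::collections::HashMap;
use std::string::FromUtf8Error;

/// HYPOTHETICAL byte-oriented store; NOT TaiDbEngine, whose signatures may differ.
#[derive(Default)]
struct ToyEngine {
    data: HashMap<Vec<u8>, Vec<u8>>,
}

impl ToyEngine {
    fn put_bytes(&mut self, key: &[u8], value: &[u8]) {
        self.data.insert(key.to_vec(), value.to_vec());
    }

    fn get_bytes(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.data.get(key).cloned()
    }

    /// Text helper: store the UTF-8 encodings of key and value as bytes.
    fn put_text(&mut self, key: &str, value: &str) {
        self.put_bytes(key.as_bytes(), value.as_bytes());
    }

    /// Text helper: decode the stored bytes, reporting invalid UTF-8 as an error.
    fn get_text(&self, key: &str) -> Result<Option<String>, FromUtf8Error> {
        self.get_bytes(key.as_bytes()).map(String::from_utf8).transpose()
    }
}

fn main() {
    let mut db = ToyEngine::default();
    db.put_text("user:1", "Ada");
    match db.get_text("user:1") {
        Ok(Some(name)) => println!("user:1 = {name}"),
        Ok(None) => println!("user:1 not found"),
        Err(e) => println!("user:1 is not valid UTF-8: {e}"),
    }
}
```

Returning a `Result` from `get_text` is one way to handle values written through `put_bytes` that are not valid UTF-8; how TaiDB itself handles that case is not specified here.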

Vector records are stored separately from regular value records so vector search can operate on vector payloads without decoding unrelated values.
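One way to keep vector payloads apart is to maintain a separate collection per record kind, so that a vector scan never touches regular values. In this Rust sketch, `Store`, `Payload`, and the brute-force squared-Euclidean search are all assumptions made for illustration; this page does not say which similarity metric or index structure TaiDB uses.

```rust
use std::collections::HashMap;

/// HYPOTHETICAL record kinds; TaiDB's actual record tags are not documented here.
enum Payload {
    Value(Vec<u8>),
    Vector(Vec<f32>),
}

#[derive(Default)]
struct Store {
    values: HashMap<String, Vec<u8>>,   // regular value records
    vectors: HashMap<String, Vec<f32>>, // vector records, kept apart
}

impl Store {
    fn put(&mut self, key: &str, payload: Payload) {
        match payload {
            Payload::Value(v) => {
                self.values.insert(key.to_string(), v);
            }
            Payload::Vector(v) => {
                self.vectors.insert(key.to_string(), v);
            }
        }
    }

    /// Brute-force nearest neighbour by squared Euclidean distance.
    /// Only `vectors` is scanned; `values` is never read or decoded.
    fn nearest(&self, query: &[f32]) -> Option<&str> {
        self.vectors
            .iter()
            .filter(|(_, v)| v.len() == query.len()) // skip mismatched dimensions
            .map(|(k, v)| {
                let dist: f32 = v.iter().zip(query).map(|(a, b)| (a - b) * (a - b)).sum();
                (k.as_str(), dist)
            })
            .min_by(|a, b| a.1.total_cmp(&b.1))
            .map(|(k, _)| k)
    }
}

fn main() {
    let mut store = Store::default();
    store.put("doc:1", Payload::Vector(vec![0.0, 0.0]));
    store.put("doc:2", Payload::Vector(vec![1.0, 1.0]));
    store.put("user:1", Payload::Value(b"not a vector".to_vec()));
    println!("{:?}", store.nearest(&[0.9, 0.8]));
}
```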

Memory-mapped reads are enabled by default. They reduce copying for read-heavy local workloads and work well for single-file embedded storage.

Disable mmap reads when you need direct file reads:

let db = taidb::EngineConfig::new("./app.taidb")
    .mmap_reads(false)
    .open()?;

Or from the CLI:

taidb --no-mmap-reads get ./app.taidb user:1

Updates and deletes leave stale records in the log. Compaction reclaims that space by rewriting only live records into a smaller file.

taidb compact ./app.taidb

Use compaction when:

  • file_bytes is much larger than live_bytes.
  • You deleted many keys.
  • You updated many keys repeatedly.
  • You want a smaller snapshot or backup.
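The first criterion can be made concrete as a space-amplification ratio, `file_bytes / live_bytes`. The function names and the threshold value below are hypothetical; this page does not specify a compaction threshold, and the roadmap lists documented space-amplification behavior as future work.

```rust
/// Ratio of on-disk bytes to bytes that are still live.
/// 1.0 means no garbage; 3.0 means two thirds of the file is stale.
fn space_amplification(file_bytes: u64, live_bytes: u64) -> f64 {
    if live_bytes == 0 {
        // Everything on disk is garbage (or the file is empty).
        return if file_bytes == 0 { 1.0 } else { f64::INFINITY };
    }
    file_bytes as f64 / live_bytes as f64
}

/// HYPOTHETICAL policy: compact once amplification exceeds a chosen limit.
fn should_compact(file_bytes: u64, live_bytes: u64, max_amplification: f64) -> bool {
    space_amplification(file_bytes, live_bytes) > max_amplification
}

fn main() {
    let (file_bytes, live_bytes) = (300_000_u64, 100_000_u64);
    println!(
        "amplification {:.1}, compact: {}",
        space_amplification(file_bytes, live_bytes),
        should_compact(file_bytes, live_bytes, 2.0)
    );
}
```

A ratio keeps the rule independent of database size; a production policy might also require a minimum number of reclaimable bytes so that tiny files are not rewritten repeatedly.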

Compaction is still being hardened ahead of 1.0.0. The roadmap calls for interruption-safe compaction guarantees and documented space-amplification behavior.

TaiDB has tests for partial-tail recovery and corruption detection. Writable opens truncate recoverable partial trailing records. Read-only opens report a deterministic corruption error when recovery would require rewriting the file. Before relying on TaiDB for production data, read the roadmap items for batch atomicity and durability modes.
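The split between writable and read-only opens can be sketched as follows. This is a minimal Rust illustration under stated assumptions: records are framed as a little-endian `u32` length followed by the payload, `OpenMode`, `OpenError`, and `open_log` are hypothetical names, and only torn trailing records are detected (there are no checksums). It is not TaiDB's recovery code.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpenMode {
    Writable,
    ReadOnly,
}

#[derive(Debug, PartialEq)]
enum OpenError {
    /// The same bytes always produce the same error, so the failure is deterministic.
    PartialTail { valid_len: usize, file_len: usize },
}

/// Length of the longest prefix made of complete `[u32 LE len][payload]` records.
fn valid_prefix_len(log: &[u8]) -> usize {
    let mut pos = 0;
    while pos + 4 <= log.len() {
        let len = u32::from_le_bytes([log[pos], log[pos + 1], log[pos + 2], log[pos + 3]]) as usize;
        match (pos + 4).checked_add(len) {
            Some(end) if end <= log.len() => pos = end,
            _ => break, // torn write: the payload runs past the end of the file
        }
    }
    pos
}

/// Returns the usable log length, truncating a partial tail only when writable.
fn open_log(log: &mut Vec<u8>, mode: OpenMode) -> Result<usize, OpenError> {
    let valid_len = valid_prefix_len(&log[..]);
    if valid_len == log.len() {
        return Ok(valid_len); // clean file: nothing to recover
    }
    match mode {
        OpenMode::Writable => {
            log.truncate(valid_len); // drop the partial trailing record
            Ok(valid_len)
        }
        // Recovery would require rewriting the file, which a read-only open must not do.
        OpenMode::ReadOnly => Err(OpenError::PartialTail { valid_len, file_len: log.len() }),
    }
}

fn main() {
    let mut file = Vec::new();
    file.extend_from_slice(&3u32.to_le_bytes());
    file.extend_from_slice(b"abc");
    file.extend_from_slice(&10u32.to_le_bytes()); // header claims 10 payload bytes...
    file.extend_from_slice(b"xy"); // ...but only 2 reached the file
    println!("{:?}", open_log(&mut file.clone(), OpenMode::ReadOnly));
    println!("{:?}", open_log(&mut file, OpenMode::Writable));
}
```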

The 1.0.0 target is explicit:

  • deterministic corruption errors
  • format versioning
  • migration tests
  • documented crash behavior
  • clear durability guarantees