Storage Format

TaiDB stores data in a single append-only file. The current format has a database file header followed by record entries. Older 0.1.x files did not have a database file header; TaiDB still recognizes and opens those files.

New databases begin with a 32-byte header.

| Offset (bytes) | Size (bytes) | Field | Description |
|---|---|---|---|
| 0 | 4 | magic | TDBF database file magic. |
| 4 | 1 | version | Current database format version. |
| 5 | 1 | min reader version | Minimum reader version required to open the file. |
| 6 | 2 | header length | Header length in bytes, currently 32. |
| 8 | 8 | feature flags | Bitset of required storage features. |
| 16 | 4 | checksum | CRC32 of the header with this field zeroed. |
| 20 | 12 | reserved | Must be zero in version 1. |

Current feature flags:

| Bit | Name | Meaning |
|---|---|---|
| 0 | record header v2 | Record headers include raw_value_len at bytes 28..32. |

TaiDB rejects future database versions, unsupported feature flags, bad header checksums, and non-zero reserved bytes in the current header version.
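The header layout and rejection rules above can be sketched in Python. This is a minimal illustration, not TaiDB's implementation: the text does not state the byte order or the CRC32 variant, so the sketch assumes little-endian fields and the CRC-32 computed by `zlib.crc32`. The names `parse_db_header` and `build_db_header` are hypothetical, and the min reader version check is an assumption based on that field's description.

```python
import struct
import zlib

DB_MAGIC = b"TDBF"
DB_HEADER_LEN = 32
CURRENT_VERSION = 1
SUPPORTED_FLAGS = 0b1  # bit 0: record header v2

# Assumed little-endian layout: magic, version, min reader version,
# header length, feature flags, checksum, 12 reserved bytes (32 bytes total).
DB_HEADER = struct.Struct("<4sBBHQI12s")


def build_db_header(flags=SUPPORTED_FLAGS):
    """Build a version 1 header and fill in its checksum."""
    fields = (DB_MAGIC, CURRENT_VERSION, CURRENT_VERSION, DB_HEADER_LEN, flags)
    zeroed = DB_HEADER.pack(*fields, 0, bytes(12))
    return DB_HEADER.pack(*fields, zlib.crc32(zeroed), bytes(12))


def parse_db_header(buf):
    """Validate a 32-byte header and return its fields, or raise ValueError."""
    if len(buf) < DB_HEADER_LEN:
        raise ValueError("partial database header")
    header = bytes(buf[:DB_HEADER_LEN])
    magic, version, min_reader, header_len, flags, checksum, reserved = (
        DB_HEADER.unpack(header)
    )
    if magic != DB_MAGIC:
        raise ValueError("bad database magic")
    # The checksum covers the header with the checksum field (bytes 16..20) zeroed.
    zeroed = header[:16] + bytes(4) + header[20:]
    if zlib.crc32(zeroed) != checksum:
        raise ValueError("bad header checksum")
    # Rejecting on min reader version is an assumption, not stated in the text.
    if version > CURRENT_VERSION or min_reader > CURRENT_VERSION:
        raise ValueError("future database version")
    if flags & ~SUPPORTED_FLAGS:
        raise ValueError("unsupported feature flags")
    if any(reserved):
        raise ValueError("non-zero reserved bytes")
    return {
        "version": version,
        "min_reader_version": min_reader,
        "header_length": header_len,
        "feature_flags": flags,
    }
```

Calling `parse_db_header(build_db_header())` returns the parsed fields, while a header built with an unknown flag bit, such as `build_db_header(flags=0b10)`, is rejected.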

0.1.x databases start directly with a record header whose magic is TDB1. TaiDB detects this and opens the file as a legacy record stream.
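One way to picture the detection step is a check on the first four bytes of the file. The sketch below is an assumption about how such a check could look, not TaiDB's code; `detect_layout` is a hypothetical name, and the data offset for headered files is taken to be the current 32-byte header length.

```python
DB_MAGIC = b"TDBF"      # current headered format
RECORD_MAGIC = b"TDB1"  # record magic, which also starts a 0.1.x file
DB_HEADER_LEN = 32


def detect_layout(prefix):
    """Return (layout, data_offset) based on the first bytes of a file."""
    magic = bytes(prefix[:4])
    if magic == DB_MAGIC:
        # Records start after the database file header.
        return "headered", DB_HEADER_LEN
    if magic == RECORD_MAGIC:
        # Legacy record stream: records start at the beginning of the file.
        return "legacy", 0
    raise ValueError("unrecognized file magic")
```

A real reader would presumably take the data offset from the header length field after validating the header, rather than from a constant.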

When a legacy file is compacted or repacked, TaiDB rewrites live records into the current headered format.

Each record has a 32-byte header followed by key bytes and stored value bytes.

| Offset (bytes) | Size (bytes) | Field | Description |
|---|---|---|---|
| 0 | 4 | magic | TDB1 record magic. |
| 4 | 1 | version | Record format version. |
| 5 | 1 | kind | 1 = value, 2 = delete, 3 = vector. |
| 6 | 2 | key length | Key length in bytes. |
| 8 | 4 | stored value length | Stored payload length in bytes. |
| 12 | 4 | vector dimensions | Non-zero only for vector records. |
| 16 | 4 | checksum | CRC32 of header, key, and stored value with checksum zeroed. |
| 20 | 8 | sequence | Monotonic record sequence. |
| 28 | 4 | raw value length | Uncompressed plaintext payload length. |

Delete records must have zero stored value length. Non-vector records must have zero vector dimensions.
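The record layout and the two rules above can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: little-endian fields, the CRC-32 from `zlib.crc32`, and a record version value of 2, none of which the text specifies. The function names are hypothetical.

```python
import struct
import zlib

RECORD_MAGIC = b"TDB1"
RECORD_HEADER_LEN = 32
RECORD_VERSION = 2  # assumed value; the text does not state it
KIND_VALUE, KIND_DELETE, KIND_VECTOR = 1, 2, 3

# Assumed little-endian layout: magic, version, kind, key length, stored value
# length, vector dimensions, checksum, sequence, raw value length (32 bytes).
RECORD_HEADER = struct.Struct("<4sBBHIIIQI")


def check_rules(kind, stored_len, dims):
    """Enforce the per-kind rules from the text."""
    if kind not in (KIND_VALUE, KIND_DELETE, KIND_VECTOR):
        raise ValueError("unknown record kind")
    if kind == KIND_DELETE and stored_len != 0:
        raise ValueError("delete record with a stored value")
    if kind != KIND_VECTOR and dims != 0:
        raise ValueError("non-vector record with vector dimensions")


def encode_record(kind, key, stored, sequence, dims=0, raw_len=None):
    """Serialize one record; raw_len defaults to the stored length."""
    raw_len = len(stored) if raw_len is None else raw_len
    check_rules(kind, len(stored), dims)
    fields = (RECORD_MAGIC, RECORD_VERSION, kind, len(key), len(stored), dims)
    zeroed = RECORD_HEADER.pack(*fields, 0, sequence, raw_len)
    crc = zlib.crc32(zeroed + key + stored)
    return RECORD_HEADER.pack(*fields, crc, sequence, raw_len) + key + stored


def decode_record(buf, offset=0):
    """Return (record dict, offset of the next record) or raise ValueError."""
    header = bytes(buf[offset:offset + RECORD_HEADER_LEN])
    if len(header) < RECORD_HEADER_LEN:
        raise ValueError("partial record header")
    (magic, version, kind, key_len, stored_len,
     dims, crc, sequence, raw_len) = RECORD_HEADER.unpack(header)
    if magic != RECORD_MAGIC:
        raise ValueError("bad record magic")
    check_rules(kind, stored_len, dims)
    end = offset + RECORD_HEADER_LEN + key_len + stored_len
    if end > len(buf):
        raise ValueError("partial record body")
    body = bytes(buf[offset + RECORD_HEADER_LEN:end])
    # Checksum covers header (checksum field zeroed), key, and stored value.
    zeroed = header[:16] + bytes(4) + header[20:]
    if zlib.crc32(zeroed + body) != crc:
        raise ValueError("record checksum mismatch")
    record = {
        "version": version, "kind": kind, "key": body[:key_len],
        "stored": body[key_len:], "dims": dims,
        "sequence": sequence, "raw_len": raw_len,
    }
    return record, end
```

For example, `decode_record(encode_record(KIND_VALUE, b"k", b"hello", 7))` round-trips the key, payload, and sequence, and a delete record carrying a payload is refused at encode time.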

Values may be stored directly, compressed with zstd, or wrapped in an encrypted payload envelope. The record header stores the raw value length so the reader can verify decompressed or decrypted payload size.

Vector payloads are encoded f32 values and are validated against the vector dimension count stored in the record header.
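As a rough illustration of the dimension check, the sketch below decodes a vector payload and validates its length against the header's dimension count. It assumes little-endian IEEE 754 f32 values packed back to back, which the text does not confirm, and `decode_vector` is a hypothetical name.

```python
import struct

F32_SIZE = 4  # bytes per f32 component


def decode_vector(payload, dims):
    """Decode a vector payload, checking it against the header's dimension count."""
    if dims == 0:
        raise ValueError("vector record with zero dimensions")
    if len(payload) != dims * F32_SIZE:
        raise ValueError("payload length does not match vector dimensions")
    # Assumed little-endian f32 components.
    return list(struct.unpack(f"<{dims}f", payload))
```

A payload of 12 bytes is accepted for `dims=3` and rejected for any other dimension count.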

On open, TaiDB scans from the data offset and rebuilds the in-memory index from the latest record for each key.
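The index rebuild can be pictured as a single pass over the records in file order, where later records replace earlier ones for the same key. The sketch below works on already-decoded `(offset, kind, key)` tuples and assumes that a delete record removes the key from the index; the text does not describe the index structure, so the dictionary and the `rebuild_index` name are illustrative only.

```python
KIND_VALUE, KIND_DELETE, KIND_VECTOR = 1, 2, 3


def rebuild_index(records):
    """Map each live key to the file offset of its latest record.

    `records` yields (offset, kind, key) tuples in file order.
    """
    index = {}
    for offset, kind, key in records:
        if kind == KIND_DELETE:
            # Assumption: the latest record being a delete means the key is gone.
            index.pop(key, None)
        else:
            # Later records overwrite earlier ones for the same key.
            index[key] = offset
    return index
```

Because the file is append-only, file order alone is enough to pick the latest record in this sketch; a real reader might also consult the sequence field.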

Recovery rules:

  • A partial trailing record is truncated when the database is opened writable.
  • A partial trailing record in read-only mode is reported as corruption because TaiDB cannot safely rewrite the file.
  • A partial new database header in a writable file is treated as an interrupted empty database creation and rewritten.
  • Unsupported complete headers are rejected deterministically.
  • Checksum mismatches are rejected as corruption.
  • Headerless legacy files remain readable.
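The two partial-trailing-record rules can be sketched as follows. This is an illustration under assumptions, not TaiDB's recovery code: it works on an in-memory byte string, assumes little-endian length fields, and skips the magic and checksum checks that a real scan would also perform. `CorruptionError`, `find_valid_end`, and `recover_tail` are hypothetical names.

```python
import struct

RECORD_HEADER_LEN = 32


class CorruptionError(Exception):
    """Raised when the file cannot be opened safely."""


def find_valid_end(data, data_offset):
    """Return the offset just past the last complete record."""
    pos = data_offset
    while pos < len(data):
        if len(data) - pos < RECORD_HEADER_LEN:
            break  # partial record header
        # Key length (2 bytes at offset 6) and stored value length (4 bytes at 8).
        key_len, stored_len = struct.unpack_from("<HI", data, pos + 6)
        end = pos + RECORD_HEADER_LEN + key_len + stored_len
        if end > len(data):
            break  # partial key or value bytes
        pos = end
    return pos


def recover_tail(data, data_offset, writable):
    """Apply the partial-trailing-record rules and return the usable bytes."""
    valid_end = find_valid_end(data, data_offset)
    if valid_end == len(data):
        return data
    if not writable:
        # Read-only: the file cannot be rewritten, so report corruption.
        raise CorruptionError("partial trailing record in read-only mode")
    # Writable: drop the incomplete tail.
    return data[:valid_end]
```

In a real implementation the writable branch would truncate the file on disk rather than slice a buffer.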