Storage Format
TaiDB stores data in a single append-only file. The current format has a
database file header followed by record entries. Older 0.1.x files did not
have a database file header; TaiDB still recognizes and opens those files.
Database file header
New databases begin with a 32-byte header.
| Offset | Size (bytes) | Field | Description |
|---|---|---|---|
| 0 | 4 | magic | `TDBF` database file magic. |
| 4 | 1 | version | Current database format version. |
| 5 | 1 | min reader version | Minimum reader version required to open the file. |
| 6 | 2 | header length | Header length in bytes, currently 32. |
| 8 | 8 | feature flags | Bitset of required storage features. |
| 16 | 4 | checksum | CRC32 of the header with this field zeroed. |
| 20 | 12 | reserved | Must be zero in version 1. |
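The checksum rule ("CRC32 of the header with this field zeroed") implies a two-step write: pack the header with a zero checksum, then splice the CRC into bytes 16..20. The Python sketch below shows that sequence. This page does not state the byte order or the CRC32 variant, so little-endian integers and the standard CRC-32 computed by `zlib.crc32` are assumptions, as is the `min_reader=1` value in the usage line; `encode_file_header` is a hypothetical name, not part of TaiDB's API.

```python
import struct
import zlib

# Assumed encoding: little-endian integers and the standard CRC-32
# computed by zlib.crc32. Fields follow the header table, in order.
FILE_HEADER = struct.Struct("<4sBBHQI12s")
FILE_MAGIC = b"TDBF"
HEADER_LEN = 32
CHECKSUM_OFFSET = 16


def encode_file_header(version: int, min_reader: int, flags: int) -> bytes:
    """Build a 32-byte database file header with its CRC32 filled in."""
    # Pack once with the checksum and reserved bytes zeroed.
    zeroed = FILE_HEADER.pack(
        FILE_MAGIC, version, min_reader, HEADER_LEN, flags, 0, bytes(12)
    )
    crc = zlib.crc32(zeroed)
    # Splice the CRC into bytes 16..20, leaving every other byte untouched.
    return (
        zeroed[:CHECKSUM_OFFSET]
        + struct.pack("<I", crc)
        + zeroed[CHECKSUM_OFFSET + 4:]
    )


header = encode_file_header(version=1, min_reader=1, flags=0b1)
print(len(header), header[:4])  # 32 b'TDBF'
```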
Current feature flags:
| Bit | Name | Meaning |
|---|---|---|
| 0 | record header v2 | Record headers include `raw_value_len` at bytes 28..32. |
TaiDB rejects future database versions, unsupported feature flags, bad header checksums, and non-zero reserved bytes in the current header version.
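The rejection rules map onto a short sequence of checks. The sketch below applies them in a fixed order so that a given bad header always produces the same error. Little-endian integers, `zlib.crc32` as the CRC32 variant, a reader that understands only version 1 and feature bit 0, and the strict 32-byte header length check are all assumptions; `validate_file_header` and `UnsupportedHeader` are hypothetical names.

```python
import struct
import zlib

# Assumptions: little-endian integers, standard CRC-32 (zlib.crc32), and a
# reader that understands version 1 and feature bit 0 only.
FILE_HEADER = struct.Struct("<4sBBHQI12s")
READER_VERSION = 1
KNOWN_FLAGS = 0b1  # bit 0: record header v2


class UnsupportedHeader(Exception):
    """Hypothetical error raised for a header this reader must refuse."""


def validate_file_header(buf: bytes) -> dict:
    if len(buf) < FILE_HEADER.size:
        raise UnsupportedHeader("header is shorter than 32 bytes")
    magic, version, _min_reader, hlen, flags, crc, reserved = FILE_HEADER.unpack_from(buf)
    if magic != b"TDBF":
        raise UnsupportedHeader("not a TaiDB database header")
    if version > READER_VERSION:
        raise UnsupportedHeader(f"future database version {version}")
    if hlen != FILE_HEADER.size:  # assumed: version 1 headers are exactly 32 bytes
        raise UnsupportedHeader(f"unexpected header length {hlen}")
    # Recompute the CRC over the header with the checksum field zeroed.
    zeroed = buf[:16] + bytes(4) + buf[20:32]
    if zlib.crc32(zeroed) != crc:
        raise UnsupportedHeader("header checksum mismatch")
    if flags & ~KNOWN_FLAGS:
        raise UnsupportedHeader(f"unsupported feature flags {flags:#x}")
    if reserved != bytes(12):
        raise UnsupportedHeader("reserved bytes are not zero")
    return {"version": version, "flags": flags}
```

Checking the version before the checksum means a file written by a newer release is reported as a future version rather than as a checksum failure, even if a later format changes how the header checksum is computed.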
Legacy 0.1.x files
0.1.x databases start directly with a record header whose magic is `TDB1`.
TaiDB detects this and opens the file as a legacy record stream.
When a legacy file is compacted or repacked, TaiDB rewrites live records into the current headered format.
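Because the two layouts begin with different magic values, the first bytes of a file are enough to choose between them and to find where records start. The sketch below assumes little-endian integers and takes the data offset of a headered file from its header length field; `detect_format` is a hypothetical helper, and it leaves the handling of a partial header (rewrite or reject) to the caller.

```python
import struct

# Hypothetical helper; assumes the first four bytes are enough to tell the
# two layouts apart and that integers are little-endian.
FILE_MAGIC = b"TDBF"
LEGACY_RECORD_MAGIC = b"TDB1"


def detect_format(prefix: bytes) -> tuple[str, int]:
    """Return (layout, data offset) for the first bytes of a database file."""
    magic = prefix[:4]
    if magic == LEGACY_RECORD_MAGIC:
        # 0.1.x: no file header, so the first record starts at byte 0.
        return ("legacy", 0)
    if magic == FILE_MAGIC:
        if len(prefix) < 8:
            raise ValueError("partial database file header")
        # Records start right after the header; its length is at bytes 6..8.
        (header_len,) = struct.unpack_from("<H", prefix, 6)
        return ("headered", header_len)
    raise ValueError(f"unrecognized magic {magic!r}")


print(detect_format(b"TDB1" + bytes(28)))  # ('legacy', 0)
```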
Record header
Each record has a 32-byte header followed by key bytes and stored value bytes.
| Offset | Size (bytes) | Field | Description |
|---|---|---|---|
| 0 | 4 | magic | `TDB1` record magic. |
| 4 | 1 | version | Record format version. |
| 5 | 1 | kind | `1` = value, `2` = delete, `3` = vector. |
| 6 | 2 | key length | Key length in bytes. |
| 8 | 4 | stored value length | Stored payload length in bytes. |
| 12 | 4 | vector dimensions | Non-zero only for vector records. |
| 16 | 4 | checksum | CRC32 of the header, key, and stored value, computed with this field zeroed. |
| 20 | 8 | sequence | Monotonic record sequence number. |
| 28 | 4 | raw value length | Uncompressed plaintext payload length in bytes. |
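Unlike the file header checksum, the record checksum also covers the key and stored value bytes that follow the header. The sketch below packs the nine fields in table order and then computes the CRC over header, key, and value with the checksum zeroed. Little-endian integers and `zlib.crc32` are assumptions. The record format version is passed in explicitly because this page does not give its value. `encode_record` is a hypothetical name.

```python
import struct
import zlib

# Assumptions: little-endian integers and standard CRC-32 (zlib.crc32).
# Field order matches the record header table; the struct packs to 32 bytes.
RECORD_HEADER = struct.Struct("<4sBBHIIIQI")
KIND_VALUE, KIND_DELETE, KIND_VECTOR = 1, 2, 3


def encode_record(kind: int, key: bytes, stored: bytes, seq: int,
                  raw_len: int, record_version: int, dims: int = 0) -> bytes:
    """Encode one record: 32-byte header, then key bytes, then stored value."""
    header = RECORD_HEADER.pack(b"TDB1", record_version, kind, len(key),
                                len(stored), dims, 0, seq, raw_len)
    # CRC covers the header (checksum zeroed), the key, and the stored value.
    crc = zlib.crc32(header + key + stored)
    header = header[:16] + struct.pack("<I", crc) + header[20:]
    return header + key + stored


record = encode_record(KIND_VALUE, b"user:1", b"hello", seq=7, raw_len=5,
                       record_version=2)  # version value is a placeholder
print(len(record))  # 43
```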
Delete records must have zero stored value length. Non-vector records must have zero vector dimensions.
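These two invariants, plus the fixed set of record kinds, can be checked right after a header is decoded and before any payload is read. A minimal sketch, with `check_record_invariants` as a hypothetical name; treating an unknown kind as an error is an assumption drawn from the kind column, which lists only three values.

```python
KIND_VALUE, KIND_DELETE, KIND_VECTOR = 1, 2, 3


def check_record_invariants(kind: int, stored_len: int, dims: int) -> None:
    """Raise ValueError if a decoded record header breaks the documented rules."""
    if kind not in (KIND_VALUE, KIND_DELETE, KIND_VECTOR):
        raise ValueError(f"unknown record kind {kind}")
    if kind == KIND_DELETE and stored_len != 0:
        raise ValueError("delete record has a non-zero stored value length")
    if kind != KIND_VECTOR and dims != 0:
        raise ValueError("non-vector record has non-zero vector dimensions")
```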
Payloads
Values may be stored directly, compressed with zstd, or wrapped in an encrypted payload envelope. The record header stores the raw value length so the reader can verify the size of the decompressed or decrypted payload.
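Whatever transform was applied, the reader's check is the same: decode the stored bytes, then compare the result's length with the raw value length from the header. The sketch below takes the decoder as a parameter. In the usage lines, `zlib` stands in for zstd only because it ships with Python; TaiDB uses zstd, and an encrypted envelope would need its own decoder. `decode_payload` is a hypothetical name.

```python
import zlib
from collections.abc import Callable


def decode_payload(stored: bytes, raw_len: int,
                   decode: Callable[[bytes], bytes]) -> bytes:
    """Decode a stored payload and check it against the header's raw length."""
    raw = decode(stored)
    if len(raw) != raw_len:
        raise ValueError(f"payload decoded to {len(raw)} bytes, header says {raw_len}")
    return raw


# Stand-in codec for illustration: zlib here, not TaiDB's zstd.
original = b"hello " * 100
stored = zlib.compress(original)
print(len(decode_payload(stored, len(original), zlib.decompress)))  # 600
```

For a value stored directly, the decoder is simply the identity function (`lambda b: b`), and the check reduces to comparing the stored and raw lengths.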
Vector payloads are encoded f32 values and are validated against the vector
dimension count stored in the record header.
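Since each f32 occupies 4 bytes, a vector payload is valid only if its length equals the dimension count times 4. The sketch below checks that and then unpacks the values. Little-endian IEEE-754 encoding is an assumption, and `decode_vector` is a hypothetical name.

```python
import struct


def decode_vector(stored: bytes, dims: int) -> list[float]:
    """Decode a vector payload, assuming little-endian IEEE-754 f32 values."""
    if len(stored) != dims * 4:
        raise ValueError(f"{len(stored)} payload bytes do not match {dims} f32 dimensions")
    return list(struct.unpack(f"<{dims}f", stored))


payload = struct.pack("<3f", 1.0, 2.5, -0.5)
print(decode_vector(payload, 3))  # [1.0, 2.5, -0.5]
```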
Recovery behavior
On open, TaiDB scans from the data offset (the end of the database file header, or byte 0 for a headerless legacy file) and rebuilds the in-memory index from the latest record for each key.
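The index rebuild amounts to keeping, for each key, the record with the highest sequence number. The sketch below uses the sequence field rather than file position so it does not depend on scan order. The index layout (key to record offset) and the choice to drop keys whose latest record is a delete are assumptions, and `Record` and `rebuild_index` are hypothetical names.

```python
from dataclasses import dataclass

KIND_DELETE = 2


@dataclass
class Record:
    key: bytes
    kind: int
    sequence: int
    offset: int  # file offset of the record header


def rebuild_index(records: list[Record]) -> dict[bytes, int]:
    """Map each live key to the offset of its latest record."""
    latest: dict[bytes, Record] = {}
    for rec in records:
        current = latest.get(rec.key)
        if current is None or rec.sequence > current.sequence:
            latest[rec.key] = rec
    # A key whose latest record is a delete is not live.
    return {key: rec.offset for key, rec in latest.items() if rec.kind != KIND_DELETE}


recs = [Record(b"a", 1, 1, 32), Record(b"b", 1, 2, 80),
        Record(b"a", 1, 3, 120), Record(b"b", 2, 4, 170)]
print(rebuild_index(recs))  # {b'a': 120}
```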
Recovery rules:
- A partial trailing record is truncated when the database is opened writable.
- A partial trailing record in read-only mode is reported as corruption because TaiDB cannot safely rewrite the file.
- A partial new database header in a writable file is treated as an interrupted empty database creation and rewritten.
- Unsupported complete headers are rejected deterministically.
- Checksum mismatches are rejected as corruption.
- Headerless legacy files remain readable.
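The record-level rules can be combined into one scan loop. Each complete record is checked for magic and checksum. A torn tail stops the scan, and the function then returns the truncation point in writable mode or raises in read-only mode. This is a sketch under assumptions: little-endian integers, `zlib.crc32` as the CRC32 variant, and records laid out back to back from the data offset. `scan` and `Corruption` are hypothetical names, and partial file headers are not handled here.

```python
import struct
import zlib

# Assumptions: little-endian integers, standard CRC-32 (zlib.crc32), and
# records stored back to back starting at the data offset.
RECORD_HEADER = struct.Struct("<4sBBHIIIQI")


class Corruption(Exception):
    """Hypothetical error for damage that must not be silently repaired."""


def scan(data: bytes, data_offset: int, writable: bool) -> tuple[list[int], int]:
    """Return (offsets of valid records, length of the valid prefix).

    A valid length shorter than len(data) means the caller should truncate
    the file to that length; this only happens in writable mode.
    """
    offsets: list[int] = []
    pos = data_offset
    while pos < len(data):
        if len(data) - pos < RECORD_HEADER.size:
            break  # torn record header at the tail
        fields = RECORD_HEADER.unpack_from(data, pos)
        magic, key_len, stored_len, crc = fields[0], fields[3], fields[4], fields[6]
        if magic != b"TDB1":
            raise Corruption(f"bad record magic at offset {pos}")
        end = pos + RECORD_HEADER.size + key_len + stored_len
        if end > len(data):
            break  # header is complete but key or value bytes are missing
        zeroed = data[pos:pos + 16] + bytes(4) + data[pos + 20:end]
        if zlib.crc32(zeroed) != crc:
            raise Corruption(f"checksum mismatch at offset {pos}")
        offsets.append(pos)
        pos = end
    if pos < len(data) and not writable:
        # A read-only open cannot truncate, so a torn tail is an error.
        raise Corruption(f"partial trailing record at offset {pos}")
    return offsets, pos
```

A checksum mismatch raises in both modes. Only an incomplete final record is treated as recoverable, because that is the shape an interrupted append leaves behind.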