Vector Search

TaiDB supports exact cosine vector search. Exact search is intentionally simple: it scans stored vectors and ranks them by cosine similarity. This keeps results deterministic and makes it a reliable correctness baseline.
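Cosine similarity itself is just the dot product of two vectors divided by the product of their magnitudes. A minimal standalone sketch of the metric (independent of TaiDB's internals, which may differ):

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    // Parallel vectors score 1.0; orthogonal vectors score 0.0.
    println!("{:.6}", cosine_similarity(&[0.2, 0.8, 0.1], &[0.0, 1.0, 0.2]));
}
```

Because the metric only depends on direction, scaling a vector does not change its score.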

use taidb::EngineConfig;

fn main() -> taidb::Result<()> {
    let mut db = EngineConfig::new("./vectors.taidb").open()?;
    db.put_vector("doc:rust", &[0.2, 0.8, 0.1])?;
    db.put_vector("doc:ai", &[0.1, 0.9, 0.2])?;
    let hits = db.search_vector(&[0.0, 1.0, 0.2], 2)?;
    for hit in hits {
        println!("{}\t{:.6}", String::from_utf8_lossy(&hit.key), hit.score);
    }
    Ok(())
}
The same operations are available from the CLI:
taidb vector-put ./vectors.taidb doc:rust "0.2,0.8,0.1"
taidb vector-put ./vectors.taidb doc:ai "0.1,0.9,0.2"
taidb vector-search ./vectors.taidb "0.0,1.0,0.2" --limit 2

A common pattern is to store text metadata and vector records under related keys:

doc:123:title
doc:123:body
vec:doc:123

This keeps values and vectors easy to inspect and easy to delete by prefix when the CLI or API supports that workflow.
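A small sketch of that key convention as helper functions. The `doc:`/`vec:` layout is a convention from the example above, not something TaiDB mandates:

```rust
/// Build the key for a metadata field of a document, e.g. "doc:123:title".
fn meta_key(doc_id: &str, field: &str) -> String {
    format!("doc:{doc_id}:{field}")
}

/// Build the key for a document's vector record, e.g. "vec:doc:123".
fn vector_key(doc_id: &str) -> String {
    format!("vec:doc:{doc_id}")
}

/// All records for one document share predictable prefixes, so a prefix
/// scan can find (or delete) everything belonging to that document.
fn belongs_to_doc(key: &str, doc_id: &str) -> bool {
    key.starts_with(&format!("doc:{doc_id}:")) || key == vector_key(doc_id)
}

fn main() {
    println!("{}", meta_key("123", "title"));
    println!("{}", vector_key("123"));
}
```

Keeping the document ID in a fixed position also makes keys easy to parse back apart when auditing data.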

Exact search is useful when:

  • the dataset is local and moderate in size
  • deterministic ranking matters
  • recall must be perfect
  • you want simple correctness tests before adding an approximate index
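Conceptually, exact top-k search is a full scan plus a sort: score every stored vector against the query, order by similarity, and keep the first k. A self-contained sketch of that behavior (not TaiDB's actual implementation):

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact top-k: score everything, sort, truncate. O(n) scan, no index,
/// perfect recall by construction.
fn top_k<'a>(query: &[f32], items: &'a [(String, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = items
        .iter()
        .map(|(key, v)| (key.as_str(), cosine(query, v)))
        .collect();
    // Descending score; ties broken by key so the ranking is deterministic.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(b.0)));
    scored.truncate(k);
    scored
}

fn main() {
    let items = vec![
        ("doc:rust".to_string(), vec![0.2, 0.8, 0.1]),
        ("doc:ai".to_string(), vec![0.1, 0.9, 0.2]),
    ];
    for (key, score) in top_k(&[0.0, 1.0, 0.2], &items, 2) {
        println!("{key}\t{score:.6}");
    }
}
```

The explicit tie-break is what makes the ranking fully deterministic even when two vectors score identically.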

Approximate nearest-neighbor search can be faster at larger scale, but it adds index maintenance, recall tradeoffs, and more operational complexity. TaiDB’s roadmap keeps exact search as the correctness baseline and only adds approximate indexing behind a feature flag after benchmarks justify it.

A few operational guidelines:

  • Keep vector dimensions consistent per collection.
  • Store enough metadata to reconstruct or audit the source embedding.
  • Measure query latency with your real vector count and dimension size.
  • Use batch writes when importing many embeddings.
  • Re-run vector search tests after compaction and encryption changes.
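The first guideline can be enforced at write time with a simple guard. This is an illustrative check in application code; whether TaiDB validates dimensions itself is not assumed here:

```rust
/// Reject vectors whose dimensionality doesn't match the collection's.
fn check_dims(expected: usize, v: &[f32]) -> Result<(), String> {
    if v.len() == expected {
        Ok(())
    } else {
        Err(format!("expected {expected} dimensions, got {}", v.len()))
    }
}

fn main() {
    let collection_dims = 3;
    // A matching vector passes; a truncated one is rejected before the write.
    assert!(check_dims(collection_dims, &[0.2, 0.8, 0.1]).is_ok());
    assert!(check_dims(collection_dims, &[0.2, 0.8]).is_err());
    println!("dimension checks passed");
}
```

Catching mismatches before the write avoids storing vectors that would silently score wrong (or fail) at query time.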