Vector Search

TaiDB supports exact cosine vector search. Exact search is intentionally simple: it scans stored vectors and ranks them by cosine similarity. This keeps results deterministic and makes it a reliable correctness baseline.
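Cosine similarity itself is just the dot product of two vectors divided by the product of their magnitudes. A minimal standalone sketch of the metric (independent of TaiDB's internals, which may differ):

```rust
/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns 0.0 if either vector has zero magnitude.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "dimension mismatch");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

fn main() {
    // Parallel vectors score 1.0; orthogonal vectors score 0.0.
    println!("{:.6}", cosine_similarity(&[0.2, 0.8, 0.1], &[0.0, 1.0, 0.2]));
}
```

Because the metric only depends on direction, scaling a vector does not change its score.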

use taidb::EngineConfig;

fn main() -> taidb::Result<()> {
    let mut db = EngineConfig::new("./vectors.taidb").open()?;
    db.put_vector("doc:rust", &[0.2, 0.8, 0.1])?;
    db.put_vector("doc:ai", &[0.1, 0.9, 0.2])?;
    let hits = db.search_vector(&[0.0, 1.0, 0.2], 2)?;
    for hit in hits {
        println!("{}\t{:.6}", String::from_utf8_lossy(&hit.key), hit.score);
    }
    Ok(())
}
The same operations are available from the CLI:
taidb vector-put ./vectors.taidb doc:rust "0.2,0.8,0.1"
taidb vector-put ./vectors.taidb doc:ai "0.1,0.9,0.2"
taidb vector-search ./vectors.taidb "0.0,1.0,0.2" --limit 2

A common pattern is to store text metadata and vector records under related keys:

doc:123:title
doc:123:body
vec:doc:123

This keeps values and vectors easy to inspect and easy to delete by prefix when the CLI or API supports that workflow.
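A small sketch of that key convention as helper functions. The `doc:`/`vec:` layout is a convention from the example above, not something TaiDB mandates:

```rust
/// Build the key for a metadata field of a document, e.g. "doc:123:title".
fn meta_key(doc_id: &str, field: &str) -> String {
    format!("doc:{doc_id}:{field}")
}

/// Build the key for a document's vector record, e.g. "vec:doc:123".
fn vector_key(doc_id: &str) -> String {
    format!("vec:doc:{doc_id}")
}

/// All records for one document share predictable prefixes, so a prefix
/// scan can find (or delete) everything belonging to that document.
fn belongs_to_doc(key: &str, doc_id: &str) -> bool {
    key.starts_with(&format!("doc:{doc_id}:")) || key == vector_key(doc_id)
}

fn main() {
    println!("{}", meta_key("123", "title"));
    println!("{}", vector_key("123"));
}
```

Keeping the document ID in a fixed position also makes keys easy to parse back apart when auditing data.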

Exact search is useful when:

  • the dataset is local and moderate in size
  • deterministic ranking matters
  • recall must be perfect
  • you want simple correctness tests before adding an approximate index
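Conceptually, exact top-k search is a full scan plus a sort: score every stored vector against the query, order by similarity, and keep the first k. A self-contained sketch of that behavior (not TaiDB's actual implementation):

```rust
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exact top-k: score everything, sort, truncate. O(n) scan, no index,
/// perfect recall by construction.
fn top_k<'a>(query: &[f32], items: &'a [(String, Vec<f32>)], k: usize) -> Vec<(&'a str, f32)> {
    let mut scored: Vec<(&str, f32)> = items
        .iter()
        .map(|(key, v)| (key.as_str(), cosine(query, v)))
        .collect();
    // Descending score; ties broken by key so the ranking is deterministic.
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then(a.0.cmp(b.0)));
    scored.truncate(k);
    scored
}

fn main() {
    let items = vec![
        ("doc:rust".to_string(), vec![0.2, 0.8, 0.1]),
        ("doc:ai".to_string(), vec![0.1, 0.9, 0.2]),
    ];
    for (key, score) in top_k(&[0.0, 1.0, 0.2], &items, 2) {
        println!("{key}\t{score:.6}");
    }
}
```

The explicit tie-break is what makes the ranking fully deterministic even when two vectors score identically.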

Approximate nearest-neighbor search can be faster at larger scale, but it adds index maintenance, recall tradeoffs, and more operational complexity. TaiDB’s roadmap keeps exact search as the correctness baseline and only adds approximate indexing behind a feature flag after benchmarks justify it.

A few operational guidelines:

  • Keep vector dimensions consistent per collection.
  • Store enough metadata to reconstruct or audit the source embedding.
  • Measure query latency with your real vector count and dimension size.
  • Use batch writes when importing many embeddings.
  • Re-run vector search tests after compaction and encryption changes.
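The first guideline can be enforced at write time with a simple guard. This is an illustrative check in application code; whether TaiDB validates dimensions itself is not assumed here:

```rust
/// Reject vectors whose dimensionality doesn't match the collection's.
fn check_dims(expected: usize, v: &[f32]) -> Result<(), String> {
    if v.len() == expected {
        Ok(())
    } else {
        Err(format!("expected {expected} dimensions, got {}", v.len()))
    }
}

fn main() {
    let collection_dims = 3;
    // A matching vector passes; a truncated one is rejected before the write.
    assert!(check_dims(collection_dims, &[0.2, 0.8, 0.1]).is_ok());
    assert!(check_dims(collection_dims, &[0.2, 0.8]).is_err());
    println!("dimension checks passed");
}
```

Catching mismatches before the write avoids storing vectors that would silently score wrong (or fail) at query time.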