
Build a unified KV+vector snapshot (sqlite-vec)

Use the sqlite-vec adapter to write one SQLite file per shard that holds both the KV rows and the vector index. This is the simplest unified backend.

When to use

  • You want KV + vector under one file per shard.
  • You're happy with sqlite-vec's metric set (cosine, l2).
  • You want straightforward downloadable shards or range-read VFS.

When NOT to use

Install

uv add 'shardyfusion[unified-vector-sqlite,writer-python]'

The unified-vector-sqlite extra combines the vector-sqlite and cel extras.

Minimal example

Python

from shardyfusion import HashShardedWriteConfig, PythonRecordInput, VectorSpec
from shardyfusion.writer.python import write_hash_sharded
from shardyfusion.sqlite_vec_adapter import SqliteVecFactory

vector_spec = VectorSpec(dim=384, metric="cosine")

config = HashShardedWriteConfig(
    num_dbs=16,
    s3_prefix="s3://my-bucket/snapshots/items",
    adapter_factory=SqliteVecFactory(vector_spec=vector_spec),
    vector_spec=vector_spec,
)

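# `records` is an iterable of dicts; the lambdas below assume
# "id" (str), "payload", and "embedding" keys.
result = write_hash_sharded(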
    records,
    config,
    PythonRecordInput(
        key_fn=lambda r: r["id"].encode(),
        value_fn=lambda r: r["payload"],
        vector_fn=lambda r: (r["id"], r["embedding"], None),
    ),
)

Spark

from shardyfusion import ColumnWriteInput, HashShardedWriteConfig, VectorColumnInput, VectorSpec
from shardyfusion.writer.spark import write_hash_sharded
from shardyfusion.sqlite_vec_adapter import SqliteVecFactory
from shardyfusion.serde import ValueSpec

vector_spec = VectorSpec(dim=384, metric="cosine")

config = HashShardedWriteConfig(
    num_dbs=16,
    s3_prefix="s3://my-bucket/snapshots/items",
    adapter_factory=SqliteVecFactory(vector_spec=vector_spec),
    vector_spec=vector_spec,
)

# `df` is a Spark DataFrame with "id", "payload", and "embedding" columns.
result = write_hash_sharded(
    df,
    config,
    ColumnWriteInput(
        key_col="id",
        value_spec=ValueSpec.binary_col("payload"),
        vector=VectorColumnInput(vector_col="embedding", id_col="id"),
    ),
)

Dask

from shardyfusion import ColumnWriteInput, HashShardedWriteConfig, VectorColumnInput, VectorSpec
from shardyfusion.writer.dask import write_hash_sharded
from shardyfusion.sqlite_vec_adapter import SqliteVecFactory
from shardyfusion.serde import ValueSpec

vector_spec = VectorSpec(dim=384, metric="cosine")

config = HashShardedWriteConfig(
    num_dbs=16,
    s3_prefix="s3://my-bucket/snapshots/items",
    adapter_factory=SqliteVecFactory(vector_spec=vector_spec),
    vector_spec=vector_spec,
)

# `ddf` is a Dask DataFrame with "id", "payload", and "embedding" columns.
result = write_hash_sharded(
    ddf,
    config,
    ColumnWriteInput(
        key_col="id",
        value_spec=ValueSpec.binary_col("payload"),
        vector=VectorColumnInput(vector_col="embedding", id_col="id"),
    ),
)

Ray

from shardyfusion import ColumnWriteInput, HashShardedWriteConfig, VectorColumnInput, VectorSpec
from shardyfusion.writer.ray import write_hash_sharded
from shardyfusion.sqlite_vec_adapter import SqliteVecFactory
from shardyfusion.serde import ValueSpec

vector_spec = VectorSpec(dim=384, metric="cosine")

config = HashShardedWriteConfig(
    num_dbs=16,
    s3_prefix="s3://my-bucket/snapshots/items",
    adapter_factory=SqliteVecFactory(vector_spec=vector_spec),
    vector_spec=vector_spec,
)

# `ds` is a Ray Dataset with "id", "payload", and "embedding" columns.
result = write_hash_sharded(
    ds,
    config,
    ColumnWriteInput(
        key_col="id",
        value_spec=ValueSpec.binary_col("payload"),
        vector=VectorColumnInput(vector_col="embedding", id_col="id"),
    ),
)

Configuration

  • SqliteVecFactory(vector_spec, page_size=4096, cache_size_pages=-2000, ...) at sqlite_vec_adapter.py:105.
  • Allowed metrics: cosine, l2. dot_product raises ConfigValidationError.
  • The manifest records vector.backend = "sqlite-vec" automatically; UnifiedShardedReader dispatches accordingly.

Functional properties

  • One .sqlite file per shard.
  • KV rows and vector index in the same file.
  • Atomic publish.
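
To make the "KV rows and vector index in the same file" property concrete, here is a minimal sketch using only the Python standard library. It is not shardyfusion's or sqlite-vec's actual schema: the table names (kv, vectors), the float32 BLOB encoding, and the helpers build_shard and nearest are all hypothetical, and the search is a plain brute-force scan.

```python
import math
import sqlite3
import struct


def pack_f32(vec):
    """Serialize a list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_f32(blob):
    """Inverse of pack_f32."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def build_shard(path, rows):
    """Write KV rows and vectors into one SQLite file (hypothetical schema)."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE kv (key BLOB PRIMARY KEY, value BLOB)")
    con.execute("CREATE TABLE vectors (id TEXT PRIMARY KEY, embedding BLOB)")
    with con:  # the inserts run in one transaction, committed when the block exits
        for r in rows:
            con.execute("INSERT INTO kv VALUES (?, ?)", (r["id"].encode(), r["payload"]))
            con.execute("INSERT INTO vectors VALUES (?, ?)", (r["id"], pack_f32(r["embedding"])))
    return con


def nearest(con, query, k=1):
    """Brute-force top-k ids by cosine distance over every stored vector."""
    scored = [
        (cosine_distance(query, unpack_f32(blob)), vid)
        for vid, blob in con.execute("SELECT id, embedding FROM vectors")
    ]
    return [vid for _, vid in sorted(scored)[:k]]


rows = [
    {"id": "a", "payload": b"apple", "embedding": [1.0, 0.0]},
    {"id": "b", "payload": b"banana", "embedding": [0.0, 1.0]},
]
con = build_shard(":memory:", rows)

# Vector lookup and KV lookup hit the same database file.
hit = nearest(con, [0.9, 0.1])[0]
value = con.execute("SELECT value FROM kv WHERE key = ?", (hit.encode(),)).fetchone()[0]
```

The point of the sketch is the access pattern: a nearest-neighbour query returns an id, and the payload for that id comes from the same file without a second store or a cross-system join.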

Guarantees

  • A successful return means both the KV rows and the vectors are queryable via UnifiedShardedReader.

Weaknesses

  • No dot_product.
  • Recall/QPS lower than LanceDB at large scale (no IVF/HNSW tuning surface).
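
If a model's embeddings are meant to be compared by direction only, one possible way to live without dot_product is to L2-normalize vectors before writing and use cosine: on unit vectors the dot product and cosine similarity give the same ranking. The sketch below is plain Python with hypothetical helper names, not part of shardyfusion. It also shows that the two rankings can differ on unnormalized vectors, so this is not a substitute for true maximum-inner-product search.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


raw = {"a": [3.0, 4.0], "b": [10.0, 0.0], "c": [1.0, 5.0]}
query = [1.0, 1.0]

unit = {k: l2_normalize(v) for k, v in raw.items()}
unit_query = l2_normalize(query)

# Ranking by dot product on unit vectors matches ranking by cosine on raw vectors.
by_dot_on_unit = sorted(unit, key=lambda k: dot(unit_query, unit[k]), reverse=True)
by_cosine_on_raw = sorted(raw, key=lambda k: cosine_similarity(query, raw[k]), reverse=True)

# Ranking by dot product on raw vectors is magnitude-sensitive and can differ.
by_dot_on_raw = sorted(raw, key=lambda k: dot(query, raw[k]), reverse=True)
```

Here "b" wins on raw dot product only because of its large magnitude. If that magnitude carries meaning in your data, normalizing discards it, and a backend with native dot_product support is the safer choice.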

Failure modes & recovery

Failure             | Surface               | Recovery
Unsupported metric  | ConfigValidationError | Use cosine or l2, or switch to LanceDB.
Dim mismatch        | ConfigValidationError | Fix VectorSpec.dim.
Shard write failure | ShardCoverageError    | shard_retry; rerun.
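
The first two failures can often be caught before any shard is written. The following is a sketch of a pre-flight check, assuming records shaped like those in the Python example (dicts with "id" and "embedding"). The preflight function and the SUPPORTED_METRICS set are hypothetical and are not shardyfusion APIs; the library raises its own ConfigValidationError.

```python
SUPPORTED_METRICS = {"cosine", "l2"}  # the metrics this page lists as allowed


def preflight(records, dim, metric):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if metric not in SUPPORTED_METRICS:
        problems.append(f"unsupported metric {metric!r}")
    for i, r in enumerate(records):
        got = len(r["embedding"])
        if got != dim:
            problems.append(f"record {i} (id={r['id']!r}): dim {got}, expected {dim}")
    return problems


sample = [
    {"id": "a", "embedding": [0.0] * 384},
    {"id": "b", "embedding": [0.0] * 383},  # one element short
]
problems = preflight(sample, dim=384, metric="dot_product")
```

Running a check like this on a sample of the input is cheap compared with a failed distributed write, and the messages point at the offending record rather than at the config.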

See also