Build a unified KV+vector snapshot (sqlite-vec)¶
Use the sqlite-vec adapter to write one SQLite file per shard that holds both KV rows and vector index — the simplest unified backend.
When to use¶
- You want KV + vector under one file per shard.
- You're happy with sqlite-vec's metric set (`cosine`, `l2`).
- You want straightforward downloadable shards or range-read VFS.
When NOT to use¶
- You need `dot_product` — sqlite-vec rejects it; use composite LanceDB.
- You only need vector search — use vector sqlite-vec.
- You only need KV — use KV SQLite.
Install¶
```bash
uv add 'shardyfusion[unified-vector-sqlite,writer-python]'
```
unified-vector-sqlite = vector-sqlite + cel.
Minimal example¶
Python:

```python
from shardyfusion import HashShardedWriteConfig, PythonRecordInput, VectorSpec
from shardyfusion.writer.python import write_hash_sharded
from shardyfusion.sqlite_vec_adapter import SqliteVecFactory

vector_spec = VectorSpec(dim=384, metric="cosine")

config = HashShardedWriteConfig(
    num_dbs=16,
    s3_prefix="s3://my-bucket/snapshots/items",
    adapter_factory=SqliteVecFactory(vector_spec=vector_spec),
    vector_spec=vector_spec,
)

result = write_hash_sharded(
    records,
    config,
    PythonRecordInput(
        key_fn=lambda r: r["id"].encode(),
        value_fn=lambda r: r["payload"],
        vector_fn=lambda r: (r["id"], r["embedding"], None),
    ),
)
```
Spark:

```python
from shardyfusion import ColumnWriteInput, HashShardedWriteConfig, VectorColumnInput, VectorSpec
from shardyfusion.writer.spark import write_hash_sharded
from shardyfusion.sqlite_vec_adapter import SqliteVecFactory
from shardyfusion.serde import ValueSpec

vector_spec = VectorSpec(dim=384, metric="cosine")

config = HashShardedWriteConfig(
    num_dbs=16,
    s3_prefix="s3://my-bucket/snapshots/items",
    adapter_factory=SqliteVecFactory(vector_spec=vector_spec),
    vector_spec=vector_spec,
)

result = write_hash_sharded(
    df,
    config,
    ColumnWriteInput(
        key_col="id",
        value_spec=ValueSpec.binary_col("payload"),
        vector=VectorColumnInput(vector_col="embedding", id_col="id"),
    ),
)
```
Dask:

```python
from shardyfusion import ColumnWriteInput, HashShardedWriteConfig, VectorColumnInput, VectorSpec
from shardyfusion.writer.dask import write_hash_sharded
from shardyfusion.sqlite_vec_adapter import SqliteVecFactory
from shardyfusion.serde import ValueSpec

vector_spec = VectorSpec(dim=384, metric="cosine")

config = HashShardedWriteConfig(
    num_dbs=16,
    s3_prefix="s3://my-bucket/snapshots/items",
    adapter_factory=SqliteVecFactory(vector_spec=vector_spec),
    vector_spec=vector_spec,
)

result = write_hash_sharded(
    ddf,
    config,
    ColumnWriteInput(
        key_col="id",
        value_spec=ValueSpec.binary_col("payload"),
        vector=VectorColumnInput(vector_col="embedding", id_col="id"),
    ),
)
```
Ray:

```python
from shardyfusion import ColumnWriteInput, HashShardedWriteConfig, VectorColumnInput, VectorSpec
from shardyfusion.writer.ray import write_hash_sharded
from shardyfusion.sqlite_vec_adapter import SqliteVecFactory
from shardyfusion.serde import ValueSpec

vector_spec = VectorSpec(dim=384, metric="cosine")

config = HashShardedWriteConfig(
    num_dbs=16,
    s3_prefix="s3://my-bucket/snapshots/items",
    adapter_factory=SqliteVecFactory(vector_spec=vector_spec),
    vector_spec=vector_spec,
)

result = write_hash_sharded(
    ds,
    config,
    ColumnWriteInput(
        key_col="id",
        value_spec=ValueSpec.binary_col("payload"),
        vector=VectorColumnInput(vector_col="embedding", id_col="id"),
    ),
)
```
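In each example above, every record key is hashed to one of `num_dbs=16` shard files. For intuition, here is a minimal sketch of that kind of routing in plain Python — the hash shardyfusion actually uses is internal to the library, and `md5` here is purely illustrative:

```python
import hashlib

def shard_for_key(key: bytes, num_dbs: int) -> int:
    """Map a record key to a shard index with a stable hash (illustrative only)."""
    digest = hashlib.md5(key).digest()
    # Take the first 8 bytes as a big-endian unsigned int, then bucket modulo num_dbs.
    return int.from_bytes(digest[:8], "big") % num_dbs

# The same key always routes to the same shard, so a point lookup
# only needs to open one of the 16 files.
print(shard_for_key(b"item-42", num_dbs=16))
```

The stability of the mapping is what makes per-shard files downloadable and individually queryable: readers recompute the same hash to pick the right file.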
Configuration¶
- `SqliteVecFactory(vector_spec, page_size=4096, cache_size_pages=-2000, ...)` at `sqlite_vec_adapter.py:105`.
- Allowed metrics: `cosine`, `l2`. `dot_product` raises `ConfigValidationError`.
- The manifest records `vector.backend = "sqlite-vec"` automatically; `UnifiedShardedReader` dispatches accordingly.
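The metric choice matters because `cosine` normalizes away vector magnitude while `l2` does not, so the two can rank neighbors differently. A quick pure-Python illustration of the two distances (not shardyfusion code):

```python
import math

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: 0.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

q = [1.0, 0.0]
a = [2.0, 0.0]  # same direction as q, but twice the magnitude

print(cosine_distance(q, a))  # 0.0 — direction identical, magnitude ignored
print(l2(q, a))               # 1.0 — magnitude difference counts
```

If your embeddings are already unit-normalized, `cosine` and `l2` produce the same nearest-neighbor ordering, which makes the missing `dot_product` metric easy to work around in that common case.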
Functional properties¶
- One `.sqlite` file per shard.
- KV rows and vector index in the same file.
- Atomic publish.
Guarantees¶
- Successful return => both KV and vector queryable via `UnifiedShardedReader`.
Weaknesses¶
- No `dot_product`.
- Recall/QPS lower than LanceDB at large scale (no IVF/HNSW tuning surface).
Failure modes & recovery¶
| Failure | Surface | Recovery |
|---|---|---|
| Unsupported metric | `ConfigValidationError` | Use `cosine` or `l2`, or switch to LanceDB. |
| Dim mismatch | `ConfigValidationError` | Fix `VectorSpec.dim`. |
| Shard write failure | `ShardCoverageError` | `shard_retry`; rerun. |
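For transient shard write failures, the usual recovery is a bounded retry before rerunning the whole job. `shard_retry`'s actual semantics live inside shardyfusion; the wrapper below is a generic, hypothetical sketch of that pattern:

```python
import time

def retry_shard_write(write_fn, attempts: int = 3, backoff_s: float = 0.5):
    """Call write_fn up to `attempts` times with exponential backoff."""
    last_exc = None
    for i in range(attempts):
        try:
            return write_fn()
        except Exception as exc:  # in practice, catch the specific transport error
            last_exc = exc
            time.sleep(backoff_s * (2 ** i))
    raise last_exc

# Example: a write that fails twice with a transient error, then succeeds.
calls = {"n": 0}

def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("transient S3 error")
    return "ok"

print(retry_shard_write(flaky_write, attempts=3, backoff_s=0.0))  # ok
```

Because publish is atomic, a retried or rerun shard write cannot leave readers seeing a half-written snapshot.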
See also¶
- KV+Vector Overview
- Composite LanceDB
- Read -> Sync — `UnifiedShardedReader`
- Read -> Async — `AsyncUnifiedShardedReader`