Choosing a writer and backend

This section covers building sharded KV snapshots. Pick your writer based on your data source and infrastructure:

| Writer | Input type | Java required | Cluster required | Best for |
| --- | --- | --- | --- | --- |
| Python | `Iterable[T]` | No | No | Single-host, streaming, simplicity |
| Spark | PySpark `DataFrame` | Yes | Optional (local mode works) | Large-scale ETL, existing Spark pipeline |
| Dask | Dask `DataFrame` | No | Optional | Distributed scale-out without JVM |
| Ray | Ray `Dataset` | No | Optional | ML preprocessing pipelines, actor scheduling |

All writers share the same core behavior: deterministic routing, attempt-isolated paths, deterministic winner selection, two-phase publish, and run records. See KV Storage Overview for the conceptual model.
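
To make the shared behavior concrete, here is a minimal sketch of deterministic routing, attempt-isolated paths, and deterministic winner selection. It is an illustration only: the hash function, the path layout, the tie-break rule, and the names `route_key`, `attempt_path`, and `pick_winner` are all assumptions made for this example, not the library's actual implementation.

```python
import hashlib


def route_key(key: bytes, num_shards: int) -> int:
    """Map a key to a shard id with a stable hash.

    Assumption: SHA-256 is used here only because it is stable across
    processes (unlike Python's salted built-in hash()).
    """
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def attempt_path(prefix: str, run_id: str, shard_id: int, attempt: int) -> str:
    """Build a path that is unique per attempt (hypothetical layout).

    Retried tasks write to their own directory, so they never overwrite
    the output of an earlier attempt.
    """
    return f"{prefix}/runs/{run_id}/shard={shard_id:05d}/attempt={attempt:02d}"


def pick_winner(attempts: list[dict]) -> dict:
    """Choose one attempt per shard using a fixed ordering.

    Assumption: the lowest attempt number wins, with the path as a
    tie-breaker, so every observer picks the same winner.
    """
    return min(attempts, key=lambda a: (a["attempt"], a["path"]))


# The same key always lands on the same shard.
shard = route_key(b"user:42", num_shards=8)

# Two attempts for that shard; the winner does not depend on list order.
candidates = [
    {"attempt": 1, "path": attempt_path("s3://example-bucket/snap", "run-1", shard, 1)},
    {"attempt": 0, "path": attempt_path("s3://example-bucket/snap", "run-1", shard, 0)},
]
winner = pick_winner(candidates)
```

Because routing and winner selection depend only on their inputs, re-running a failed task or re-evaluating the attempts produces the same result, which is what lets a publish step safely reference exactly one output per shard.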

Choosing a backend

| Backend | Read-side access | When to use |
| --- | --- | --- |
| SlateDB (default) | Point-key `get` / `multi_get` | Lowest friction, LSM characteristics, default for most users |
| SlateDB (local) | Point-key `get` / `multi_get` | Writes to local disk, then bulk-uploads to S3; decouples write throughput from S3 latency |
| SQLite | Point-key + SQL queries + range-read VFS | Need SQL, single-file shards, or remote page-level access |

Backend selection is a single config swap: pass `adapter_factory=SqliteFactory()` instead of relying on the default. Everything else (routing, publishing, reading) works identically.
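
The swap can be pictured with the sketch below. Only the `adapter_factory` parameter and the `SqliteFactory` name come from the text above; `WriteConfig`, `SlateDbFactory`, `AdapterFactory`, `backend_name`, and the field names are hypothetical stand-ins defined inside the example so that it runs on its own. Check the library's API reference for the real class names and import paths.

```python
from dataclasses import dataclass, field
from typing import Protocol


class AdapterFactory(Protocol):
    """Hypothetical interface: anything that can name its backend."""

    def backend_name(self) -> str: ...


class SlateDbFactory:
    """Stand-in for the default backend factory (name is an assumption)."""

    def backend_name(self) -> str:
        return "slatedb"


class SqliteFactory:
    """Stand-in that mirrors the SqliteFactory name used in the text."""

    def backend_name(self) -> str:
        return "sqlite"


@dataclass
class WriteConfig:
    """Hypothetical writer config; only adapter_factory varies by backend."""

    num_shards: int
    s3_prefix: str
    adapter_factory: AdapterFactory = field(default_factory=SlateDbFactory)


# Default backend: no adapter_factory argument needed.
default_cfg = WriteConfig(num_shards=8, s3_prefix="s3://example-bucket/snapshots")

# SQLite backend: the only difference is the adapter_factory argument.
sqlite_cfg = WriteConfig(
    num_shards=8,
    s3_prefix="s3://example-bucket/snapshots",
    adapter_factory=SqliteFactory(),
)
```

Keeping the backend behind a factory means the shard count, output prefix, and every other setting stay identical between the two configurations.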