2026-05-04 slatedb 0.12 uniffi Migration¶
- Status: implemented
- Date: 2026-05-04
Summary¶
This engineering note documents the migration of shardyfusion's writer
and reader paths from the legacy synchronous slatedb top-level API
(slatedb.SlateDB, slatedb.SlateDBReader) to the async-only
uniffi-generated bindings under slatedb.uniffi shipped in
slatedb>=0.12,<0.13. It covers the sync→async bridge design, the
removal of read-side checkpoint pinning, the switch to opaque
shardyfusion-generated UUID checkpoint_id values, the
seal()-vs-checkpoint() Protocol change, the typed
SlateDbSettings configuration model, the new
iterator_chunk_size knob, and the perf microbenchmark scaffolding
introduced to guard against bridge-overhead regressions.
1. What problem is being solved or functionality being added by the changes?¶
slatedb 0.12 deleted the synchronous top-level Python API that
shardyfusion was built on. Every public method on Db, DbReader,
WriteBatch, and the iterator types is now async def, and the
read-side checkpoint_id argument no longer exists. The migration
needed to:
- Replumb the hot path onto an async-only library while
preserving shardyfusion's synchronous
DbAdapter and ShardReader Protocols, because Spark/Dask executors and the Python writer's multiprocessing workers are sync code paths. Pushing async upward would have rippled into every framework integration.
- Replace the read-side checkpoint pinning model that no longer exists in slatedb. The previous design relied on DbReaderBuilder accepting a checkpoint_id to pin a specific on-disk snapshot.
- Keep the public shardyfusion API stable so that downstream Spark/Dask/Ray writers and existing snapshots don't churn.
- Avoid catastrophic per-row scan overhead introduced by the sync→async bridge cost (~15–40 µs per round-trip; ~30× slowdown for naive per-row iteration).
- Make missing-symbol failures actionable instead of producing AttributeError stack traces deep inside writer/reader code when slatedb's surface drifts again.
2. What design decisions were considered with their pros and cons and trade offs?¶
Decision 1: How to bridge sync shardyfusion Protocols to async uniffi¶
Option A: Generate a sync wrapper class (SyncDb) around uniffi¶
Pros:
- Looks idiomatic at call sites (db.write(batch) instead of
run_coro(db.write(batch))).
Cons:
- Doubles the surface to maintain: every uniffi method needs a sync mirror.
- Obscures where async actually happens, making it easy to accidentally call from inside an event loop and deadlock.
- Adds a wrapper-object lifetime to track on top of the underlying uniffi object.
Option B: Process-global daemon-thread asyncio loop with run_coro helper (chosen)¶
Pros:
- One file (shardyfusion/_slatedb_runtime.py) owns the bridge.
- Honest at call sites: run_coro(reader.get(k)) makes the
async hop visible.
- Daemon thread gives us process-lifetime semantics without
shutdown coordination.
- Cannot be hijacked by a request-scoped loop (the loop is private).
Cons:
- Per-call cost (~15–40 µs) is non-trivial for hot paths.
- Tests that need a fake event loop must monkey-patch run_coro.
Option C: Make the loop user-pluggable¶
Pros:
- Lets advanced callers reuse their own loop.
Cons:
- Users will accidentally pin it to a request-scoped loop and deadlock on shutdown.
- The Protocol contract becomes "sync, but configurably so", which is a worse abstraction.
We chose Option B. The bridge is shardyfusion-owned and non-customizable; cost is amortized by the iterator chunking decision below.
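A minimal sketch of the chosen bridge, assuming the module layout described above (shardyfusion/_slatedb_runtime.py); the internals shown here are illustrative rather than the committed implementation:

```python
# shardyfusion/_slatedb_runtime.py (sketch): one process-global loop on a
# daemon thread, plus the run_coro helper used at every sync call site.
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")

_loop: asyncio.AbstractEventLoop | None = None
_loop_lock = threading.Lock()


def _get_loop() -> asyncio.AbstractEventLoop:
    """Lazily start the private, process-lifetime event loop."""
    global _loop
    with _loop_lock:
        if _loop is None:
            _loop = asyncio.new_event_loop()
            threading.Thread(
                target=_loop.run_forever, name="slatedb-bridge", daemon=True
            ).start()
        return _loop


def run_coro(coro: Coroutine[Any, Any, T]) -> T:
    """Submit an async uniffi call from sync code and block for its result."""
    future = asyncio.run_coroutine_threadsafe(coro, _get_loop())
    return future.result()
```

Call sites stay honest about the async hop, e.g. run_coro(db.write(batch)).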
Decision 2: Iterator chunking knob — where it lives¶
Option A: Push chunking into DbAdapter / writer side¶
Cons:
- Writes already batch via WriteBatch; adding a second knob
invites confusion about which path it controls.
Option B: iterator_chunk_size only on SlateDbReaderFactory (chosen)¶
Pros:
- Single place to tune, single place to document.
- The default of 1024 amortizes the bridge cost across rows of typical size; the failure mode is "uses more memory per chunk", not a correctness problem.
Cons:
- Callers with very large values must override the default explicitly.
We chose Option B.
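To make the amortization concrete, here is an illustrative sketch of the chunked drain pattern (not the exact scan_iter code), assuming the 0.12.1 DbIterator surface where await iterator.next() returns Optional[KeyValue]:

```python
# One run_coro hop per chunk instead of one per row.
from typing import Any, Iterator

from shardyfusion._slatedb_runtime import run_coro  # bridge helper from Decision 1


async def _drain_chunk(iterator: Any, chunk_size: int) -> list[Any]:
    """Pull up to chunk_size rows inside a single async hop."""
    chunk: list[Any] = []
    while len(chunk) < chunk_size:
        kv = await iterator.next()
        if kv is None:
            break
        chunk.append(kv)
    return chunk


def scan_rows(iterator: Any, chunk_size: int = 1024) -> Iterator[Any]:
    """Yield rows one at a time while paying the bridge cost once per chunk."""
    while True:
        chunk = run_coro(_drain_chunk(iterator, chunk_size))
        if not chunk:
            return
        yield from chunk
```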
Decision 3: How to identify shards now that slatedb has no read-side checkpoint API¶
Option A: Hash the materialized SlateDB / SQLite file (legacy SQLite/SQLiteVec behaviour)¶
Pros:
- Content-addressable; identical bytes → identical id.
Cons:
- Forces a read-back pass on the writer hot path on every shard close.
- Not actually needed for correctness given shardyfusion's invariants (single writer per SlateDB; manifest published only after writers finish; no post-publish updates).
Option B: Opaque uuid.uuid4().hex stamped by the writer (chosen)¶
Pros:
- Zero I/O on the writer hot path.
- Uniqueness guaranteed without reading any bytes.
- Centralized in
shardyfusion._checkpoint_id.generate_checkpoint_id().
- Cache identity for SQLite/SQLiteVec/LanceDB factories is
preserved.
Cons:
- Two writes that produce identical bytes get different ids, but this never happens under our invariants.
Option C: Re-hash from the manifest after publish¶
Cons:
- The manifest doesn't see the bytes; we'd need a separate pass.
- Adds an ordering dependency between shard finalize and manifest build.
We chose Option B. SlateDbReaderFactory accepts
checkpoint_id for Protocol symmetry but ignores it — cache identity
for SlateDB shards collapses to db_url only.
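The id helper is deliberately tiny; a sketch of the centralized function named above (the body is illustrative):

```python
# shardyfusion/_checkpoint_id.py (sketch)
import uuid


def generate_checkpoint_id() -> str:
    """Return an opaque, unique shard id with zero I/O and no hashing."""
    return uuid.uuid4().hex
```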
Decision 4: Adapter Protocol — seal() vs checkpoint()¶
The old Protocol had checkpoint() -> str | None, which told the
adapter "flush, finalize, and tell me your checkpoint id". With
Decision 3 the writer now stamps the id itself; adapters only need
to flush + finalize.
Option A: Keep checkpoint() -> str | None and ignore the return¶
Cons:
- Custom adapters returning a hash silently get their value dropped; there is no way to notice the contract changed.
Option B: Rename to seal() -> None (chosen)¶
Pros:
- Compile-time / Protocol-check failure for any adapter still on the old contract.
- The method name accurately describes the new responsibility.
Cons:
- Source-incompatible for custom adapters, but the alternative (a silent behavior change) is worse.
We chose Option B.
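A hedged sketch of the new adapter contract; only seal() is taken from the text above, the other method is illustrative:

```python
from typing import Protocol


class DbAdapter(Protocol):
    def write_rows(self, rows: list[tuple[bytes, bytes]]) -> None:
        """Append rows to the open shard (hypothetical method, shown for context)."""
        ...

    def seal(self) -> None:
        """Flush and finalize the shard. Returns nothing: the writer stamps the
        checkpoint id itself (Decision 3)."""
        ...
```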
Decision 5: Symbol resolution layer¶
Option A: import slatedb.uniffi directly at every call site¶
Cons:
- import shardyfusion blows up on machines without slatedb.
- Missing symbols become AttributeError deep in writer code.
- Tests must patch every site individually.
Option B: Single choke point in _slatedb_symbols.py (chosen)¶
Pros:
- Lazy import isolates the optional-dependency check to one
try/except.
- Missing symbols become a single
DbAdapterError("slatedb.uniffi.X is unavailable").
- Tests monkey-patch sys.modules["slatedb"] and
sys.modules["slatedb.uniffi"] once and every shardyfusion
call site picks up the fake.
We chose Option B.
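A sketch of the choke point; _import_uniffi() is the real hook mentioned in the Gotchas below, while the get_symbol() helper and the local DbAdapterError definition are illustrative (the real error type may live in a shared errors module):

```python
# shardyfusion/_slatedb_symbols.py (sketch)
from typing import Any


class DbAdapterError(RuntimeError):
    """Raised when slatedb or one of its uniffi symbols is unavailable."""


def _import_uniffi() -> Any:
    """Lazy import so that `import shardyfusion` works without slatedb installed."""
    try:
        import slatedb.uniffi as uniffi
    except ImportError as exc:
        raise DbAdapterError("slatedb>=0.12,<0.13 is required for SlateDB shards") from exc
    return uniffi


def get_symbol(name: str) -> Any:
    """Resolve e.g. 'DbReaderBuilder'; a missing symbol becomes one clear error."""
    uniffi = _import_uniffi()
    try:
        return getattr(uniffi, name)
    except AttributeError as exc:
        raise DbAdapterError(f"slatedb.uniffi.{name} is unavailable") from exc
```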
Decision 6: Configuration model — SlateDbSettings¶
Option A: Keep accepting the JSON-ish dict from the legacy API¶
Cons:
- Typo'd keys are silently dropped; nothing validates the dict against the new typed Settings class in slatedb 0.12.
Option B: Typed SlateDbSettings dataclass with raw_overrides: dict[str, Any] escape hatch + legacy-dict adapter that emits DeprecationWarning (chosen)¶
Pros:
- Typos caught at construction time.
- Escape hatch for fields shardyfusion hasn't modeled.
- One-release deprecation cycle keeps the migration non-breaking
for downstream callers.
We chose Option B.
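An illustrative sketch of the settings model; raw_overrides and the deprecation adapter come from the text above, while the concrete option fields are assumptions about which slatedb settings shardyfusion models:

```python
import warnings
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class SlateDbSettings:
    flush_interval_ms: int | None = None   # assumed modeled field
    l0_sst_size_bytes: int | None = None   # assumed modeled field
    raw_overrides: dict[str, Any] = field(default_factory=dict)  # escape hatch


def settings_from_legacy_dict(opts: dict[str, Any]) -> SlateDbSettings:
    """One-release adapter for the old JSON-ish dict shape."""
    warnings.warn(
        "dict-based slatedb options are deprecated; pass SlateDbSettings",
        DeprecationWarning,
        stacklevel=2,
    )
    modeled = {k: opts[k] for k in ("flush_interval_ms", "l0_sst_size_bytes") if k in opts}
    extra = {k: v for k, v in opts.items() if k not in modeled}
    return SlateDbSettings(**modeled, raw_overrides=extra)
```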
Decision 7: env_file + db_url resolution¶
slatedb 0.12's ObjectStore.resolve(db_url) reads env at resolve
time. We need env present at that moment, scoped to the resolution
call.
Option A: Pull in python-dotenv¶
Cons:
- Ships a CLI and global config-file behavior we don't want.
- Adds a third-party dependency for ~30 lines of logic.
Option B: In-house apply_env_file() context manager (chosen)¶
Pros:
- Scopes env mutations to the resolve call and restores them on exit.
- No new dependency.
We chose Option B.
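A sketch of the context manager, assuming a simple KEY=VALUE env-file format; the function name matches the text above, the parsing details are illustrative:

```python
import contextlib
import os
from typing import Iterator


@contextlib.contextmanager
def apply_env_file(path: str | None) -> Iterator[None]:
    """Apply KEY=VALUE lines for the duration of the block, then restore."""
    if path is None:
        yield
        return
    previous: dict[str, str | None] = {}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            previous[key] = os.environ.get(key)
            os.environ[key] = value
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old
```

ObjectStore.resolve(db_url) is then called inside the with block, so credentials from the env file are visible for exactly that call.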
Decision 8: KeyRange over kwargs in scan_iter¶
A late perf-test failure caught that uniffi's DbReader.scan takes
a single KeyRange positional — not start=/end= kwargs. We
added get_key_range_class() and now build a
KeyRange(start=..., start_inclusive=True, end=..., end_inclusive=False)
once per call, mirroring half-open [start, end) Python
semantics. We also moved iterator construction outside the per-chunk
drain loop. The previous code re-opened the iterator on every
chunk, which would have silently re-served the first N rows forever
once a shard exceeded iterator_chunk_size.
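A sketch of the corrected shape, reusing the _drain_chunk helper from the Decision 2 sketch; get_key_range_class() and run_coro are the helpers named above, everything else is illustrative:

```python
def scan_iter(reader, start: bytes, end: bytes, chunk_size: int = 1024):
    KeyRange = get_key_range_class()        # resolved via _slatedb_symbols
    # Half-open [start, end) semantics, mirroring Python slicing.
    key_range = KeyRange(start=start, start_inclusive=True, end=end, end_inclusive=False)
    iterator = run_coro(reader.scan(key_range))   # opened once, outside the drain loop
    while True:
        chunk = run_coro(_drain_chunk(iterator, chunk_size))
        if not chunk:
            return
        yield from chunk
```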
Decision 9: Performance microbenchmarks — gating¶
The bridge overhead is the single largest correctness-adjacent risk in this migration.
Option A: Run perf benchmarks on every CI run¶
Cons:
- Adds wall-clock time and flakiness to every PR.
- Hard to tune budgets that survive shared-runner variance.
Option B: Marker-gated, on-demand just perf recipe (chosen)¶
Pros:
- Deliberate, focused check when investigating bridge regressions.
- addopts = "-ra -m 'not perf'" in pyproject.toml keeps default
pytest runs clean.
- Loose budgets (~5× measured) catch order-of-magnitude
regressions without flaking.
We chose Option B.
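An illustrative shape for one such marker-gated benchmark under tests/integration/perf/; the budget constant below is an assumption in the spirit of the "~5× measured" guidance, not the committed number:

```python
import time

import pytest

from shardyfusion._slatedb_runtime import run_coro


async def _noop() -> None:
    return None


@pytest.mark.perf
def test_bridge_round_trip_budget():
    n = 10_000
    start = time.perf_counter()
    for _ in range(n):
        run_coro(_noop())                      # pure bridge cost, no slatedb work
    per_call_us = (time.perf_counter() - start) / n * 1e6
    # Loose budget (~5x the ~18 µs measured) catches order-of-magnitude regressions.
    assert per_call_us < 100
```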
Decision 10: Async migration scope¶
Option A: Migrate shardyfusion to async end-to-end¶
Cons:
- Spark/Dask/Ray worker contracts are sync.
- This is a separate, much larger project.
Option B: Keep shardyfusion's Protocols sync; bridge per-call (chosen)¶
We chose Option B — see Decision 1.
3. What is the impact of these changes (covering testability, performance, and complexity)?¶
Testability¶
- _slatedb_symbols.py gives tests one place to patch (sys.modules["slatedb"] + sys.modules["slatedb.uniffi"]) instead of N call sites; see the sketch after this list.
- New helpers shardyfusion.testing.open_slatedb_db() and open_slatedb_reader() wrap the canonical "build → resolve → run_coro" pattern so test seed code mirrors production exactly.
- The integration-test file URL remap pattern (map_s3_db_url_to_file_url(db_url, object_store_root) followed by SlateDbReaderFactory() delegation) is now the standard for any test that materializes data on local disk under an s3:// URL.
- The perf suite (tests/integration/perf/, @pytest.mark.perf, just perf) provides a deliberate guardrail for bridge-overhead regressions without polluting default CI.
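A sketch of that patching pattern using pytest's monkeypatch fixture; the fake symbol set mirrors the availability check in §4, and the test body itself is illustrative:

```python
import sys
import types


def test_reader_uses_fake_uniffi(monkeypatch):
    # Build a fake uniffi module exposing the symbols shardyfusion resolves.
    fake_uniffi = types.ModuleType("slatedb.uniffi")
    for name in ("Db", "DbBuilder", "DbReader", "DbReaderBuilder", "WriteBatch",
                 "ObjectStore", "Settings", "FlushOptions", "FlushType", "KeyRange"):
        setattr(fake_uniffi, name, type(name, (), {}))
    fake_slatedb = types.ModuleType("slatedb")
    fake_slatedb.uniffi = fake_uniffi
    # Both entries are required: _slatedb_symbols._import_uniffi() imports
    # "slatedb.uniffi", so patching only the top-level module is not enough.
    monkeypatch.setitem(sys.modules, "slatedb", fake_slatedb)
    monkeypatch.setitem(sys.modules, "slatedb.uniffi", fake_uniffi)
    # ... exercise shardyfusion code that resolves symbols through the choke point.
```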
Performance¶
- Per-call bridge cost: ~15–40 µs measured. Acceptable for get and write (already batched via WriteBatch).
- Per-row iteration without chunking: ~30× slowdown vs. an in-process iterator. The iterator_chunk_size=1024 default recovers most of it; perf budgets catch regressions.
- The writer hot path lost the read-back hashing pass that the legacy SHA-256 checkpoint_id required for SQLite/SQLiteVec, a small but real win.
Complexity¶
- Net +1 module (_slatedb_runtime.py) and +1 module (_checkpoint_id.py); slight expansion of _slatedb_symbols.py.
- Net −1 read-side concept (checkpoint pinning) that was never needed under our invariants.
- Bridge call sites are visible (run_coro(...)): readers can trace where async actually happens.
- Configuration surface narrowed: typed SlateDbSettings with one raw_overrides escape hatch instead of an open-ended dict.
4. API delta: slatedb 0.11.x → 0.12.1¶
| Concern | 0.11.x (legacy) | 0.12.1 (uniffi) |
|---|---|---|
| Module surface | slatedb.SlateDB, slatedb.SlateDBReader | slatedb.uniffi.{Db, DbBuilder, DbReader, DbReaderBuilder, WriteBatch, ObjectStore, Settings, FlushOptions, FlushType, KeyRange, DbIterator, KeyValue} |
| Sync/async | Sync methods | All methods async def |
| Open writer | SlateDB(local_dir, url=..., **opts) | await DbBuilder(path, store).build() where store = ObjectStore.resolve(url) |
| Open reader | SlateDBReader(local_dir, url=..., checkpoint_id=...) | await DbReaderBuilder(path, store).build() (no checkpoint_id arg) |
| Write batch | db.write(batch) (sync) | await db.write(WriteBatch()) |
| Flush WAL | db.flush() / db.flush_with_options("wal") | await db.flush_with_options(FlushOptions(flush_type=FlushType.WAL)) |
| Range scan | reader.scan(start=..., end=...) returning sync iterator | await reader.scan(KeyRange(start=..., start_inclusive=True, end=..., end_inclusive=False)) returning async DbIterator |
| Iterate | for kv in scan_result (sync) | await iterator.next() returning Optional[KeyValue] (per-row only; no next_batch in 0.12.1) |
| Checkpoint create | db.create_checkpoint() returning hash str | Removed; no read-side checkpoint pinning API exists |
| Close | implicit GC | await db.shutdown() for Db; DbReader has no explicit close |
| Settings | dict/kwargs to constructor | Settings(...) typed object passed via builder |
| Object store creds | env vars + url string | ObjectStore.resolve(url) reads env at resolve time |
| Python version floor | 3.9+ | 3.11+ (uniffi-generated bindings) |
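For orientation, a hedged sketch of the canonical "resolve → build → run_coro" open pattern on the 0.12.1 surface; the empty-string path argument follows the DbBuilder usage described in §7.2, and error handling is omitted:

```python
from slatedb.uniffi import DbBuilder, DbReaderBuilder, ObjectStore

from shardyfusion._slatedb_runtime import run_coro


def open_writer(db_url: str):
    store = ObjectStore.resolve(db_url)            # reads env at resolve time
    return run_coro(DbBuilder("", store).build())  # async build bridged to sync


def open_reader(db_url: str):
    store = ObjectStore.resolve(db_url)
    return run_coro(DbReaderBuilder("", store).build())  # no checkpoint_id arg
```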
Symbol availability check¶
```python
from importlib.metadata import version
print(version("slatedb"))  # '0.12.1'

from slatedb.uniffi import (
    Db, DbBuilder, DbReader, DbReaderBuilder,
    WriteBatch, ObjectStore, Settings,
    FlushOptions, FlushType, KeyRange,
)
```
slatedb<0.12 lacks the uniffi submodule entirely; slatedb==0.11.1
satisfied a naive slatedb<0.13 constraint and triggered the
migration's first false-start. The pin is now
slatedb>=0.12,<0.13 in both pyproject.toml extras and main deps.
5. Observed-but-deliberate gotchas¶
- DbReader.scan requires a KeyRange, not kwargs. Documented in the _SlateDbReaderHandle docstring and AGENTS.md Gotchas.
- DbIterator exposes only next in 0.12.1, no next_batch. scan_iter keeps a next_batch fast path behind try/except AttributeError for forward compatibility with later slatedb releases that may add it.
- Per-call bridge cost is real (~15–40 µs); any future hot-path call must batch through run_coro once, not per item.
- SlateDbReaderFactory.checkpoint_id is accepted but ignored. Document this in any new factory subclass or wrapper to avoid misleading callers.
- Test patching must hit both sys.modules["slatedb"] and sys.modules["slatedb.uniffi"]; patching only the top-level module doesn't intercept _slatedb_symbols._import_uniffi().
6. What was explicitly not done, and why¶
- Did not introduce a sync wrapper class around uniffi. See Decision 1, Option A.
- Did not move shardyfusion to async end-to-end. See Decision 10.
- Did not pin slatedb to an exact version. >=0.12,<0.13 lets patch releases through; the symbol-resolution layer makes any newly renamed class fail at one obvious site.
- Did not preserve content-addressed checkpoint IDs. See Decision 3: the single-writer + serial-publish invariant makes content addressing unnecessary, and removing it deleted a write-time hashing pass we were paying for on every shard close.
7. Post-merge audit¶
After the migration landed, two concerns were raised in review and investigated quantitatively before closing the work.
7.1 Bridge-loop contention¶
Concern. All synchronous slatedb operations (writer + sync reader) funnel through one process-global asyncio loop running on a single daemon thread. Spark, Dask, Ray, and Python writers can run many worker threads inside one Python process — does the shared loop serialise them?
Topology. Cluster writers (mapPartitionsWithIndex for Spark,
analogous for Dask/Ray) put one shard per Python worker process.
Inside one process the partition writer is itself sequential, so
there is at most one in-flight write_batch per process anyway —
the loop is not contended. Multi-shard-per-process only occurs in
the Python writer's parallel=True mode, which uses multiprocessing
spawn; each subprocess has its own loop.
The interesting case is a single process serving many concurrent
sync get/scan calls — e.g. a FastAPI app wrapping
ConcurrentShardedReader with a thread pool.
Measurement. A microbenchmark on the local file:// backend
(see commit log; not committed as a perf test because the absolute
numbers are FS-dependent):
| Topology | per-op latency (one write_batch of 100 rows) |
|---|---|
| 1 thread, shared bridge loop | 101.10 ms |
| 2 threads, shared bridge loop, separate DBs | 101.09 ms |
| 4 threads, shared bridge loop, separate DBs | 101.12 ms |
| 8 threads, shared bridge loop, separate DBs | 101.18 ms |
| 8 threads, separate loops, separate DBs | 101.13 ms |
Pure bridge cost (no slatedb work) under the same loop:
| Topology | per-call latency | aggregate ops/s |
|---|---|---|
| 1 thread | 18.42 µs | 54,300 |
| 8 threads (shared loop) | 16.05 µs | 62,300 |
Findings.
- The bridge contributes ~18 µs per call. A typical write_batch spends ~101 ms in slatedb, so the bridge is <0.02 % of write latency. For point gets it's a larger fraction but still well below the 1 ms budget set in the perf tests.
- Adding threads with separate loops (option C) gives no throughput improvement either, which means slatedb's internal Tokio runtime is already serialising per-DB writes; the bridge is not the bottleneck even in principle.
- Aggregate bridge throughput actually increases slightly with threads (62k > 54k ops/s) because the loop amortises the wakeup cost across multiple inflight submissions.
Verdict. Single shared bridge loop is correct. No mitigation
needed. The cost of switching to per-thread loops would be losing
uniffi's set_event_loop registration invariant (uniffi binds Rust
async tasks to the loop that created the resource) for no
measurable benefit.
7.2 What did the migration give up?¶
A line-by-line audit of pre-0.12 capabilities versus the current adapter surface:
| Pre-0.12 capability | Post-0.12 status | Operational impact |
|---|---|---|
| db.create_checkpoint(scope="durable") returning a slatedb-managed checkpoint id | Removed. uniffi 0.12 has no checkpoint create/pin API. Writer stamps an opaque uuid4().hex. | Manifests still record a per-shard id, so cleanup and winner-selection are unaffected. Lost: ability to pin-read a specific historical checkpoint of a SlateDB shard via the engine. Safe under the single-writer + serial-publish + S3-strong-consistency invariant: shards at db_url are immutable after publish, so readers never need a checkpoint pin. |
| Content-addressed checkpoint_id (SHA-256 of materialized DB) | Replaced by uuid4().hex. | _winner_sort_key is (attempt, task_attempt_id, db_url); checkpoint_id was never consulted for tiebreaks. No regression. External consumers that persisted SHA-256 fingerprints must switch to comparing db_bytes() or computing their own digest from the downloaded shard. |
| Reader-side checkpoint_id pinning (with_checkpoint_id) | No equivalent in uniffi 0.12. SlateDbReaderFactory.checkpoint_id accepted for Protocol symmetry; ignored at runtime. | Safe under the immutability invariant; if a future workflow needs reader pinning we will need a SlateDB upstream change first. |
| flush_with_options("wal") short string form | Replaced with typed FlushOptions(flush_type=FlushType.WAL). | Cosmetic; same semantics. |
| Free-form JsonObject settings forwarded as JSON to slatedb | Now typed SlateDbSettings with raw_overrides escape hatch. | Net positive: typed surface, IDE help. Legacy dict shape is rejected at the type level (library not yet released; no migration cycle needed). |
| SlateDB(path, url=..., env_file=..., settings=...) synchronous constructor | DbBuilder("", ObjectStore.resolve(db_url)) + Settings.set(...) per option, opened via await builder.build() through the bridge. | More verbose, more explicit. local_dir is now unused for the SlateDB backend (data goes straight to object store) but is still threaded through factories for symmetry with SQLite/SQLiteVec/LanceDB adapters. |
| Db.close() | Renamed Db.shutdown(); awaitable. | Adapter close() calls run_coro(self._db.shutdown()). Same lifecycle semantics. |
| DbIterator.next_batch(n) | Not present in 0.12.1; only next(). | scan_iter retains a next_batch try/except fast path for forward compatibility; chunking happens in Python via iterator_chunk_size (default 1024 on SlateDbReaderFactory). |
| Sync I/O directly from worker thread | Now goes through bridge loop. | ~18 µs per call; mitigated by batching. See §7.1. |
| Pre-migration DbAdapter.checkpoint() -> str \| None | Renamed to seal() -> None; checkpoint id stamped by writer. | Custom adapters must rename. Documented in CHANGELOG, AGENTS.md Gotchas, and adapter-authoring guide. |
Net assessment. All losses are documented, and either (a) unused under the existing invariants (checkpoint pinning, content addressing) or (b) cosmetic (constructor shape, flush options). The migration removed a write-time SHA-256 pass and added ~18 µs of bridge overhead per call — net positive on the hot path.