Read a SQLite snapshot synchronously

Use ShardedReader with a SQLite reader factory to read sharded SQLite snapshots synchronously. The factory you pass selects one of two sub-flavors:

  • Download-and-cache (SqliteReaderFactory) — the entire shard .db is downloaded once, then all reads are local.
  • Range-read VFS (SqliteRangeReaderFactory, requires apsw) — SQLite issues HTTP range GETs against the S3 object; no full download required.

When to use

  • Snapshot was built with the SQLite adapter (any writer flavor).
  • You're in synchronous code.
  • Pick range-read when shards are large and you only touch a few keys per shard.
  • Pick download-and-cache when shards are small and you want lowest per-key latency once warm.

When NOT to use

  • The snapshot was built with a different adapter (e.g. SlateDB); use that adapter's reader factory.
  • You're in asynchronous code; this reader is synchronous.
  • You repeatedly scan whole shards over range-read, where per-page S3 GETs amplify cost (see Non-functional properties).

Install

# Download-and-cache (uses stdlib sqlite3)
uv add 'shardyfusion[read-sqlite]'

# Range-read VFS (requires apsw + obstore)
uv add 'shardyfusion[read-sqlite-range]'

# Adaptive (auto-pick download or range per snapshot — recommended default).
# Composes both backends so AdaptiveSqliteReaderFactory can resolve to either tier.
uv add 'shardyfusion[read-sqlite-adaptive]'
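
The adaptive factory is not shown in the examples below; a minimal sketch, assuming AdaptiveSqliteReaderFactory is exported from shardyfusion.sqlite_adapter alongside the other two factories and needs no constructor arguments:

from shardyfusion import ShardedReader
from shardyfusion.sqlite_adapter import AdaptiveSqliteReaderFactory  # assumed export location

reader = ShardedReader(
    s3_prefix="s3://my-bucket/snapshots/users",
    local_root="/var/cache/shardy/users",
    reader_factory=AdaptiveSqliteReaderFactory(),  # resolves to download or range per snapshot
)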

Minimal example

Download-and-cache

from shardyfusion import ShardedReader
from shardyfusion.sqlite_adapter import SqliteReaderFactory

reader = ShardedReader(
    s3_prefix="s3://my-bucket/snapshots/users",
    local_root="/var/cache/shardy/users",
    reader_factory=SqliteReaderFactory(),
)

value = reader.get(b"user-123")
many = reader.multi_get([b"user-1", b"user-2"])

Range-read VFS

from shardyfusion.sqlite_adapter import SqliteRangeReaderFactory

reader = ShardedReader(
    s3_prefix="s3://my-bucket/snapshots/users",
    local_root="/var/cache/shardy/users",
    reader_factory=SqliteRangeReaderFactory(page_cache_pages=1024),
)

The range-read VFS translates each SQLite page fetch (default 4 KiB) into an S3 Range request, with an LRU page cache in memory.
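
To budget cache memory, multiply page_cache_pages by the 4 KiB page size; a quick sizing sketch under that default:

PAGE_SIZE = 4 * 1024                  # default SQLite page size, per the note above
budget_bytes = 64 * 1024 * 1024       # target: 64 MiB of in-memory page cache
pages = budget_bytes // PAGE_SIZE     # -> 16384 pages

factory = SqliteRangeReaderFactory(page_cache_pages=pages)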

Configuration

The reader constructor and full API surface (get, multi_get, route_key, shard_for_key, reader_for_key, refresh, snapshot_info, shard_details, health) are identical to the SlateDB variant — see sync SlateDB. The only thing that changes is the reader_factory.
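
A quick tour of that shared surface, reusing the reader from the examples above (exact return shapes and refresh semantics follow the SlateDB docs; they are only printed here):

print(reader.route_key(b"user-123"))     # routing decision for one key
print(reader.shard_for_key(b"user-123"))
print(reader.snapshot_info())            # pinned snapshot metadata
print(reader.health())
reader.refresh()                         # pick up a newer snapshot, per the SlateDB docs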

SqliteRangeReaderFactory extra parameters:

Field                   Default   Purpose
page_cache_pages        1024      Page-cache LRU size (~4 MiB at 4 KiB/page); 0 disables caching.
s3_connection_options   None      Endpoint URL, region, retries.
credential_provider     None      S3 credentials.
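
The concrete shapes of s3_connection_options and credential_provider are not spelled out here; a hypothetical sketch assuming the connection options are a plain dict (check the SlateDB configuration page for the real types):

factory = SqliteRangeReaderFactory(
    page_cache_pages=2048,  # ~8 MiB at 4 KiB/page
    # Hypothetical dict shape for a local MinIO endpoint; the accepted
    # keys here are an assumption, not the documented API.
    s3_connection_options={"endpoint_url": "http://localhost:9000", "region": "us-east-1"},
)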

Single-shard access (SQL & scans)

Borrow a shard handle to run arbitrary SQL on a single shard:

with reader.reader_for_key(b"user-123") as handle:
    cur = handle.reader.connection.cursor()
    # Count keys with the "user-" prefix inside this one shard.
    cur.execute("SELECT count(*) FROM kv WHERE k LIKE ?", (b"user-%",))
    print(cur.fetchone())

Download-and-cache exposes the underlying stdlib sqlite3.Connection; range-read exposes the APSW connection. Both are reachable through handle.reader.connection.
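
If you need to branch on which flavor you got, checking the connection type is enough; a small sketch:

import sqlite3

with reader.reader_for_key(b"user-123") as handle:
    conn = handle.reader.connection
    if isinstance(conn, sqlite3.Connection):
        print("download-and-cache: stdlib sqlite3 connection")
    else:
        print("range-read: apsw connection")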

You can also enumerate shards:

for shard in reader.shard_details():
    print(shard.db_id, shard.db_url, shard.row_count)

The db_url is a stable S3 URL — you can download it independently or open it in any SQLite tool.
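
For example, to pull one shard down yourself and open it in the stdlib sqlite3 module; a sketch assuming boto3 is on hand (any S3 client works) and that db_url is a plain s3://bucket/key URL:

from urllib.parse import urlparse
import sqlite3
import boto3  # assumption: any client that can download the S3 object works

shard = next(iter(reader.shard_details()))
parsed = urlparse(shard.db_url)
bucket, key = parsed.netloc, parsed.path.lstrip("/")

boto3.client("s3").download_file(bucket, key, "/tmp/shard.db")
conn = sqlite3.connect("/tmp/shard.db")
print(conn.execute("SELECT count(*) FROM kv").fetchone())  # kv table per the example above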

Functional properties

  • Download-and-cache: first access pulls the whole shard .db to local_root; subsequent reads are local-disk-fast.
  • Range-read VFS: per-page S3 GETs with an LRU cache; cold-cache latency higher but no full-shard download.
  • Same routed API as SlateDB (get, multi_get, refresh, etc.).
  • Direct SQL on any shard via reader_for_key().reader.connection.

Non-functional properties

                  Download-and-cache           Range-read
Disk usage        sum(shard_size) once warm    ~zero; ~page_cache_pages × 4 KiB of memory
Latency (cold)    Full shard download          One S3 GET per page (~tens of ms)
Latency (warm)    Local-disk SQLite            In-memory page-cache hit
S3 GET cost       One GET per shard            One GET per uncached page; amplifies on scans, so prefer it for point lookups

Guarantees

  • Same routing/snapshot pinning guarantees as SlateDB. See sync SlateDB.
  • Both reader strategies see the same .db file — the byte content is identical; only the access path differs.

Weaknesses

  • Download-and-cache: cold-shard latency proportional to shard size; disk pressure proportional to total snapshot size.
  • Range-read: higher per-key latency, more S3 GETs, requires apsw (LGPL).
  • SQLite has no built-in compression — shard size = raw (key, value) bytes plus index overhead.

Failure modes & recovery

Failure                                 Surface                               Recovery
Disk full (download-and-cache)          DbAdapterError                        Free disk or switch to range-read.
Range-read 416 / connection drop        DbAdapterError (wrapped S3VfsError)   Retry; check S3 connectivity.
Missing _CURRENT / malformed manifest   Same as SlateDB                       See sync SlateDB.
apsw not installed for range-read       ImportError at factory call           uv add 'shardyfusion[read-sqlite-range]'.
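
A retry wrapper for the transient range-read failures above might look like this; a sketch that assumes DbAdapterError is importable from the top-level shardyfusion package:

import time

from shardyfusion import DbAdapterError  # assumed export location

def get_with_retry(reader, key, attempts=3):
    # Retry transient range-read failures (dropped connections, 416s)
    # with simple exponential backoff; re-raise on the final attempt.
    for attempt in range(attempts):
        try:
            return reader.get(key)
        except DbAdapterError:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)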

See also