Optional imports
shardyfusion is one package with many feature dimensions: four writer engines (Python, Spark, Dask, Ray), three KV backends (SlateDB, SQLite-download, SQLite-range-read), two vector backends (LanceDB, sqlite-vec), two metrics backends (Prometheus, OTel), two manifest stores (S3, Postgres), and a CLI. Forcing every install to pull all of these would be untenable.
The optional-imports pattern keeps the package importable with no extras installed, and gates feature availability on per-extra dependency groups.
The pattern
- Define an extra in pyproject.toml under [project.optional-dependencies].
- The module that depends on it is imported lazily, never at the top level of shardyfusion/__init__.py.
- The lazy import is wrapped in a helper that raises a clear "install with pip install shardyfusion[<extra>]" message if the dependency is missing.
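The three steps above can be sketched as one small helper. This is a minimal illustration, not shardyfusion's actual code: the name `require_extra` and its parameters are hypothetical, and only the error-message shape follows the pattern described here.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str, feature: str) -> ModuleType:
    """Import `module_name` on demand, or say which extra provides it.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    try:
        # The import runs only when the feature is first used,
        # so importing the package itself stays cheap.
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{feature} requires `pip install shardyfusion[{extra}]`"
        ) from e
```

A caller would invoke the helper inside the function that needs the dependency, for example `require_extra("lancedb", "vector-lancedb", "LanceDB vector search")`, so a missing extra surfaces as a single actionable message at the point of use.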
Example: CEL
# shardyfusion/_writer_core.py:113
def _get_cel_imports() -> tuple[Any, ...]:
    from shardyfusion.cel import compile_cel, route_cel_batch  # local import
    return compile_cel, route_cel_batch
shardyfusion.cel itself does:
# shardyfusion/cel.py
def _import_cel() -> Any:
    try:
        from cel_expr_python.cel import NewEnv, Type
    except ImportError as e:
        raise ImportError("CEL routing requires `pip install shardyfusion[cel]`") from e
    return NewEnv, Type
The CEL package is cel-expr-python — a fast Rust-backed CEL implementation, not the older pure-Python celpy.
Example: vector adapters
shardyfusion/vector/adapters/lancedb_adapter.py imports lancedb only inside the constructor of the LanceDB factory. If lancedb is missing, the user gets ImportError: ... pip install shardyfusion[vector-lancedb] — but the rest of shardyfusion (KV writers, readers) is untouched.
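The constructor-level import described above might look roughly like the sketch below. The class name `LazyBackendFactory` and its attributes are hypothetical; the real adapter's structure is not shown on this page, so treat this only as an illustration of deferring the import to construction time.

```python
import importlib
from types import ModuleType


class LazyBackendFactory:
    """Hypothetical factory: the backend module is imported on construction."""

    module_name = "lancedb"      # third-party module, imported lazily
    extra = "vector-lancedb"     # extra named in the error message

    def __init__(self) -> None:
        try:
            # Deferred import: defining or importing this class does not
            # touch the optional dependency at all.
            self.backend: ModuleType = importlib.import_module(self.module_name)
        except ImportError as e:
            raise ImportError(
                f"{type(self).__name__} requires "
                f"`pip install shardyfusion[{self.extra}]`"
            ) from e
```

Because the failure can only happen when the factory is instantiated, code paths that never construct it (for example KV-only readers and writers) are unaffected by a missing vector dependency.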
Example: UnifiedShardedReader
shardyfusion/__init__.py exposes UnifiedShardedReader via __getattr__:
def __getattr__(name):
    if name == "UnifiedShardedReader":
        from shardyfusion.reader.unified_reader import UnifiedShardedReader
        return UnifiedShardedReader
    raise AttributeError(name)
As a result, importing shardyfusion does not import unified_reader, which would in turn pull in the vector dependencies.
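The module-level `__getattr__` hook (PEP 562) can be tried in isolation. The sketch below builds a throwaway module object named `demo_pkg` and uses a stdlib class as a stand-in for the heavy reader; the `_LAZY_EXPORTS` table and the caching step are assumptions added for illustration, not part of shardyfusion.

```python
import importlib
import types

# Maps a public name to (module path, attribute). The stdlib target is a
# stand-in for a heavy optional module such as a reader with vector deps.
_LAZY_EXPORTS = {"OrderedDict": ("collections", "OrderedDict")}

demo_pkg = types.ModuleType("demo_pkg")  # stands in for a package __init__


def _lazy_getattr(name: str):
    """PEP 562 hook: called only when normal attribute lookup fails."""
    if name not in _LAZY_EXPORTS:
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")
    module_path, attr = _LAZY_EXPORTS[name]
    value = getattr(importlib.import_module(module_path), attr)
    setattr(demo_pkg, name, value)  # cache so the hook runs once per name
    return value


# Placing the hook in the module namespace enables the lazy lookup.
demo_pkg.__getattr__ = _lazy_getattr
```

Accessing `demo_pkg.OrderedDict` triggers the import on first use, while unknown names still raise `AttributeError`, so `hasattr` and `from ... import` keep behaving as usual.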
Extras index
| Extra | What it enables | Notes |
|---|---|---|
| slatedb | SlateDB driver | Base building block; pulled in by reader/writer extras. |
| sqlite | SQLite driver | Base building block. |
| sqlite-range | APSW + range-read VFS | Base building block for range-read SQLite. |
| read-slatedb | SlateDB sync reader | Sync reader. |
| read-slatedb-async | SlateDB async reader (aiobotocore) | Async reader. |
| read-sqlite | SQLite download-and-cache reader | Sync. |
| read-sqlite-range | SQLite range-read reader (APSW) | Sync. |
| sqlite-adaptive | Composes sqlite + sqlite-range | Required by AdaptiveSqliteReaderFactory (default reader mode). |
| read-sqlite-adaptive | Alias for sqlite-adaptive | Sync adaptive reader. |
| sqlite-async | Async SQLite readers (download + range) | Async. |
| sqlite-adaptive-async | Async adaptive SQLite reader (aiobotocore) | Required by AsyncAdaptiveSqliteReaderFactory. |
| writer-spark-slatedb | Spark writer (SlateDB) | Requires Java. |
| writer-spark-sqlite | Spark writer (SQLite) | Requires Java. |
| writer-python-slatedb | Python writer (SlateDB) | Pure Python. |
| writer-python-sqlite | Python writer (SQLite) | Pure Python. |
| writer-dask-slatedb | Dask writer (SlateDB) | |
| writer-dask-sqlite | Dask writer (SQLite) | |
| writer-ray-slatedb | Ray writer (SlateDB) | |
| writer-ray-sqlite | Ray writer (SQLite) | |
| cli-minimal | click>=8.0 only | CLI binary without any backend. Combine with a reader extra (e.g. [cli-minimal,read-sqlite]). PyYAML is a base dep, not part of this extra. |
| cli | Kitchen-sink CLI | Includes cli-minimal plus all read backends (slatedb sync/async, sqlite download/range/adaptive sync/async, unified-slatedb-lancedb + unified-sqlite-vec). |
| cel | cel-expr-python | CEL routing. |
| metrics-prometheus | prometheus_client | Prometheus metrics backend. |
| metrics-otel | opentelemetry SDK | OTel metrics backend. |
| vector-lancedb | LanceDB vector backend | |
| vector-sqlite | sqlite-vec unified KV+vector | |
| unified-slatedb-lancedb | Composite KV+vector wiring (SlateDB + LanceDB) | For UnifiedShardedReader. |
| unified-sqlite-vec | Composite KV+vector wiring (sqlite-vec) | |
| all | Convenience runtime bundle | Includes readers, writers, CLI, metrics, CEL, and vector extras; excludes dev/test/quality/docs extras. |
| test | Test runner deps | Dev. |
| quality | Lint / typecheck deps | Dev. |
| docs | MkDocs + plugins | Dev. |
For the canonical list, see pyproject.toml. The validate skill (.opencode/skills/validate/SKILL.md) cross-checks every extra documented in docs against the canonical list.
Why not "just pip install everything"
- Conflicting transitive deps: LanceDB and pyspark have incompatible Arrow versions in some combinations.
- Container image size: minimal reader installs are ~50 MB; pulling vector + Spark would push past 1 GB.
- Python version coverage: shardyfusion targets Python 3.11–3.13 (requires-python = ">=3.11,<3.14"). Some optional dependencies have narrower support windows; gating them keeps the base install broadly compatible.
Contributor rules
When adding a new optional dependency:
- Add it to [project.optional-dependencies] under a meaningful extra.
- Import it lazily, never at module top level for any module imported by shardyfusion/__init__.py.
- Wrap the import in a helper that raises ImportError with a pip install shardyfusion[<extra>] message.
- Add it to the extras index in this page.
- Add a use-case page that exercises the new extra.
- Run validate-docs; it will refuse to pass if extras are out of sync.
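The lazy-import rule is easy to break by accident, so a regression test can help. The sketch below is one possible approach, not an existing shardyfusion test: the hypothetical `leaked_modules` function imports a package in a fresh interpreter and reports which of the listed modules were loaded as a side effect.

```python
import subprocess
import sys


def leaked_modules(package: str, forbidden: list[str]) -> list[str]:
    """Return the names in `forbidden` that a bare `import package` loads.

    Hypothetical helper, written for illustration only.
    """
    probe = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"for name in {forbidden!r}:\n"
        "    if name in sys.modules:\n"
        "        print(name)\n"
    )
    # A fresh interpreter is used so that modules already loaded in this
    # process cannot hide or fake a leak.
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

In a test suite this might be used as `assert leaked_modules("shardyfusion", ["lancedb", "pyspark"]) == []`, assuming those are the import names of the optional dependencies being guarded.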
See also
- use-cases/extras-matrix.md: visual map from every use case to the extra you need.
- contributing/extras-and-dependencies.md: operational guide.
- contributing/adding-an-adapter.md: worked example.