Extras and dependencies
shardyfusion has many optional features (writers, backends, vector engines, metrics backends, CLI). The base install must stay minimal; everything else is gated behind an extra.
This page is the operational guide; the design rationale is in architecture/optional-imports.md.
Two parallel surfaces
pyproject.toml declares dependencies in two places, and they must stay aligned:
| Surface | Used by | Defined in |
|---|---|---|
| [project.optional-dependencies] | End users (pip install 'shardyfusion[<extra>]') | pyproject.toml |
| [dependency-groups] | Tox / CI envs | pyproject.toml |
User-facing extras can reference other extras to form bundles (e.g. read-slatedb-async = ["shardyfusion[read-slatedb]", "aiobotocore>=2.12"]).
The dependency groups (backend-slatedb, cap-writer-spark, mod-cel, …) are the atomic units used by tox.ini to assemble per-env dependency sets.
Base dependencies
Always installed ([project.dependencies]):
- xxhash>=3.4
- pydantic>=2.0
- pyyaml>=6.0
Anything else must be gated behind an extra.
Adding a new optional dependency
Worked example: adding a new vector backend, vector-foo, that depends on foo-vector>=1.0.
1. Add the dependency group
```toml
# pyproject.toml
[dependency-groups]
backend-vector-foo = [
    "foo-vector>=1.0",
    "numpy>=1.24",
]
```
This is the atomic unit tox composes from.
2. Add the user-facing extra
```toml
# pyproject.toml
[project.optional-dependencies]
vector-foo = ["foo-vector>=1.0", "numpy>=1.24", "boto3>=1.28"]
```
Convention: the extra mirrors the dependency-group contents, with boto3 added if the backend reads from S3.
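The convention is mechanical enough to check automatically. A minimal sketch, assuming the extra and the group are compared as exact requirement strings; convention_drift and its reads_s3 flag are hypothetical, not part of shardyfusion's tooling.

```python
def convention_drift(
    extra: list[str],
    group: list[str],
    *,
    reads_s3: bool,
    boto3_req: str = "boto3>=1.28",
) -> dict[str, list[str]]:
    """Compare an extra with its dependency group under the mirror-plus-boto3 convention."""
    expected = set(group) | ({boto3_req} if reads_s3 else set())
    actual = set(extra)
    return {
        "missing": sorted(expected - actual),     # in the group (or boto3) but not the extra
        "unexpected": sorted(actual - expected),  # in the extra but not accounted for
    }


group = ["foo-vector>=1.0", "numpy>=1.24"]                 # backend-vector-foo
extra = ["foo-vector>=1.0", "numpy>=1.24", "boto3>=1.28"]  # vector-foo
print(convention_drift(extra, group, reads_s3=True))  # {'missing': [], 'unexpected': []}
```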
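3. Add the lazy import wrapper
```python
# shardyfusion/vector/adapters/foo_adapter.py
from typing import Any


def _import_foo() -> Any:
    """Import foo_vector on first use, pointing users at the right extra if it is missing."""
    try:
        import foo_vector
    except ImportError as exc:
        raise ImportError(
            "foo vector backend requires `pip install 'shardyfusion[vector-foo]'`"
        ) from exc
    return foo_vector
```
Never import foo_vector at module top level in any module that shardyfusion/__init__.py's import graph can reach; call _import_foo() from inside the functions that need the backend.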
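When several adapters need the same guard, the wrapper above can be generalized with importlib.import_module. A sketch only: import_optional and load_foo_backend are hypothetical names for illustration, and foo_vector is the placeholder package from this worked example.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional module, or fail with an install hint naming the extra."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; run `pip install 'shardyfusion[{extra}]'`"
        ) from exc


def load_foo_backend() -> ModuleType:
    # The import runs at call time, so defining this function (and importing
    # the module that contains it) still works when foo_vector is absent.
    return import_optional("foo_vector", "vector-foo")


try:
    load_foo_backend()
except ImportError as err:
    print(err)
```

Chaining with `from exc` keeps the original ModuleNotFoundError as `__cause__`, so the traceback still shows which import actually failed.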
4. Wire into tox
```ini
# tox.ini
[testenv]
dependency_groups =
    vector-foo: backend-vector-foo
```
Then add envs to env_list and the unit / integration labels:
```text
py{311,312,313}-vector-foo-unit
py{311,312,313}-vector-foo-integration
```
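Each brace group is a generative factor, so those two lines add six environments: three Python versions times unit and integration. A simplified sketch of that expansion; expand_env is hypothetical and handles only flat, comma-separated brace groups, not every form tox accepts.

```python
import itertools
import re


def expand_env(spec: str) -> list[str]:
    """Expand brace factors, e.g. py{311,312}-x -> py311-x, py312-x."""
    # re.split with a capturing group keeps the {...} segments in the result.
    parts = re.split(r"(\{[^{}]*\})", spec)
    choices = [p[1:-1].split(",") if p.startswith("{") else [p] for p in parts]
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_env("py{311,312,313}-vector-foo-unit"))
# ['py311-vector-foo-unit', 'py312-vector-foo-unit', 'py313-vector-foo-unit']
```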
5. Document the extra
Register the new extra in three places:
- scripts/generate_extras_matrix.py (EXTRA_META): add a row; the script regenerates the Extras matrix from it.
- architecture/optional-imports.md: add a row to the canonical extras index.
- A new use-case page (e.g. docs/use-cases/vector/build/foo.md); see adding-a-use-case.md.
Then regenerate the matrix page:
```shell
uv run python scripts/generate_extras_matrix.py
```
6. Verify with docs-check
```shell
just docs-check
```
It cross-checks that:
- Every extra documented in the docs is in pyproject.toml.
- Every extra in pyproject.toml is documented.
- No module reachable from shardyfusion/__init__.py contains a top-level from foo_vector import … statement.
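The third check can be approximated with the standard ast module. A minimal sketch, assuming one file's source and a caller-supplied set of optional top-level package names; top_level_optional_imports is hypothetical, and unlike the real check it does not work out which modules are reachable from shardyfusion/__init__.py. It also skips imports nested in module-level if/try blocks (which conveniently ignores `if TYPE_CHECKING:` imports, but also misses module-level try/except imports).

```python
import ast


def top_level_optional_imports(source: str, optional: set[str]) -> list[str]:
    """List module-level imports of optional packages in one Python source file."""
    hits = []
    for node in ast.parse(source).body:  # module-level statements only
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            continue  # defs, classes, if/try blocks, etc. are not inspected
        for name in names:
            if name.split(".")[0] in optional:
                hits.append(f"line {node.lineno}: {name}")
    return hits


SOURCE = '''
import os
from foo_vector import Index

def _import_foo():
    import foo_vector  # deferred import: allowed
    return foo_vector
'''
print(top_level_optional_imports(SOURCE, {"foo_vector"}))  # ['line 3: foo_vector']
```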
7. Update CI matrix
```shell
just ci-matrix
```
This regenerates .github/ci-matrix.json from the tox env list.
When to add an alias
When two extras would have identical contents, the older or more discoverable name becomes a thin alias:
```toml
read-sqlite-adaptive = ["shardyfusion[sqlite-adaptive]"]
```
Aliases are convenient for users but should not multiply; keep one canonical name per backend. Avoid backend-implicit aliases (e.g. a bare vector extra that silently means LanceDB); prefer explicit names like vector-lancedb so every extra advertises its backend in its name.
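Spotting alias candidates is a set comparison over requirement lists. A sketch with placeholder requirements (only the extra names appear on this page); alias_candidates is illustrative, not part of shardyfusion's scripts.

```python
from collections import defaultdict


def alias_candidates(extras: dict[str, list[str]]) -> list[list[str]]:
    """Group extras whose requirement sets are identical (ignoring order and spaces)."""
    by_contents: defaultdict[frozenset[str], list[str]] = defaultdict(list)
    for name, reqs in extras.items():
        key = frozenset(req.replace(" ", "") for req in reqs)
        by_contents[key].append(name)
    return [sorted(names) for names in by_contents.values() if len(names) > 1]


# Placeholder contents for illustration only.
extras = {
    "sqlite-adaptive": ["pkg-a>=1.0", "pkg-b>=2.0"],
    "read-sqlite-adaptive": ["pkg-b>=2.0", "pkg-a>=1.0"],
    "vector-lancedb": ["pkg-c>=3.0"],
}
print(alias_candidates(extras))  # [['read-sqlite-adaptive', 'sqlite-adaptive']]
```

Each group it reports should collapse to one canonical extra plus thin aliases.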
When something belongs in [project.dependencies] instead
Promote to base only if all of these hold:
- Used by the reader and at least three writer flavors.
- Pure Python with broad version support (3.11–3.13).
- No transitive conflicts with any current extra.
- Footprint <1 MiB compressed.
pyyaml qualifies (used by run records and the run registry); boto3 does not (huge transitive surface, and optional for some backends).
When to bump a pinned upper bound
Upper bounds (e.g. slatedb<0.13) exist where an upstream release introduced a known breaking change. To bump one:
- Bump the pin in the [project.optional-dependencies] extra and in the matching [dependency-groups] entry (both in pyproject.toml).
- Run just ci d-e2e.
- If it passes, commit. If it fails, the pin stays where it is and a follow-up issue captures the breakage.
Never silently widen a pin without running the full matrix.
Removing an extra
Removing an extra is a breaking change. Steps:
- Confirm no use-case page documents it.
- Remove it from pyproject.toml (both surfaces).
- Remove its tox envs and label entries.
- Remove the lazy-import wrapper.
- Run validate-docs; it will surface any stale references.
- Note the removal in the next ADR or release notes.
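A stale-reference scan for a removed extra boils down to a regex over the docs tree. A rough sketch of the idea, not validate-docs itself: stale_references is hypothetical, it inspects only shardyfusion[...] mentions in Markdown files, and it will not catch prose that names the extra without brackets.

```python
import re
import tempfile
from pathlib import Path

# Matches install references such as shardyfusion[vector-foo] or shardyfusion[a,b].
_EXTRA_REF = re.compile(r"shardyfusion\[([^\]]+)\]")


def stale_references(root: Path, extra: str) -> list[tuple[str, int]]:
    """Find (file, line) pairs under root whose Markdown still names the removed extra."""
    hits = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in _EXTRA_REF.finditer(line):
                names = {name.strip() for name in match.group(1).split(",")}
                if extra in names:  # exact match, so "vector" does not hit "vector-foo"
                    hits.append((path.relative_to(root).as_posix(), lineno))
    return hits


with tempfile.TemporaryDirectory() as tmp:
    docs = Path(tmp)
    (docs / "install.md").write_text("pip install 'shardyfusion[vector-foo]'\n", encoding="utf-8")
    (docs / "other.md").write_text("shardyfusion[vector-foo-extra]\n", encoding="utf-8")
    print(stale_references(docs, "vector-foo"))  # [('install.md', 1)]
```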
Cross-side extras: sqlite-range
Some extras serve both reader and writer roles even though they're named after one side. sqlite-range is the canonical example:
- Reader: APSW powers the range-read VFS in SqliteRangeShardReader.
- Writer: APSW and zstandard power writer-side B-tree metadata extraction (sidecar emission) for SqliteFactory and SqliteVecFactory; see architecture/sqlite-btree-sidecar.md. APSW provides the dbstat virtual table; zstandard compresses the sidecar body (~12× ratio).
Sidecar emission is opt-out (default-on) but degrades silently when APSW is not installed. Users who install only the base sqlite extra will still write valid shards; the writer just won't pre-extract the B-tree pages. Install shardyfusion[sqlite-range] on writer hosts that publish snapshots intended for range-mode reads.
See also
- architecture/optional-imports.md: the pattern's design.
- architecture/sqlite-btree-sidecar.md: the sidecar artifact and its dependencies.
- operate/tox-matrix.md: the full env list.
- adding-an-adapter.md: the canonical worked example.