Shadowmire
Shadowmire syncs PyPI (or plain HTTP(S) PyPI mirrors using Shadowmire) with a lightweight and easy approach.
Docs
Background
Bandersnatch is the recommended solution to sync from PyPI. However, it has two issues that have gone unsolved for a long time:
- Bandersnatch does not support removing packages that have been removed from upstream, which makes a mirror an easier target for supply chain attacks.
- The upstream must implement XML-RPC APIs, which is not acceptable for most mirror sites.
Shadowmire is a lightweight solution to these issues.
Syncing Protocol
From PyPI
PyPI's XML-RPC API provides a list_packages_with_serial() method that lists ALL packages with their "serial" (you could consider it a version integer that just keeps increasing). changelog_last_serial() and changelog_since_serial() are NOT used, as they cannot handle package deletion. Local packages that are not in the returned list are removed.
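The comparison described above can be sketched as a small pure function. This is only an illustration, not Shadowmire's actual code: the name plan_sync is hypothetical, and treating "local serial lower than remote serial" as the update condition is an assumption.

```python
def plan_sync(local: dict[str, int], remote: dict[str, int]) -> tuple[list[str], list[str]]:
    """Compare name -> serial mappings and return (to_remove, to_update)."""
    # Packages that exist locally but are absent from the remote list were deleted upstream.
    to_remove = sorted(name for name in local if name not in remote)
    # Assumption: a package needs updating when it is new or its local serial is behind.
    to_update = sorted(
        name for name, serial in remote.items() if local.get(name, -1) < serial
    )
    return to_remove, to_update


# Made-up serials, shaped like the result of list_packages_with_serial().
remote = {"requests": 120, "numpy": 300, "newpkg": 5}
local = {"requests": 120, "numpy": 250, "deleted-pkg": 40}

to_remove, to_update = plan_sync(local, remote)
print(to_remove)  # ['deleted-pkg']
print(to_update)  # ['newpkg', 'numpy']
```

Because the full list is fetched every time, deletion falls out of a plain set difference, which is what the changelog-based calls cannot provide.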
Results from list_packages_with_serial() are stored in remote.json. local.db is a SQLite database that just stores every local package name and its local serial. local.json is dumped from local.db for downstream consumption.
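As a rough sketch of how a name-to-serial table could be dumped to JSON: the table name packages, its columns, and the function dump_local are all hypothetical here, since the actual schema of local.db is not described in this document.

```python
import json
import sqlite3


def dump_local(conn: sqlite3.Connection) -> str:
    """Serialize every (name, serial) row into a JSON object string."""
    rows = conn.execute("SELECT name, serial FROM packages ORDER BY name").fetchall()
    return json.dumps(dict(rows))


# In-memory database standing in for local.db (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, serial INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [("numpy", 300), ("requests", 120)],
)
conn.commit()

snapshot = dump_local(conn)
print(snapshot)
```

A flat JSON object like this is easy for a downstream mirror to fetch over plain HTTP and compare against its own state.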
From upstream using shadowmire
The obvious alternative to list_packages_with_serial() is local.json, which can easily be served by any HTTP server. Don't use local.db, as it could be in an inconsistent state while the Shadowmire upstream is syncing.
How to use
Important
Shadowmire is still in an experimental state. Please consider taking a snapshot before using it (if you're using ZFS/Btrfs), to avoid Shadowmire eating all your packages by accident.
If you just need to fetch all indexes (and then use a cache solution for packages):
REPO=/path/to/pypi ./shadowmire.py sync
If the REPO env is not set, it defaults to the current working directory.
If you need to download all packages, add --sync-packages.
./shadowmire.py sync --sync-packages
Important
If you sync indexes only first, --sync-packages would NOT update packages that are already at the latest version. Use the verify command for this.
The sync command also supports --exclude; you can give multiple regexes like this:
./shadowmire.py sync --exclude package1 --exclude ^0
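To illustrate how several exclude regexes could be applied to package names, here is a minimal sketch. The helper is_excluded is hypothetical, and using re.search (so that a pattern matches anywhere in the name unless anchored with ^) is an assumption about the matching semantics, not something confirmed by this document.

```python
import re


def is_excluded(name: str, patterns: list) -> bool:
    """Return True if any compiled pattern matches somewhere in the package name."""
    # Assumption: search semantics, so "^0" only matches names starting with "0".
    return any(p.search(name) is not None for p in patterns)


# The same two patterns as in the command above.
patterns = [re.compile(p) for p in ("package1", "^0")]

for name in ("package1", "0x-tools", "requests", "a0b"):
    print(name, is_excluded(name, patterns))
```

Under this assumption, an unanchored pattern such as package1 would also exclude names that merely contain it, so anchoring with ^ and $ is the safer choice when exactly one package is meant.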
It also supports prerelease filtering like this:
./shadowmire.py sync --sync-packages --prerelease-exclude '^duckdb$'
And --shadowmire-upstream, if you don't want to sync from PyPI directly:
./shadowmire.py sync --shadowmire-upstream http://example.com/pypi/
If you already have a PyPI repo, use genlocal first to generate a local db:
./shadowmire.py genlocal
The verify command can be used if you believe that something is wrong (inconsistent). It would:
- remove packages NOT in local db
- remove packages NOT in remote (with consideration of --exclude)
- make sure all local indexes are valid, and (if --sync-packages) have valid local package files (--prerelease-exclude would be ignored)
- delete unreferenced files in packages folder
./shadowmire.py verify --sync-packages
The verify command accepts the same arguments as sync.
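The last verify step, finding unreferenced files in the packages folder, could look roughly like the sketch below. The function find_unreferenced and the directory layout are hypothetical, and how the set of referenced paths is derived from the local indexes is assumed rather than shown. The sketch only reports candidates and deletes nothing.

```python
import tempfile
from pathlib import Path


def find_unreferenced(packages_dir: Path, referenced: set) -> list:
    """List files under packages_dir whose relative path is not in `referenced`."""
    orphans = []
    for path in packages_dir.rglob("*"):
        if path.is_file():
            rel = path.relative_to(packages_dir).as_posix()
            if rel not in referenced:
                orphans.append(rel)
    return sorted(orphans)


# Demo on a throwaway directory with one referenced and one orphaned file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "aa").mkdir()
    (root / "aa" / "kept-1.0.tar.gz").write_bytes(b"")
    (root / "aa" / "orphan-0.1.tar.gz").write_bytes(b"")
    orphans = find_unreferenced(root, {"aa/kept-1.0.tar.gz"})
    print(orphans)  # ['aa/orphan-0.1.tar.gz']
```

Separating "find" from "delete" keeps a dry run possible, which fits the advice above to take a snapshot before letting the tool remove anything.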
Also, if you need to debug, you can use the do-update and do-remove commands to operate on a single package.
Acknowledgements
This project uses some code from PyPI's official mirroring tool, bandersnatch.
Naming
Suggested by LLM.
Sure, to capture the mysterious, fantastical, and intriguing nature of "Bandersnatch," here are some similar-style project name suggestions:
- Shadowmire:
- Meaning: A mysterious shadowy swamp, implying the unknown and exploration.