Replicates the full npmjs registry and watches for updates. The process can be killed at any moment, during either full replication or watching, and picks up where it left off on the next run.
npm install
APPLICATION_ID='ALGOLIA_APPLICATION_ID' \
API_KEY='ALGOLIA_ADMIN_API_KEY' \
INDEX_PREFIX='npmjs-' \
CONFIG='{
"NPM_REGISTRY": "https://skimdb.npmjs.com/registry",
"PACKAGES_INDEXNAME": "registry",
"REPLICATION_CONCURRENCY": 10000,
"DOWNLOADS_CONCURRENCY": 100,
"WATCH_CONCURRENCY": 1,
"EXIT_AFTER": "5min"
}' \
./run
Build it:
docker build -t npmjs-connector .
Run it:
docker run \
-e APPLICATION_ID='ALGOLIA_APPLICATION_ID' \
-e API_KEY='ALGOLIA_ADMIN_API_KEY' \
-e INDEX_PREFIX='npmjs-' \
-e CONFIG='{
"NPM_REGISTRY": "https://skimdb.npmjs.com/registry",
"PACKAGES_INDEXNAME": "registry",
"REPLICATION_CONCURRENCY": 10000,
"DOWNLOADS_CONCURRENCY": 100,
"WATCH_CONCURRENCY": 1,
"EXIT_AFTER": "5min"
}' \
npmjs-connector
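Both invocations pass settings through environment variables, with CONFIG carrying a JSON string. As a rough sketch of how such an environment could be read, assuming the values shown above act as defaults (the loadConfig helper and the DEFAULTS object are hypothetical names for illustration, not the connector's actual code):

```javascript
// Assumed defaults, copied from the example values above.
const DEFAULTS = {
  NPM_REGISTRY: 'https://skimdb.npmjs.com/registry',
  PACKAGES_INDEXNAME: 'registry',
  REPLICATION_CONCURRENCY: 10000,
  DOWNLOADS_CONCURRENCY: 100,
  WATCH_CONCURRENCY: 1,
};

// Hypothetical helper: builds a settings object from an environment map.
function loadConfig(env) {
  for (const name of ['APPLICATION_ID', 'API_KEY']) {
    if (!env[name]) {
      throw new Error(`Missing required environment variable ${name}`);
    }
  }
  // CONFIG is a JSON string; treat it as empty when unset.
  const overrides = env.CONFIG ? JSON.parse(env.CONFIG) : {};
  return {
    applicationId: env.APPLICATION_ID,
    apiKey: env.API_KEY,
    indexPrefix: env.INDEX_PREFIX || '',
    ...DEFAULTS,
    ...overrides, // keys given in CONFIG win over the assumed defaults
  };
}

// Example with an explicit environment; a real run would pass process.env.
const example = loadConfig({
  APPLICATION_ID: 'ALGOLIA_APPLICATION_ID',
  API_KEY: 'ALGOLIA_ADMIN_API_KEY',
  INDEX_PREFIX: 'npmjs-',
  CONFIG: '{"WATCH_CONCURRENCY": 2, "EXIT_AFTER": "5min"}',
});
console.log(example.WATCH_CONCURRENCY, example.EXIT_AFTER);
```

Failing fast on a missing APPLICATION_ID or API_KEY is a design choice of this sketch; it surfaces a misconfigured container at startup rather than partway through replication.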
The goal is to be resilient to failures or interruptions of service without having to re-replicate everything.
- get the last known lastSequence, either the registry's current one or the one saved in the index
- get the last known replicateLastPackage; if none is found, browse the registry (page by page) to find the first package; if it is the special "DONE" token, skip replication
- start replication at this package
- on every replication loop, save replicateLastPackage
- once replication is done, save the known lastSequence and store the special "DONE" flag in replicateLastPackage
- start the download job at downloadsLastPackage, or at the first package of the index
- on each download run, save downloadsLastPackage
- start watching from the known lastSequence
- on every watch loop, save lastSequence
Download counting and registry watching can run in parallel once full replication is done.
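The checkpointing steps above can be sketched as follows. This is a minimal illustration, not the connector's implementation: the in-memory store, the fake registry, and every function name here (createStore, createRegistry, pageAfter, changesSince, run) are assumptions made for the example. A real run would persist checkpoints durably and keep watching indefinitely.

```javascript
const DONE = 'DONE'; // special token stored in replicateLastPackage

// In-memory stand-in for wherever checkpoints are persisted (assumption).
function createStore(initial = {}) {
  const state = { ...initial };
  return {
    get: (key) => state[key],
    set: (patch) => Object.assign(state, patch),
    snapshot: () => ({ ...state }),
  };
}

// Fake registry over a sorted list of package names (hypothetical interface).
function createRegistry({ names, changes, sequence }) {
  return {
    currentSequence: async () => sequence,
    // Up to `size` names strictly after `cursor`; from the start when undefined.
    pageAfter: async (cursor, size) =>
      names.filter((n) => cursor === undefined || n > cursor).slice(0, size),
    changesSince: async (since) => changes.filter((c) => c.seq > since),
  };
}

async function downloads(store, indexed) {
  const after = store.get('downloadsLastPackage');
  for (const name of [...indexed].sort()) {
    if (after !== undefined && name <= after) continue; // already processed
    // A real run would fetch the download count for `name` here.
    store.set({ downloadsLastPackage: name }); // checkpoint each download run
  }
}

async function watch(store, registry, indexed) {
  // One pass only; a real watcher would keep following the changes feed.
  const changes = await registry.changesSince(store.get('lastSequence'));
  for (const change of changes) {
    indexed.add(change.name);
    store.set({ lastSequence: change.seq }); // checkpoint every watch loop
  }
}

async function run(store, registry, indexed, pageSize = 2) {
  // Use the saved sequence if there is one, else the registry's current one.
  const lastSequence =
    store.get('lastSequence') ?? (await registry.currentSequence());

  if (store.get('replicateLastPackage') !== DONE) {
    let cursor = store.get('replicateLastPackage'); // undefined: first page
    for (;;) {
      const page = await registry.pageAfter(cursor, pageSize);
      if (page.length === 0) break;
      page.forEach((name) => indexed.add(name));
      cursor = page[page.length - 1];
      store.set({ replicateLastPackage: cursor }); // checkpoint every loop
    }
    store.set({ lastSequence, replicateLastPackage: DONE });
  }

  // Downloads and watching only start once full replication is done.
  await Promise.all([downloads(store, indexed), watch(store, registry, indexed)]);
}

// Resume a run that was killed after replicating packages "a" and "b".
async function demo() {
  const store = createStore({ replicateLastPackage: 'b' });
  const indexed = new Set(['a', 'b']);
  const registry = createRegistry({
    names: ['a', 'b', 'c', 'd', 'e'],
    changes: [{ seq: 1, name: 'f' }],
    sequence: 0,
  });
  await run(store, registry, indexed);
  console.log(store.snapshot(), [...indexed]);
}
demo();
```

Because each loop saves its cursor only after the page has been handled, a process killed mid-loop repeats at most one page on restart instead of re-replicating everything.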