Update tools and jobs to publish curated data #468
Conversation
this will help us so much :D
The goal of this update is to create a curated data view along with the raw data view and the npm packages views (see #277).

Curation means applying patches to the raw data and re-generating the `idlparsed`, `idlnames` and `idlnamesparsed` folders. The latter two will only contain IDL names targeted at browsers, although note that actual spec filtering remains a TODO at this stage (see corresponding TODO comments in `prepare-curated.js` and `prepare-packages.js`).

To create the curated data view, this update introduces new tools:

- a `prepare-curated.js` tool that copies the raw data to the given folder, applies patches (CSS, elements, IDL) when needed, re-generates the `idlparsed` folder, re-generates the `idlnames` and `idlnamesparsed` folders, and adjusts the `index.json` and `idlnames.json` files accordingly.
- a `prepare-packages.js` tool (replacing the now-gone `packages/prepare.js`) that copies relevant curated data from the curated folder to the packages folder.
- a `commit-curated.js` tool that updates the `curated` branch with the contents of the given curated folder. The goal is to have the `curated` branch be the one published as GitHub Pages.

The test logic was partially re-written to run the tests against the curated data, and against both the curated data and the NPM packages data when tests may yield different results.

A new `curate.yml` job publishes the curated data whenever the crawl data is updated. The job also takes care of preparing package release PRs as needed, replacing the previous prepare-xxx-release jobs. The release workflow becomes:

1. Crawled data is updated (`update-ed.yml`)
2. Curated data and package data get generated (`curate.yml`)
3. Curated data and package data get tested (`curate.yml`)
4. The `curated` branch gets updated with the curated data (`curate.yml`)
5. An npm package pre-release PR gets created (`curate.yml`)
6. Someone reviews and merges the PR
7. New versions of the npm packages are released (`release-package.yml`)
8. A `Raw data for @webref/ttt@vx.y.z` tag gets added to the relevant commit on the `main` branch.
9. A `@webref/ttt@vx.y.z` tag gets added to the relevant commit on the `curated` branch.
10. The `@webref/ttt@latest` tag gets updated to point to the relevant commit on the `curated` branch.

Note that, in order for a release to be created, the curated data needs to have changed. A change to the static content in the `packages` folder won't be enough to trigger a release, for instance. That should not be a major problem.
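For illustration, the tag names in steps 8–10 follow a simple textual pattern. A hypothetical helper (not part of the actual webref tooling) could build them like this:

```js
// Hypothetical helper illustrating the tag naming scheme from the release
// workflow above (not actual webref code).
function releaseTags(pkg, version) {
  const base = `@webref/${pkg}@v${version}`;
  return {
    raw: `Raw data for ${base}`,     // tag added on the `main` branch
    curated: base,                   // tag added on the `curated` branch
    latest: `@webref/${pkg}@latest`  // moving tag on the `curated` branch
  };
}
```

For example, `releaseTags('css', '1.2.3')` yields the three tag names that steps 8–10 would attach for a `@webref/css` release.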
Force-pushed from e33f108 to f8c69a8
The tests are only meaningful for curated and package data. The data curation job will take care of running them in any case.
@dontcallmedom CI tests fail because the UUID spec still exists in webref. We should automate removal of files that are linked to a spec that got removed from browser-specs (in such cases, we can be confident that the removal was not accidental)
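A minimal sketch of that automation, assuming extract file names match spec shortnames (hypothetical helper, not the actual implementation):

```js
// Hypothetical sketch: given the shortnames still listed in browser-specs
// and the extract files found in a folder, list the files whose spec was
// removed and that can therefore be deleted with confidence.
function obsoleteExtracts(shortnames, files) {
  const known = new Set(shortnames);
  // Strip the file extension and keep files whose base name no longer
  // matches any known shortname.
  return files.filter(file => !known.has(file.replace(/\.[^.]+$/, '')));
}
```

With that helper, `obsoleteExtracts(['fetch', 'dom'], ['fetch.idl', 'dom.idl', 'uuid.idl'])` would single out `uuid.idl` for removal.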
Replying to myself:
Actually, that is going to be a problem because we want to do major/minor bumps under
Yet another amazing piece of work, thank you so much!
A few nits for your consideration, but looks great to me in any case
```js
// rm dstDir/*.${fileExt}
const dstFiles = await fs.readdir(dstDir);
for (const file of dstFiles) {
  if (file.endsWith(`.${fileExt}`) && file !== 'package.json') {
```
Probably not a short-term concern, but maybe something worth protecting against at the browser-specs level: what if a spec ends up using "package" as a shortname?
Tracked in #472.
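A simple guard along those lines (hypothetical sketch for the concern tracked in #472, not actual webref code) could reject colliding shortnames up front:

```js
// Hypothetical guard: reject spec shortnames that would collide with the
// static files shipped in each package folder.
const RESERVED_SHORTNAMES = new Set(['package', 'index', 'README']);

function assertSafeShortname(shortname) {
  if (RESERVED_SHORTNAMES.has(shortname)) {
    throw new Error(
      `Shortname "${shortname}" collides with a reserved package file name`);
  }
}
```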
```js
async function cleanCrawlOutcome(spec) {
  await Promise.all(Object.keys(spec).map(async property => {
    // Only consider properties that link to an extract
```
Since the list of extracted properties is more or less hardcoded in the rest of the code (e.g. L77), it might be better to use a shared hardcoded list rather than rely on heuristics?
I went the opposite way, actually, and dropped the hardcoded list (except to note the folders that must not be integrated), so that the script can handle new extracts without having to be modified.
Note the heuristics are exactly the same as in the `expandCrawlResults` function in Reffy.
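As an illustration of the kind of heuristic involved (mirroring the idea only, not Reffy's actual `expandCrawlResults` code): a crawl result property can be treated as linking to an extract when its value looks like a relative file path.

```js
// Illustrative heuristic only (not Reffy's actual code): a property value
// of the form "folder/file.ext" is treated as a link to an extract file.
function linksToExtract(value) {
  return typeof value === 'string' && /^[\w-]+\/[\w.%-]+\.\w+$/.test(value);
}
```

Under this sketch, `linksToExtract('idl/fetch.idl')` would be true, while plain metadata such as a spec title would not match.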
Having the static package files (`index.js`, `README.md`, `package.json`) in the `curated` branch is useful both to trigger an update of the branch when these files are updated (and thus make it possible to publish a new NPM package) and to have these files directly available under the `@webref/xxx@vx.y.z` tag.
- Use destructuring assignments for options
- Drop hardcoded list of folders in `prepare-curated.js`
- Add explanation about clean function purpose