PGXN meta sketch #4
base: pre-sketch
Also strip newlines from HTML element attributes, as seen in the `description` meta field in the headers, now that I'm including newlines in descriptions.
NOTE: This isn't meant as heavy criticism, but mostly as curious answer-seeking from someone not very familiar with the operation of such a service.
distributed together. Packages may be downloaded directly from version
control repositories or in [archive files] generated by a release tag.
Could you expand on what you mean by the "version control repositories", and how you expect to download these packages from there?
I'd assume that you only want to distribute (and include) pre-built extensions using this metadata system, and I'd think that this would be annoying to do in the same system that also hosts the code, if only because including all those binaries in the code versioning repository would be hell.
For how to get the packages from source repos, I'd take a hard look at borrowing how Go does it. But perhaps it won't be necessary if we can improve the tooling overall such that everyone just automatically sets up release pipelines to publish to PGXN.
I intend this as an expansion of the PGXN Meta Spec for source code distribution. The idea is, however, to support enough metadata that we can build tools that auto-generate binary packages for distribution. Those packages would have a different (derived?) metadata format.
I don't think Go is a good example to copy from in this part, as Go packages are cross-platform distributions that you're expected to compile yourself.
I doubt that'll be the case in general for PGXN-distributed packages; I'd expect this to be mostly shipping of pre-packaged data.
PGXN source packages are source code. Binary packaging will be a different thing (hopefully also to be provided by PGXN). I'm a little confused about what I'm omitting from my explanations above that would clarify this, or what I might be misunderstanding about your point. :-(
I went into this with the understanding that this would be primarily for PGXN, which I believe to be a package manager for PostgreSQL extensions, based on the `pgxn install` line I've seen tossed around lately.
Additionally, an extension will still function when distributed as binaries without C/C++/Rust/Python sources (assuming it was built for the relevant platform), but not all target systems have the infrastructure to build these binaries from sources.
So, my confusion seems to be about what PGXN is: it's not a binary package repository, but a source package repository.
Yes, that's right. A lot of the stuff in this proposal is designed to add metadata to facilitate the creation of binary packages, though.
* **Source Distribution:** The contents of a single package bundled together
  with [package metadata](#package-metadata) into a distributable archive
  file, usually named with the last part of the package path or the main
  extension, a dash, and the version, e.g., `pgtap-1.14.3.zip`.
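A minimal sketch of the naming rule in that definition, assuming the file name uses the last element of the package path (rather than the main extension) and that a leading `v` from the release tag is dropped, as the `pgtap-1.14.3.zip` example suggests. The function name `dist_archive_name` is hypothetical.

```python
import posixpath


def dist_archive_name(package_path: str, version: str, ext: str = "zip") -> str:
    """Build a source distribution file name such as 'pgtap-1.14.3.zip'.

    Assumption: name = last package path element + '-' + version without 'v'.
    """
    # Last element of the slash-separated package path, e.g. 'pgtap'.
    base = posixpath.basename(package_path.rstrip("/"))
    # Drop a leading 'v' so 'v1.14.3' becomes '1.14.3'.
    return f"{base}-{version.removeprefix('v')}.{ext}"


print(dist_archive_name("github.com/theory/pgtap", "v1.14.3"))  # pgtap-1.14.3.zip
```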
As a "source distribution", shouldn't this include the sources of the package?
* **Release:** A single instance of a package and version published on PGXN,
  expressed as the package path, an at sign, and the [semver]. Example:
  `github.com/theory/pgtap@v1.14.3`.
"a single instance of a package and version" implies that there can be more instances of a "package and version".
I'd probably cover this as:
**Release:** A single version of the package made available to the public on PGXN, expressed as [...]. One package's release can have different packages for different *release channels*.
Example: [...]
Done in 752cb61.
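Splitting a release identifier such as `github.com/theory/pgtap@v1.14.3` back into its package path and version could look like the following sketch. It assumes the version is the text after the last `@`, accepts an optional leading `v` as in the example, and uses a simplified SemVer pattern rather than a complete validator; `parse_release` is a hypothetical name.

```python
import re

# Simplified SemVer 2.0.0 shape: MAJOR.MINOR.PATCH (no leading zeros) with
# optional -prerelease and +build parts. The optional leading "v" follows
# the release example; strict SemVer does not include it.
SEMVER = re.compile(
    r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?"
)


def parse_release(release: str) -> tuple[str, str]:
    """Split 'path@version' into (path, version), validating the version."""
    path, sep, version = release.rpartition("@")
    if not sep or not path:
        raise ValueError(f"expected PATH@VERSION, got {release!r}")
    if not SEMVER.fullmatch(version):
        raise ValueError(f"not a semantic version: {version!r}")
    return path, version


print(parse_release("github.com/theory/pgtap@v1.14.3"))
# ('github.com/theory/pgtap', 'v1.14.3')
```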
* **Package Path:** Identifies a package, declared in the [package
  metadata](#package-metadata) file. A package path should describe both
  what the package does and where to find it. Typically, a package path
  consists of a repository root path (the directory that contains the
  metadata file) and a directory within the repository.
How do you distinguish the repository root path in the Package Path? Must there be only a single directory level under the repository root path to get to the package?
> A package path should describe both what the package does and where to find it
I don't think we want package descriptions in package paths.
I'm borrowing from Go's definitions (but `s/module/extension/g`). It just means they should be somewhat descriptive and not opaque, but a lot of people use funny names anyway.
* **Maintainer**: List of maintainers, each an object with `name` and either
  `email` or `url` (or both).
I'd use the plural `maintainers` here. Also, I'm not sure I agree with requiring `email` or `url` for all maintainers.
While these seem a useful way to contact maintainers, many projects have a public issue tracker that's a better point of contact, and archiving these email addresses/URLs isn't exactly great when considering regulations like GDPR or CCPA.
Done in 752cb61.
"linux": [ "amd64", "arm64" ],
"darwin": [ "amd64", "arm64" ],
"windows": [ "amd64" ],
"freebsd": [ "amd64" ]
I think this needs more care for external dependencies: Debian dependencies are often named differently from those in the Red Hat family, which are different again from those in SUSE, BSD, etc.
I'm kind of leaving open for now how to specify dependencies that have all sorts of different names in different places, but reference a few leads later in the doc.
This bit you've highlighted isn't packages, though, but hardware architectures.
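To illustrate that this map lists hardware architectures per operating system rather than packages, here is a hypothetical check a client might run before offering a download. It assumes Go-style OS/architecture names, as in the snippet, and treats an OS missing from the map as unsupported; `supports` and `PLATFORMS` are illustrative names.

```python
# Hypothetical platform map mirroring the snippet above:
# operating system -> supported CPU architectures (Go-style names).
PLATFORMS: dict[str, list[str]] = {
    "linux": ["amd64", "arm64"],
    "darwin": ["amd64", "arm64"],
    "windows": ["amd64"],
    "freebsd": ["amd64"],
}


def supports(platforms: dict[str, list[str]], os_name: str, arch: str) -> bool:
    """True if the (os, arch) pair appears in the platform map."""
    # An OS missing from the map is treated as unsupported.
    return arch in platforms.get(os_name, [])


print(supports(PLATFORMS, "darwin", "arm64"))   # True
print(supports(PLATFORMS, "windows", "arm64"))  # False
```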
* Is `pipeline` really necessary, given configure requirements? I think so,
  because it tells the client the preferred build system to use, in case it
  can't detect it for some reason.
Curiosity: Why is this `pipeline` here? Isn't this metadata for packaged packages, not to-be-packaged packages?
Sorry, I don't understand the question. But the point of `pipeline` is so that a client that downloads one of these source packages knows what build pipeline to use to build it (including compilation, etc.).
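A client following that rule might prefer the declared `pipeline` and fall back to detection only when it is missing. The sketch below assumes hypothetical marker files and pipeline names (`pgrx`, `meson`, `autoconf`, `pgxs`); they are illustrative, not values defined by the spec, and `choose_pipeline` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Hypothetical marker files checked in order when no pipeline is declared.
# Names are illustrative only; the spec does not define this table.
MARKERS = [
    ("Cargo.toml", "pgrx"),
    ("meson.build", "meson"),
    ("configure", "autoconf"),
    ("Makefile", "pgxs"),
]


def choose_pipeline(declared: Optional[str], src_dir: Path) -> str:
    """Return the declared pipeline, else guess one from files in src_dir."""
    if declared:
        # The metadata's stated preference always wins over detection.
        return declared
    for marker, pipeline in MARKERS:
        if (src_dir / marker).exists():
            return pipeline
    raise LookupError(f"cannot determine build pipeline for {src_dir}")
```

Checking the declared value first matches the rationale above: detection is only a fallback for sources that don't say how they should be built.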
"downloads": 20
},
"ratings.example.com": {
"stags": {
Suggested change: `"stags": {` should be `"stats": {`
Fixed in 71f6c7b.
* The `aggregates` section aggregates results from multiple sources, for
  example summing all downloads or averaging ratings. The list of items to
  aggregate could evolve regularly.
How does the system check which items to aggregate, and what aggregate to choose? I could see reasons to use any of a weighted average, mean, median, weighted median, sum, min, max, etc.
Dunno, this is a forward-thinking bit of design I haven't really thought through yet. I expect to build it up incrementally, though, perhaps just starting with download stats aggregated for 30, 90, and 365 days, as well as all-time.
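Starting with download stats for 30, 90, and 365 days plus all-time, the aggregation could be as simple as summing daily counts by age. This sketch assumes per-day counts keyed by date and trailing windows that include the current day; the function name, key labels, and sample numbers are hypothetical.

```python
from datetime import date, timedelta

WINDOWS = (30, 90, 365)  # trailing windows in days


def download_windows(daily: dict[date, int], today: date) -> dict[str, int]:
    """Sum per-day download counts into trailing windows plus an all-time total.

    Assumes a window of N days covers today and the N - 1 days before it.
    """
    totals = {f"{n}d": 0 for n in WINDOWS}
    totals["all"] = 0
    for day, count in daily.items():
        age = (today - day).days
        totals["all"] += count
        for n in WINDOWS:
            if 0 <= age < n:
                totals[f"{n}d"] += count
    return totals


# Illustrative sample data, not real PGXN statistics.
today = date(2024, 1, 31)
daily = {
    today: 5,
    today - timedelta(days=29): 1,
    today - timedelta(days=30): 2,
    today - timedelta(days=100): 3,
    today - timedelta(days=400): 4,
}
print(download_windows(daily, today))
# {'30d': 6, '90d': 8, '365d': 11, 'all': 15}
```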
* Each key in `sources` identifies a trusted downstream source of
  information. Each would have its own schema describing its objects and
  their meaning, along with URI templates to link to. For example,
  `stats.example.com` might have these templates:
Does this consider summation loops in this federated network of statistics? That's assuming anyone can run a PGXN node, of course.
Sorry, I don't really follow, can you explain what "summation loops" means?
I'd like to come up with a federation model, but admit I wasn't really thinking about it here.
Basically, there needs to be a way to distinguish locally measured stats in a way that's clearly distinct from federated stats, and distinct from the aggregate.
With summation loops, I basically mean this:
Assume mirror-nl.pgxn.org has `"downloads": 30`, and federates with primary.pgxn.org, which has measured 40 downloads locally.
mirror-nl updates its data from the primary and sees 40 downloads. Aggregated with its own downloads, that gives 70 downloads, which it then publishes.
The primary then pulls the data from mirror-nl for its download statistics and sees an aggregate of 70 downloads. Added to its own pool of 40, that adds up to 110 total downloads.
After another sync, mirror-nl notices it has to update its aggregate, because the primary now advertises 110 downloads, or 70 more than its previous 40. Etc. Etc.
Oooh. I hadn't even thought about the aggregation being bidirectional, but of course people would want that, if things are successful. Not an immediate goal, I think, because you're right, the technical infrastructure to prevent these kinds of loops will need some careful thought.
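The loop is easy to reproduce in a few lines. In the sketch below, `naive_sync` re-adds the peer's published aggregate on every sync, reproducing the 70 and 110 figures from the example and growing on every round. `merge_by_origin` shows one well-known way to avoid this, a grow-only counter (G-Counter): each node keeps counts keyed by the node that measured them, merges by taking the per-origin maximum, and sums only when reading, which keeps local and federated stats distinct. The function names are hypothetical, and this is a sketch under those assumptions, not a proposed federation protocol.

```python
MIRROR, PRIMARY = "mirror-nl.pgxn.org", "primary.pgxn.org"
LOCAL = {MIRROR: 30, PRIMARY: 40}  # downloads each node measured itself


def naive_sync(local: dict[str, int], rounds: int) -> dict[str, int]:
    """Each node publishes its local count plus the peer's published total."""
    published = dict(local)
    for _ in range(rounds):
        published[MIRROR] = local[MIRROR] + published[PRIMARY]
        published[PRIMARY] = local[PRIMARY] + published[MIRROR]
    return published


def merge_by_origin(mine: dict[str, int], theirs: dict[str, int]) -> dict[str, int]:
    """G-Counter merge: keep the highest count seen for each measuring node."""
    merged = dict(mine)
    for origin, count in theirs.items():
        merged[origin] = max(merged.get(origin, 0), count)
    return merged


print(naive_sync(LOCAL, 1))  # {'mirror-nl.pgxn.org': 70, 'primary.pgxn.org': 110}

mirror, primary = {MIRROR: 30}, {PRIMARY: 40}
for _ in range(3):
    mirror = merge_by_origin(mirror, primary)
    primary = merge_by_origin(primary, mirror)
# The aggregate is computed at read time and stays stable across syncs.
print(sum(mirror.values()), sum(primary.values()))  # 70 70
```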
Already published and in main, but making this PR for commentary.