PGXN meta sketch #4

theory · 2024-03-21T19:46:01Z

Already published and in main, but making this PR for commentary.

Also strip newlines from HTML element attributes, as seen in the `description` meta field in the headers, now that I'm including newlines in descriptions.

MMeent

NOTE: This isn't meant as heavy criticism; it's mostly curious answer-seeking from someone not very familiar with the operation of such a service.

MMeent · 2024-03-21T19:54:55Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+    distributed together. Packages may be downloaded directly from version
+    control repositories or in [archive files] generated by a release tag.


Could you expand on what you mean by the "version control repositories", and how you expect to download these packages from there?

I'd assume that you only want to distribute (and include) pre-built extensions using this metadata system, and I'd think that this would be annoying to do in the same system that also hosts the code, if only because including all those binaries in the code versioning repository would be hell.

For how to get the packages from source repos, I'd take a hard look at borrowing how Go does it. But perhaps it won't be necessary if we can improve the tooling overall such that everyone just automatically sets up release pipelines to publish to PGXN.

I intend this as an expansion of the PGXN Meta Spec for source code distribution. The idea is, however, to support enough metadata that we can build tools that auto-generate binary packages for distribution. Those packages would have a different (derived?) metadata format.

I don't think Go is a good example to copy from in this part, as Go packages are cross-platform distributions that you're expected to compile yourself. I doubt that'll be the case in general for PGXN-distributed packages; I'd expect this to be mostly pre-packaged data shipping.

PGXN source packages are source code. Binary packaging will be a different thing (hopefully also to be provided by PGXN). I'm a little confused what I'm omitting from my above explanations to clarify that, or what I might be misunderstanding about your point. :-(

I went into this with the understanding that this would be primarily for PGXN, which I believe to be a package manager for PostgreSQL extensions, based on the `pgxn install` line I've seen tossed around lately.

Additionally, an extension will still function when distributed as binaries without C/C++/Rust/Python sources (assuming it was built for the relevant platform), but not all target systems have the infrastructure to build these binaries from sources.

So, my confusion seems to be about what PGXN is: it's not a binary package repository, but a source package repository.

Yes, that's right. A lot of the stuff in this proposal is designed to add metadata to facilitate the creation of binary packages, though.

MMeent · 2024-03-21T19:56:27Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+*   **Source Distribution:** The contents of a single package bundled together
+    with [package metadata](#package-metadata) into distributable archive
+    file, usually named with the last part of the package path or the main
+    extension, a dash, and the version, e.g., `pgtap-1.14.3.zip`.


As a "source distribution", shouldn't this include the sources of the package?

Yes, like what gets published to PGXN today. I don't talk about the contents so much, as this document is about the metadata, but perhaps at some point we should get more into designing a source distribution format like Python defines. Today it's mostly driven by the needs of PGXS or pgrx.

MMeent · 2024-03-21T20:03:01Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+*   **Release:** A single instance of a package and version published on PGXN,
+    expressed as the package path, an at sign, and the [semver]. Example:
+    `github.com/theory/pgtap@v1.14.3`.


"a single instance of a package and version" implies that there can be more instances of a "package and version".

I'd probably cover this as

**Release:** A single version of the package made available to the public on PGXN, expressed as [...]. One package's release can have different packages for different /release channels/. Example: [...]

Done in 752cb61.
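
For illustration, the release identifier format discussed above (package path, an at sign, and a semver) could be split apart roughly as follows. This is only a sketch: the function name `parse_release`, the `Release` type, and the simplified version pattern with a leading `v` are assumptions drawn from the `github.com/theory/pgtap@v1.14.3` example, not anything the proposal defines.

```python
import re
from typing import NamedTuple

# Simplified semver pattern (assumption): vMAJOR.MINOR.PATCH with optional
# pre-release and build parts. The leading "v" follows the thread's example.
_SEMVER = re.compile(
    r"^v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$"
)


class Release(NamedTuple):
    package_path: str
    version: str


def parse_release(identifier: str) -> Release:
    """Split 'path@vX.Y.Z' into its package path and version."""
    # Split on the last "@" so the version is always the final component.
    path, sep, version = identifier.rpartition("@")
    if not sep or not path:
        raise ValueError(f"missing package path or '@' in {identifier!r}")
    if not _SEMVER.match(version):
        raise ValueError(f"not a semantic version: {version!r}")
    return Release(path, version)
```

Calling `parse_release("github.com/theory/pgtap@v1.14.3")` yields the path `github.com/theory/pgtap` and the version `v1.14.3`; an identifier without an `@` raises `ValueError`.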

MMeent · 2024-03-21T20:06:53Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+*   **Package Path:** Identifies a package, declared in the [package
+    metadata](#package-metadata) file. A package path should describe both
+    what the package does and where to find it. Typically, a package path
+    consists of a repository root path --- the directory that contains the
+    metadata file --- and a directory within the repository.


How do you distinguish the repository root path in the Package Path? Must there be only a single directory level under the repository root path to get to the package?

> A package path should describe both what the package does and where to find it

I don't think we want package descriptions in package paths.

I'm borrowing from Go's definitions (but s/module/extension/g). It just means they should be somewhat descriptive and not opaque, but a lot of people use funny names anyway.

MMeent · 2024-03-21T20:11:37Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+*   **Maintainer**: List of maintainers, each an object with `name` and either
+    `email` or `url` (or both)


I'd use plural maintainers here. Also, I'm not sure I agree with requiring email or url for all maintainers. While seemingly a useful way to contact maintainers, many projects have a public issue tracker that's better as a point of contact, and archiving these email addresses/URLs isn't exactly great when considering things like GDPR or CCPA.

Done in 752cb61.
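
As a rough sketch of the rule quoted in the diff above (each maintainer has a `name` plus `email` or `url`), a validator might look like the following. The function name `maintainer_errors`, the `require_contact` switch, and the non-empty-list requirement are assumptions for illustration; the thread does not show what 752cb61 actually changed, so the switch simply lets both readings be tried.

```python
def maintainer_errors(maintainers, require_contact=True):
    """Return a list of problems found in a `maintainers` value."""
    # Assumption: at least one maintainer must be listed.
    if not isinstance(maintainers, list) or not maintainers:
        return ["maintainers must be a non-empty list"]
    errors = []
    for i, m in enumerate(maintainers):
        if not isinstance(m, dict):
            errors.append(f"maintainers[{i}]: must be an object")
            continue
        if not m.get("name"):
            errors.append(f"maintainers[{i}]: missing name")
        # The quoted draft requires email or url (or both); the review
        # comment questions that, hence the optional flag.
        if require_contact and not (m.get("email") or m.get("url")):
            errors.append(f"maintainers[{i}]: needs email or url")
    return errors
```

With `require_contact=False`, an entry such as `{"name": "A"}` passes, which matches the reviewer's preference for not forcing contact details into the archive.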

MMeent · 2024-03-21T21:30:18Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+    "linux": [ "amd64", "arm64" ],
+    "darwin": [ "amd64", "arm64" ],
+    "windows": [ "amd64" ],
+    "freebsd": [ "amd64" ]


I think this needs more care for external dependencies: Debian dependencies are often named differently from those in the Red Hat family, which are different again from those in Suse, BSD, etc.

I'm kind of leaving open for now how to specify dependencies that have all sorts of different names in different places, but reference a few leads later in the doc.

This bit you've highlighted isn't packages, though, but hardware architectures.
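
To make that distinction concrete, here is a small sketch of how a client might consult the OS-to-architecture map from the diff. The values are copied from the quoted lines; the variable name `PLATFORMS` and the function `is_supported` are hypothetical, and the name of the enclosing metadata key is not shown in the excerpt.

```python
# OS -> supported hardware architectures, as in the quoted diff lines.
PLATFORMS = {
    "linux": ["amd64", "arm64"],
    "darwin": ["amd64", "arm64"],
    "windows": ["amd64"],
    "freebsd": ["amd64"],
}


def is_supported(platforms, os_name, arch):
    """True if the map lists `arch` for `os_name`."""
    # An OS that is absent from the map supports no architectures.
    return arch in platforms.get(os_name, [])
```

Note that nothing in this map names system packages, so it cannot express the Debian versus Red Hat naming differences raised in the review comment.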

MMeent · 2024-03-21T21:31:39Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+*   Is `pipeline` really necessary, given configure requirements? I think so,
+    because it tells the client the preferred build system to use, in case it
+    can't detect it for some reason.


Curiosity: Why is this pipeline here? Isn't this metadata for packaged packages, not to-be-packaged packages?

Sorry, I don't understand the question. But the point of pipeline is so that a client that downloads one of these source packages knows what build pipeline to use to build it (including compilation, etc.).

MMeent · 2024-03-21T21:38:33Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+        "downloads": 20
+      },
+      "ratings.example.com": {
+        "stags": {


Suggested change

-        "stags": {
+        "stats": {

Fixed in 71f6c7b.

MMeent · 2024-03-21T21:40:35Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+
+*   The `aggregates` section aggregates results from multiple sources, for
+    example summing all downloads or averaging ratings. The list of items to
+    aggregate could evolve regularly.


How does the system check which items to aggregate, and what aggregate to choose? I could see reasons to use any of a weighted average, mean, median, weighted median, sum, min, max, etc.

Dunno, this is a forward-thinking bit of design I haven't really thought through, yet. I expect to build it up incrementally, though, perhaps just starting with download stats aggregated for 30, 90, and 365 days, as well as all-time.
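
A minimal sketch of that starting point, assuming the service keeps one download count per calendar day: the function name `download_windows`, the output key names, and the choice to count `today` as part of each window are all assumptions, not part of the proposal.

```python
from datetime import date, timedelta


def download_windows(daily, today, windows=(30, 90, 365)):
    """Sum daily download counts over trailing windows ending on `today`.

    `daily` maps a datetime.date to the downloads recorded on that day.
    """
    totals = {"all_time": sum(daily.values())}
    for days in windows:
        # A window of N days covers today and the N-1 days before it.
        start = today - timedelta(days=days - 1)
        totals[f"{days}_days"] = sum(
            n for d, n in daily.items() if start <= d <= today
        )
    return totals
```

Other aggregates mentioned in the review (median, weighted average, and so on) would need per-source values rather than a single running sum, so they are left out here.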

MMeent · 2024-03-21T21:41:53Z

content/post/postgres/rfc-pgxn-metadata-sketch.md

+*   Each key in `sources` identifies a trusted downstream source of
+    information. Each would have its own schema describing its objects and
+    their meaning, along with URI templates to link to. For example,
+    `stats.example.com` might have these templates:


Does this consider summation loops in this federated network of statistics? Assuming anyone can run a PGXN node, of course.

Sorry, I don't really follow, can you explain what "summation loops" means?

I'd like to come up with a federation model, but admit I wasn't really thinking about it here.

Basically, there needs to be a way to distinguish locally measured stats in a way that's clearly distinct from federated stats, and distinct from the aggregate.

With summation loops, I basically mean this:

Assume mirror-nl.pgxn.org has "downloads": 30, and federates with primary.pgxn.org, which has measured 40 downloads locally.

Assume mirror-nl then updates its data from the primary and notices that the primary has 40 downloads. Aggregated with its own downloads, that gives 70 downloads, which it then publishes.

Primary then pulls the data from mirror-nl for its download statistics, and notices it has an aggregate 70 downloads. Added to its own pool of 40, that adds up to 110 total downloads.

After another sync, mirror-nl notices it has to update its aggregate, because the primary now advertises 110 downloads, or 70 more than its previous 40. Etc. Etc.

Oooh. I hadn't even thought about the aggregation being bidirectional, but of course people would want that, if things are successful. Not an immediate goal, I think, because you're right, the technical infrastructure to prevent these kinds of loops will need some careful thought.
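
The loop described above can be reproduced in a few lines, along with one possible way around it. This is only a sketch under stated assumptions: `naive_sync` models nodes that republish their local count plus the peer's published total, and `merge_local_counts` models the suggestion of keeping locally measured stats distinct, here by keying each count on a node name. Both function names and the report format are hypothetical.

```python
def naive_sync(local_mirror, local_primary, rounds):
    """Each node republishes its local count plus the peer's published total."""
    published_mirror, published_primary = local_mirror, local_primary
    history = []
    for _ in range(rounds):
        # The mirror pulls first, then the primary pulls the mirror's new total.
        published_mirror = local_mirror + published_primary
        published_primary = local_primary + published_mirror
        history.append((published_mirror, published_primary))
    return history


def merge_local_counts(*reports):
    """Combine per-node local counts so each node appears exactly once.

    Later reports overwrite earlier ones for the same node name.
    """
    merged = {}
    for report in reports:
        merged.update(report)
    return merged


# Two rounds of naive syncing with the numbers from the thread.
print(naive_sync(30, 40, 2))  # [(70, 110), (140, 180)]

# Keeping local counts separate: re-merging the same data does not inflate it.
view = merge_local_counts({"mirror-nl.pgxn.org": 30}, {"primary.pgxn.org": 40})
view = merge_local_counts(view, view)
print(sum(view.values()))  # 70
```

The second approach trades a single number for a small map per package, and a real federation would still need a rule for which report of a node's count is the freshest.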

theory added 2 commits March 21, 2024 14:52

Add post PGXN Meta v2 sketch post

447bb07

Also strip newlines from HTML element attributes, as seen in the `description` meta field in the headers, now that I'm including newlines in descriptions.

Remove PGXN v2 tag

d35c4c8

theory self-assigned this Mar 21, 2024

theory changed the title ~~Pgxn meta sketch~~ PGXN meta sketch Mar 21, 2024

MMeent reviewed Mar 21, 2024
