Use randomly generated IDs in the REST API #2187

Open
rhymes opened this issue Mar 25, 2019 · 2 comments

@rhymes
Collaborator

commented Mar 25, 2019

Is your feature request related to a problem? Please describe.

I'm opening this ticket after having read this great article about data analysis on dev.to's articles.

The question I'd like to set forth, in preparation for a future "publication" of the API, is: should dev.to use randomly generated resource IDs instead of exposing "internal" autoincrementing integer primary keys or not?

A "classic" best practice in API design is not to expose internal details of your app, separating as much as possible how the data is organized from how it is accessed by third parties.

Autoincrementing primary keys are usually discouraged in public APIs (by some, not by all) mainly for three reasons:

  1. they expose database IDs to the public. Today you might have data in a relational database, tomorrow that data might be elsewhere and those IDs might not mean much because autoincrementing primary keys are local to one data source.
  2. by exposing incrementing IDs, an onlooker (or a malicious attacker) can gather two pieces of information: the size of the tables and their growth rate
  3. it's easier for a malicious attacker to write a script to scrape data
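
To make the second point concrete, here is a rough sketch of how little an onlooker needs: two IDs observed at two known times give an estimate of the growth rate. The `Observation` struct, the `growth_per_day` helper and the numbers in the usage note are all made up for illustration, they are not DEV data.

```ruby
# Hypothetical: an ID seen in a public URL, plus the time it was seen.
Observation = Struct.new(:id, :seen_at)

# Estimate how many rows per day are created between two observations
# of an autoincrementing ID. Assumes IDs are handed out sequentially
# with few gaps.
def growth_per_day(first, second)
  elapsed_days = (second.seen_at - first.seen_at) / 86_400.0
  (second.id - first.id) / elapsed_days
end
```

For example, seeing ID 1000 on one day and ID 1700 a week later suggests about 100 new rows per day, and the latest ID is a rough upper bound on the table size.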

This is more a question of policy than a technical argument because, at least for the second of the three reasons, DEV is perhaps perfectly fine with that.

Some possible alternatives to auto incrementing integer primary keys:

  • UUIDs: PostgreSQL has a native, indexable UUID type and functions to generate default values, like pgcrypto's gen_random_uuid(). Random UUIDs, though, don't sort by creation time, they look kind of ugly, and they are verbose: https://dev.to/api/articles/2d931510-d99f-494a-8c67-87feb05e1594
  • KSUIDs: they are quite new and were "invented" by Segment. Their distinguishing feature is that they embed a timestamp, so they sort by creation time: https://dev.to/api/articles/0ujsszwN8NRY24YaXiTIE2VWDTS
  • some other random alphanumeric ID generated in the app (eg. Ruby's SecureRandom.alphanumeric(20)): https://dev.to/api/articles/cDAkKyz38cqjTI1bV5lG
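
A minimal sketch of what generating the last two kinds of ID could look like in Ruby, using only the standard library. `ksuid_like` is a hypothetical helper that follows the KSUID layout as I understand it (a 4-byte timestamp counted from a custom epoch, then 16 random bytes, base62-encoded to 27 characters); it is an illustration, not Segment's implementation and not a replacement for a maintained gem. `random_id` is just a thin wrapper around `SecureRandom.alphanumeric`.

```ruby
require "securerandom"

# Assumed KSUID epoch (seconds) and base62 alphabet in ASCII order,
# so that string order matches numeric order for fixed-width IDs.
KSUID_EPOCH = 1_400_000_000
BASE62 = (("0".."9").to_a + ("A".."Z").to_a + ("a".."z").to_a).join.freeze

# Build a KSUID-like ID: timestamp in the high bytes, random payload after.
def ksuid_like(time = Time.now)
  timestamp = [time.to_i - KSUID_EPOCH].pack("N") # 32-bit big-endian
  payload   = SecureRandom.random_bytes(16)
  number    = (timestamp + payload).unpack1("H*").to_i(16)

  digits = String.new
  while number > 0
    number, remainder = number.divmod(62)
    digits.prepend(BASE62[remainder])
  end
  digits.rjust(27, "0") # fixed width keeps IDs sortable as strings
end

# Plain random alphanumeric ID, no ordering guarantees.
def random_id(length = 20)
  SecureRandom.alphanumeric(length)
end
```

Because the timestamp sits in the most significant bytes, IDs created in different seconds sort by creation time, while IDs created within the same second are ordered only by their random part.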

My preference goes to one of the last two options.

Describe the solution you'd like

Because the API is not public yet, and it's relatively small (only 8 controllers), I think that the transition would be doable if there's consensus on this. After publication and documentation it might be harder, especially because DEV is already a big community and developers are already jumping at the opportunity to use the API (hence the article ;))

For obvious reasons there shouldn't be a situation where both (integer IDs and alphanumeric IDs) would work at the same time. That would only strain the system (because it might result in two queries, instead of one).

The frontend/SPA would also need to be aligned and updated.

What do you think?

Additional context

I did a quick tour of some public APIs (starting from those used by DEV itself) and this is what I've found:

  • Fastly uses both URLs (for obvious reasons) and alphanumeric IDs (eg. SU1Z0isxPaozGVKXdv0eY) to identify resources, probably generated by the equivalent of Ruby's SecureRandom.alphanumeric(21)
  • Cloudinary uses a public_id which is a randomly generated string (eg. 8jsb1xofxdqamu2rzwt9q)
  • Algolia is not super clear about it but it seems to be using numeric IDs but in string form
  • Stripe uses alphanumeric random strings prefixed by a resource identifier, for example ch_19yUdh2eZvKYlo2CkFVBOZG7 for a charge and cus_El7v7DRE34iBPx for a customer

A few more:

  • StackOverflow uses integer IDs
  • Twilio uses string IDs with a prefix for type of resource, similar to Stripe
  • PayPal uses alphanumeric IDs
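
For the prefixed style used by Stripe and Twilio, a rough sketch could look like the following. The prefixes in `ID_PREFIXES` and the helper names are hypothetical, picked only for illustration; I don't know how those services actually generate their IDs.

```ruby
require "securerandom"

# Hypothetical mapping from resource type to ID prefix.
ID_PREFIXES = { article: "art", comment: "cmt", user: "usr" }.freeze

# Build an ID such as "art_" followed by random alphanumeric characters.
# Raises KeyError for an unknown resource type.
def prefixed_id(resource, length: 20)
  prefix = ID_PREFIXES.fetch(resource)
  "#{prefix}_#{SecureRandom.alphanumeric(length)}"
end

# Recover the resource type from an ID, or nil if the prefix is unknown.
def resource_for(id)
  prefix = id.split("_", 2).first
  ID_PREFIXES.key(prefix)
end
```

The nice side effect of the prefix is that an ID pasted into a log or a support request tells you what kind of resource it refers to without a lookup.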

Refs #911

@triage-new-issues triage-new-issues bot added the triage label Mar 25, 2019

@abraham

Contributor

commented Mar 26, 2019

Similar to KSUID is ObjectId, which has been in use in MongoDB for a while.

A fun example is Twitter. They were originally based on Rails and used sequential IDs until they ran into scaling issues and needed to generate IDs on multiple servers at once. You can read about their fix. (I'm not advocating this as a solution, it's just interesting.)

@rhymes

Collaborator Author

commented Mar 26, 2019

> Similar to KSUID is ObjectId, which has been in use in MongoDB for a while.

Cool, seems based on the same principle, the ability to sort and extract time information.

> A fun app is Twitter. They were originally based on Rails and used sequential IDs until they ran into scaling issues and needed to generate IDs on multiple servers at once. You can read about their fix. (I'm not advocating this as a solution, it's just interesting.)

Yeah, that's a classic problem in distributed programming after passing a certain scale, I guess. Snowflake is mentioned in the Segment article. Although DEV is probably a long way from having those scaling issues, I think guarding against them now would save a lot of pain down the line.
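
For reference, a rough sketch of the Snowflake idea, assuming the publicly described layout (milliseconds since a custom epoch in the high bits, then a 10-bit worker ID, then a 12-bit per-millisecond sequence). The helper names are hypothetical and the epoch constant is the one I remember from Twitter's description, so treat both as assumptions; a real generator also needs per-worker state to hand out sequence numbers.

```ruby
# Assumed custom epoch in milliseconds.
TWITTER_EPOCH_MS = 1_288_834_974_657

# Pack timestamp, worker ID and sequence into a single integer.
def snowflake_like(time_ms:, worker_id:, sequence:)
  raise ArgumentError, "worker_id must fit in 10 bits" unless (0...1024).cover?(worker_id)
  raise ArgumentError, "sequence must fit in 12 bits" unless (0...4096).cover?(sequence)

  ((time_ms - TWITTER_EPOCH_MS) << 22) | (worker_id << 12) | sequence
end

# Extract the creation time (in milliseconds) back out of an ID.
def snowflake_time_ms(id)
  (id >> 22) + TWITTER_EPOCH_MS
end
```

Each server can generate IDs on its own because the worker ID keeps them distinct, and the IDs still sort roughly by time, which is the same property KSUID and ObjectId are after.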
