Add instance pipeline #799

Merged

micahmo
Member

@micahmo micahmo commented Oct 3, 2023

Pull Request Description

This PR introduces a new pipeline which fetches a list of popular instances from fediverse.observer. The list is placed into a text file and also a code file so that it can be consumed programmatically by the Thunder app. Of course, we can also generate other kinds of files in the future (e.g., Android manifest files for handling links). The workflow then opens a PR in the repo so that we can review, approve, and merge the changes. At this time, the list of instances is generated by combining what's already in the file with the results of the API call, which queries for instances with >50 subscribers (as Jerboa does). One nice thing is that it doesn't remove any instances, so we can manually add ones that aren't detected automatically, if needed.
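The merge behavior described above (combine what is already in the file with the API results, never removing anything) can be sketched roughly as follows. This is a minimal illustration in Python, not the workflow's actual script; the function name `merge_instances` and the sample domains are assumptions made for the example.

```python
def merge_instances(existing, fetched):
    """Union the domains already in the file with the freshly fetched ones.

    Nothing is ever dropped, so manually added domains survive every run.
    """
    # Normalize so that case or stray whitespace does not create duplicates.
    normalized = {d.strip().lower() for d in list(existing) + list(fetched)}
    normalized.discard("")  # ignore blank lines
    # Sort for a stable file, which keeps the generated PR diff small.
    return sorted(normalized)


# Hypothetical sample data, for illustration only.
existing = ["lemmy.world", "manual.example"]
fetched = ["lemmy.ml", "Lemmy.World"]
merged = merge_instances(existing, fetched)
```

Sorting the result is a design choice for the sketch: a deterministic order means the automated PR only shows genuinely new domains.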

Initially, I would like to use this list for a couple of smaller things.

  • Pre-populating a list of suggested instances to log into.
  • Faster checking of whether a link references a Lemmy instance (rather than our current API check, which can be slow).
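
As a rough sketch of the second bullet, a link can be checked against the generated list with a local set lookup instead of a network call. The names `is_known_instance` and `KNOWN` and the sample domains are hypothetical, and the app's real check may differ.

```python
from urllib.parse import urlparse


def is_known_instance(url, instances):
    """Return True if the URL's host is in the set of known instances."""
    # urlparse lowercases the hostname and returns None when there is no host.
    host = urlparse(url).hostname
    return host is not None and host in instances


# Hypothetical stand-in for the generated instance list.
KNOWN = frozenset({"lemmy.world", "lemmy.ml"})
```

With a set, the lookup cost does not grow with the size of the list, and no request leaves the device.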

In the future, I would also like to use this list for in-app post/comment navigation, as well as the big one, handling external links.

For reference...

Note: This PR does introduce an API dependency on fediverse.observer. @hjiangsu should probably review their terms, and make sure we're ok with it, but everything seems fine. It's also free at this time, but of course we could remove this if that changes. For reference, Jerboa (developed by a Lemmy dev) also queries fediverse.observer in their pipeline.

Note: I had to change a few settings related to action permissions in my repo. I can share those details if needed.

Issue Being Fixed

Not directly related to any issue; this is more of an infrastructure thing. But should pave the way for some nice features.

Screenshots / Recordings

Checklist

  • Did you update CHANGELOG.md?
  • Did you use localized strings where applicable?
  • Did you add semanticLabels where applicable for accessibility?

@hjiangsu
Member

hjiangsu commented Oct 3, 2023

This is pretty cool, so thanks for working on this! Everything looks good to me with regard to this new pipeline.

A couple of questions:

  • How does this differ from something like lemmyverse?
  • How should we handle the (inevitable) cases where instances are no longer active or available? Since we don't overwrite the file, I'm assuming we would be able to do a diff between the raw pipeline output and our existing entries to remove the non-active instances (but maybe you had some other ideas on this).
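
The diff suggested in the second question could look roughly like this. It is only a sketch under stated assumptions: `stale_candidates` is a hypothetical helper, and a domain missing from the raw output is not necessarily inactive, since it may have been added manually or may have fallen below the query threshold.

```python
def stale_candidates(existing, fetched):
    """Domains in the current file that the latest raw API output lacks."""
    # Set difference: everything we have minus everything just fetched.
    return sorted(set(existing) - set(fetched))


# Hypothetical sample data, for illustration only.
candidates = stale_candidates(
    ["lemmy.ml", "lemmy.world", "gone.example"],
    ["lemmy.ml", "lemmy.world"],
)
```

The result is best treated as a list for manual review rather than as an automatic removal list.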

Anyways, this has my approval! It would be good if you could share the changes for action permissions so that I can do that as well on here 😁

@micahmo micahmo force-pushed the feature/populate-instances-ci branch from b7489ae to dbf0690 Compare October 4, 2023 02:22
@micahmo
Member Author

micahmo commented Oct 4, 2023

How does this differ from something like lemmyverse?

As far as I know, lemmyverse does not have an official API for their data. The author commented here and said that he doesn't "really plan on hosting a dedicated search api". They do have a data dump (https://data.lemmyverse.net/data/community.full.json), and I suppose we could download and parse that. But the fediverse.observer API, which uses GraphQL, is very nice and allows passing a where clause and requesting only specific fields, making it a very lightweight call.
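
To illustrate why such a call is lightweight, here is a sketch of building a GraphQL request body that filters on the server side and asks for a single field. The query shape is an assumption made for the example: `nodes`, `softwarename`, `minusers`, and `domain` are illustrative names standing in for the filter and field selection, not a confirmed description of the fediverse.observer schema, and no request is actually sent.

```python
import json


def build_instances_query(software, min_users):
    """Build a JSON request body for a GraphQL query (illustrative schema)."""
    # ASSUMPTION: the argument and field names are hypothetical placeholders.
    query = "query { nodes(softwarename: %s, minusers: %d) { domain } }" % (
        json.dumps(software),  # JSON string quoting is also valid in GraphQL
        min_users,
    )
    # GraphQL over HTTP conventionally sends the query under a "query" key.
    return json.dumps({"query": query})


payload = build_instances_query("lemmy", 50)
```

Because only the domain field is requested, the response would carry one string per instance rather than a full record.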

How should we handle the (inevitable) cases where instances are no longer active or available?

That's a good question. The same could be asked for cases where we might want to manually exclude an instance for other reasons (although I doubt we want to take any kind of stance like that, as a client app). I guess I was thinking that there's no harm in keeping those domains around. If they go down, then they'd be inaccessible either way, so we're no worse off assuming that they're Lemmy instances. However, if you'd prefer, I can create an exclusion mechanism (i.e., another text file, maintained manually) that ensures select domains don't end up in the output. Let me know!
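
The exclusion mechanism mentioned above was not implemented in this PR, but it could be sketched as below. The helper name `apply_exclusions`, the one-domain-per-line file format, and the support for `#` comment lines are all assumptions made for the illustration.

```python
def apply_exclusions(instances, excluded_lines):
    """Drop any domain listed in a manually maintained exclusion file."""
    # One domain per line; blank lines and '#' comments are skipped
    # (the comment syntax is an assumption for this sketch).
    excluded = {
        line.strip().lower()
        for line in excluded_lines
        if line.strip() and not line.lstrip().startswith("#")
    }
    # Preserve the input order of the remaining domains.
    return [d for d in instances if d.lower() not in excluded]


# Hypothetical sample data, for illustration only.
kept = apply_exclusions(
    ["lemmy.ml", "lemmy.world", "gone.example"],
    ["# instances we do not want in the output", "", "gone.example\n"],
)
```

Running this filter after the merge step would keep excluded domains out of the output even if the API keeps returning them.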

It would be good if you could share the changes for action permissions

Of course! I had to change the following.

  • Settings > Actions > General > Workflow permissions > Read and write permissions
  • Settings > Actions > General > Allow GitHub Actions to create and approve pull requests

@hjiangsu
Member

hjiangsu commented Oct 4, 2023

But the fediverse.observer API, which uses GraphQL, is very nice and allows passing a where clause and requesting only specific fields, making it a very lightweight call.

That makes sense! It would be interesting to see what the differences are in the future, and to compare the two in terms of the information we can retrieve from them and how accurate it is.

However, if you'd prefer, I can create an exclusion mechanism (i.e., another text file, maintained manually) that ensures select domains don't end up in the output. Let me know!

I think we can leave this for another time since this is something new! I'll go ahead and merge this in and change the permissions.

@hjiangsu hjiangsu merged commit d490d9a into thunder-app:develop Oct 4, 2023
1 check passed
@micahmo micahmo deleted the feature/populate-instances-ci branch October 5, 2023 02:20
@micahmo micahmo mentioned this pull request Oct 6, 2023
2 participants