Supporting OpenShift #1090

Open
ikreymer opened this issue Aug 18, 2023 · 4 comments
Labels
feature design This issue tracks smaller sub issues that compose a feature

Comments

@ikreymer
Member

ikreymer commented Aug 18, 2023

Ideally, we'd have a way of running Browsertrix Cloud on the OpenShift flavor of K8s, but that'll likely require a few changes.
We don't have the capacity to do this just yet, but would like to support OpenShift eventually.
This is a placeholder issue to keep track of this.

We can start by listing known changes / requirements that will need to be made to support OpenShift in this issue.

ikreymer added the feature design label Aug 18, 2023
ikreymer changed the title from Support OpenShift to Supporting OpenShift Aug 18, 2023
@wvengen
Contributor

wvengen commented Aug 21, 2023

Actually, we're doing this already, patching the charts in the following ways:

  • avoid putting secrets in YAML config (with instructions to manually create the secrets)
  • comment out the clusterIP: None in the mongo template
  • proxy S3 storage with nginx (Swift doesn't support everything necessary for replay; see below)
  • hosting our own ReplayWeb.page with injection customizations

The S3 storage proxy is not ideal and could use some improvement. The issue was that our OpenStack Swift S3 does not allow HEAD requests on signed objects for CORS, which ReplayWeb.page requires. Maybe this is not needed when hosting your own ReplayWeb.page on the same domain (as we do with ingress rules), since CORS shouldn't matter then, but I did not verify this.
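To see what a browser would let ReplayWeb.page read from a cross-origin storage response, a rough Python sketch like the following may help. It is a simplified model of the Fetch standard's CORS rules (it assumes non-credentialed requests and ignores preflights), and `fetch_headers` and `cors_readable_headers` are hypothetical helpers, not part of Browsertrix or ReplayWeb.page.

```python
import urllib.request
from typing import Dict, Set

# CORS-safelisted response headers: readable by cross-origin scripts without
# being listed in Access-Control-Expose-Headers (per the Fetch standard).
SAFELISTED_RESPONSE_HEADERS = {
    "cache-control", "content-language", "content-length",
    "content-type", "expires", "last-modified", "pragma",
}


def fetch_headers(url: str, origin: str, method: str = "HEAD") -> Dict[str, str]:
    """Send a request with an Origin header and return the response headers.

    Raises urllib.error.HTTPError if the server rejects the request (e.g. 403).
    """
    req = urllib.request.Request(url, method=method, headers={"Origin": origin})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return dict(resp.headers.items())


def cors_readable_headers(response_headers: Dict[str, str], origin: str) -> Set[str]:
    """Return the lower-cased header names a script on `origin` could read.

    Simplified: assumes a non-credentialed request and ignores preflight
    and forbidden response headers such as Set-Cookie.
    """
    headers = {k.lower(): v for k, v in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin", "").strip()
    if allow_origin not in ("*", origin):
        return set()  # the browser blocks the whole response
    exposed = {
        name.strip().lower()
        for name in headers.get("access-control-expose-headers", "").split(",")
        if name.strip()
    }
    if "*" in exposed:
        return set(headers)  # wildcard exposes all headers for non-credentialed requests
    return set(headers) & (SAFELISTED_RESPONSE_HEADERS | exposed)
```

For example, with a hypothetical signed URL and origin, `cors_readable_headers(fetch_headers(signed_url, "https://browsertrix.example.org"), "https://browsertrix.example.org")` returning an empty set means the browser would block that response entirely.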

@ikreymer
Member Author

@wvengen That's great to hear! I'm surprised by the list below.

Would you mind opening a PR / sharing what you've done exactly so we can integrate into the main codebase?

One question we had was about namespaces: my understanding is that OpenShift places more constraints on namespace creation. Are you using an existing namespace instead of crawlers?

> Actually, we're doing this already, patching the charts in the following ways:
>
>   • avoid putting secrets in YAML config (with instructions to manually create the secrets)

I commented more on #490 -- is that something that OpenShift requires or a decision you've made?
At first glance, OpenShift Secrets don't seem to behave differently from k8s Secrets.
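As a minimal sketch of creating the secrets by hand instead of in the chart's YAML values: the Python below builds a plain Kubernetes Secret manifest, which should apply the same way with `kubectl apply -f -` or `oc apply -f -`. The secret name, namespace, and keys are hypothetical placeholders, not the names the Browsertrix chart actually expects.

```python
import base64
import json
from typing import Dict


def make_secret_manifest(name: str, namespace: str, values: Dict[str, str]) -> dict:
    """Build a Kubernetes Secret manifest as a plain dict.

    Values under `data` must be base64-encoded; Kubernetes decodes them
    when the secret is mounted or exposed as environment variables.
    """
    encoded = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": encoded,
    }


if __name__ == "__main__":
    # Hypothetical secret name, namespace, and keys.
    manifest = make_secret_manifest(
        "storage-credentials",
        "default",
        {"STORE_ACCESS_KEY": "example-key", "STORE_SECRET_KEY": "example-secret"},
    )
    # JSON is also valid YAML, so this output can be piped to `kubectl apply -f -`.
    print(json.dumps(manifest, indent=2))
```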

>   • comment out the clusterIP: None in the mongo template

This is just to run the service as a headless service for the statefulset - a common pattern. Why was this needed?
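For context on why `clusterIP: None` is common here: a headless Service gives each StatefulSet pod a stable DNS name of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`, where pods are named `<statefulset>-<ordinal>`. A small Python sketch of those names, assuming the default `cluster.local` domain and hypothetical `mongo`/`default` names:

```python
from typing import List


def statefulset_pod_dns_names(
    statefulset: str,
    service: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",
) -> List[str]:
    """Per-pod DNS names for a StatefulSet governed by a headless Service.

    Pods are named <statefulset>-<ordinal>, starting at ordinal 0.
    """
    return [
        f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


if __name__ == "__main__":
    for name in statefulset_pod_dns_names("mongo", "mongo", "default", 2):
        print(name)
```

With a single MongoDB replica, clients can also reach the pod through a regular ClusterIP Service, which may be why removing `clusterIP: None` did not break anything; per-pod names matter mainly when clients must address individual replicas.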

>   • proxy S3 storage with nginx (Swift doesn't support everything necessary for replay; see below)
>   • hosting our own ReplayWeb.page with injection customizations

> The S3 storage proxy is not ideal and could use some improvement. The issue was that our OpenStack Swift S3 does not allow HEAD requests on signed objects for CORS, which ReplayWeb.page requires. Maybe this is not needed when hosting your own ReplayWeb.page on the same domain (as we do with ingress rules), since CORS shouldn't matter then, but I did not verify this.

ReplayWeb.page will fall back to GET requests if HEAD fails. It prefers HEAD to check the size, so this shouldn't be an issue either way.

@wvengen
Contributor

wvengen commented Aug 22, 2023

> are you using an existing namespace instead of crawlers

No, we use crawlers. I don't remember whether we needed to create the namespace manually or not.

> avoid putting secrets in YAML config (with instructions to manually create the secrets)

> I commented more on #490 -- is that something that OpenShift requires or a decision you've made?

No, that was a decision we made. Good to see this discussion in #490.

> comment out the clusterIP: None in the mongo template

> This is just to run the service as a headless service for the statefulset - a common pattern. Why was this needed?

Good question, I would need to dive into this to figure out again. Reading about the headless service, it looks like it isn't necessary.

> ReplayWeb.page will fall back to GET requests if HEAD fails. It prefers HEAD to check the size, so this shouldn't be an issue either way.

Is this a GET request with a Range header to determine the size? For many archives, getting the whole archive is just too much.
I'd need to check if this suffers from the same issue or not.
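One way a client could learn an archive's size without downloading it is to try HEAD first and fall back to a one-byte ranged GET, reading the total from the `Content-Range` header of the 206 response. This is a hypothetical Python sketch (`remote_size` and `size_from_content_range` are illustrative names), not ReplayWeb.page's actual implementation:

```python
import re
import urllib.error
import urllib.request
from typing import Optional

# Matches e.g. "bytes 0-0/1234" or "bytes */1234"; an unknown total ("/*") does not match.
_CONTENT_RANGE = re.compile(r"bytes\s+(?:\d+-\d+|\*)/(\d+)")


def size_from_content_range(value: str) -> Optional[int]:
    """Parse the total object size from a Content-Range header value."""
    match = _CONTENT_RANGE.fullmatch(value.strip())
    return int(match.group(1)) if match else None


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the object size, trying HEAD first, then a one-byte ranged GET."""
    try:
        head = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(head, timeout=timeout) as resp:
            length = resp.headers.get("Content-Length")
            if length is not None:
                return int(length)
    except urllib.error.URLError:
        pass  # HEAD rejected (HTTPError is a subclass) or failed: try GET instead

    # Request only the first byte so the server does not send the whole archive.
    get = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(get, timeout=timeout) as resp:
        if resp.status == 206:
            return size_from_content_range(resp.headers.get("Content-Range", ""))
        # Range was ignored (200): Content-Length is the full size; the body is
        # never read, and leaving the block closes the connection.
        length = resp.headers.get("Content-Length")
        return int(length) if length is not None else None
```

Whether this helps behind Swift still depends on the storage returning CORS headers for the GET, and on exposing `Content-Range` via `Access-Control-Expose-Headers`, since it is not a CORS-safelisted response header.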

Thanks for asking these questions to get more clarity on what is really necessary.

@wvengen
Contributor

wvengen commented Feb 20, 2024

I got round to reinstalling Browsertrix from scratch on OpenStack. Most things work as-is, except, in our case, replaying archives.

Our infrastructure provider offers OpenStack Swift for object storage (with S3 support). We could not get CORS to work there (neither HEAD nor GET, with either signed or public URLs), so we need to keep using a storage proxy. It's a bit of a hack, but it works.

Swift does support CORS to some extent, but unfortunately not in our case, so I cannot say whether this holds for all OpenStack users.

ikreymer pushed a commit that referenced this issue Mar 16, 2024
… for deletion (#1600)

I came across [this
problem](https://forum.webrecorder.net/t/deleting-crawl-failure/512) and
noticed that the access URL is used when deleting files, causing my file
deletions to fail on OpenStack SWIFT S3 (relates to #1090). This trivial
change makes it work there.
Projects
Status: Todo
Development

No branches or pull requests

3 participants