Skip to content

feat(docker): adding native GHCR container #1184

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

goulvenb
Copy link

Originally, if the user want to use Docker on this project, this are the steps he was gonna do :

  • Clone the Repo
  • Build the Docker image using docker build -t markitdown:latest .
  • Run the command docker run --rm -i markitdown:latest < ~/your-file.pdf > output.md

With this PR, i am hopping to simplify this procedure so that testing (and trying this repo out, for example to check if it correspond to our need) can be simplified to :

  • Run the command docker run --rm -i ghcr.io/microsoft/markitdown:latest < ~/your-file.pdf > output.md

@goulvenb
Copy link
Author

@microsoft-github-policy-service agree

@goulvenb
Copy link
Author

Forgot to say, there's a need to create a project variable called REGISTRY that contain, for example, "ghcr.io" (or "docker.io" but then there'll be a need to retouch my commits)

@afourney
Copy link
Member

This looks very promising. Thank you.

I need to check some procedural stuff on my end before I can merge this (re: hosting images rather than just Dockerfiles). Let me spend a day or two on this, and I'll get back to you.

@goulvenb
Copy link
Author

Hey @afourney,
As i just said in #1186 :

it was the first workflow i made, i henceforth don't have much knowledge about them

However, i would love to talk about it with @seuros.

From what i see, here's a table difference :

#1184 #1186
We create a container on tag creation We create a container when pushing on branch main ; from a PR or not
We do not set the permissions of the PAT ; it is implicitly set We set the permissions of the PAT to the minimum
Use the 6th version of docker/build-push-action Use the 5th version of docker/build-push-action

Additionally, @seuros added a docker/setup-qemu-action@v3 action that i do not know the use of, and a docker/setup-buildx-action@v3 action that allow to create, if i understand well, an image for the architecture "linux/amd64" and "linux/arm64"

From what i see, i believe we should implement the following :

  • #1184 : We create a container on tag creation
    • This way, we'll have access to ghcr.io/microsoft/markitdown:v1.0.0, ghcr.io/microsoft/markitdown:v2.0.0, ... Instead of just ghcr.io/microsoft/markitdown:main
  • #1186 : We set the permissions of the PAT to the minimum
    • Better security practice to set the permissions in hard instead of implicitly letting Github to set them
  • #1184 : Use the 6th version of docker/build-push-action
    • I don't see why v5 should be used instead of v6 ?
  • #1186 : Use docker/setup-buildx-action@v3
    • Did not know about this, but it's better to create an image for different CPU architecture

For the Qemu action, i don't really know until i've talked with @seuros.

goulvenb added a commit to seuros/markitdown that referenced this pull request Apr 14, 2025
This way, we'll have access to `ghcr.io/microsoft/markitdown:v1.0.0`, `ghcr.io/microsoft/markitdown:v2.0.0`, ... Instead of just `ghcr.io/microsoft/markitdown:main`
goulvenb added a commit to seuros/markitdown that referenced this pull request Apr 14, 2025
…ld-push-action`

I don't see why v5 should be used instead of v6 ?
Also, the [most recent `docker/build-push-action`](https://github.com/docker/build-push-action/blob/88844b95d8cbbb41035fa9c94e5967a33b92db78/README.md?plain=1#L53-L81)'s README tell us that version compatibility with the other actions is not an issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants