Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DownloadContents can't download files from root of repo #2480

Open
abatilo opened this issue Sep 25, 2022 · 19 comments
Open

DownloadContents can't download files from root of repo #2480

abatilo opened this issue Sep 25, 2022 · 19 comments
Assignees

Comments

@abatilo
Copy link

abatilo commented Sep 25, 2022

GitHub Apps have the ability to request access to individual files which is great for fine-grained access.

image

However, as of v47.1.0, this will prevent someone from being able to download a file from the root repository because of the way DownloadContents implements directory scanning first:

dir := path.Dir(filepath)
filename := path.Base(filepath)
_, dirContents, resp, err := s.GetContents(ctx, owner, repo, dir, opts)
if err != nil {
return nil, resp, err
}

The first thing that the function does is get just the directory name with path.Dir but if you try to download a file like just CODEOWNERS which is a valid location for a CODEOWNERS file, then the first request to the API will create a URL like so:

https://api.github.com/repos/$owner/$repo/contents/

At the moment, there's no way to add the root directory to the list of single file paths. You cannot add a blank path and adding just / doesn't work either.

@gmlewis
Copy link
Collaborator

gmlewis commented Sep 25, 2022

So was this a regression caused by a change to v47.1.0 ?

Do you want to create a PR to fix this, @abatilo , or would you like me to open up this issue to other contributors to this repo to address?

@abatilo
Copy link
Author

abatilo commented Oct 2, 2022

Hi @gmlewis, I think I actually poorly chose my words. I shouldn't have said "as of v47.1.0" but instead it's more that I found this behavior while using v47.1.0. It appears that this has been the behavior for the history of the function? I don't think I have the availability to work on this so I'd be happy to have this open up to anyone who wants to contribute!

@gmlewis
Copy link
Collaborator

gmlewis commented Oct 2, 2022

Thank you, @abatilo .

This would be a great PR for any new contributor to this repo or a new Go developer.
All contributions are greatly appreciated!

Feel free to volunteer for any issue and the issue can be assigned to you so that others don't attempt to duplicate the work.

Please check out our CONTRIBUTING.md guide to get started. (In particular, please remember to go generate ./... and don't use force-push to your PRs.)

Thank you!

@zaataylor
Copy link

I took a look into this tonight because I'd like to get involved in the project. Based on the path.Dir docs I found here, it seems like it should be possible for the existing code to download files from the root of a repo:

I was able to construct a minimal working example of this as follows:

  1. Created a test module called go-gh-testcase. main.go of this module looks like:
package main

import (
	"golang.org/x/oauth2"
	"github.com/google/go-github/github"
	"context"
	"fmt"
)

func main() {
	ctx := context.Background()
	ts := oauth2.StaticTokenSource(
		&oauth2.Token{AccessToken: "<YOUR-GITHUB-PAT-HERE>"})
	tc := oauth2.NewClient(ctx, ts)
	client := github.NewClient(tc)

	const (
		owner = "google"
		repo = "go-github"
	)
	filepath := "LICENSE"

	rc, _, err := client.Repositories.DownloadContents(ctx, owner, repo, filepath, nil)
	if err != nil {
		fmt.Errorf("Error was: %s\n", err)
	}
	defer rc.Close()
        byteCount := 100
	buf := make([]byte, byteCount)
        n, err := rc.Read(buf)
        if err != nil {
            fmt.Errorf("Error while reading into buffer: %s\n", err)
        }
	fmt.Printf("Contents of first %d bytes are: %s\n", byteCount, buf[:n])
}
  1. Modified go-github/github/repos_content.go very slightly to add some fmt.Printlns in the appropriate places:
diff --git a/github/repos_contents.go b/github/repos_contents.go
index be58fd5..d4a2319 100644
--- a/github/repos_contents.go
+++ b/github/repos_contents.go
@@ -127,13 +127,16 @@ func (s *RepositoriesService) GetReadme(ctx context.Context, owner, repo string,
 // code to verify the content is from a successful response.
 func (s *RepositoriesService) DownloadContents(ctx context.Context, owner, repo, filepath string, opts *RepositoryContentGetOptions) (io.ReadCloser, *Response, error) {
        dir := path.Dir(filepath)
+       fmt.Printf("Dir is: %s\n", dir)
        filename := path.Base(filepath)
+       fmt.Printf("Filename is: %s\n", filename)
        _, dirContents, resp, err := s.GetContents(ctx, owner, repo, dir, opts)
        if err != nil {
                return nil, resp, err
        }
 
        for _, contents := range dirContents {
+               fmt.Printf("\tName of found content item is: %s\n", *contents.Name)
                if *contents.Name == filename {
                        if contents.DownloadURL == nil || *contents.DownloadURL == "" {
                                return nil, resp, fmt.Errorf("no download link found for %s", filepath)

Because of the behavior of path.Dir, as outlined here, a filepath without any /s (such as the path "LICENSE" I chose in the first code block) will result in dir being set to ".". So I think this would actually result in a URL that looks like:
https://api.github.com/repos/$owner/$repo/contents/.

Running the code above, I got the result:

Dir is: .
Filename is: LICENSE
        Name of found content item is: .codecov.yml
        Name of found content item is: .github
        Name of found content item is: .gitignore
        Name of found content item is: .golangci.yml
        Name of found content item is: AUTHORS
        Name of found content item is: CONTRIBUTING.md
        Name of found content item is: LICENSE
Contents of first 100 bytes are: Copyright (c) 2013 The go-github AUTHORS. All rights reserved.

Redistribution and use in source and

which is the first 100 bytes of go-github's LICENSE file.

Doing the same test on the go.sum file (last file in the root of the repo as it appears in GitHub UI) yields:

Dir is: .
Filename is: go.sum
        Name of found content item is: .codecov.yml
        Name of found content item is: .github
        Name of found content item is: .gitignore
        Name of found content item is: .golangci.yml
        Name of found content item is: AUTHORS
        Name of found content item is: CONTRIBUTING.md
        Name of found content item is: LICENSE
        Name of found content item is: README.md
        Name of found content item is: example
        Name of found content item is: github
        Name of found content item is: go.mod
        Name of found content item is: go.sum
Contents of first 100 bytes are: github.com/golang/protobuf v1.3.1/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/

So, it seems like specifying a filename in the repo root will still behave properly; the API call will just consider dir to be ".", but the file will be found in the for loop over the contents of "." if it's in the root.


I've never participated in this project before, and I tried to make sure I understood the context before commenting, but if I've totally misunderstood something here, please let me know! I'm just trying to help and learn a little Golang! 😄

@gmlewis
Copy link
Collaborator

gmlewis commented Oct 7, 2022

I've never participated in this project before, and I tried to make sure I understood the context before commenting, but if I've totally misunderstood something here, please let me know! I'm just trying to help and learn a little Golang!

Thank you, @zaataylor , for the detailed analysis and excellent write-up... It is greatly appreciated! I believe that you have demonstrated that this is working as intended.

I will leave this issue open for a short while in case there are any other comments, but otherwise I believe this issue can be closed.

@abatilo
Copy link
Author

abatilo commented Oct 9, 2022

Hi @zaataylor, your solution demonstrates accessing things at the root of the repo for a GitHub Personal Access Token which does not have the same fine grained per file permissions that a GitHub App might request.

Maybe this issue needs to be re-worded to be specific about the file path permissions that you can set on a GitHub App.

@gmlewis
Copy link
Collaborator

gmlewis commented Oct 9, 2022

@abatilo - is it not possible to give the GitHub App the appropriate permissions needed to access the root of the repo?

If that is truly the case, then I think that you need to contact GitHub tech support and ask how to accomplish this in a GitHub App.
Feel free to report back here with what you find out.

@abatilo
Copy link
Author

abatilo commented Oct 10, 2022

It's possible. I discovered this because I am building an app to do things based on CODEOWNERS, which can be in 3 locations. CODEOWNERS, docs/CODEOWNERS, and .github/CODEOWNERS.

There's no actual need for the app to request permissions or have any ability to read files in the root of the repo and my guess is that security minded organizations would prefer that.

I do still think that the "correct" solution here is to change the DownloadContents code from it's original implementation from 8 years ago to not need to do a GetContents call on the path.Dir of the path before attempting to do a for loop over each file in the folder to get a string match.

I may have time this week to open a PR with that change and see if the proposed solution is alright!

@petersondmg
Copy link

Hello, may I try to take a look on this?

@gmlewis
Copy link
Collaborator

gmlewis commented Oct 24, 2022

Hello, may I try to take a look on this?

Sorry, @petersondmg , I should have assigned this to @abatilo from the previous comment. Let's wait to hear back what @abatilo discovers.

@abatilo
Copy link
Author

abatilo commented Oct 24, 2022

I did NOT have the time when I had hoped to. Please feel free to assign to @petersondmg! I appreciate it though @gmlewis

@gmlewis
Copy link
Collaborator

gmlewis commented Oct 24, 2022

I did NOT have the time when I had hoped to. Please feel free to assign to @petersondmg! I appreciate it though @gmlewis

No problem, @abatilo - thank you for the note, and best wishes to you.

@petersondmg - it is yours.

@gmlewis gmlewis assigned petersondmg and unassigned abatilo Oct 24, 2022
@petersondmg
Copy link

Hello
Maybe I misunderstood the point of the issue, but I was able to 'download files from root of repo with DownloadContents', using a GitHub App token, and a personal token as well.

To generate a Github App Token I did:

1 - generated a private key for the app
2 - authenticated as a GitHub App (generated the jwt)
3 - retrieved an access token by requesting the endpoint using the jwt https://api.github.com/app/installations/<app_installation_id>/access_tokens
4 - used the access token to execute the script posted on comments above

It was able to list files on the root and download the file.

Should I try a different approach? @gmlewis @abatilo

Thanks.

@gmlewis
Copy link
Collaborator

gmlewis commented Oct 29, 2022

Thank you, @petersondmg !
Let's wait to hear from @abatilo before proceeding.

@abatilo
Copy link
Author

abatilo commented Oct 29, 2022

Hi @petersondmg. If you configure the GitHub App with the Contents permissions, then there aren't any problems with downloading from the root. However, if you configure single files only like I have in the screenshot in the main post, and you provide a single file in the root as the only permission, then you won't be able to download it because you've only granted permission to the single path and not the directory.

@gmlewis
Copy link
Collaborator

gmlewis commented Nov 7, 2022

I believe this issue can be closed. Do you agree, @abatilo and @petersondmg ?

@petersondmg
Copy link

petersondmg commented Nov 7, 2022

@gmlewis I'm almost done with a solution to the point brought by @abatilo, that is, downloading the file without having to list the base directory. Just have to adjust the tests. I'll try to send a PR till the end of the week if you think it's still valid.

@githoober
Copy link

Any news on the proposed solution? How can we help?

@JayDubyaEey
Copy link

Hi there, was there any progress made on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants