feat: add github_repository_content table #207
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,64 @@ | ||
# Table: github_repository_content | ||
|
||
Gets the contents of a file or directory in a repository. | ||
|
||
Specify the file path or directory in `repository_content_path`. | ||
If you omit `repository_content_path`, you will receive the contents of the repository's root directory. | ||
See the description below regarding what the response includes for directories. | ||
|
||
The `github_repository_content` table can be used to query information about **ANY** repository, and **you must specify which repository** in the where or join clause (`where repository_full_name=`, `join github_repository_content on repository_full_name=`). | ||
|
||
## Examples | ||
|
||
### List the root directory of a repository | ||
|
||
```sql | ||
select | ||
repository_full_name, | ||
path, | ||
content, | ||
type, | ||
size, | ||
sha, | ||
html_url | ||
from | ||
github_repository_content | ||
where | ||
repository_full_name = 'github/docs'; | ||
``` | ||
|
||
### List a directory in a repository | ||
|
||
```sql | ||
select | ||
repository_full_name, | ||
path, | ||
content, | ||
type, | ||
size, | ||
sha, | ||
html_url | ||
from | ||
github_repository_content | ||
where | ||
repository_full_name = 'github/docs' | ||
and repository_content_path = 'docs'; | ||
``` | ||
|
||
### Get a file in a repository | ||
|
||
```sql | ||
select | ||
repository_full_name, | ||
path, | ||
type, | ||
size, | ||
sha, | ||
content, | ||
html_url | ||
from | ||
github_repository_content | ||
where | ||
repository_full_name = 'github/docs' | ||
and repository_content_path = '.vscode/settings.json'; | ||
``` |
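Every example above filters on `repository_full_name` in the `login/repo-name` form (for example `github/docs`). As a rough illustration of how such a value breaks down into the owner and repository parts an API call needs, here is a minimal Go sketch. `splitFullName` is a hypothetical helper written for this note; it is not the plugin's own `parseRepoFullName`, whose implementation is not shown in this diff.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFullName is a hypothetical helper (an assumption for illustration,
// not the plugin's parseRepoFullName). It splits "owner/repo" at the first
// slash and reports whether both parts are non-empty.
func splitFullName(fullName string) (string, string, bool) {
	owner, repo, found := strings.Cut(fullName, "/")
	if !found || owner == "" || repo == "" {
		return "", "", false
	}
	return owner, repo, true
}

func main() {
	owner, repo, ok := splitFullName("github/docs")
	fmt.Println(owner, repo, ok)
}
```

A value without a slash is rejected by this sketch, which is one possible way to surface a malformed `repository_full_name` early.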
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
package github | ||
|
||
import ( | ||
"context" | ||
"github.com/google/go-github/v48/github" | ||
"github.com/turbot/steampipe-plugin-sdk/v4/grpc/proto" | ||
"github.com/turbot/steampipe-plugin-sdk/v4/plugin" | ||
"github.com/turbot/steampipe-plugin-sdk/v4/plugin/transform" | ||
) | ||
|
||
//// TABLE DEFINITION | ||
|
||
func tableGitHubRepositoryContent() *plugin.Table { | ||
return &plugin.Table{ | ||
Name: "github_repository_content", | ||
For the table name (originally brought up in #207 (comment)), I think I still like the name For On the other hand, Between the two, I don't have any strong preferences. @e-gineer @johnsmyth @aminvielledebatAtBedrock - Curious to hear your thoughts as well, thanks!
@cbruno10 I also lean toward |
||
Description: "List the content in a repository (list directory, or get file content).", | ||
List: &plugin.ListConfig{ | ||
Hydrate: tableGitHubRepositoryContentList, | ||
ShouldIgnoreError: isNotFoundError([]string{"404"}), | ||
KeyColumns: []*plugin.KeyColumn{ | ||
For key columns, are we also able to pass in the |
||
{Name: "repository_full_name", Require: plugin.Required}, | ||
{Name: "repository_content_path", Require: plugin.Optional, CacheMatch: "exact"}, | ||
Do we need a separate column for this? Could we use |
||
}, | ||
}, | ||
Columns: []*plugin.Column{ | ||
{Name: "repository_full_name", Description: "The full name of the repository (login/repo-name).", Type: proto.ColumnType_STRING, Transform: transform.FromQual("repository_full_name")}, | ||
{Name: "type", Description: "The file type (directory or file).", Type: proto.ColumnType_STRING}, | ||
{Name: "name", Description: "The file name.", Type: proto.ColumnType_STRING}, | ||
{Name: "repository_content_path", Description: "The requested path in repository search.", Type: proto.ColumnType_STRING, Transform: transform.FromQual("repository_content_path")}, | ||
{Name: "path", Description: "The path of the file.", Type: proto.ColumnType_STRING}, | ||
{Name: "size", Description: "The size of the file (in bytes).", Type: proto.ColumnType_INT}, | ||
@aminvielledebatAtBedrock I saw that the GitHub API lists some caveats and restrictions about file size in https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28#size-limits. What are the query results/error if the file size is over 100 MB? Also, for files less than 1 MB and for those between 1 - 100 MB, are there any differences in the column values, or do the differences not affect the table? |
||
{Name: "content", Description: "The decoded file content (if the element is a file).", Type: proto.ColumnType_STRING, Transform: transform.From(transformFileContent), Hydrate: tableGitHubRepositoryContentGet}, | ||
{Name: "target", Description: "Target is only set if the type is \"symlink\" and the target is not a normal file. If Target is set, Path will be the symlink path.", Type: proto.ColumnType_STRING}, | ||
{Name: "sha", Description: "The sha of the file.", Type: proto.ColumnType_STRING, Transform: transform.FromField("SHA")}, | ||
{Name: "url", Description: "URL of file's metadata.", Type: proto.ColumnType_STRING}, | ||
{Name: "git_url", Description: "Git URL (with SHA) of the file.", Type: proto.ColumnType_STRING}, | ||
{Name: "html_url", Description: "URL of the file's page on GitHub.", Type: proto.ColumnType_STRING}, | ||
{Name: "download_url", Description: "Download URL; it expires and can be used just once.", Type: proto.ColumnType_STRING}, | ||
}, | ||
} | ||
} | ||
|
||
//// LIST FUNCTION | ||
|
||
func tableGitHubRepositoryContentList(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) { | ||
owner, repo := parseRepoFullName(d.KeyColumnQuals["repository_full_name"].GetStringValue()) | ||
var filterPath string | ||
if d.KeyColumnQuals["repository_content_path"] != nil { | ||
filterPath = d.KeyColumnQuals["repository_content_path"].GetStringValue() | ||
} | ||
plugin.Logger(ctx).Trace("tableGitHubRepositoryContentList", "owner", owner, "repo", repo, "path", filterPath) | ||
|
||
type ListPageResponse struct { | ||
repositoryContent []*github.RepositoryContent | ||
resp *github.Response | ||
} | ||
client := connect(ctx, d) | ||
opt := &github.RepositoryContentGetOptions{} | ||
listPage := func(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) { | ||
fileContent, directoryContent, resp, err := client.Repositories.GetContents(ctx, owner, repo, filterPath, opt) | ||
|
||
if err != nil { | ||
plugin.Logger(ctx).Error("tableGitHubRepositoryContentList", "api_error", err, "path", filterPath) | ||
return nil, err | ||
} | ||
|
||
if fileContent != nil { | ||
directoryContent = []*github.RepositoryContent{fileContent} | ||
} | ||
|
||
return ListPageResponse{ | ||
repositoryContent: directoryContent, | ||
resp: resp, | ||
}, err | ||
} | ||
|
||
for { | ||
listPageResponse, err := retryHydrate(ctx, d, h, listPage) | ||
if err != nil { | ||
plugin.Logger(ctx).Error("tableGitHubRepositoryContentList", "retry_hydrate_error", err) | ||
return nil, err | ||
} | ||
|
||
for _, i := range listPageResponse.(ListPageResponse).repositoryContent { | ||
if i != nil { | ||
d.StreamListItem(ctx, i) | ||
} | ||
|
||
// Context can be cancelled due to manual cancellation or the limit has been hit | ||
if d.QueryStatus.RowsRemaining(ctx) == 0 { | ||
return nil, nil | ||
} | ||
} | ||
|
||
if listPageResponse.(ListPageResponse).resp.NextPage == 0 { | ||
break | ||
} | ||
} | ||
return nil, nil | ||
} | ||
|
||
//// GET FUNCTION | ||
|
||
func tableGitHubRepositoryContentGet(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) { | ||
owner, repo := parseRepoFullName(d.KeyColumnQuals["repository_full_name"].GetStringValue()) | ||
filterPath := *h.Item.(*github.RepositoryContent).Path | ||
|
||
plugin.Logger(ctx).Trace("tableGitHubRepositoryContentGet", "owner", owner, "repo", repo, "path", filterPath) | ||
|
||
type GetResponse struct { | ||
repositoryContent *github.RepositoryContent | ||
resp *github.Response | ||
} | ||
|
||
client := connect(ctx, d) | ||
getFileContent := func(ctx context.Context, d *plugin.QueryData, h *plugin.HydrateData) (interface{}, error) { | ||
fileContent, _, resp, err := client.Repositories.GetContents(ctx, owner, repo, filterPath, &github.RepositoryContentGetOptions{}) | ||
|
||
if err != nil { | ||
plugin.Logger(ctx).Error("tableGitHubRepositoryContentGet", "api_error", err, "path", filterPath) | ||
return nil, err | ||
} | ||
|
||
return GetResponse{ | ||
repositoryContent: fileContent, | ||
resp: resp, | ||
}, err | ||
} | ||
|
||
getResponse, err := retryHydrate(ctx, d, h, getFileContent) | ||
if err != nil { | ||
return nil, err | ||
} | ||
|
||
return getResponse.(GetResponse).repositoryContent, nil | ||
} | ||
|
||
func transformFileContent(_ context.Context, d *transform.TransformData) (interface{}, error) { | ||
content := d.HydrateItem.(*github.RepositoryContent) | ||
// Directory case: a directory entry has no raw content. | ||
if content.Content == nil { | ||
return nil, nil | ||
} | ||
// Empty content: an empty file, a file returned with the "none" encoding, | ||
// or a file that is too big (files over 100 MB are not supported by the RepositoryContent endpoint). | ||
if *content.Content == "" { | ||
return "", nil | ||
} | ||
return content.GetContent() | ||
} |
@aminvielledebatAtBedrock I tried to use this table to list contents of a directory that contained a `.png` file, and then tried a query to get the actual `.png` file:

I'm not sure what the repository content API returns, but have you tested using this table when getting content for non-text files, or listing directories that contain them? For instance, does this table also work with GIF, JPEG, SVG, Microsoft Office (Word, PPT, Excel), PDF, etc., files? If so, what's in `content` for them?

Also, I'm not sure if you're on a different version, but I'm on `github.com/google/go-github/v48 v48.0.0`.
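One way to reason about the non-text case, offered as a sketch rather than a description of what the plugin does: assuming the API delivers file content base64-encoded, decoding a binary file such as a PNG yields bytes that are generally not valid UTF-8, so they may not fit cleanly in a string column. The helper name `decodeAndClassify` is hypothetical, and the UTF-8 check is only a heuristic for "is this text".

```go
package main

import (
	"encoding/base64"
	"fmt"
	"unicode/utf8"
)

// decodeAndClassify is a hypothetical helper (not part of the plugin).
// It base64-decodes raw and reports whether the decoded bytes are valid UTF-8.
func decodeAndClassify(raw string) ([]byte, bool, error) {
	decoded, err := base64.StdEncoding.DecodeString(raw)
	if err != nil {
		return nil, false, err
	}
	return decoded, utf8.Valid(decoded), nil
}

func main() {
	// The 8-byte PNG signature and a tiny SVG document serve as samples.
	pngHeader := []byte{0x89, 'P', 'N', 'G', '\r', '\n', 0x1a, '\n'}
	svg := []byte(`<svg xmlns="http://www.w3.org/2000/svg"/>`)

	for _, sample := range [][]byte{pngHeader, svg} {
		encoded := base64.StdEncoding.EncodeToString(sample)
		_, isText, err := decodeAndClassify(encoded)
		fmt.Println(isText, err)
	}
}
```

If the table wanted to handle binary files explicitly, a check like this could decide between returning decoded text and returning NULL or the still-encoded payload; which behavior is preferable is a design question for this PR.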
I just tried against an `.svg` file and it seemed to return the `content` OK.