report gist filenames in table_github_gist #57
Comments
Definitely makes sense to add the files column. JSON is a reasonable starting point for it too. If a piece of data from deep in the JSON is used a lot, we sometimes elevate it to a column of its own, but usually only when it is completely obvious or widely requested (it is easy to add columns, very hard to deprecate or remove them). A slight change to the Postgres query makes it easier to use, I believe. For example:
Add a good example like that to the docs with it and it would be good to go and easy to use.
Hmm. I think it needs to be
{
  "files": {
    "gistfile1.txt": {
      "filename": "gistfile1.txt",
      "language": "Text",
      "raw_url": "https://gist.githubusercontent.com/judell/9744381/raw/cd2f695f7e776e82ef0c6dc6678a6322a514f5f9/gistfile1.txt",
      "size": 6341,
      "type": "text/plain"
    }
  }
}
So maybe like this, to clean up the referencing?
As per https://steampipe.io/blog/adding-a-column-to-a-table, we decided to flatten the object to a simple array of objects. The query solution we gave was:
select
  f ->> 'language' as language,
  count(*)
from
  github_my_gist g
cross join
  jsonb_array_elements(g.files) f
group by
  language
order by
  count desc;
On reflection I'm still puzzled by the cross join. The literature suggests that the "Cartesian product" it produces is useful for things like this.
create table colors (color text);
insert into colors(color) values ('red');
insert into colors(color) values ('green');
create table sizes (size text);
insert into sizes(size) values ('small');
insert into sizes(size) values ('medium');
insert into sizes(size) values ('large');
select * from sizes cross join colors;
size | color
--------+-------
small | red
small | green
medium | red
medium | green
large | red
large | green
Here is a simplified example of the case discussed in the blog post.
create table my_gist(id text, description text, files jsonb);
insert into my_gist(id, description, files)
values ('b89721e4f71c3f647e8d686887de3008', 'gist-with-two-files', '[{"filename": "file1.md"}, {"filename": "file2.md"}]');
select * from my_gist;
id | description | files
----------------------------------+---------------------+------------------------------------------------------
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]
select jsonb_array_elements(g.files) as files from my_gist g;
files
-----------------------
{"filename": "file1.md"}
{"filename": "file2.md"}
Cross joining against the files object.
select
g.id,
g.description,
files
from
my_gist g
cross join
jsonb_array_elements(g.files) files;
id | description | files
----------------------------------+---------------------+------------------------------------------------------------------------------
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]
That seems like a weird Cartesian product! Cross joining with indexing into the files object gets us where we want to go.
select
id,
description,
f ->> 'filename' as filename
from
my_gist g
cross join
jsonb_array_elements(g.files) f;
id | description | filename
----------------------------------+---------------------+----------
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | file1.md
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | file2.md
But how to explain it? I'm just not seeing how it works.
I've always gone with this approach.
select
id,
description,
jsonb_array_elements(files) as files
from
my_gist;
id | description | files
----------------------------------+---------------------+--------------------------------------
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | { "filename": "file1.md"}
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | { "filename": "file2.md"}
I'm used to how
The
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | [{"filename": "file1.md"}, {"filename": "file2.md"}]
Indexing into it with
id | description | filename
----------------------------------+---------------------+----------
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | file1.md
b89721e4f71c3f647e8d686887de3008 | gist-with-two-files | file2.md
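One way to picture what the cross join against jsonb_array_elements is doing: in Postgres, a function in the FROM clause may refer to columns of earlier FROM items (an implicit LATERAL), so it is evaluated once per gist row rather than once overall. The Python sketch below is only a model of that behaviour, not Postgres code; the names my_gist and lateral_cross_join are hypothetical stand-ins for the table and join used in this thread. It also shows why selecting the files column repeats the whole array, while selecting from the element alias f gives one file per row.

```python
from itertools import product

# Plain cross join: every row of one table paired with every row of the other.
sizes = ["small", "medium", "large"]
colors = ["red", "green"]
cartesian = list(product(sizes, colors))

# Hypothetical in-memory stand-in for the my_gist table used in this thread.
my_gist = [
    {
        "id": "b89721e4f71c3f647e8d686887de3008",
        "description": "gist-with-two-files",
        "files": [{"filename": "file1.md"}, {"filename": "file2.md"}],
    }
]


def lateral_cross_join(rows):
    """Model `my_gist g cross join jsonb_array_elements(g.files) f`.

    The right-hand side is recomputed for each left-hand row, so every gist
    is paired only with the elements of its own files array.
    """
    for g in rows:
        for f in g["files"]:
            yield g, f


# Selecting the gist's files column repeats the whole array once per element.
whole_array = [(g["id"], g["files"]) for g, f in lateral_cross_join(my_gist)]

# Selecting from the element f yields one filename per row.
per_file = [(g["id"], f["filename"]) for g, f in lateral_cross_join(my_gist)]

print(cartesian)
print(per_file)
```

Read this way, the result is still a Cartesian product, but of each gist row with its own expanded array, which is why a gist with two files produces two rows rather than a pairing with every file of every gist.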
Hey @LalitTurbot, thanks for following up on that!
It's hard to make sense of the table_github_gist listing without filenames. I found the element Files here. I tried adding this to table_github_gist.go:
{Name: "files", Type: pb.ColumnType_JSON, Description: "The filename."},
Now I can do this:
The downside is that this requires some Postgres mojo that makes me stop and think, despite my having used Postgres JSONB a lot. So while this works for me I'm not sure I'd recommend for others. Presumably the plugin could use a derived type that hoists these fields to the top level? Anyway, it was an easy and instructive workaround.
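As a rough illustration of what hoisting could mean, here is a minimal Python sketch that turns the files object, keyed by filename, into a plain list of file objects. It is not the plugin's code (the plugin is written in Go); hoist_files is a hypothetical helper, and the sample record is a trimmed copy of the JSON shape quoted in this thread.

```python
def hoist_files(gist):
    """Return a copy of the gist with `files` as a list instead of a map.

    Each file object already carries its own "filename", so dropping the
    map keys loses no information.
    """
    return {**gist, "files": list(gist["files"].values())}


# Trimmed sample in the shape quoted in this thread (raw_url omitted).
raw = {
    "files": {
        "gistfile1.txt": {
            "filename": "gistfile1.txt",
            "language": "Text",
            "size": 6341,
            "type": "text/plain",
        }
    }
}

flat = hoist_files(raw)
filenames = [f["filename"] for f in flat["files"]]
print(filenames)
```

With the list shape, a query can expand the array and read filename or language from each element without knowing the map keys in advance.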