Bulk Data content access #4

Closed
jonquandt opened this issue Jul 11, 2018 · 8 comments

Comments

@jonquandt
Member

jonquandt commented Jul 11, 2018

Currently, we don't provide direct access to the bulkdata contents via the API. How important is that to the community?

The govinfo bulk data repository already provides access to the contents via both XML and JSON endpoints.

It's doable to make the bulk data available via the API, but we're wondering how much of a priority that is to the community compared with some of the other features we're currently looking at.

If we did this, would making BILLSTATUS and ECFR available be sufficient?

@jonquandt jonquandt added the Features Items currently under consideration by the govinfo team for development. label Jul 11, 2018
@cnizzardini

I am currently working with the API on a project related to congressional bills. It would be great if I could use the same API to get full bill text in JSON format. JSON is just easier to work with than this type of XML: https://www.govinfo.gov/content/pkg/BILLS-115hr2740rfs/xml/BILLS-115hr2740rfs.xml

@jonquandt
Member Author

jonquandt commented Nov 18, 2018

@cnizzardini - thanks for the feedback. As you mentioned, there is no official version of Congressional Bills in JSON format currently.

Have you considered using xml2json or a similar library for your language of choice (PHP, it looks like) to transform the XML into JSON? Of course, then you're not dealing directly with the official content, so it may require additional verification that the translated version meets your needs.
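As a rough illustration of the XML-to-JSON approach suggested above, here is a minimal sketch using only the Python standard library rather than xml2json itself. The helper names (`element_to_dict`, `xml_to_json`), the `@attributes` and `#text` keys, and the sample XML are all assumptions made for this example and are not part of govinfo. The sketch also drops text that follows a child element (mixed content), so real bill XML would likely need more careful handling.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Recursively convert an Element into plain dicts, lists, and strings.

    Simplification: text that follows a child element (elem.tail) is ignored.
    """
    node = {}
    if elem.attrib:
        node["@attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        # Children with the same tag are collected into a list, in document order.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


def xml_to_json(xml_string):
    """Parse an XML string and return a JSON string keyed by the root tag."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


# Hypothetical sample, not real bill content.
SAMPLE = '<bill stage="Introduced"><section><header>Short title</header></section></bill>'
print(xml_to_json(SAMPLE))
```

Because the converted JSON is a derived form, comparing a few converted documents against the official XML is a reasonable check before relying on it.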

@yian-yin

We are working on a project studying government documents, and it would be very helpful if we could directly download bulkdata through the API. Thanks!

@yian-yin

yian-yin commented Dec 1, 2018

@jonquandt Just out of curiosity, is there a way to download the whole corpus before you make the feature available? I know this can be done by downloading through the API file by file, but I wanted to make sure this stays within your rate limit first :)

@jonquandt
Member Author

@yian-yin - when you say “whole corpus”, do you mean all of the files available through the bulk data repository, or the entire corpus of content available on govinfo?

The bulk data repository represents only a subset of documents available from govinfo - primarily XML content only.

Via the govinfo API, you can already access much of the XML content that exists on the bulk data repository, such as the XML of Congressional Bills and the Federal Register. Going through the API for those resources also gives you the flexibility of getting other content formats and our MODS metadata records for that content, which provide a wealth of information that can be used to understand the content and link it to other Government publications.

There are a few collections that are currently available only via the bulk data site. Based on current usage patterns, we would prioritize making Congressional Bill Status and ECFR data available via the API, though other types might make sense to include as well.

@yian-yin

yian-yin commented Dec 1, 2018

@jonquandt Thanks for your answer! I am actually interested in the entire corpus of content available on govinfo. As I understand it, a large fraction of that content is currently unavailable through bulk data, which is why I asked whether there is any rate limit on using the API.

Also, thanks for reminding me that some collections exist only on the bulk data site. Does this mean the API includes only a subset of what's available on govinfo? If so, would you mind giving me an estimate of that fraction?

@jonquandt
Member Author

jonquandt commented Dec 1, 2018

@yian-yin - there is no rate limit at the moment, but we may impose one if we see an excessive number of requests. I don’t anticipate it being an issue at this time. It's best to run larger updates overnight, though.

Of the collections listed at www.govinfo.gov/bulkdata, the ones available only as bulk data are:

- Congressional Bill Status – 113th Congress to Present
- Congressional Bill Summaries – House Bill Summaries added in 2014, summaries for Senate Bills added in January 2015
- Electronic Code of Federal Regulations (current XML file for each of the titles in the eCFR)
- Supreme Court Decisions

All of the bulk data resources are available via our bulk data sitemaps or directly as XML or JSON endpoints.

The vast majority of content and metadata is available via the API.
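For readers who want to work from the bulk data sitemaps mentioned above, the following is a minimal sketch of pulling file URLs out of a sitemap document. It assumes the sitemaps follow the standard sitemaps.org XML format. The function name `extract_locations` and the sample document with its example.com URLs are hypothetical, and no request is made to govinfo here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locations(sitemap_xml):
    """Return every <loc> URL from a sitemap or sitemap index document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sample document; the URLs are placeholders, not govinfo paths.
SAMPLE_SITEMAP = """<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/bulkdata/FILE-1.xml</loc></url>
  <url><loc>https://example.com/bulkdata/FILE-2.xml</loc></url>
</urlset>
"""

for url in extract_locations(SAMPLE_SITEMAP):
    print(url)
```

The same function works on a sitemap index, since index entries also wrap their URLs in `<loc>` elements under the same namespace.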

@jonquandt jonquandt self-assigned this Dec 3, 2018
@jonquandt jonquandt added the Upcoming Items currently under development and will be available in our next release label Mar 8, 2019
@jonquandt jonquandt added this to the March 2019 milestone Mar 8, 2019
@jonquandt
Member Author

BILLSTATUS and ECFR content is now available via the API:

https://api.govinfo.gov/collections:

{
  "collectionCode": "BILLSTATUS",
  "collectionName": "Congressional Bill Status",
  "packageCount": 5554,
  "granuleCount": null
},
....
{
  "collectionCode": "ECFR",
  "collectionName": "Electronic Code of Federal Regulations",
  "packageCount": 16,
  "granuleCount": null
},

We are in the process of reindexing, so the numbers under packageCount will increase over the coming weeks.

You can use the collections and packages endpoints for these packages now.
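As a small sketch of how the entries quoted above might be consumed, the snippet below picks out the package counts for selected collection codes. The field names come from the excerpt in this comment. Wrapping the two entries in a plain JSON list is an assumption made for the example (the full response shape is not shown here), and `package_counts` is a hypothetical helper.

```python
import json

# The two entries quoted in this comment, wrapped in a list for illustration.
# The list wrapper is an assumption; the full API response shape may differ.
RESPONSE_FRAGMENT = """
[
  {"collectionCode": "BILLSTATUS", "collectionName": "Congressional Bill Status",
   "packageCount": 5554, "granuleCount": null},
  {"collectionCode": "ECFR", "collectionName": "Electronic Code of Federal Regulations",
   "packageCount": 16, "granuleCount": null}
]
"""


def package_counts(collections, wanted_codes):
    """Map each wanted collectionCode to its packageCount."""
    return {
        entry["collectionCode"]: entry["packageCount"]
        for entry in collections
        if entry["collectionCode"] in wanted_codes
    }


counts = package_counts(json.loads(RESPONSE_FRAGMENT), {"BILLSTATUS", "ECFR"})
print(counts)
```

Since the counts are expected to grow during reindexing, rerunning a check like this over time is one way to see when a collection has finished loading.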
