sitemapindex empty for bulkdata #37

Closed
mooncalfskb opened this issue Mar 8, 2019 · 13 comments

@mooncalfskb

Hello. We rely on this index page to determine which files to download, but it is empty today. Do you know what's up?
https://www.govinfo.gov/sitemap/bulkdata/BILLSTATUS/sitemapindex.xml

The sub-pages referenced in the sitemapindex still appear to be working.
Example: https://www.govinfo.gov/sitemap/bulkdata/BILLSTATUS/115s/sitemap.xml

Thanks,
Sherrod
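
For anyone whose downloader depends on this index, a minimal sketch of a guard against an empty or blank index follows. It assumes the index follows the sitemaps.org protocol (a `sitemapindex` root with `sitemap`/`loc` children in the standard namespace). The function names `child_sitemaps` and `require_non_empty` are hypothetical, and the XML is passed in as a string so that fetching is left to the caller.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol (assumed to match the govinfo files).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml: str) -> List[str]:
    """Return the <loc> URL of every <sitemap> entry in a sitemap index."""
    if not index_xml.strip():
        # A zero-byte or whitespace-only response has no entries at all.
        return []
    root = ET.fromstring(index_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]


def require_non_empty(index_xml: str) -> List[str]:
    """Raise instead of treating an empty index as 'nothing to download'."""
    urls = child_sitemaps(index_xml)
    if not urls:
        raise ValueError("sitemap index has no <sitemap> entries")
    return urls
```

Raising on an empty index lets a crawler keep its previous file list rather than conclude that every file has disappeared.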

@jonquandt
Member

Thanks for reporting this issue. I'll look into it and let you know when it's resolved.

@mooncalfskb
Author

thanks!

@jonquandt
Member

We've restored an older copy of the sitemaps. I will go ahead and republish a couple of billstatus packages for the most recently updated billtypes in the 115th and 116th Congress. That should at least update the last modified date on the index and signal that you should crawl them. The regular billstatus job will also trigger updates.

Finally, coming soon: access to BILLSTATUS and ECFR bulkdata from our API (usgpo/api#4).
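
To illustrate how a crawler might use the last modified date on the index, here is a rough sketch that selects the child sitemaps changed since the previous crawl. It assumes each `sitemap` entry carries an optional `lastmod` in W3C datetime format, as the sitemaps.org protocol describes, and that timestamps without a zone are UTC. `parse_lastmod` and `sitemaps_to_crawl` are hypothetical helper names, not part of any govinfo tooling.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value: str) -> datetime:
    """Parse a W3C datetime such as 2019-03-12T08:00:00Z or 2019-03-12."""
    text = value.strip()
    if text.endswith("Z"):
        # Older Python versions of fromisoformat() do not accept a trailing 'Z'.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def sitemaps_to_crawl(index_xml: str, last_crawl: datetime) -> List[str]:
    """Return child sitemap URLs modified after last_crawl (timezone-aware)."""
    root = ET.fromstring(index_xml)
    due = []
    for entry in root.findall("sm:sitemap", SITEMAP_NS):
        loc = entry.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue
        # A missing lastmod is treated as "changed" so that nothing is skipped.
        if not lastmod or parse_lastmod(lastmod) > last_crawl:
            due.append(loc)
    return due
```

Treating a missing `lastmod` as changed errs on the side of re-crawling, which costs bandwidth but avoids silently missing updates.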

@mooncalfskb
Author

Great!

@JoshData

Any update here? https://www.govinfo.gov/sitemap/bulkdata/BILLSTATUS/115hr/sitemap.xml has been blank for a few days.

@jonquandt
Member

Let me look into that. I must have missed that one, though the others all had data in them. I’ll get back to you this morning.

@jonquandt
Member

@JoshData - That sitemap is now restored. We're looking into how to prevent this from happening again. We set the update time for all the billstatus files to this morning to ensure that no updates are missed. This does mean that about 7,300 billstatus packages now report as new/updated.

@JoshData

Thanks! I love that there are 7,300 billstatus packages to update. A few years ago there were zero! :)

@jonquandt
Member

Reopening this issue to resolve the missing www. prefix.

@jonquandt reopened this Mar 13, 2019

@jonquandt
Member

@JoshData - We updated the sitemap for 115hr to include www. in the loc element.
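
As an illustration of how a consumer could catch this kind of host mismatch, the sketch below lists every `loc` value whose host differs from an expected one. The expected host `www.govinfo.gov` and the function name `locs_with_unexpected_host` are assumptions made for the example, and the sitemaps.org namespace is assumed as well.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit

# Fully qualified tag name for <loc> in the sitemaps.org namespace.
LOC_TAG = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"
EXPECTED_HOST = "www.govinfo.gov"  # assumption for this example


def locs_with_unexpected_host(sitemap_xml: str,
                              expected_host: str = EXPECTED_HOST) -> List[str]:
    """Return every <loc> URL whose host is not expected_host."""
    root = ET.fromstring(sitemap_xml)
    mismatched = []
    for loc in root.iter(LOC_TAG):
        url = (loc.text or "").strip()
        if url and urlsplit(url).hostname != expected_host:
            mismatched.append(url)
    return mismatched
```

Because it walks every `loc` element, the same check works on both a sitemap index and an individual sitemap.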

@jonquandt
Member

Closing based on the comments in unitedstates/congress#239.

@JoshData

Thanks!

@mooncalfskb
Author

Thanks so much!


3 participants