
kdaigle/random-repositories


Finding A Set Of Repositories

Head over to Google BigQuery and its collection of GitHub Timeline data.

https://bigquery.cloud.google.com/table/githubarchive:github.timeline

Find the total number of created repositories that are public and known to Google BigQuery.

SELECT COUNT(repository_url) FROM [githubarchive:github.timeline] WHERE type = "CreateEvent" AND payload_ref IS NULL

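Select a random subset of the repositories by URL. With 13626213 as the total returned by the previous query, this will find roughly 10,000 repository URLs; substitute your own total if it differs.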

SELECT repository_url FROM [githubarchive:github.timeline] WHERE type = "CreateEvent" AND payload_ref IS NULL AND RAND() < 10000/13626213
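The filter above keeps each row independently with probability 10000/13626213, which is why the result is only roughly 10,000 rows. Here is a minimal Ruby sketch of the same idea, for illustration only. The helper name `bernoulli_sample` is hypothetical and is not part of this repository.

```ruby
# Hypothetical helper (not part of random_repos.rb): keep each item
# independently with probability target / total, mirroring the
# RAND() < 10000/13626213 filter in the query.
def bernoulli_sample(items, target:, total:, rng: Random.new)
  # to_f matters in Ruby: 10000 / 13626213 with integers would be 0.
  probability = target.to_f / total
  items.select { rng.rand < probability }
end

total = 13_626_213
probability = 10_000.0 / total

# The sample size follows a binomial distribution, so it varies from run to run.
expected = total * probability
spread = Math.sqrt(total * probability * (1 - probability))
puts "expected rows: #{expected.round}, standard deviation: about #{spread.round}"

# Small demonstration on made-up data.
sample = bernoulli_sample((1..100_000).to_a, target: 1_000, total: 100_000)
puts "kept #{sample.size} of 100000 items"
```

If you need an exact count, one option is to sample slightly more than you need and trim the result afterwards.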

Export the results to CSV.

Then, clone this repository.

git clone git@github.com:kdaigle/random-repositories.git

Run bundle install. Set the path_to_bigquery_csv variable to the full file path of your CSV. I've included a sample file here that you can replace.
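Before running the script, it can help to sanity-check the export. The sketch below assumes the CSV has a header row named `repository_url` (the column selected in the query) and that URLs look like `https://github.com/owner/name`. Both are assumptions about the export, and the helper names are hypothetical, not taken from random_repos.rb.

```ruby
require "csv"

# Hypothetical helper: turn "https://github.com/owner/name" into "owner/name".
# Returns nil for anything that does not look like a GitHub repository URL.
def repo_full_name(url)
  pattern = %r{\Ahttps?://github\.com/([^/\s]+)/([^/\s]+?)(?:\.git)?/?\z}
  match = url.to_s.strip.match(pattern)
  match && "#{match[1]}/#{match[2]}"
end

# Hypothetical helper: parse CSV text with a repository_url header and
# return the unique "owner/name" values, skipping rows that do not parse.
def full_names_from_csv(csv_text)
  CSV.parse(csv_text, headers: true)
     .map { |row| repo_full_name(row["repository_url"]) }
     .compact
     .uniq
end

# Inline sample standing in for File.read(path_to_bigquery_csv).
sample_csv = <<~CSV
  repository_url
  https://github.com/kdaigle/random-repositories
  https://github.com/kdaigle/random-repositories
CSV

puts full_names_from_csv(sample_csv)
```

Removing duplicates is worthwhile here because the same repository URL can appear in more than one exported row.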

Set up a GitHub personal access token with public_repo scope.

Run the script with your personal access token. Note: the API allows only 5,000 requests per hour with a personal access token. If you need more repositories than that, you'll need to adjust the script to process them in batches.

ACCESS_TOKEN=yourPersonalAccessToken ruby random_repos.rb
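One way to split the work into groups is to process at most 5,000 repositories at a time and wait an hour between groups. The sketch below assumes one API request per repository, which may not match what random_repos.rb actually does. The helper name `each_rate_limited_batch` and its parameters are hypothetical.

```ruby
REQUESTS_PER_HOUR = 5_000

# Hypothetical helper: yield items in batches no larger than batch_size,
# pausing before every batch except the first. The sleeper is injectable
# so the pause can be skipped or recorded in tests.
def each_rate_limited_batch(items, batch_size: REQUESTS_PER_HOUR,
                            pause_seconds: 3600,
                            sleeper: ->(seconds) { sleep(seconds) })
  items.each_slice(batch_size).with_index do |batch, index|
    sleeper.call(pause_seconds) unless index.zero?
    yield batch
  end
end

# Demonstration with made-up URLs, a tiny batch size, and no pause.
urls = (1..12).map { |i| "https://github.com/example/repo-#{i}" }
each_rate_limited_batch(urls, batch_size: 5, pause_seconds: 0) do |batch|
  # A real script would call the GitHub API for each URL in the batch here.
  puts "would fetch #{batch.size} repositories"
end
```

A fixed one-hour pause is the simplest approach. A more careful version could read the rate limit headers the API returns and wait only until the limit resets.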

About

Do you need a sample of public GitHub repositories?
