This is an extension to the gh command-line tool for analyzing the count of programming languages used in repositories across a GitHub enterprise or organization. It retrieves a list of repositories and their associated languages, and then aggregates the data to produce a report of language frequency.
Note
If you want to compare your language frequency against public trends, quarterly data from 2020 onward is available as part of GitHub's Innovation Graph project.
- Install the GitHub CLI: https://github.com/cli/cli#installation
- Confirm that you are authenticated with an account that has access to the enterprise/org you would like to analyze:
gh auth status
Ensure that you have the necessary scopes. For example, if you are analyzing an organization, you need the repo scope, and for enterprises you need the read:enterprise scope. You can add scopes by running:
gh auth login -s "repo,read:enterprise"
Important
Enterprise owners do not inherently have access to all of the repositories across their organizations. You must ensure that your account has the necessary permissions to access the repositories you want to analyze.
To install this extension, run the following command:
gh extension install CallMeGreg/gh-language
Tip
Each command has default limits to prevent accidental excessive API usage. You can adjust these limits using the --org-limit and --repo-limit flags. To analyze all repositories in an organization or enterprise, set these flags to a very high number (e.g., 1000000).
The following flags are available for all commands:
- --org or --enterprise: Specify the organization or enterprise to analyze. These flags are mutually exclusive, and one of them is required.
- --org-limit: Limit the number of organizations to analyze (default is 5).
- --repo-limit: Limit the number of repositories to analyze per organization (default is 10).
- --top: Return the top N languages (default is 10).
- --language: Filter results by a specific programming language (case-sensitive).
- --codeql: Restrict analysis to CodeQL-supported languages.
Note
The --top, --language, and --codeql flags are mutually exclusive.
When the --codeql flag is set, the analysis will only include the following languages:
- C
- C++
- C#
- Go
- HTML
- Java
- Kotlin
- JavaScript
- Python
- Ruby
- Swift
- TypeScript
- Vue
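Restricting a tally to this list amounts to a set-membership filter. The following is a minimal Python sketch of that idea, not the extension's actual code: the `filter_codeql` helper and the sample tally are hypothetical, and exact-match (case-sensitive) comparison is an assumption.

```python
# Language names copied from the list above.
CODEQL_LANGUAGES = {
    "C", "C++", "C#", "Go", "HTML", "Java", "Kotlin",
    "JavaScript", "Python", "Ruby", "Swift", "TypeScript", "Vue",
}


def filter_codeql(counts):
    """Keep only entries whose language name is in the CodeQL list.

    Assumption: names are compared exactly (case-sensitive).
    """
    return {lang: n for lang, n in counts.items() if lang in CODEQL_LANGUAGES}


# Hypothetical tally, for illustration only.
sample = {"Go": 12, "Rust": 4, "Python": 9, "Shell": 7}
filtered = filter_codeql(sample)
print(filtered)
```

Languages outside the list (Rust and Shell in this sample) are dropped from the result.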
Display the count of each programming language used in repos across an enterprise or organization.
gh language count --enterprise YOUR_ENTERPRISE_SLUG
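Conceptually, the count is a frequency tally over each repository's language list. Here is a minimal Python sketch of that aggregation, under the assumption that each language is counted once per repository; the `count_languages` helper and the sample data are hypothetical and are not taken from the extension's source.

```python
from collections import Counter


def count_languages(repos, top=10):
    """Count how many repositories use each language and return the top N."""
    tally = Counter()
    for repo in repos:
        # Use a set so a language listed twice for one repo is counted once.
        tally.update(set(repo["languages"]))
    return tally.most_common(top)


# Hypothetical repositories, for illustration only.
sample_repos = [
    {"name": "api", "languages": ["Go", "Shell"]},
    {"name": "web", "languages": ["TypeScript", "HTML"]},
    {"name": "cli", "languages": ["Go"]},
]
top_languages = count_languages(sample_repos, top=1)
print(top_languages)
```

The `top` argument mirrors the role of the --top flag: only the N most frequent languages are returned.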
Display the breakdown of programming languages used in repos across an enterprise or organization per year, based on the repo creation date.
gh language trend --enterprise YOUR_ENTERPRISE_SLUG
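A per-year breakdown like this can be pictured as the same tally, bucketed by the year in each repository's creation timestamp. The Python sketch below assumes ISO 8601 timestamps; the `languages_by_year` helper, the field names, and the sample data are hypothetical and are not taken from the extension's source.

```python
from collections import Counter, defaultdict


def languages_by_year(repos):
    """Group per-repository language counts by the year the repo was created."""
    by_year = defaultdict(Counter)
    for repo in repos:
        # Assumption: ISO 8601 timestamp, so the first four characters are the year.
        year = int(repo["created_at"][:4])
        by_year[year].update(set(repo["languages"]))
    return {year: dict(counts) for year, counts in sorted(by_year.items())}


# Hypothetical repositories, for illustration only.
sample_repos = [
    {"created_at": "2021-03-14T09:00:00Z", "languages": ["Go"]},
    {"created_at": "2021-11-02T17:30:00Z", "languages": ["Go", "Rust"]},
    {"created_at": "2023-01-20T08:15:00Z", "languages": ["Rust"]},
]
trend = languages_by_year(sample_repos)
print(trend)
```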
Analyze languages by bytes of data, rather than count, across repositories in an enterprise or organization.
gh language data --enterprise YOUR_ENTERPRISE_SLUG
Specify the unit for displaying data with the --unit flag. Supported units are bytes, kilobytes, megabytes, and gigabytes. The default is bytes. For example:
gh language data --enterprise YOUR_ENTERPRISE_SLUG --unit megabytes
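The unit conversion itself is a simple division. The Python sketch below assumes decimal multiples (1 kilobyte = 1000 bytes); the extension may instead use 1024-based units, so treat the divisors and the `convert` helper as hypothetical.

```python
# Assumption: decimal (SI) multiples; the extension may use 1024-based units.
UNIT_DIVISORS = {
    "bytes": 1,
    "kilobytes": 1000,
    "megabytes": 1000 ** 2,
    "gigabytes": 1000 ** 3,
}


def convert(size_in_bytes, unit="bytes"):
    """Convert a byte count to the requested display unit."""
    if unit not in UNIT_DIVISORS:
        raise ValueError(f"unsupported unit: {unit}")
    return size_in_bytes / UNIT_DIVISORS[unit]


megabytes = convert(2_500_000, "megabytes")
print(megabytes)
```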
Analyze the top 20 languages used across all repositories in an enterprise:
gh language count --enterprise YOUR_ENTERPRISE_SLUG --org-limit 1000000 --repo-limit 1000000 --top 20
Analyze the trend of Rust usage in repositories across an organization, limited to the first 100 repositories:
gh language trend --org YOUR_ORG_SLUG --repo-limit 100 --language Rust
Analyze the top 5 languages, based on data size, in megabytes, used across all repositories in an organization:
gh language data --org YOUR_ORG_SLUG --repo-limit 1000000 --top 5 --unit megabytes
Analyze all CodeQL-supported languages in an enterprise across all repositories:
gh language count --enterprise YOUR_ENTERPRISE_SLUG --org-limit 1000000 --repo-limit 1000000 --codeql
The count and trend commands have been optimized to use GitHub's GraphQL API, which provides significant performance improvements over the REST API:
- Reduced API calls: GraphQL fetches repository and language data for up to 100 repositories in a single request, whereas the REST API requires one request per repository.
- Better rate limiting: The GraphQL API has different rate limits than the REST API, often allowing more processing before limits are hit.
The data command continues to use the REST API as it requires detailed byte-level language statistics that are only available through the REST endpoints.
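To make the difference in request volume concrete, the Python sketch below compares the number of language-data requests under each approach and shows what a paginated GraphQL query for repositories and their languages might look like. The query is illustrative only and is not taken from the extension's source; the helper names are hypothetical, and the REST figure ignores the requests needed to list repositories.

```python
import math

PAGE_SIZE = 100  # repositories per GraphQL request, per the text above

# Illustrative query shape only; the extension's real query may differ.
QUERY = """
query($org: String!, $cursor: String) {
  organization(login: $org) {
    repositories(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        name
        createdAt
        languages(first: 10) { nodes { name } }
      }
    }
  }
}
"""


def graphql_requests(repo_count):
    """Requests needed when each page returns up to PAGE_SIZE repositories."""
    return math.ceil(repo_count / PAGE_SIZE)


def rest_requests(repo_count):
    """Requests needed when languages are fetched once per repository."""
    return repo_count


for n in (10, 250, 5000):
    print(n, graphql_requests(n), rest_requests(n))
```

Under these assumptions, 250 repositories need 3 GraphQL requests versus 250 REST requests for language data.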
For help, run:
gh language -h
Usage:
  language [command]

Available Commands:
  count       Analyze the count of programming languages used in repos across an organization
  data        Analyze language data by bytes
  help        Help about any command
  trend       Analyze the trend of programming languages used in repos across an organization over time

Flags:
      --codeql              Restrict analysis to CodeQL-supported languages
  -e, --enterprise string   Specify the enterprise
  -h, --help                help for language
  -l, --language string     The language to filter on (case-sensitive)
  -o, --org string          Specify the organization
      --org-limit int       The maximum number of organizations to evaluate for an enterprise (default 5)
      --repo-limit int      The maximum number of repositories to evaluate per organization (default 10)
  -t, --top int             Return the top N languages (ignored when a language is specified) (default 10)

Use "language [command] --help" for more information about a command.
This tool is licensed under the MIT License. See the LICENSE file for details.