Language Savant, Python clone of github/linguist.
PIP
pip install linguistEasy_install
easy_install linguistLinguist defines the list of all languages known to GitHub in a yaml file. In order for a file to be highlighted, a language and lexer must be defined there.
Most languages are detected by their file extension. This is the fastest and most common situation.
For disambiguating between files with common extensions, we use a Bayesian classifier. For an example, this helps us tell the difference between .h files which could be either C, C++, or Obj-C.
For testing, there is a simple FileBlob API:
from linguist.libs.file_blob import FileBlob
FileBlob('test.py').language.name #=> 'Python'
FileBlob('test_file').language.name #=> 'Python'See linguist/libs/language.py and lib/linguist/languages.yml.
The actual syntax highlighting is handled by our Pygments wrapper, pygments. It also provides a Lexer abstraction that determines which highlighter should be used on a file.
We typically run on a pre-release version of Pygments, pygments.rb, to get early access to new lexers. The languages.yml file is a dump of the lexers we have available on our server.
The Language Graph you see on every repository is built by aggregating the languages of all repo's blobs.
The repository stats API can be used on a directory:
from linguist.libs.repository import Repository
project = Repository.from_directory(".")
project.language.name #=> 'Python'
project.languages #=> defaultdict(<type 'int'>, {<Language name:Python>: 53446, <Language name:JavaScript>: 1991})
for lang, count in projects.iteritems():
print lang.name, count
#=> Python, 53446
#=> JavaScript, 1991These stats are also printed out by the binary. Try running linguist on itself.
Checking other code into your git repo is a common practice. But this often inflates your project's language stats and may even cause your project to be labeled as another language. We are able to identify some of these files and directories and exclude them.
from linguist.libs.file_blob import FileBlob
FileBlob('static/js/jquery-2.0.0.min.js').is_vendored #=> TrueSee BlobHelper#is_vendored and linguist/libs/vendor.yml.
from linguist.libs.file_blob import FileBlob
FileBlob('jquery-2.0.0.min.js').is_generated #=> True
FileBlob('app.coffee').is_generated #=> True* Fork the repository.
* Create a topic branch.
* Implement your feature or bug fix.
* Add, commit, and push your changes.
* Submit a pull request.cd tests/
python run.pyv0.0.1 [2013-04-22]
- Release v0.0.1
