Description
https://github.com/github/linguist
Linguist is a tool developed by GitHub for the specific purpose of detecting languages. It's a very mature tool that gets it right the majority of the time by using complex rules.
Activity
o2sh commented on Feb 17, 2019
But why?
https://github.com/Aaronepower/tokei is written in Rust and does a great job detecting languages.
aaronfranke commented on Feb 17, 2019
Is that what Onefetch currently uses? It detects C++ as C in the case of Godot, and it didn't detect anything for the repo of a Godot project (while GitHub detects GDScript).
o2sh commented on Feb 17, 2019
It only detects the languages that are currently supported by onefetch (WIP).
Also, tokei ignores all comment lines, which is why the language distribution sometimes differs from GitHub's.
Languages supported by tokei: https://github.com/Aaronepower/tokei#supported-languages
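To illustrate why skipping comment lines shifts the breakdown, here is a minimal sketch comparing a tokei-style count (non-blank, non-comment lines) with a raw byte count. Treating bytes as a stand-in for GitHub's numbers is an assumption here, and the helpers code_lines and percentages, like the two-file repo, are hypothetical.

```python
def code_lines(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor line comments.

    Simplification (assumption): only single-line comments starting with
    comment_prefix are recognized; block comments and strings are ignored.
    """
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith(comment_prefix)
    )


def percentages(weights: dict[str, int]) -> dict[str, float]:
    """Turn per-language weights into percentages rounded to one decimal."""
    total = sum(weights.values())
    return {lang: round(100 * w / total, 1) for lang, w in weights.items()}


# Hypothetical repo: a heavily commented Python file and a terse shell script.
repo = {
    "Python": "# Comment\n# Another comment\n# Third\nprint('hi')\n",
    "Shell": "echo one\necho two\necho three\n",
}

by_code_lines = percentages({lang: code_lines(src) for lang, src in repo.items()})
by_bytes = percentages({lang: len(src.encode("utf-8")) for lang, src in repo.items()})

print(by_code_lines)  # {'Python': 25.0, 'Shell': 75.0}
print(by_bytes)       # {'Python': 62.3, 'Shell': 37.7}
```

The same files produce opposite rankings depending on whether comments count, which matches the kind of mismatch described above.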
aaronfranke commented on Mar 10, 2019
Upstream issues: XAMPPRocky/tokei#305 and XAMPPRocky/tokei#67
We can leave this closed though if you want.
Title changed from "Use GitHub Linguist to detect languages" to "Improve language detection system to recognize C++ headers"
o2sh commented on Mar 10, 2019
Ok, with the new title it makes more sense to keep this open.
We'll wait for tokei to fix it then.
Thx @aaronfranke
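The retitled issue comes down to deciding whether a .h file is C or C++. A minimal sketch of one content-based heuristic, assuming that a handful of C++-only constructs are a strong enough signal; the pattern list, classify_header, and the sample headers are hypothetical and are not tokei's or Linguist's actual rules.

```python
import re

# Illustrative patterns (assumption, not tokei's or Linguist's rules):
# constructs that are valid C++ but essentially never appear in C headers.
CPP_ONLY_PATTERNS = [
    re.compile(r"^\s*namespace\b", re.MULTILINE),
    re.compile(r"^\s*template\s*<", re.MULTILINE),
    re.compile(r"^\s*class\s+\w+", re.MULTILINE),
    re.compile(r"^\s*(public|protected|private)\s*:", re.MULTILINE),
    re.compile(r"\bstd::"),
    re.compile(r"#include\s*<(iostream|string|vector|memory|cstdint)>"),
]


def classify_header(source: str) -> str:
    """Return "C++" if any C++-only construct appears in a .h file, else "C"."""
    if any(pattern.search(source) for pattern in CPP_ONLY_PATTERNS):
        return "C++"
    return "C"


# Hypothetical headers: one in the style of a C++ engine, one plain C.
cpp_header = """#ifndef NODE_H
#define NODE_H
class Node : public Object {
public:
    void ready();
};
#endif
"""

c_header = """#ifndef UTIL_H
#define UTIL_H
#include <stddef.h>
size_t util_len(const char *s);
#endif
"""

print(classify_header(cpp_header))  # C++
print(classify_header(c_header))    # C
```

Here a single match decides the result, so one false positive flips the answer; a sturdier detector would probably weigh several signals.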
stale commented on Aug 21, 2020
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
aaronfranke commented on Aug 21, 2020
This issue still exists, though the devs likely see it as low priority, so I'll probably have to bump it again later to please the stale bot.
stale commented on Nov 20, 2020
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
spenserblack commented on Apr 26, 2023
dsully commented on Apr 26, 2023
spenserblack commented on Apr 26, 2023
spenserblack commented on Aug 14, 2023
Hey everyone following this 👋
There's been a bit of discussion here, but to keep you all up to date: I went ahead and started a project called gengo that should be more Linguist-like, to hopefully improve our language detection eventually. Unlike tokei, gengo allows file extension collisions and tries to pick the right language using heuristics. For example, for this comment, it would need to register ts as an XML file extension and include a heuristic to be confident that the .ts file is actually XML.
But right now, gengo doesn't support nearly enough languages. While I could just grab the data from Linguist (and maybe I eventually will), right now I'm hoping that language support grows more organically, with discussion for each added language. So if you'd like to contribute, please do! I'll definitely need help with languages I'm unfamiliar with, especially when it comes to adding heuristics, for example for telling apart C and C++ .h header files.
Edit: See spenserblack/gengo#34
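The approach described in that comment (several languages sharing one extension, with a heuristic choosing among them) can be sketched as a small registry. Everything here is hypothetical: the REGISTRY layout, detect, and looks_like_qt_ts are illustrations, not gengo's real data format or rules, and the .ts check assumes Qt translation files begin with an XML declaration, a DOCTYPE TS line, or a TS root element.

```python
import re
from pathlib import PurePath
from typing import Callable, Optional

# A heuristic receives the file contents and returns True if they match its language.
Heuristic = Callable[[str], bool]


def looks_like_qt_ts(text: str) -> bool:
    """Assumption: Qt translation (.ts) files start with XML markup, TypeScript never does."""
    return re.match(r"\s*(<\?xml|<!DOCTYPE\s+TS|<TS\b)", text) is not None


# Hypothetical registry: an extension may map to several candidate languages.
# Candidates are tried in order; one without a heuristic acts as the fallback.
REGISTRY: dict[str, list[tuple[str, Optional[Heuristic]]]] = {
    "ts": [("XML", looks_like_qt_ts), ("TypeScript", None)],
    "py": [("Python", None)],
}


def detect(path: str, contents: str) -> Optional[str]:
    """Return the first candidate language whose heuristic accepts the contents."""
    extension = PurePath(path).suffix.lstrip(".")
    for language, heuristic in REGISTRY.get(extension, []):
        if heuristic is None or heuristic(contents):
            return language
    return None  # unknown extension, or no candidate matched


qt_file = '<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE TS>\n<TS version="2.1">\n</TS>\n'
print(detect("app_de.ts", qt_file))                        # XML
print(detect("main.ts", "const answer: number = 42;\n"))   # TypeScript
```

Putting the most specific candidate first and the catch-all last keeps the common case (plain TypeScript) cheap while still letting the rarer XML files win when their markers are present.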
fenio commented on Nov 10, 2023
spenserblack commented on Nov 10, 2023
fenio commented on Nov 11, 2023
spenserblack commented on Nov 12, 2023