Merge pull request #375 from bertsky/charfreq-grapheme-cluster
charfreq: use PCRE to operate on grapheme clusters instead of codepoints
zdenop committed Mar 2, 2024
2 parents 16df5ab + 24ca345 commit b2a5a29
Showing 1 changed file (Makefile) with 1 addition and 1 deletion.
@@ -184,7 +184,7 @@ unicharset: $(OUTPUT_DIR)/unicharset

# Show character histogram
charfreq: $(ALL_GT)
- LC_ALL=C.UTF-8 grep -o . $< | sort | uniq -c | sort -rn
+ LC_ALL=C.UTF-8 grep -P -o "\X" $< | sort | uniq -c | sort -rn

# Create lists of lstmf filenames for training and eval
lists: $(OUTPUT_DIR)/list.train $(OUTPUT_DIR)/list.eval
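To illustrate what the change does, here is a minimal sketch. It assumes GNU grep built with PCRE support and an available C.UTF-8 locale. The function names charfreq_codepoints and charfreq_graphemes are hypothetical wrappers around the old and new recipe lines and are not part of the Makefile. On a decomposed "é" (e followed by U+0301 COMBINING ACUTE ACCENT), the old `grep -o .` counts the base letter and the accent as two separate characters, while PCRE's `\X` keeps them together as one extended grapheme cluster.

```shell
#!/bin/sh
# Minimal sketch; assumes GNU grep with PCRE support (-P) and a C.UTF-8 locale.
# charfreq_codepoints / charfreq_graphemes are hypothetical names, not Makefile targets.

# Old recipe: one histogram entry per Unicode codepoint.
charfreq_codepoints() {
    LC_ALL=C.UTF-8 grep -o . "$@" | sort | uniq -c | sort -rn
}

# New recipe: one histogram entry per extended grapheme cluster (PCRE \X).
charfreq_graphemes() {
    LC_ALL=C.UTF-8 grep -P -o '\X' "$@" | sort | uniq -c | sort -rn
}

# Sample ground truth: decomposed "é" (e + U+0301, UTF-8 bytes CC 81 written
# as octal escapes) on the first line, a plain "a" on the second.
sample=$(mktemp)
printf 'e\314\201\na\n' > "$sample"

echo "codepoints:"
charfreq_codepoints "$sample"   # three entries: e, a, and a bare combining accent
echo "grapheme clusters:"
charfreq_graphemes "$sample"    # two entries: é and a

rm -f "$sample"
```

Counting grapheme clusters matches what a reader perceives as a single character, which is usually what matters when checking ground truth for rare or missing glyphs. Note that precomposed (NFC) and decomposed (NFD) spellings of the same letter still appear as separate entries, because sort and uniq compare the underlying bytes.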