GitHub - writecrow/crow_training: Examples, scaffolding, and sandbox for Crow code tools

Crow Training Materials

This is a general-purpose repository for learning the central concepts and main tools used by the Corpus and Repository of Writing team for text processing.

Command Line

This directory includes example scripts written in Bash. They primarily demonstrate how scripts written in different programming languages can be chained together to perform a series of steps on a given set of texts.

Python

basic_concepts includes scripts that use the simplest Python commands to open a given text file, tokenize the words, and perform computational analysis such as calculating average sentence length and finding sentences that contain passive voice.
better_coding builds on many of the other lessons in this section to introduce ways to make your code:

more readable through use of comments and adherence to PEP8 standards
more re-usable through use of functions and imports
more user-friendly by using parameters instead of hardcoded values
more reliable by integrating tests
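
As a rough illustration of the ideas in basic_concepts and better_coding, the sketch below computes average sentence length using small, commented functions that take the text as a parameter instead of relying on a hardcoded file path. The function names and the naive regex-based sentence splitting are assumptions made for this example; they are not taken from the scripts in the repository.

```python
import re


def split_sentences(text):
    """Naively split text into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def tokenize(sentence):
    """Return the word tokens (letters, digits, apostrophes) in a sentence."""
    return re.findall(r"[A-Za-z0-9']+", sentence)


def average_sentence_length(text):
    """Return the mean number of word tokens per sentence (0.0 if no text)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


if __name__ == "__main__":
    sample = "The cat sat. The dog barked loudly!"
    print(average_sentence_length(sample))
```

Because the logic lives in functions, another script can import it, and a test can call it directly with a short string rather than a file on disk.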

conversion focuses on methods for manipulating text input, including:

conversion from Word or PDF formats to plaintext
conversion into UTF-8 format
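
A minimal sketch of the UTF-8 step, using only the Python standard library. It assumes the source encoding is already known (Windows-1252 is used here as an example default); the function names are hypothetical and are not taken from the repository. Converting Word or PDF files to plaintext usually requires third-party tools and is not shown.

```python
def convert_to_utf8(raw_bytes, source_encoding="cp1252"):
    """Decode bytes from a known source encoding and re-encode as UTF-8."""
    text = raw_bytes.decode(source_encoding)
    return text.encode("utf-8")


def convert_file(src_path, dest_path, source_encoding="cp1252"):
    """Read src_path in source_encoding and write a UTF-8 copy to dest_path."""
    with open(src_path, "rb") as src:
        raw = src.read()
    with open(dest_path, "wb") as dest:
        dest.write(convert_to_utf8(raw, source_encoding))
```

If the assumed source encoding is wrong, the decode step may raise UnicodeDecodeError or silently produce incorrect characters, so it is worth spot-checking the output.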

natural_language_toolkit includes code snippets that illustrate using the Natural Language Toolkit (https://nltk.org), including:

tokenization
text cleaning
collocations

regex provides sample code snippets for various methods of complex parsing of texts, including:

Finding a "header" value in markup like <Program: XXX>
Finding all header values and putting them into a list that can be manipulated
Finding all matches of phrases like "were xxx by" or "was xxx by"
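
The three tasks above could be sketched with Python's built-in re module roughly as follows. The sample text, the "Course" header, and the function names are invented for illustration; only the <Program: XXX> header shape and the "was/were xxx by" phrasing come from the list above.

```python
import re

# Hypothetical sample text with two headers followed by body text.
SAMPLE = """<Program: Engineering>
<Course: ENGL 106>
The report was written by the student. The drafts were reviewed by peers."""

# Matches any header of the form <Name: value>.
HEADER_PATTERN = re.compile(r"<(\w+):\s*([^>]*)>")

# Matches "was <word> by" or "were <word> by", ignoring case.
PASSIVE_PATTERN = re.compile(r"\b(?:was|were)\s+\w+\s+by\b", re.IGNORECASE)


def find_header(text, name):
    """Return the value of the first header called name, or None."""
    match = re.search(r"<" + re.escape(name) + r":\s*([^>]*)>", text)
    return match.group(1).strip() if match else None


def find_all_headers(text):
    """Return every header as a list of (name, value) tuples."""
    return [(key, value.strip()) for key, value in HEADER_PATTERN.findall(text)]


def find_passive_phrases(text):
    """Return all phrases shaped like 'was xxx by' or 'were xxx by'."""
    return PASSIVE_PATTERN.findall(text)
```

Note that the passive pattern only catches a single word between "was/were" and "by", so it will miss phrases such as "was carefully written by".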

remote_data demonstrates basic methods for retrieving content from non-local files
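
One basic way to retrieve text from a non-local file is the standard library's urllib.request, sketched below. The fetch_text helper, its parameters, and the URL in the comment are hypothetical and are not taken from the repository.

```python
from urllib.request import urlopen


def fetch_text(url, timeout=10, default_encoding="utf-8"):
    """Download the resource at url and return its body as a string."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server, if any.
        charset = response.headers.get_content_charset() or default_encoding
        return response.read().decode(charset)


# Example call (placeholder URL, requires network access):
# text = fetch_text("https://example.com/sample.txt")
```

The returned string can then be passed to the same tokenizing or regex code used for local files.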

Sandbox

This directory is for trying out coding ideas and for practicing using Git to keep track of that code. One of the team scaffolding assignments is to create an issue in this repository, create a branch that adds a file to this "Sandbox" directory, and open that branch as a "pull request" for team review.
