# LocationNormalization

## Running the code

## Approach

For each line of input, we go through the following steps:

1. Convert all characters to lower case for normalization. Although capitalization could serve as a feature for identifying location entities, it is not fully reliable, so it is ignored for now.
2. Convert the result of step 1 into a list of tokens by splitting it on non-letter characters. Because city and state names contain only letters, all other characters are discarded, even though they could sometimes be useful as features too.
3. Find candidate entities by checking whether runs of consecutive tokens form valid city or state names. This is done by looking each string up in HashMaps that contain U.S. state and city names. HashMaps are used because they provide fast (average constant-time) lookup, and the dataset is small enough to fit in memory. (If spell checking were implemented, it would go here: for example, if no entity is found in the list of tokens, we could retrieve words with similar spellings using k-gram indexes and try those words instead.)
4. Match candidate city entities with candidate state entities.
5. Format the output.
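
The five steps above can be sketched in a few lines. This is a minimal sketch in Python, not the repository's own code: Python's `dict` stands in for the HashMaps, the `STATES` and `CITIES` tables are tiny hypothetical samples rather than the contents of `data`, and the helper names, the three-token limit on name length, the pairing rule (a city is matched with a non-overlapping state it belongs to), and the `City, ST` output format are all assumptions.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical gazetteers for illustration only; the real lists would be
# loaded from the repository's data directory.
STATES: Dict[str, str] = {
    "california": "CA", "ca": "CA",
    "new york": "NY", "ny": "NY",
    "washington": "WA", "wa": "WA",
}
CITIES: Dict[str, List[str]] = {  # city name -> states it occurs in
    "san francisco": ["CA"],
    "los angeles": ["CA"],
    "new york": ["NY"],
    "seattle": ["WA"],
}
MAX_NGRAM = 3  # assumed longest multi-word name to try


def tokenize(line: str) -> List[str]:
    # Steps 1-2: lower-case, then split on runs of non-letter characters.
    return [t for t in re.split(r"[^a-z]+", line.lower()) if t]


def candidates(tokens: List[str], table: dict) -> List[Tuple[int, int, str]]:
    # Step 3: every run of 1..MAX_NGRAM consecutive tokens found in the table,
    # returned as (start, end, phrase) with end exclusive.
    found = []
    for i in range(len(tokens)):
        for n in range(1, MAX_NGRAM + 1):
            if i + n > len(tokens):
                break
            phrase = " ".join(tokens[i:i + n])
            if phrase in table:  # one hash lookup per run
                found.append((i, i + n, phrase))
    return found


def normalize(line: str) -> Optional[str]:
    tokens = tokenize(line)
    cities = candidates(tokens, CITIES)
    states = candidates(tokens, STATES)
    # Step 4: pair a city with a state that does not reuse its tokens
    # and that the city actually belongs to.
    for cs, ce, city in cities:
        for ss, se, state in states:
            overlaps = cs < se and ss < ce
            if not overlaps and STATES[state] in CITIES[city]:
                # Step 5: format as "City, ST".
                return f"{city.title()}, {STATES[state]}"
    return None
```

For example, `normalize("I live in San Francisco, CA")` returns `"San Francisco, CA"`. The overlap check matters for inputs like `"new york, ny"`, where "new york" is both a city and a state: the sketch skips pairing the city with itself and matches it with "ny" instead.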

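The spell-check extension mentioned in step 3 is not implemented, but the k-gram idea it refers to can be sketched as follows. This is a hypothetical helper using the common textbook formulation: each word is padded with `$` boundary markers and split into character k-grams, an inverted index maps each k-gram to the vocabulary words containing it, and candidates are ranked by the Jaccard overlap of their k-gram sets. The choice of `k = 2`, the 0.4 threshold, and the function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def kgrams(word: str, k: int = 2) -> Set[str]:
    # Pad with boundary markers so word starts and ends become k-grams too.
    padded = f"${word}$"
    return {padded[i:i + k] for i in range(len(padded) - k + 1)}


def build_index(vocab: Iterable[str], k: int = 2) -> Dict[str, Set[str]]:
    # Inverted index: k-gram -> set of vocabulary words containing it.
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in vocab:
        for gram in kgrams(word, k):
            index[gram].add(word)
    return index


def suggest(word: str, index: Dict[str, Set[str]],
            k: int = 2, threshold: float = 0.4) -> List[str]:
    query = kgrams(word, k)
    # Gather every vocabulary word sharing at least one k-gram with the query.
    pool: Set[str] = set()
    for gram in query:
        pool |= index.get(gram, set())
    # Keep candidates whose Jaccard coefficient clears the threshold,
    # best match first (ties broken alphabetically).
    scored = []
    for cand in pool:
        grams = kgrams(cand, k)
        jaccard = len(query & grams) / len(query | grams)
        if jaccard >= threshold:
            scored.append((jaccard, cand))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [cand for _, cand in scored]
```

With an index built from city names, a misspelled token such as `"seatle"` would be replaced by its best suggestion (`"seattle"`) and then retried against the HashMap lookup in step 3.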