Summary
This is a newspaper content analysis project tailored to archives that contain records from the Civil War, Reconstruction, and Gilded Age eras of American history. In particular, we focus on textual and content analytics to identify local and national trends in corruption.
Proposal
Goals
- Create scrapers for multiple historical archives.
- Create analyzers for semantic analysis and classification.
  - OpenCalais
- Provide an API for creating queries and manipulating results as objects.
- Provide graphing and other visualization capabilities:
  - Using NetworkX with graphviz:
    - Basic relational graphs
    - Histograms
  - Using Circos:
    - Complex relational graphs
  - Other:
    - Tag clouds for simple words, phrases, and extracted semantic concepts.
- Use open semantic sources like Freebase to:
  - generalize groups of similar people, things, or concepts.
  - pinpoint related concepts with greater accuracy.
- Use statistical techniques like clustering to:
  - reveal relationships between people, things, and concepts.
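
As an illustration of the relational-graph and relationship-finding goals above, the sketch below counts how often pairs of extracted entities appear in the same article. It is only a minimal starting point, not the project's implementation: it assumes each article has already been reduced to a set of entity names, and the sample data, `articles`, and `cooccurrence_edges` are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: one set of extracted entity names per article.
# The names are placeholders, not real archive data.
articles = [
    {"Person A", "Organization B", "City C"},
    {"Person A", "Organization B"},
    {"Company D", "Company E"},
    {"Organization B", "City C"},
]


def cooccurrence_edges(entity_sets):
    """Count how often each pair of entities appears in the same article."""
    counts = Counter()
    for entities in entity_sets:
        # Sort first so that (a, b) and (b, a) map to the same key.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


edges = cooccurrence_edges(articles)

# Flatten to (entity, entity, weight) triples, most frequent pairs first.
weighted_edges = [(a, b, n) for (a, b), n in edges.most_common()]
```

Triples of this shape are what NetworkX's `Graph.add_weighted_edges_from` accepts, so the counts could feed the basic relational graphs listed under the visualization goal. Raw co-occurrence is a crude measure, and a clustering step would likely need normalized weights.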