Gilded Age

Summary

A newspaper content analysis project tailored to archives that hold records from the Civil War, Reconstruction, and Gilded Age eras of American history. In particular, we focus on text and content analytics to identify local and national trends in corruption.

Proposal

Here

Goals

  • Create scrapers for multiple historical archives.
  • Create analyzers for semantic analysis and classification.
    • OpenCalais
  • Provide an API for creating queries and manipulating results as objects.
  • Provide graphing and other visualization capabilities:
    • Using NetworkX with graphviz:
      • Basic relational graphs
      • Histograms
    • Using Circos:
      • Complex relational graphs
  • Other:
    • Tag clouds for simple words, phrases, and extracted semantic concepts.
  • Use open semantic sources like Freebase to:
    • generalize groups of similar people, things, or concepts.
    • pinpoint related concepts with greater accuracy.
  • Use statistical techniques like clustering to:
    • reveal relationships between people, things, and concepts.
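
As a rough illustration of how the tag cloud and relational graph goals could start from the same data, here is a minimal Python sketch that uses only the standard library. It is not the project's implementation: the `articles` sample, the entity names, and the functions `term_frequencies` and `cooccurrence_edges` are all hypothetical placeholders, and the sketch assumes an analyzer has already reduced each article to a set of extracted entities.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each article reduced to the set of entities that an
# analyzer extracted from it. These are placeholder names, not real data.
articles = [
    {"Person A", "Railroad Co.", "City Council"},
    {"Person A", "Railroad Co."},
    {"Person B", "City Council"},
]


def term_frequencies(docs):
    """Count how many documents mention each entity (tag cloud weights)."""
    counts = Counter()
    for entities in docs:
        counts.update(entities)
    return counts


def cooccurrence_edges(docs):
    """Count how often each pair of entities shares a document (graph edges)."""
    edges = Counter()
    for entities in docs:
        # Sort so that each unordered pair always maps to the same key.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


weights = term_frequencies(articles)
edges = cooccurrence_edges(articles)

for (a, b), n in edges.most_common():
    print(f"{a} -- {b}: {n}")
```

The weighted pairs in `edges` could then be loaded into a NetworkX graph with `Graph.add_edge(a, b, weight=n)` for drawing, and `weights` could drive font sizes in a tag cloud.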