Language analysis platform for historical documents from the Reconstruction and Gilded Age eras of American history.
Python Shell
analyzer
cfg
circos
db
docs
lib
linker
mirrors
scraper
.gitignore
README.md
cleanoutput.sh
constants.py
fts.py
main.py
utils.py

Gilded Age

Summary

A newspaper content analysis project tailored to archives that hold records from the Civil War, Reconstruction, and Gilded Age eras of American history. In particular, we focus on text and content analytics to identify local and national trends in corruption.

Proposal

Here

Goals

  • Create scrapers for multiple historical archives.
  • Create analyzers for semantic analysis and classification.
    • OpenCalais
  • Provide an API for creating queries and manipulating results as objects.
  • Provide graphing and other visualization capabilities.
    • Using NetworkX with graphviz:
      • Basic relational graphs
      • Histograms
    • Using Circos:
      • Complex relational graphs
    • Other:
      • Tag clouds for simple words, phrases, and extracted semantic concepts.
  • Use open semantic sources like Freebase to:
    • generalize groups of similar people, things, or concepts.
    • pinpoint related concepts with greater accuracy.
  • Use statistical techniques like clustering to:
    • reveal relationships between people, things, and concepts.
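
As a rough illustration of the tag-cloud and relational-graph goals above, here is a minimal sketch that uses only the Python standard library. It is not the project's implementation: the sample sentences, the `ENTITIES` list, the `STOPWORDS` set, and both function names are hypothetical, and a real pipeline would presumably take its entities from a semantic analyzer rather than a hard-coded list.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical sample text, invented for this sketch (not archive data).
ARTICLES = [
    "Smith and Jones faced charges of graft in Albany.",
    "Charges of graft followed Smith through the courts.",
    "Jones left Albany before the trial.",
]

# Hypothetical entity list; assumed to come from an upstream extractor.
ENTITIES = ["Smith", "Jones", "Albany"]

# Small illustrative stopword set, not a complete one.
STOPWORDS = {"and", "of", "in", "the", "before", "through"}


def term_frequencies(texts):
    """Count lowercase words across texts, skipping stopwords.

    The counts could serve as weights for a tag cloud.
    """
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts


def cooccurrence_edges(texts, entities):
    """Count how many texts mention each pair of entities together.

    Pairs are sorted tuples, so (a, b) and (b, a) share one key.
    Matching is a plain substring test, which is crude but short.
    """
    edges = Counter()
    for text in texts:
        present = sorted(e for e in entities if e in text)
        edges.update(combinations(present, 2))
    return edges


if __name__ == "__main__":
    print(term_frequencies(ARTICLES).most_common(5))
    for (a, b), weight in sorted(cooccurrence_edges(ARTICLES, ENTITIES).items()):
        print(a, "--", b, weight)
```

The weighted pairs returned by `cooccurrence_edges` are the kind of input that could be loaded into a graph library such as NetworkX as weighted edges for drawing a basic relational graph.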