RAC

This repository contains work for the Harvard Royal African Company (RAC) research project. The work centers on computational textual analysis of historical letters related to the RAC.

Directory Structure

Raw HTML

Digital versions of the original paper texts were scraped from Oxford Scholarly Editions Online using Web Scraper. A dump of the scraped data in .csv format is stored in raw_html, organized by volume number.

Jupyter Notebooks

The code for data cleaning and manipulation is contained in the following Jupyter Notebooks:

Clean-Scraped.ipynb - Extracts letter numbers and letter text from the raw HTML, then joins the text with the metadata on volume and letter number.
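The extraction step described above might look roughly like the sketch below. It is not the notebook's actual code: the HTML layout, the assumption that each letter opens with its number followed by a period, and the names `strip_tags` and `parse_letter` are all hypothetical, and the sample string is made up for illustration.

```python
import re
from html import unescape

# Assumption: each scraped letter starts with its number and a period, e.g. "12. ".
LETTER_NUMBER = re.compile(r"^\s*(\d+)\.\s*")
TAG = re.compile(r"<[^>]+>")


def strip_tags(raw_html):
    """Remove tags, decode HTML entities, and collapse runs of whitespace."""
    text = unescape(TAG.sub(" ", raw_html))
    return " ".join(text.split())


def parse_letter(volume, raw_html):
    """Return volume, letter number, and text, or None if no leading number is found."""
    text = strip_tags(raw_html)
    match = LETTER_NUMBER.match(text)
    if match is None:
        return None
    return {
        "volume": volume,
        "letter_number": int(match.group(1)),
        "text": text[match.end():],
    }


# Made-up sample markup, not taken from the scraped data.
sample = "<div><span class='num'>12.</span> <p>Sample letter text &amp; more.</p></div>"
record = parse_letter(1, sample)
print(record)
```

Returning `None` for fragments without a leading number keeps unparseable rows easy to filter out and inspect by hand.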

`csv` Directory

This directory contains the text data at several stages of preprocessing, along with a hand-curated metadata file:

csv/032818_RAC_Networks_Database.xlsx - Hand-compiled metadata on the letters, including unique identifiers (UID), date, place and author.
csv/texts.csv - Extracted letter texts with their corresponding volume and letter numbers, in .csv format.
metadata_text_merged.csv - The letter texts from texts.csv joined with the metadata file on volume number and letter number. This should be the primary reference data file.
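To illustrate the join on volume number and letter number, here is a minimal sketch using only Python's standard `csv` module. The column names (`volume`, `letter_number`, `uid`, and so on), the helper names, and the in-memory sample rows are assumptions made for this example. The real files may use different headers, and the notebooks may perform the join differently.

```python
import csv
import io


def read_rows(handle):
    """Read a CSV file handle into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(handle))


def join_on_volume_and_letter(texts, metadata):
    """Left-join text rows to metadata rows on (volume, letter_number)."""
    index = {(m["volume"], m["letter_number"]): m for m in metadata}
    merged = []
    for row in texts:
        key = (row["volume"], row["letter_number"])
        # Text rows without matching metadata are kept, just without the extra columns.
        merged.append({**index.get(key, {}), **row})
    return merged


# Placeholder data standing in for texts.csv and the metadata file (hypothetical columns).
texts_csv = io.StringIO(
    "volume,letter_number,text\n"
    "1,12,Sample text A\n"
    "1,13,Sample text B\n"
)
metadata_csv = io.StringIO(
    "volume,letter_number,uid,date,place,author\n"
    "1,12,UID-0001,1681-01-01,Placeholder Place,Placeholder Author\n"
)

merged = join_on_volume_and_letter(read_rows(texts_csv), read_rows(metadata_csv))
for row in merged:
    print(row)
```

A left join is used here so that letters without a metadata match stay visible instead of silently dropping out. Whether the real merged file keeps such rows is not stated above.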

Questions?

Email wenzhexue2014@gmail.com

