# Programming Examples

David J. Thomas

----

Utility functions for doing digital humanities (DH) work quickly, with examples that demonstrate them. To use them in your notebooks, import the classes you need:

```python
from library import WebPage, TextFile, CSVFile
```

---

## TextFile()

```python
TextFile().load()
TextFile().save()
```

**Loading, editing, and re-saving a text file**

A copy of the 1790 State of the Union Address has a slight issue: there are no spaces after the period that ends each sentence. To fix this, we load the file, change every `.` into `. `, and save the result to a different file.

This demonstrates loading and saving, including how to pass the overwrite option so that any existing file is replaced.

```python
import os
from library import TextFile

# build system paths relative to the current working directory
input_path = os.path.join(os.getcwd(), 'sample_data', 'state_of_union_1790.txt')
output_path = os.path.join(os.getcwd(), 'output', 'state_of_union_1790_cleaned.txt')

# load file data
text_data = TextFile(input_path).load()
# add a space after every period
text_data = text_data.replace('.', '. ')
# save results, passing an options dict with the overwrite flag set in case the file already exists
TextFile(output_path).save(text_data, options={'overwrite': True})

print('Success')
```

```
Loading C:\Users\milma\git\notebook\dh-programming\sample_data\state_of_union_1790.txt
Saving to C:\Users\milma\git\notebook\dh-programming\output\state_of_union_1790_cleaned.txt
Success
```
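
Replacing every `.` with `. ` also changes periods that are already followed by a space and periods inside numbers. A more selective variant, sketched here with the standard `re` module, inserts a space only when a period is immediately followed by a letter. The helper name `add_space_after_periods` and the sample string are invented for illustration and are not part of `library`.

```python
import re


def add_space_after_periods(text):
    """Insert a space after any period that is directly followed by a letter."""
    # (?=[A-Za-z]) is a lookahead: it checks the next character without consuming it
    return re.sub(r'\.(?=[A-Za-z])', '. ', text)


# invented sample, not taken from the address itself
sample = 'First sentence.Second sentence.It cost 1.5 dollars.'
cleaned = add_space_after_periods(sample)
print(cleaned)
```

This still splits dotted abbreviations such as `e.g.`, so it is worth checking the cleaned text by eye before relying on it.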


## WebPage()

```python
WebPage().fetch()
WebPage().load()
```

A historical database has a page with useful data. This example downloads a webpage, parses it into a BeautifulSoup object, extracts its text, and saves the result to a text file.

```python
# external imports
import os
# internal imports
from library import WebPage, TextFile

input_path = 'http://thePortus.com'
# build the full output path relative to the current working directory
output_path = os.path.join(os.getcwd(), 'output', 'thePortus.txt')

# make the request and parse the response into a BeautifulSoup object
page_soup = WebPage(input_path).soup()
# extract the page's text (markup stripped) from the BeautifulSoup object
page_text = page_soup.get_text()
# create a new TextFile object and save the text to the designated path
TextFile(output_path).save(page_text)

print('Success')
```

```
Fetching http://thePortus.com
Saving to C:\Users\milma\git\notebook\dh-programming\output\thePortus.txt
Success
```
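
BeautifulSoup's `get_text()` returns the text of a page with the markup stripped, not the raw HTML. To illustrate the difference without a network request, this sketch uses Python's built-in `html.parser` as a stand-in for BeautifulSoup. The `TextExtractor` class and the sample markup are hypothetical and are not part of `library`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # called once for each run of text between tags
        if data.strip():
            self.chunks.append(data.strip())


# invented sample markup
sample_html = '<html><body><h1>Title</h1><p>First <b>bold</b> paragraph.</p></body></html>'
extractor = TextExtractor()
extractor.feed(sample_html)
extractor.close()
page_text = ' '.join(extractor.chunks)
print(page_text)
```

The saved file therefore contains readable text rather than tags, which is usually what later text analysis needs.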


## Converting CSV to text data

```python
# external imports
import os
# internal imports
from library import TextFile
from library.files import CSVFile

input_path = os.path.join(os.getcwd(), 'sample_data', 'caesar_gallic_wars.csv')
output_path = os.path.join(os.getcwd(), 'output', 'caesar_gallic_wars.txt')

# load data as a list of dictionaries
csv_data = CSVFile(input_path).load()
# join the value of the 'text' column from every row, separated by spaces
entire_text = ' '.join(data_row['text'] for data_row in csv_data)
# save the joined data
TextFile(output_path).save(entire_text)

print('Success')
```

```
Loading C:\Users\milma\git\notebook\dh-programming\sample_data\caesar_gallic_wars.csv
Saving to C:\Users\milma\git\notebook\dh-programming\output\caesar_gallic_wars.txt
True
```
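
The example describes the loaded data as a list of dictionaries, one per row. This sketch reproduces that shape with the standard `csv.DictReader` on an in-memory sample, then joins the `text` column. The sample rows and the `book` column are invented for illustration, and how `CSVFile` reads the file internally is an assumption not confirmed by this document.

```python
import csv
import io

# invented sample standing in for the real CSV file
sample_csv = 'book,text\n1,First invented passage.\n1,Second invented passage.\n'

# DictReader maps each row to a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(sample_csv)))
# join the 'text' value of every row into one string
entire_text = ' '.join(row['text'] for row in rows)

print(rows[0])
print(entire_text)
```

Joining with `' '.join(...)` avoids the leading space that repeated `+=` concatenation would leave at the start of the text.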