# Reading data from an HTML web page

The goal of this notebook is to read a data set that is published as an HTML table on a web page. The code below works on a copy of that page saved locally as `data.html`.

The data used in this session come from an inactive research site of the *Long Term Ecological Research Network*, *North Inlet LTER*. They consist of daily water samples from 1978 to 1992 and are available from the *Environmental Data Initiative* (EDI) [data repository](https://portal.edirepository.org/nis) under the repository identifier [knb-lter-nin.1.1](https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-nin&identifier=1).

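```python
# Install the lxml parser backend and Beautiful Soup (needed once per environment);
# %pip installs into the environment of the running kernel
%pip install lxml beautifulsoup4
```
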
```python
from bs4 import BeautifulSoup

# Parse the copy of the web page saved locally as data.html, using the lxml parser
with open('./data.html', 'r') as f:
    soup = BeautifulSoup(f, 'lxml')
```
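The cell above parses a copy of the page that is already on disk as `data.html`; the notebook does not show how that copy was made. One way to fetch it is sketched below with Python's standard `urllib.request` module. The helper name `download_page` and the placeholder URL are hypothetical; substitute the address of the page that displays the table. Some web servers may refuse requests that lack a browser-like `User-Agent` header, in which case a `urllib.request.Request` with custom headers would be needed.

```python
import urllib.request
from pathlib import Path


def download_page(url: str, dest: str = "data.html") -> Path:
    """Fetch the raw bytes at `url` and save them unchanged to `dest`."""
    with urllib.request.urlopen(url) as response:
        content = response.read()
    path = Path(dest)
    path.write_bytes(content)
    return path


# Hypothetical usage: replace the placeholder with the address of the page
# that displays the table, then parse the saved file with Beautiful Soup.
# download_page("https://example.org/path/to/table-page.html")
```

Saving the raw bytes, rather than decoded text, leaves the page's original character encoding untouched for the parser to handle.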

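```python
# Locate the first <table> element on the page and collect its <tr> rows
html_table = soup.find('table')
table_rows = html_table.find_all('tr')

# Skip the first row (assumed to hold the column headers) and keep the text
# of every <td> cell in each remaining row
table = []
for table_row in table_rows[1:]:
    row = [cell.text for cell in table_row.find_all('td')]
    table.append(row)
```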
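For comparison, the same row-by-row extraction can be sketched with only the standard library's `html.parser.HTMLParser`, which may help where lxml or Beautiful Soup cannot be installed. This is a swapped-in technique, not what the notebook uses. The `TableRowParser` class and `parse_table` function are hypothetical names, and the sketch assumes the page holds a single table whose `<tr>`, `<td>` and `<th>` elements all have explicit closing tags (HTML allows those end tags to be omitted, and this sketch does not handle that case). Like the loop above, it treats the first row as the header.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the row being read
        self._cell: Optional[List[str]] = None  # text chunks of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text can arrive in several chunks, so gather it and join when the cell closes
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html: str) -> Tuple[List[str], List[List[str]]]:
    """Return (header, data_rows), treating the first <tr> as the header."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows[0], parser.rows[1:]


# Usage with a tiny made-up table (the column names are hypothetical)
sample = (
    "<table><tr><th>Date</th><th>Code</th></tr>"
    "<tr><td>9/1/1978</td><td>TC</td></tr>"
    "<tr><td>9/2/1978</td><td>TC</td></tr></table>"
)
header, rows = parse_table(sample)
print(header)  # ['Date', 'Code']
print(rows)    # [['9/1/1978', 'TC'], ['9/2/1978', 'TC']]
```

Unlike the Beautiful Soup loop, this version also keeps the header row, which is handy for naming the columns later.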

```python
# Show the first rows of the parsed table
for row in table[:9]:
    print(row)
```

```text
['9/1/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9', '-9.999', '-9.9', '-9.99', '-999', '2', '-9.9', '-9.9']
['9/2/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9', '-9.999', '-9.9', '-9.99', '-999', '2', '-9.9', '-9.9']
['9/3/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9', '-9.999', '-9.9', '-9.99', '-999', '1', '-9.9', '-9.9']
['9/4/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9', '-9.999', '-9.9', '-9.99', '-999', '1', '-9.9', '-9.9']
['9/5/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9', '-9.999', '-9.9', '-9.99', '-999', '1', '-9.9', '-9.9']
['9/6/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9', '-9.999', '-9.9', '-9.99', '-999', '2', '-9.9', '-9.9']
['9/7/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9', '-9.999', '-9.9', '-9.99', '-999', '2', '-9.9', '-9.9']
['9/8/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9', '-9.999', '-9.9', '-9.99', '-999', '6', '-9.9', '-9.9']
```
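In the printed rows every field is still a string, and most numeric fields hold values such as `-9.9`, `-99.9`, `-9.999` and `-999`. These look like missing-value placeholders, but that is an assumption: the codes and the column meanings should be checked against the dataset's metadata on EDI before any analysis. Under that assumption, the sketch below converts one row into a `date`, the second-column code, and floats, with `None` marking placeholders. `convert_row` and `MISSING_CODES` are hypothetical names.

```python
from datetime import date, datetime
from typing import List, Optional, Union

# ASSUMPTION: these strings look like missing-value placeholders in the printed
# rows; confirm them against the dataset's metadata on EDI before relying on them.
MISSING_CODES = {"-9.9", "-9.99", "-9.999", "-99.9", "-999"}


def convert_row(row: List[str]) -> List[Union[date, str, Optional[float]]]:
    """Convert one scraped row: date, second-column code, then numeric fields."""
    # Dates are printed as month/day/year without zero padding, e.g. '9/1/1978'
    sample_date = datetime.strptime(row[0], "%m/%d/%Y").date()
    code = row[1]  # e.g. 'TC'; its meaning is not stated in this notebook
    # Placeholder codes become None; every other field is parsed as a float
    values = [None if cell in MISSING_CODES else float(cell) for cell in row[2:]]
    return [sample_date, code, *values]


example = ['9/1/1978', 'TC', '-9.9', '-9.9', '-99.9', '-99.9', '-9.9', '-9.9',
           '-9.999', '-9.9', '-9.99', '-999', '2', '-9.9', '-9.9']
print(convert_row(example)[:3])  # [datetime.date(1978, 9, 1), 'TC', None]
```

Applying it to the whole table is then `[convert_row(row) for row in table]`; any field that is neither a placeholder nor a number will raise a `ValueError`, which makes unexpected values easy to spot.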
