## `lab11`—The Weather Forecast

❖ Objectives

-   Engage in pair programming exercises to build team programming skills.
-   Run a Python script from the command line.
-   Use a pipe to pass program output to another program or file.

<div class="alert alert-warning">
**Pair Programming**
<br />
This lab is built around *pair* programming—you need to work in pairs, although you need not be at a single machine unless you prefer to work that way.  At the end, when you report collaborators, please report the names and IDs of all partners in this lab exercise.  (In exceptional cases, such as the room layout, trios are permitted.)
</div>

The big picture:  we have a series of Python scripts which do the following:

1.  `grab_stations` retrieves a list of National Weather Service stations and prints the station call sign and latitude/longitude.
2.  `grab_forecast` takes that output as its input, retrieves the current temperature or a forecast for each station, and prints the same records with the temperature added as an extra column.
3.  `plot_forecast` plots the resulting temperatures against a map of the state of Illinois.

You are going to determine how to compose scripts 1 and 2, and then work together as a team to implement the full toolchain.

The lab will take place in three parts:

1.  The Instructor will teach a brief lesson covering how the pieces of the toolchain work and what you need to implement.
2.  The class will divide into groups of two.  For each group, one partner will write the `grab_stations` script, while the other will write the `grab_forecast` script. You may share code in class using [this Etherpad](https://public.etherpad-mozilla.org/p/Pa6X2Bzq0h), but please be careful in using it as a common resource.
3.  Once your code works, you and your partner will put your pieces together to make the end result work and plot the figure.

### Working from the Command Line

We've worked in the Jupyter notebook almost exclusively for the labs, but this time we're going to run some code directly on the command line.  Open a command line window to work in (on Windows, type `cmd`), and a prompt window will appear. Then change the working directory with the following command.  (Don't type the `$`; it's there to remind you that the command goes at the prompt rather than in Python.)
    
    $ cd Documents\lab13

One of the current Python scripts can be executed by simply telling Python to run a file (instead of to start the interpreter):
    
    $ python grab_stations.pyc

<div class="alert alert-info">
We're doing something a little funny here: note the `pyc` extension instead of `py`.  This is a *compiled* Python file, which means that Python has stored the script in a faster intermediate code that isn't human-readable anymore.  (These get replaced if you edit and run the script, so they're never a problem.)  Since you need to *write* the scripts but still need to *test* the toolchain, we provide these `pyc` files as a substitute for testing.
</div>
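
For the curious, the standard library's `py_compile` module can produce a `pyc` file from a script. The sketch below is an illustration only, and not necessarily how the `pyc` files for this lab were built; the file name `hello.py` and its contents are made up for the example.

```python
import os
import py_compile
import subprocess
import sys
import tempfile


def compile_and_run(source_code):
    """Write source_code to a temporary script, compile it to a .pyc,
    run the compiled file, and return what it printed."""
    with tempfile.TemporaryDirectory() as folder:
        source_path = os.path.join(folder, 'hello.py')    # made-up file name
        compiled_path = os.path.join(folder, 'hello.pyc')
        with open(source_path, 'w') as f:
            f.write(source_code)
        # doraise=True turns a compile error into a Python exception
        py_compile.compile(source_path, cfile=compiled_path, doraise=True)
        os.remove(source_path)    # only the compiled file is left
        result = subprocess.run([sys.executable, compiled_path],
                                capture_output=True, text=True, check=True)
        return result.stdout


if __name__ == '__main__':
    print(compile_and_run("print('hello from a pyc file')"), end='')
```

Note that a `pyc` file is tied to the Python version that produced it, so the compiled file here is run with the same interpreter that compiled it.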

To run the entire toolchain, the command line lets you take the *output* from one script and use it as the *input* to another.  Thus:
    
    $ python grab_stations.pyc | python grab_forecast.pyc | python plot_forecast.py

This takes a moment to run and then displays the Illinois weather stations along with today's forecast high temperatures.

### Reading Input from Another Program

Note the `|` operator in the above command: this takes the output of the first program and *pipes* it as the input to the next program. It is a straightforward way to pass output from one program to another.

The output of the first program arrives on the second program's *standard input stream*, or `stdin`, which you can read line by line as strings and parse:

    # this won't show anything in Jupyter because there's no `stdin`, but here's what you would do:
    import sys
    for line in sys.stdin:
        print(line)
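
Because Jupyter has no `stdin`, one way to experiment with this kind of loop in a notebook is to substitute an in-memory text stream. This is a minimal sketch, assuming tab-separated input; the function name `read_records` and the sample text are made up for illustration.

```python
import io
import sys


def read_records(stream=None):
    """Split each non-blank, tab-separated line of `stream` into a list of fields."""
    if stream is None:
        stream = sys.stdin    # fall back to real standard input
    records = []
    for line in stream:
        line = line.strip()   # drop the trailing newline
        if not line:
            continue          # skip blank lines
        records.append(line.split('\t'))
    return records


# io.StringIO behaves like an open text file, so it can stand in for stdin
fake_stdin = io.StringIO("alpha\t1.5\t2.5\nbeta\t3.0\t4.0\n")
for fields in read_records(fake_stdin):
    print(fields)
```

Every field comes back as a string, so numeric columns still need a conversion such as `float` before you can compute with them.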

To see how this works on the command line, here is a simple pair of test programs to try out:

program1.py:

    print('hello world')
    print('another line to print..')

program2.py:
    
    import sys
    for line in sys.stdin:
        print('I receive this line: %s'%line.strip())
        
Now try the following from command line:

    $ python program1.py | python program2.py
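
Roughly speaking, the pipe hands everything the first program prints to the second program as its input. The sketch below imitates that from inside Python with the standard `subprocess` module. It runs the two programs one after the other rather than side by side as a real pipe does, and the helper name `run_pipeline` is an assumption made for this example.

```python
import subprocess
import sys

# The same two programs as above, written as strings so the example is self-contained
PRODUCER = "print('hello world'); print('another line to print..')"
CONSUMER = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print('I receive this line: %s' % line.strip())\n"
)


def run_pipeline():
    """Run PRODUCER, then feed its output to CONSUMER as standard input."""
    first = subprocess.run([sys.executable, '-c', PRODUCER],
                           capture_output=True, text=True, check=True)
    second = subprocess.run([sys.executable, '-c', CONSUMER],
                            input=first.stdout,
                            capture_output=True, text=True, check=True)
    return second.stdout


if __name__ == '__main__':
    print(run_pipeline(), end='')
```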

### Accessing Web Data:  `requests`

Programmers and scientists often need to access data which are not located directly on the hard drive, so `open` won't take care of it.  If the data are available on the web, we can use the `requests` library to access the server, grab the data, and then parse it as a single string (just like `read()` for a file).

This example grabs a web page and outputs the HTML markup code underlying the page:

    import requests

    url = 'http://www.nws.noaa.gov/mdl/gfslamp/lavlamp.shtml'
    page = requests.get(url)

    print(page.text)
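
Web requests can fail (a mistyped URL, a server that is down), so it can help to check the result before using it. The following is a minimal sketch rather than part of the lab scripts; the helper name `fetch_text` and the 10-second timeout are assumptions chosen for illustration.

```python
import requests


def fetch_text(url, timeout=10):
    """Return the body of `url` as a string, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()    # raise an error for 4xx/5xx responses
    except requests.RequestException as err:
        print('Could not fetch %s: %s' % (url, err))
        return None
    return response.text


if __name__ == '__main__':
    text = fetch_text('http://www.nws.noaa.gov/mdl/gfslamp/lavlamp.shtml')
    if text is not None:
        print(text[:200])    # show only the first 200 characters
```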

This example grabs data from an online source:

    url = 'https://raw.githubusercontent.com/davis68/cs101-example/master/exoplanets.csv'
    planets = requests.get(url)

    print(planets.text)

Part of this lab will involve crawling data from a web page.  This means that you have to pull a large chunk of data (the web page as a string) and then fish around to find the exact datum you are looking for. To save you time, the scripts cache the web files locally so that you don't need to download the page every time you run them.
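
"Fishing around" in a page usually comes down to `str.find` and slicing. Here is a small sketch of that idea on a made-up HTML snippet; the helper name `extract_between`, the station `KXYZ`, and the sample text are all invented for illustration.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker,
    or None if either marker is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)           # skip past the marker itself
    end = text.find(end_marker, start)   # search only after the start marker
    if end == -1:
        return None
    return text[start:end]


sample = "<html><body><p>Stations</p><pre>KXYZ 42</pre></body></html>"
print(extract_between(sample, "<pre>", "</pre>"))
```

`str.find` returns `-1` when the marker is absent, which is why the function checks for that value before slicing.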

### Forming Your Team

Each of you should team up with a partner for this session. One of you will be in charge of finishing `grab_stations.py`; the other will be in charge of finishing `grab_forecast.py`.

If you are in charge of writing `grab_stations.py`, then you can use `grab_forecast.pyc` to test your code and make it work:
    
    $ python grab_stations.py | python grab_forecast.pyc
    
If you are in charge of writing `grab_forecast.py`, then you can use `grab_stations.pyc` to test your code and make it work:
    
    $ python grab_stations.pyc | python grab_forecast.py

Again, the `pyc` files are intermediate code compiled from the final scripts. You may use one of them for your testing if you are working on one of the scripts. This makes it easier for you to debug your code. 

The output of these commands should be a list of records, one per line, each containing a station name, longitude, latitude, and the day's high temperature. The `plot_forecast.py` script takes these as input and plots the final weather forecast figure shown at the bottom of the page.

##  `grab_stations.py`

If you are working on this script, open the skeleton code file `grab_stations.py` from Jupyter and write the unfinished parts.  The program has the following structure:

    import os
    import requests

    def grab_website_data():
        '''Get raw data as HTML string from the NOAA website.'''
        
        page_file = "./stations.txt"
        if not os.path.isfile(page_file):
            url = 'http://www.nws.noaa.gov/mdl/gfslamp/docs/stations_info.shtml'
            
            ## YOUR CODE HERE
            
            f = open(page_file,'w')   #cache the page content to a local file 
            f.write(page.text)
            f.close()
            return page.text
        else:
            f = open(page_file,'r')  #read from cached html file
            data = f.read()
            f.close()
            return data

    def extract_section(text):
        '''Find Illinois data segment (in a PRE tag).
        We know (from examination) that inside of the PRE block containing ' IL '
        (with whitespace and case matching) we can find the IL station data.
        This solution isn't robust, but it's good enough for practical cases.'''
        
        il_start  = text.find(' IL ')
        tag_start = text.rfind('PRE', il_start-200, il_start) # look backwards
        tag_end   = text.find('PRE', il_start)
        data = text[tag_start+4:tag_end-2]
        lines = data.splitlines()
        lines = lines[2:-1]
        return lines

    def parse_station_line(line):
        '''Extract latitude and longitude of stations. We know the columns are fixed
        (which is both inconvenient and convenient). In this case, we will simply
        set the limits of the relevant columns by counting the number of columns
        over we need to go.'''
        
        pass  # YOUR CODE HERE
        
        return stn, lon, lat
        

    text = grab_website_data()
    lines = extract_section(text)
    for line in lines: 
        try:
            stn, lon, lat = parse_station_line(line)
            print('%s\t%f\t%f'%(stn,lon,lat))
        except:
            print('Could not parse line\n\t%s'%line)
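
When columns sit at fixed character positions, each field can be pulled out with a slice. The sketch below uses a made-up line and made-up column positions, so it is not the layout of the NOAA station table; you still need to find the real positions by examining the data.

```python
# Build a sample line with known widths: 6 + 13 + 10 characters (all made up)
sample = '%-6s%-13s%10.2f' % ('KXYZ', 'EXAMPLE FIELD', 12.5)


def parse_fixed_width(line):
    """Slice the three fixed-width fields out of a line shaped like `sample`."""
    code = line[0:6].strip()      # characters 0-5
    label = line[6:19].strip()    # characters 6-18
    value = float(line[19:29])    # float() ignores the padding spaces
    return code, label, value


print(repr(sample))
print(parse_fixed_width(sample))
```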

To test:
    
    $ python grab_stations.py | python grab_forecast.pyc

or
    
    $ python grab_stations.py | python grab_forecast.pyc | python plot_forecast.py

## `grab_forecast.py`

If you are working on this script, open the skeleton code file `grab_forecast.py` from Jupyter and write the unfinished parts.  The program has the following structure:

    import os
    import sys
    import requests

    def grab_stdin(text=sys.stdin):
        '''Get input stations from stdin.'''
        stns = []
        locx = []
        locy = []
        for line in text:
            try:
                pass  # YOUR CODE HERE
            except:
                print('Could not parse line \n\t"%s"'%line)
        return stns, locx, locy

    def grab_forecast_data():
        '''Get raw data as HTML string from the NOAA website.'''
        
        page_file = "./forecast.txt"
        if not os.path.isfile(page_file):
            url = 'http://www.nws.noaa.gov/mdl/gfslamp/lavlamp.shtml'
            
            ## YOUR CODE HERE
            
            f = open(page_file,'w')   #cache the page content to a local file 
            f.write(page.text)
            f.close()
            return page.text
        else:
            f = open(page_file,'r')   #read from cached html file 
            data = f.read()
            f.close()
            return data

    def get_station_temp(temp_data, stn):
        '''We have a list of Illinois stations from the sites loaded previously.
           We need to load the data for each of those sites and store these data 
           locally.  There are a lot of data included here, but we are only 
           interested in one:  the current temperature, located at the index
           offset 169 and of length 2 (found by examination).
        '''
        
        tag_start = temp_data.find(stn)
        if tag_start == -1:
            return float('NaN')  # station not found; report a missing value
        tag_end = tag_start + 1720 #each text block is 1720 characters long
        T = float(temp_data[tag_start+169:tag_start+172])
        return T
    
    stns, locx, locy = grab_stdin()
    temp_data = grab_forecast_data()
    for stn,lon,lat in zip(stns, locx, locy):
        temp = get_station_temp(temp_data, stn)
        print('%s\t%f\t%f\t%f'%(stn,lon,lat,temp))
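
`float('NaN')` is Python's "not a number" value, which makes a convenient placeholder for a missing temperature. Here is a small sketch of how such a value behaves; `safe_float` is a hypothetical helper name, not part of the lab scripts.

```python
import math


def safe_float(text):
    """Convert text to a float, or return NaN if it is not a number."""
    try:
        return float(text)
    except ValueError:
        return float('nan')


if __name__ == '__main__':
    for raw in [' 72', 'n/a']:
        value = safe_float(raw)
        # NaN never compares equal to anything, so test it with math.isnan
        print('%r -> %f (missing: %s)' % (raw, value, math.isnan(value)))
```

A NaN still formats cleanly with `%f`, so a record with a missing temperature can be printed in the same tab-separated layout as the others.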

To test:
    
    $ python grab_stations.pyc | python grab_forecast.py

or

    $ python grab_stations.pyc | python grab_forecast.py | python plot_forecast.py

### Putting It All Together

Now you and your partner need to collaborate to make these scripts work together.  There isn't a really easy way to share files directly with the tools available in our CS101 lab, so we suggest that you email your scripts to one another so that you both have them in the same folder (or simply use a USB drive).  This will, of course, overwrite the existing skeleton code file in that folder.

Once you've got the updated file from your partner, you can run the toolchain without using the `.pyc` files anymore. One last thing: you can also use the `>` operator on the command line to *redirect* a program's stdout to a file, for example by ending a command with `> output.txt`. The full toolchain is:
    
    $ python grab_stations.py | python grab_forecast.py | python plot_forecast.py

If this doesn't work, debug and figure out where things went wrong with your partner.

If it works, you should see a figure like the one below, except that your plot should also show the temperature information:

![](./img/figure_1.png)
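
The `>` redirection mentioned above sends a program's stdout to a file instead of the screen. The sketch below imitates that from inside Python with the standard `subprocess` module; the helper name `run_with_redirect` and the sample record are made up for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_with_redirect(code, out_path):
    """Run a short Python program and send its stdout to out_path,
    much as `python program.py > out_path` would on the command line."""
    with open(out_path, 'w') as out_file:
        subprocess.run([sys.executable, '-c', code],
                       stdout=out_file, check=True)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as folder:
        out_path = os.path.join(folder, 'output.txt')
        # A made-up record in the station/lon/lat/temperature layout
        run_with_redirect("print('KXYZ\\t-88.0\\t40.0\\t72.0')", out_path)
        with open(out_path) as f:
            print(f.read(), end='')
```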

### Stretch Goal

When reporting your work, please use the following format.
    
    grab_stations_[ID1].py
    grab_forecast_[ID2].py
    station_plot.png

Only one member of the team needs to send me the result. Please remember to `cc` the other member of your team on the email.

If you make it this far, awesome! This is all for today's session.

Try this out if there are more than twenty minutes left:  randomly pair with a different person in the lab and swap code with them. This should work if both of your scripts are correct. If not, try to figure out why your previous pair of scripts worked and the new pairing does not.