<img src="rc_logo.png">

# USGS Data Conversion and Cleaning #
<hr>


## Converting a CSV text file to a NetCDF file in Python ##
Wednesday 24th of June 2015
<hr>
<ol>
<li>[Import the modules](#import)</li>
<li>[Read the CSV](#csv)</li>
<li>[Inspect the data](#inspect)</li>
<li>[Create a NetCDF file](#netcdf)</li>
<li>[Add metadata](#meta)</li>
<li>[Create the variables](#vars)</li>
<li>[Write the data](#data)</li>
<li>[Close the file](#close)</li>
</ol>

<a id='import'></a>

## Import all the needed modules

```python
import numpy as np
import netCDF4 as nc
import pandas as pd
import datetime as dt
```

<a id='csv'></a>

## Read in the CSV file using pandas

```python
# thousands=',' strips comma digit separators from numeric fields before parsing
data = pd.read_csv('Walruses.csv', thousands=',')
```
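
The `thousands=','` argument lets pandas read numbers written with comma digit groups as numeric columns instead of strings. The sketch below assumes the raw file stores the projected `Xcoord`/`Ycoord` values that way (quoted, e.g. `"95,616.95"`); that layout is an assumption, and the two rows are adapted from the walrus 271 output further down.

```python
import io

import pandas as pd

# Hypothetical excerpt of Walruses.csv: quoted coordinates with comma separators
csv_text = '''Walrus,DateTimeUTC,Xcoord,Ycoord
271,5/31/2008 19:25,"95,616.95","-528,324.60"
271,6/1/2008 3:24,"84,741.71","-511,653.75"
'''

# Without thousands=',' these two columns would come back as strings (object dtype)
data = pd.read_csv(io.StringIO(csv_text), thousands=',')
print(data.dtypes)
```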

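```python
# Parse timestamps such as "5/31/2008 19:25" with an explicit format and
# express them as seconds since the Unix epoch, 1970-01-01 00:00:00 UTC
epoch = pd.Timestamp('1970-01-01')
data['time'] = (pd.to_datetime(data['DateTimeUTC'], format='%m/%d/%Y %H:%M')
                - epoch).dt.total_seconds()
print(data['time'][0:2])
```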

```
0    1212261900
1    1212290640
Name: time, dtype: float64
```
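
The `time` column counts seconds since 1970-01-01 00:00:00 UTC, the same reference used for the NetCDF `time` units later on. A minimal sketch of the conversion on the first two walrus 271 timestamps, assuming the month/day/year layout seen in `DateTimeUTC`:

```python
import pandas as pd

# Two DateTimeUTC strings as they appear in the walrus 271 rows
stamps = pd.Series(['5/31/2008 19:25', '6/1/2008 3:24'])

# An explicit format avoids pandas guessing between month/day and day/month
parsed = pd.to_datetime(stamps, format='%m/%d/%Y %H:%M')

# Subtracting the epoch yields Timedeltas; .dt.total_seconds() turns them into floats
seconds = (parsed - pd.Timestamp('1970-01-01')).dt.total_seconds()
print(seconds.tolist())  # [1212261900.0, 1212290640.0]
```

As a quick arithmetic check, 19:25 on 31 May to 03:24 on 1 June is 7 h 59 min, i.e. 7 × 3600 + 59 × 60 = 28,740 s, which is exactly the gap between the two printed values.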


<a id='inspect'></a>

## Quickly inspect the data for correctness

```python
# Keep only the ID and position columns, then show every second row from 2 to 8
longLat = data[['Walrus', 'Longitude', 'Latitude']]
longLat.iloc[2:10:2]
```

|   | Walrus | Longitude | Latitude |
|---|--------|-----------|----------|
| 2 | 271 | -168.44436 | 65.587969 |
| 4 | 271 | -168.489215 | 65.742984 |
| 6 | 271 | -168.376483 | 66.175663 |
| 8 | 271 | -168.265869 | 66.557191 |
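
Beyond eyeballing a few rows, a handful of vectorised checks can flag obviously bad fixes. This is an illustrative sketch, not part of the original workflow: the helper name `check_fixes` is hypothetical, and it assumes the `Walrus`, `Longitude`, `Latitude` and `time` columns built above. The sample rows are copied from the walrus 271 output.

```python
import pandas as pd


def check_fixes(df):
    """Hypothetical helper: return simple data-quality checks as a dict of bools."""
    return {
        # Geographic coordinates must lie inside their valid ranges
        'lon_in_range': bool(df['Longitude'].between(-180, 180).all()),
        'lat_in_range': bool(df['Latitude'].between(-90, 90).all()),
        # No missing positions or timestamps
        'no_missing': bool(df[['Longitude', 'Latitude', 'time']].notna().all().all()),
        # Within each walrus, fixes should be in chronological order
        'time_sorted': bool(df.groupby('Walrus')['time']
                            .apply(lambda t: t.is_monotonic_increasing).all()),
    }


# Three rows copied from the walrus 271 output
sample = pd.DataFrame({
    'Walrus': [271, 271, 271],
    'Longitude': [-167.956095, -168.177987, -168.444360],
    'Latitude': [65.248715, 65.401217, 65.587969],
    'time': [1212261900.0, 1212290640.0, 1212319440.0],
})
report = check_fixes(sample)
print(report)
```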


```python
# One DataFrame per tagged walrus
walrus_271 = data[data.Walrus == 271]
walrus_281 = data[data.Walrus == 281]
walrus_322 = data[data.Walrus == 322]
print(walrus_271.head(4))
```

```
   Walrus      DateTimeUTC    Xcoord     Ycoord    Behav   Longitude  \
0     271  5/31/2008 19:25  95616.95 -528324.60  1.00900 -167.956095   
1     271    6/1/2008 3:24  84741.71 -511653.75  1.00050 -168.177987   
2     271   6/1/2008 11:24  71834.45 -491176.95  1.00625 -168.444360   
3     271   6/1/2008 19:24  65275.80 -478935.62  1.02025 -168.580284   

    Latitude        time  
0  65.248715  1212261900  
1  65.401217  1212290640  
2  65.587969  1212319440  
3  65.699143  1212348240  
```
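
Filtering once per ID is fine for three animals; with more, `groupby` splits the table in a single pass. A sketch assuming the same `Walrus` column; the dictionary name `tracks` is hypothetical and the `fix` values are placeholders, not real data.

```python
import pandas as pd

# Stand-in for the full table: three walrus IDs, placeholder fix numbers
data = pd.DataFrame({
    'Walrus': [271, 271, 281, 322, 281],
    'fix': [0, 1, 2, 3, 4],
})

# One DataFrame per walrus, keyed by the ID as a plain int
tracks = {int(walrus_id): group for walrus_id, group in data.groupby('Walrus')}

print(sorted(tracks))    # [271, 281, 322]
print(len(tracks[281]))  # 2
```

`tracks[271]` then holds the same rows as `data[data.Walrus == 271]`.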


<a id='netcdf'></a>

## Create a NetCDF file

```python
# mode='w' creates the file, overwriting any existing file of the same name
ncf = nc.Dataset('Walrus_271.nc', mode='w')
print(ncf)
```

```
<type 'netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): 
    variables(dimensions): 
    groups: 
```



<a id='meta'></a>

## Create the metadata, attributes and conventions

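```python
# Empty strings are placeholders to be filled in for the final product
ncf.title        = ''
ncf.history      = ''
ncf.comment      = ''
ncf.source       = ''
ncf.references   = ''
ncf.Conventions  = "CF-1.6"
# ISO 8601 timestamp, which avoids day/month ambiguity
ncf.date_created = dt.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')
ncf.institution  = 'USGS'
```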

```python
# 'time' is unlimited (size None); the coordinate dimensions get one entry per fix
ncf.createDimension('time', None)
ncf.createDimension('longitude', len(walrus_271))
ncf.createDimension('latitude', len(walrus_271))
```

```
<type 'netCDF4.Dimension'>: name = 'latitude', size = 314
```

<a id="vars"></a>

## Create the variables

```python
# Seconds since the Unix epoch, matching the 'time' column computed from DateTimeUTC
time = ncf.createVariable('time', 'f8', ('time',), zlib=True)
time.units = 'seconds since 1970-01-01 00:00:00 UTC'
time.standard_name = 'time'
time.long_name = 'Epoch Time'
```

```python
# least_significant_digit=8 quantizes values to about 1e-8 before zlib compression
lon = ncf.createVariable('longitude', 'f8', ('longitude',), zlib=True, least_significant_digit=8)
lon.units = 'degrees_east'
lon.standard_name = 'longitude'
lon.long_name = 'Longitude'
```

```python
lat = ncf.createVariable('latitude', 'f8', ('latitude',), zlib=True, least_significant_digit=8)
lat.units = 'degrees_north'
lat.standard_name = 'latitude'
lat.long_name = 'Latitude'
```

<a id='data'></a>

## Write the data to the NetCDF file

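```python
# Write the coordinate values first: this grows the unlimited 'time'
# dimension to one entry per GPS fix
ncf.variables['time'][:] = walrus_271['time'].values
ncf.variables['longitude'][:] = walrus_271['Longitude'].values
ncf.variables['latitude'][:] = walrus_271['Latitude'].values
```

```python
# Create the walrus variable and fill it once 'time' has its final length;
# assigning a scalar while the unlimited dimension is still empty does not
# fill the time steps written afterwards
w271 = ncf.createVariable('walrus_271', 'u2', ('time', 'latitude', 'longitude'),
                          zlib=True)
w271[:] = 1
w271.id = 271
# 'walrus_271' is not a CF standard name, so only long_name is set
w271.long_name = 'Rufus the Walrus, 271'
```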

<a id='close'></a>

## Close the NetCDF file

```python
# Flush everything to disk and release the file handle
ncf.close()
```