<img src="rc_logo.png">

# USGS Data Conversion and Cleaning #
<hr>


## Converting a CSV text file to a NetCDF file in Python ##
Wednesday 24th of June 2015
<hr>
<ol>
<li>[Import the modules](#import)</li>
<li>[Read the CSV](#csv)</li>
<li>[Inspect the data](#inspect)</li>
<li>[Create a NetCDF file](#netcdf)</li>
<li>[Add metadata](#meta)</li>
<li>[Create the variables](#vars)</li>
<li>[Write the data](#data)</li>
<li>[Close the file](#close)</li>
</ol>

<a id='import'></a>

## Import the required modules

In [1]:
import numpy as np
import netCDF4 as nc
import pandas as pd
import datetime as dt

<a id='csv'></a>

## Read the CSV file with pandas

In [2]:
filename = 'Walruses.csv'
data = pd.read_csv(filename, parse_dates=['DateTimeUTC'], thousands=',')  # name the timestamp column; parse_dates=True only tries to parse the index
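With `pandas.read_csv`, `parse_dates=True` only asks pandas to parse the index, so a timestamp column is converted only when it is named explicitly. The sketch below uses two made-up rows in the assumed layout of `Walruses.csv` (the column names come from this notebook; the real values and timestamp format are assumptions) to show the column-list form.

```python
import io

import pandas as pd

# Made-up rows in the assumed layout of Walruses.csv (column names taken from the notebook)
csv_text = """Walrus,DateTimeUTC,Longitude,Latitude
271,2008-05-31 19:25:00,-168.40,65.60
271,2008-06-01 03:24:00,-168.50,65.70
"""

# Naming the column makes pandas convert it to a datetime dtype while reading
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["DateTimeUTC"], thousands=",")
print(df["DateTimeUTC"].dtype)  # a datetime64 dtype
```

Once the column is already a datetime, the later `pd.to_datetime(data['DateTimeUTC'])` call simply returns it unchanged.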

In [3]:
epoch = dt.datetime(1970,1,1)
time = [ t.total_seconds() for t in (pd.to_datetime(data['DateTimeUTC']) - pd.to_datetime(epoch))]
print(time[0:2])

[1212261900.0, 1212290640.0]
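The list comprehension above can also be written as a single vectorized pandas expression. This is a minimal sketch, assuming the `DateTimeUTC` values are naive timestamps already in UTC; the two inputs are the instants that the printed values `1212261900.0` and `1212290640.0` decode to.

```python
import pandas as pd

# Assumed: DateTimeUTC holds naive timestamps that are already in UTC
stamps = pd.Series(pd.to_datetime(["2008-05-31 19:25:00", "2008-06-01 03:24:00"]))
epoch = pd.Timestamp("1970-01-01")

# Subtracting the epoch gives timedeltas; .dt.total_seconds() converts them in one step
seconds = (stamps - epoch).dt.total_seconds().tolist()
print(seconds)  # [1212261900.0, 1212290640.0]
```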


<a id='inspect'></a>

## Quickly inspect the data for correctness

In [12]:
longLat = data[['Walrus', 'Longitude', 'Latitude']]
longLat[2:10:2]

|   | Walrus | Longitude | Latitude |
|--:|--:|--:|--:|
| 2 | 271 | -168.44436 | 65.587969 |
| 4 | 271 | -168.489215 | 65.742984 |
| 6 | 271 | -168.376483 | 66.175663 |
| 8 | 271 | -168.265869 | 66.557191 |
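Eyeballing a few rows catches obvious problems. A simple range check can additionally flag impossible or missing coordinates across the whole file. This is a sketch with a hypothetical helper `out_of_range`. It assumes longitudes use the -180 to 180 convention. The first two sample rows come from the table above, and the last two are made-up bad values.

```python
import pandas as pd

def out_of_range(df):
    """Return rows whose Longitude/Latitude are missing or outside valid degree ranges (hypothetical helper)."""
    # Series.between is False for NaN, so negating it also flags missing values
    bad_lon = ~df["Longitude"].between(-180, 180)
    bad_lat = ~df["Latitude"].between(-90, 90)
    return df[bad_lon | bad_lat]

sample = pd.DataFrame({
    "Walrus": [271, 271, 271, 271],
    "Longitude": [-168.44436, -168.489215, 191.0, -168.3],  # third value is made up and invalid
    "Latitude": [65.587969, 65.742984, 66.0, float("nan")],  # fourth value is made up and missing
})
print(out_of_range(sample))  # prints rows 2 and 3
```

Against the notebook's data this would be called as `out_of_range(data)`; an empty result means every coordinate is in range.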


<a id='netcdf'></a>

## Create a NetCDF file

In [4]:
ncf = nc.Dataset('Walruses.nc', mode='w')
print(ncf)

<type 'netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): 
    variables(dimensions): 
    groups: 



<a id='meta'></a>

## Create the dimensions and global attributes (metadata and conventions)

In [5]:
ncf.createDimension('time', None)
ncf.createDimension('longitude', len(data[['Longitude']]))
ncf.createDimension('latitude', len(data[['Latitude']]))
print(ncf.dimensions)

OrderedDict([('time', <type 'netCDF4.Dimension'> (unlimited): name = 'time', size = 0
), ('longitude', <type 'netCDF4.Dimension'>: name = 'longitude', size = 454
), ('latitude', <type 'netCDF4.Dimension'>: name = 'latitude', size = 454
)])


In [8]:
ncf.title        = ''
ncf.history      = ''
ncf.comment      = ''
ncf.source       = ''
ncf.references   = ''
ncf.Conventions  = "CF-1.6"
ncf.date_created = dt.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')  # ISO 8601 in UTC
ncf.institution  = 'USGS'

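The ACDD metadata conventions, commonly used alongside CF, recommend ISO 8601 for date attributes such as `date_created`. Here is a small sketch with a hypothetical helper `iso_utc_now` that produces such a string. It uses Python 3's timezone-aware `datetime.now(timezone.utc)` rather than `utcnow()`.

```python
import datetime as dt

def iso_utc_now():
    """Current UTC time as an ISO 8601 string, e.g. '2015-06-24T12:00:00Z' (hypothetical helper)."""
    return dt.datetime.now(dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

stamp = iso_utc_now()
print(stamp)
# In the notebook this would be used as: ncf.date_created = iso_utc_now()
```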


<a id="vars"></a>

## Create the variables

In [None]:
time_var = ncf.createVariable('time', 'f8', ('time',), zlib=True)  # separate name so the `time` list from In [3] is not overwritten
time_var.units = 'seconds since 1970-01-01 00:00:00 UTC'
time_var.standard_name = 'time'
time_var.long_name = 'Epoch Time'
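The `units` attribute is what lets a reader turn the stored numbers back into dates. The sketch below decodes the two values printed earlier using a hypothetical helper `decode_seconds_since`. It handles only the `seconds since ...` form used here. For general CF time units, `netCDF4.num2date` or the `cftime` package is the usual tool.

```python
import pandas as pd

def decode_seconds_since(values, units):
    """Turn 'seconds since <reference>' numbers back into timestamps (hypothetical helper)."""
    unit, sep, reference = units.partition(" since ")
    if unit.strip() != "seconds" or not sep:
        raise ValueError("this sketch only handles 'seconds since ...' units")
    # Drop a trailing ' UTC' so the reference parses as a naive (UTC) timestamp
    reference = reference.replace(" UTC", "").strip()
    return pd.Timestamp(reference) + pd.to_timedelta(values, unit="s")

decoded = decode_seconds_since([1212261900.0, 1212290640.0],
                               "seconds since 1970-01-01 00:00:00 UTC")
print(decoded)
```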

In [None]:
lon = ncf.createVariable('longitude', 'f8', ('longitude',), zlib=True, least_significant_digit=8)
lon.units = 'degrees_east'
lon.standard_name = 'longitude'
lon.long_name = 'Longitude'

In [6]:
lat = ncf.createVariable('latitude', 'f8', ('latitude',), zlib=True, least_significant_digit=8)
lat.units = 'degrees_north'
lat.standard_name = 'latitude'
lat.long_name = 'Latitude'
print(lat)

<type 'netCDF4.Variable'>
float64 latitude(latitude)
unlimited dimensions: 
current shape = (454,)
filling on, default _FillValue of 9.96920996839e+36 used
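`least_significant_digit=8` makes the stored coordinates lossy. The netCDF4-python documentation describes the quantization as `numpy.around(scale*data)/scale`, where `scale = 2**bits` and `bits` is chosen so that a precision of `10**-least_significant_digit` is retained. The hypothetical `quantize` helper below reimplements that description so its effect can be checked without writing a file. It is a sketch based on the docs, not the library's own code.

```python
import math

import numpy as np

def quantize(data, least_significant_digit):
    """Quantize as described in the netCDF4-python docs (a sketch, not the library code)."""
    # Smallest power of two whose resolution is finer than 10**-least_significant_digit
    bits = math.ceil(math.log2(10 ** least_significant_digit))
    scale = 2.0 ** bits
    return np.around(scale * np.asarray(data, dtype="f8")) / scale

lat_values = np.array([65.587969, 65.742984, 66.175663, 66.557191])  # from the table above
error = np.max(np.abs(quantize(lat_values, 8) - lat_values))
print(error)  # well below 1e-8
```

With 8 digits the scale is `2**27`, so the rounding error is at most about 3.7e-9 degrees, far below GPS precision. A smaller value such as 5 would compress better.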

<a id='data'></a>

## Write the data to the NetCDF file

In [9]:
ncf.variables['time'][:] = time
ncf.variables['longitude'][:] = data['Longitude'].values
ncf.variables['latitude'][:] = data['Latitude'].values
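If the CSV has missing coordinates, `.values` writes them to the file as NaN. One option is to pass a NumPy masked array instead, because netCDF4-python writes masked elements using the variable's fill value. Here is a sketch with a hypothetical helper `to_masked`.

```python
import numpy as np
import pandas as pd

def to_masked(series):
    """Convert a pandas Series to a float64 masked array with NaN/inf masked (hypothetical helper)."""
    return np.ma.masked_invalid(series.to_numpy(dtype="f8"))

lon = pd.Series([-168.44436, np.nan, -168.489215])
masked = to_masked(lon)
print(np.ma.getmaskarray(masked).tolist())  # [False, True, False]
# In the notebook this would read: ncf.variables['longitude'][:] = to_masked(data['Longitude'])
```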

<a id='close'></a>

## Close the NetCDF file

In [10]:
ncf.close()