# ISSUES WITH WRF-SMOKE NETCDF DATA

__working_filename__: `/workspace/Shared/Users/malindgren/wrf_smoke/raw/wrfout_d01_2017-06-15_00:00:00`

***things we know from the file itself and supplemental emails:***
- **proj4 string:**	`+proj=lcc +lat_1=65 +lat_2=65 +lat_0=65 +lon_0=-152 +R=6370000`
- **GRID METADATA**
    - GRID_ID:                         1
    - PARENT_ID:                       0
    - I_PARENT_START:                  1
    - J_PARENT_START:                  1
    - PARENT_GRID_RATIO:               1
    - CEN_LAT:                         65.0
    - CEN_LON:                         -152.0
    - TRUELAT1:                        65.0
    - TRUELAT2:                        65.0
    - MOAD_CEN_LAT:                    65.0
    - STAND_LON:                       -152.0
    - POLE_LAT:                        90.0
    - POLE_LON:                        0.0
    - GMT:                             0.0
    - JULYR:                           2017
    - JULDAY:                          166
    - MAP_PROJ:                        1
    - MAP_PROJ_CHAR:                   Lambert Conformal
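The proj4 string above can be rebuilt from the grid metadata, which makes it easier to check against other WRF files. This is a minimal sketch. It assumes the mapping `lat_1=TRUELAT1`, `lat_2=TRUELAT2`, `lat_0=MOAD_CEN_LAT`, `lon_0=STAND_LON` and WRF's spherical earth radius of 6,370 km. `wrf_lcc_proj4` is a hypothetical helper, and in the notebook `attrs` would be something like `ds.attrs` from the file opened below.

```python
def wrf_lcc_proj4(attrs):
    """Build a proj4 string for a WRF Lambert Conformal grid (MAP_PROJ == 1).

    Hypothetical helper: assumes lat_0 comes from MOAD_CEN_LAT and lon_0 from
    STAND_LON, on WRF's sphere of radius 6,370,000 m.
    """
    if int(attrs["MAP_PROJ"]) != 1:
        raise ValueError("only MAP_PROJ == 1 (Lambert Conformal) is handled")

    def fmt(value):
        # '{:g}' drops trailing zeros, so 65.0 -> '65' and -152.0 -> '-152'
        return "{:g}".format(float(value))

    return "+proj=lcc +lat_1={} +lat_2={} +lat_0={} +lon_0={} +R=6370000".format(
        fmt(attrs["TRUELAT1"]),
        fmt(attrs["TRUELAT2"]),
        fmt(attrs["MOAD_CEN_LAT"]),
        fmt(attrs["STAND_LON"]),
    )


# attribute values copied from the GRID METADATA list above
grid_attrs = {"MAP_PROJ": 1, "TRUELAT1": 65.0, "TRUELAT2": 65.0,
              "MOAD_CEN_LAT": 65.0, "STAND_LON": -152.0}
print(wrf_lcc_proj4(grid_attrs))
# +proj=lcc +lat_1=65 +lat_2=65 +lat_0=65 +lon_0=-152 +R=6370000
```

`{:g}` keeps six significant digits. That is enough for these round values, but it would need a wider format for parallels with many decimals.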


In [1]:
# let's read in the file so we can examine some _stuff_ about it
import xarray as xr
import pandas as pd
import numpy as np
import rasterio

fn = '/Volumes/Shared/Users/malindgren/wrf_smoke/raw/wrfout_d01_2017-06-15_00-00-00_ML.nc'
ds = xr.open_dataset( fn )



### We know (from Chris' email) that `XLONG` and `XLAT` are the spatial dimensions we should use.

*Note: I am assuming these are in the source CRS, which was determined to be the Lambert Conformal projection of WRF grid 01 (see the file metadata above).*
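One way to check that assumption is to look at the units and value ranges of the two variables. WRF normally writes `XLONG`/`XLAT` as geographic longitude/latitude in degrees (`units` of `degree_east`/`degree_north`). That would make them lon/lat rather than Lambert x/y in metres. `describe_coords` is a hypothetical helper. In the notebook it would be called as `describe_coords(ds['XLONG'][0], ds['XLAT'][0], ds['XLONG'].attrs, ds['XLAT'].attrs)`.

```python
import numpy as np


def describe_coords(lons, lats, lon_attrs=None, lat_attrs=None):
    """Summarise 2-D coordinate arrays and report whether they fit degree ranges.

    Fitting within +/-180 and +/-90 is a cheap sanity check, not proof:
    projected x/y in metres for a grid hundreds of km across would not fit.
    """
    lons = np.asarray(lons, dtype=float)
    lats = np.asarray(lats, dtype=float)
    return {
        "lon_units": (lon_attrs or {}).get("units"),
        "lat_units": (lat_attrs or {}).get("units"),
        "lon_range": (float(lons.min()), float(lons.max())),
        "lat_range": (float(lats.min()), float(lats.max())),
        "looks_like_degrees": bool(
            np.all(np.abs(lons) <= 180) and np.all(np.abs(lats) <= 90)
        ),
    }


# toy values rounded from the corner GCPs quoted further down
demo_lons = np.array([[-164.51, -139.49], [-172.42, -131.58]])
demo_lats = np.array([[57.65, 57.65], [70.60, 70.60]])
info = describe_coords(demo_lons, demo_lats,
                       {"units": "degree_east"}, {"units": "degree_north"})
print(info)
```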

In [2]:
# grab a single 2-D slice (the first time step) of each coordinate variable.
lons = ds[ 'XLONG' ][ 0 ]
lats = ds[ 'XLAT' ][ 0 ]

# resolution -- sort of hacky, but should get us close if it is a regular grid (spoiler alert: it's not)
# note: np.diff works along the last axis by default, i.e. west-east within each row
lons_diff = np.diff( lons )
print( 'LONGITUDES ARE NOT REGULARLY SPACED:' )
print( '  longitude diff min: {}\n  longitude diff max: {}\n'.format( lons_diff.min(), lons_diff.max() ) )
lats_diff = np.diff( lats )
print( 'LATITUDES ARE REGULARLY SPACED:' )
print( '  latitude diff min: {}\n  latitude diff max: {}'.format( lats_diff.min(), lats_diff.max() ) )

LONGITUDES ARE NOT REGULARLY SPACED:
  longitude diff min: 0.0818023681640625
  longitude diff max: 0.1420440673828125

LATITUDES ARE REGULARLY SPACED:
  latitude diff min: -0.01416015625
  latitude diff max: 0.01416015625
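`np.diff` works along the last axis by default. Assume `XLONG[0]`/`XLAT[0]` are laid out as `(south_north, west_east)`, as WRF output normally is. Then both diffs above are taken along each row, from west to east. The ±0.014 latitude values therefore describe how latitude changes *along* a row, which is what a curved Lambert grid would show, rather than the north-south spacing. The hypothetical helper below reports the spacing along both axes separately, shown here on a toy array.

```python
import numpy as np


def spacing_by_axis(coord2d):
    """Min/max of first differences along each axis of a 2-D coordinate array."""
    coord2d = np.asarray(coord2d, dtype=float)
    along_rows = np.diff(coord2d, axis=1)  # same as np.diff(coord2d): last axis
    along_cols = np.diff(coord2d, axis=0)  # first axis (south_north in WRF)
    return {
        "axis1_min": float(along_rows.min()),
        "axis1_max": float(along_rows.max()),
        "axis0_min": float(along_cols.min()),
        "axis0_max": float(along_cols.max()),
    }


# toy 3x3 "latitude" array: rows step 1 degree northward, each row bows up mid-row
toy_lats = np.array([
    [60.0, 60.5, 60.0],
    [61.0, 61.5, 61.0],
    [62.0, 62.5, 62.0],
])
print(spacing_by_axis(toy_lats))
# {'axis1_min': -0.5, 'axis1_max': 0.5, 'axis0_min': 1.0, 'axis0_max': 1.0}
```

In the toy case the last-axis diffs are symmetric (±0.5) even though the rows are perfectly regular along axis 0. This matches the pattern printed above for the real latitudes.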


### Using what we know from the file, we can make a very simplistic [affine transform](http://www.perrygeo.com/python-affine-transforms.html) for these data, which should get us somewhat close to where we should be in space.

But, as we have seen above, the data are _not_ regularly spaced. A common way of picking a single resolution for the raster is therefore to take the `np.mean` of the irregularly spaced longitude differences.
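To get a feel for how far a single mean resolution can drift, one can compare each longitude with the value that an affine transform built from `ulx` and `lon_res` would assign to its column. The sketch below assumes a north-up transform with no rotation, which is what `from_origin` produces, so every cell in a column shares one longitude. `affine_lon_error` is a hypothetical helper that would be called with `lons.data`, `ulx` and `lon_res` from the next cell. Here it runs on a toy grid.

```python
import numpy as np


def affine_lon_error(lons, ulx, lon_res):
    """Largest |actual - implied| longitude for a 2-D (row, col) array.

    Assumes a north-up affine with no rotation, as from_origin builds, so
    every cell in column j gets ulx + (j + 0.5) * lon_res at its centre.
    """
    lons = np.asarray(lons, dtype=float)
    cols = np.arange(lons.shape[1])
    implied = ulx + (cols + 0.5) * lon_res  # 1-D, broadcast across all rows
    return float(np.abs(lons - implied).max())


# toy grid: two rows whose longitude spacing differs (1 vs 2 degrees)
toy_lons = np.array([
    [-10.0, -9.0, -8.0],
    [-12.0, -10.0, -8.0],
])
res = np.mean(np.diff(toy_lons).ravel())        # mean spacing: 1.5
err = affine_lon_error(toy_lons, toy_lons.min(), res)
print(res, err)  # 1.5 1.25
```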

In [3]:
lon_res = np.mean( lons_diff.ravel() )
# latitude varies along the south-north axis (axis 0), so average the diffs along that axis
lat_res = np.mean( np.diff( lats.data, axis=0 ) )

# grab the upper-left corner of the lon/lat bounding box (the origin), which is where GDAL (and rasterio) start reading from.
ulx, uly = np.min( lons.data ), np.max( lats.data )
print( 'origin from file: ({},{})'.format(ulx, uly) )


origin from file: (-172.42266845703125,71.68465423583984)
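Because `ulx` and `uly` are taken from `np.min` and `np.max` over the whole array independently, they need not come from the same cell, or from a corner at all. A quick way to see where each one came from is `np.unravel_index` on `argmin`/`argmax`. `extreme_locations` is a hypothetical helper that would be called with `lons.data` and `lats.data`. Here it runs on a toy grid.

```python
import numpy as np


def extreme_locations(lons, lats):
    """(row, col) indices of the minimum longitude and the maximum latitude."""
    lons = np.asarray(lons)
    lats = np.asarray(lats)
    min_lon_rc = np.unravel_index(np.argmin(lons), lons.shape)
    max_lat_rc = np.unravel_index(np.argmax(lats), lats.shape)
    return tuple(int(i) for i in min_lon_rc), tuple(int(i) for i in max_lat_rc)


# toy 3x3 grid, row 0 = south; each row bows poleward in the middle, as rows
# of constant projected y do in a Lambert conformal grid
toy_lons = np.array([
    [-160.0, -152.0, -144.0],
    [-162.0, -152.0, -142.0],
    [-164.0, -152.0, -140.0],
])
toy_lats = np.array([
    [58.0, 58.5, 58.0],
    [64.0, 64.5, 64.0],
    [70.0, 70.9, 70.0],
])
print(extreme_locations(toy_lons, toy_lats))  # ((2, 0), (2, 1))
```

In the toy grid the minimum longitude sits in a top corner, but the maximum latitude sits in the middle of the top row. So the two values that make up the "origin" come from different cells.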


Now, let's compare the `ulx, uly` values from the above cell with the `GCPs` we also received from the data provider:

`-gcp 0 0 -164.510605 57.652809 -gcp 0 298 -172.422668 70.599640 -gcp 298 0 -139.489395 57.652809 -gcp 298 298 -131.577332 70.599640`

A quick look at cell `0 0` (which would be the origin of the file) in the above GCPs shows `-164.510605, 57.652809`.
This does not fit the GDAL convention, where pixel/line `0 0` is the upper-left (north-west) corner, whereas `57.652809` is one of the southernmost latitudes. So we cannot use the GCP row/col ids together with their x,y values, at least in their current form. Also, according to Chris' most recent email, these GCPs characterize the corner points of the grid to be generated, which do not seem to match those in the `XLONG`/`XLAT` variables in the NetCDF file.
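The GCP string can be parsed mechanically, and each line index can be converted to a GDAL-style row for a north-up raster. The sketch below makes two assumptions. First, the GCPs follow `gdal_translate`'s `-gcp pixel line easting northing` order. Second, line 0 is the southern edge of the array, as in WRF's `south_north` dimension. The printed `out_ds` further down shows its first `lon`/`lat` values as `-164.511`/`57.6528`, which match GCP `0 0`. `parse_gcps` and `to_north_up_row` are hypothetical helpers.

```python
def parse_gcps(text):
    """Parse '-gcp pixel line x y' groups into (pixel, line, x, y) tuples.

    Assumes exactly four numbers per -gcp (no optional elevation).
    """
    tokens = text.split()
    if len(tokens) % 5 != 0:
        raise ValueError("expected groups of '-gcp pixel line x y'")
    gcps = []
    for i in range(0, len(tokens), 5):
        flag, pixel, line, x, y = tokens[i:i + 5]
        if flag != "-gcp":
            raise ValueError("expected '-gcp', got {!r}".format(flag))
        gcps.append((float(pixel), float(line), float(x), float(y)))
    return gcps


def to_north_up_row(line, height):
    """Row index after np.flipud, for an array whose line 0 is the south edge."""
    return (height - 1) - line


gcp_text = ("-gcp 0 0 -164.510605 57.652809 -gcp 0 298 -172.422668 70.599640 "
            "-gcp 298 0 -139.489395 57.652809 -gcp 298 298 -131.577332 70.599640")
for pixel, line, lon, lat in parse_gcps(gcp_text):
    row = to_north_up_row(line, height=299)
    print("col {:3.0f}  north-up row {:3.0f}  lon {:.6f}  lat {:.6f}".format(pixel, row, lon, lat))
```

Under these assumptions, `-gcp 0 298` maps to north-up row 0, i.e. the upper-left corner. GDAL's own pixel/line coordinates are measured from the top-left *edge* of the raster, so the centre of the first cell is at 0.5, 0.5. If these GCPs refer to cell centres, a half-cell offset would also need to be applied.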

Based on the `XLONG`/`XLAT` values in the file, `-gcp 0 298 -172.422668 70.599640` appears to be the GCP closest to the upper-left corner we want as our affine-transform origin. From the file, however, our origin is `-172.42266845703125, 71.68465423583984`. The longitude matches, but the latitude is only _close_, not exact. This suggests that the values we are pulling from the file may be the cell corners, with the GCPs located somewhere within the cells they refer to.

Let's take the difference of the two latitude values:

`71.68465423583984 - 70.599640 = 1.08501423583984`

Judging by the coordinate spacing (see the diff min/max above), the grid resolution is well under 1 degree, so I am not sure what this GCP refers to. It is not a cell centroid, which would be offset by half the cell resolution in each direction. It is also not the upper-left corner, because then it would equal the maximum latitude in the NetCDF file, and it does not. Ultimately, something is incorrect in either the GCPs provided _or_ the information in the NetCDF file, and it is hard to tell which.
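One hypothesis is worth testing before concluding that either source is wrong: a WRF Lambert grid is a rectangle in projected metres, not in degrees. Each row of constant projected `y` bows toward the pole, so the highest latitude on the top row lies where it crosses `lon_0`, not at the two top corners. Under that hypothesis, `np.max(lats)` would exceed the corner GCP latitude even if both sources are correct.

The sketch below writes out the spherical Lambert Conformal Conic equations for a single standard parallel (as given in Snyder's *Map Projections: A Working Manual*) in NumPy rather than calling pyproj. It projects the four GCPs and checks whether they land on an axis-aligned rectangle. It also reports the latitude at the centre of the top edge, for comparison with the `uly` printed earlier. `lcc_forward` and `lcc_inverse_lat` are hypothetical helpers.

```python
import numpy as np

# parameters copied from the proj4 string at the top of this notebook
R = 6370000.0   # sphere radius (m), +R
LAT_0 = 65.0    # latitude of origin, +lat_0
LAT_1 = 65.0    # standard parallel; +lat_1 == +lat_2, so a single (tangent) parallel
LON_0 = -152.0  # central meridian, +lon_0

PHI1 = np.radians(LAT_1)
N = np.sin(PHI1)                                          # cone constant
F = np.cos(PHI1) * np.tan(np.pi / 4 + PHI1 / 2) ** N / N
RHO0 = R * F / np.tan(np.pi / 4 + np.radians(LAT_0) / 2) ** N


def lcc_forward(lon, lat):
    """Spherical LCC forward projection: degrees -> (x, y) in metres."""
    rho = R * F / np.tan(np.pi / 4 + np.radians(lat) / 2) ** N
    theta = N * np.radians(lon - LON_0)
    return rho * np.sin(theta), RHO0 - rho * np.cos(theta)


def lcc_inverse_lat(x, y):
    """Latitude in degrees of projected (x, y); valid here because N > 0."""
    rho = np.hypot(x, RHO0 - y)
    return np.degrees(2 * np.arctan((R * F / rho) ** (1 / N)) - np.pi / 2)


# corner GCPs as (pixel, line, lon, lat), copied from the -gcp string above
gcps = [
    (0, 0, -164.510605, 57.652809),
    (0, 298, -172.422668, 70.599640),
    (298, 0, -139.489395, 57.652809),
    (298, 298, -131.577332, 70.599640),
]
xy = {(p, l): lcc_forward(lon, lat) for p, l, lon, lat in gcps}
for (p, l), (x, y) in sorted(xy.items()):
    print("pixel {:3d} line {:3d}: x = {:11.1f} m, y = {:11.1f} m".format(p, l, x, y))

# small mismatches mean the corners line up as an axis-aligned rectangle
west_mismatch = abs(xy[(0, 0)][0] - xy[(0, 298)][0])
south_mismatch = abs(xy[(0, 0)][1] - xy[(298, 0)][1])
dx = (xy[(298, 0)][0] - xy[(0, 0)][0]) / 298  # spacing implied between corner points
dy = (xy[(0, 298)][1] - xy[(0, 0)][1]) / 298
top_y = xy[(0, 298)][1]
print("west-edge x mismatch {:.1f} m, south-edge y mismatch {:.1f} m".format(west_mismatch, south_mismatch))
print("implied spacing dx = {:.1f} m, dy = {:.1f} m".format(dx, dy))
print("latitude where the top edge crosses lon_0: {:.5f}".format(lcc_inverse_lat(0.0, top_y)))
```

If the mismatches come out near zero and the top-edge latitude lands near `uly`, the GCPs and `XLONG`/`XLAT` would be describing the same grid. The apparent 1.085-degree gap would then be a property of the projection rather than an error in either source.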




In [11]:
# to make some progress, let's use the coordinates from the data themselves
# make an affine transform from the lons/lats
affine = rasterio.transform.from_origin( ulx, uly, lon_res, lat_res )
# affine = rasterio.transform.from_origin( -172.422668, 70.59963999, lon_res, lat_res )

# read in the prepped file (generated earlier) where I have selected and re-stacked the data in a more sane way
out_ds = xr.open_dataset( '/Volumes/Shared/Users/malindgren/wrf_smoke/netcdf/wrfout_d01_PM2_5_DRY_2017-06-15_00-00-00.nc' )
print(out_ds)
variable = 'PM2_5_DRY'
crs_proj4 = '+proj=lcc +lat_1=65 +lat_2=65 +lat_0=65 +lon_0=-152 +R=6370000'
# convert the proj4 string to a rasterio CRS object
crs = rasterio.crs.CRS.from_string( crs_proj4 )

# build some output metadata
time, levels, height, width = out_ds[ variable ].shape
# 'res' is implied by the transform, so it is not passed to rasterio.open separately
meta = {'transform':affine, 'height':height, 'width':width, 'count':1, 'dtype':'float32', 'driver':'GTiff', 'compress':'lzw', 'crs':crs }
out_ds_level = out_ds[ variable ].isel(level=0)

# this will output the data in its 'raw' crs for examination with other data before we try a reprojection.
with rasterio.open( '/Volumes/Shared/Users/malindgren/wrf_smoke/smoke_raw.tif', mode='w', **meta ) as out:
    # write the last hour (time index 23), flipped so that row 0 is the northern edge
    out.write( np.flipud( out_ds_level[ 23, ... ].data ), 1 )

# ABOVE DATA IS INCORRECT...  SAME AS WHAT BOB WAS SEEING IN MapVenture... Not sure where to take this from here...

<xarray.Dataset>
Dimensions:    (level: 29, time: 24, x: 299, y: 299)
Coordinates:
    lon        (x, y) float32 -164.511 -164.429 -164.347 -164.265 -164.183 ...
  * time       (time) datetime64[ns] 2017-06-15 2017-06-15T01:00:00 ...
    lat        (x, y) float32 57.6528 57.6616 57.6702 57.6789 57.6875 57.696 ...
  * level      (level) int32 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 ...
Dimensions without coordinates: x, y
Data variables:
    PM2_5_DRY  (time, level, x, y) float32 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ...
Attributes:
    wrf-chem:           smoke model
    variable:           PM2_5_DRY
    postprocessed by::  Michael Lindgren -- SNAP
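In the cell above, `from_origin` is given degrees (`ulx`, `uly`, `lon_res`, `lat_res`), while the GeoTIFF is tagged with the Lambert `crs`, whose units are metres. GDAL would therefore read `-172.42...` as a position in metres, and that mismatch is one likely contributor to the wrong placement. A projected transform would need the origin and spacing in metres.

The sketch below assumes the projected x/y of the outermost cell *centres* and the spacing are already known, for example by projecting `XLONG`/`XLAT` or the corner GCPs with the proj4 string. `north_up_origin` and `grid_bounds` are hypothetical helpers, and the numbers are a toy grid, not values from this file.

```python
def north_up_origin(x_centre_min, y_centre_max, dx, dy):
    """Upper-left cell *edge* from the outermost cell centres.

    Returns (west, north, dx, dy), the argument order of
    rasterio.transform.from_origin; the half-cell shift converts from
    cell-centre coordinates to the edge GDAL uses as the origin.
    """
    return x_centre_min - dx / 2.0, y_centre_max + dy / 2.0, dx, dy


def grid_bounds(west, north, dx, dy, width, height):
    """(left, bottom, right, top) of a north-up grid."""
    return west, north - height * dy, west + width * dx, north


# hypothetical toy grid: 3 x 3 cells of 10 m, cell centres at -10, 0, +10 m
west, north, dx, dy = north_up_origin(-10.0, 10.0, 10.0, 10.0)
print(west, north, dx, dy)  # -15.0 15.0 10.0 10.0
print(grid_bounds(west, north, dx, dy, width=3, height=3))

# with rasterio, the projected transform would then be built as
#   transform = rasterio.transform.from_origin(west, north, dx, dy)
# and the WRF array (south-first rows) written with np.flipud, as above
```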
