# Drought Data Germany

I downloaded data from the drought monitor (Dürremonitor) for Germany, published by the Helmholtz Centre for Environmental Research (UFZ).
https://www.ufz.de/index.php?de=37937

The data is available in .nc (netCDF) format. To import it into Power BI, it first has to be converted into .csv format.

Source: UFZ-Dürremonitor/ Helmholtz-Zentrum für Umweltforschung

A good tutorial on how to visualize .nc data:
https://youtu.be/wXCYtPlT-kg

In [1]:
# prepare for plotting in python
!pip install netCDF4





In [2]:
pip install --upgrade matplotlib

Note: you may need to restart the kernel to use updated packages.




In [4]:
# import Python libraries
from netCDF4 import Dataset
import numpy as np
import matplotlib.pyplot as plt
import mpl_toolkits
#from mpl_toolkits.basemap import Basemap # Why doesn't this work? Basemap lives in mpl_toolkits.basemap and needs the separate basemap package

**Open the dataset in reading mode 'r'**

In [5]:
nc_file = '253177_Duerremagnitude_Gesamtboden_1952-2020_Apr-Okt.nc'
data = Dataset(nc_file, mode='r')  # 'r' (read-only) is also the default mode
data

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    affiliation: UFZ - Helmholtz Zentrum fuer Umweltforschung
    NCO: netCDF Operators version 4.9.3 (Homepage = http://nco.sf.net, Code = http://github.com/nco/nco)
    contact: klima@ufz.de
    dimensions(sizes): time(69), bnds(2), easting(175), northing(225)
    variables(dimensions): float64 time(time), float64 time_bnds(time, bnds), int32 easting(easting), int32 northing(northing), float32 droughtmagnitude(time, northing, easting)
    groups: 

**Convert netCDF4 data into a pandas dataframe**

In [90]:
# convert drought data into a pandas dataframe:

import xarray as xr

ds = xr.open_dataset('253177_Duerremagnitude_Gesamtboden_1952-2020_Apr-Okt.nc')
df = ds.to_dataframe()
df.head(100)

time,bnds,easting,northing,time_bnds,droughtmagnitude
1952-07-16 12:00:00,0,4040000,5222000,1952-04-01,
1952-07-16 12:00:00,0,4040000,5226000,1952-04-01,
1952-07-16 12:00:00,0,4040000,5230000,1952-04-01,
1952-07-16 12:00:00,0,4040000,5234000,1952-04-01,
1952-07-16 12:00:00,0,4040000,5238000,1952-04-01,
1952-07-16 12:00:00,0,4040000,...,...,...
1952-07-16 12:00:00,0,4040000,5602000,1952-04-01,
1952-07-16 12:00:00,0,4040000,5606000,1952-04-01,
1952-07-16 12:00:00,0,4040000,5610000,1952-04-01,
1952-07-16 12:00:00,0,4040000,5614000,1952-04-01,


In [91]:
# reset the index of the new dataframe
df.reset_index(inplace=True)

In [92]:
df

Unnamed: 0,time,bnds,easting,northing,time_bnds,droughtmagnitude
0,1952-07-16 12:00:00,0,4040000,5222000,1952-04-01,
1,1952-07-16 12:00:00,0,4040000,5226000,1952-04-01,
2,1952-07-16 12:00:00,0,4040000,5230000,1952-04-01,
3,1952-07-16 12:00:00,0,4040000,5234000,1952-04-01,
4,1952-07-16 12:00:00,0,4040000,5238000,1952-04-01,
...,...,...,...,...,...,...
5433745,2020-07-16 12:00:00,1,4736000,6102000,2020-10-31,
5433746,2020-07-16 12:00:00,1,4736000,6106000,2020-10-31,
5433747,2020-07-16 12:00:00,1,4736000,6110000,2020-10-31,
5433748,2020-07-16 12:00:00,1,4736000,6114000,2020-10-31,


In [93]:
# Add two new numerical columns to dataframe:
# https://sparkbyexamples.com/pandas/pandas-add-an-empty-column-to-dataframe/
df['latitude']= np.nan
df['longitude']= np.nan
df

Unnamed: 0,time,bnds,easting,northing,time_bnds,droughtmagnitude,latitude,longitude
0,1952-07-16 12:00:00,0,4040000,5222000,1952-04-01,,,
1,1952-07-16 12:00:00,0,4040000,5226000,1952-04-01,,,
2,1952-07-16 12:00:00,0,4040000,5230000,1952-04-01,,,
3,1952-07-16 12:00:00,0,4040000,5234000,1952-04-01,,,
4,1952-07-16 12:00:00,0,4040000,5238000,1952-04-01,,,
...,...,...,...,...,...,...,...,...
5433745,2020-07-16 12:00:00,1,4736000,6102000,2020-10-31,,,
5433746,2020-07-16 12:00:00,1,4736000,6106000,2020-10-31,,,
5433747,2020-07-16 12:00:00,1,4736000,6110000,2020-10-31,,,
5433748,2020-07-16 12:00:00,1,4736000,6114000,2020-10-31,,,


In [94]:
# check if new columns are in float format
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5433750 entries, 0 to 5433749
Data columns (total 8 columns):
 #   Column            Dtype         
---  ------            -----         
 0   time              datetime64[ns]
 1   bnds              int64         
 2   easting           int64         
 3   northing          int64         
 4   time_bnds         datetime64[ns]
 5   droughtmagnitude  float32       
 6   latitude          float64       
 7   longitude         float64       
dtypes: datetime64[ns](2), float32(1), float64(2), int64(3)
memory usage: 310.9 MB
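
The 5,433,750 rows equal 69 × 2 × 175 × 225: because `time_bnds` carries the extra `bnds` dimension, `to_dataframe()` repeats every grid cell once per bound. The sketch below shows one possible way to collapse that duplication and drop cells without a value. It assumes that only the drought magnitude per year and grid cell is needed later on, and it uses a tiny made-up frame (`toy`) as a stand-in for `df`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in with the same columns as df after reset_index()
toy = pd.DataFrame({
    "time": pd.to_datetime(["1952-07-16 12:00"] * 4),
    "bnds": [0, 0, 1, 1],
    "easting": [4040000, 4044000, 4040000, 4044000],
    "northing": [5222000, 5222000, 5222000, 5222000],
    "time_bnds": pd.to_datetime(
        ["1952-04-01", "1952-04-01", "1952-10-31", "1952-10-31"]
    ),
    "droughtmagnitude": np.array([np.nan, 0.5, np.nan, 0.5], dtype="float32"),
})

slim = (
    toy[toy["bnds"] == 0]                   # keep one copy of each grid cell
    .drop(columns=["bnds", "time_bnds"])    # assumption: bounds are not needed
    .dropna(subset=["droughtmagnitude"])    # drop cells without a value
    .reset_index(drop=True)
)
print(slim)
```

Applied to the real `df`, the same three steps would at least halve the row count before any coordinate transformation or CSV export.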


In [95]:
# Install pyproj to translate easting and northing into latitude and longitude
%pip install pyproj

In [96]:
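# Convert Gauss-Krüger zone 4 (EPSG:31468) to WGS84 lat/lon: https://gis.stackexchange.com/questions/78838/converting-projected-coordinates-to-lat-lon-using-python
# Note: Proj(init=...) and transform are deprecated in pyproj 2+ and emit warnings
from pyproj import Proj, transform

inProj = Proj(init='epsg:31468')
outProj = Proj(init='epsg:4326')
x1,y1 = 4040000,5222000 # easting, northing
# transform returns (longitude, latitude) for these CRS definitions
x2,y2 = transform(inProj,outProj,x1,y1)
print(x2,y2)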

5.952251378874303 46.976737330247346


  in_crs_string = _prepare_from_proj_string(in_crs_string)
  in_crs_string = _prepare_from_proj_string(in_crs_string)
  x2,y2 = transform(inProj,outProj,x1,y1)
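
The warning lines in this output come from the deprecated `Proj(init=...)` syntax and the deprecated `transform` function. pyproj documents the `Transformer` class as the replacement. A minimal sketch of the same single-point conversion, assuming a pyproj 2.x or newer installation; the last decimals may differ slightly from the output above, depending on which datum shift pyproj selects.

```python
from pyproj import Transformer

# always_xy=True fixes the axis order: (easting, northing) in,
# (longitude, latitude) out
transformer = Transformer.from_crs("EPSG:31468", "EPSG:4326", always_xy=True)

easting, northing = 4040000, 5222000
lon, lat = transformer.transform(easting, northing)
print(f"longitude={lon:.4f}, latitude={lat:.4f}")
```

Naming the results `lon` and `lat` instead of `x2` and `y2` makes it harder to mix up the two columns later.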


In [None]:
# Try the transformation on one row
inProj = Proj(init='epsg:31468')
outProj = Proj(init='epsg:4326')
x1,y1 = 4040000,5222000 # easting, northing
x2,y2 = transform(inProj,outProj,x1,y1) # x2 = longitude, y2 = latitude
print(x2)
print(y2)

In [97]:
# Transform all rows at once by passing the coordinate columns as arrays
inProj = Proj(init='epsg:31468')
outProj = Proj(init='epsg:4326')
x2,y2 = transform(inProj,outProj,df['easting'].values,df['northing'].values) # x2 = longitude, y2 = latitude
print(x2)
print(y2)

  in_crs_string = _prepare_from_proj_string(in_crs_string)
  in_crs_string = _prepare_from_proj_string(in_crs_string)
  x2,y2 = transform(inProj,outProj,df['easting'].values,df['northing'].values) # x2 = longitude, y2 = latitude


[ 5.95225138  5.94819867  5.94413818 ... 15.69289834 15.69620818
 15.69952539]
[46.97673733 47.01251732 47.04829681 ... 55.06264141 55.09849753
 55.13435331]
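
The grid only has 175 × 225 = 39,375 distinct cells, while the dataframe repeats them for every year and bound. One possible optimization, sketched here on a made-up frame (`toy`) and using pyproj's `Transformer` rather than the deprecated `transform` function, is to convert each distinct (easting, northing) pair once and merge the result back onto all rows.

```python
import pandas as pd
from pyproj import Transformer

# Made-up stand-in for df: the same two grid cells repeated for two years
toy = pd.DataFrame({
    "time": pd.to_datetime(["1952-07-16 12:00"] * 2 + ["1953-07-16 12:00"] * 2),
    "easting": [4040000, 4044000] * 2,
    "northing": [5222000, 5226000] * 2,
})

# Transform each distinct grid cell only once
cells = toy[["easting", "northing"]].drop_duplicates().reset_index(drop=True)
transformer = Transformer.from_crs("EPSG:31468", "EPSG:4326", always_xy=True)
lon, lat = transformer.transform(
    cells["easting"].to_numpy(), cells["northing"].to_numpy()
)
cells["longitude"] = lon
cells["latitude"] = lat

# Attach the coordinates back to every row of the original frame
toy = toy.merge(cells, on=["easting", "northing"], how="left")
print(toy)
```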


In [98]:
# Store the transformed coordinates for all rows (transform returned x2 = longitude, y2 = latitude)
df['longitude']=x2
df['latitude']=y2



In [99]:
df

Unnamed: 0,time,bnds,easting,northing,time_bnds,droughtmagnitude,latitude,longitude
0,1952-07-16 12:00:00,0,4040000,5222000,1952-04-01,,46.976737,5.952251
1,1952-07-16 12:00:00,0,4040000,5226000,1952-04-01,,47.012517,5.948199
2,1952-07-16 12:00:00,0,4040000,5230000,1952-04-01,,47.048297,5.944138
3,1952-07-16 12:00:00,0,4040000,5234000,1952-04-01,,47.084076,5.940070
4,1952-07-16 12:00:00,0,4040000,5238000,1952-04-01,,47.119854,5.935994
...,...,...,...,...,...,...,...,...
5433745,2020-07-16 12:00:00,1,4736000,6102000,2020-10-31,,54.990928,15.686301
5433746,2020-07-16 12:00:00,1,4736000,6106000,2020-10-31,,55.026785,15.689596
5433747,2020-07-16 12:00:00,1,4736000,6110000,2020-10-31,,55.062641,15.692898
5433748,2020-07-16 12:00:00,1,4736000,6114000,2020-10-31,,55.098498,15.696208


In [None]:
# Drop easting and northing before exporting to CSV
df=df.drop(columns=['easting', 'northing'])

In [None]:
df.head()

In [None]:
# export the dataframe to a CSV file
df.to_csv('253177_Duerremagnitude_Gesamtboden_1952-2020_Apr-Okt.csv', index=False)
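
Before loading a file of this size into Power BI, it can help to read back a few rows and confirm that the column names, the date parsing, and the empty cells survived the export. A minimal sketch on made-up data; the file name `toy_drought.csv` is hypothetical and the temporary directory only keeps the example self-contained.

```python
import os
import tempfile

import pandas as pd

# Made-up stand-in for the exported dataframe
toy = pd.DataFrame({
    "time": pd.to_datetime(["1952-07-16 12:00", "1953-07-16 12:00"]),
    "droughtmagnitude": [0.5, None],
    "latitude": [46.976737, 47.012517],
    "longitude": [5.952251, 5.948199],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_drought.csv")  # hypothetical file name
    toy.to_csv(path, index=False)
    # Read back only the first rows; parse the time column as datetime
    check = pd.read_csv(path, parse_dates=["time"], nrows=5)

print(check.dtypes)
print(check)
```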

In [None]:
# create a download link for the CSV file
from IPython.display import FileLink
FileLink('253177_Duerremagnitude_Gesamtboden_1952-2020_Apr-Okt.csv')