# Final project 

## 1. Introduction


This study investigates the correlation between solar radiation and cloud coverage, focusing on Global Horizontal Irradiance (GHI) and using California as a case study. To understand this relationship, we compare clear-sky GHI with actual GHI values. The data come from the National Solar Radiation Database (NSRDB), which provides extensive meteorological and solar irradiance datasets covering a range of locations, time periods, and coordinates.

Cloud coverage metrics are also obtained from the NSRDB. This analysis uses cloud optical depth and cloud effective radius (the average size of cloud droplets), which influence cloud reflectivity and absorption characteristics. Wind speed and dew point are also examined as factors that may affect cloud formation and behavior.

The relationship between solar radiation and cloud coverage is a critical area of study in renewable energy, particularly for solar energy generation. By visualizing these variables and comparing them, this study aims to support research on energy distribution and to offer constructive advice on energy facility construction and weather-related solutions.


## 2. Load the dataset
First, to access this dataset, install the h5pyd package following the instructions on the official NSRDB website.


In [1]:
import sys
print(sys.executable)
# h5pyd was already installed from the terminal

/srv/conda/envs/notebook/bin/python


In [2]:
!pip install h5pyd 



In [3]:
%matplotlib inline
import h5pyd
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from scipy.spatial import cKDTree
import cartopy.crs as ccrs

The variables in the file are listed below. The file is opened through the h5pyd library, so the key datasets are then converted into pandas DataFrames for further analysis.

In [4]:
f = h5pyd.File("/nrel/nsrdb/v3/nsrdb_2020.h5", 'r')
list(f) ##list all the variables


['air_temperature',
 'alpha',
 'aod',
 'asymmetry',
 'cld_opd_dcomp',
 'cld_reff_dcomp',
 'clearsky_dhi',
 'clearsky_dni',
 'clearsky_ghi',
 'cloud_press_acha',
 'cloud_type',
 'coordinates',
 'dew_point',
 'dhi',
 'dni',
 'fill_flag',
 'ghi',
 'meta',
 'ozone',
 'relative_humidity',
 'solar_zenith_angle',
 'ssa',
 'surface_albedo',
 'surface_pressure',
 'time_index',
 'total_precipitable_water',
 'wind_direction',
 'wind_speed']

### Location data

In [5]:
dataset1 = pd.DataFrame(f['meta'][...])
print('location data: ') 
print(dataset1.head())
print('-'*50)

location data: 
   latitude   longitude  elevation  timezone  country    state   county  \
0    -19.99 -175.259995        0.0        13  b'None'  b'None'  b'None'   
1    -19.99 -175.220001        0.0        13  b'None'  b'None'  b'None'   
2    -19.99 -175.179993        0.0        13  b'None'  b'None'  b'None'   
3    -19.99 -175.139999        0.0        13  b'None'  b'None'  b'None'   
4    -19.99 -175.100006        0.0        13  b'None'  b'None'  b'None'   

     urban  population  landcover  
0  b'None'       -9999        210  
1  b'None'       -9999        210  
2  b'None'       -9999        210  
3  b'None'       -9999        210  
4  b'None'       -9999        210  
--------------------------------------------------


In [17]:
meta = pd.DataFrame(f['meta'][...])
Cal = meta.loc[meta['state'] == b'California']
Cal.head()

| | latitude | longitude | elevation | timezone | country | state | county | urban | population | landcover |
|---|---|---|---|---|---|---|---|---|---|---|
| 70276 | 32.529999 | -117.099998 | 55.0625 | -8 | b'United States' | b'California' | b'San Diego' | b'None' | 32326 | 130 |
| 70588 | 32.57 | -117.099998 | 7.1 | -8 | b'United States' | b'California' | b'San Diego' | b'Tijuana' | 27971 | 190 |
| 70589 | 32.57 | -117.059998 | 24.92 | -8 | b'United States' | b'California' | b'San Diego' | b'Tijuana' | 51608 | 190 |
| 70590 | 32.57 | -117.019997 | 96.599998 | -8 | b'United States' | b'California' | b'San Diego' | b'Tijuana' | 15236 | 110 |
| 70591 | 32.57 | -116.980003 | 140.600006 | -8 | b'United States' | b'California' | b'San Diego' | b'Tijuana' | 2949 | 130 |


This study selects the dataset from California.
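
The text columns in the location table are stored as byte strings (for example `b'California'`), which is why the filter compares against a bytes literal. As a minimal sketch, assuming a small hand-made table in place of the real `meta` dataset (`meta_demo` and `cal_demo` are hypothetical names), the columns could be decoded once so that later filters use ordinary strings:

```python
import pandas as pd

# Hand-made stand-in for the NSRDB meta table; the real one comes from f['meta'].
meta_demo = pd.DataFrame({
    "latitude": [32.53, 32.57, -19.99],
    "longitude": [-117.10, -117.06, -175.26],
    "state": [b"California", b"California", b"None"],
    "county": [b"San Diego", b"San Diego", b"None"],
})

# Decode every column that holds only bytes objects into regular strings.
for col in meta_demo.columns:
    if meta_demo[col].map(lambda v: isinstance(v, bytes)).all():
        meta_demo[col] = meta_demo[col].str.decode("utf-8")

# The state filter can now use a plain string instead of a bytes literal.
cal_demo = meta_demo.loc[meta_demo["state"] == "California"]
print(cal_demo)
```

The original row labels are kept after filtering, so they can still serve as positions into the site dimension of the other datasets.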

### Time slicing
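Because analyzing annual trends and GHI variations over a long time span is more complicated, this study selects a shorter time window: the cells below build a mask for a single month and pick one timestep, 2020-06-21 00:00 UTC.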

In [6]:
# Extract the time index
time_index = pd.to_datetime(f['time_index'][...].astype(str))
print(time_index)

DatetimeIndex(['2020-01-01 00:00:00+00:00', '2020-01-01 00:30:00+00:00',
               '2020-01-01 01:00:00+00:00', '2020-01-01 01:30:00+00:00',
               '2020-01-01 02:00:00+00:00', '2020-01-01 02:30:00+00:00',
               '2020-01-01 03:00:00+00:00', '2020-01-01 03:30:00+00:00',
               '2020-01-01 04:00:00+00:00', '2020-01-01 04:30:00+00:00',
               ...
               '2020-12-31 19:00:00+00:00', '2020-12-31 19:30:00+00:00',
               '2020-12-31 20:00:00+00:00', '2020-12-31 20:30:00+00:00',
               '2020-12-31 21:00:00+00:00', '2020-12-31 21:30:00+00:00',
               '2020-12-31 22:00:00+00:00', '2020-12-31 22:30:00+00:00',
               '2020-12-31 23:00:00+00:00', '2020-12-31 23:30:00+00:00'],
              dtype='datetime64[ns, UTC]', length=17568, freq=None)


In [7]:
june = time_index.month == 6
np.where(june)[0]

array([7296, 7297, 7298, ..., 8733, 8734, 8735])

In [11]:
timestep = np.where(time_index == '2020-06-21 00:00:00')[0][0]
print('Choose a specific timestep - ', timestep)

Choose a specific timestep -  8256
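
The time index is in UTC, so it can help to check what local time a chosen timestep corresponds to. The sketch below rebuilds a synthetic 30-minute index for 2020 (an assumption standing in for `f['time_index']`) and shifts the chosen timestamp by the `timezone` offset from the location table (-8 for the California rows shown above). The names `june_idx`, `tz_offset_hours`, and `local_standard` are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the NSRDB 2020 time index: 30-minute steps in UTC.
time_index = pd.date_range("2020-01-01 00:00", "2020-12-31 23:30",
                           freq="30min", tz="UTC")

# Integer positions of all June timestamps.
june_idx = np.flatnonzero(time_index.month == 6)

# Position of one timestamp; get_loc raises KeyError if it is not present.
timestep = time_index.get_loc(pd.Timestamp("2020-06-21 00:00", tz="UTC"))

# Shift to local standard time using the meta table's timezone offset in hours.
tz_offset_hours = -8
local_standard = (time_index[timestep].tz_localize(None)
                  + pd.Timedelta(hours=tz_offset_hours))

print(len(time_index), june_idx[0], june_idx[-1], timestep, local_standard)
```

Under these assumptions, the selected timestep falls in the afternoon of 20 June in local standard time rather than at local midnight, which matters when interpreting GHI values.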


### Global Horizontal Irradiance (GHI)

For clear skies, the Global Horizontal Irradiance (GHI) is computed using the REST2 model, a high-performance model for predicting cloudless-sky broadband irradiance. Because the database is very large, this study samples every 10th location at the selected timestep.
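
The sampling idea can be sketched with plain NumPy arrays: the same stride is applied to the coordinates and to one row of irradiance values so that they stay aligned, and the stored integers are divided by a scale factor. Everything here is a hypothetical stand-in (`coords_demo`, `ghi_scaled`, and a `scale_factor` of 1.0); the real values come from the NSRDB file and its dataset attributes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for f['coordinates'] and one time row of f['ghi'].
n_sites = 95
coords_demo = np.column_stack([
    np.linspace(32.5, 42.0, n_sites),      # latitude
    np.linspace(-124.0, -114.0, n_sites),  # longitude
])
ghi_scaled = np.arange(n_sites, dtype=np.int16) * 10  # stored as scaled integers
scale_factor = 1.0  # assumed; the notebook reads this from the dataset attributes

# Take every 10th site from both arrays so rows stay aligned.
step = 10
sample = pd.DataFrame({
    "latitude": coords_demo[::step, 0],
    "longitude": coords_demo[::step, 1],
    "ghi": ghi_scaled[::step] / scale_factor,
})
print(sample.shape)
```

A stride keeps the spatial spread of the grid while cutting the number of points read by roughly a factor of ten.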


In [12]:
print(f['coordinates'].attrs)
coords = f['coordinates'][...]

<Attributes of HDF5 object at 136877729756608>


In [None]:
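dset = f['ghi']
data = dset[timestep, ::10]  # every 10th location at the chosen timestep
df = pd.DataFrame()  # combine the data with the coordinates
df['longitude'] = coords[::10, 1]
df['latitude'] = coords[::10, 0]
df['ghi'] = data / dset.attrs['psm_scale_factor']  # unscale the stored values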

In [None]:
df.shape

In [None]:
df.plot.scatter(x='longitude', y='latitude', c='ghi',
                colormap='YlOrRd',
                title=str(time_index[timestep]))
plt.show()

In [None]:
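data = dset[timestep][Cal.index]  # GHI at the chosen timestep for the California sites only
df = Cal[['longitude', 'latitude']].copy()
df['ghi'] = data / dset.attrs['psm_scale_factor']  # unscale the stored values
df.shape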

### GHI statistics
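
One possible way to approach these statistics, sketched on invented numbers rather than NSRDB values, is to summarize GHI and compare it with clear-sky GHI through their ratio. The column names mirror the NSRDB variables `ghi` and `clearsky_ghi`; the `clearsky_ratio` column and the `demo` table are assumptions made for illustration.

```python
import pandas as pd

# Invented irradiance values in W/m^2; the first row mimics a night-time sample.
demo = pd.DataFrame({
    "ghi": [0.0, 250.0, 600.0, 900.0],
    "clearsky_ghi": [0.0, 500.0, 800.0, 900.0],
})

# Mask rows where clear-sky GHI is zero so the ratio is NaN there, not a division by zero.
daylight = demo["clearsky_ghi"] > 0
demo["clearsky_ratio"] = demo["ghi"] / demo["clearsky_ghi"].where(daylight)

# Summary statistics of GHI and the mean ratio over daylight rows (NaN is skipped).
summary = demo["ghi"].describe()
mean_ratio = demo["clearsky_ratio"].mean()
print(summary)
print("mean clear-sky ratio:", mean_ratio)
```

A ratio near 1 indicates conditions close to clear sky, while lower values suggest attenuation, for example by clouds.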

## 3. Cloud variables

Cloud formation is a complex process influenced by various atmospheric conditions. Following xxx, this study selects the cloud variables listed below; wind speed and dew point will also be examined in further analysis.

- cld_opd_dcomp: Cloud optical depth, which quantifies the extent to which clouds prevent solar radiation from reaching the surface.
- cld_reff_dcomp: Cloud effective radius, indicating the average size of cloud droplets, which influences cloud reflectivity and absorption characteristics.
- wind_speed: Wind speed, another factor that may affect cloud formation and behavior.
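
To illustrate how the relationship between a cloud variable and GHI might be quantified, the sketch below computes a Pearson correlation between cloud optical depth and the ratio of GHI to clear-sky GHI. The values are synthetic, and the exponential attenuation used to generate them is a toy assumption for the demo, not a result from the NSRDB data.

```python
import numpy as np
import pandas as pd

# Synthetic cloud optical depths (dimensionless); not NSRDB data.
cld_opd = np.array([0.0, 1.0, 2.0, 4.0, 8.0, 16.0])

# Assumed toy relationship for the demo: thicker cloud, lower GHI / clear-sky GHI.
clearsky_ratio = np.exp(-0.2 * cld_opd)

cloud_demo = pd.DataFrame({
    "cld_opd_dcomp": cld_opd,
    "clearsky_ratio": clearsky_ratio,
})

# Pearson correlation (the pandas default) between the two columns.
r = cloud_demo["cld_opd_dcomp"].corr(cloud_demo["clearsky_ratio"])
print(f"Pearson r = {r:.3f}")
```

Pearson correlation only captures linear association, so a rank-based measure may suit a nonlinear relationship like this one better.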


### Cloud optical depth (cld_opd_dcomp)

In [None]:
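clouddata1 = f['cld_opd_dcomp']
# Read a single timestep for the California sites; the full array is too large to load at once.
# Values are kept as stored (not unscaled).
dataset_cloud1 = pd.DataFrame({'cld_opd_dcomp': clouddata1[timestep][Cal.index]}, index=Cal.index)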


### Cloud effective radius (cld_reff_dcomp)

In [None]:
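clouddata2 = f['cld_reff_dcomp']
# Read a single timestep for the California sites; values are kept as stored (not unscaled).
dataset_cloud2 = pd.DataFrame({'cld_reff_dcomp': clouddata2[timestep][Cal.index]}, index=Cal.index)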
