# Weather Data Analysis of areas in the state of Alabama 

## Introduction

This report analyzes weather trends in a region of the state of Alabama, centered around the city of Montgomery.

The data is taken from [NOAA](https://www.ncdc.noaa.gov/). All the weather data and related information files are available at the [NOAA FTP site](ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/).

The file index is <b>BSSSBSBB</b>

Six measurements are considered for analysis: 
* **TMIN, TMAX:** the daily minimum and maximum temperature.
* **TOBS:** the temperature at the time of observation, treated in this report as a representative temperature for each day.
* **PRCP:** daily precipitation (rainfall).
* **SNOW:** daily snowfall.
* **SNWD:** the depth of accumulated snow, recorded per day.



<i> Notebook 4.4 Weather Analysis - Visualization </i>

## Sanity Check

### Comparing with a different data source 

![](images/alabama-usclimatedata.png)
<br>
The above plot is obtained from US Climate Data for the state of Alabama. It shows maximum temperature in Fahrenheit (corresponding to TMAX), minimum temperature in Fahrenheit (corresponding to TMIN) and precipitation in inches (corresponding to PRCP).

After converting from Fahrenheit to Celsius and from inches/year to mm/day, we observe that the measurements from our data source agree with those from US Climate Data (a different data source). 
![mean std TMIN_TMAX.png](images/mean std TMIN_TMAX.png)
The plots for TMIN and TMAX are in complete agreement with the US Climate Data. 
![](images/only prcp mean.png)
The precipitation reported by US Climate Data is 53.05 inches/year, which translates to about 3.69 mm/day. The average precipitation in our data is about 3.4 mm/day, so the two sources roughly agree. Our data also agrees with the observation that precipitation is almost constant throughout the year and is highest during the month of March. 
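The unit conversions behind this comparison can be sketched as follows. The helper names are illustrative and not taken from the notebook, and a 365-day year is assumed.

```python
# Unit conversions used for the sanity check (illustrative helper names).
MM_PER_INCH = 25.4
DAYS_PER_YEAR = 365  # assumption: leap days are ignored


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def inches_per_year_to_mm_per_day(inches_per_year):
    """Convert an annual precipitation total in inches to a daily mean in mm."""
    return inches_per_year * MM_PER_INCH / DAYS_PER_YEAR


# The annual figure quoted above, expressed as a daily mean in mm.
print(round(inches_per_year_to_mm_per_day(53.05), 2))
```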

### Analyzing distribution of missing observations 

The following plots show how the missing observations in the dataset are distributed over the year, so that we can check whether they are uniformly distributed. 

![valid_TMIN_TMAX.png](images/valid_TMIN_TMAX.png)

The valid-counts plot for TMIN shows that the number of missing observations ranges from 1649 to 1705. This spread is small, so the missing observations are distributed more or less uniformly throughout the year. Similarly, the valid-counts plot for TMAX shows the number of missing observations ranging from 1635 to 1700, again a small spread and thus a roughly uniform distribution of missing observations. 

![valid_TOBS_PRCP.png](images/valid_TOBS_PRCP.png)

The valid-counts plot for TOBS shows that the number of missing observations ranges from 1252 to 1295. This spread is small, so the missing observations are distributed more or less uniformly throughout the year. The valid-counts plot for PRCP shows the number of missing observations ranging from 3200 to 3310, a larger spread, so the missing observations are not distributed very uniformly.

![valid_SNOW_SNWD.png](images/valid_SNOW_SNWD.png)

The valid-counts plot for SNOW shows that the number of missing observations ranges from 2412 to 2485, a moderate spread, so the missing observations are less uniformly distributed throughout the year. Similarly, the valid-counts plot for SNWD shows the number of missing observations ranging from 2185 to 2262, again a moderate spread and thus a less uniform distribution of missing observations.

<b> Conclusions </b> 
<br>
We can infer that, of the 6 measurements, <b> PRCP shows the maximum variance </b> in the distribution of missing observations throughout the year. This could be attributed to human factors, such as the inability to record measurements during heavy precipitation. 
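As a rough sketch of how such per-day counts can be computed, assume the data for one measurement is a NumPy array with one row per station-year and one column per day, with NaN marking a missing observation. This layout and the function name are assumptions, not the notebook's code.

```python
import numpy as np


def valid_counts_per_day(data):
    """Count non-missing (non-NaN) observations for each day of the year.

    `data` is assumed to have one row per station-year and one column per day.
    """
    return np.sum(~np.isnan(data), axis=0)


# Toy example: 3 station-years, 4 days, NaN marks a missing observation.
toy = np.array([
    [1.0, np.nan, 3.0, 4.0],
    [np.nan, np.nan, 2.0, 1.0],
    [5.0, 6.0, np.nan, 2.0],
])
counts = valid_counts_per_day(toy)

# A small spread between the largest and smallest count suggests that
# missing observations are spread evenly over the year.
spread = counts.max() - counts.min()
```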

### Analyzing the noise in data

For each of the 6 measurements, we plot the mean, mean+standard deviation and mean-standard deviation, in order to analyze the noise in the data. 
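A minimal sketch of how the three curves can be obtained, assuming the same array layout as in the notebooks is not required and that missing values are stored as NaN (the function name and layout are assumptions):

```python
import numpy as np


def mean_std_band(data):
    """Return (mean - std, mean, mean + std) for each day, ignoring NaNs.

    `data` is assumed to have one row per station-year and one column per day.
    """
    mean = np.nanmean(data, axis=0)
    std = np.nanstd(data, axis=0)
    return mean - std, mean, mean + std


# Toy example with two station-years and two days.
toy = np.array([
    [1.0, np.nan],
    [3.0, 4.0],
])
lower, mean, upper = mean_std_band(toy)
```

A wide band between the lower and upper curves relative to the mean indicates a noisy measurement.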

![mean std TMIN_TMAX.png](images/mean std TMIN_TMAX.png)

<p>
The mean±std plot for TMIN shows that TMIN drops as low as 0 to -5 °C from December to February and reaches as high as 22 °C during July and August. The mean+std and mean-std curves follow the mean very closely, showing very little variation. Thus, TMIN follows a clear seasonal trend in the given data. 
</p>
<p>
The mean±std plot for TMAX ranges from about 9 °C from December to February to about 36 °C from mid-June to August. The mean+std and mean-std curves closely resemble the mean, so we infer that TMAX also follows a clear trend in the given data, with very little variation. 
</p>

![mean std TOBS_PRCP.png](images/mean std TOBS_PRCP_2.png)

<p>
The mean±std plot for TOBS shows that TOBS drops to about 0 °C from December to February and reaches as high as 30 °C from June to August. The mean+std and mean-std curves follow the mean very closely, showing almost no variation. Thus, TOBS follows a very clear trend in the given data.
</p>
<p>
The mean plot for PRCP is almost constant throughout the year, indicating that it rains throughout the year. However, the mean+std curve differs greatly from the mean, indicating high noise (variation) in the data. 
</p>

![mean std SNOW_SNWD.png](images/mean std SNOW_SNWD.png)

<p>
The mean±std plot for SNOW shows a high amount of variation in the data: the mean+std and mean-std curves do not resemble the mean. Snowfall reaches as much as 11 mm per day in January and fluctuates slightly from mid-November to March. There is no snow from May to mid-November, with the exception of one day in mid-June.
</p>
<p>
The mean±std plot for SNWD shows some variation in the data: the mean+std and mean-std curves do not closely resemble the mean. The snow depth is highest, at 30 mm, during December. There is zero snow depth from April to mid-November.
</p>

<b> Conclusion </b>
<ul>
<li>
It is interesting to note that TOBS is much less noisy than TMIN and TMAX. This is because TOBS is the observed temperature for a day and it averages over the different variations in temperature during the day. 
</li>
<li>
If we compare SNOW and SNWD, the data for SNOW is noisier than the data for SNWD. This is because SNOW measures the snowfall on a single day, whereas SNWD measures snow depth, which is the accumulation of snow over a period of time and is therefore more stable than SNOW.
</li>
</ul>

![correlation-heatmap.png](images/correlation_heatmap.png)

<ul>
<li>
The above figure is a <b>heatmap of the correlation matrix</b>. The correlation matrix is computed between the mean values of the 6 measurements (for each of the 365 days), because the mean is a reasonable representation of such behavioral trends in the data. The heatmap shows that SNOW and SNWD are highly correlated. This is reasonable, since a high amount of snowfall would usually lead to a greater snow depth for that day. It also shows that PRCP is highly correlated with SNOW and SNWD. 
</li>
</ul>
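The computation behind such a heatmap can be sketched as below, assuming each measurement has already been reduced to a vector of per-day means. The function name and the toy inputs are hypothetical; the real inputs would be six vectors of length 365.

```python
import numpy as np


def correlation_of_means(daily_means):
    """Correlate the per-day mean curves of several measurements.

    `daily_means` maps a measurement name to its mean for each day.
    Returns the measurement names and their correlation matrix.
    """
    names = list(daily_means)
    stacked = np.vstack([daily_means[name] for name in names])
    return names, np.corrcoef(stacked)


# Toy stand-ins for the real per-day mean curves.
toy_means = {
    "A": np.array([1.0, 2.0, 3.0, 4.0]),
    "B": np.array([2.0, 4.0, 6.0, 8.0]),  # rises with A
    "C": np.array([4.0, 3.0, 2.0, 1.0]),  # falls as A rises
}
names, corr = correlation_of_means(toy_means)
```

Entry `corr[i, j]` is the Pearson correlation between the mean curves of `names[i]` and `names[j]`; it says nothing about day-to-day noise within a single year.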

## PCA Analysis

![TMIN, TOBS, TMAX](images/pca1.png)

![SNOW, SNWD, PRCP](images/pca2.png)

These figures represent the percentage of variance explained by top 5 Eigenvectors in the given dataset for each of the 6 measurements, TMIN, TMAX, TOBS, SNOW, SNWD, PRCP. 

For **TMIN, the top 5 EigenVectors capture 21.5% variance while for TMAX, the top 5 EigenVectors capture 19.4% variance**. 

We can observe that for **PRCP, the top 5 Eigenvectors capture as low as 7% of the variance**. We also observe that the percentage of variance explained by the eigenvectors goes on increasing almost linearly. 

As for **TOBS, about 44% is captured by top 5 eigenvectors**. It is also observed that the first Eigenvector itself explains 34% of the total variance. 

Looking at SNOW and SNWD, we observe that if we consider only the top 5 eigenvectors, **SNWD is the best explained measurement, with 95% of the variance explained**. It is also interesting to note that the first eigenvector for SNWD alone explains 78% of the total variance in the dataset. **SNOW is the second most explained measurement, with the top 5 eigenvectors explaining 58% of the variance**. 

The percentage of variance explained by the top 5 eigenvectors gives a fair idea of which measurement is worth exploring further using Principal Component Analysis (PCA). For this dataset, that measurement is **SNWD**.
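The quantity plotted above can be sketched as follows. This is a simplified version that assumes a complete data matrix with no missing values (the notebooks have to handle NaNs when estimating the covariance); the function name is illustrative.

```python
import numpy as np


def cumulative_variance_explained(data, k=5):
    """Cumulative fraction of total variance captured by the top-k eigenvectors.

    `data` has one row per sample (station-year) and one column per day.
    Assumes no missing values.
    """
    centered = data - data.mean(axis=0)
    cov = centered.T @ centered / data.shape[0]
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, largest first
    return np.cumsum(eigvals[:k]) / eigvals.sum()


# Toy data that varies along a single direction, so one eigenvector
# should capture essentially all of the variance.
coeffs = np.array([1.0, 2.0, 3.0, 4.0])
direction = np.array([1.0, 0.0, 2.0])
toy = np.outer(coeffs, direction)
explained = cumulative_variance_explained(toy, k=2)
```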

<i> Notebook 4.5 Weather Analysis - reconstruction SNWD </i>

## Exploring SNWD (Snow Depth) more by PCA
###  Top 5 Eigenvectors 

![snow-depth-explanation](images/pca5.png)

The first plot is the mean of the measurement, snow depth, for every day of the year, taken over all the datapoints in the given dataset. This plot represents the average snow depth, and hence is not expected to capture the variance in the data. 

The scale of the eigenvector plots differs from that of the mean because each eigenvector is normalized to unit length, so its entries have magnitude at most 1. 

The <b>first eigenvector</b> resembles the mean for most of the year. However, it differs from the mean from mid-December to April. This is because the mean represents the average snow depth, while the eigenvectors capture the variance of the data. So, the first eigenvector represents the overall amount of snow above or below the mean, keeping the same snow distribution over time. 


The second, third, fourth and fifth eigen-vectors are responsible for changes in the snow distribution over time. 
<ul>
<li>
The <b>second eigenvector</b> corresponds to a large amount of snow, and hence a peak in snow depth, in January (late onset of the snow season).</li>
<li>The <b>third eigenvector</b> is negative in mid-March and in the latter part of December, corresponding to lower snow depth at those times. This is because some snow melts during that time.</li>
<li>The <b>fourth eigenvector</b> represents a sharp dip in snow depth at the end of February, which can be interpreted as an early end of the snow (winter) season.</li>
<li>The <b>fifth eigenvector</b> corresponds to very little snow at the end of January, some snow in February and again very little snow from mid-February (on-and-off snow).</li></ul>


### Coefficients

In order to interpret the information explained by eigen vectors, we plot reconstructions of datapoints for which the coefficients are most positive and most negative. 
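A minimal sketch of this procedure, assuming a complete data matrix (no NaNs) and using illustrative function names that are not taken from the notebook:

```python
import numpy as np


def top_eigenvectors(data, k):
    """Return the mean and the top-k eigenvectors (as columns) of the covariance."""
    mean = data.mean(axis=0)
    centered = data - mean
    cov = centered.T @ centered / data.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    return mean, eigvecs[:, order]


def coefficients(data, mean, vecs):
    """Project each centered sample on the eigenvectors."""
    return (data - mean) @ vecs


def reconstruct(mean, vecs, coeffs):
    """Rebuild samples from the mean plus the weighted eigenvectors."""
    return mean + coeffs @ vecs.T


rng = np.random.default_rng(0)
data = rng.normal(size=(20, 4))            # toy stand-in for station-year rows
mean, vecs = top_eigenvectors(data, k=4)
coeffs = coefficients(data, mean, vecs)
rebuilt = reconstruct(mean, vecs, coeffs)

# Samples whose first coefficient is most positive / most negative.
most_pos = int(np.argmax(coeffs[:, 0]))
most_neg = int(np.argmin(coeffs[:, 0]))
```

With all eigenvectors kept, the reconstruction is exact; with only the top few, the gap between a sample and its reconstruction is the residual studied later.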

**coeff_1**

![coeff1_most_neg](images/coeff1_neg.png)
coeff1 : most negative

![coeff1_most_pos](images/coeff1_pos.png)
coeff1 : most positive

Very high values of coefficient 1 represent a large snow depth during the month of December. They also represent an early onset and early end of the snow season, i.e. an end by the start of January as opposed to April. Very low values of coefficient 1 represent a late onset and late end of the snow season, i.e. a start in January. 

**coeff_2**

![coeff2_most_neg](images/coeff2_neg.png)
coeff2 : most negative

![coeff2_most_pos](images/coeff2_pos.png)
coeff2 : most positive

Large positive values of coefficient 2 indicate the highest amount of snow in January and very little snow for the rest of the year. Very negative values of coefficient 2 indicate snow during the rest of the year and very little snow in January. 

**coeff_3**

![coeff3_most_neg](images/coeff3_neg.png)
coeff3 : most negative

![coeff3_most_pos](images/coeff3_pos.png)
coeff3 : most positive

High positive values of coefficient 3 indicate an end of the snow season in March and also a smaller amount of snow in December (late onset of the snow season). Lower values of coefficient 3 indicate the opposite behavior: an early onset of the snow season and a longer duration. 

**coeff_4**

![coeff4_most_neg](images/coeff4_neg.png)
coeff4 : most negative

![coeff4_most_pos](images/coeff4_pos.png)
coeff4 : most positive

Positive values of coefficient 4 correspond to end of snow season abruptly in February while negative values correspond to ongoing snow season after February.  

**coeff_5**

![coeff5_most_neg](images/coeff5_neg.png)
coeff5 : most negative

![coeff5_most_pos](images/coeff5_pos.png)
coeff5 : most positive

Positive values of coefficient 5 correspond to on and off snow season during months of February to April while  negative values indicate high amount of snow depth corresponding to a more constant snow season. 

### Cumulative distribution of res_3 

![cumulative distribution of res_3 for SNWD](images/cum_res3.png)

The above plot is a very sharp curve: the residual variance after removing the mean and the top 3 eigenvectors is zero for about 30% of the instances, after which it keeps increasing and reaches 1 for about 50% of the instances. 

<b> Conclusion: </b><br>
This behavior may arise because the top 3 eigenvectors capture all of the variance in the data for about 30% of the instances. The variance in these instances (examples) could be the major contributor to the total variance, which would explain why the top 3 eigenvectors explain 90% of the variance. The unexplained variance in the remaining 70% of instances (the ones with high residual variance res_3) would then be small and captured by the next few eigenvectors. 
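To make the residual concrete, here is a small sketch of a per-sample residual fraction and its empirical cumulative distribution. It assumes the residual is normalized by the variance left after removing the mean; the notebook's `res_3` may use a different normalization, and the function names are illustrative.

```python
import numpy as np


def residual_fraction(x, mean, vecs):
    """Share of a sample's variance left after removing the mean and the
    projection on the orthonormal columns of `vecs`."""
    centered = x - mean
    total = np.dot(centered, centered)
    if total == 0:
        return 0.0
    residual = centered - vecs @ (vecs.T @ centered)
    return float(np.dot(residual, residual) / total)


def empirical_cdf(values):
    """Sorted values and the fraction of samples at or below each of them."""
    ordered = np.sort(np.asarray(values, dtype=float))
    return ordered, np.arange(1, len(ordered) + 1) / len(ordered)


# Toy setup: zero mean and a single eigenvector along the first axis.
mean = np.zeros(3)
vecs = np.array([[1.0], [0.0], [0.0]])
samples = [np.array([3.0, 4.0, 0.0]), np.array([2.0, 0.0, 0.0])]
res = [residual_fraction(s, mean, vecs) for s in samples]
ordered, cdf = empirical_cdf(res)
```

A sample that lies entirely along the kept eigenvectors has residual fraction 0, while one orthogonal to them has residual fraction 1.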

<i> Notebook 4.5 Weather Analysis - reconstruction PRCP </i>

## Reconstructing PRCP
### Eigenvectors
![](images/eigen_recon_PRCP.png)

We can see that there is a lot of noise in the data and that the first 3 eigenvectors explain very little of the variance in the dataset. 
### Coefficients
**coeff_1**
![](images/prcp_coeff1_pos.png)
coeff_1 : positive
![](images/prcp_coeff1_neg.png)
coeff_1 : negative <br>
**coeff_2**
![](images/prcp_coeff2_pos.png)
coeff_2 : positive
![](images/prcp_coeff2_neg.png)
coeff_2 : negative <br>
**coeff_3**
![](images/prcp_coeff3_posi.png)
coeff_3 : positive
![](images/prcp_coeff3_neg.png)
coeff_3 : negative <br>
### Best reconstruction of an example datapoint
![recon](images/best_recon_datapoint_PRCP.png)
Even the best reconstruction fails to explain much of the datapoint. This suggests that PCA is not a good method for analyzing the PRCP data. 
### Distribution of residuals
![distribu](images/distribution_residuals_PRCP.png)


This plot shows that as the number of datapoints increases, the residual variance after 3 eigenvectors drops. 

<i> Notebook 5 maps using iPyLeaflet </i>

## Visualizing the distribution of observations

![notebook5_map1.png](images/notebook5_map1.png)

For this part of the analysis, we consider, for every station, the average value of each coefficient over all years for which data exists. The box marks the region covered by this dataset. The center of each circle is the location of a station. The size of the circle indicates the number of measurements available for that station. The color of the circle represents the average value of the coefficient under consideration for that station.  
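The per-station aggregation described above can be sketched with plain Python. The record layout `(station, year, coefficient)` and the station names are hypothetical, and the count used here is the number of yearly entries, which is an assumption about what the circle size encodes.

```python
from collections import defaultdict


def station_summary(records):
    """Average coefficient and number of yearly entries for each station.

    `records` is an iterable of (station, year, coefficient) tuples.
    """
    values = defaultdict(list)
    for station, _year, coeff in records:
        values[station].append(coeff)
    return {s: (sum(v) / len(v), len(v)) for s, v in values.items()}


toy_records = [
    ("STATION_A", 1990, 10.0),
    ("STATION_A", 1991, 30.0),
    ("STATION_B", 1990, -5.0),
]
summary = station_summary(toy_records)
```

The first element of each tuple would drive the circle color and the second the circle size.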

![map_coeff1.png](images/map_coeff1.png)

This plot is for coefficient 1. We observe that all the circles are of the same color, navy. On further analysis, we found that the average values of coefficient 1 for all stations are very close to each other and hence they are all mapped to the same color.  

<b> Conclusions: </b><br>
We can infer that the average values of coefficient 1 across all stations are almost the same, with negligible differences. This further lets us infer that almost all the variance in the data for different stations is explained by coefficient 1 and the corresponding eigenvector. 

![map_coeff2.png](images/map_coeff2.png)

![map_coeff3.png](images/map_coeff3.png)

For each of the two plots above, for coefficient 2 and coefficient 3 respectively, we observe different colors of the circles for different stations, indicating stations are not correlated with respect to average values for coefficient 2 and coefficient 3. 

<i> Notebook 6 is SNWD variation spatial or temporal? </i>

## Temporal and spatial Variation

In order to compare the effect of temporal variation with that of spatial variation on the first eigenvector, we first organize our data by pivoting coeff_1, the coefficient of the first eigenvector, with years as rows and stations as columns. The intersection of each row and column is the value of coeff_1 for that year and station. Next, we compute two quantities, namely:
1. Mean or Average per station
2. Mean or Average per year

Next, we remove the better of the two (mean by year or mean by station) first and calculate the RMS. We then alternate, removing each of the two in turn, and observe how much the RMS reduces at each step. We repeat this because, after removing the mean by station, the mean by year becomes non-zero again, so we remove the mean by year once more to re-center the data per year. We observe that after removing mean by year and mean by station once, the RMS reduces by a large amount, after which it reduces only by small amounts. 
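The alternating procedure can be sketched as below. This is an illustration under assumptions rather than the notebook's code: the table is a NumPy array with years as rows, stations as columns and NaN where a station has no data for a year, and the mean by year is always removed first.

```python
import numpy as np


def rms(table):
    """Root mean square over the non-missing entries."""
    return np.sqrt(np.nanmean(table ** 2))


def alternate_mean_removal(table, rounds=5):
    """Alternately remove the mean by year (rows) and by station (columns).

    Returns the RMS before any removal followed by the RMS after each step.
    """
    table = table.copy()
    history = [rms(table)]
    for _ in range(rounds):
        table = table - np.nanmean(table, axis=1, keepdims=True)  # mean by year
        history.append(rms(table))
        table = table - np.nanmean(table, axis=0, keepdims=True)  # mean by station
        history.append(rms(table))
    return history


# Toy table built as year effect + station effect, so one round of
# removals explains everything and the RMS drops to zero.
year_effect = np.array([1.0, 2.0, 3.0])
station_effect = np.array([10.0, 20.0])
toy = year_effect[:, None] + station_effect[None, :]
history = alternate_mean_removal(toy, rounds=2)
```

Each removal subtracts the least-squares optimal constant from every row or column, so the RMS can never increase from one step to the next.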

### Analyzing whether SNWD varies temporally or spatially 

Data: The data considered for this analysis is taken from decon_BSSSBSBB_SNWD.parquet file which consists of the eigen reconstructions of the data using top 3 eigen vectors. We begin by storing this data in a dataframe. 

We get the following results: 
1. total RMS                   =  73.6201876732
2. RMS removing mean-by-station=  47.1399999139
3. RMS removing mean-by-year   =  43.1485237401

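Since the RMS after removing mean-by-year is lower than the RMS after removing mean-by-station, year-to-year variation accounts for more of the signal than station-to-station variation. We can infer that stations are correlated in the amount of snow depth. 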

initial RMS= 73.6201876732
1. 0 after removing mean by year    = 43.1485237401
2. 0 after removing mean by stations= 29.2817287902
3. 1 after removing mean by year    = 28.5834340677
4. 1 after removing mean by stations= 28.2196249259
5. 2 after removing mean by year    = 28.0032494012
6. 2 after removing mean by stations= 27.861632388
7. 3 after removing mean by year    = 27.7630310959
8. 3 after removing mean by stations= 27.6917717488
9. 4 after removing mean by year    = 27.639132497
10. 4 after removing mean by stations= 27.5997425172



###  Analyzing whether PRCP varies temporally or spatially

Data: The data considered for this analysis is taken from decon_BSSSBSBB_PRCP.parquet file which consists of the eigen reconstructions of the data using top 3 eigen vectors. We begin by storing this data in a dataframe.

We get the following results:
1. total RMS                   =  209.855349894
2. RMS removing mean-by-station=  207.723962315
3. RMS removing mean-by-year   =  74.5024166352

Since the RMS after removing mean-by-year is much lower than the RMS after removing mean-by-station, we can infer that stations are correlated in the amount of rain or precipitation.
1. 0 after removing mean by year    = 74.5024166352
2. 0 after removing mean by stations= 73.0962416201
3. 1 after removing mean by year    = 73.0529944231
4. 1 after removing mean by stations= 73.0480584596
5. 2 after removing mean by year    = 73.0470491025
6. 2 after removing mean by stations= 73.0467809491
7. 3 after removing mean by year    = 73.0467010785
8. 3 after removing mean by stations= 73.0466756521
9. 4 after removing mean by year    = 73.0466671318
10. 4 after removing mean by stations= 73.0466641464


Below, we estimate the importance of location-to-location variation relative to year-by-year variation.
These are measured using the fraction by which the variance is reduced when we subtract from each station/year entry the average-per-year or the average-per-station respectively. Here are the results:

**coeff_1**

total MS                   =  47842.220987 <br>
MS removing mean-by-station=  0.00105826855904 <br>
fraction explained station=   99.999997788 <br>
MS removing mean-by-year   =  45802.1959771 <br>
fraction explained year=   4.26406836429 <br>

This shows that for coefficient 1, the per-station means explain almost all of the variance, while the per-year means explain very little. One reason for this could be that the data lies on some low-dimensional manifold. 

**coeff_2**

total MS                   =  5419.93203303 <br>
MS removing mean-by-station=  2222.17959188 <br>
fraction explained station=   58.9998623905 <br>
MS removing mean-by-year   =  1861.79510095 <br>
fraction explained year=   65.6491061215 <br>

The effect of year explaining more variance than stations is very weak for coefficient 2. 

**coeff_3**

total MS                   =  2308.60294968 <br>
MS removing mean-by-station=  1870.15874516 <br>
fraction explained station=   18.9917544971 <br>
MS removing mean-by-year   =  469.749391205 <br>
fraction explained year=   79.6522225153 <br>

For coefficient 3, the year explains more variance than the stations and this effect is more pronounced. 
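The "fraction explained" figures above follow directly from the mean squares (MS) reported for each coefficient. A small sketch, with an illustrative function name:

```python
def fraction_explained(total_ms, residual_ms):
    """Percentage of the mean square removed by subtracting a mean.

    `total_ms` is the MS before removal, `residual_ms` the MS after it.
    """
    return 100.0 * (1.0 - residual_ms / total_ms)


# Values reported above for coeff_2 (mean-by-year) and coeff_3 (mean-by-station).
coeff2_year = fraction_explained(5419.93203303, 1861.79510095)
coeff3_station = fraction_explained(2308.60294968, 1870.15874516)
```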

<i> Notebook 7 Analyzing Residuals</i>

## Analyzing Residuals 

### Why PCA doesn't give good analysis for PRCP

We can observe that the data for the measurement PRCP (precipitation) is very noisy. This is substantiated by the following three plots:

![eigen-reconstruction-PRCP-3](images/eigen_recon_PRCP.png)

The plot above indicates that the top 3 eigen-vectors for Precipitation do not resemble the mean. Therefore, we can interpret this as the data being too noisy. 

![percentage of variance explained](images/var_explained_PRCP.png)

The percentage of variance explained by the top 8 eigenvectors is very low, about 11%. 

 ![cumulative-distribution-residual_3](images/cum_res_3_new.png) 

This is the plot of the cumulative distribution of res_3, which is the residual variance obtained by subtracting the mean and the top 3 eigenvectors. This plot shows that the residual variance is more than 0.9 for 90% of the instances (datapoints). This shows that the top 3 eigenvectors for PRCP have very little explanatory power, hence PCA is not a good way to analyze precipitation data. 

Since precipitation data is very noisy, it is difficult to measure the correlation between the amount of rainfall on the same day at two stations. Hence, we reduce this to measuring whether it rained or not, a simple binary measurement. 

![cdf-PRCP](images/cdf_PRCP_binary.png)

This plot shows that it does not rain for about 73% of the year. 

### Measuring statistical significance 

In order to measure the correlation between two stations based on whether it rained on a given day, we form a <b>null hypothesis</b> that rain at any two stations on the same day is independent. Next, we use a <b>statistical test</b> to try to reject this null hypothesis. <p> We compute the p-values using the normalized log probability function for every unique pair of stations. Lower p-values are stronger evidence against the null hypothesis, while higher p-values mean that we cannot reject it. If a considerable number of station pairs have low p-values, we can safely reject the null hypothesis. </p>
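To illustrate the idea of testing independence on binary rain indicators, the sketch below uses a hypergeometric tail probability. This is a swapped-in, simpler test and not the normalized log probability function used in the notebook: it asks how likely it is to see at least the observed number of days with rain at both stations if the rainy days of one station were placed at random.

```python
from math import comb


def overlap_p_value(rain_a, rain_b):
    """P(overlap >= observed) if rainy days at the two stations were independent.

    rain_a, rain_b: equal-length sequences of booleans, one entry per day.
    """
    n = len(rain_a)
    m_a = sum(rain_a)
    m_b = sum(rain_b)
    observed = sum(1 for a, b in zip(rain_a, rain_b) if a and b)
    total = comb(n, m_b)
    # Sum the hypergeometric probabilities of every overlap at least as large.
    tail = sum(comb(m_a, k) * comb(n - m_a, m_b - k)
               for k in range(observed, min(m_a, m_b) + 1))
    return tail / total


# Toy example: two stations that rain on exactly the same 2 days out of 4,
# and two stations that never rain on the same day.
same_days = overlap_p_value([True, True, False, False],
                            [True, True, False, False])
no_overlap = overlap_p_value([True, True, False, False],
                             [False, False, True, True])
```

A small value means the observed overlap would be unlikely under independence, which is the sense in which low p-values point to correlated stations.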

![signif.png](images/signif.png)

The plot above is a histogram of the p-values for every pair of stations. It is used to analyze which pairs of stations have lower p-values than other pairs.  
<br>
<b> Conclusions: </b>
<br>
<ul>
<li>
The p-values range between 0.01 and 0.3 for most of the pairs of stations.</li>
<li>They are distributed almost uniformly, with the mean at 0.15.</li>
<li>The single rise in p-values at 0.4 can be safely attributed to p-values for same-station pairs and can be ignored.</li>
<li>We can infer that stations with low p-values, particularly those lower than 0.1, are highly correlated.</li>
</ul>

![correlaion-stations-P_norm_grey](images/greyMatrixPRCP.png)

A group of highly correlated stations is: `USC00012188`, `USC00011810`, `USC00016335`, `USC00016334`, `USW00063874`, `USC00221174`, `US1ALEL0022`, `US1ALEL0024`, `USC00018178`, `USC00011315`, `USC00015354`, `USW00023802`, `USW00023801`, `USC00015449`, `USC00016684`, `US1ALLE0007`, `USC00012079`, `USC00010140`

<p> The above plot is a grey matrix that is plotted using P_norm, the matrix containing p-values between every pair of stations. It is a symmetric matrix and hence the eigen vectors that approximate the matrix explain a major part of the variance. </p><br>
<b> Conclusions: </b>
<br>
<ul>
<li>
The top left square is darker than the remaining part of the matrix, indicating that the first 18 stations are more correlated than the others. 
</li>
<li>
The rest of the matrix represents uniform distribution of dark and light patches, indicating that probability for overlap of rain on a day in two stations is random. 
</li>
<li>
The pair of stations along the diagonal contain the same stations and can be ignored. 
</li>
</ul>

<p> To further analyze correlation between stations, we use Singular Value Decomposition. </p>
<br>
![](images/residual_PCA_PRCP.png)
<p> The above plot shows that top 10 eigen vectors explain about 85% variance in data. </p>
<br>
![](images/pca_prcp_ev_4.png)
<p> The above plots represent the components of PCA sorted one at a time according to values of each of the PCA components for every station. </p>
<br>
![](images/block diagonal.png)

The rows and columns of the P_norm matrix are sorted according to the first eigenvector for the first square in the plot, according to the second eigenvector for the next square, and so on for the top 4 eigenvectors. This reordering of the matrix makes the grouping or clustering of stations more visible. 
<br>
It can be seen that in the first square the first 36 stations are highly correlated; of these 36, the first 18 and stations 28-36 are more correlated than the others. 
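The reordering step can be sketched as follows, assuming a symmetric matrix and sorting by its leading eigenvector only. The toy matrix and names are illustrative and not the real P_norm.

```python
import numpy as np


def order_by_top_eigenvector(matrix):
    """Indices that sort rows/columns by the leading eigenvector's entries.

    Assumes `matrix` is symmetric.
    """
    eigvals, eigvecs = np.linalg.eigh(matrix)
    top = eigvecs[:, np.argmax(eigvals)]
    return np.argsort(top)


# Toy matrix with two interleaved groups of stations; entries are non-zero
# only within a group, and group 0 is more strongly related than group 1.
labels = np.array([0, 1, 0, 1, 0, 1])
strength = np.array([1.0, 0.5])
same_group = labels[:, None] == labels[None, :]
toy = np.where(same_group, strength[labels][:, None], 0.0)

order = order_by_top_eigenvector(toy)
reordered = toy[np.ix_(order, order)]  # rows and columns permuted together
grouped_labels = labels[order]
```

After the permutation, members of the same group sit next to each other, so related stations show up as a dark square on the diagonal.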

### Analyzing the spatial relationship between groups of correlated stations

From the grey matrix we can observe that the first 18 stations are highly correlated. Intuitively, we would attribute this correlation to spatial proximity. To check whether stations that are spatially close to each other are indeed more correlated, we generate three plots. <br>
The first plot represents all of the 18 very correlated stations. <br>
![](images/extra1.png)
<br>
The next plot is for a pair of stations having higher p-value, 0.3293.<br>
![](images/extra2.png)
<br>  The third plot is for a pair of stations having lower p-value, 0.039. <br>
![](images/extra3.png)
<br>
As observed from the maps, the higher p-value stations are very close to each other while the lower p-value stations are far away from each other. <br>
<b> Conclusion: </b>
<br>
This leads to the conclusion that the intuition about spatially close stations being more correlated than those that are far away is wrong in this case for this dataset. 

<i> Notebook 5.5 Data on maps </i>

![map](images/residual_PRCP_map.png)

After analyzing residuals by computing P_norm (normalized log probability) for all pairs of stations, Principal Component Analysis is performed on it. For readability, only the first 4 components are plotted, corresponding to the top 4 eigenvectors. Each station is indicated by 4 adjacent triangles. The top right triangle represents coefficient 1, corresponding to the first (top) eigenvector; the top left is coefficient 2, corresponding to the second eigenvector; the bottom left is coefficient 3, corresponding to the third eigenvector; and the bottom right is coefficient 4, corresponding to the fourth eigenvector. 

![](images/residual_PRCP_map_zoom.png)

An opaque triangle indicates a negative value of the coefficient, while a transparent triangle indicates a positive value. The size of the triangle indicates the magnitude of the coefficient. If all coefficients are very small in magnitude for a particular station, indicated by very small triangles, the datapoint is interpreted as being very close to the mean (very little variation from the mean).