Underlying theory/covariance or correlation PCA/EOF #19

Closed
Murk89 opened this issue Jul 11, 2022 · 9 comments

Comments

@Murk89

Murk89 commented Jul 11, 2022

Hi Niclas,

After extensive reading on PCA/EOF analysis, I was wondering whether the multivariate xeofs example here (https://xeofs.readthedocs.io/en/latest/auto_examples/1uni/plot_multivariate-eof.html#sphx-glr-auto-examples-1uni-plot-multivariate-eof-py) uses the covariance or the correlation matrix.
I want to run a multivariate EOF analysis for three variables at each grid box of WRF output, and my supervisor has recommended a correlation-based PCA since the variables are different. I understand that your example uses subsets of the same variable, but is it suitable to replace these subsets with different variables?
Many thanks.

@nicrie
Collaborator

nicrie commented Jul 11, 2022

The example shown uses the covariance matrix. However, you're free to choose: what you're probably looking for is the norm argument, which normalizes each feature by its standard deviation, i.e. the analysis is then based on the correlation matrix.

In general, there's no problem using multivariate PCA with different climate variables. I agree that in this case you usually want the correlation matrix rather than the covariance matrix. So, starting from the example mentioned above, just use norm=True for your case.
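To see why normalizing each feature by its standard deviation amounts to a correlation-based PCA, here is a minimal NumPy sketch that is independent of xeofs. The synthetic data (three features with very different scales, loosely mimicking temperature, rain and snow) and all variable names are hypothetical. After standardizing each column, the covariance matrix equals the correlation matrix, and its eigenvectors are the correlation-based EOFs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data matrix: 200 time steps x 3 features with very different
# scales (roughly temperature, rain, snow); values are synthetic.
n_time = 200
base = rng.standard_normal(n_time)
X = np.column_stack([
    280.0 + 5.0 * base + rng.standard_normal(n_time),
    50.0 * base + 20.0 * rng.standard_normal(n_time),
    0.01 * base + 0.005 * rng.standard_normal(n_time),
])

# Remove the temporal mean of each feature (anomalies).
anom = X - X.mean(axis=0)

# Covariance-based PCA would decompose this matrix directly.
cov = np.cov(anom, rowvar=False)

# Dividing each feature by its standard deviation first turns the
# covariance matrix into the correlation matrix.
std = anom.std(axis=0, ddof=1)
standardized = anom / std
cov_of_standardized = np.cov(standardized, rowvar=False)
corr = np.corrcoef(anom, rowvar=False)
print(np.allclose(cov_of_standardized, corr))  # True

# Eigen-decomposition of the correlation matrix gives correlation-based EOFs;
# eigh returns eigenvalues in ascending order, so reverse them.
eigvals, eigvecs = np.linalg.eigh(corr)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
explained = eigvals / eigvals.sum()
print(explained.round(3))
```

With the raw covariance matrix, the rain column (by far the largest variance here) would dominate the leading mode purely because of its units; standardizing gives each variable equal weight.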

Hope it helps!

@Murk89
Author

Murk89 commented Jul 11, 2022

Hi Niclas,

Thanks for getting back.
I have run xeofs with multiple variables and it seems to be working. However, I'm slightly puzzled as to how to save the analysis.
I tried saving it the same way as in xmca, using mpca.save_analysis('my_analysis'), and got this error: AttributeError: 'EOF' object has no attribute 'save_analysis'

@nicrie
Collaborator

nicrie commented Jul 14, 2022

Sorry for getting back so late. There's currently no method to save a model automatically (although it wouldn't be too difficult to implement).
For the moment, you have to save the individual fields yourself. For example, assuming you did a multivariate EOF analysis with two DataArrays, yielding eofs1 and eofs2, you could do:

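# one EOF DataArray per input DataArray, plus the shared PCs
eofs1, eofs2 = pca.eofs()
pcs = pca.pcs()

# write each result to its own netCDF file
eofs1.to_netcdf('eofs1.nc')
eofs2.to_netcdf('eofs2.nc')
pcs.to_netcdf('pcs.nc')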
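If you'd rather keep everything in a single file and only need the plain arrays, one alternative is NumPy's np.savez_compressed. This is not an xeofs feature, and unlike netCDF it does not keep xarray coordinates or attributes unless you store them explicitly as extra arrays. The arrays below are synthetic placeholders standing in for the values of eofs1, eofs2 and pcs, and their shapes and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(42)

# Synthetic placeholders for eofs1.values, eofs2.values and pcs.values;
# shapes are hypothetical: EOFs (lat, lon, mode), PCs (time, mode).
n_lat, n_lon, n_time, n_modes = 4, 5, 30, 3
eofs1 = rng.standard_normal((n_lat, n_lon, n_modes))
eofs2 = rng.standard_normal((n_lat, n_lon, n_modes))
pcs = rng.standard_normal((n_time, n_modes))

path = os.path.join(tempfile.mkdtemp(), "my_analysis.npz")

# Store all results, plus an explicit mode index, in one compressed archive.
np.savez_compressed(path, eofs1=eofs1, eofs2=eofs2, pcs=pcs,
                    mode=np.arange(1, n_modes + 1))

# Reload later: the archive behaves like a dict of named arrays.
with np.load(path) as archive:
    loaded = {name: archive[name] for name in archive.files}

print(sorted(loaded))  # ['eofs1', 'eofs2', 'mode', 'pcs']
```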

@Murk89
Author

Murk89 commented Jul 19, 2022

Hi Niclas.

So I have run into a new error now.
ValueError: Standard deviation of one ore more features is zero, normalization not possible.

My understanding is that some grid points in my WRF temperature, rain and/or snow arrays are zero at every time step, which causes this error. Do you have any suggestions for dealing with this?
Many thanks.
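A small NumPy sketch of why this error appears: a grid point that is zero at every time step has a standard deviation of exactly zero, so standardizing it means computing 0/0, which yields NaN. That is presumably why normalization is refused rather than silently producing NaNs. The tiny (time, gridpoint) array is hypothetical, e.g. snow at a grid point where it never snows.

```python
import numpy as np

# Hypothetical (time, gridpoint) array for one variable:
# gridpoint 1 is zero at every time step.
data = np.array([
    [1.0, 0.0, 3.0],
    [2.0, 0.0, 1.0],
    [4.0, 0.0, 2.0],
])

anom = data - data.mean(axis=0)
std = anom.std(axis=0)  # the middle entry is exactly 0.0

# Dividing by a zero standard deviation gives 0/0 = NaN.
with np.errstate(invalid="ignore"):
    standardized = anom / std
print(np.isnan(standardized[:, 1]).all())  # True

# Flag the offending grid points with a small tolerance.
zero_var = np.flatnonzero(std < 1e-5)
print(zero_var)  # [1]
```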

@nicrie
Collaborator

nicrie commented Jul 19, 2022

Try removing the grid points that have zero variance, e.g. for a given DataArray da:

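# treat a variance below this small threshold as zero
epsilon = 1e-5
# names of your spatial dimensions
spatial_dims = ('lat', 'lon')

# True where the temporal variance is large enough to keep
valid_gridpoints = da.var('time') > epsilon
# flatten space, drop zero-variance grid points, then restore the lat/lon grid
# (removed points generally reappear as NaN after unstack)
da_stacked = da.stack(x=spatial_dims)
da_clean = da_stacked.isel(x=valid_gridpoints.stack(x=spatial_dims).values).unstack()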
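To make the stack / select / unstack idea concrete without xarray, here is a NumPy sketch of the same workflow on a hypothetical field: flatten space into features, drop zero-variance features, standardize the rest, compute a leading EOF via SVD, and scatter it back onto the grid with NaN at the removed points. The field, the threshold and the SVD-based EOF are illustrative assumptions, not xeofs internals.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical (time, lat, lon) field; one grid point is constant zero,
# so its temporal variance vanishes.
n_time, n_lat, n_lon = 50, 3, 4
field = rng.standard_normal((n_time, n_lat, n_lon))
field[:, 0, 0] = 0.0

epsilon = 1e-5

# Flatten space into one feature axis (analogue of stacking lat/lon into x).
X = field.reshape(n_time, n_lat * n_lon)
valid = X.var(axis=0) > epsilon

# Keep only valid features; standardizing is now safe (no zero std).
X_valid = X[:, valid]
X_std = (X_valid - X_valid.mean(axis=0)) / X_valid.std(axis=0)

# Leading EOF from the SVD of the standardized (time x feature) matrix.
_, s, vt = np.linalg.svd(X_std, full_matrices=False)
eof1_valid = vt[0]

# Scatter back onto the full grid with NaN where features were removed
# (analogue of unstack).
eof1 = np.full(n_lat * n_lon, np.nan)
eof1[valid] = eof1_valid
eof1 = eof1.reshape(n_lat, n_lon)
print(np.isnan(eof1[0, 0]))  # True
```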

Note: it's better to keep different issues separate. Don't hesitate to open a new issue for each new bug/error you encounter; it helps other people with similar issues find the solution. :)

Another note: has the initial problem in this thread been solved?

@Murk89
Author

Murk89 commented Jul 19, 2022

Hi Niclas,
Thanks for getting back to me. Regarding the original issue, I can't say yet, because the different errors/questions raised here are all part of the same analysis.

I am also trying to understand the significance of the output; I'll start a new issue for it.

@nicrie
Collaborator

nicrie commented Aug 23, 2022

@Murk89, just to let you know: the new release (version 0.6.0) includes a bootstrapping class that automatically identifies the number of significant modes and provides confidence intervals for your EOFs and PCs. You can find an example here.
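For intuition about what bootstrapping does here, below is a generic NumPy sketch, not the xeofs bootstrapping class: resample time steps with replacement, redo the decomposition, and take percentile confidence intervals of each mode's explained variance. The synthetic data and the "well separated" rule (a mode's lower bound exceeds the next mode's upper bound) are illustrative assumptions; xeofs may use a different criterion.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical standardized data: 120 time steps x 6 features sharing one
# strong signal plus independent noise (synthetic, for illustration only).
n_time, n_feat = 120, 6
signal = rng.standard_normal(n_time)
X = np.outer(signal, np.ones(n_feat)) + rng.standard_normal((n_time, n_feat))
X = (X - X.mean(axis=0)) / X.std(axis=0)


def explained_variance(data):
    """Fraction of variance explained by each mode, in descending order."""
    s = np.linalg.svd(data - data.mean(axis=0), compute_uv=False)
    var = s**2
    return var / var.sum()


n_boot = 500
boot = np.empty((n_boot, n_feat))
for b in range(n_boot):
    # Resample time steps with replacement and redo the decomposition.
    idx = rng.integers(0, n_time, size=n_time)
    boot[b] = explained_variance(X[idx])

# 95% percentile confidence interval for each mode's explained variance.
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)
print(np.column_stack([explained_variance(X), lower, upper]).round(3))

# One simple heuristic: a mode is well separated from the next one if its
# lower bound exceeds the next mode's upper bound.
separated = lower[:-1] > upper[1:]
print(separated)
```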

@Murk89
Author

Murk89 commented Aug 23, 2022 via email

@nicrie
Collaborator

nicrie commented Aug 23, 2022

No worries & good luck finishing your thesis!

nicrie closed this as completed Aug 29, 2022