# xCDAT vs. CDAT Spatial Averaging Output `dtype`

Objective:

* Find the root cause of the large floating point differences (absolute and relative) between the xCDAT and CDAT spatial averages, which might be related to the `dtype`.

Questions:
* Is xCDAT or CDAT doing something incorrectly?
* Does CDAT cast the data from `float32` to `float64`?
   * If so, at what point does the cast happen?

Resources:
https://discourse.pangeo.io/t/variable-type-changing-when-using-xarray/1823/2

* Same array, same bits, but different precision. In general, arbitrary smooth decimals can't be represented exactly using `float32` precision.
* The index type (`Float64Index`) actually comes from pandas (`pandas.Float64Index`). In this case, promoting a `float32` type to `float64` in the context of indexing should not affect selection operations, since `float32` can safely be cast to `float64`.
* The number 10.45 is interpreted differently as a `float32` than as a `float64`.
* Built-in Python floats are 64-bit, so IMO xarray does the right thing by promoting everything to `float64` when making comparisons with your `lat` values.
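A quick check of the 10.45 point with NumPy scalars. This is a minimal sketch; the only assumption is standard IEEE 754 rounding, and the variable names are illustrative.

```python
import numpy as np

# The same decimal literal, rounded to the nearest float32 and float64 values.
x32 = np.float32(10.45)
x64 = np.float64(10.45)

# Widening float32 -> float64 is exact, but it exposes the float32 rounding error.
widened = np.float64(x32)
print(widened)                     # close to, but not exactly, 10.45
print(widened == x64)              # False: the two roundings of 10.45 differ
print(np.float32(widened) == x32)  # True: the round trip loses nothing

# A bare Python float is 64-bit, so NumPy treats it as float64.
print(np.asarray(10.45).dtype)     # float64
```

This is why "safe" widening and "same value" are different things: the cast itself is lossless, but the `float32` value was already rounded before the cast.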


```python
import numpy as np

import xcdat
import cdms2
import cdutil
```

```python
fn = "/p/user_pub/climate_work/pochedley1/surface/gistemp1200_GHCNv4_ERSSTv5.nc"

ds_xcdat = xcdat.open_dataset(fn)
ds_cdat = cdms2.open(fn)
```

### 1. Check the `dtype` of the variable `"tempanomaly"`

```python
ds_xcdat["tempanomaly"].dtype
```

```
dtype('float32')
```

```python
ds_cdat("tempanomaly").dtype
```

```
dtype('float32')
```

#### The `dtype` of the variable is `float32` for both libraries, so they agree.

### 2. Check the `dtype` of the spatial averaging output for `"tempanomaly"`

xCDAT

```python
ds_xcdat_avg = ds_xcdat.spatial.average("tempanomaly", axis=["Y"])
ta_xcdat_avg = ds_xcdat_avg["tempanomaly"]
```

```python
ta_xcdat_avg
```

```python
ta_xcdat_avg.dtype
```

```
dtype('float32')
```

CDAT

```python
ta_cdat_avg = cdutil.averager(ds_cdat("tempanomaly"), axis="y")
```

```python
ta_cdat_avg
```

```
variable_6
masked_array(
  data=[[0.035545653325030215, 0.026956274832029305,
         0.016102136600269726, ..., -0.030794956835496247,
         -0.009603562061872723, 0.00881278244252939],
        [0.06634341186248234, 0.04435135777310842, 0.01892279890685338,
         ..., 0.02397299728132458, 0.03883514230588276,
         0.05697104274523798],
        [0.14746000307247378, 0.11239838593655853, 0.07544755841048628,
         ..., 0.04206947944889023, 0.0754978114092197,
         0.11411796429245147],
        ...,
        [0.8559087033405837, 0.8190742797330746, 0.7515332662789139, ...,
         1.0020296439006402, 0.9598274150062646, 0.9050297894921276],
        [0.645442851180381, 0.6352518904793731, 0.5955442821497092, ...,
         0.8723914081140204, 0.8259617007564366, 0.8008464881940878],
        [0.8317094150775682, 0.8324741985594918, 0.8098450089447665, ...,
         0.937383456065645, 0.9137497736748647, 0.8913162676287111]],
  mask=[[False, False, False, ..., False, False,
```

```python
ta_cdat_avg.dtype
```

```
dtype('float64')
```

#### xCDAT is returning `float32` (same as input dtype).
#### CDAT is returning `float64` (not the same as input type).
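Exactly where the cast happens inside CDAT is a separate question. As an illustration only, the NumPy sketch below reproduces the same symptom with a hand-rolled latitude-weighted mean: `float32` data combined with `float64` weights yields `float64`, while weights cast to the data's `dtype` keep the result in `float32`. The array shape, latitudes, cosine weights, and the `lat_weighted_mean` helper are hypothetical and are not taken from either library.

```python
import numpy as np

# Hypothetical float32 field: 4 latitudes x 3 longitudes.
rng = np.random.default_rng(0)
data = rng.standard_normal((4, 3)).astype(np.float32)

# Hypothetical cosine-of-latitude weights; np.cos on a float64 array returns float64.
lat = np.array([-60.0, -20.0, 20.0, 60.0])
weights64 = np.cos(np.deg2rad(lat))


def lat_weighted_mean(field, weights):
    """Weighted mean over axis 0 (latitude); the result dtype follows NumPy promotion."""
    return (field * weights[:, np.newaxis]).sum(axis=0) / weights.sum()


avg_promoted = lat_weighted_mean(data, weights64)                      # float32 * float64 -> float64
avg_preserved = lat_weighted_mean(data, weights64.astype(data.dtype))  # float32 throughout

print(avg_promoted.dtype)   # float64
print(avg_preserved.dtype)  # float32
```

In other words, a single `float64` operand anywhere in the weighting arithmetic is enough to promote the whole result.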

### 3. Why is CDAT returning `float64`?

Searching for `float` in [genutil.averager](https://github.com/CDAT/genutil/blob/master/Lib/averager.py) turns up several references to `numpy.float`.
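The same search can be scripted. The helper `grep_module_source` below is hypothetical (not part of CDAT); it assumes the module is importable and that `inspect.getsource` can read its source, which we assume holds for `genutil.averager` when genutil is installed.

```python
import importlib
import inspect


def grep_module_source(module_name, pattern):
    """Hypothetical helper: return (line_number, line) pairs whose text contains pattern."""
    module = importlib.import_module(module_name)
    source_lines = inspect.getsource(module).splitlines()
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source_lines, start=1)
        if pattern in line
    ]


# Assumes genutil is installed and exposes genutil.averager (not run here):
# for lineno, line in grep_module_source("genutil.averager", "numpy.float"):
#     print(lineno, line)
```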

### 4. What exactly is `numpy.float`? An alias for the built-in `float`, which NumPy maps to `float64`.

```python
np.float
```

```
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  np.float

float
```
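Because `np.float` was only an alias for Python's built-in `float`, using it as a `dtype` means `float64`. The alias was removed in NumPy 1.24, so the sketch below uses `float` directly; `ta32` is a hypothetical stand-in for `float32` input data.

```python
import numpy as np

# `float` (formerly also spelled np.float) resolves to the float64 dtype.
print(np.dtype(float))  # float64

# Casting float32 data with `float` therefore widens it to float64.
ta32 = np.array([0.1, 0.2, 0.3], dtype=np.float32)  # hypothetical input
print(ta32.astype(float).dtype)           # float64
print(np.array(ta32, dtype=float).dtype)  # float64
```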

## Key Takeaways

* `ds.spatial.average()` maintains a `dtype` of `float32`, which is **CORRECT**.
* `cdutil.averager()`/`genutil.averager()` is typecasting the `dtype` to `float64`, which is **INCORRECT**.
* Floating point comparisons between CDAT and xCDAT result in large differences because the outputs have different `dtype`s (`float32` vs. `float64`).

**Based on these findings, we are confident that xarray/xCDAT is doing the right thing with floating points, while CDAT is not.**
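When comparing the two libraries' outputs, one option is to bring both to the same `dtype` first so the storage precision is no longer a source of mismatch. The helper `max_abs_diff_float32` below is a hypothetical sketch, not part of xCDAT or CDAT: it casts both inputs to `float32`, fills masked values with NaN, ignores them, and reports the largest absolute difference. It assumes both inputs have the same shape and dimension order.

```python
import numpy as np


def max_abs_diff_float32(a, b):
    """Hypothetical helper: largest |a - b| after casting both inputs to float32.

    Masked entries (e.g. from a CDAT masked_array) are filled with NaN and ignored.
    """
    a32 = np.ma.filled(np.ma.asarray(a, dtype=np.float32), np.nan)
    b32 = np.ma.filled(np.ma.asarray(b, dtype=np.float32), np.nan)
    return float(np.nanmax(np.abs(a32 - b32)))


# Usage with this notebook's arrays, assuming matching shapes (not run here):
# max_abs_diff_float32(ta_xcdat_avg.values, ta_cdat_avg)
```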