Bug cov nat #60898

fbourgey · 2025-02-09T15:33:27Z

closes BUG: cov buggy when having NaT in column #53115

WillAyd · 2025-02-14T21:35:26Z

pandas/core/frame.py

@@ -11239,6 +11239,12 @@ def cov(
        c -0.150812  0.191417  0.895202
        """
        data = self._get_numeric_data() if numeric_only else self
+        if data.select_dtypes(include=[np.datetime64, np.timedelta64]).shape[1] > 0:


Is this happening in spite of numeric_only or is that False in this case?

I think it would be False in that case. If numeric_only=True, then no error is raised.

Hmm something feels off to me about this. I assume other types like object/string are raising naturally without any special casing here. Can you check where those are raising to see if that unlocks any clues? I don't think we should be special-casing the type selection in the algorithm like this

The following

    df = DataFrame({"a": ["a", "b", "c"]})
    df.cov()

raises

ValueError: could not convert string to float: 'a'

so object/string should be fine.

@WillAyd, do you have any other suggestions on the best way to handle this?

So the question is ultimately how are those raising even though we don't branch for them within this function. Do you think you can find that through the debugger?

I can check the type of every element of the DataFrame or Series and do something like

    if data.map(
        lambda x: isinstance(
            x,
            (np.datetime64, np.timedelta64, pd.Timedelta, pd.Timestamp, pd._libs.tslibs.nattype.NaTType),
        )
    ).any().any():
        raise ValueError()

The cov() method calls data.to_numpy(dtype=float, na_value=np.nan, copy=False) on data. This yields nonsensical values when the data contains pd.NaT, and raises an error when it contains strings.
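The string case can be reproduced without pandas. This is a minimal sketch, assuming the float conversion ends up applying Python's own float() to each object element; to_float_column and conversion_error are hypothetical helpers, not pandas functions:

```python
def to_float_column(values):
    """Hypothetical stand-in for an element-wise float conversion."""
    return [float(v) for v in values]


def conversion_error(values):
    """Return the message of the ValueError raised by the conversion, or None."""
    try:
        to_float_column(values)
    except ValueError as exc:
        return str(exc)
    return None


# Numeric-looking input converts cleanly, so nothing is reported.
print(conversion_error([1, 2.5, "3"]))
# A non-numeric string fails inside float() itself; no dtype check is involved.
print(conversion_error(["a", "b", "c"]))
```

The second message has the same wording as the ValueError quoted earlier in the thread, which suggests the string error comes from the conversion step rather than from any branch inside cov().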

Hmm my overall point is that we shouldn't be special-casing anything within the implementation here. Does this naturally dispatch to calling a method on the TimedeltaArray or DatetimeArray? I feel like it would be better for those arrays to signal that this method is not supported rather than baking it into the implementation here

@jbrockmendel knows more about this, so he may have some other ideas

> I feel like it would be better for those arrays

That is going to break down in heterogeneous-dtype (or just non-consolidated) cases.

There are a couple bugs here.

The first is that .cov should never work with dt64 or td64 dtypes, regardless of whether they contain NaTs. The unit on the result would be timedelta**2, which isn't a thing. This is the same reason why DatetimeArray and TimedeltaArray .var raises while .std does not.
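The unit argument can be illustrated with the standard library's datetime.timedelta, used here only as a stand-in for td64 values (an assumption for illustration; this is not how pandas computes anything). A variance or covariance needs a product of two durations, which has no representation, while a standard deviation ends up back in plain time units:

```python
import math
from datetime import timedelta

durations = [timedelta(seconds=1), timedelta(seconds=2), timedelta(seconds=6)]

# A squared duration has no type to live in: timedelta * timedelta is rejected.
try:
    durations[0] * durations[1]
    squared_ok = True
except TypeError:
    squared_ok = False
print(squared_ok)

# Working in raw seconds, the sample variance is in seconds**2 ...
secs = [d.total_seconds() for d in durations]
mean = sum(secs) / len(secs)
var_s2 = sum((s - mean) ** 2 for s in secs) / (len(secs) - 1)

# ... but its square root is in seconds again, so it fits in a timedelta.
std = timedelta(seconds=math.sqrt(var_s2))
print(std)
```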

The second bug is not in .cov but in .corr, which should work. The problem is in to_numpy not respecting na_value correctly:

    dti = pd.date_range("2016-01-01", periods=3)
    df = pd.DataFrame(dti)
    df.iloc[0, 0] = pd.NaT
    df.to_numpy(float, na_value=np.nan)

    >>> df.to_numpy(float, na_value=np.nan)
    array([[-9.22337204e+18],
           [ 1.45169280e+18],
           [ 1.45177920e+18]])

The first entry in that result should be np.nan.
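The first value in that array is what the NaT sentinel looks like when it is read as an ordinary number. pandas stores NaT as the minimum int64 value and datetime64[ns] values as nanoseconds since the Unix epoch. The sketch below reproduces the three numbers with the standard library only; NAT_SENTINEL and to_epoch_ns are names made up for this illustration:

```python
from datetime import datetime, timezone

# NaT is stored as the smallest int64 value (constant name is hypothetical).
NAT_SENTINEL = -(2**63)


def to_epoch_ns(dt):
    """Nanoseconds since the Unix epoch for a timezone-aware datetime."""
    return int(dt.timestamp()) * 1_000_000_000


# Casting the raw integers to float gives the values shown in the output above.
nat_as_float = float(NAT_SENTINEL)
jan2 = float(to_epoch_ns(datetime(2016, 1, 2, tzinfo=timezone.utc)))
jan3 = float(to_epoch_ns(datetime(2016, 1, 3, tzinfo=timezone.utc)))

print(f"{nat_as_float:.8e}")
print(f"{jan2:.8e}")
print(f"{jan3:.8e}")
```

Under that reading, the sentinel is being cast to float as if it were a real timestamp instead of being replaced by na_value first.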

fbourgey added 2 commits February 9, 2025 10:31

ENH: Add TypeError for unsupported datetime64 and timedelta64 dtypes …

bda0025

…in DataFrame.cov

TST: Add test for TypeError in DataFrame.cov with NaT and Timedelta i…

de954f7

…nputs

WillAyd reviewed Feb 14, 2025

WillAyd added Datetime Missing-data labels Feb 14, 2025

fbourgey commented Feb 9, 2025

WillAyd Feb 14, 2025

fbourgey Feb 17, 2025

WillAyd Feb 18, 2025

fbourgey Feb 27, 2025

WillAyd Feb 27, 2025

fbourgey Feb 28, 2025 (edited)

WillAyd Mar 4, 2025

jbrockmendel Mar 5, 2025
