Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NA/nan within returned DataFrame via UDF back to Python is parsed to empty string '' instead of NA/nan #1947

Closed
weigao-123 opened this issue Jun 30, 2022 · 5 comments
Milestone

Comments

@weigao-123
Copy link

OS (e.g. Windows 10 or macOS Sierra)

Windows 10

Versions of xlwings, Excel and Python (e.g. 0.11.8, Office 365, Python 3.7)

0.27.7, Office 365, Python 3.7

Describe your issue (incl. Traceback!)

I found that when we return a DataFrame including NA/nan value to Excel using UDF, so that the NA/nan value becomes empty value in the dynamic array in Excel, which is good. But when I try to read this dynamic array back to Python, it turns out that the empty value is parsed into a string with '' instead of NA/nan, which does not make sense. Although I could solve this manually, is there any way that could handle this automatically or it is just a potential issue?

# Your traceback here
Not an error.

Include a minimal code sample to reproduce the issue (and attach a sample workbook if required!)

image

# Your code here
@xw.func
@xw.ret(index=False)
def test_write():
    df = pd.DataFrame({'a': [1, 2], 'b': [3, None]})
    return df


@xw.func
@xw.arg('df', pd.DataFrame, index=False)
def test_read(df, r, c):
    return str(type(df.iloc[int(r)][int(c)]))
@weigao-123 weigao-123 changed the title Failed to parse NA/nan within UDF returned DataFrame Failed to parse NA/nan within returned DataFrame back to Python using UDF Jun 30, 2022
@weigao-123 weigao-123 changed the title Failed to parse NA/nan within returned DataFrame back to Python using UDF NA/nan within returned DataFrame via UDF back to Python is paresed to empty string '' instead of NA/nan Jun 30, 2022
@fzumstein
Copy link
Member

Hi @weigao-123, thanks for reporting! Yes, I agree that this can be improved. What I am unsure about is, if xlwings should just treat truly empty cells and cells with "" as the same, or if there's any implications.

If they should not be treated the same, we could think about adding a strings options, equivalent to the numbers option that we currently have:

@xw.func
@xw.arg('x', numbers=int)
def myfunction(x):
    # all numbers in x arrive as int
    return x

As a workaround for right now, you'd probably be best off to have pandas do the conversion via pd.to_numeric(df, errors='coerce') or something similar.

@weigao-123
Copy link
Author

weigao-123 commented Jul 1, 2022

Hi @fzumstein, thank you! Yes, I think they should not be the same.

However, I found that this is becasue that even though the empty cell shows nothing in this case, the content in that cell is actually ="", and in this sense, xlwings parses it back to Python correctly in my opinion.

So the point is that is there any way that it can be delivered to Excel in the dynamic array correctly from Python side.

e.g. if the df in Python side is: pd.DataFrame({'a': [1, 2], 'b': [3, ""]}), then the "" value should be returned as ="" in Excel;

if the df is: pd.DataFrame({'a': [1, 2], 'b': [3, None]}), then the None value should be returned as empty value in Excel.

The thing I am not sure is that if this is caused by the dynamic array since dynamic array has to be in a square range which may always contain something, so the original empty value is converted to ="" automatically.

@fzumstein
Copy link
Member

@weigao-123 Excel can't handle truly empty cells in UDFs, so "" is currently the only option if a numeric value isn't suitable. There is some discussion at #776 about being able to show errors instead of empty strings, but no solution yet.

@fzumstein fzumstein changed the title NA/nan within returned DataFrame via UDF back to Python is paresed to empty string '' instead of NA/nan NA/nan within returned DataFrame via UDF back to Python is parsed to empty string '' instead of NA/nan Jul 6, 2022
@fzumstein fzumstein added this to the 0.27.11 milestone Jul 6, 2022
@fzumstein
Copy link
Member

Actually, interpreting "" as empty will be fine, so this will be fixed in 0.27.11

@weigao-123
Copy link
Author

Actually, interpreting "" as empty will be fine, so this will be fixed in 0.27.11

Nice, thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants