Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ValueError: when matching dataframe and strictly formatted excel indices and columns #1874

Closed
SSMK-wq opened this issue Mar 22, 2022 · 3 comments

Comments

@SSMK-wq
Copy link

SSMK-wq commented Mar 22, 2022

OS (e.g. Windows 10)

Versions of xlwings, Excel and Python (e.g. 0.24,Excel 2019, Python 3.7)

I already referred SO post https://stackoverflow.com/questions/25822604/pandas-concat-valueerror-buffer-dtype-mismatch-expected-python-object-but-go but still it doesn't help. Hence, posting it here

Am trying to copy values from dataframe to excel sheet based on matching indices and column names.

Sample excel file is attached here - dummy.xlsx

Sample dataframe looks like below

ID,DIV,APP1,APP2,APP3,Col1,Col4
1,A,AB1,ABC1,ABCD1,20,40
2,A,AB1,ABC2,ABCD2,60,
3,B,BC1,BCD1,BCDE1,10,20

region_1 = pd.read_clipboard(sep=',') 

I followed the solution from [https://github.com//issues/1872](issue 1872) but that doesn't seem to work on my real data.

I guess its because my real data is not in neat format as required for the solution suggested in issue 1872. For some reason, when I try this on my real file (with lot of formatting, merged cells etc), it doesn't paste the data correctly. Moreover, I cannot touch the columns and index in excel file (because they are shared file) and it is fixed by the department (cannot modify). So, we should not try to overwrite it. So, I just need read those indexes and columns for criteria matching and just type in the values like how a normal human does it manually.

To be clear, my real data looks like as shown below

image

As you can see that range B1:F3 is merged. My actual columns are from G3:J3 and indices is from B4:F4 (and the values for indices cannot be modified as well). This file is slightly different from dataframe structure

So, I cannot modify/rewrite these elements (because we are not supposed to modify that because it gets that value from other sheet in the excel based on some formula). Only area where I have write access is G5:J100. So, I have to plainly put/type in my numeric value under the appropriate column for matching indices and column names. Please find my code below

rows = sales.sheets[0]['B5:B8'].value
cols = sales.sheets[0]['G3:J3'].value
sales.sheets[0]['G5'].value = region_1.reindex(columns=cols,index=rows)

I get the below error

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double'

If I try the below

    ws = sales.sheets[0] 
    ws.range('G5').options(index=False).value = region_1.astype(str).reset_index(drop=True)

This pastes the data but doesn not do any column or index match check.

While I managed to write something like below, not sure whether it is even follproof or scalable/efficient for big data problems

for val in df_value:
    for a_cell in sales.sheets[0]['B5:B8']:
        if a_cell.value == val:
            rn1 = a_cell.row
            temp = region_1.loc[[val]]
            temp = temp.dropna(axis=1, how='all')
            colu = temp.columns.tolist()
            for col in colu:
                for b_cell in sales.sheets[0]['G3:J3']:
                    if b_cell.value == col:
                        rn2 = b_cell.row
                        data_entry_loc = str(b_cell.address[1]) + str(rn1)
                        enter_val = temp[col].values
                        sales.sheets[0][data_entry_loc].value = enter_val

I expect my output to be like as below

image

@fzumstein - Is this something you/anyone from your team help me with? Am not sure whether this is the right forum to discuss this. I can pay a one time support fee of 50 USD and can also sanction 50 bounty points in SO to help me with this issue (if what I propose is okay with you). or you can let me know your expectations. We have to do this for a large data (and can't imagine doing it manually). So, your help would really be appreciated.

@SSMK-wq SSMK-wq changed the title ValueError: Buffer dtype mismatch, expected 'Python object' but got 'double' ValueError: when matching dataframe and excel indices and columns Mar 22, 2022
@SSMK-wq SSMK-wq changed the title ValueError: when matching dataframe and excel indices and columns ValueError: when matching dataframe and strictly formatted excel indices and columns Mar 22, 2022
@fzumstein
Copy link
Member

Hi @SSMK-wq, the issue tracker is indeed mostly for bug reports and feature requests. If it's an easy question, I am trying to answer it (or maybe someone from the community), however, you can't expected dedicated support here, for that matter, feel free to drop me an email (on my profile page).

You might want to work without DataFrame if you want to keep it as simple as possible. Key is to read in the value into Python, then do the logic, i.e., don't loop through single xlwings range objects directly as that is inefficient. For example, to loop through every cell:

values = sales.sheets[0]['B1:N13'].value

for row_ix, row in enumerate(values):
    for col_ix, value in enumerate(row):
        print(row_ix, col_ix, value)

@SSMK-wq
Copy link
Author

SSMK-wq commented Mar 23, 2022

@fzumstein - will email you.

Regarding the question, I have two objects. One is xlwings object and other is dataframe object (with data). So, when you meant don't use dataframe, do you suggest that I store my input dataframe in some other forms like list, values etc? In your code above what you have shown is for xlwings

@fzumstein
Copy link
Member

Looks like you've found my email, so closing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants