bcpandas: wrong encoding when saving Spanish characters #170

Open
anlagbr opened this issue Dec 29, 2023 · 2 comments

@anlagbr

anlagbr commented Dec 29, 2023

bcpandas writes the pd.DataFrame to a file using the default encoding (UTF-8), and when that file is uploaded through bcp, some Spanish characters are not displayed correctly in the database. (They are displayed correctly in my pd.DataFrame.)
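Mis-displayed accented characters like these are typical of UTF-8 bytes being decoded with a single-byte code page. The snippet below is a minimal illustration, assuming the database interprets the file as Windows-1252 (cp1252), the code page behind collations such as `SQL_Latin1_General_CP1_CI_AS`; the sample string is hypothetical and no bcp call is involved:

```python
# Hypothetical sample text containing Spanish characters.
original = "Año, canción, España"

# Encode as UTF-8 (how the file is written), then decode the same bytes
# as cp1252 (what a Latin1/CP1252 collation would assume).
utf8_bytes = original.encode("utf-8")
misread = utf8_bytes.decode("cp1252")

print(misread)  # AÃ±o, canciÃ³n, EspaÃ±a

# Decoding with the matching encoding round-trips cleanly.
assert utf8_bytes.decode("utf-8") == original
```

Each two-byte UTF-8 sequence turns into two separate characters, which is why the data looks fine in pandas but wrong after loading.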

So far, I have tried passing -C 65001 to the bcp command by modifying the bcpandas source files, but it has not worked. I will post a solution if I find one.

Best.

@vlasvlasvlas

I also had to change the default encoding previously when using bcpandas; it would be great if it could be passed as a parameter.

@anlagbr
Author

anlagbr commented Jan 2, 2024

bcpandas uses a format file created on the fly, so the -C 65001 flag has no effect: the format file takes precedence.

However, you can pass the collation as a parameter to bcpandas.to_sql, which solves the problem. I specified collation="Modern_Spanish_100_CI_AS_SC_UTF8" and that fixed my encoding problem.

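```python
import bcpandas

# df is the pandas DataFrame to upload, table.name the target table name,
# and creds a bcpandas.SqlCreds instance; all are defined elsewhere.
bcpandas.to_sql(
    df,
    table.name,
    creds,
    collation="Modern_Spanish_100_CI_AS_SC_UTF8",  # UTF-8 enabled collation
    encoding="utf-8",  # encoding of the data file bcpandas writes
)
```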
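To see where the collation ends up, here is a minimal sketch of a non-XML bcp format file in which each column line ends with a collation. It assumes, as described above, that the format file generated on the fly is what bcp honors. The helper `build_format_file`, the version string, the delimiter, and the column names are all hypothetical, chosen for illustration; this is not bcpandas's actual implementation:

```python
def build_format_file(columns, delimiter=",",
                      collation="SQL_Latin1_General_CP1_CI_AS",
                      version="14.0"):
    """Hypothetical helper: return the text of a non-XML bcp format file.

    Column line fields: host field order, host data type, prefix length,
    host data length (0 is allowed for delimited fields), terminator,
    server column order, server column name, column collation.
    """
    # Format file version is an assumption; use one your bcp utility accepts.
    lines = [version, str(len(columns))]
    for i, name in enumerate(columns, start=1):
        # The last field is terminated by a line break instead of the delimiter;
        # the format file stores it as the escape sequence \r\n.
        terminator = delimiter if i < len(columns) else "\\r\\n"
        lines.append(f'{i}\tSQLCHAR\t0\t0\t"{terminator}"\t{i}\t{name}\t{collation}')
    return "\n".join(lines) + "\n"


fmt = build_format_file(["id", "nombre"],
                        collation="Modern_Spanish_100_CI_AS_SC_UTF8")
print(fmt)
```

The collation column describes the character data in the data file, so a `_UTF8` collation (available in SQL Server 2019 and later) tells bcp to read that data as UTF-8, which matches the file's encoding.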
