
Unicode text table output from pandas dataframe: turns string objects into numbers and changes their representation #44

Closed
mjb-v9-5-2 opened this issue Jun 22, 2021 · 2 comments


mjb-v9-5-2 commented Jun 22, 2021

This brilliant tool breaks when rendering a pandas table as text.

The data contains very long numbers stored as strings: long decimals, and long integers with thousands separators. In the pandas DataFrame they are stored as objects. When I write the table with the pytablewriter Unicode writer, the long integers are rendered faithfully, but the strings representing long decimals are processed as though they were numbers. They come out as if they had been converted from strings to floats, with all the representation problems floats bring: unwanted trailing zeros after the last significant digit on short decimals, and too little precision to show the whole value on long decimals.

For example:

"0.000000000000001" is represented by pytablewriter as "0.000000"
"0.001" as "0.001000"

Yet long integers with thousands separators are handled correctly:

"1,000,000,000,000" is represented faithfully as "1,000,000,000,000".
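Both symptoms match Python's default fixed-point formatting of a float, where the `f` format uses six decimal places unless a precision is given. The sketch below assumes the decimal strings pass through `float()` somewhere along the way (the thread does not say where), and contrasts that with `decimal.Decimal`, which keeps the digits as written. `via_float` and `via_decimal` are hypothetical helpers for illustration only:

```python
from decimal import Decimal


def via_float(s: str) -> str:
    """Render s as a float would be rendered: six decimal places by default."""
    return format(float(s), "f")


def via_decimal(s: str) -> str:
    """Render s via Decimal, which keeps the original number of decimal places."""
    return format(Decimal(s), "f")


for s in ["0.000000000000001", "0.001"]:
    print(s, "->", via_float(s), "vs", via_decimal(s))
```

The float path yields "0.000000" and "0.001000", the same strings reported above, while the Decimal path returns the input unchanged.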

The problem seems to affect decimals in general. Even though the values are held as strings precisely so that they are rendered as strings and not as numbers, the output table justifies the strings representing integers differently from those representing decimals: the former are left-justified, the latter right-justified. So it seems to treat the decimal strings as numbers, converting them to float and outputting them as floats.

The string "1" is also right-justified, just like the decimals.
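This alignment pattern is consistent with a writer that right-aligns any cell whose text parses as a number and left-aligns everything else. The sketch below is only a hypothesis about that mechanism (`infer_alignment` is a hypothetical helper, not pytablewriter's API). Python's `float()` accepts "1" and "0.001" but rejects "1,000,000,000,000" because of the commas, which would explain why only the comma-separated integers keep string treatment:

```python
def infer_alignment(cell: str) -> str:
    """Hypothetical rule: right-align numeric-looking cells, left-align the rest."""
    try:
        float(cell)  # "1,000,000,000,000" raises ValueError because of the commas
    except ValueError:
        return "left"
    return "right"


for cell in ["1,000,000,000,000", "1", "0.001", "0.000000000000001"]:
    print(f"{cell!r:>22} -> {infer_alignment(cell)}")
```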

thombashi (Owner) commented Jul 18, 2021

@mjb-v9-5-2
Thank you for your feedback.

The problems that you described are fixed for some values in pytablewriter 0.62.0:

```python
import pandas as pd
import pytablewriter as ptw

writer = ptw.UnicodeTableWriter(
    dataframe=pd.DataFrame(
        {"realnumber": ["0.000000000000001", "0.000000000000002"], "long": ["1,000,000,000,000", "1"]}
    ),
    margin=1,
    column_styles=[
        ptw.style.Style(thousand_separator=","),
        ptw.style.Style(thousand_separator=","),
    ],
)
writer.write_table()
```

```
┌───────────────────┬───────────────────┐
│    realnumber     │       long        │
├───────────────────┼───────────────────┤
│ 0.000000000000001 │ 1,000,000,000,000 │
├───────────────────┼───────────────────┤
│ 0.000000000000002 │                 1 │
└───────────────────┴───────────────────┘
```

However, when a column mixes values with different numbers of decimal places, the problem persists:

```python
import pandas as pd
import pytablewriter as ptw

writer = ptw.UnicodeTableWriter(
    dataframe=pd.DataFrame(
        {"realnumber": ["0.000000000000001", "0.1"], "long": ["1,000,000,000,000", "1"]}
    ),
    margin=1,
    column_styles=[
        ptw.style.Style(thousand_separator=","),
        ptw.style.Style(thousand_separator=","),
    ],
)
writer.write_table()
```

```
┌─────────────┬───────────────────┐
│ realnumber  │       long        │
├─────────────┼───────────────────┤
│ 0.000000000 │ 1,000,000,000,000 │
├─────────────┼───────────────────┤
│ 0.100000000 │                 1 │
└─────────────┴───────────────────┘
```
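The truncated and padded values in the realnumber column are what you get when every value in a column is formatted with one shared precision. The output shows nine decimal places; how pytablewriter arrives at that number is not stated in the thread, so the sketch below simply takes 9 as given (`format_column` is a hypothetical helper):

```python
def format_column(cells: list[str], precision: int) -> list[str]:
    """Format every cell as a float with the same number of decimal places."""
    return [f"{float(c):.{precision}f}" for c in cells]


# Reproduces the realnumber column above: the long value is cut off, the short one padded
print(format_column(["0.000000000000001", "0.1"], precision=9))
```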

I will fix this in a future version.

thombashi (Owner) commented

The problem is fixed in pytablewriter 0.63.0.
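For comparison, the behavior requested in the issue, where each cell keeps its own decimal places while numeric cells stay right-aligned, can be sketched with `decimal.Decimal`. This is an illustration under that assumption, not the change that shipped in 0.63.0 (`render_numeric_column` is a hypothetical helper):

```python
from decimal import Decimal, InvalidOperation


def render_numeric_column(cells: list[str]) -> list[str]:
    """Render decimal strings at their own precision, right-aligned to a common width."""
    rendered = []
    for cell in cells:
        try:
            # Decimal keeps the digits as written; "f" avoids "1E-15"-style notation
            rendered.append(format(Decimal(cell), "f"))
        except InvalidOperation:
            # Not a plain decimal (e.g. "1,000,000,000,000"): keep the text unchanged
            rendered.append(cell)
    width = max((len(text) for text in rendered), default=0)
    return [text.rjust(width) for text in rendered]


for line in render_numeric_column(["0.000000000000001", "0.1"]):
    print(f"|{line}|")
```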
