BUG: Make sure that sas7bdat parsers memory is initialized to 0 (pandas-dev#21616)

Memory for numbers in sas7bdat parsing was not initialized to 0.
For sas7bdat files with numbers stored in fewer than 8 bytes, this left the
least significant part of each number essentially random.
Fix it by allocating the buffer with zeros.
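
To illustrate the failure mode described above, here is a minimal standard-library sketch, not the pandas parser itself. It assumes a little-endian double whose stored form keeps only its most significant bytes; the helper names `truncate_double` and `expand_double`, the value 3504.0, and the width of 3 bytes are all hypothetical choices for the example. A plain `bytes` slot stands in for the NumPy buffer: all zeros mimics `np.zeros`, and a slot filled with `0xff` mimics leftover memory.

```python
import struct


def truncate_double(value, width):
    """Keep only the `width` most significant bytes of a little-endian double."""
    raw = struct.pack("<d", value)  # 8 bytes, least significant byte first
    return raw[8 - width:]          # the most significant bytes sit at the end


def expand_double(stored, slot):
    """Copy the stored bytes into the high end of an 8-byte slot and decode it."""
    slot = bytearray(slot)           # work on a copy of the pre-existing slot
    slot[8 - len(stored):] = stored  # low-order bytes keep their old contents
    return struct.unpack("<d", bytes(slot))[0]


stored = truncate_double(3504.0, 3)

zeroed = expand_double(stored, bytes(8))    # slot pre-filled with zeros
dirty = expand_double(stored, b"\xff" * 8)  # slot holding arbitrary old bytes

print(zeroed)
print(dirty)
```

With the zeroed slot the integral value survives unchanged, while the dirty slot yields a nearby non-integral number. This is the same property the regression test in this commit checks by comparing a column against its rounded self.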
troels committed Sep 11, 2018
1 parent 0976e12 commit c2219c7
Showing 4 changed files with 13 additions and 2 deletions.
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.24.0.txt
@@ -734,7 +734,7 @@ I/O
 - :func:`read_html()` no longer ignores all-whitespace ``<tr>`` within ``<thead>`` when considering the ``skiprows`` and ``header`` arguments. Previously, users had to decrease their ``header`` and ``skiprows`` values on such tables to work around the issue. (:issue:`21641`)
 - :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
 - :func:`read_csv()` will correctly parse timezone-aware datetimes (:issue:`22256`)
--
+- :func:`read_sas()` will parse numbers in sas7bdat-files that have width less than 8 bytes correctly. (:issue:`21616`)

 Plotting
 ^^^^^^^^
2 changes: 1 addition & 1 deletion pandas/io/sas/sas7bdat.py
@@ -614,7 +614,7 @@ def read(self, nrows=None):
         ns = (self.column_types == b's').sum()

         self._string_chunk = np.empty((ns, nrows), dtype=np.object)
-        self._byte_chunk = np.empty((nd, 8 * nrows), dtype=np.uint8)
+        self._byte_chunk = np.zeros((nd, 8 * nrows), dtype=np.uint8)

         self._current_row_in_chunk_index = 0
         p = Parser(self)
Binary file added pandas/tests/io/sas/data/cars.sas7bdat
Binary file not shown.
11 changes: 11 additions & 0 deletions pandas/tests/io/sas/test_sas7bdat.py
@@ -183,6 +183,17 @@ def test_date_time(datapath):
     tm.assert_frame_equal(df, df0)


+def test_compact_numerical_values(datapath):
+    # Regression test for #21616
+    fname = datapath("io", "sas", "data", "cars.sas7bdat")
+    df = pd.read_sas(fname, encoding='latin-1')
+    # The two columns CYL and WGT in cars.sas7bdat have column
+    # width < 8 and only contains integral values. Test
+    # that pandas doesn't corrupt the less significant bits.
+    tm.assert_series_equal(df['WGT'], df['WGT'].round(), check_exact=True)
+    tm.assert_series_equal(df['CYL'], df['CYL'].round(), check_exact=True)
+
+
 def test_zero_variables(datapath):
     # Check if the SAS file has zero variables (PR #18184)
     fname = datapath("io", "sas", "data", "zero_variables.sas7bdat")
