Merge remote-tracking branch 'upstream/master' into to_html-to_string
* upstream/master:
  BUG: Casting tz-aware DatetimeIndex to object-dtype ndarray/Index (pandas-dev#23524)
  BUG: Delegate more of Excel parsing to CSV (pandas-dev#23544)
  API: DataFrame.__getitem__ returns Series for sparse column (pandas-dev#23561)
  CLN: use float64_t consistently instead of double, double_t (pandas-dev#23583)
  DOC: Fix Order of parameters in docstrings (pandas-dev#23611)
  TST: Unskip some Categorical Tests (pandas-dev#23613)
  TST: Fix integer ops comparison test (pandas-dev#23619)
  DOC: Fixes to docstring to add validation to CI (pandas-dev#23560)
  DOC: Remove incorrect periods at the end of parameter types (pandas-dev#23600)
  MAINT: tm.assert_raises_regex --> pytest.raises (pandas-dev#23592)
  DOC: Updating Series.resample and DataFrame.resample docstrings (pandas-dev#23197)
thoo committed Nov 11, 2018
2 parents b87dc8c + 58a59bd commit e19bf6f
Showing 273 changed files with 3,327 additions and 3,124 deletions.
2 changes: 1 addition & 1 deletion ci/code_checks.sh
@@ -151,7 +151,7 @@ if [[ -z "$CHECK" || "$CHECK" == "doctests" ]]; then

MSG='Doctests generic.py' ; echo $MSG
pytest -q --doctest-modules pandas/core/generic.py \
-k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -resample -to_json -transpose -values -xs"
-k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -to_json -transpose -values -xs"
RET=$(($RET + $?)) ; echo $MSG "DONE"

MSG='Doctests top-level reshaping functions' ; echo $MSG
29 changes: 28 additions & 1 deletion doc/source/io.rst
@@ -2861,7 +2861,13 @@ to be parsed.
read_excel('path_to_file.xls', 'Sheet1', usecols=2)
If `usecols` is a list of integers, then it is assumed to be the file column
You can also specify a comma-delimited set of Excel columns and ranges as a string:

.. code-block:: python

    read_excel('path_to_file.xls', 'Sheet1', usecols='A,C:E')

If ``usecols`` is a list of integers, then it is assumed to be the file column
indices to be parsed.

.. code-block:: python
@@ -2870,6 +2876,27 @@ indices to be parsed.
Element order is ignored, so ``usecols=[0, 1]`` is the same as ``[1, 0]``.

.. versionadded:: 0.24

If ``usecols`` is a list of strings, it is assumed that each string corresponds
to a column name provided either by the user in ``names`` or inferred from the
document header row(s). Those strings define which columns will be parsed:

.. code-block:: python

    read_excel('path_to_file.xls', 'Sheet1', usecols=['foo', 'bar'])

Element order is ignored, so ``usecols=['baz', 'joe']`` is the same as ``['joe', 'baz']``.

.. versionadded:: 0.24

If ``usecols`` is callable, it will be evaluated against the column names,
and only the columns for which it returns ``True`` will be parsed:

.. code-block:: python

    read_excel('path_to_file.xls', 'Sheet1', usecols=lambda x: x.isalpha())
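
For a self-contained illustration, here is a minimal sketch showing that the
string, list-of-names, and callable forms can select the same columns. The
file ``example.xlsx`` and its columns ``foo``/``bar``/``baz`` are hypothetical,
and an Excel writer engine (e.g. ``openpyxl``) is assumed to be installed:

.. code-block:: python

    import pandas as pd

    # Write a small file to parse; 'foo' and 'bar' land in Excel
    # columns A and B because index=False drops the index column.
    df = pd.DataFrame({'foo': [1, 2], 'bar': [3, 4], 'baz': [5, 6]})
    df.to_excel('example.xlsx', sheet_name='Sheet1', index=False)

    # Three equivalent ways to parse only 'foo' and 'bar':
    by_range = pd.read_excel('example.xlsx', 'Sheet1', usecols='A,B')
    by_names = pd.read_excel('example.xlsx', 'Sheet1', usecols=['foo', 'bar'])
    by_callable = pd.read_excel('example.xlsx', 'Sheet1',
                                usecols=lambda x: x in ('foo', 'bar'))

    assert list(by_range) == list(by_names) == list(by_callable) == ['foo', 'bar']
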
Parsing Dates
+++++++++++++

8 changes: 8 additions & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -239,6 +239,7 @@ Other Enhancements
- Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
- :func:`~DataFrame.to_parquet` now supports writing a ``DataFrame`` as a directory of parquet files partitioned by a subset of the columns when ``engine = 'pyarrow'`` (:issue:`23283`)
- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` and the sketch below (:issue:`8917`)
- :meth:`read_excel()` now accepts ``usecols`` as a list of column names or callable (:issue:`18273`)
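
As a quick illustration of the new ``nonexistent`` argument, here is a minimal
sketch assuming the 0.24 API (accepted values ``'raise'``, ``'NaT'``, and
``'shift'``):

.. code-block:: python

    import pandas as pd

    # 02:30 on 2018-03-11 does not exist in US/Eastern (DST spring-forward).
    ts = pd.Timestamp('2018-03-11 02:30:00')

    ts.tz_localize('US/Eastern', nonexistent='NaT')    # returns NaT
    ts.tz_localize('US/Eastern', nonexistent='shift')  # shifts to 03:00
    # nonexistent='raise' (the default) raises a NonExistentTimeError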

.. _whatsnew_0240.api_breaking:

@@ -563,6 +564,7 @@ changes were made:
- The result of concatenating a mix of sparse and dense Series is a Series with sparse values, rather than a ``SparseSeries``.
- ``SparseDataFrame.combine`` and ``DataFrame.combine_first`` no longer support combining a sparse column with a dense column while preserving the sparse subtype. The result will be an object-dtype SparseArray.
- Setting :attr:`SparseArray.fill_value` to a fill value with a different dtype is now allowed.
- ``DataFrame[column]`` is now a :class:`Series` with sparse values, rather than a :class:`SparseSeries`, when slicing a single column with sparse values (:issue:`23559`); see the sketch below.
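
A minimal sketch of the new single-column behavior (0.24 API assumed):

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame({'A': pd.SparseArray([0, 1, 0])})
    type(df['A'])  # pandas.core.series.Series backed by sparse values,
                   # previously a SparseSeries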

Some new warnings are issued for operations that require or are likely to materialize a large dense array:

@@ -1128,6 +1130,9 @@ Datetimelike
- Bug in :class:`PeriodIndex` with attribute ``freq.n`` greater than 1 where adding a :class:`DateOffset` object would return incorrect results (:issue:`23215`)
- Bug in :class:`Series` that interpreted string indices as lists of characters when setting datetimelike values (:issue:`23451`)
- Bug in :class:`Timestamp` constructor which would drop the frequency of an input :class:`Timestamp` (:issue:`22311`)
- Bug in :class:`DatetimeIndex` where calling ``np.array(dtindex, dtype=object)`` would incorrectly return an array of ``long`` objects (:issue:`23524`)
- Bug in :class:`Index` where passing a timezone-aware :class:`DatetimeIndex` and ``dtype=object`` would incorrectly raise a ``ValueError`` (:issue:`23524`)
- Bug in :class:`Index` where calling ``np.array(dtindex, dtype=object)`` on a timezone-naive :class:`DatetimeIndex` would return an array of ``datetime`` objects instead of :class:`Timestamp` objects, potentially losing nanosecond portions of the timestamps (:issue:`23524`); see the sketch below
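
The fixed conversions can be sketched as follows (0.24 API assumed):

.. code-block:: python

    import numpy as np
    import pandas as pd

    dti = pd.date_range('2018-11-11', periods=2, tz='US/Eastern')

    arr = np.array(dti, dtype=object)
    type(arr[0])                 # tz-aware Timestamp, not ``long``
    pd.Index(dti, dtype=object)  # object-dtype Index, no ValueError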

Timedelta
^^^^^^^^^
@@ -1174,6 +1179,7 @@ Offsets
- Bug in :class:`FY5253` where date offsets could incorrectly raise an ``AssertionError`` in arithmetic operations (:issue:`14774`)
- Bug in :class:`DateOffset` where keyword arguments ``week`` and ``milliseconds`` were accepted and ignored. Passing these will now raise ``ValueError`` (:issue:`19398`)
- Bug in adding :class:`DateOffset` with :class:`DataFrame` or :class:`PeriodIndex` incorrectly raising ``TypeError`` (:issue:`23215`)
- Bug in comparing :class:`DateOffset` objects with non-DateOffset objects, particularly strings, raising ``ValueError`` instead of returning ``False`` for equality checks and ``True`` for not-equal checks (:issue:`23524`); see the sketch below
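
A minimal sketch of the fixed comparison behavior (0.24 API assumed):

.. code-block:: python

    import pandas as pd

    off = pd.DateOffset(days=1)
    off == 'not-an-offset'  # now False instead of raising ValueError
    off != 'not-an-offset'  # now True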

Numeric
^^^^^^^
@@ -1301,6 +1307,8 @@ Notice how we now output ``np.nan`` itself instead of a stringified form
- Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
- Bug in :meth:`read_csv()` in which :class:`MultiIndex` index names were being improperly handled in the cases when they were not provided (:issue:`23484`)
- Bug in :meth:`read_html()` in which the error message was not displaying the valid flavors when an invalid one was provided (:issue:`23549`)
- Bug in :meth:`read_excel()` in which ``index_col=None`` was not being respected and index columns were still being parsed (:issue:`20480`)
- Bug in :meth:`read_excel()` in which ``usecols`` was not being validated for proper column names when passed in as a string (:issue:`20480`)

Plotting
^^^^^^^^
3 changes: 0 additions & 3 deletions pandas/_libs/algos.pxd
@@ -1,9 +1,6 @@
from util cimport numeric


cpdef numeric kth_smallest(numeric[:] a, Py_ssize_t k) nogil


cdef inline Py_ssize_t swap(numeric *a, numeric *b) nogil:
cdef:
numeric t
18 changes: 8 additions & 10 deletions pandas/_libs/algos.pyx
@@ -15,8 +15,7 @@ from numpy cimport (ndarray,
NPY_FLOAT32, NPY_FLOAT64,
NPY_OBJECT,
int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t,
uint32_t, uint64_t, float32_t, float64_t,
double_t)
uint32_t, uint64_t, float32_t, float64_t)
cnp.import_array()


@@ -32,10 +31,9 @@ import missing

cdef float64_t FP_ERR = 1e-13

cdef double NaN = <double>np.NaN
cdef double nan = NaN
cdef float64_t NaN = <float64_t>np.NaN

cdef int64_t iNaT = get_nat()
cdef int64_t NPY_NAT = get_nat()

tiebreakers = {
'average': TIEBREAK_AVERAGE,
@@ -199,7 +197,7 @@ def groupsort_indexer(ndarray[int64_t] index, Py_ssize_t ngroups):

@cython.boundscheck(False)
@cython.wraparound(False)
cpdef numeric kth_smallest(numeric[:] a, Py_ssize_t k) nogil:
def kth_smallest(numeric[:] a, Py_ssize_t k) -> numeric:
cdef:
Py_ssize_t i, j, l, m, n = a.shape[0]
numeric x
@@ -812,23 +810,23 @@ def is_monotonic(ndarray[algos_t, ndim=1] arr, bint timelike):
n = len(arr)

if n == 1:
if arr[0] != arr[0] or (timelike and <int64_t>arr[0] == iNaT):
if arr[0] != arr[0] or (timelike and <int64_t>arr[0] == NPY_NAT):
# single value is NaN
return False, False, True
else:
return True, True, True
elif n < 2:
return True, True, True

if timelike and <int64_t>arr[0] == iNaT:
if timelike and <int64_t>arr[0] == NPY_NAT:
return False, False, True

if algos_t is not object:
with nogil:
prev = arr[0]
for i in range(1, n):
cur = arr[i]
if timelike and <int64_t>cur == iNaT:
if timelike and <int64_t>cur == NPY_NAT:
is_monotonic_inc = 0
is_monotonic_dec = 0
break
@@ -853,7 +851,7 @@ def is_monotonic(ndarray[algos_t, ndim=1] arr, bint timelike):
prev = arr[0]
for i in range(1, n):
cur = arr[i]
if timelike and <int64_t>cur == iNaT:
if timelike and <int64_t>cur == NPY_NAT:
is_monotonic_inc = 0
is_monotonic_dec = 0
break
4 changes: 2 additions & 2 deletions pandas/_libs/algos_common_helper.pxi.in
@@ -84,9 +84,9 @@ def put2d_{{name}}_{{dest_name}}(ndarray[{{c_type}}, ndim=2, cast=True] values,

{{endfor}}

#----------------------------------------------------------------------
# ----------------------------------------------------------------------
# ensure_dtype
#----------------------------------------------------------------------
# ----------------------------------------------------------------------

cdef int PLATFORM_INT = (<ndarray>np.arange(0, dtype=np.intp)).descr.type_num

10 changes: 5 additions & 5 deletions pandas/_libs/algos_rank_helper.pxi.in
@@ -74,9 +74,9 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average',
{{elif dtype == 'float64'}}
mask = np.isnan(values)
{{elif dtype == 'int64'}}
mask = values == iNaT
mask = values == NPY_NAT

# create copy in case of iNaT
# create copy in case of NPY_NAT
# values are mutated inplace
if mask.any():
values = values.copy()
@@ -149,7 +149,7 @@ def rank_1d_{{dtype}}(object in_arr, ties_method='average',
{{if dtype != 'uint64'}}
isnan = sorted_mask[i]
if isnan and keep_na:
ranks[argsorted[i]] = nan
ranks[argsorted[i]] = NaN
continue
{{endif}}

@@ -257,7 +257,7 @@ def rank_2d_{{dtype}}(object in_arr, axis=0, ties_method='average',
{{elif dtype == 'float64'}}
mask = np.isnan(values)
{{elif dtype == 'int64'}}
mask = values == iNaT
mask = values == NPY_NAT
{{endif}}

np.putmask(values, mask, nan_value)
@@ -317,7 +317,7 @@ def rank_2d_{{dtype}}(object in_arr, axis=0, ties_method='average',
{{else}}
if (val == nan_value) and keep_na:
{{endif}}
ranks[i, argsorted[i, j]] = nan
ranks[i, argsorted[i, j]] = NaN

{{if dtype == 'object'}}
infs += 1
4 changes: 2 additions & 2 deletions pandas/_libs/algos_take_helper.pxi.in
@@ -4,9 +4,9 @@ Template for each `dtype` helper function for take
WARNING: DO NOT edit .pxi FILE directly, .pxi is generated from .pxi.in
"""

#----------------------------------------------------------------------
# ----------------------------------------------------------------------
# take_1d, take_2d
#----------------------------------------------------------------------
# ----------------------------------------------------------------------

{{py:

34 changes: 16 additions & 18 deletions pandas/_libs/groupby.pyx
@@ -1,14 +1,13 @@
# -*- coding: utf-8 -*-

cimport cython
from cython cimport Py_ssize_t
import cython
from cython import Py_ssize_t

from libc.stdlib cimport malloc, free

import numpy as np
cimport numpy as cnp
from numpy cimport (ndarray,
double_t,
int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t,
uint32_t, uint64_t, float32_t, float64_t)
cnp.import_array()
@@ -20,10 +19,9 @@ from algos cimport (swap, TiebreakEnumType, TIEBREAK_AVERAGE, TIEBREAK_MIN,
TIEBREAK_MAX, TIEBREAK_FIRST, TIEBREAK_DENSE)
from algos import take_2d_axis1_float64_float64, groupsort_indexer, tiebreakers

cdef int64_t iNaT = get_nat()
cdef int64_t NPY_NAT = get_nat()

cdef double NaN = <double>np.NaN
cdef double nan = NaN
cdef float64_t NaN = <float64_t>np.NaN


cdef inline float64_t median_linear(float64_t* a, int n) nogil:
@@ -67,13 +65,13 @@ cdef inline float64_t median_linear(float64_t* a, int n) nogil:
return result


# TODO: Is this redundant with algos.kth_smallest?
# TODO: Is this redundant with algos.kth_smallest
cdef inline float64_t kth_smallest_c(float64_t* a,
Py_ssize_t k,
Py_ssize_t n) nogil:
cdef:
Py_ssize_t i, j, l, m
double_t x, t
float64_t x, t

l = 0
m = n - 1
@@ -109,7 +107,7 @@ def group_median_float64(ndarray[float64_t, ndim=2] out,
cdef:
Py_ssize_t i, j, N, K, ngroups, size
ndarray[int64_t] _counts
ndarray data
ndarray[float64_t, ndim=2] data
float64_t* ptr

assert min_count == -1, "'min_count' only used in add and prod"
@@ -139,8 +137,8 @@
@cython.boundscheck(False)
@cython.wraparound(False)
def group_cumprod_float64(float64_t[:, :] out,
float64_t[:, :] values,
int64_t[:] labels,
const float64_t[:, :] values,
const int64_t[:] labels,
bint is_datetimelike,
bint skipna=True):
"""
@@ -177,7 +175,7 @@ def group_cumprod_float64(float64_t[:, :] out,
@cython.wraparound(False)
def group_cumsum(numeric[:, :] out,
numeric[:, :] values,
int64_t[:] labels,
const int64_t[:] labels,
is_datetimelike,
bint skipna=True):
"""
@@ -217,7 +215,7 @@

@cython.boundscheck(False)
@cython.wraparound(False)
def group_shift_indexer(ndarray[int64_t] out, ndarray[int64_t] labels,
def group_shift_indexer(int64_t[:] out, const int64_t[:] labels,
int ngroups, int periods):
cdef:
Py_ssize_t N, i, j, ii
@@ -291,7 +289,7 @@ def group_fillna_indexer(ndarray[int64_t] out, ndarray[int64_t] labels,
"""
cdef:
Py_ssize_t i, N
ndarray[int64_t] sorted_labels
int64_t[:] sorted_labels
int64_t idx, curr_fill_idx=-1, filled_vals=0

N = len(out)
@@ -327,10 +325,10 @@ def group_fillna_indexer(ndarray[int64_t] out, ndarray[int64_t] labels,

@cython.boundscheck(False)
@cython.wraparound(False)
def group_any_all(ndarray[uint8_t] out,
ndarray[int64_t] labels,
ndarray[uint8_t] values,
ndarray[uint8_t] mask,
def group_any_all(uint8_t[:] out,
const int64_t[:] labels,
const uint8_t[:] values,
const uint8_t[:] mask,
object val_test,
bint skipna):
"""Aggregated boolean values to show truthfulness of group elements