
API: Series.str-accessor infers dtype (and Index.str does not raise on all-NA) (pandas-dev#23167)
h-vetinari authored and vaibhavhrt committed Jun 6, 2019
1 parent 210e2dc commit a2f9013
Showing 4 changed files with 233 additions and 79 deletions.
10 changes: 10 additions & 0 deletions doc/source/user_guide/text.rst
@@ -70,6 +70,16 @@ and replacing any remaining whitespaces with underscores:
``.str`` methods which operate on elements of type ``list`` are not available on such a
``Series``.

.. _text.warn_types:

.. warning::

    Before v0.25.0, the ``.str``-accessor performed only rudimentary type checks. Starting with
    v0.25.0, the type of the data in the Series is inferred and the allowed types (i.e. strings) are enforced more rigorously.

    Generally speaking, the ``.str`` accessor is intended to work only on strings. With very few
    exceptions, other uses are not supported and may be disabled in a future version.
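
A minimal sketch of the type inference described in the warning, assuming pandas 0.25 or later. It uses the public helper ``pandas.api.types.infer_dtype``; we assume, rather than quote from the pandas source, that it reports the same category the accessor checks internally. The variable names are illustrative only.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Python str values: inferred as "string", so .str methods apply.
words = pd.Series(["a", "ba", "cba"])
words_kind = infer_dtype(words)
upper = words.str.upper()

# Integer values: inferred as "integer"; merely accessing .str raises.
numbers = pd.Series([1, 2, 3])
numbers_kind = infer_dtype(numbers)
accessor_error = None
try:
    _ = numbers.str
except AttributeError as err:
    accessor_error = err
```

Here ``upper`` holds ``'A'``, ``'BA'`` and ``'CBA'``, while ``accessor_error`` holds the ``AttributeError`` raised for the integer Series.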


Splitting and Replacing Strings
-------------------------------
40 changes: 38 additions & 2 deletions doc/source/whatsnew/v0.25.0.rst
@@ -231,6 +231,43 @@ returned if all the columns were dummy encoded, and a :class:`DataFrame` otherwi
Providing any ``SparseSeries`` or ``SparseDataFrame`` to :func:`concat` will
cause a ``SparseSeries`` or ``SparseDataFrame`` to be returned, as before.

The ``.str``-accessor performs stricter type checks
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Due to the lack of more fine-grained dtypes, :attr:`Series.str` previously only checked whether the data was
of ``object`` dtype. :attr:`Series.str` now infers the dtype of the data *within* the Series; in particular,
calling a method on ``bytes``-only data raises an exception, except for :meth:`Series.str.decode`, :meth:`Series.str.get`,
:meth:`Series.str.len` and :meth:`Series.str.slice` (see :issue:`23163`, :issue:`23011`, :issue:`23551`).
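
A minimal sketch of handling ``bytes``-only data under the stricter checks, assuming pandas 0.25 or later: a method that is not exempted raises ``TypeError``, whereas :meth:`Series.str.decode`, one of the exempted methods, converts the values to ``str`` so the rest of the accessor becomes usable again. The ``"utf-8"`` encoding is an assumption about the data.

```python
import numpy as np
import pandas as pd

# bytes-only data stored in an object-dtype Series
s = pd.Series(np.array(["a", "ba", "cba"], "S"), dtype=object)

# startswith is not exempted, so it is refused for inferred dtype "bytes"
startswith_error = None
try:
    s.str.startswith(b"a")
except TypeError as err:
    startswith_error = err

# decode is exempted: convert to str first, then use any .str method
decoded = s.str.decode("utf-8")
starts = decoded.str.startswith("a")
```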

*Previous Behaviour*:

.. code-block:: python

    In [1]: s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)

    In [2]: s
    Out[2]:
    0      b'a'
    1     b'ba'
    2    b'cba'
    dtype: object

    In [3]: s.str.startswith(b'a')
    Out[3]:
    0     True
    1    False
    2    False
    dtype: bool

*New Behaviour*:

.. ipython:: python
    :okexcept:

    s = pd.Series(np.array(['a', 'ba', 'cba'], 'S'), dtype=object)
    s
    s.str.startswith(b'a')

.. _whatsnew_0250.api_breaking.incompatible_index_unions:

Incompatible Index Type Unions
@@ -331,7 +368,6 @@ This change is backward compatible for direct usage of Pandas, but if you subcla
Pandas objects *and* give your subclasses specific ``__str__``/``__repr__`` methods,
you may have to adjust your ``__str__``/``__repr__`` methods (:issue:`26495`).


.. _whatsnew_0250.api_breaking.deps:

Increased minimum versions for dependencies
@@ -537,7 +573,7 @@ Conversion
Strings
^^^^^^^

-
- Bug where the ``__name__`` attribute of several :attr:`Series.str` methods was set incorrectly (:issue:`23551`)
-
-
