Strings behave a little strangely in NumPy code because NumPy needs to know how many bytes to expect for each element, which isn't usually a factor in Python programming. Luckily, NumPy does a pretty good job of taking care of the simpler cases for you:

In [1]:
import numpy as np

In [2]:
names = np.array(["bob", "amy", "han"], dtype=str)

In [3]:
names

array(['bob', 'amy', 'han'], dtype='<U3')

In [4]:
names.itemsize

12

In [5]:
names = np.array(["bob", "amy", "han"])

In [6]:
names

array(['bob', 'amy', 'han'], dtype='<U3')

In [8]:
more_names = np.array(["bobo", "jehosephat"])

In [9]:
np.concatenate((names, more_names))

array(['bob', 'amy', 'han', 'bobo', 'jehosephat'], dtype='<U10')

In input 2, you provide a dtype of Python's built-in str type, but in output 3, it's been converted into a little-endian Unicode string of size 3. When you check the size of a given item in input 4, you see that they're each 12 bytes: three 4-byte Unicode characters.
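The arithmetic behind that 12 can be checked directly. This is only a small sketch: it assumes the same three names as above, and the variable `chars_per_item` is introduced here purely for illustration.

```python
import numpy as np

names = np.array(["bob", "amy", "han"])

# NumPy stores each Unicode character in 4 bytes (UTF-32),
# so the per-item capacity in characters is itemsize // 4.
chars_per_item = names.itemsize // 4

print(names.itemsize)   # 12 bytes per element
print(chars_per_item)   # 3 characters per element
print(names.nbytes)     # 36 bytes for the whole array (3 elements * 12 bytes)
```

The byte-order prefix in the dtype string (`<` or `>`) depends on the machine, but the sizes should be the same either way.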

**Note**: When dealing with NumPy data types, you have to think about things like the endianness of your values. In this case, the dtype '<U3' means that each value is the size of three Unicode characters, with the least-significant byte stored first in memory and the most-significant byte stored last. A dtype of '>U3' would signify the reverse.

As an example, NumPy represents the Unicode character "🐍" (code point U+1F40D) with the bytes 0x0D 0xF4 0x01 0x00 with a dtype of '<U1' and 0x00 0x01 0xF4 0x0D with a dtype of '>U1'. Try it out by creating an array full of emoji, setting the dtype to one or the other, and then calling .tobytes() on your array!
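Here is one way to try that experiment, as a minimal sketch. The names `snake`, `little`, and `big` are made up for this example, and the emoji is written as an escape sequence so the snippet survives any editor encoding.

```python
import numpy as np

snake = "\U0001F40D"  # the snake emoji, code point U+1F40D

# Same character, stored with opposite byte orders.
little = np.array([snake], dtype="<U1")
big = np.array([snake], dtype=">U1")

# Each element is one 4-byte UTF-32 code unit.
print(little.tobytes().hex(" "))  # 0d f4 01 00
print(big.tobytes().hex(" "))     # 00 01 f4 0d
```

Both arrays compare equal element by element. Only the order of the bytes in memory differs.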

If you'd like to study up on how Python treats the ones and zeros of your normal Python data types, then the official documentation for struct, a standard library module that works with raw bytes, is a good resource.

When you combine names with an array that has a larger item to create a new array in input 9, NumPy helpfully figures out how big the new array's items need to be and grows them all to size <U10.
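The sketch below repeats the concatenation and compares dtypes before and after. The name `combined` is introduced for illustration, and `np.promote_types` is used only as a way to ask NumPy which dtype can hold both inputs.

```python
import numpy as np

names = np.array(["bob", "amy", "han"])
more_names = np.array(["bobo", "jehosephat"])

# concatenate() builds a new array wide enough for the longest string.
combined = np.concatenate((names, more_names))

print(combined.dtype.itemsize // 4)  # 10 characters per element
print(names.dtype.itemsize // 4)     # 3: the original array is unchanged

# The common dtype can also be queried without building the array.
print(np.promote_types(names.dtype, more_names.dtype) == combined.dtype)  # True
```

Note that the widening happens only because a new array is created. The input arrays keep their original, narrower dtypes.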

But here's what happens when you try to modify one of the slots with a value larger than the capacity of the dtype:

In [10]:
names[2] = "jamima"

In [11]:
names

array(['bob', 'amy', 'jam'], dtype='<U3')

It doesn't work as expected: NumPy silently truncates your value instead of raising an error. If you already have an array, then NumPy's automatic size detection won't work for you. You get three characters and that's it. The rest get lost in the void.
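One possible way around the truncation is to widen the array before assigning. This is a sketch rather than the only approach: `new_name` is a name chosen for this example, and `astype()` returns a copy with the wider dtype instead of resizing the array in place.

```python
import numpy as np

names = np.array(["bob", "amy", "han"])
new_name = "jamima"

# Capacity in characters is itemsize // 4, because each character takes 4 bytes.
if len(new_name) > names.dtype.itemsize // 4:
    # Make a wider copy so the new value fits without truncation.
    names = names.astype(f"U{len(new_name)}")

names[2] = new_name
print(names)  # ['bob' 'amy' 'jamima']
```

Because the wider array is a copy, any other references to the original array still point at the old data.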

This is all to say that, in general, NumPy has your back when you're working with strings, but you should always keep an eye on the size of your elements and make sure you have enough space when modifying or changing arrays in place.

https://realpython.com/numpy-tutorial/