<a href="https://colab.research.google.com/github/ubsuny/PHY386/blob/main/2025/handson/LogicalAndBitwiseOperators.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Logical vs. Bitwise Operators in Pandas Boolean Indexing

Based on: <https://stackoverflow.com/questions/21415661/logical-operators-for-boolean-indexing-in-pandas>

This notebook illustrates the crucial difference between Python's built-in logical operators (`and`, `or`, `not`) and the bitwise operators (`&`, `|`, `~`), which Pandas overloads to work element-wise, when they are used for boolean indexing.

### Why the Difference Matters

Pandas uses NumPy arrays under the hood for its Series and DataFrames. When you create a boolean mask for indexing (e.g., `df['column'] > 0`), you get a boolean Series backed by a NumPy boolean array. Python's `and`, `or`, and `not` operators first reduce each operand to a single truth value, so they work on single booleans, not on arrays of booleans. Using them with a boolean Series raises a `ValueError` because Pandas refuses to guess the "truthiness" of an entire array. Do *all* the values need to be `True`? Does *any* value need to be `True`? It's ambiguous.

Bitwise operators, on the other hand, are overloaded in NumPy (and therefore Pandas) to perform element-wise logical operations on boolean arrays.
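
As a small sketch of both behaviors (the Series `s` and `t` are made up for illustration and are not part of the notebook's DataFrame): asking for the truth value of a whole Series fails, explicit reductions such as `.any()` and `.all()` resolve the ambiguity, and `&` combines two masks element by element.

```python
import pandas as pd

# Hypothetical masks, used only for this illustration
s = pd.Series([True, False, True])
t = pd.Series([True, True, False])

# bool() is what `and`, `or`, `not`, and `if` rely on internally
try:
    bool(s)
    ambiguous = False
except ValueError as e:
    ambiguous = True
    print(f"Error: {e}")

# Explicit reductions state which question is being asked
print(s.any())  # is at least one value True?
print(s.all())  # are all values True?

# `&` is applied element by element and returns a new boolean Series
combined = s & t
print(combined.tolist())
```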


### Example Setup

Let's create a sample Pandas DataFrame:

In [None]:
import pandas as pd

data = {'col1': list(range(1, 6)),
        'col2': list(range(6, 11)),
        'col3': [True, False, True, False, True]}

df = pd.DataFrame(data)

print(df)

A comparison on a column returns a boolean Series (a mask) with one `True`/`False` value per row:

In [None]:
df['col1'] > 2

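What we want is an element-wise combination of multiple conditions. Plain Python does not provide this: `and` combines two single truth values, comparing two lists is lexicographic rather than element-wise, and `if` reduces any object to a single truth value: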

In [None]:
# Two scalar comparisons combined with `and`: True and False gives False
9 > 8 and 1 > 2

In [None]:
# Lists compare lexicographically, not element by element: 9 > 1 already decides the result
[9, 8] > [1, 2]

In [None]:
x = "anything"
if x:
    print("x is truthy")
else:
    print("x is falsey")

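**In Python, every object has a truth value: most objects are truthy, while `None`, `0`, and empty strings or containers are falsy.**
Convenient, but it can lead to issues when an object holds many values, as a Series does.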
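
A quick sketch of the truth values Python assigns to a few built-in objects (the sample values are chosen only for illustration):

```python
# Sample values chosen only for illustration
samples = ["anything", "", 0, 42, [], [0], None]

# bool() returns the truth value that `if`, `and`, `or`, and `not` would use
truth = {repr(v): bool(v) for v in samples}

for name, value in truth.items():
    print(f"{name:>12} -> {value}")
```

Note that the non-empty list `[0]` is truthy even though its only element is falsy: for built-in containers, truthiness depends on being non-empty, not on the contents.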

### Using Bitwise Operators (`&`, `|`, `~`)

These operators work correctly for boolean indexing:


In [None]:
# Selecting rows where 'col1' is greater than 2 AND 'col2' is less than 10
filtered_df = df[(df['col1'] > 2) & (df['col2'] < 10)]
print("Filtered DataFrame with '&':")
print(filtered_df)

In [None]:
# Selecting rows where 'col1' is greater than 4 OR 'col2' is less than 7

filtered_df = df[(df['col1'] > 4) | (df['col2'] < 7)]
print("Filtered DataFrame with '|':")
print(filtered_df)

In [None]:
# Selecting rows where 'col3' is NOT True (i.e., False)

filtered_df = df[~df['col3']]
print("Filtered DataFrame with '~':")
print(filtered_df)


### Attempting to Use Logical Operators (`and`, `or`, `not`)

This will raise a `ValueError`:


In [None]:
try:
    filtered_df = df[(df['col1'] > 2) and (df['col2'] < 10)]
    print(filtered_df)
except ValueError as e:
    print(f"Error: {e}")


As you can see, the `and` operator raises a `ValueError` because it has to reduce the boolean Series `(df['col1'] > 2)` and `(df['col2'] < 10)` to single truth values, which Pandas refuses to do. The same would happen with `or` and `not`.
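
A sketch confirming this for `or` and `not`, using a small stand-in DataFrame (`demo` is a hypothetical name, rebuilt here so the cell runs on its own):

```python
import pandas as pd

# Stand-in data so that this cell is self-contained
demo = pd.DataFrame({'col1': [1, 2, 3], 'col3': [True, False, True]})

errors = {}

# `or` needs the truth value of the left-hand Series
try:
    demo[(demo['col1'] > 2) or demo['col3']]
except ValueError as e:
    errors['or'] = str(e)

# `not` needs the truth value of its operand
try:
    demo[not demo['col3']]
except ValueError as e:
    errors['not'] = str(e)

for op, msg in errors.items():
    print(f"{op}: {msg}")

# The element-wise counterparts work
either = demo[(demo['col1'] > 2) | demo['col3']]
negated = demo[~demo['col3']]
print(either)
print(negated)
```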

### Precedence

Be careful with operator precedence. `&` and `|` have higher precedence than comparison operators like `>` and `<`. That's why you need to enclose each comparison in parentheses, like this: `(df['col1'] > 2) & (df['col2'] < 10)`. Without the parentheses, `df['col1'] > 2 & df['col2'] < 10` is parsed as the chained comparison `df['col1'] > (2 & df['col2']) < 10`, which either raises an error or gives unexpected results.
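
A sketch of the precedence pitfall, again with a hypothetical stand-in DataFrame `demo` so the cell is self-contained. With integer columns, the unparenthesized expression fails with a `ValueError`, because the chained comparison needs the truth value of a Series:

```python
import pandas as pd

# Stand-in data so that this cell is self-contained
demo = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [6, 7, 8, 9, 10]})

# Without parentheses, `&` binds first: col1 > (2 & col2) < 10.
# The chained comparison then asks for the truth value of a Series.
try:
    demo[demo['col1'] > 2 & demo['col2'] < 10]
    failed = False
except ValueError as e:
    failed = True
    print(f"Error: {e}")

# With parentheses, each comparison is evaluated before `&`
result = demo[(demo['col1'] > 2) & (demo['col2'] < 10)]
print(result)
```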


### Summary

-   Use `&` (bitwise AND), `|` (bitwise OR), and `~` (bitwise NOT) for boolean indexing in Pandas.
-   Avoid `and`, `or`, and `not`, as they are not designed for operating on arrays of boolean values.
-   Remember to use parentheses to ensure correct operator precedence.
