# 2.2 Lowercase

An important first step in working with text data is converting it to lowercase. Why do we do this? It helps maintain consistency in our data and our output. Whether we're doing exploratory analysis or machine learning, we want the same word to be understood and counted as the same word; otherwise, a model might treat a capitalized word differently from its lowercase form. Lowercasing ensures conformity.

It also makes it easier to continue with additional cleaning of the data, as we don’t have to account for different cases.
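
As a small illustration of the counting point, here is a minimal sketch using `collections.Counter` from the standard library. The sample text and the naive whitespace tokenization are assumptions made up for this example, not part of the original notebook.

```python
from collections import Counter

# Hypothetical sample text, made up for illustration
text = "The cat sat on the mat. THE END"

# Naive whitespace tokenization (punctuation stays attached to words)
tokens = text.split()

# Without lowercasing, "The", "the" and "THE" are three separate keys
raw_counts = Counter(tokens)
print(raw_counts["the"])  # 1

# After lowercasing, all three variants are counted as the same word
lower_counts = Counter(token.lower() for token in tokens)
print(lower_counts["the"])  # 3
```

The same word goes from a count of 1 to a count of 3 once casing is normalized, which is the kind of consistency lowercasing is meant to provide.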

However, do remember that lowercasing can change the meaning of some text, e.g., "US" in uppercase is understood as a country, whereas "us" in lowercase is a pronoun.
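
To see this caveat in action, the short sketch below uses a made-up sentence (an assumption for illustration) in which both "US" and "us" appear. Once lowercased, the two can no longer be told apart.

```python
# Hypothetical sentence containing both the country and the pronoun
sentence = "The US team invited us to visit"

# Before lowercasing, the two tokens are distinct
tokens = sentence.split()
print("US" in tokens, "us" in tokens)  # True True

# After lowercasing, both collapse into the same token
lower_tokens = sentence.lower().split()
print(lower_tokens.count("us"))  # 2
```

If that distinction matters for your task, it may be worth handling such words before lowercasing rather than after.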

Let's take a look at how easy it is to convert our data to lowercase using Python's built-in .lower() string method.

In [20]:
sentence = "Her cat's name is Luna"
print(sentence)

Her cat's name is Luna


In [21]:
# .lower() converts all characters in the sentence to lowercase
lower_sentence = sentence.lower()
print(lower_sentence)

her cat's name is luna


In [22]:
sentence_list = ['Could you pass me the TV remote?', 
                 'It is IMPOSSIBLE to find this hotel', 
                 'Want to go for dinner on Tuesday?']
print(sentence_list)

['Could you pass me the TV remote?', 'It is IMPOSSIBLE to find this hotel', 'Want to go for dinner on Tuesday?']


In [23]:
# A list comprehension lowercases each sentence in the list: short and efficient
lower_sentence_list = [x.lower() for x in sentence_list]
print(lower_sentence_list)

['could you pass me the tv remote?', 'it is impossible to find this hotel', 'want to go for dinner on tuesday?']


## Another Example

In [24]:
# Let's normalize email subjects before filtering for keywords
emails = ["URGENT: Password reset", "Welcome to our Newsletter", "YOU WON a prize!"]
emails_lower = [email.lower() for email in emails]
print(emails_lower)

['urgent: password reset', 'welcome to our newsletter', 'you won a prize!']
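
The keyword filtering mentioned in the comment above could look something like the following sketch. The `keywords` list and the `flagged` name are hypothetical, chosen only for this example, and simple substring matching is assumed.

```python
emails = ["URGENT: Password reset", "Welcome to our Newsletter", "YOU WON a prize!"]

# Hypothetical keywords to look for, written in lowercase
keywords = ["urgent", "won"]

# Compare against the lowercased subject, but keep the original text
flagged = [
    email
    for email in emails
    if any(keyword in email.lower() for keyword in keywords)
]
print(flagged)  # ['URGENT: Password reset', 'YOU WON a prize!']
```

Because the check is a plain substring match, a keyword like "won" would also match "wonderful", so a real filter would probably need word-level matching.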


## What I Learned

- .lower() is useful for case-insensitive text processing: **use .lower() to normalize text for easier comparison, search, or preprocessing**.

- **List comprehensions** are great for applying transformations across lists in one line.

- Normalize casing before comparing strings in NLP or text-based tasks, unless the capitalization itself carries meaning (as with "US" vs. "us").