
### 1. Vectorized String Operations
**Definition:**
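Pandas exposes vectorized string operations through the `.str` accessor on Series and Index objects; a DataFrame is processed one column at a time. Each method is applied element-wise and passes missing values through instead of raising an error, so no explicit Python loop or `None` check is needed. On ordinary object-dtype columns the main gain is concise, missing-value-safe code; raw speed is often comparable to a list comprehension.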
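
As a minimal sketch of the accessor on both an Index and a Series, the snippet below tidies column labels and a ticker column. The frame `raw` and its contents are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical frame with messy column labels and one missing ticker
raw = pd.DataFrame({
    ' Ticker ': ['aapl', 'msft', None],
    'Company Name': ['Apple Inc.', 'Microsoft Corp', 'Unknown'],
})

# .str on an Index: normalize the column labels
raw.columns = (
    raw.columns.str.strip()
               .str.lower()
               .str.replace(' ', '_', regex=False)
)

# .str on a Series: the missing ticker stays missing instead of raising
raw['ticker'] = raw['ticker'].str.upper()

print(raw.columns.tolist())
print(raw['ticker'].tolist())
```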

**Use Cases in Algorithmic Trading:**
- **Cleaning and Preprocessing Data:** Removing unwanted characters, trimming whitespace, and normalizing text in trading data.
- **Feature Extraction:** Extracting meaningful features from textual data such as news headlines, analyst reports, or social media posts.
- **Text Processing:** Preparing and transforming text data for natural language processing (NLP) tasks to generate trading signals.
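
The feature-extraction use case can be sketched roughly as follows. The headlines and the two keyword patterns are made up for illustration; a real signal would need a vetted lexicon and proper NLP tooling.

```python
import pandas as pd

# Hypothetical headlines
headlines = pd.Series([
    'Apple beats earnings estimates',
    'Tesla misses delivery target',
    'Apple and Tesla both rally',
])

# Hypothetical keyword patterns (regular expressions with alternation)
positive = r'beats|rally|surges'
negative = r'misses|falls|plunges'

features = pd.DataFrame({
    'headline': headlines,
    'n_words': headlines.str.split().str.len(),      # words per headline
    'pos_hits': headlines.str.count(positive),       # positive keyword matches
    'neg_hits': headlines.str.count(negative),       # negative keyword matches
    'mentions_apple': headlines.str.contains('apple', case=False),
})

# Naive score: positive hits minus negative hits
features['score'] = features['pos_hits'] - features['neg_hits']
print(features)
```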

**Example:**
```python
import pandas as pd

# Sample news headlines related to trading
df = pd.DataFrame({
    'headline': [
        '  Apple releases new iPhone ',
        'GOOGLE launches new AI tool!',
        'Tesla stock surges after earnings report  '
    ]
})

# Removing leading and trailing whitespace
df['cleaned_headline'] = df['headline'].str.strip()

# Converting to lowercase
df['lowercase_headline'] = df['cleaned_headline'].str.lower()

# Checking whether a headline contains the substring 'new' (this would also match words such as 'news')
df['contains_new'] = df['lowercase_headline'].str.contains('new')

print(df)
```

### Summary

- **Vectorized string operations** clean and preprocess textual data from news headlines, analyst reports, or social media to extract meaningful features and generate trading signals.


In [1]:
import pandas as pd
import numpy as np

In [10]:
# price = [100,200,300,None,400,500]
# mprice =[]
# for i in price:
#     mprice.append(i+10)
# mprice


# A plain Python loop fails as soon as it reaches the missing value (None)
ticker = ['AAPL','MSFT','META',None,'HPCL']
lower_tickers = []
for i in ticker:
    lower_tickers.append(i.lower())
lower_tickers

AttributeError: 'NoneType' object has no attribute 'lower'

In [13]:
# What are vectorized operations? The operation is applied to every element without an explicit loop
price = [100,200,300,400,500]
npprice = np.array(price)
npprice + 10

array([110, 210, 310, 410, 510])

In [None]:
# Problem: plain Python has no vectorized string operations, so the loop raises AttributeError at None
ticker = ['AAPL','MSFT','META',None,'HPCL']
lower_tickers = []
for i in ticker:
    lower_tickers.append(i.lower())
lower_tickers

In [15]:
# How does pandas solve this issue?
ticker = ['AAPL','MSFT','META',None,'HPCL']
ser = pd.Series(ticker)


# string accessor: .str applies lower() to every element
ser.str.lower()

# the missing value is passed through instead of raising an error

0    aapl
1    msft
2    meta
3    None
4    hpcl
dtype: object
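
One practical consequence of missing values passing through: in many pandas versions a boolean test such as `startswith` returns a missing value for the `None` entry, and a mask containing missing values cannot be used directly for filtering. A minimal sketch, reusing the ticker list above, that relies on the `na` parameter of the boolean `.str` methods:

```python
import pandas as pd

ser = pd.Series(['AAPL', 'MSFT', 'META', None, 'HPCL'])

# na=False treats the missing entry as "no match", giving a clean boolean mask
mask = ser.str.startswith('M', na=False)

# The mask can now be used for boolean indexing
print(ser[mask].tolist())
```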

In [21]:
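# Load a real-world dataset to practice on

df = pd.read_csv('constituents-financials_csv.csv')
# .copy() makes dfs an independent frame, so columns can be added later without a SettingWithCopyWarning
dfs = df[['Symbol','Name','Sector','SEC Filings']].copy()
dfs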


     Symbol                    Name                  Sector                                      SEC Filings
0       MMM              3M Company             Industrials  http://www.sec.gov/cgi-bin/browse-edgar?action...
1       AOS         A.O. Smith Corp             Industrials  http://www.sec.gov/cgi-bin/browse-edgar?action...
2       ABT     Abbott Laboratories             Health Care  http://www.sec.gov/cgi-bin/browse-edgar?action...
3      ABBV             AbbVie Inc.             Health Care  http://www.sec.gov/cgi-bin/browse-edgar?action...
4       ACN           Accenture plc  Information Technology  http://www.sec.gov/cgi-bin/browse-edgar?action...
...     ...                     ...                     ...                                              ...
500     XYL              Xylem Inc.             Industrials  http://www.sec.gov/cgi-bin/browse-edgar?action...
501     YUM         Yum! Brands Inc  Consumer Discretionary  http://www.sec.gov/cgi-bin/browse-edgar?action...
502     ZBH  Zimmer Biomet Holdings             Health Care  http://www.sec.gov/cgi-bin/browse-edgar?action...
503    ZION           Zions Bancorp              Financials  http://www.sec.gov/cgi-bin/browse-edgar?action...


In [34]:
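# Common functions (in a notebook only the last expression of a cell is displayed)
# Case conversion: lower/upper/capitalize/title
dfs['Name'].str.lower()
dfs['Name'].str.upper()
dfs['Name'].str.capitalize()
dfs['Name'].str.title()
# len: names longer than 30 characters
dfs['Name'][dfs['Name'].str.len() > 30]
# strip: remove leading and trailing whitespace
dfs['Name'].str.strip()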

0                  3M Company
1             A.O. Smith Corp
2         Abbott Laboratories
3                 AbbVie Inc.
4               Accenture plc
                ...          
500                Xylem Inc.
501           Yum! Brands Inc
502    Zimmer Biomet Holdings
503             Zions Bancorp
504                    Zoetis
Name: Name, Length: 505, dtype: object

In [48]:
# split -> len: names made up of more than four words
dfs['Name'][dfs['Name'].str.split().str.len() > 4]

15             Air Products & Chemicals Inc
19      Alexandria Real Estate Equities Inc
37         American Water Works Company Inc
56                Arthur J. Gallagher & Co.
64               Baker Hughes, a GE Company
202          Fortune Brands Home & Security
258           J. B. Hunt Transport Services
277     Laboratory Corp. of America Holding
436       The Bank of New York Mellon Corp.
452        Twenty-First Century Fox Class A
453        Twenty-First Century Fox Class B
457    Ulta Salon Cosmetics & Fragrance Inc
Name: Name, dtype: object
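
After `str.split()` each element is a list, and the `.str` accessor keeps working on those lists. A small sketch on a hypothetical four-name sample (taken from the names shown above) of picking individual words with `str.get` or indexing, and of spreading the pieces into columns with `expand=True`:

```python
import pandas as pd

names = pd.Series(['3M Company', 'A.O. Smith Corp', 'Abbott Laboratories', 'Zoetis'])

words = names.str.split()        # each element becomes a list of words
first_word = words.str.get(0)    # equivalent to words.str[0]
last_word = words.str[-1]        # negative indices work as on a list
n_words = words.str.len()        # number of words per name

# n=1 splits only at the first whitespace; expand=True returns a DataFrame
parts = names.str.split(n=1, expand=True)
print(parts)
```

Rows with fewer pieces than columns are padded with missing values, as the single-word name shows.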

In [55]:
# replace: remove every space (regex=False matches the pattern literally)

dfs['Name'].str.replace(' ','', regex=False)


0                 3MCompany
1             A.O.SmithCorp
2        AbbottLaboratories
3                AbbVieInc.
4              Accentureplc
               ...         
500               XylemInc.
501           Yum!BrandsInc
502    ZimmerBiometHoldings
503            ZionsBancorp
504                  Zoetis
Name: Name, Length: 505, dtype: object
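
`str.replace` can also take a regular expression. The sketch below contrasts a literal replacement with a regex one that strips a trailing legal suffix; the sample names and the suffix list are assumptions chosen for illustration, not a complete rule set.

```python
import pandas as pd

names = pd.Series(['AbbVie Inc.', 'A.O. Smith Corp', 'Accenture plc', 'Zoetis'])

# regex=False: '.' is a literal dot, not "any character"
no_dots = names.str.replace('.', '', regex=False)

# regex=True: remove a trailing suffix (hypothetical suffix list)
suffix = r'\s+(?:Inc\.?|Corp\.?|plc)$'
base = names.str.replace(suffix, '', regex=True)

print(no_dots.tolist())
print(base.tolist())
```

Passing `regex` explicitly avoids surprises, since its default has changed between pandas versions.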

In [73]:
# filtering with boolean masks
# startswith/endswith (both are case-sensitive)
dfs['Name'][dfs['Name'].str.startswith('A')]
dfs['Name'][dfs['Name'].str.startswith('Z')]

dfs['Name'][dfs['Name'].str.endswith('S')]


dfs['Name'][(dfs['Name'].str.startswith('A')) & (dfs['Name'].str.endswith('A'))]

# isdigit/isalpha...: isalpha keeps names made up of letters only (no spaces, digits, or punctuation)


dfs['Name'][dfs['Name'].str.isalpha()]


22           Allegion
45           Andeavor
46              ANSYS
73          BlackRock
76         BorgWarner
81           Broadcom
100            Cerner
126    ConocoPhillips
154         DowDuPont
174           Equinix
232           Hologic
245            Incyte
267           KeyCorp
288    LyondellBasell
290          Macerich
324           Navient
325            NetApp
334              Nike
337         Nordstrom
348             ONEOK
355            PayPal
359       PerkinElmer
360           Perrigo
376          Prologis
382             Qorvo
392         Regeneron
395            ResMed
433        TechnipFMC
451       TripAdvisor
504            Zoetis
Name: Name, dtype: object

In [162]:
# applying regex
# contains: case-insensitive search for 'john'

dfs['Name'][dfs['Name'].str.contains('JOHN',case=False)]

# find the last word of each name where the first and last characters are vowels

dfs['Last Name'] = dfs['Name'].str.split().str[-1]

# literal check: starts with 'A' and ends with 'a'
dfs['Last Name'][(dfs['Last Name'].str.startswith('A')) & dfs['Last Name'].str.endswith('a')]

# regex check: any vowel at the start and at the end
dfs['Last Name'][dfs['Last Name'].str.contains(r'^[aeiouAEIOU].+[aeiouAEIOU]$')]


160       E*Trade
230    Enterprise
245        Incyte
248      Exchange
345    Automotive
351       America
481      Alliance
Name: Last Name, dtype: object
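
The vowel pattern can be written more compactly with `case=False`, and the choice between `.+` and `.*` decides whether two-character strings can match. A small sketch on a hypothetical sample (the entry `'Io'` is invented to show the difference):

```python
import pandas as pd

last = pd.Series(['America', 'Alliance', 'Incyte', 'Zoetis', 'Io', 'Apple Inc'])

# '.+' needs at least one character between the two vowels (length >= 3)
strict = last.str.contains(r'^[aeiou].+[aeiou]$', case=False)

# '.*' allows nothing in between, so a two-character string can match
loose = last.str.contains(r'^[aeiou].*[aeiou]$', case=False)

print(last[strict].tolist())
print(last[loose].tolist())
```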

In [168]:
# slicing: .str[...] slices every string; [::-1] reverses each one

dfs['Name'].str[::-1]

0                  ynapmoC M3
1             proC htimS .O.A
2         seirotarobaL ttobbA
3                 .cnI eiVbbA
4               clp erutneccA
                ...          
500                .cnI melyX
501           cnI sdnarB !muY
502    sgnidloH temoiB remmiZ
503             procnaB snoiZ
504                    siteoZ
Name: Name, Length: 505, dtype: object
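
Beyond reversing, the same slice syntax extracts prefixes, suffixes, and single characters. A brief sketch on a hypothetical three-name sample; `str.slice` is the method form of the bracket syntax:

```python
import pandas as pd

names = pd.Series(['3M Company', 'AbbVie Inc.', 'Zoetis'])

prefix = names.str[:3]                  # first three characters
last_char = names.str[-1]               # last character
every_other = names.str.slice(0, 6, 2)  # start, stop, step

print(prefix.tolist())
print(last_char.tolist())
print(every_other.tolist())
```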