#1. DEFINE A PANDAS SERIES
## using pd.Series function


In [5]:
# Pandas is a data manipulation and analysis library built on top of NumPy.
# Its main data structure is the DataFrame (think of it as Microsoft Excel in Python).
# DataFrames let programmers store and manipulate data in a tabular fashion (rows and columns).
# Series vs. DataFrame? A Series can be thought of as a single column of a DataFrame.
import pandas as pd 



In [1]:
# Let's define a Python list that contains 5 crypto currencies 
df1 = ["BTC" , "XRP" , "LTC"  , 'ADA' , 'ETH']
df1

['BTC', 'XRP', 'LTC', 'ADA', 'ETH']

In [2]:
# Let's confirm the Datatype
type(df1)


list

In [6]:
# Let's create a one dimensional Pandas "series" 
# Let's use Pandas Constructor Method to create a series from a Python list
# Note that series is formed of data and associated index (numeric index has been automatically generated) 
# Check Pandas Documentation for More information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html#pandas.Series
# Object datatype is used for text data (String)
df2 = pd.Series(data = df1)
df2



0    BTC
1    XRP
2    LTC
3    ADA
4    ETH
dtype: object

In [9]:
# Let's confirm the Pandas Series Datatype
type(df2)

pandas.core.series.Series

In [10]:
# Let's define another Pandas Series that contains numeric values (crypto prices) instead of text data
# Note that we have int64 datatype which means it's integer stored in 64 bits in memory
df3 = pd.Series(data = [200, 500 , 1000 ,1200 , 50])
df3

0     200
1     500
2    1000
3    1200
4      50
dtype: int64
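The dtype shown at the bottom of each Series is inferred from the data. Below is a minimal sketch of that inference, assuming small made-up lists (the variable names are illustrative and not part of the notebook):

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate dtype inference
tickers = pd.Series(data=["BTC", "XRP", "LTC"])   # text data
prices = pd.Series(data=[200, 500, 1000])          # whole numbers only
mixed = pd.Series(data=[200, 500.5, 1000])         # one float among integers

print(prices.dtype)          # int64
print(mixed.dtype)           # float64: a single float upcasts the whole Series
print(list(tickers.index))   # [0, 1, 2]: the automatically generated numeric index
```

A Series holds one dtype for all of its elements, which is why a single decimal value is enough to turn the whole Series into float64.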

**MINI CHALLENGE #1:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Confirm the datatype of "my_series"**

In [13]:
my_series = pd.Series(data = ["TATA" , "ADANI" , "APPLE"])
my_series


0     TATA
1    ADANI
2    APPLE
dtype: object

In [14]:
type(my_series)

pandas.core.series.Series

 #2. DEFINE A PANDAS SERIES WITH CUSTOM INDEX

 ## using the index parameter

In [15]:
# Let's define a Python list that contains 5 Crypto currencies
df1 = ["BTC" , "XRP" , "LTC" ,'ADA', 'ETH']
df1

['BTC', 'XRP', 'LTC', 'ADA', 'ETH']

In [17]:
# Let's define a python list as shown below. This python list will be used for the Series index:
df3 = ['crypto#1', 'crypto#2', 'crypto#3', 'crypto#4', 'crypto#5']
df3

['crypto#1', 'crypto#2', 'crypto#3', 'crypto#4', 'crypto#5']

In [20]:
# Let's create a one dimensional Pandas "series" 
# Let's use Pandas Constructor Method to create a series from a Python list
# Note that this series is formed of data and associated labels 
df4 = pd.Series(data = df1 , index = df3)

In [21]:
# Let's view the series
df4

crypto#1     BTC
crypto#2     XRP
crypto#3     LTC
crypto#4     ADA
crypto#5     ETH
dtype: object

In [22]:
# Let's obtain the datatype
type(df4)

pandas.core.series.Series
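One reason to attach custom labels is that elements can then be looked up by label as well as by position. A short sketch, assuming a three-element Series built the same way as above (the name `coins` is illustrative):

```python
import pandas as pd

coins = pd.Series(data=["BTC", "XRP", "LTC"],
                  index=["crypto#1", "crypto#2", "crypto#3"])

first_by_label = coins.loc["crypto#1"]   # look up by index label
first_by_position = coins.iloc[0]        # look up by integer position

print(first_by_label, first_by_position)  # BTC BTC
```

Both lookups return the same element here; `.loc` follows the labels, while `.iloc` always counts positions from 0.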

**MINI CHALLENGE #2:**
- **Define a Pandas Series named "my_series" that contains your top 3 favourite stocks. Instead of using default numeric indexes (similar to mini challenge #1), use the following indexes "stock #1", "stock #2", and "stock #3"**

In [27]:

my_series = pd.Series(data = ["TATA" , "ADANI" , "APPLE"] , index = ["stock #1", "stock #2", "stock #3"])
my_series

stock #1     TATA
stock #2    ADANI
stock #3    APPLE
dtype: object

In [28]:
type(my_series)

pandas.core.series.Series

 #3. DEFINE A PANDAS SERIES FROM A DICTIONARY
## dict_name = {"key" : value}

In [29]:
# A Dictionary consists of a collection of key-value pairs. Each key-value pair maps the key to its corresponding value.
# Keys are unique within a dictionary while values may not be. 
# List elements are accessed by their position in the list, via indexing while Dictionary elements are accessed via keys
# Define a dictionary named "my_dict" using key-value pairs

my_dict = {'Employee ID' : 1 , "Employee Name" : 'YASH' , "Salary" : 4000  , "years" : 10}


In [30]:
# Show the dictionary
my_dict


{'Employee ID': 1, 'Employee Name': 'YASH', 'Salary': 4000, 'years': 10}

In [32]:
# Confirm the dictionary datatype 
type(my_dict)

dict

In [34]:
# Let's define a Pandas Series using the dictionary
# The dictionary keys become the index labels; because the values mix integers and text, the dtype is object
df5 = pd.Series(data = my_dict)
df5

Employee ID         1
Employee Name    YASH
Salary           4000
years              10
dtype: object

**MINI CHALLENGE #3:**
- **Create a Pandas Series from a dictionary with 3 of your favourite stocks and their corresponding prices** 

In [36]:
stock = {'tata' : 1000 , 'apple' : 1202  , 'tesla' : 10000}
df6 = pd.Series(data = stock)  
df6 

tata      1000
apple     1202
tesla    10000
dtype: int64
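When a dictionary is passed as data, its keys become the index labels. Passing an index as well selects and orders entries by key. The sketch below assumes the same made-up prices as the challenge; the label "unknown" is hypothetical and is included only to show what happens for a key that is not in the dictionary:

```python
import pandas as pd

stock = {"tata": 1000, "apple": 1202, "tesla": 10000}

all_prices = pd.Series(data=stock)
# Ask for specific keys in a specific order; a key missing from the dict becomes NaN
subset = pd.Series(data=stock, index=["tesla", "tata", "unknown"])

print(list(all_prices.index))   # ['tata', 'apple', 'tesla']
print(subset)
```

Because NaN is a floating-point value, `subset` is stored as float64 even though the dictionary values are integers.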

 #4. PANDAS ATTRIBUTES
 ## IMPORTANT: READ BELOW

In [37]:
# Attributes/Properties: do not use parentheses "()" and are used to get Pandas Series properties. Ex: my_series.values, my_series.shape
# Methods: use parentheses "()" and might include arguments. Most return a new object or value rather than changing the Series in place. Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Indexers: use square brackets "[]" and are used to access specific elements in a Pandas Series or DataFrame. Ex: my_series.loc[], my_series.iloc[]

# Let's redefine a Pandas Series containing our favourite 5 cryptos 
df1 = ["BTC" , "XRP" , "LTC"  , 'ADA' , 'ETH']
df2 = pd.Series(data = df1)
df2

0    BTC
1    XRP
2    LTC
3    ADA
4    ETH
dtype: object

## All of the following are Pandas attributes, which do not require parentheses () at the end


In [39]:
# The ".values" attribute returns the Series as a NumPy ndarray whose dtype depends on the Series dtype
# Check this for more information: https://pandas.pydata.org/docs/reference/api/pandas.Series.values.html#pandas.Series.values
df2.values

array(['BTC', 'XRP', 'LTC', 'ADA', 'ETH'], dtype=object)

In [40]:
# index is used to return the index (axis labels) of the Series
df2.index

RangeIndex(start=0, stop=5, step=1)

In [41]:
# dtype is used to return the datatype of the Series ('O' stands for 'object' datatype)
df2.dtype

dtype('O')

In [44]:
# Check if all elements are unique or not
df2.is_unique

True

In [43]:
# Check the shape of the Series
# note that a Series is one dimensional
df2.shape

(5,)

**MINI CHALLENGE #4:** 
- **What is the size of the Pandas Series? (External Research for the proper attribute is Required)**

In [45]:
df2.size

5
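A quick way to see the difference between an attribute and a method is to call an attribute as if it were a method. A minimal sketch, assuming the same five tickers (the name `coins` is illustrative):

```python
import pandas as pd

coins = pd.Series(data=["BTC", "XRP", "LTC", "ADA", "ETH"])

n_elements = coins.size    # attribute: no parentheses
dimensions = coins.ndim    # attribute: a Series is always one-dimensional

# Adding parentheses to an attribute fails, because the attribute is a plain value
try:
    coins.size()
except TypeError as exc:
    error_message = str(exc)

print(n_elements, dimensions)
print(error_message)
```

The attribute already holds the answer, so there is nothing to call.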

#5. PANDAS METHODS

In [47]:
# Methods use parentheses "()" and might include arguments. Ex: my_series.tail(), my_series.head(), my_series.drop_duplicates()
# Most methods return a new value or object; the original Pandas Series is unchanged unless you reassign it or pass inplace = True where supported

# Let's define another Pandas Series that contains numeric values (crypto prices) instead of text data
# Note that we have int64 datatype which means it contains integer values stored in 64 bits in memory
df1 = [200, 500 , 1000 ,1200 , 50]
df2 = pd.Series( data = df1)
df2


0     200
1     500
2    1000
3    1200
4      50
dtype: int64

In [48]:
# Let's obtain the sum of all elements in the Pandas Series
df2.sum()

2950

In [50]:
# Let's obtain the multiplication of all elements in the Pandas Series
df2.product()

6000000000000

In [51]:
# Let's obtain the average
df2.mean()

590.0

In [52]:
# Let's show the first couple of elements in the Pandas Series
df2.head(2)

0    200
1    500
dtype: int64

In [53]:
# Note that head returns a new Series; df2 itself is unchanged
new_df2 = df2.head(4)
new_df2

0     200
1     500
2    1000
3    1200
dtype: int64
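The calls above return results and leave the original Series alone. A small sketch that checks this, assuming the same five prices (the name `prices` is illustrative):

```python
import pandas as pd

prices = pd.Series(data=[200, 500, 1000, 1200, 50])

first_two = prices.head(2)   # a new, shorter Series
total = prices.sum()         # a single number

# The original Series still has all five elements
print(len(first_two), len(prices), total)   # 2 5 2950
```

To keep a result, assign it to a name, as done with `new_df2` above.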

**MINI CHALLENGE #5:** 
- **Show the last 2 rows in the Pandas Series (External Research is Required)** 
- **How many bytes does this Pandas Series consume in memory? (External Research is Required)**

In [54]:
df2.tail(2)

3    1200
4      50
dtype: int64

In [55]:
df2.memory_usage()

168
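The number reported by memory_usage() includes the index as well as the values. The sketch below separates the two, assuming the same five prices stored explicitly as int64. The size of the index part may differ between pandas versions, so only the values part is stated in the comments:

```python
import pandas as pd

prices = pd.Series(data=[200, 500, 1000, 1200, 50], dtype="int64")

values_only = prices.memory_usage(index=False)   # bytes used by the values alone
with_index = prices.memory_usage()               # values plus the index
raw_bytes = prices.nbytes                        # size of the underlying array

# 5 values x 8 bytes each (64 bits) = 40 bytes of data
print(values_only, raw_bytes)   # 40 40
print(with_index)
```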

#6. IMPORT CSV DATA (1-D) USING PANDAS

In [61]:
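# Pandas read_csv is used to read a csv file and store data in a DataFrame by default (DataFrames will be covered shortly!)
# Use squeeze = True to convert a single-column result into a Pandas Series (one-dimensional)
# Note: the squeeze argument was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions use pd.read_csv("/content/crypto.csv").squeeze("columns") instead
# Notice that a Series is displayed as plain text, without the table formatting of a DataFrame
df1 = pd.read_csv( "/content/crypto.csv" , squeeze = True )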
df1







0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [62]:


type(df1)

pandas.core.series.Series

**MINI CHALLENGE #6:**
- **Set squeeze = False and rerun the cell. What do you notice? Use type() to compare both outputs**

In [63]:
# squeeze = False keeps the data as a DataFrame, which is displayed as a table (similar to Excel)
# With a DataFrame, Python built-in functions such as min() and max() iterate over the column names, so they return a column name rather than a value
# squeeze defaults to False if you do not set it to True
df1 = pd.read_csv( "/content/crypto.csv" , squeeze = False )
df1





      BTC-USD Price
0        457.334015
1        424.440002
2        394.795990
3        408.903992
4        398.821014
...             ...
2380   55950.746090
2381   57750.199220
2382   58917.691410
2383   58918.832030


In [65]:
type(df1)

pandas.core.frame.DataFrame
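The Series/DataFrame difference can be reproduced without the CSV file. The sketch below is an illustration under assumptions: it reads a tiny in-memory CSV (three prices copied from the output above, not the real file) and uses the DataFrame.squeeze("columns") method, which also works on pandas versions where read_csv no longer accepts a squeeze argument:

```python
import io
import pandas as pd

# Stand-in for /content/crypto.csv: one column, three rows
csv_text = "BTC-USD Price\n457.334015\n424.440002\n394.795990\n"

frame = pd.read_csv(io.StringIO(csv_text))   # DataFrame with a single column
series = frame.squeeze("columns")            # the same data as a Series

print(type(frame).__name__, type(series).__name__)   # DataFrame Series

# Iterating a DataFrame yields column names, so max() returns a name
print(max(frame))    # BTC-USD Price
# Iterating a Series yields values, so max() returns a price
print(max(series))   # 457.334015
```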

 #7. PANDAS BUILT-IN FUNCTIONS

In [75]:
# Pandas works well with Python's built-in functions
# You are not limited to pandas methods; you can apply Python built-in functions directly to a Series
# Check Python built-in functions here: https://docs.python.org/3/library/functions.html

df1 = pd.read_csv("/content/crypto.csv" , squeeze = True)
df1






0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [67]:
# Obtain the Data Type of the Pandas Series
type(df1)

pandas.core.series.Series

In [69]:
# Obtain the length of the Pandas Series
len(df1)

2385

In [76]:
# Obtain the maximum value of the Pandas Series
max(df1)

61243.08594

In [77]:
# Obtain the minimum value of the Pandas Series
min(df1)

178.1029968

**MINI CHALLENGE #7:**
## Important
- **Given the following Pandas Series, convert all negative values to positive using python built-in functions**
- **Obtain only unique values (ie: Remove duplicates) using python built-in functions**
- **my_series = pd.Series(data = [-10, 100, -30, 50, 100])**


In [80]:
my_series = pd.Series(data = [-10, 100, -30, 50, 100])
abs(my_series)

0     10
1    100
2     30
3     50
4    100
dtype: int64

In [81]:
set(my_series)

{-30, -10, 50, 100}
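The built-in set() removes duplicates but returns an unordered collection. If the order of first appearance matters, the pandas unique() method is one alternative. A short comparison sketch using the same Series:

```python
import pandas as pd

my_series = pd.Series(data=[-10, 100, -30, 50, 100])

absolute = abs(my_series)            # built-in abs() works element by element
unique_as_set = set(my_series)       # built-in: duplicates removed, order not guaranteed
unique_in_order = my_series.unique().tolist()   # pandas: keeps order of first appearance

print(absolute.tolist())    # [10, 100, 30, 50, 100]
print(unique_in_order)      # [-10, 100, -30, 50]
```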

 #8. SORTING PANDAS SERIES

In [83]:
# Let's import CSV data as follows:
df1 = pd.read_csv("/content/crypto.csv" , squeeze=True)
df1





0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [87]:
# You can sort the values in the Series as follows
df1.sort_values()

119       178.102997
122       199.259995
121       208.097000
120       209.843994
123       210.339004
            ...     
2382    58917.691410
2383    58918.832030
2384    59095.808590
2366    59302.316410
2365    61243.085940
Name: BTC-USD Price, Length: 2385, dtype: float64

In [88]:
# Let's view the Pandas Series again after sorting. Note that nothing changed in memory! sort_values returned a new sorted Series; to change df1 itself, set inplace = True (or reassign the result)
df1.head(5)

0    457.334015
1    424.440002
2    394.795990
3    408.903992
4    398.821014
Name: BTC-USD Price, dtype: float64

In [89]:
# Set inplace = True to ensure that change has taken place in memory 
df1.sort_values(inplace=True)

In [90]:
# Note that now the change (ordering) took place 
df1.head(5)

119    178.102997
122    199.259995
121    208.097000
120    209.843994
123    210.339004
Name: BTC-USD Price, dtype: float64

In [93]:
# Notice that the index labels are now in a different order
# You can also sort by index (revert back to the original Pandas Series) as follows:
df1.sort_index(inplace = True)
df1

0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

**MINI CHALLENGE #8:**
- **Sort the BTC_price_series values in descending order instead. Make sure to update values in-memory.**

In [94]:
df1.sort_values(ascending = False , inplace = True)
df1


2365    61243.085940
2366    59302.316410
2384    59095.808590
2383    58918.832030
2382    58917.691410
            ...     
123       210.339004
120       209.843994
121       208.097000
122       199.259995
119       178.102997
Name: BTC-USD Price, Length: 2385, dtype: float64
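An alternative to inplace = True is to assign the sorted result to a name. The sketch below assumes four made-up prices (not the CSV data) and shows that sort_values and sort_index each return a new Series:

```python
import pandas as pd

# Hypothetical prices, used only to illustrate sorting
prices = pd.Series(data=[457.33, 424.44, 394.80, 408.90], name="price")

ascending = prices.sort_values()                    # new Series, smallest first
descending = prices.sort_values(ascending=False)    # new Series, largest first
restored = ascending.sort_index()                   # back to the original order

print(list(prices.index))       # [0, 1, 2, 3]: the original is untouched
print(list(ascending.index))    # [2, 3, 1, 0]: labels travel with their values
print(descending.iloc[0])       # 457.33
```

Because each index label stays attached to its value, sorting by index recovers the original order.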

 #9. PERFORM MATH OPERATIONS ON PANDAS SERIES

In [96]:
# Let's import CSV data as follows:
df1 = pd.read_csv("/content/crypto.csv" , squeeze = True)
df1





0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [97]:
# Apply Sum Method on Pandas Series
df1.sum()

15435379.738852698

In [98]:
# Apply count Method on Pandas Series
df1.count()

2385

In [99]:
# Obtain the maximum value
df1.max()

61243.08594

In [100]:
# Obtain the minimum value
df1.min()

178.1029968

In [101]:
# My favourite: Describe! 
# Describe is used to obtain all statistical information in one place 
df1.describe()

count     2385.000000
mean      6471.857333
std       9289.022505
min        178.102997
25%        454.618988
50%       4076.632568
75%       8864.766602
max      61243.085940
Name: BTC-USD Price, dtype: float64

**MINI CHALLENGE #9:**
- **Obtain the average price of the BTC_price_series using two different methods**

In [102]:
df1.mean()


6471.857332852284

In [106]:
df1.sum()/df1.count()

6471.857332852284
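Dividing by count() rather than len() matters when values are missing: count() skips missing entries, and so do sum() and mean(). A sketch with made-up numbers, assuming one missing price:

```python
import pandas as pd

# Hypothetical prices with one missing value (None becomes NaN)
prices = pd.Series(data=[100.0, 200.0, None, 300.0])

mean_method = prices.mean()                     # skips the missing value
mean_manual = prices.sum() / prices.count()     # 600.0 / 3
mean_with_len = prices.sum() / len(prices)      # 600.0 / 4, counts the missing row
mean_described = prices.describe()["mean"]      # describe() reports the same mean

print(mean_method, mean_manual, mean_with_len, mean_described)   # 200.0 200.0 150.0 200.0
```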

 #10. CHECK IF A GIVEN ELEMENT EXISTS IN A PANDAS SERIES

In [107]:
# Let's import CSV data as follows:
df1 = pd.read_csv("/content/crypto.csv" , squeeze = True)
df1





0         457.334015
1         424.440002
2         394.795990
3         408.903992
4         398.821014
            ...     
2380    55950.746090
2381    57750.199220
2382    58917.691410
2383    58918.832030
2384    59095.808590
Name: BTC-USD Price, Length: 2385, dtype: float64

In [108]:
# Check if a given number exists in a Pandas Series values
# Returns a boolean "True" or "False"
1295.5 in  df1.values

False

In [112]:
# Check if a given number exists in a Pandas Series index
1295 in df1.index

True

In [113]:
# Note that by default 'in' will search in Pandas index and not values
1 in df1

True

**MINI CHALLENGE #10:**
- **Check if the price 399 exists in the BTC_price_series Pandas Series or not**
- **Round the prices to the nearest integer and check again**

In [114]:
399 in df1.values

False

In [116]:
df2 = round(df1)
df2

0         457.0
1         424.0
2         395.0
3         409.0
4         399.0
         ...   
2380    55951.0
2381    57750.0
2382    58918.0
2383    58919.0
2384    59096.0
Name: BTC-USD Price, Length: 2385, dtype: float64

In [118]:
399 in df2.values

True
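Exact matching on decimal prices is fragile, which is why rounding changes the answer above. The sketch below assumes three prices copied from the output and also shows a tolerance check as one possible alternative to rounding:

```python
import pandas as pd

prices = pd.Series(data=[457.334015, 424.440002, 398.821014])

in_index = 1 in prices                                  # 'in' on a Series checks the index
in_values = bool(399 in prices.values)                  # exact match against the values
in_rounded = bool(399 in prices.round().values)         # match after rounding
near_399 = bool(((prices - 399).abs() < 0.5).any())     # within 0.5 of 399

print(in_index, in_values, in_rounded, near_399)   # True False True True
```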