In [313]:
import pandas as pd

people = {  # the whole dictionary becomes one DataFrame
    "firstName": ["Corey", "Pritam", "Jane", "John"],  # each key becomes a column name
    "lastName": ["Chaufer", "Kundu", "Doe", "Doe"],  # each list holds that column's values, one per row
    "email": ["CoreyMSchaufer@gmail.com", "pritamkundu771@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"]
}
df = pd.DataFrame(people)



Change default index of the dataFrame to a different column 



In [314]:
df.set_index('firstName')

firstName,lastName,email
Corey,Chaufer,CoreyMSchaufer@gmail.com
Pritam,Kundu,pritamkundu771@gmail.com
Jane,Doe,JaneDoe@email.com
John,Doe,JohnDoe@email.com


❗If we display `df` again, it is unchanged: `firstName` is still an ordinary column.  
Q. Why does this happen? 🤔  
  - `set_index` returns a new DataFrame by default instead of modifying `df` in place.  
  - This lets us experiment with the data without changing it in unexpected ways.  
  - To keep the new index, either pass `inplace=True` or assign the result back, e.g. `df = df.set_index('firstName')`.
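
A minimal sketch of this behaviour, assuming a small two-row frame named `demo` that is made up for illustration and is not part of the notebook's data. It shows that the original frame is untouched until the result is assigned back:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
demo = pd.DataFrame({"firstName": ["Jane", "John"], "lastName": ["Doe", "Doe"]})

# set_index returns a new DataFrame; `demo` itself is left untouched
indexed = demo.set_index("firstName")
columns_before = list(demo.columns)

print(columns_before)      # "firstName" is still a regular column of demo
print(indexed.index.name)  # the returned frame uses it as the index

# Assigning the result back is an alternative to inplace=True
demo = demo.set_index("firstName")
print(demo.index.name)
```

Assigning the result back keeps the operation explicit, which can be easier to follow when several transformations are chained.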

In [315]:
store = df.copy()  # work on an independent copy so the original df is not modified
store.set_index('firstName', inplace=True)  # set the index on the copy, in place
store
store.index # checking

Index(['Corey', 'Pritam', 'Jane', 'John'], dtype='object', name='firstName')

Q. Why is it useful to replace the default index with a different one?
- Index labels work as custom keys for the rows.
- `df.loc[<label-name>]` looks rows up by these labels,
- so a meaningful index acts like a primary key for each row instead of a bare integer. (pandas does not enforce uniqueness; a duplicated label makes `loc` return every matching row.)

Example: Suppose an office needs to produce an annual report on employee performance. To fetch an employee's details, it is more convenient to use the custom EmployeeID as the key than the default integer index that pandas provides.
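
The employee scenario could look roughly like the sketch below. The IDs, column names, and ratings are all hypothetical and only stand in for real records:

```python
import pandas as pd

# Hypothetical employee records; the IDs and column names are made up
employees = pd.DataFrame({
    "employeeId": ["E101", "E102", "E103"],
    "name": ["Corey", "Pritam", "Jane"],
    "rating": [4.2, 4.8, 3.9],
})
employees = employees.set_index("employeeId")

# Look up a single employee by ID instead of by row position
record = employees.loc["E102"]
print(record["name"], record["rating"])

# pandas does not enforce uniqueness on an index, so it is worth checking
print(employees.index.is_unique)
```

Checking `is_unique` before relying on label lookups is a cheap safeguard, since a duplicated label would make `loc` return several rows instead of one.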

In [316]:
try:
    store.loc[0]  # ❌ no row is labelled 0 any more, so this raises KeyError
except KeyError:
    print(store.iloc[0], '\n')  # iloc still accesses rows by position, here the first row
finally:
    print(store.loc["Pritam"])  # loc accesses the row labelled "Pritam" in the custom index


lastName                     Chaufer
email       CoreyMSchaufer@gmail.com
Name: Corey, dtype: object 

lastName                       Kundu
email       pritamkundu771@gmail.com
Name: Pritam, dtype: object


⚠️ If the index was changed by accident, `reset_index()` restores the default integer index and moves the current index back into a regular column.
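
As a rough illustration of `reset_index()`, the sketch below uses a hypothetical two-row frame named `demo`. It also shows the `drop=True` option, which discards the old index rather than keeping it as a column:

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration
demo = pd.DataFrame({"firstName": ["Jane", "John"], "lastName": ["Doe", "Doe"]})
indexed = demo.set_index("firstName")

# reset_index moves the index back into a column and restores 0, 1, 2, ...
restored = indexed.reset_index()
print(restored.equals(demo))

# drop=True discards the old index instead of keeping it as a column
dropped = indexed.reset_index(drop=True)
print(list(dropped.columns))
```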


⚠️ TIP: A common beginner mistake is to compare two DataFrames with the `==, >, <, <=, >=` operators and treat the result as a single True/False value.  
- These operators are overloaded in the DataFrame and Series classes to compare element by element, so they return a DataFrame or Series of booleans rather than one boolean.  
- Use the `df.equals` method to check whether two DataFrames are equal; using an element-wise result in an `if` or `assert` raises a `ValueError` because its truth value is ambiguous.  
[SO](https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o)  
[Official Docs](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.equals.html)  
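
A small sketch of the difference, assuming two hypothetical frames `a` and `b` that differ in a single cell:

```python
import pandas as pd

# Hypothetical frames that differ only in the last value of column "y"
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [1, 2], "y": [3, 5]})

# == compares element by element and returns a DataFrame of booleans
elementwise = a == b
print(elementwise)

# Using that result where a single True/False is expected raises ValueError
raised = False
try:
    if a == b:
        pass
except ValueError as err:
    raised = True
    print("ValueError:", err)

# equals() collapses the comparison to one boolean
print(a.equals(b))
print(a.equals(a.copy()))
```

When the element-wise result is what you want, `.all()` or `.any()` can reduce it explicitly; `equals` is the shorter route when only overall equality matters.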

In [317]:
store.reset_index(inplace=True)
assert store.equals(df) 


Assigning a custom index while reading the data

In [318]:
schema_df = pd.read_csv(r"data/survey_results_schema.csv", index_col="qname")
df = pd.read_csv(r"data/survey_results_public.csv", index_col="ResponseId")
df[:3]

ResponseId,MainBranch,Employment,Country,US_State,UK_Country,EdLevel,Age1stCode,LearnCode,YearsCode,YearsCodePro,...,Age,Gender,Trans,Sexuality,Ethnicity,Accessibility,MentalHealth,SurveyLength,SurveyEase,ConvertedCompYearly
1,I am a developer by profession,"Independent contractor, freelancer, or self-em...",Slovakia,,,"Secondary school (e.g. American high school, G...",18 - 24 years,Coding Bootcamp;Other online resources (ex: vi...,,,...,25-34 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,62268.0
2,I am a student who is learning to code,"Student, full-time",Netherlands,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",7.0,,...,18-24 years old,Man,No,Straight / Heterosexual,White or of European descent,None of the above,None of the above,Appropriate in length,Easy,
3,"I am not primarily a developer, but I write co...","Student, full-time",Russian Federation,,,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)",11 - 17 years,"Other online resources (ex: videos, blogs, etc...",,,...,18-24 years old,Man,No,Prefer not to say,Prefer not to say,None of the above,None of the above,Appropriate in length,Easy,
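
The `index_col` argument can be tried without the survey files. The sketch below assumes a tiny in-memory CSV (the text in `csv_text` is made up) wrapped in `io.StringIO` so that `read_csv` treats it like a file:

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk
csv_text = "ResponseId,Country,YearsCode\n1,Slovakia,\n2,Netherlands,7\n"

# index_col picks the index column while the data is being parsed
responses = pd.read_csv(io.StringIO(csv_text), index_col="ResponseId")

print(responses.index.name)
print(responses.loc[2, "Country"])  # look up by ResponseId label, not position
```

Setting the index at read time avoids a separate `set_index` call afterwards.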


In [319]:
schema_df

qname,qid,question,force_resp,type,selector
S0,QID16,"<div><span style=""font-size:19px;""><strong>Hel...",False,DB,TB
MetaInfo,QID12,Browser Meta Info,False,Meta,Browser
S1,QID1,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
MainBranch,QID2,Which of the following options best describes ...,True,MC,SAVR
Employment,QID24,Which of the following best describes your cur...,False,MC,MAVR
Country,QID6,"Where do you live? <span style=""font-weight: b...",True,MC,DL
US_State,QID7,<p>In which state or territory of the USA do y...,False,MC,DL
UK_Country,QID9,In which part of the United Kingdom do you liv...,False,MC,DL
S2,QID190,"<span style=""font-size:22px; font-family: aria...",False,DB,TB
EdLevel,QID25,Which of the following best describes the high...,False,MC,SAVR


In [320]:
schema_df.loc[:, 'question']

qname
S0                    <div><span style="font-size:19px;"><strong>Hel...
MetaInfo                                              Browser Meta Info
S1                    <span style="font-size:22px; font-family: aria...
MainBranch            Which of the following options best describes ...
Employment            Which of the following best describes your cur...
Country               Where do you live? <span style="font-weight: b...
US_State              <p>In which state or territory of the USA do y...
UK_Country            In which part of the United Kingdom do you liv...
S2                    <span style="font-size:22px; font-family: aria...
EdLevel               Which of the following best describes the high...
Age1stCode            At what age did you write your first line of c...
LearnCode             How did you learn to code? Select all that apply.
YearsCode             Including any education, how many years have y...
YearsCodePro          NOT including education, how many ye

How to sort the index alphabetically?

In [321]:
schema_df.sort_index(ascending=True) # To sort in reverse order: ascending=False

qname,qid,question,force_resp,type,selector
Accessibility,QID124,"Which of the following describe you, if any? P...",False,MC,MAVR
Age,QID127,What is your age?,False,MC,MAVR
Age1stCode,QID149,At what age did you write your first line of c...,False,MC,MAVR
CompFreq,QID52,"Is that compensation weekly, monthly, or yearly?",False,MC,MAVR
CompTotal,QID51,What is your current total compensation (salar...,False,TE,SL
Country,QID6,"Where do you live? <span style=""font-weight: b...",True,MC,DL
Currency,QID50,Which currency do you use day-to-day? If your ...,True,MC,SB
Database,QID262,Which <b>database environments </b>have you do...,False,Matrix,Likert
DevType,QID31,Which of the following describes your current ...,False,MC,MAVR
EdLevel,QID25,Which of the following best describes the high...,False,MC,SAVR
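
A minimal sketch of `sort_index` on a hypothetical three-row miniature of the schema frame (only a few labels are reproduced). Like `set_index`, it returns a new frame and leaves the original order alone:

```python
import pandas as pd

# Hypothetical miniature of the schema frame with a handful of labels
mini = pd.DataFrame(
    {"qid": ["QID6", "QID127", "QID25"]},
    index=pd.Index(["Country", "Age", "EdLevel"], name="qname"),
)

# sort_index returns a sorted copy; `mini` keeps its original order
ascending = mini.sort_index()
descending = mini.sort_index(ascending=False)

print(list(ascending.index))
print(list(descending.index))
print(list(mini.index))
```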
