---
# Indexes
How to set, reset, and use a DataFrame index.

---

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Helper for printing a horizontal rule (display purposes only)
from typing import Optional


def printhr(s: Optional[str] = None, n: int = 40):
    """Print a horizontal rule made of the "=" character.

    Args:
        s (str, optional): Header message. If given, it is printed between
            two runs of n // 2 "=" characters. Defaults to None.
        n (int, optional): Number of "=" characters. Defaults to 40.
    """

    if s:
        print("=" * (n // 2), s, "=" * (n // 2))
    else:
        print("=" * n)

In [3]:
people = {
    "first": ["Lorem", "Foo", "Cat"],
    "last": ["Ipsum", "Bar", "Dog"],
    "email": ["loripsum@a.a", "foobar@a.a", "catdog@a.a"],
}

df = pd.DataFrame(people)
df

,first,last,email
0,Lorem,Ipsum,loripsum@a.a
1,Foo,Bar,foobar@a.a
2,Cat,Dog,catdog@a.a


---
### Frequent Reminder

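It is good practice to preview a change first by running the method without `inplace=True`: pandas then returns a modified copy and leaves the original DataFrame untouched.  
Once the result looks right, rerun it with `inplace=True`, or assign the returned DataFrame back to the variable.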
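A minimal sketch of that preview-then-commit workflow, using the same toy `people` data as above. It shows that the call without `inplace=True` leaves `df` unchanged, and that the two ways of committing (reassigning the result, or `inplace=True`, which returns `None`) end in the same DataFrame.

```python
import pandas as pd

people = {
    "first": ["Lorem", "Foo", "Cat"],
    "last": ["Ipsum", "Bar", "Dog"],
    "email": ["loripsum@a.a", "foobar@a.a", "catdog@a.a"],
}
df = pd.DataFrame(people)

# 1. Preview: without inplace=True, set_index returns a NEW DataFrame
#    and df itself is left untouched.
preview = df.set_index("email")
print(preview)
print(list(df.columns))  # df still has "email" as a regular column

# 2a. Commit by keeping the returned DataFrame...
committed = df.set_index("email")

# 2b. ...or by modifying df itself; with inplace=True the method returns None.
result = df.set_index("email", inplace=True)
print(result)
print(df.index.name)
```

Reassigning (`df = df.set_index("email")`) is often easier to follow in a notebook, because rerunning a cell never depends on whether an earlier in-place call already ran.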

---

---
## Setting Index

---

In [4]:
# Set the column "email" as the index. By default, pandas returns a new
# DataFrame and leaves the original unchanged; inplace=True modifies df
# directly instead.
df.set_index("email", inplace=True)
df

email,first,last
loripsum@a.a,Lorem,Ipsum
foobar@a.a,Foo,Bar
catdog@a.a,Cat,Dog


In [5]:
# Now we can use .loc to select rows by their email labels.
# Note that a label slice includes BOTH endpoints.
df.loc["loripsum@a.a":"foobar@a.a"]

email,first,last
loripsum@a.a,Lorem,Ipsum
foobar@a.a,Foo,Bar
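The slice above returned two rows because label slicing with `.loc` is inclusive of the stop label, unlike position slicing with `.iloc`, which follows Python's usual exclusive stop. A small sketch contrasting the two on the same toy data (rebuilt here so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame(
    {"first": ["Lorem", "Foo", "Cat"], "last": ["Ipsum", "Bar", "Dog"]},
    index=pd.Index(["loripsum@a.a", "foobar@a.a", "catdog@a.a"], name="email"),
)

# Label-based slice: both endpoints are included (2 rows).
by_label = df.loc["loripsum@a.a":"foobar@a.a"]

# Position-based slice: the stop position is excluded (1 row).
by_position = df.iloc[0:1]

# A single label returns that row as a Series.
row = df.loc["catdog@a.a"]

print(len(by_label), len(by_position))
print(row["first"])
```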


---
## Resetting Index

`.reset_index()` replaces the current index with the default integer index (0, 1, 2, ...). By default, the old index is moved back into the DataFrame as its first column; pass `drop=True` to discard it instead.

---

In [6]:
# Move "email" back into a column and restore the default integer index
df.reset_index(inplace=True)
df

,email,first,last
0,loripsum@a.a,Lorem,Ipsum
1,foobar@a.a,Foo,Bar
2,catdog@a.a,Cat,Dog
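To compare the default behaviour with `drop=True`, here is a short sketch on the same toy data, starting from the email-indexed version:

```python
import pandas as pd

df = pd.DataFrame(
    {"first": ["Lorem", "Foo", "Cat"], "last": ["Ipsum", "Bar", "Dog"]},
    index=pd.Index(["loripsum@a.a", "foobar@a.a", "catdog@a.a"], name="email"),
)

# Default: the old index ("email") becomes the first column again.
kept = df.reset_index()
print(list(kept.columns))

# drop=True: the old index is thrown away instead of becoming a column.
dropped = df.reset_index(drop=True)
print(list(dropped.columns))
print(list(dropped.index))
```

`drop=True` is handy when the index carries no information, for example after filtering rows and wanting a clean 0..n-1 numbering again.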


---
## Using Index

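We can access rows and individual values through the index with the previously discussed `.loc[]` (label-based) and `.iloc[]` (position-based) indexers. Note that these are indexed with square brackets, not called like methods.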

---

In [7]:
# Change the email of "Lorem Ipsum"
display(df)
printhr()

df.loc[0, "email"] = "new@email.com"
display(df)


,email,first,last
0,loripsum@a.a,Lorem,Ipsum
1,foobar@a.a,Foo,Bar
2,catdog@a.a,Cat,Dog

========================================

,email,first,last
0,new@email.com,Lorem,Ipsum
1,foobar@a.a,Foo,Bar
2,catdog@a.a,Cat,Dog
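The cell above edits a value by label with `.loc`. As a sketch, the same kind of edit with `.iloc` needs integer positions for both the row and the column; `DataFrame.columns.get_loc` translates a column name into its position. The replacement address `"other@email.com"` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "email": ["loripsum@a.a", "foobar@a.a", "catdog@a.a"],
        "first": ["Lorem", "Foo", "Cat"],
        "last": ["Ipsum", "Bar", "Dog"],
    }
)

# .loc takes an index label and a column name.
df.loc[0, "email"] = "new@email.com"

# .iloc takes integer positions, so look up the column's position first.
email_pos = df.columns.get_loc("email")
df.iloc[1, email_pos] = "other@email.com"  # hypothetical new address

print(df["email"].tolist())
```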


---
## Example from the Stack Overflow Survey Data Set
---

In [8]:
# Load the CSV files as DataFrames, using "ResponseId" and "qname"
# as their respective indexes.
df = pd.read_csv("data/survey_results_public_2022.csv", index_col="ResponseId")
schema_df = pd.read_csv("data/survey_results_schema.csv", index_col="qname")
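`index_col` sets the index while the file is being read. The sketch below uses a tiny in-memory CSV (hypothetical data, not the real survey file) to check that this gives the same result as reading first and calling `set_index` afterwards:

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the survey CSV.
csv_text = "ResponseId,Age\n1,25-34 years old\n2,35-44 years old\n"

# Use a column as the index while reading...
a = pd.read_csv(io.StringIO(csv_text), index_col="ResponseId")

# ...or read first and set the index afterwards.
b = pd.read_csv(io.StringIO(csv_text)).set_index("ResponseId")

print(a.equals(b))
print(a.index.name)
```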

In [9]:
# Configure display options: show up to 80 rows and 80 columns before
# pandas truncates the output
pd.set_option("display.max_rows", 80)
pd.set_option("display.max_columns", 80)

In [10]:
# Return the first 5 rows of df
df.head()

ResponseId,MainBranch,Employment,RemoteWork,CodingActivities,EdLevel,LearnCode,LearnCodeOnline,LearnCodeCoursesCert,YearsCode,YearsCodePro,DevType,OrgSize,PurchaseInfluence,BuyNewTool,Country,Currency,CompTotal,CompFreq,LanguageHaveWorkedWith,LanguageWantToWorkWith,DatabaseHaveWorkedWith,DatabaseWantToWorkWith,PlatformHaveWorkedWith,PlatformWantToWorkWith,WebframeHaveWorkedWith,WebframeWantToWorkWith,MiscTechHaveWorkedWith,MiscTechWantToWorkWith,ToolsTechHaveWorkedWith,ToolsTechWantToWorkWith,NEWCollabToolsHaveWorkedWith,NEWCollabToolsWantToWorkWith,OpSysProfessional use,OpSysPersonal use,VersionControlSystem,VCInteraction,VCHostingPersonal use,VCHostingProfessional use,OfficeStackAsyncHaveWorkedWith,OfficeStackAsyncWantToWorkWith,OfficeStackSyncHaveWorkedWith,OfficeStackSyncWantToWorkWith,Blockchain,NEWSOSites,SOVisitFreq,SOAccount,SOPartFreq,SOComm,Age,Gender,Trans,Sexuality,Ethnicity,Accessibility,MentalHealth,TBranch,ICorPM,WorkExp,Knowledge_1,Knowledge_2,Knowledge_3,Knowledge_4,Knowledge_5,Knowledge_6,Knowledge_7,Frequency_1,Frequency_2,Frequency_3,TimeSearching,TimeAnswering,Onboarding,ProfessionalTech,TrueFalse_1,TrueFalse_2,TrueFalse_3,SurveyLength,SurveyEase,ConvertedCompYearly
1,None of these,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,I am a developer by profession,"Employed, full-time",Fully remote,Hobby;Contribute to open-source projects,,,,,,,,,,,Canada,CAD\tCanadian dollar,,,JavaScript;TypeScript,Rust;TypeScript,,,,,,,,,,,,,macOS,Windows Subsystem for Linux (WSL),Git,,,,,,,,Very unfavorable,Collectives on Stack Overflow;Stack Overflow f...,Daily or almost daily,Yes,Daily or almost daily,Not sure,,,,,,,,No,,,,,,,,,,,,,,,,,,,,Too long,Difficult,
3,"I am not primarily a developer, but I write co...","Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Master’s degree (M.A., M.S., M.Eng., MBA, etc.)",Books / Physical media;Friend or family member...,Technical documentation;Blogs;Programming Game...,,14.0,5.0,Data scientist or machine learning specialist;...,20 to 99 employees,I have some influence,,United Kingdom of Great Britain and Northern I...,GBP\tPound sterling,32000.0,Yearly,C#;C++;HTML/CSS;JavaScript;Python,C#;C++;HTML/CSS;JavaScript;TypeScript,Microsoft SQL Server,Microsoft SQL Server,,,Angular.js,Angular;Angular.js,Pandas,.NET,,,Notepad++;Visual Studio,Notepad++;Visual Studio,Windows,Windows,Git,Code editor,,,,,Microsoft Teams,Microsoft Teams,Very unfavorable,Collectives on Stack Overflow;Stack Overflow;S...,Multiple times per day,Yes,Multiple times per day,Neutral,25-34 years old,Man,No,Bisexual,White,None of the above,"I have a mood or emotional disorder (e.g., dep...",No,,,,,,,,,,,,,,,,,,,,Appropriate in length,Neither easy nor difficult,40205.0
4,I am a developer by profession,"Employed, full-time",Fully remote,I don’t code outside of work,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Books / Physical media;School (i.e., Universit...",,,20.0,17.0,"Developer, full-stack",100 to 499 employees,I have some influence,Other (please specify):,Israel,ILS\tIsraeli new shekel,60000.0,Monthly,C#;JavaScript;SQL;TypeScript,C#;SQL;TypeScript,Microsoft SQL Server,Microsoft SQL Server,,,ASP.NET;ASP.NET Core,ASP.NET;ASP.NET Core,.NET,.NET,,,Notepad++;Visual Studio;Visual Studio Code,Notepad++;Visual Studio;Visual Studio Code,Windows,Windows,Git,Code editor;Command-line;Version control hosti...,,,Jira Work Management;Trello,Jira Work Management;Trello,Slack;Zoom,Slack;Zoom,Very unfavorable,Collectives on Stack Overflow;Stack Overflow f...,Daily or almost daily,Yes,A few times per week,"Yes, definitely",35-44 years old,Man,No,Straight / Heterosexual,White,None of the above,None of the above,No,,,,,,,,,,,,,,,,,,,,Appropriate in length,Easy,215232.0
5,I am a developer by profession,"Employed, full-time","Hybrid (some remote, some in-person)",Hobby,"Bachelor’s degree (B.A., B.S., B.Eng., etc.)","Other online resources (e.g., videos, blogs, f...",Technical documentation;Blogs;Stack Overflow;O...,,8.0,3.0,"Developer, front-end;Developer, full-stack;Dev...",20 to 99 employees,I have some influence,Start a free trial;Visit developer communities...,United States of America,USD\tUnited States dollar,,,C#;HTML/CSS;JavaScript;SQL;Swift;TypeScript,C#;Elixir;F#;Go;JavaScript;Rust;TypeScript,Cloud Firestore;Elasticsearch;Microsoft SQL Se...,Cloud Firestore;Elasticsearch;Firebase Realtim...,Firebase;Microsoft Azure,Firebase;Microsoft Azure,Angular;ASP.NET;ASP.NET Core ;jQuery;Node.js,Angular;ASP.NET Core ;Blazor;Node.js,.NET,.NET;Apache Kafka,npm,Docker;Kubernetes,Notepad++;Visual Studio;Visual Studio Code;Xcode,Rider;Visual Studio;Visual Studio Code,Windows,macOS;Windows,Git;Other (please specify):,Code editor,,,,,Microsoft Teams;Zoom,,Unfavorable,Collectives on Stack Overflow;Stack Overflow f...,Multiple times per day,Yes,Daily or almost daily,"Yes, definitely",25-34 years old,,,,,,,No,,,,,,,,,,,,,,,,,,,,Too long,Easy,


In [11]:
# Select the row whose ResponseId LABEL is 38 and show its record
df.loc[38]

MainBranch                                           I am a developer by profession
Employment                        Independent contractor, freelancer, or self-em...
RemoteWork                                                             Fully remote
CodingActivities                                       I don’t code outside of work
EdLevel                           Secondary school (e.g. American high school, G...
LearnCode                         Books / Physical media;Friend or family member...
LearnCodeOnline                   Technical documentation;Blogs;Written Tutorial...
LearnCodeCoursesCert                                                            NaN
YearsCode                                                                        19
YearsCodePro                                                                     14
DevType                           Developer, front-end;Developer, full-stack;Dev...
OrgSize                           Just me - I am a freelancer, sole propriet
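Because the index is `ResponseId`, `df.loc[38]` looks up the label 38, not the 39th row. A minimal sketch of the difference, using a hypothetical three-row frame whose labels start at 1 (the `YearsCode` values are made up):

```python
import pandas as pd

# Hypothetical rows whose index labels start at 1, like ResponseId.
df = pd.DataFrame(
    {"YearsCode": [10, 20, 30]},
    index=pd.Index([1, 2, 3], name="ResponseId"),
)

print(df.loc[2, "YearsCode"])   # the row LABELLED 2
print(df.iloc[2]["YearsCode"])  # the row at POSITION 2, i.e. the third row
```

If the labels run 1, 2, 3, ... without gaps, label `n` sits at position `n - 1`; whether that holds for the real file is an assumption you would need to check.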

In [12]:
# Some column names are not very descriptive. To check what a field
# means, look it up in the schema file distributed with the survey,
# which is indexed by the column name (qname).

schema_df.loc["PurchaseInfluence", "question"]

'What level of influence do you, personally, have over new technology purchases at your organization?'
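Passing a list of labels to `.loc` looks up several fields at once and returns a Series keyed by `qname`. A sketch on a hypothetical two-row schema whose `qid` and `question` values are copied from the schema output in this notebook:

```python
import pandas as pd

# Hypothetical miniature schema (values taken from the outputs shown here).
schema_df = pd.DataFrame(
    {
        "qid": ["QID127", "QID52"],
        "question": [
            "What is your age?",
            "Is that compensation weekly, monthly, or yearly?",
        ],
    },
    index=pd.Index(["Age", "CompFreq"], name="qname"),
)

# A list of labels returns a Series, in the order requested.
questions = schema_df.loc[["CompFreq", "Age"], "question"]
print(questions)
```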

In [13]:
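# Sort the rows alphabetically by their index labels (qname).
# This returns a sorted copy; schema_df itself is not changed.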
schema_df.sort_index()

qname,qid,question,force_resp,type,selector
Accessibility,QID124,"Which of the following describe you, if any? P...",False,MC,MAVR
Age,QID127,What is your age?,False,MC,MAVR
Blockchain,QID305,"How favorable are you about blockchain, crypto...",False,MC,SAVR
BuyNewTool,QID279,"When buying a new tool or software, how do you...",False,MC,MAVR
CodingActivities,QID297,Which of the following best describes the code...,False,MC,MAVR
CompFreq,QID52,"Is that compensation weekly, monthly, or yearly?",False,MC,MAVR
CompTotal,QID51,What is your current total compensation (salar...,False,TE,SL
Country,QID6,"Where do you live? <span style=""font-weight: b...",True,MC,DL
Currency,QID50,Which currency do you use day-to-day? If your ...,True,MC,DL
Database,QID262,Which <b>database environments </b>have you do...,False,Matrix,Likert
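`sort_index` also accepts `ascending=False`, and like `set_index` it returns a new DataFrame unless `inplace=True` is passed. A short sketch on three hypothetical schema rows (labels and `qid` values taken from the output above):

```python
import pandas as pd

schema_df = pd.DataFrame(
    {"qid": ["QID127", "QID124", "QID6"]},
    index=pd.Index(["Age", "Accessibility", "Country"], name="qname"),
)

ascending = schema_df.sort_index()
descending = schema_df.sort_index(ascending=False)

print(list(ascending.index))
print(list(descending.index))
print(list(schema_df.index))  # original order is unchanged
```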
