## Introduction
<p><img src="https://i.imgur.com/kjWF1So.jpg" alt="Different characters on a computer screen"></p>
<p>According to a 2019 <a href="https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/PasswordCheckup-HarrisPoll-InfographicFINAL.pdf">Google / Harris Poll</a>, 24% of Americans have used common passwords, like <code>abc123</code>, <code>Password</code>, and <code>Admin</code>. Even more concerning, 59% of Americans have incorporated personal information, such as their name or birthday, into their password. This makes it unsurprising that 4 in 10 Americans have had their personal information compromised online. Passwords built from commonly used phrases and personal information are drastically easier to crack.</p>
<p>You may have noticed over the years that password requirements have increased in complexity, including recommendations to change your passwords every couple of months. Compiled from industry recommendations, below is a list of password requirements you will be asked to test:</p>
<p><strong>Password Requirements:</strong></p>
<ol>
<li>Must be at least 10 characters in length</li>
<li>Must contain at least:<ul>
<li>one lower case letter </li>
<li>one upper case letter </li>
<li>one numeric character </li>
<li>one non-alphanumeric character</li></ul></li>
<li>Must not contain the phrase <code>password</code> (case insensitive)</li>
<li>Must not contain the user's first or last name, e.g., if the user's name is <code>John Smith</code>, then <code>SmItH876!</code> is not a valid password.</li>
</ol>
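<p>As a rough illustration, the four requirements can be written as a single check on one password. This is only a sketch: the function name <code>check_password</code> and its parameters are hypothetical, and it reads "non-alphanumeric" literally as any character outside <code>A-Z</code>, <code>a-z</code>, and <code>0-9</code>, so an underscore counts. That is an assumption and is slightly more permissive than the <code>\W</code> pattern used in the notebook cells.</p>

```python
import re


def check_password(password: str, first_name: str, last_name: str) -> bool:
    """Return True if `password` meets all four requirements (hypothetical helper)."""
    lowered = password.lower()
    return (
        len(password) >= 10                                    # requirement 1
        and re.search(r"[a-z]", password) is not None          # requirement 2: lower case
        and re.search(r"[A-Z]", password) is not None          # requirement 2: upper case
        and re.search(r"[0-9]", password) is not None          # requirement 2: numeric
        and re.search(r"[^A-Za-z0-9]", password) is not None   # requirement 2: non-alphanumeric
        and "password" not in lowered                          # requirement 3
        and first_name.lower() not in lowered                  # requirement 4
        and last_name.lower() not in lowered
    )


# The example from requirement 4: too short and contains the last name
print(check_password("SmItH876!", "John", "Smith"))
```

<p>Writing the rules for a single string first can make it easier to reason about edge cases before applying the same logic to a whole column.</p>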
<p>Here is the dataset that you will investigate in this project:</p>
<div style="background-color: #ebf4f7; color: #595959; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/logins.csv</b></div>
Each row represents a login credential. There are no missing values and you can consider the dataset "clean".
<ul>
    <li><b>id:</b> the user's unique ID.</li>
    <li><b>username:</b> the username with the format {firstname}.{lastname}.</li>
    <li><b>password:</b> the password that may or may not meet the requirements. <i>Note: passwords should never be saved in plaintext. Always encrypt or, better, hash them when working with real passwords!</i></li>
</ul>
</div>
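<p>The note about never saving plaintext passwords can be made concrete with a short sketch. It uses salted one-way hashing from Python's standard library rather than reversible encryption, which is a substitution on our part. The function names and the iteration count are illustrative assumptions, not something this project prescribes.</p>

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor for illustration; tune for real systems


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) for `password` using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)


salt, digest = hash_password("Mail_Pen%Scarlets.414")
print(verify_password("Mail_Pen%Scarlets.414", salt, digest))
```

<p>Only the salt and digest would be stored; the original password cannot be read back from them.</p>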
<p>Warning: This dataset contains some <strong>real</strong> passwords leaked from <strong>real</strong> websites. These passwords have been filtered, but may still include words that are explicit and offensive.</p>
<p>From here on out, it will be your task to explore and manipulate the existing data until you can answer the two questions described in the instructions panel. Feel free to import as many packages as you need to complete your task, and add cells as necessary. Finally, remember that you are only tested on your answer, not on the methods you use to arrive at the answer!</p>
<p><strong>Note:</strong> To complete this project, you need to know how to manipulate strings in pandas DataFrames and be familiar with regular expressions. Before starting this project we recommend that you have completed the following courses: <a href="https://learn.datacamp.com/courses/data-cleaning-in-python">Data Cleaning in Python</a> and <a href="https://learn.datacamp.com/courses/regular-expressions-in-python">Regular Expressions in Python</a>.</p>

In [1]:
# Import pandas module
import pandas as pd

# Import dataset
logins = pd.read_csv('datasets/logins.csv')

# Check dataset
logins.info()
logins.head(5)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 982 entries, 0 to 981
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        982 non-null    int64 
 1   username  982 non-null    object
 2   password  982 non-null    object
dtypes: int64(1), object(2)
memory usage: 23.1+ KB


Unnamed: 0,id,username,password
0,1,vance.jennings,vanceRules888!
1,2,consuelo.eaton,Mail_Pen%Scarlets.414
2,3,mitchel.perkins,Z00+1960
3,4,odessa.vaughan,D-rockyou
4,5,araceli.wilder,Araceli}r3


In [2]:
# Check if password has 10 or more characters
length_check = logins['password'].str.len() >= 10

valid_pw = logins[length_check]
bad_pw = logins[~length_check]

In [3]:
# Check if password has lower case characters
lower_case = valid_pw['password'].str.contains('[a-z]')

In [4]:
# Check if password has upper case characters
upper_case = valid_pw['password'].str.contains('[A-Z]')

In [5]:
# Check if password has numeric characters
# (raw string so that \d is passed to the regex engine unchanged)
numeric = valid_pw['password'].str.contains(r'\d')

In [6]:
# Check if password has non-alphanumeric characters
# Note: \W does not match the underscore, so '_' alone does not satisfy this check
non_alphanumeric = valid_pw['password'].str.contains(r'\W')
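
The `\W` pattern treats the underscore as a word character, which is why a password such as `126_239_123` comes out `False` on the non-alphanumeric check. The sketch below compares `\W` with a stricter reading of "non-alphanumeric", `[^A-Za-z0-9]`. The helper name `non_alnum_flags` is hypothetical, and which interpretation the project expects is not stated, so treat the second pattern as an alternative rather than a correction.

```python
import re


def non_alnum_flags(password):
    """Return (matches \\W, matches [^A-Za-z0-9]) for one password."""
    return (
        re.search(r"\W", password) is not None,            # underscore is NOT matched
        re.search(r"[^A-Za-z0-9]", password) is not None,  # underscore IS matched
    )


for pw in ["126_239_123", "cHw2Leth5JXY", "Araceli}r3"]:
    print(pw, non_alnum_flags(pw))
```

Only the first sample differs between the two patterns, because its only symbols are underscores.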

In [7]:
pd.concat([valid_pw, lower_case, upper_case, numeric, non_alphanumeric], axis=1)

Unnamed: 0,id,username,password,password.1,password.2,password.3,password.4
0,1,vance.jennings,vanceRules888!,True,True,True,True
1,2,consuelo.eaton,Mail_Pen%Scarlets.414,True,True,True,True
4,5,araceli.wilder,Araceli}r3,True,True,True,True
5,6,shawn.harrington,126_239_123,False,False,True,False
6,7,evelyn.gay,`4:&iAt$'o~(,True,True,True,True
...,...,...,...,...,...,...,...
975,976,freeman.rose,cHw2Leth5JXY,True,True,True,False
976,977,monica.flores,*;~a8dq5%s',True,False,True,True
978,979,miriam.haynes,Gizzard.Muse+Patters_857,True,True,True,True
979,980,genaro.russo,Rm3OwUfobjYxq,True,True,True,False


In [8]:
# Flagging all bad passwords and valid passwords
char_check = lower_case & upper_case & numeric & non_alphanumeric

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
bad_pw = pd.concat([bad_pw, valid_pw[~char_check]], ignore_index=True)
valid_pw = valid_pw[char_check]
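
One possible way to keep the individual checks readable is to collect them in a small DataFrame with named columns and combine them with `.all(axis=1)`. The sketch below uses made-up rows rather than `logins.csv`, and the names `checks`, `passes_all`, `valid`, and `invalid` are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical rows, not taken from logins.csv)
df = pd.DataFrame({
    "username": ["a.one", "b.two", "c.three"],
    "password": ["Abcdef123!xyz", "alllowercase1!", "NoDigitsHere!!"],
})

# One named boolean column per character-class check
checks = pd.DataFrame({
    "lower": df["password"].str.contains(r"[a-z]"),
    "upper": df["password"].str.contains(r"[A-Z]"),
    "digit": df["password"].str.contains(r"\d"),
    "symbol": df["password"].str.contains(r"\W"),
})

# A row passes only if every check is True
passes_all = checks.all(axis=1)

valid = df[passes_all]
invalid = df[~passes_all]
print(pd.concat([df, checks], axis=1))
```

Named columns avoid the repeated `password` headers that appear when unnamed boolean Series are concatenated side by side.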

In [9]:
bad_pw.info()
display(bad_pw.head())

valid_pw.info()
display(valid_pw.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 724 entries, 0 to 723
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        724 non-null    int64 
 1   username  724 non-null    object
 2   password  724 non-null    object
dtypes: int64(1), object(2)
memory usage: 17.1+ KB


Unnamed: 0,id,username,password
0,3,mitchel.perkins,Z00+1960
1,4,odessa.vaughan,D-rockyou
2,10,brant.zimmerman,L?4)OSB$r
3,17,domingo.dyer,VeOw{*p
4,18,martin.pacheco,MP1985???


<class 'pandas.core.frame.DataFrame'>
Int64Index: 258 entries, 0 to 978
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        258 non-null    int64 
 1   username  258 non-null    object
 2   password  258 non-null    object
dtypes: int64(1), object(2)
memory usage: 8.1+ KB


Unnamed: 0,id,username,password
0,1,vance.jennings,vanceRules888!
1,2,consuelo.eaton,Mail_Pen%Scarlets.414
4,5,araceli.wilder,Araceli}r3
6,7,evelyn.gay,`4:&iAt$'o~(
8,9,gladys.ward,=Wj1`i)xYYZ


In [10]:
# Check if password has 'password' word
password_word = valid_pw['password'].str.contains('password', case=False)

# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
bad_pw = pd.concat([bad_pw, valid_pw[password_word]], ignore_index=True)
valid_pw = valid_pw[~password_word]

In [11]:
bad_pw.info()
display(bad_pw.head())

valid_pw.info()
display(valid_pw.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 725 entries, 0 to 724
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        725 non-null    int64 
 1   username  725 non-null    object
 2   password  725 non-null    object
dtypes: int64(1), object(2)
memory usage: 17.1+ KB


Unnamed: 0,id,username,password
0,3,mitchel.perkins,Z00+1960
1,4,odessa.vaughan,D-rockyou
2,10,brant.zimmerman,L?4)OSB$r
3,17,domingo.dyer,VeOw{*p
4,18,martin.pacheco,MP1985???


<class 'pandas.core.frame.DataFrame'>
Int64Index: 257 entries, 0 to 978
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   id        257 non-null    int64 
 1   username  257 non-null    object
 2   password  257 non-null    object
dtypes: int64(1), object(2)
memory usage: 8.0+ KB


Unnamed: 0,id,username,password
0,1,vance.jennings,vanceRules888!
1,2,consuelo.eaton,Mail_Pen%Scarlets.414
4,5,araceli.wilder,Araceli}r3
6,7,evelyn.gay,`4:&iAt$'o~(
8,9,gladys.ward,=Wj1`i)xYYZ


In [12]:
# Extract user's first and last name
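valid_pw = valid_pw.copy()  # explicit copy avoids SettingWithCopyWarning on the filtered frame
valid_pw['first_name'] = valid_pw['username'].str.extract(r'(^[a-z]+)', expand=False)
valid_pw['last_name'] = valid_pw['username'].str.extract(r'([a-z]+$)', expand=False)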

valid_pw

Unnamed: 0,id,username,password,first_name,last_name
0,1,vance.jennings,vanceRules888!,vance,jennings
1,2,consuelo.eaton,Mail_Pen%Scarlets.414,consuelo,eaton
4,5,araceli.wilder,Araceli}r3,araceli,wilder
6,7,evelyn.gay,`4:&iAt$'o~(,evelyn,gay
8,9,gladys.ward,=Wj1`i)xYYZ,gladys,ward
...,...,...,...,...,...
966,967,taylor.kent,">L0/d""8=omzy",taylor,kent
970,971,noel.montoya,Riskier:Spikes_Grasped=27,noel,montoya
971,972,josef.hoffman,Unhidden-Flatus*753-Figurer,josef,hoffman
972,973,jorge.patrick,Freedom_85!,jorge,patrick


In [13]:
# Check if password contains user's first or last name (case insensitive)
for i, row in valid_pw.iterrows():
    pw_lower = row['password'].lower()
    if row['first_name'] in pw_lower or row['last_name'] in pw_lower:
        print(row)
        valid_pw = valid_pw.drop(index=i)
        # DataFrame.append was removed in pandas 2.0; concat the row as a one-row frame
        bad_pw = pd.concat([bad_pw, row.to_frame().T], ignore_index=True)

id                         1
username      vance.jennings
password      vanceRules888!
first_name             vance
last_name           jennings
Name: 0, dtype: object
id                         5
username      araceli.wilder
password          Araceli}r3
first_name           araceli
last_name             wilder
Name: 4, dtype: object
id                         12
username      milford.hubbard
password         Milford<3Tom
first_name            milford
last_name             hubbard
Name: 11, dtype: object
id                        141
username        ronald.brooks
password      P1G_bT”_zBrooks
first_name             ronald
last_name              brooks
Name: 140, dtype: object
id                       150
username      raymundo.haley
password      HaleyComet333$
first_name          raymundo
last_name              haley
Name: 149, dtype: object
id                      668
username      simon.miranda
password         SimonR0ck$
first_name            simon
last_name           miranda
Name:
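
The row-by-row loop works, but the same name check can also be expressed as a boolean mask built in one pass, which avoids modifying the DataFrame while iterating over it. The sketch below is an alternative under stated assumptions: usernames always have the form `{firstname}.{lastname}`, the first two rows are copied from the outputs in this notebook, the third row is made up, and the names `toy`, `names`, and `contains_name` are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "username": ["vance.jennings", "consuelo.eaton", "ronald.brooks"],
    "password": ["vanceRules888!", "Mail_Pen%Scarlets.414", "xY_zBrooks12#"],
})

# Split the username on the first dot into first and last name
names = toy["username"].str.split(".", n=1, expand=True)
names.columns = ["first_name", "last_name"]

# Compare against the lower-cased password so the check is case insensitive
pw_lower = toy["password"].str.lower()
contains_name = pd.Series(
    [
        first in pw or last in pw
        for first, last, pw in zip(names["first_name"], names["last_name"], pw_lower)
    ],
    index=toy.index,
)

print(toy[contains_name])   # rows that would move to the bad list
print(toy[~contains_name])  # rows that remain valid
```

Because the mask shares the index of the original frame, it can be used to filter both the valid and the invalid sets in the same way as the earlier checks.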

In [14]:
# Proportion of users who have invalid passwords (0.75 means 75%)
bad_pass = round(bad_pw.shape[0] / logins.shape[0], 2)
bad_pass

0.75
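
The rounded result can be checked by hand from counts that already appear in the outputs: 982 rows in `logins`, 725 bad passwords after the `password` phrase check, and 736 usernames in the final list. The variable names below are only for illustration.

```python
total_users = 982        # rows reported by logins.info()
bad_before_names = 725   # bad_pw rows after the 'password' phrase check
bad_users = 736          # length of the final list of usernames

# Passwords rejected only because they contain the user's name
name_failures = bad_users - bad_before_names
print(name_failures)

bad_fraction = bad_users / total_users
print(round(bad_fraction, 4))  # unrounded value is just under 0.75
print(round(bad_fraction, 2))
```

In other words, roughly three out of four users in this dataset would be asked to change their password.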

In [15]:
# List of users who need to change their passwords
email_list = bad_pw['username'].sort_values()
email_list

405       abdul.rowland
309        addie.cherry
372        adele.moreno
517        adeline.bush
279         adolfo.kane
             ...       
373    yvette.whitfield
232        yvonne.munoz
264        zachary.huff
172        zelma.abbott
49        zelma.rosario
Name: username, Length: 736, dtype: object