# Task 1: Instructions

Load in and inspect the usernames and passwords of the fictional users.

- Load the `pandas` module.
- Load the user data from the file at `datasets/users.csv` and store it as a DataFrame called `users`.
- Print the number of rows (i.e. users).
- Show the first 12 rows in `users`.
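
As a minimal sketch of these steps, the snippet below reads a small, made-up CSV from an in-memory string so that it runs without the project files. The three rows and the `id` column are assumptions for illustration only; in the project you would pass the path `datasets/users.csv` to `read_csv` instead.

```python
import io

import pandas as pd

# Made-up stand-in for the contents of datasets/users.csv (not the real data).
csv_text = """id,user_name,password
1,ada.lovelace,analytical
2,alan.turing,enigma1234
3,grace.hopper,cobol
"""

# In the project: users = pd.read_csv('datasets/users.csv')
users = pd.read_csv(io.StringIO(csv_text))

# Number of rows, i.e. number of users.
n_users = len(users)
print(n_users)

# First 12 rows (this toy example only has 3).
print(users.head(12))
```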

## Good to know

To complete this project, you need to know how to manipulate strings in `pandas` DataFrames and be familiar with regular expressions. Before starting this project, we recommend completing the following courses:

- [Data Cleaning in Python](https://www.datacamp.com/courses/data-cleaning-in-python).
- [Regular Expressions in Python](https://www.datacamp.com/courses/regular-expressions-in-python).

A useful companion for this project is the [_Working with text data_](https://pandas.pydata.org/pandas-docs/stable/text.html) page of the `pandas` documentation. At the bottom of that page, you'll also find a list of all the string methods `pandas` supports.

# Task 2: Instructions

Flag the passwords that are too short.

- Add the column `length` to `users`, which should list the number of characters in each `password`.
- Flag the users whose passwords are too short by adding the column `users['too_short']`, which should be `True` when `users['length']` is less than 8.
- Print the count of the number of users with passwords that are too short.
- Show the first 12 rows in `users`.

To solve this task, you need to be able to figure out the number of characters in each password. Check the [`pandas` string documentation](https://pandas.pydata.org/pandas-docs/stable/text.html) for a method that does just that.
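
One possible approach, sketched on made-up data, uses the `.str.len()` string method and a boolean comparison. The user names and passwords below are hypothetical; the column names follow the task description.

```python
import pandas as pd

# Toy data; the user names and passwords are made up for illustration.
users = pd.DataFrame({
    'user_name': ['ada.lovelace', 'alan.turing', 'grace.hopper'],
    'password': ['analytical', 'enigma', '1234567'],
})

# Number of characters in each password.
users['length'] = users['password'].str.len()

# True where the password has fewer than 8 characters.
users['too_short'] = users['length'] < 8

# Summing a boolean column counts the True values.
n_too_short = users['too_short'].sum()
print(n_too_short)
print(users.head(12))
```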

# Task 3: Instructions

Load in the list with the 10,000 most common passwords.

- Read in `datasets/10_million_password_list_top_10000.txt` as a Series and put it in the variable `common_passwords`.
- Take a look at the top 20 common passwords.

_Note:_ The passwords are stored in a plain text file with one password per row. To read this in as a Series, you can use the `read_csv` function from `pandas`. To make it return a Series (rather than a DataFrame), set the arguments `header=None` and `squeeze=True`. The `squeeze` argument was removed in `pandas` 2.0; on newer versions, call `.squeeze("columns")` on the DataFrame returned by `read_csv` instead.
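
The sketch below shows one way to get a Series from a one-password-per-row file without relying on the `squeeze` argument of `read_csv`, which is no longer available in `pandas` 2.0 and later. It reads from an in-memory string with four made-up entries so that it runs without the project files; `dtype=str` is an added assumption that keeps all-digit passwords as text.

```python
import io

import pandas as pd

# Made-up stand-in for datasets/10_million_password_list_top_10000.txt.
text = "123456\npassword\n12345678\nqwerty\n"

# header=None: the file has no header row.
# dtype=str: keep every entry as a string, including all-digit passwords.
# .squeeze("columns") turns the single-column DataFrame into a Series.
common_passwords = pd.read_csv(
    io.StringIO(text), header=None, dtype=str
).squeeze("columns")

# Top 20 entries (this toy example only has 4).
print(common_passwords.head(20))
```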

# Task 4: Instructions

Flag the passwords in `users` that are among the 10,000 most common passwords.

- Flag common passwords by adding the column `users['common_password']`, which should be `True` when a `password` is one of the `common_passwords`.
- Count and print the number of users using common passwords.
- Show the first 12 rows in `users`.
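
A membership test like this can be sketched with the `.isin()` method. The Series and DataFrame below are tiny, made-up stand-ins for `common_passwords` and `users`.

```python
import pandas as pd

# Hypothetical stand-ins for the data loaded in the earlier tasks.
common_passwords = pd.Series(['123456', 'password', 'qwerty'])
users = pd.DataFrame({'password': ['password', 'tr0ub4dor&3', 'qwerty']})

# True where the password appears in the list of common passwords.
users['common_password'] = users['password'].isin(common_passwords)

n_common = users['common_password'].sum()
print(n_common)
print(users.head(12))
```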

# Task 5: Instructions

Flag the passwords that are among the 10,000 most common English words.

- Read in `datasets/google-10000-english.txt` as a Series and put it in the variable `words`.
- Flag passwords that are common words by adding the column `users['common_word']`, which should be `True` when a `password` is one of the `words`. The comparison should be _case-insensitive_.
- Count and print the number of users using common words as passwords.
- Show the first 12 rows in `users`.
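
One way to make the comparison case-insensitive is to lowercase both sides before the membership test, as in this sketch. The words and passwords are made up; lowercasing `words` as well is a precaution in case the word list is not already all lowercase.

```python
import pandas as pd

# Hypothetical stand-ins for the word list and the user data.
words = pd.Series(['the', 'of', 'monkey'])
users = pd.DataFrame({'password': ['Monkey', 'THE', 'xk29fjq1']})

# Lowercase both sides so that 'Monkey' matches 'monkey'.
users['common_word'] = users['password'].str.lower().isin(words.str.lower())

n_common_words = users['common_word'].sum()
print(n_common_words)
print(users.head(12))
```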

# Task 6: Instructions

Flag passwords that are the same as the user's first or last name.

- Extract users' first names from `users['user_name']` into the new column `users['first_name']`.
- Similarly, extract last names into the new column `users['last_name']`.
- Add the column `users['uses_name']`, which should be `True` when a `password` is the same as that user's first or last name.
- Count and print the number of users using names as passwords.
- Show the first 12 rows in `users`.

To extract the first and last names, you can use the `.str.extract()` method. Check out the [`pandas` documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#extracting-substrings) under **Extracting substrings** for more info.
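
The following sketch assumes that user names have the form `first.last` (for example `ada.lovelace`); that format, like the data itself, is an assumption for illustration, so the patterns may need adjusting for the real file. The comparison here is exact, so a password that differs from the name only by letter case is not flagged.

```python
import pandas as pd

# Made-up users; the 'first.last' user name format is an assumption.
users = pd.DataFrame({
    'user_name': ['ada.lovelace', 'alan.turing', 'grace.hopper'],
    'password': ['lovelace', 'alan', 'c0b0lrules'],
})

# expand=False returns a Series when the pattern has a single capture group.
users['first_name'] = users['user_name'].str.extract(r'^(\w+)\.', expand=False)
users['last_name'] = users['user_name'].str.extract(r'\.(\w+)$', expand=False)

# True where the password equals the first name or the last name.
users['uses_name'] = (
    (users['password'] == users['first_name'])
    | (users['password'] == users['last_name'])
)

n_uses_name = users['uses_name'].sum()
print(n_uses_name)
print(users.head(12))
```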

# Task 7: Instructions

Flag passwords that contain the same character 4 or more times in a row.

- Add the column `users['too_many_repeats']`, which should be `True` when a `password` has the same character repeated 4 or more times in a row.
- Take a look at only the `users` with too many repeats.

This task can be solved using the `.str.contains()` method that is described in the [`pandas` text documentation](https://pandas.pydata.org/pandas-docs/stable/text.html#testing-for-strings-that-match-or-contain-a-pattern). The regexp you need to craft here is a bit tricky, so you may need to revisit this [video lesson](https://campus.datacamp.com/courses/regular-expressions-in-python/advanced-regular-expression-concepts?ex=7) in Regular Expressions in Python.
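
As a hedged sketch, one pattern that detects four identical characters in a row uses a capture group followed by three backreferences. The passwords below are made up, and the Series is created with `dtype=object` so that matching goes through Python's `re` module, which supports backreferences.

```python
import pandas as pd

# Toy passwords, stored with object dtype so Python's `re` engine is used.
passwords = pd.Series(
    ['aaaa1234', 'hello111', 'zzzzzzzz', 'abcabcab'], dtype=object
)
users = pd.DataFrame({'password': passwords})

# (.) captures any single character; each \1 demands that same character again,
# so the whole pattern matches 4 identical characters in a row.
# pandas may emit a UserWarning here because the pattern has a capture group.
users['too_many_repeats'] = users['password'].str.contains(r'(.)\1\1\1')

# Only the rows with too many repeats.
print(users[users['too_many_repeats']])
```

A run of three, such as the `111` in `hello111`, is not flagged, because the pattern needs the captured character plus three further copies.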

# Task 8: Instructions

Flag _all_ the bad passwords.

- Add the column `users['bad_password']` which should be `True` when a password is bad according to `too_short`, `common_password`, `common_word`, `uses_name`, or `too_many_repeats`.
- Count and print the number of users with bad passwords.
- Show the first 25 bad passwords in `users`.
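
Combining the flags can be sketched with `.any(axis=1)`, which is `True` for a row when at least one of the selected columns is `True`; chaining the columns with `|` would work as well. The DataFrame below is a made-up stand-in with the flag columns already filled in.

```python
import pandas as pd

# Made-up stand-in for `users` after the earlier tasks.
users = pd.DataFrame({
    'password': ['abc', 'password', 'c0rrecth0rse'],
    'too_short': [True, False, False],
    'common_password': [False, True, False],
    'common_word': [False, True, False],
    'uses_name': [False, False, False],
    'too_many_repeats': [False, False, False],
})

flag_columns = [
    'too_short', 'common_password', 'common_word',
    'uses_name', 'too_many_repeats',
]

# A password is bad if any of the individual flags is True.
users['bad_password'] = users[flag_columns].any(axis=1)

n_bad = users['bad_password'].sum()
print(n_bad)

# First 25 bad passwords (this toy example only has 2).
print(users.loc[users['bad_password'], 'password'].head(25))
```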

# Task 9: Instructions

- Assign a new password to `new_password` that passes the NIST rules you've implemented in this project.
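
To sanity-check a candidate, the five checks from the earlier tasks can be collected in a small helper, sketched here in plain Python. The function `passes_rules`, the two tiny sets, and the example password are all hypothetical; in the project you would test against the full lists loaded earlier.

```python
import re

# Hypothetical, tiny stand-ins for the lists loaded earlier in the project.
COMMON_PASSWORDS = {'123456', 'password', 'qwerty'}
COMMON_WORDS = {'the', 'of', 'monkey'}


def passes_rules(password, first_name, last_name):
    """Return True if `password` trips none of the five checks."""
    if len(password) < 8:                      # too short
        return False
    if password in COMMON_PASSWORDS:           # common password
        return False
    if password.lower() in COMMON_WORDS:       # common word, case-insensitive
        return False
    if password in (first_name, last_name):    # same as the user's name
        return False
    if re.search(r'(.)\1\1\1', password):      # 4 identical characters in a row
        return False
    return True


# Made-up example; pick your own rather than reusing this one.
new_password = 'lilac-anvil-moth-42'
print(passes_rules(new_password, 'ada', 'lovelace'))
```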

## If you want to know more

- You can [read the full NIST Special Publication 800-63B online](https://pages.nist.gov/800-63-3/sp800-63b.html).
- [This blog post](https://www.passwordping.com/surprising-new-password-guidelines-nist/) also gives you a summary of 800-63B.
- [Here is an article](https://xato.net/today-i-am-releasing-ten-million-passwords-b6278bbe7495) explaining where the 10,000 common passwords you used in this project come from.
- Finally, [some advice](https://xkcd.com/936/) on how to come up with a good password.