# Password Strength Checker with Machine Learning

## Introduction

Creating strong passwords is essential for online security. In this project, we will build a password strength checker using machine learning. Trained on a labeled dataset of passwords, the model learns to classify each password as weak, medium, or strong from patterns in the characters it contains. The trained model can then be used to evaluate the strength of any given password.


The dataset labels each password with one of three strength classes:

+ 0: the password is weak.
+ 1: the password is medium.
+ 2: the password is strong.

```python
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```

```python
# Load the labeled passwords; rows that pandas cannot parse are skipped
df = pd.read_csv("data.csv", on_bad_lines="skip")
```

```python
df.head()
```

|   | password    | strength |
|---|-------------|----------|
| 0 | kzde5577    | 1        |
| 1 | kino3434    | 1        |
| 2 | visi7k1yr   | 1        |
| 3 | megzy123    | 1        |
| 4 | lamborghin1 | 1        |

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 669640 entries, 0 to 669639
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype 
---  ------    --------------   ----- 
 0   password  669639 non-null  object
 1   strength  669640 non-null  int64 
dtypes: int64(1), object(1)
memory usage: 10.2+ MB
```


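```python
# df.info() reports 669,639 non-null passwords out of 669,640 rows,
# so dropna() removes the single row with a missing password
df = df.dropna()
```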
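
The `df.info()` output shows one missing password, and removing it is more than tidiness: the character tokenizer used later iterates over each password, and pandas stores a missing string as the float `NaN`, which cannot be iterated. A minimal sketch on a hypothetical two-row frame (`demo` is illustrative only, not the real dataset):

```python
import numpy as np
import pandas as pd


def word(password):
    # Same character tokenizer as in this notebook
    return list(password)


# Hypothetical two-row frame; the second password is missing (NaN)
demo = pd.DataFrame({"password": ["kzde5577", np.nan], "strength": [1, 0]})

try:
    tokens = [word(p) for p in demo["password"]]
except TypeError as exc:
    # list(nan) fails because a float is not iterable
    print("tokenizer failed:", exc)

# After dropping the incomplete row, every password can be tokenized
clean = demo.dropna()
tokens = [word(p) for p in clean["password"]]
print(len(clean), tokens)  # 1 [['k', 'z', 'd', 'e', '5', '5', '7', '7']]
```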

```python
# Replace the numeric labels with readable class names
df["strength"] = df["strength"].map({0: "Weak", 1: "Medium", 2: "Strong"})
```

```python
def word(password):
    """Split a password into single characters, used as TF-IDF tokens."""
    return list(password)
```
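
`word` returns a password's characters in order, so repeated characters remain separate tokens that TF-IDF can count. A quick check, on two passwords from the `df.head()` output, that an explicit append loop and the built-in `list()` produce the same tokens (`word_loop` is a name used only for this comparison):

```python
def word_loop(password):
    # Loop formulation: append each character in turn
    character = []
    for i in password:
        character.append(i)
    return character


def word(password):
    # Equivalent one-liner
    return list(password)


for pw in ["kino3434", "megzy123"]:
    assert word_loop(pw) == word(pw)
    print(pw, "->", word(pw))
# kino3434 -> ['k', 'i', 'n', 'o', '3', '4', '3', '4']
# megzy123 -> ['m', 'e', 'g', 'z', 'y', '1', '2', '3']
```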
  

```python
# Raw password strings (features) and their strength labels (targets)
x = np.array(df["password"])
y = np.array(df["strength"])
```


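```python
# Character-level TF-IDF: every distinct character becomes one feature.
# token_pattern=None only silences scikit-learn's warning that the default
# token pattern is ignored when a custom tokenizer is supplied.
# lowercase=True is the default, so "A" and "a" map to the same feature.
tfidf = TfidfVectorizer(tokenizer=word, token_pattern=None)
x = tfidf.fit_transform(x)
```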



```python
# Hold out 8% of the passwords for testing
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.08, random_state=42
)
```
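
With `test_size=.08`, only 8% of the rows are held out. The sketch below works out the resulting split sizes, assuming the 669,639 rows left after `df.dropna()` (669,640 entries minus the one missing password reported by `df.info()`), and confirms them by splitting a dummy index array with the same call. scikit-learn rounds a fractional `test_size` up when it computes the number of test samples.

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 669_639  # assumed row count after df.dropna()
test_size = 0.08

# Fractional test_size: n_test = ceil(test_size * n_samples), the rest is training data
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 616067 53572

# Split a dummy index array of the same length to confirm the arithmetic
idx_train, idx_test = train_test_split(
    np.arange(n_samples), test_size=test_size, random_state=42
)
print(len(idx_train), len(idx_test))
```

If the weak/medium/strong proportions should match exactly between the two parts, `train_test_split` also accepts `stratify=y`; the notebook does not use it.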

## Decision Tree Classifier

```python
# Fit a decision tree with default settings and predict the held-out labels
d = DecisionTreeClassifier()
d.fit(x_train, y_train)
dpred = d.predict(x_test)
```

```python
# accuracy_score takes (y_true, y_pred)
accuracy_score(y_test, dpred)
```

```
0.9305607406854327
```
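
Accuracy alone hides how errors are spread across the three classes. `confusion_matrix` and `classification_report` are already imported at the top of the notebook; applied to the real `y_test` and `dpred` they would break the score down per class. The sketch below shows the calls on a handful of hypothetical labels so the output stays easy to read:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted labels (stand-ins for y_test and dpred)
y_true = ["Weak", "Medium", "Medium", "Strong", "Medium", "Strong"]
y_pred = ["Weak", "Medium", "Weak", "Strong", "Medium", "Medium"]

labels = ["Weak", "Medium", "Strong"]  # fixes the row/column order

print(accuracy_score(y_true, y_pred))

# Rows are true classes, columns are predicted classes, both in `labels` order
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Per-class precision, recall, F1 score and support
print(classification_report(y_true, y_pred, labels=labels))
```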

## Conclusion

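Using **TfidfVectorizer** to turn each password into character-level features and a Decision Tree Classifier to learn from them, I achieved an accuracy of about 93% (0.9306) on the 8% of passwords held out for testing. Because each character is treated as a token, the **TfidfVectorizer** encodes which characters a password contains, weighted by how often they occur in it and how rare they are across the dataset, and these features were sufficient for the tree to classify most test passwords correctly as weak, medium, or strong.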
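
The notebook ends with the test-set score; evaluating a new password takes one more step: reuse the fitted vectorizer's `transform` (not `fit_transform`, which would rebuild the vocabulary) and then call the classifier's `predict`. A self-contained sketch, assuming a tiny hypothetical training set in place of `data.csv` and a helper named `check_strength` that is not part of the original notebook:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier


def word(password):
    return list(password)


# Hypothetical toy training data standing in for data.csv
train_passwords = ["abc", "1234", "kzde5577", "megzy123", "X9$kq!Lm2#pZ", "Tr0ub4dor&3zz"]
train_labels = ["Weak", "Weak", "Medium", "Medium", "Strong", "Strong"]

vectorizer = TfidfVectorizer(tokenizer=word, token_pattern=None)
X_train = vectorizer.fit_transform(train_passwords)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, train_labels)


def check_strength(password):
    """Hypothetical helper: predict the strength class of one password."""
    features = vectorizer.transform([password])  # reuse the fitted vocabulary
    return model.predict(features)[0]


print(check_strength("kino3434"))
```

Characters never seen during fitting are simply ignored by `transform`, so the prediction relies only on characters that appear in the training vocabulary.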
