Create a classification model to predict default_payment_next_month using supervised learning.
Open Google Cloud Platform, log in to BigQuery, then open the bigquery-public-data project.
Pay attention to the instructions for using the dataset!
- Use the credit_card_default table from the ml_datasets dataset.
- Select ONLY the columns limit_balance, sex, education_level, marital_status, age, pay_0, pay_2, pay_3, pay_4, pay_5, pay_6, bill_amt_1, bill_amt_2, bill_amt_3, bill_amt_4, bill_amt_5, bill_amt_6, pay_amt_3, pay_amt_2, pay_amt_4, default_payment_next_month.
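The dataset instructions above could be turned into a query roughly as follows. This is a minimal sketch, not the project's actual query: the helper names `build_query` and `load_dataframe` are hypothetical, the column list is copied from the instructions in the order given, and `load_dataframe` assumes the `google-cloud-bigquery` package and valid credentials are available.

```python
# Columns copied from the dataset instructions, in the order listed there.
COLUMNS = [
    "limit_balance", "sex", "education_level", "marital_status", "age",
    "pay_0", "pay_2", "pay_3", "pay_4", "pay_5", "pay_6",
    "bill_amt_1", "bill_amt_2", "bill_amt_3",
    "bill_amt_4", "bill_amt_5", "bill_amt_6",
    "pay_amt_3", "pay_amt_2", "pay_amt_4",
    "default_payment_next_month",
]

# Fully qualified table name: project.dataset.table
TABLE = "bigquery-public-data.ml_datasets.credit_card_default"


def build_query(columns=COLUMNS, table=TABLE):
    """Return a SELECT statement restricted to the requested columns."""
    column_sql = ",\n  ".join(columns)
    return f"SELECT\n  {column_sql}\nFROM `{table}`"


def load_dataframe(query):
    """Run the query and return a pandas DataFrame (hypothetical helper).

    The import is kept local so that build_query() works even when the
    BigQuery client library is not installed.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.query(query).to_dataframe()
```

Calling `load_dataframe(build_query())` would then return the selected columns as a DataFrame, with `default_payment_next_month` as the label column.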
Supervised learning is an approach to creating artificial intelligence (AI), where a computer algorithm is trained on input data that has been labeled for a particular output. The model is trained until it can detect the underlying patterns and relationships between the input data and the output labels, enabling it to yield accurate labeling results when presented with never-before-seen data.
Supervised learning is good at classification and regression problems, such as determining what category a news article belongs to or predicting the volume of sales for a given future date. In supervised learning, the aim is to make sense of data within the context of a specific question.
Default is the failure to make required interest or principal repayments on a debt, whether that debt is a loan or a security. Individuals, businesses, and even countries can default on their debt obligations.
Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples.
The supervised learning analysis is then carried out through the following steps and algorithms:
- Preprocessing: Preprocess the dataset before modeling (data splitting, normalization, encoding, etc.)
- Logistic Regression: Implement Logistic Regression and determine the right hyperparameters with Scikit-Learn
- SVM: Implement SVM and determine the right hyperparameters with Scikit-Learn
- Decision Tree: Implement Decision Tree and determine the right hyperparameters with Scikit-Learn
- Random Forest: Implement Random Forest and determine the right hyperparameters with Scikit-Learn
- K-Nearest Neighbor: Implement KNN and determine the right hyperparameters with Scikit-Learn
- Naive Bayes: Implement Naive Bayes and determine the right hyperparameters with Scikit-Learn
- AdaBoost: Implement AdaBoost and determine the right hyperparameters with Scikit-Learn
- Cross Validation: Implement Cross Validation with Scikit-Learn
- Grid Search: Implement Grid Search with Scikit-Learn
- Model Inference: Try out the trained model on new data
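The steps listed above can be sketched end to end with Scikit-Learn. This is an illustrative outline under stated assumptions, not the project's final code: it uses synthetic data from `make_classification` as a stand-in for the credit card table, skips the encoding of categorical columns, and the hyperparameter grid and the helper `make_pipeline_for` are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic, imbalanced binary data standing in for the real table.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)

# Preprocessing, part 1: hold out a stratified test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# One estimator per algorithm in the list, with default hyperparameters.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "knn": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
    "adaboost": AdaBoostClassifier(random_state=0),
}


def make_pipeline_for(model):
    """Preprocessing, part 2: scale features inside the pipeline so the
    scaler is fitted only on the training folds during cross validation."""
    return Pipeline([("scale", StandardScaler()), ("model", model)])


# Cross validation: mean 5-fold F1 score for each baseline model.
cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(
        make_pipeline_for(model), X_train, y_train, cv=5, scoring="f1"
    )
    cv_scores[name] = scores.mean()

# Grid search: tune one model (logistic regression here) over a small grid.
param_grid = {"model__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(
    make_pipeline_for(LogisticRegression(max_iter=1000)),
    param_grid, cv=5, scoring="f1",
)
search.fit(X_train, y_train)

# Model inference: score on the held-out set and predict for a few new rows.
test_f1 = search.score(X_test, y_test)
predictions = search.predict(X_test[:3])
```

F1 is used as the scoring metric here only because default datasets tend to be imbalanced; accuracy, recall, or ROC AUC may suit the actual objective better. The same `GridSearchCV` pattern applies to the other models by swapping the estimator and its parameter grid.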
- https://www.techtarget.com/searchenterpriseai/definition/supervised-learning
- https://www.kaggle.com/code/gpreda/credit-card-fraud-detection-predictive-models
- https://www.kaggle.com/datasets/kartik2112/fraud-detection
- https://www.investopedia.com/terms/d/default2.asp#:~:text=Key%20Takeaways,their%20future%20access%20to%20credit.