Scripting Exercises

Below is a data mining and scripting exercise. Note that we will use it to evaluate :

problem-solving skills;
machine learning skills; and
programming skills.

To Do :

Follow the instructions below while keeping your script clean and presentable. Ideally, make your script available on your own GitHub account (the free version is sufficient : https://github.com/).
Send us the link to your final commit before 9am on the day of your interview.
During the interview, we will ask you to present your work (preprocessing, model training, performance assessment, results & discussion). We encourage you to present the results using either a notebook or a README file. At the very least, you should ensure that your results are presentable.

Remember :

Make sure to apply best practices as you move through the exercises (data preprocessing, handling of missing values, hyperparameter search, model evaluation, result visualisation, etc.).
Make assumptions where necessary; we are primarily interested in your approach.
A good story is as important as an algorithm. We expect you to be able to communicate and present your ideas, methodology and implementations.
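
The best practices listed above can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes scikit-learn is available, uses synthetic data from `make_classification` in place of the exercise files, and the choice of median imputation, logistic regression and the `C` grid are all assumptions made for the example. Wrapping preprocessing in a `Pipeline` means imputation and scaling are fitted only on the training folds during the hyperparameter search.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: the real CSV files are not used here).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # inject roughly 5% missing values

# Hold out a test set before any fitting so it stays unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Preprocessing lives inside the pipeline, so each CV fold refits it.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Hypothetical search grid over the regularisation strength.
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_score = search.score(X_test, y_test)  # ROC AUC on the held-out set
print(best_params, round(test_score, 3))
```

The held-out score is computed once, after the search, so it is not influenced by the hyperparameter selection.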

Good Luck!

Exercise 1 : Fraudulent Transactions (Classification)

The file fraud_prep.csv contains credit card transactions.

Evaluate multiple classification algorithms to identify whether the transactions are fraudulent or not.
Compare the performance of each model & identify the best performing one.
Present how your model generalizes and performs on unseen data.
Make sure to present all steps taken.
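
One possible shape for the comparison described above is sketched here. It assumes scikit-learn, and because the column names of fraud_prep.csv are not given, it uses a synthetic imbalanced dataset (about 5% positives) as a stand-in. The two candidate models, the `class_weight="balanced"` setting and the use of average precision as the metric are assumptions chosen for illustration; other models and metrics may suit the real data better.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the fraud transactions (assumption).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Hypothetical candidate models; any classifier with predict_proba would fit here.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42
    ),
}

# Stratified folds keep the rare positive class present in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = {
    name: cross_val_score(
        model, X_train, y_train, cv=cv, scoring="average_precision"
    ).mean()
    for name, model in candidates.items()
}

# Select on cross-validation only, then evaluate once on the unseen test set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_ap = average_precision_score(y_test, best_model.predict_proba(X_test)[:, 1])

print(cv_scores)
print(best_name, round(test_ap, 3))
print(classification_report(y_test, best_model.predict(X_test), digits=3))
```

Accuracy is deliberately not used as the selection metric here: with rare positives, a model that never predicts fraud would still score highly on it.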

BONUS Points : Can you think of some unsupervised methods to accomplish this same task? If so, describe them (do not script them).

Exercise 2 : Crime Dataset (Regression)

The Crime Dataset contains 128 socio-economic features from the US 1990 Census. The target is the crime rate per community.

Ref. : https://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.names

Using the crime_prep.csv file :

Identify the variables that are the most highly correlated with the target.
Apply either dimensionality reduction or feature selection on the dataset.
Evaluate multiple regression algorithms to predict the crime rate.
Compare the performance of each model & identify the best performing one.
Present how your model generalizes and performs on unseen data.
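
The steps above could be approached along the following lines. This is a sketch under stated assumptions: it requires pandas and scikit-learn, replaces crime_prep.csv with synthetic data from `make_regression` (the column names `feature_0`, `feature_1`, ... and `target` are hypothetical), and picks univariate feature selection with `SelectKBest`, Ridge and a random forest purely as examples.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the crime data (assumption: hypothetical column names).
X_arr, y_arr = make_regression(
    n_samples=500, n_features=30, n_informative=5, noise=10.0, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(X_arr.shape[1])])
y = pd.Series(y_arr, name="target")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Rank features by absolute Pearson correlation with the target,
# computed on the training set only to avoid leaking test information.
correlations = X_train.corrwith(y_train).abs().sort_values(ascending=False)
top_features = correlations.head(5)

# Feature selection sits inside each pipeline so it is refitted per CV fold.
candidates = {
    "ridge": make_pipeline(
        StandardScaler(), SelectKBest(f_regression, k=10), Ridge(alpha=1.0)
    ),
    "random_forest": make_pipeline(
        SelectKBest(f_regression, k=10),
        RandomForestRegressor(n_estimators=100, random_state=0),
    ),
}
cv_r2 = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

# Choose the model on cross-validated R^2, then score once on unseen data.
best_name = max(cv_r2, key=cv_r2.get)
test_r2 = candidates[best_name].fit(X_train, y_train).score(X_test, y_test)

print(top_features)
print(cv_r2)
print(best_name, round(test_r2, 3))
```

Dimensionality reduction (for example PCA) could replace the `SelectKBest` step in the same pipeline position if that route is preferred.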

Repository files :

Crime prediction.ipynb
Fraud classification.ipynb
README.md
crime_prep.csv.gz
fraud_prep.csv.gz
