vbfall/data_scientist_exercise

Scripting Exercises

Below is a data mining and scripting exercise. We will use it to evaluate:

  1. problem-solving skills;
  2. machine learning skills; and
  3. programming skills.

To Do:

  1. Follow the instructions below while keeping your script clean and presentable. Ideally, make your script available on your own github.com account (the free version is sufficient: https://github.com/).
  2. Send us the link to your final commit before 9 a.m. on the day of your interview.
  3. During the interview, we will ask that you present your work (preprocessing, model training, performance assessment, results & discussion). We encourage you to present the results using either a notebook or a README file. At the very least, you should ensure that your results are presentable.

Remember:

  1. Apply best practices as you move through the exercises (data preprocessing, missing values, hyperparameter search, model evaluation, result visualisation, etc.).
  2. Make assumptions where necessary; we are primarily interested in your approach.
  3. A good story is as important as an algorithm. We expect you to be able to communicate and present your ideas, methodology and implementations.
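As an illustration of the practices listed above, here is a minimal sketch, assuming scikit-learn is available. It runs on synthetic stand-in data with artificially removed values, not on the exercise files. The choice of median imputation, logistic regression, and the grid of `C` values are assumptions made for the example only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the exercise files are not used here.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Blank out roughly 5% of the values to simulate missing data.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Hold out a test set before any fitting so it stays unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Imputation and scaling live inside the pipeline, so they are re-fitted
# on each training fold and never see the validation fold.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Small illustrative grid over the regularisation strength.
search = GridSearchCV(
    pipeline,
    param_grid={"model__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

best_params = search.best_params_
test_score = search.score(X_test, y_test)  # ROC AUC on the held-out set
print(best_params, round(test_score, 3))
```

Keeping the preprocessing steps inside the pipeline is one way to avoid leaking information from validation folds into training. Other imputers, scalers, or search strategies may suit the real data better.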

Good Luck!



Exercise 1: Fraudulent Transactions (Classification)

The file fraud_prep.csv contains credit card transactions.

  1. Evaluate multiple classification algorithms to identify whether the transactions are fraudulent or not.
  2. Compare the performance of each model & identify the best performing one.
  3. Present how your model generalizes and performs on unseen data.
  4. Make sure to present all steps taken.
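The steps above could be approached along the following lines. This is a sketch under stated assumptions, not a reference solution: it uses synthetic, imbalanced data in place of fraud_prep.csv (whose columns are not described here), compares only two hypothetical candidate models, and uses average precision as the metric on the assumption that fraud cases are rare.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for fraud_prep.csv (assumption): about 5% positives.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)

# Stratified hold-out split so the rare class appears in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models (illustrative choices, not prescribed by the exercise).
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Compare candidates with stratified cross-validation on the training set only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = {
    name: cross_val_score(
        model, X_train, y_train, cv=cv, scoring="average_precision"
    ).mean()
    for name, model in candidates.items()
}

# Refit the best candidate and check it once on the unseen test set.
best_name = max(cv_scores, key=cv_scores.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_ap = average_precision_score(y_test, best_model.predict_proba(X_test)[:, 1])

print(cv_scores)
print(best_name, round(test_ap, 3))
```

Accuracy is usually uninformative when one class dominates, which is why the sketch ranks models by average precision. The gap between the cross-validated score and the held-out score gives a first indication of how well the chosen model generalizes.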

Bonus points: Can you think of unsupervised methods that would accomplish the same task? If so, describe them (do not script them).



Exercise 2: Crime Dataset (Regression)

The Crime Dataset contains 128 socio-economic features from the US 1990 Census. The target is the crime rate per community.

Ref.: https://archive.ics.uci.edu/ml/machine-learning-databases/communities/communities.names

Using the crime_prep.csv file:

  1. Identify the variables that are most highly correlated with the target.
  2. Apply either dimensionality reduction or feature selection to the dataset.
  3. Evaluate multiple regression algorithms to predict the crime rate.
  4. Compare the performance of each model & identify the best performing one.
  5. Present how your model generalizes and performs on unseen data.
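One possible shape for these five steps is sketched below, assuming pandas and scikit-learn. Synthetic data stands in for crime_prep.csv, and the column names (`feature_0`, ..., `target`), the choice of `k=10` selected features, and the two candidate regressors are all hypothetical.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for crime_prep.csv (assumption); column names are hypothetical.
X_arr, y_arr = make_regression(
    n_samples=500, n_features=30, n_informative=5, noise=10.0, random_state=0
)
features = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(30)])
target = pd.Series(y_arr, name="target")

# Step 1: rank features by absolute Pearson correlation with the target.
correlations = features.corrwith(target).abs().sort_values(ascending=False)
top_correlated = correlations.head(5)

# Hold out a test set before fitting anything.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=0
)

# Steps 2-3: feature selection sits inside each pipeline, so it is
# re-fitted on every training fold rather than on the full dataset.
candidates = {
    "ridge": make_pipeline(
        StandardScaler(), SelectKBest(f_regression, k=10), Ridge(alpha=1.0)
    ),
    "random_forest": make_pipeline(
        SelectKBest(f_regression, k=10),
        RandomForestRegressor(n_estimators=100, random_state=0),
    ),
}

# Step 4: compare candidates by cross-validated R^2 on the training set.
cv_r2 = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

# Step 5: refit the best candidate and score it on the unseen test set.
best_name = max(cv_r2, key=cv_r2.get)
best_model = candidates[best_name].fit(X_train, y_train)
test_r2 = r2_score(y_test, best_model.predict(X_test))

print(top_correlated)
print(cv_r2, best_name, round(test_r2, 3))
```

Univariate selection with `SelectKBest` is only one option for step 2. A dimensionality reduction method such as PCA could be swapped into the same pipeline position.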


