Skip to content

Latest commit

 

History

History
49 lines (34 loc) · 1.07 KB

File metadata and controls

49 lines (34 loc) · 1.07 KB

Building a Data Analysis pipeline tutorial

adapted from Software Carpentry

This example data analysis project analyzes the word count for all words in 4 novels. It reports the top 10 most occurring words in each book in a report.

Current usage:

Set-up (first time only)

  1. Clone this repo, and using the command line, navigate to the root of this project.

  2. Run the following commands to create the conda environment:

conda env create -f environment.yml 

Run the analysis

Activate the conda environment:

conda activate data-analysis-practice

Run the analysis:

bash runall.sh

Exercise:

Your task is to add a "smarter" data analysis pipeline using GNU Make! It should accomplish the same task as bash runall.sh when you type make all.

It should reset the analysis the starting point (the state when you first copied this repo) when you type make clean.

Depenedencies

  • GNU Make
  • Quarto
  • Python & Python libraries:
    • click
    • matplotlib
    • pandas