# Course Curriculum
- Data and data science
- Introduction to Numpy
- Data analysis with Pandas
- Data visualization with Matplotlib and Seaborn

# Data and data science

Data is everywhere and is collected every day. We make calls and send messages on our phones every minute. We tweet on Twitter and post pictures and videos on Instagram. Countries like Kenya, Somalia, South Sudan, Uganda, and Ethiopia count their citizens and foreign residents at a well-defined point in time, hospitals keep clinical records of patients, and so on.

With such huge amounts of data available, organizations are focusing more and more on using insights from data to evaluate progress, build solutions, and make informed decisions. Extracting useful insights from data is a must for a business in today’s world.
## Definition of data

Data refers to unorganized and unprocessed facts, which do not hold complete meaning until they are processed to derive meaningful insights. In other words, data is a collection of facts, such as numbers, words, measurements, observations, or just descriptions of things. That is, data can be words, texts, sounds, images, or numbers written on paper or stored on a computer; in fact, it could even be a fact that is stored inside your mind right now.

![](images/data.jpg)

**Source**: https://www.twinkl.ae/teaching-wiki/data

The problem with data is that it is fundamentally inert or static in nature, and has no real meaning or value until we analyze it. Data is the raw material for information, and the result of data processing is called information.

## Types of data

Data can be qualitative or quantitative.

### 1. Qualitative data

Qualitative data is data that can be observed and recorded but is non-numerical in nature. It is collected through observation, one-to-one interviews, surveys, people's opinions on a particular topic, and similar methods. Examples of qualitative data include gender (male or female), opinion (agree, neutral, disagree), blood type (A, B, AB, and O), country of origin (Ethiopia, Somalia, South Sudan, Uganda, Kenya, etc.), and so on.


### 2. Quantitative data

Quantitative data is the type of data that can be measured in the form of numbers or counts. Examples include the distance from Unilorin to TAU, the number of hours needed to complete Introduction to Data Science 1, the length of a table, the revenue realised by the Somali government, the speed of a car, the age of a student, the weight of a goat, etc.

Quantitative data has quantifiable information that can be used for mathematical computations and statistical analysis which informs real-life decisions. For example, a manufacturing company in Uganda will need an answer to the question, “How much is the production cost?”. Quantitative data can be used to answer questions such as “How many?”, “How often?”, “How much?”.


Quantitative data can be divided into two types, namely discrete and continuous data.

#### 1. Discrete data

Discrete data is a type of data that consists of counting numbers only. That is, it can be counted and has a finite number of possible values. Examples of discrete data include the number of students taking Introduction to Data Science 1, the number of days in a year, the number of females in South Sudan, etc. You can see that these data take on only certain numerical values. Also, if you count the number of phone calls you receive for each day of the week, you might get values such as zero (no call), one, two, or five.

When trying to identify discrete data, we ask the following questions: Can it be counted? Can it be divided into smaller and smaller parts? If it can be counted but cannot be meaningfully divided into smaller parts, the data is discrete.


#### 2. Continuous data

Continuous data is a type of data that arises as a result of measurement. It has an infinite number of possible values within a given range. Examples of continuous data include height, weight, temperature, and length.
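As a rough illustration of the categories above, the sketch below labels a few values by their Python type. The `student` record and the `kind_of_data` helper are hypothetical, and the type-based rule is only a simplification: a number used as a label, such as a phone number, is still qualitative.

```python
# Hypothetical record describing one student (illustrative values only).
student = {
    "blood_type": "AB",   # a category: qualitative
    "courses_taken": 5,   # a count: quantitative, discrete
    "height_m": 1.68,     # a measurement: quantitative, continuous
}


def kind_of_data(value):
    """Guess the kind of data from the Python type (a simplification)."""
    if isinstance(value, str):
        return "qualitative"
    if isinstance(value, int):
        return "quantitative (discrete)"
    if isinstance(value, float):
        return "quantitative (continuous)"
    return "unknown"


for name, value in student.items():
    print(name, "->", kind_of_data(value))
```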

# Summary

![](images/qualitative-quantitative-data.png)


# Sources of data

The following are the two sources of data:

1. Primary data
1. Secondary data


## 1.  Primary Data
Primary data is first-hand information collected by an investigator. The data collected is pure and original, and is collected for a specific purpose. This type of data has never undergone any preprocessing.
An example is a population census conducted by the government of Nigeria over the last 20 years.

## 2. Secondary Data
Secondary data refers to second-hand information. It is acquired from existing sources such as magazines, books, documents, journals, reports, the web, and more. That is, it is not collected by the investigator but obtained from already published or unpublished sources. Secondary data is impure in the sense that it has undergone preprocessing at least once.


# Summary

![](images/Sources_of_data_collection.png)


---

For more resources in this section please consider the following:


1. <https://www.mathsisfun.com/data/data.html>

1. <https://www.twinkl.ae/teaching-wiki/data>

1. <https://www.questionpro.com/blog/quantitative-data/>

1. <https://courses.lumenlearning.com/odessa-introstats1-1/chapter/sampling-and-data/>

1. <https://www.mymarketresearchmethods.com/data-types-in-statistics/>

1. <https://studiousguy.com/sources-of-data-collection/>

1. <https://byjus.com/commerce/what-are-the-sources-of-data/>


# Introduction to Data Science

In the previous section, you learnt that data is fundamentally inert and has no real meaning or value until we analyze it. In this section, you will learn how to use data science to give powerful meaning to your data.

## What is data science?

Data science is a set of fundamental principles that support and guide the principled extraction of information and knowledge from data. The field of data science enables us to turn raw data into understanding, insight, and knowledge.


![](images/data_science.png)

## Who is a data scientist?

A data scientist, on the other hand, is a professional who deals with a massive explosion of data and uses their skills in mathematics, statistics, computing, the business domain, and the scientific method to give data a shape so that it can better express itself. A data scientist makes sense of this tsunami of information, identifies hidden patterns, and draws conclusions and insights. In other words, a data scientist focuses on analysing past and current data and predicting outcomes, with the aim of producing useful information.

In 2012, Harvard Business Review (HBR) named data scientist the sexiest job of the 21st century. Data science helps us to build a strong foundation for the data-driven world, access the power of artificial intelligence (AI) related technology, and develop an operational model to derive business insights from raw data to support decision making.

## Data scientist skills

Data scientists must acquire skills like data cleaning, data analysis, and data visualization to be able to effectively communicate information or findings to inform high-level decisions in an organization. As such, data science incorporates skills from computer science, mathematics, statistics, communication, and business.

## How to become a data scientist?


The following are general steps to becoming a data scientist:

1. Practise every class activity in this course

1. Be active in the group discussion

1. Question everything about data

1. Ask questions when you don't understand a concept

1. Visit other learning resources provided in this course

1. Learn from your course mates (other students doing this course with you)

1. Know how to program (the introduction to programming course and this course will help you)

1. Build projects using the skills learned in this course


For additional information, check this resource [page](https://www.dataquest.io/blog/how-to-become-a-data-scientist/).


## Programming languages for data science

A programming language is a formal language comprising a set of instructions that produce various kinds of output. There are several programming languages for data science, and you as a data scientist should learn and master at least one of them for your data science projects.

![](images/programming_langs.png)


Python is one of the most widely used data science programming languages in the world today. It is an open-source, easy-to-use language that has been around since 1991. This course will teach you how to use Python for data cleaning, analysis, and visualization.


![](images/r2py.png)

For more information about R, Scala, Julia, and other programming languages please visit this [source](https://www.upgrad.com/blog/data-science-programming-languages).

## Data Science with Python - Libraries

A package is a collection of related modules that aims to achieve a particular goal. The Python standard library is a collection of packages and modules that provide built-in functionality. In an ideal world, you'd import any necessary modules into your Python scripts without any issues.

The six most important Python libraries and packages for data science include:
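As a minimal sketch of what importing looks like, the example below uses two modules from the Python standard library, `statistics` and `math`. The list of ages is made up for illustration.

```python
# Import a whole module from the standard library.
import statistics

# Import a single function from a module.
from math import sqrt

ages = [18, 21, 19, 22]  # made-up ages, for illustration only

average_age = statistics.mean(ages)  # call a function through the module name
root = sqrt(16)                      # call an imported function directly

print(average_age)
print(root)
```

Third-party libraries are imported in the same way once they are installed.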

1. Numpy

1. Pandas

1. Matplotlib

1. Scikit-Learn

1. TensorFlow

1. Keras

![](images/python_libraries.png)

In this course, you will learn how to use Numpy, Pandas, Matplotlib, and Seaborn for various data science activities.

# Lifecycle of data science

The general lifecycle of data science involves:

![](images/ds_phases.png)

**Source**: Hadley Wickham, R for Data Science

**Import**: The first step in data science is to import data into Python, R, a spreadsheet (MS Excel), or any other data processing software. Without data, there is no data science.

**Tidy**: Another step is data cleaning or munging, and it is one of the most time-consuming tasks of a data scientist. Since many of the available datasets are not clean, it is the data scientist's job to prepare the datasets in an easy-to-use format.
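A minimal sketch of the import step, assuming the data arrives as comma-separated text. It uses only the standard library `csv` module; the names and values in `raw_text` are made up, and an in-memory string stands in for a file on disk.

```python
import csv
import io

# Made-up CSV text standing in for a file on disk.
raw_text = """name,age,height_m
Amina,20,1.65
Brian,22,1.80
"""

# DictReader turns each data line into a dictionary keyed by the header row.
reader = csv.DictReader(io.StringIO(raw_text))
rows = list(reader)

print(len(rows))
print(rows[0]["name"])
```

Note that every value is read as text, so `rows[0]["age"]` is the string `"20"` and must be converted with `int()` before doing arithmetic. With a real file, `open("students.csv", newline="")` would take the place of `io.StringIO(raw_text)`; the file name here is hypothetical.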

The principles of tidy data state that:

1. Every column in the data is a variable

1. Every row in the data is an observation

1. Every cell is a single value relating to a specific variable and observation

Tidying data is an aspect of data munging or wrangling.
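The three principles can be sketched with a small made-up table. In the hypothetical `wide` table below, the year is hidden in the column names; the loop reshapes it so that every row is one observation and every column is one variable.

```python
# Made-up "wide" table: one row per school, one column per year.
wide = [
    {"school": "A", "2021": 120, "2022": 135},
    {"school": "B", "2021": 80, "2022": 95},
]

# Tidy version: one row per observation (a school in a given year),
# one column per variable (school, year, students).
tidy = []
for row in wide:
    for year in ("2021", "2022"):
        tidy.append(
            {"school": row["school"], "year": int(year), "students": row[year]}
        )

for row in tidy:
    print(row)
```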

**Transform**: The other step of data munging is transforming the data, for example by creating a new variable from the existing variables.

**Visualize**: Data visualization refers to the graphical representation of data, using visual elements such as charts, infographics, and maps to help us understand the data.

**Models**: Models are used to answer questions from the data. A good data scientist will implement various machine learning algorithms, which requires good coding and interpretation skills. Data science uses a branch of Artificial Intelligence (AI) called Machine Learning (ML) to analyze data and make predictions.

**Communicate**: The last step of data science is communication, an absolutely critical part of any data analysis project. A data scientist will need to report findings to the stakeholders, and tools like Jupyter Notebook, R Markdown, PowerPoint, or MS Word make this communication easier.
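As a small sketch of creating a new variable from existing ones, the example below computes body mass index (weight divided by height squared) for two made-up records. The names and measurements are hypothetical.

```python
# Made-up records with two existing variables: weight and height.
people = [
    {"name": "Amina", "weight_kg": 60.0, "height_m": 1.65},
    {"name": "Brian", "weight_kg": 81.0, "height_m": 1.80},
]

# Create a new variable, body mass index, from the existing variables.
for person in people:
    person["bmi"] = round(person["weight_kg"] / person["height_m"] ** 2, 1)

for person in people:
    print(person["name"], person["bmi"])
```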
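To give a flavour of what a model does, here is a toy sketch of one of the simplest models, a straight line fitted by least squares. The data is made up, and real projects would normally rely on a library such as Scikit-Learn rather than hand-written formulas.

```python
# Made-up data: hours of practice (x) and quiz marks (y).
hours = [1, 2, 3, 4]
marks = [3, 5, 7, 9]

mean_x = sum(hours) / len(hours)
mean_y = sum(marks) / len(marks)

# Least-squares estimates for the line: marks = slope * hours + intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, marks)) / sum(
    (x - mean_x) ** 2 for x in hours
)
intercept = mean_y - slope * mean_x


def predict(x):
    """Predict marks for a given number of hours using the fitted line."""
    return slope * x + intercept


print(predict(5))
```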

# Data Science Use Cases

Data science helps us achieve some major goals that either were not possible or required a great deal more time and energy just a few years ago. The following are some of the most common data science use cases:

- Forecasting (sales, revenue and customer retention)

- Pattern detection (weather patterns, financial market patterns, etc.)

- Recommendations (movies, restaurants and books you may like)

- Anomaly detection (fraud, disease, crime, etc.)

- Automation and decision-making (background checks, credit worthiness, etc.)

- Classification (email spam, diseases, phishing websites)

- Recognition (facial, voice, text, etc.)

# Other use cases by some organizations

For more use cases, including examples from specific organizations, please visit [this](https://builtin.com/data-science) and [this](https://data-flair.training/blogs/data-science-use-cases/).