# Tools for Data Science

## Introduction to Tools for Data Science

Data science involves extracting insights and knowledge from data to make informed decisions. To effectively work with data, data scientists rely on a variety of tools that aid in data collection, analysis, visualization, and more. In this section, we'll briefly introduce some of the essential tools used in data science:

### Programming Languages

1. **Python**

2. **R**

### Data Manipulation and Analysis

1. **Pandas**

2. **NumPy**

### Visualization

1. **Matplotlib**

2. **Seaborn**

3. **ggplot2**

### Machine Learning

1. **scikit-learn**

2. **TensorFlow** and **PyTorch**

### Version Control

1. **Git**

2. **GitLab**

# Data Science Languages

Data scientists use several programming languages to manipulate, analyze, visualize, and model data. The most prominent are:

## 1. Python

Python is one of the most popular and versatile languages in data science. It offers a wide range of libraries and frameworks such as Pandas, NumPy, scikit-learn, Matplotlib, and TensorFlow, making it suitable for tasks ranging from data cleaning to machine learning and deep learning.

## 2. R

R is another language widely used in statistics and data analysis. It has a rich ecosystem of packages for data manipulation (dplyr), data visualization (ggplot2), and statistical modeling, making it a preferred choice for statisticians and researchers.

## 3. SQL

Structured Query Language (SQL) is essential for working with relational databases. Data scientists use SQL to retrieve, manipulate, and analyze data stored in databases, which is crucial for various data-related tasks.
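
As a minimal sketch of the retrieve-and-aggregate workflow described above, the following uses Python's built-in `sqlite3` module with an in-memory database. The `sales` table, its columns, and its rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database; the table name, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Retrieve and aggregate: total amount per region, sorted by region name.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(rows)
```

The same `SELECT ... GROUP BY` pattern applies to other relational databases, although connection details differ by driver.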

## 4. Julia

Julia is a newer programming language that's gaining traction in the data science community due to its performance and ease of use. It's designed for numerical and scientific computing, making it suitable for data analysis and high-performance computations.

## 5. SAS

Statistical Analysis System (SAS) is widely used in industry and academia for advanced analytics, business intelligence, and data management. It provides tools for data transformation, statistical analysis, and predictive modeling.

## 6. Scala

Scala is often used with Apache Spark, a powerful framework for distributed data processing and big data analysis. It combines object-oriented and functional programming paradigms and is suitable for handling large-scale data.

These are among the most commonly used languages in data science; the list is not exhaustive.

# Essential Data Science Libraries

Specialized libraries make it efficient to manipulate, analyze, visualize, and model data. Data scientists commonly rely on the following:

## Data Manipulation and Analysis

- **Pandas**

- **NumPy**

## Data Visualization

- **Matplotlib**

- **Seaborn**

- **Plotly**

## Machine Learning

- **scikit-learn**

- **TensorFlow**

- **PyTorch**


## Statistical Computing

- **SciPy**

## Natural Language Processing (NLP)

- **NLTK**
- **spaCy**

## Big Data and Distributed Computing

- **PySpark**

The libraries above are a starting point rather than a complete list.


# Data Science Tools

| Tool                | Description                                                                                                   |
|:--------------------|:--------------------------------------------------------------------------------------------------------------|
| Programming Languages |  |
| Python              | A versatile language with rich libraries for data manipulation, analysis, and machine learning.              |
| R                   | Widely used for statistical analysis, data visualization, and creating interactive reports.                  |
| Julia               | Known for its high performance and suitability for numerical and scientific computing.                     |
| SQL                 | Essential for querying and managing data stored in relational databases.                                   |
| SAS                 | Used for advanced analytics, business intelligence, and data management in industry and academia.           |
| Scala               | Often paired with Apache Spark for distributed data processing and analysis.                               |

<br>

| Data Manipulation and Analysis Libraries |  |
|:------------------------|:---------------------------------------------------------------------------------------------------------|
| Pandas                  | Python library for data manipulation and analysis, providing DataFrames and Series for easy data handling. |
| NumPy                   | Fundamental package for numerical computation in Python, essential for working with arrays and matrices.  |

<br>

| Data Visualization Libraries |  |
|:--------------------------|:-----------------------------------------------------------------------------------------------------|
| Matplotlib               | A versatile Python plotting library for creating static, interactive, and animated visualizations.  |
| Seaborn                  | Built on Matplotlib, it simplifies creating informative and visually appealing statistical plots. |
| Plotly                   | Enables interactive visualizations, dashboards, and web-based applications with rich graphics.   |

<br>

| Machine Learning Frameworks |  |
|:-------------------------|:---------------------------------------------------------------------------------------------|
| scikit-learn             | Python library for classical machine learning, covering classification, regression, clustering, and model evaluation. |
| TensorFlow               | Widely used open-source deep learning framework for building and training neural networks.                     |
| PyTorch                  | Deep learning framework known for its dynamic computational graph and popularity in research.                  |

<br>

| Statistical Computing Library |  |
|:----------------------------|:-----------------------------------------------------------------------------------------|
| SciPy                       | Extends NumPy with functions for optimization, integration, interpolation, and more. |

<br>

| Natural Language Processing (NLP) Libraries |  |
|:-------------------------------------------|:-------------------------------------------------------------------------|
| NLTK                                      | A toolkit for NLP tasks like tokenization, stemming, tagging, and parsing. |
| spaCy                                     | A fast NLP library with pre-trained models for tasks like entity recognition. |
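
To make "tokenization" concrete, here is a toy sketch that uses only Python's standard `re` module. It does not use NLTK or spaCy, and the `tokenize` helper and its regular expression are assumptions chosen for illustration; real tokenizers handle many more cases.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (toy rule)."""
    # Keep runs of letters, digits, and apostrophes; drop other punctuation.
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("Data scientists don't guess; they measure.")
print(tokens)
```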

<br>

| Big Data and Distributed Computing Tools |  |
|:----------------------------------------|:------------------------------------------------------------|
| PySpark                                | Python library for Apache Spark, a framework for distributed data processing. |


# Arithmetic Expression Examples

Arithmetic expressions play a fundamental role in mathematics and programming. They involve mathematical operations like addition, subtraction, multiplication, and division, allowing us to perform calculations. In this section, we'll explore some common arithmetic expression examples:

## Addition and Subtraction

Arithmetic expressions involving addition and subtraction are used to combine or subtract numerical values. Here are some examples:

- **Example 1**: \(5 + 3\) results in \(8\).
- **Example 2**: \(12 - 7\) equals \(5\).
- **Example 3**: \(2 + (-6)\) gives \(-4\), because adding a negative number is equivalent to subtracting its absolute value.
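
Examples 1 to 3 can be checked directly in Python. This is a small sketch; the variable names are chosen here for illustration.

```python
example_1 = 5 + 3     # addition
example_2 = 12 - 7    # subtraction
example_3 = 2 + (-6)  # adding a negative number

print(example_1, example_2, example_3)

# Adding -6 gives the same result as subtracting 6.
same_as_subtraction = example_3 == 2 - 6
print(same_as_subtraction)
```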

## Multiplication and Division

Expressions with multiplication and division are used to scale or distribute values. Here are a few examples:

- **Example 4**: \(4 \times 6\) equals \(24\).
- **Example 5**: \(15 \div 3\) results in \(5\).
- **Example 6**: \((-8) \times 2\) gives \(-16\).
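
The same examples in Python, as a brief sketch with illustrative variable names. One detail worth noting: in Python 3 the `/` operator always returns a float, while `//` performs floor division and keeps integers as integers.

```python
product = 4 * 6              # Example 4
quotient = 15 / 3            # Example 5: true division returns a float
floor_quotient = 15 // 3     # floor division keeps the int type
negative_product = (-8) * 2  # Example 6

print(product, quotient, floor_quotient, negative_product)
```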

## Mixed Operations

Arithmetic expressions can involve multiple operations. Here are some examples:

- **Example 7**: \(3 + 2 \times 4\) equals \(11\), because multiplication is performed before addition under the order of operations (PEMDAS/BODMAS).
- **Example 8**: \((6 - 2) \times 5\) gives \(20\).
- **Example 9**: \(12 \div (4 - 1)\) results in \(4\).
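
Python follows the same order of operations, so Examples 7 to 9 can be written as they appear. A short sketch, with variable names chosen for illustration:

```python
example_7 = 3 + 2 * 4     # multiplication binds tighter than addition
example_8 = (6 - 2) * 5   # parentheses are evaluated first
example_9 = 12 / (4 - 1)  # true division, so the result is a float

print(example_7, example_8, example_9)
```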

## Parentheses for Clarity

Parentheses make the order of operations explicit and ensure the calculation is performed as intended. Here's an example:

- **Example 10**: \(2 \times (3 + 4)\) equals \(14\), emphasizing that the addition should be done before multiplication.
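
To see what the parentheses in Example 10 change, compare the expression with and without them. This is an illustrative sketch; the variable names are not from the original notebook.

```python
with_parentheses = 2 * (3 + 4)   # addition first, then multiplication
without_parentheses = 2 * 3 + 4  # multiplication first, then addition

print(with_parentheses, without_parentheses)
```

The two results differ, which is why parentheses are needed whenever the intended order departs from the default precedence.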



```python
# Multiplication and Addition Examples

# Multiplication
num1 = 555
num2 = 313
result_multiplication = num1 * num2
print(f"Multiplying {num1} and {num2} gives {result_multiplication}")

# Addition
num3 = 7
num4 = 12
result_addition = num3 + num4
print(f"Adding {num3} and {num4} results in {result_addition}")
```

```
Multiplying 555 and 313 gives 173715
Adding 7 and 12 results in 19
```


```python
# Conversion from Minutes to Hours

minutes = 12458  # Change this value to the number of minutes you want to convert

# Convert minutes to hours and minutes
hours = minutes // 60  # Get the whole number of hours
remaining_minutes = minutes % 60  # Get the remaining minutes

print(f"{minutes} minutes is equal to {hours} hours and {remaining_minutes} minutes.")
```

```
12458 minutes is equal to 207 hours and 38 minutes.
```
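
The floor division and remainder steps can also be done in one call with Python's built-in `divmod`. The following is a possible variant, not part of the original exercise; the `minutes_to_hours` helper and its input check are assumptions added for illustration.

```python
def minutes_to_hours(total_minutes):
    """Return (hours, remaining_minutes) for a non-negative number of minutes."""
    if total_minutes < 0:
        raise ValueError("total_minutes must be non-negative")
    # divmod(a, b) returns (a // b, a % b) in a single step.
    return divmod(total_minutes, 60)

whole_hours, leftover_minutes = minutes_to_hours(12458)
print(f"12458 minutes is equal to {whole_hours} hours and {leftover_minutes} minutes.")
```

Wrapping the conversion in a function makes it reusable for other values without editing the cell.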


# Objectives

This notebook covers the following objectives:

- **Exercise 2**: Create a markdown cell with the title of the notebook.
- **Exercise 3**: Create a markdown cell for an introduction.
- **Exercise 4**: Create a markdown cell to list data science languages.
- **Exercise 5**: Create a markdown cell to list data science libraries.
- **Exercise 6**: Create a markdown cell with a table of data science tools.
- **Exercise 7**: Create a markdown cell introducing arithmetic expression examples.
- **Exercise 8**: Create a code cell to multiply and add numbers.
- **Exercise 9**: Create a code cell to convert minutes to hours.



# Author

Ulisses Ferreira


[Notebook Link](https://github.com/ulisses1672/Jupyter_Tools_For_Data_Science)
