# Python and Data Science

Python is an open-source, interpreted, high-level language with good support for object-oriented programming. It is among the languages most widely used by data scientists for data science projects and applications. Python offers extensive functionality for mathematics, statistics, and scientific computing, along with a rich set of libraries for data science work.

One of the main reasons Python is widely used in the scientific and research communities is its ease of use and simple syntax, which make it easy to pick up for people who do not have an engineering background. It is also well suited to quick prototyping.

![](https://www.brsoftech.com/blog/wp-content/uploads/2019/11/most-in-demand-programming-languages-2020.png)

# Is Python a New Language?

Python was first released in 1991. It was created by Guido van Rossum as a hobby project. 

It was named after the British comedy TV series *Monty Python's Flying Circus*.

![Monty Python](https://upload.wikimedia.org/wikipedia/en/c/cd/Monty_Python%27s_Flying_Circus_Title_Card.png)

# Computing for Everybody
As Python grew in popularity, Van Rossum submitted a funding proposal to DARPA called "Computer Programming for Everybody", in which he further defined his goals for Python:
- An easy and intuitive language just as powerful as major competitors
- Open source, so anyone can contribute to its development
- Code that is as understandable as plain English
- Suitability for everyday tasks, allowing for short development times

> In 2021, Python was the second most popular language on GitHub, a social coding website, behind JavaScript and was the most popular language in the last quarter of the year.

According to a programming language popularity survey, Python is consistently among the top 10 most mentioned languages in job postings. Furthermore, Python has been among the 10 most popular programming languages every year since 2004 according to the TIOBE Programming Community Index.

# The Zen of Python

The Zen of Python is a collection of 19 "guiding principles" for writing computer programs that influence the design of the Python programming language. Software engineer Tim Peters wrote this set of principles and posted it on the Python mailing list in 1999. Peters's list left open a 20th principle "for Guido to fill in", referring to Guido van Rossum, the original author of the Python language. The vacancy for a 20th principle has not been filled.

- Beautiful is better than ugly.
- Explicit is better than implicit.
- Simple is better than complex.
- Complex is better than complicated.
- Flat is better than nested.
- Sparse is better than dense.
- Readability counts.
- Special cases aren't special enough to break the rules.
- Although practicality beats purity.
- Errors should never pass silently.
- Unless explicitly silenced.
- In the face of ambiguity, refuse the temptation to guess.
- There should be one—and preferably only one—obvious way to do it.
- Although that way may not be obvious at first unless you're Dutch.
- Now is better than never.
- Although never is often better than right now.
- If the implementation is hard to explain, it's a bad idea.
- If the implementation is easy to explain, it may be a good idea.
- Namespaces are one honking great idea—let's do more of those!
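
As an aside, CPython ships these principles as an Easter egg: importing the standard `this` module prints them. The sketch below also decodes the module's ROT13-encoded text to count the principles. It assumes CPython, where the encoded text lives in the attribute `this.s`; that attribute is an implementation detail rather than a documented API, and the variable names are only illustrative.

```python
import codecs
import this  # importing this module prints the Zen of Python once

# Assumption: CPython stores the text ROT13-encoded in `this.s`
zen_text = codecs.decode(this.s, "rot13")

# Drop blank lines; the first remaining line is the title
zen_lines = [line for line in zen_text.splitlines() if line.strip()]
principles = zen_lines[1:]

print(len(principles), "principles, starting with:", principles[0])
```

In a notebook, running `import this` a second time prints nothing, because Python only executes a module the first time it is imported.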

# Try Python Now

Select the following code block. 
Click Cell Menu (on the top) > Run Cells.

You can also press Ctrl+Enter. 

In [None]:
print("This is Python!")

But this isn't much fun, right? The following block creates two variables, assigns a number to each, and then checks which number is larger. Feel free to change the numbers and see how that affects the result.

In [None]:
a = 10
b = 15

if b > a:
    print("B is greater")
elif a > b:
    print("A is greater")
else:
    print("Both are the same")

The following block creates a list of three fruit names (stored as strings). We then loop over the names and print a sentence for each one.

In [None]:
fruits = ["Apple", "Banana", "Mango"]

for fruit in fruits:
    print("I eat " + fruit)

# What is Anaconda?

Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.), that aims to simplify package management and deployment. 

There are several alternatives, but Anaconda is among the most popular because it makes managing Python components simple.

Jupyter Notebook (formerly IPython Notebook) is a web-based interactive computational environment for creating Jupyter notebook documents. The term "notebook" can colloquially refer to several different things, mainly the Jupyter web application, the Jupyter Python web server, or the Jupyter document format.

# Markdown

Hello, this is text. It is not Python code, so it will not run.

But this will be displayed properly.

# This is a heading

## This is a smaller heading

### This is an even smaller heading

If you see a hash (#) symbol at the beginning of each heading, you are currently in edit mode. If you don't, double-click this text and edit something. Click Run Cell (or press Ctrl+Enter) again to update it. Remember to save the file to preserve your changes. Psst... Jupyter may sometimes save automatically.



Here are the reasons why you should use Markdown cells:
1. They make your notes look better.
1. They help other programmers understand what you are doing.

# More Python

Create a new block below. Click on Insert Menu > Insert Cell below. You can also use the shortcut key.

In that cell, write 10+20 and verify that the output is correct. 

Did you create a new block above this cell? If not, you can still do so: press the Escape key to enter Jupyter's command mode, then press the 'A' key to create a cell above. You can apply all the basic arithmetic and logical operations to number literals or variables. Play around with the following code blocks.

In [None]:
20/3

In [None]:
20%3

The % operator is called the **modulus** (or modulo) operator. It divides the first number by the second and returns the *remainder* as output. For example, `20 % 3` is `2`, because 20 = 3 × 6 + 2.
> Some of you might know it already 
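
The division operators are easier to compare side by side. The sketch below reuses the numbers from the cells above; the variable names are only illustrative.

```python
dividend = 20
divisor = 3

true_quotient = dividend / divisor    # true division always returns a float
floor_quotient = dividend // divisor  # floor division drops the fractional part
remainder = dividend % divisor        # modulus returns what is left over

# divmod() returns the floor quotient and the remainder as a pair
quotient_and_remainder = divmod(dividend, divisor)

# The floor quotient and the remainder always rebuild the original number
rebuilt = floor_quotient * divisor + remainder

print(true_quotient, floor_quotient, remainder, quotient_and_remainder, rebuilt)
```

A common use of `%` is testing divisibility: `n % 2 == 0` is true exactly when `n` is even.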

In Python, we can call the built-in `print` function like this:
`print("Hello Julia")`

and the text gets printed.

# Table in Markdown

| Sno | Student Name |
| --- | ------------ |
| 1   | Narender     |

```
| Sno | Student Name |
| --- | ------------ |
| 1   | Narender |
```

You can double click this cell to see the actual raw syntax behind the fancy formatting.

# Comments in Python

Comments are used to explain the code, leave notes for other programmers, or record ideas for future work.
They are mostly used to make code more readable.

In [None]:
# This is a comment
print("Hello World!")  # Prints a string
print(5 + 9)  # Prints the result of the addition, which is a number
print('The end!')  # Bye

# Errors in Python

If something goes wrong, Python gives a detailed description of what went wrong.

**NameError** is raised when a local or global name is not found. The associated value is an error message that includes the name that could not be found. In simple words, the Python interpreter doesn't understand what a particular word you used means. The most common causes of this error are misspelling a name, or using an object before initializing or importing it.

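When a block raises an error, the output tells you the line that caused the error, along with a message that explains it. The following block runs correctly as written. Try misspelling `myname` in the `print` line to trigger a NameError. Can you fix it again?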

In [1]:
myname = "Jones"
print ("Hello "+myname)

Hello Jones
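
To see what a NameError looks like without stopping the notebook, the sketch below triggers one on purpose and catches it. The variable `my_name` is deliberately left undefined; it is a hypothetical name chosen for this illustration.

```python
try:
    # `my_name` was never assigned, so Python cannot look it up
    print("Hello " + my_name)
except NameError as error:
    # The error message includes the name that could not be found
    message = str(error)
    print("Caught a NameError:", message)
```

Without the `try`/`except`, the same line would stop the cell and show a traceback pointing at the offending line.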


# Importing Libraries

Anaconda bundles many of the tools you need to build modern software and data science projects. These tools and features come in *packages*, which you need to *import* before you can use them in your code.
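
A minimal sketch of the three common forms of the `import` statement, using only modules from Python's standard library so that it runs in any environment. The variable names are illustrative.

```python
import math                 # import a whole module, then use the module.name form
import statistics as stats  # import a module under a shorter alias
from random import Random   # import a single name from a module

root = math.sqrt(16)
average = stats.mean([2, 4, 6])
rng = Random(0)             # a seeded random number generator
roll = rng.randint(1, 6)    # a whole number from 1 to 6, both included

print(root, average, roll)
```

The alias form is the same one used later in this notebook for `pandas` (`pd`) and `numpy` (`np`).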

Run the following block. It is an Easter egg built into Python: it opens a new browser tab with a comic about how powerful Python is.



In [None]:
import antigravity

## Testing Required Packages

We should see if the packages we need for this course are ready to be used. The following code should run without errors. 

### ModuleNotFoundError
If you see an error that looks like the following, the mentioned module or package is not present in your Python environment.

```
----> 1 import pandas as pd
      2 import numpy as np
      3 import matplotlib.pyplot as plt
      4 
      5 np.random.seed(0)

ModuleNotFoundError: No module named 'pandas'
```

If you see an error like this, you can install the package by visiting Package Manager in the *Anaconda Navigator*. You can also install it by adding a code block below and typing `%pip install pandas`. If a different package caused the error, substitute its name. The output should include a line that starts with `Successfully installed pandas`.
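
Another way to check for missing packages, without triggering an error, is to ask Python whether each module can be found. This is a minimal sketch using the standard library's `importlib.util.find_spec`; the helper name `find_missing` and the list of module names are assumptions made for this illustration.

```python
import importlib.util


def find_missing(module_names):
    """Return the top-level module names that cannot be found."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


# Import names used in this notebook (scikit-learn is imported as `sklearn`)
required = ["pandas", "numpy", "matplotlib", "sklearn"]
missing = find_missing(required)

if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

Note that the name you import is not always the name you install: the `sklearn` module comes from the `scikit-learn` package.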

If everything went right, you should see a histogram of normally distributed random values.

In [None]:
%pip install matplotlib

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

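np.random.seed(0)  # fix the seed so every run draws the same values

values = np.random.randn(100)  # 100 samples from the standard normal distribution
s = pd.Series(values)
s.plot(kind='hist', title='Normally distributed random values')  # draw a histogram
plt.show()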
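
The same idea can be checked roughly with numbers instead of a plot. This sketch swaps in the standard library's `random` and `statistics` modules for NumPy, so it draws different values than the block above; the seed and sample size are illustrative.

```python
import random
import statistics

rng = random.Random(0)  # a private generator, so the global random state is untouched
samples = [rng.gauss(0, 1) for _ in range(100)]  # mean 0, standard deviation 1

sample_mean = statistics.mean(samples)
sample_stdev = statistics.stdev(samples)

# Share of samples within one standard deviation of zero (about 68% in theory)
share_within_one = sum(1 for x in samples if abs(x) <= 1) / len(samples)

print(round(sample_mean, 2), round(sample_stdev, 2), share_within_one)
```

With only 100 samples, the mean and standard deviation will be close to 0 and 1 but not exact.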

To verify that the scikit-learn package is working, run the following block. If you see a ModuleNotFoundError, you can install the package using `%pip install scikit-learn`.

In [None]:
import sklearn

# A Fun Mini-Game

Run the following blocks to play a short game in which you have to guess a number. The first block imports the required package and creates an empty list of winners. The second block runs the game. You can play as often as you like. Play around and make changes to the code. The second block also introduces some more basics of Python programming. If something breaks, you can still look at the Git repository and copy the correct code.

In [None]:
import random

score_history = []

In [None]:
MAX_TRIES = 11  # maximum number of guesses allowed

secret_number = random.randint(1, 100)  # both endpoints are included
count = 0

playername = input("What's your name? ")
gamewon = False
print("Welcome " + playername + ". In this game, you will guess a number between 1 and 100. I will give you hints. Let's see how you perform.")

while count < MAX_TRIES:
    try:
        guess = int(input("Enter your guess: "))
    except ValueError:
        # Input that is not a whole number does not cost a try
        print("Please enter a whole number.")
        continue
    count += 1
    if guess == secret_number:
        print("You win the game")
        gamewon = True
        break
    elif guess < secret_number:
        print("No. Try a higher number")
    else:
        print("No. Try a lower number")

if gamewon:
    score = MAX_TRIES - count  # fewer guesses give a higher score
    print("Congratulations.. your score is {}".format(score))
    score_history.append(playername + "\t" + str(score))
else:
    print("Sorry. I can't give you more tries. You lost.")

print()
print(" = = = = = = Hall of Fame = = = = = = ")
for row in score_history:
    print(row)
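
How many guesses does a good player need? The sketch below is not part of the game: it is a non-interactive illustration in which a simulated player always guesses the middle of the remaining range and uses the higher/lower hints to halve it. The function names `give_hint` and `guesses_needed` are hypothetical helpers written for this example.

```python
def give_hint(guess, secret_number):
    """Return 'higher', 'lower', or 'correct', mirroring the game's hints."""
    if guess < secret_number:
        return "higher"
    if guess > secret_number:
        return "lower"
    return "correct"


def guesses_needed(secret_number, low=1, high=100):
    """Count the guesses a halving strategy needs to find secret_number."""
    count = 0
    while low <= high:
        guess = (low + high) // 2  # always guess the middle of the remaining range
        count += 1
        hint = give_hint(guess, secret_number)
        if hint == "correct":
            return count
        if hint == "higher":
            low = guess + 1   # the secret is above the guess
        else:
            high = guess - 1  # the secret is below the guess
    raise ValueError("secret_number is outside the range low..high")


# Try every possible secret number and keep the worst result
worst_case = max(guesses_needed(n) for n in range(1, 101))
print("Worst case:", worst_case, "guesses")
```

With this strategy, no number between 1 and 100 takes more than 7 guesses, so a careful player should always finish within the allowed tries.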


