---   

<h1 align="center">Introduction to Data Analyst and Data Science for beginners</h1>
<h1 align="center">Lecture no 14</h1>

---
<h3><div align="right">Ehtisham Sadiq</div></h3>    

<h1 align="center">Lecture 2.14</h1>

## _Python Built-in Modules.ipynb_
#### [Check out the full list of Python Built-in modules](https://docs.python.org/3/py-modindex.html)

## Learning agenda of this notebook
Python ships with a large number of built-in modules; the full list is available at the link above. In this notebook, we will discuss a small but important subset of them:
1. What are Python Built-in Modules
2. Different ways to import a Module in Python
3. The Math Module 
4. The Random Module
5. The Time Module
6. The DateTime Module
7. The Calendar Module
8. The OS Module
9. The URLLIB Module
10. Statistics Module

In [None]:
# import atm as at 

## 1.  What are Python Built-in Modules
- In Python, modules are simply files with the `.py` extension containing Python code (variables, functions, classes, etc.) that can be imported into another Python program.
- You can think of a Python module like a C library, which is linked with a C program during the linking phase.
- Some advantages of Modular programming are:
>- **Modularity:** We use modules to break down large programs into small manageable and organized files. 
>- **Simplicity:** Rather than focusing on the entire problem at hand, a module typically focuses on one relatively small portion of the problem.
>- **Maintainability:** Modules are typically designed so that they enforce logical boundaries between different problem domains.
>- **Reusability:** Functionality defined in a single module can be easily reused (through an appropriately defined interface) by other parts of the application. This eliminates the need to duplicate code. We can define our most used functions in a module and import it, instead of copying their definitions into different programs.
>- **Scoping:** Modules typically define a separate namespace, which helps avoid collisions between identifiers in different areas of a program. The key benefit of using modules is _namespaces_: you must import the module to use its functions within a Python script or notebook. Namespaces provide encapsulation and avoid naming conflicts between your code and a module or across modules.


**Note**: A module is a single file of Python code that is meant to be imported, while a Python package is a directory containing a collection of Python modules under a common namespace.

In [None]:
# Modules -> Single file of code
# Packages -> Multiple files of code
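
One rough way to tell a module from a package at runtime is sketched below: a package object carries a `__path__` attribute pointing at its directory, while a plain module does not. The choice of `json` and `math` is only for illustration.

```python
import json   # json is a package (a directory of modules) in the standard library
import math   # math is a single built-in module

# Packages expose __path__; plain modules do not
json_is_package = hasattr(json, "__path__")
math_is_package = hasattr(math, "__path__")
print("json is a package:", json_is_package)
print("math is a package:", math_is_package)
```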

## 2. Ways to Import a Python Module
- Throughout this section we use the `math` module as the running example. It contains a rich set of functions for performing mathematical tasks on numbers.
- Since the `math` module is part of the Python standard library, you don't have to install it separately. Using it is just a matter of importing the module.

### a. Option 1: `import math`
>- We can use the **`import`** keyword to import a module, and later using the module name we can access its functions using the dot . operator, like `math.ceil()`  

In [None]:
import math
# math.pow(3,5)

In [None]:
# We have seen the use of dir() function. When called without argument it displays symbols of current module
print(dir())
# del x

Python's `dir()` function, when called without an argument, returns the list of names in the current local scope. When called with an object as its argument, it returns a list of that object's attributes; if the object defines a `__dir__()` method, that method is called and must return the list of attributes.

In [None]:
print(dir(math))

In [None]:
print(math.__doc__ , math.__name__ , math.__package__)

In [None]:
# import math

In [None]:
import math
# print(dir())

print(math.ceil(2.3))  # ceil() always rounds up (toward +infinity): 2.3 -> 3
# for negative values this means rounding toward zero, e.g. ceil(-2.3) is -2

print(math.floor(-17.2))  # floor() is the opposite: it always rounds down (toward -infinity): -17.2 -> -18

print(math.factorial(10))

The `math.ceil()` function returns the smallest integer greater than or equal to the number. If the number is already an integer, the same number is returned.

### b. Option 2: `import math as m`
>- We can also import a module by using a short alias, thus saving typing time in some cases. Note that in this case, the name `math` will not be recognized in our scope. Hence, `math.ceil()` is invalid and `m.ceil()` is the correct implementation.

In [None]:
import math as m
print(dir())
print(m.ceil(2.3))
print(m.floor(45.5))

### c. Option 3:`from math import ceil`        OR       `from math import ceil, floor`
>- We can use the **`from`** keyword to import specific name(s) from a module instead of importing the entire contents of a module. This way we don't have to use the dot operator and can access the function directly by its name

In [None]:
from math import ceil, floor, pow
print(dir())
ceil(2.3)

### d. Option 4:`from math import *`
>- We can import all the attributes from a module using the asterisk `*` construct. The difference between `import math` and `from math import *` is that in the latter case you don't have to use the dot operator and can directly use the functions, e.g., `ceil()`


In [None]:
from math import *
print(dir())
ceil(2.3), ceil(4.5)

In [None]:
floor(5.7), pow(5,4), sin(30)
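
One consequence of `from ... import` worth seeing once is name shadowing. The sketch below (the variable names are chosen only for illustration) shows that importing `pow` from `math` replaces the built-in `pow` in the current namespace, which silently changes the result type.

```python
import builtins

builtin_result = builtins.pow(2, 5)   # the built-in pow keeps integers as integers
from math import pow                  # rebinds the name pow in the current namespace
shadowed_result = pow(2, 5)           # now calls math.pow, which always returns a float

print(builtin_result, type(builtin_result).__name__)
print(shadowed_result, type(shadowed_result).__name__)
```

This is one reason `import math` or `import math as m` is often preferred: the module prefix makes it clear which function is being called.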

## 3. The `math` Module
- The Python `math` module contains a rich set of functions that allow you to perform mathematical tasks on numbers.
- Since the `math` module comes packaged with the Python release, you don't have to install it separately. Using it is just a matter of importing the module.
#### [Read Python Documentation for details about `math` module](https://docs.python.org/3/library/math.html#module-math)

### a. Constants of Math Module

- **PI:** 
    - PI is the ratio of a circle's circumference (c) to its diameter (d).
    - It is an irrational number, so its decimal expansion never terminates or repeats: 3.141592... A common rough approximation is 22/7 ≈ 3.142857.
    - You can access its value as the constant `pi` defined inside the math module, where it is given correct up to 15 digits after the decimal point.
    - Pi has been calculated to over 50 trillion digits beyond its decimal point. Its never-ending decimal expansion makes it a fun challenge to memorize, and to compute more and more digits.
    - Pi Day is celebrated on March 14th (3/14) around the world. 

In [None]:
# First Method
from math import *
pi

In [None]:
# Second Method
import math
math.pi

- **TAU:**
    - TAU is the ratio of a circle's circumference (c) to its radius (r).
    - This constant is equal to 2PI, or roughly 6.28
    - Like PI, TAU is also an irrational number, and can be approximated to the value 2PI = 6.28318...

In [None]:
# First Method
from math import *
tau

In [None]:
import math
math.tau

- **Euler's Number:**
    - Euler's number (e) is a constant that is the base of the natural logarithm.
    - It is commonly used in calculating rates of growth or decay.
    - As with PI and TAU, `e` is also an irrational number, with an approximate value of 2.718

In [None]:
math.e

- **Infinity:**
    - Infinity can't be defined by a number or a numeric value
    - It is a mathematical concept representing something that is never ending or boundless.
    - Infinity can go in either direction (positive as well as negative)
    - `math.inf` (added in Python 3.5) is a special `float` value representing positive infinity; negative infinity is written `-math.inf`

In [None]:
math.inf

In [None]:
type(math.inf)

In [None]:
# Proof of concept: Positive infinity is greater than any highest known number
math.inf > 99993999999999929999999999456748884839999883

In [None]:
# Proof of concept: Negative infinity is smaller than any smallest known number
-math.inf < -91876999999999999999999999954309873211234

In [None]:
# Proof of concept: Whatever number is added/subtracted to positive infinity, the result is positive infinity
math.inf + 3249876999999995455668679656836656943

In [None]:
# Proof of concept: Whatever number is subtracted/added from negative infinity, the result is negative infinity
-math.inf - 324999999999876543874399795545

- **NaN (Not a Number):**
    - Not a Number is not a mathematical concept; rather, it was introduced in computing to represent values that are undefined or cannot be represented as a number
    - A `NaN` value can result from invalid inputs or undefined operations, or it can indicate that a variable that should be numerical has been corrupted by text characters or symbols

In [None]:
math.nan

In [None]:
type(math.nan)
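
A short sketch of how `NaN` behaves in practice: it never compares equal to anything, including itself, so `math.isnan()` is the reliable test. The variable names below are illustrative only.

```python
import math

value = math.nan
equal_to_itself = (value == value)      # False: NaN never compares equal, even to itself
detected = math.isnan(value)            # True: the reliable way to test for NaN
inf_minus_inf = math.inf - math.inf     # infinity minus infinity is undefined, so the result is NaN
print(equal_to_itself, detected, math.isnan(inf_minus_inf))
```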

### b. Arithmetic Functions of Math Module

- Factorial of a number is obtained by multiplying that number and all numbers below it till one
- Factorial is not defined for negative values as well as for decimal values. Factorial of zero is 1

In [None]:
def fact_loop(num):
    if num < 0:
        return 0
    if num == 0:
        return 1

    factorial = 1
    for i in range(1, num + 1):
        factorial = factorial * i
    return factorial
fact_loop(100)

In [None]:
def fact_recursion(num):
    if num < 0:
        return 0
    if num == 0:
        return 1

    return num * fact_recursion(num - 1)
fact_recursion(50)

In [None]:
import math
math.factorial(50)

**Let's compare the execution time of calculating the factorial in the above three ways, using the `timeit()` function, which returns the time taken to execute the statements a specified number of times**
```
timeit.timeit(stmt, setup, number=1000000, globals=None)
```
Where
- `stmt`: Code statement(s) whose execution time is to be measured. (Use ; for multiple statements)
- `setup`: Used to import some modules or declare some necessary variables. (Use ; for multiple statements)
- `globals`: You can simply pass `globals()` to the globals parameter, which will cause the code to be executed within your current global namespace
- `number`: It specifies the number of times stmt will be executed. (Default is 1 million times)

In [None]:
import timeit
timeit.timeit("fact_loop(50)", globals=globals(), number = 1000000)

In [None]:
timeit.timeit("fact_recursion(50)", globals=globals(), number = 1000000)

In [None]:
timeit.timeit("math.factorial(50)", setup = "import math", number = 1000000)
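
A single timing can be noisy, so a common pattern is to repeat the measurement and keep the best run. The sketch below uses `timeit.repeat()`; the repeat count and the number of calls are arbitrary choices made to keep the cell fast.

```python
import math
import timeit

# Take the best of 5 runs of 10,000 calls each (both numbers are arbitrary choices)
timings = timeit.repeat("math.factorial(50)", globals={"math": math}, repeat=5, number=10_000)
best = min(timings)
print(f"best of {len(timings)} runs: {best:.6f} seconds")
```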

In [None]:
import math
print(math.ceil(20.222), math.ceil(-11.85))

In [None]:
import math
print(math.floor(20.99), math.floor(-13.1))

In [None]:
# math.pow(3,4)
# math.radians(60)
# math.fsum([3,4,54,65,76,7,6])

In [None]:
import math
math.trunc(20.99), math.trunc(-13.91)


`math.trunc(x)` returns x with the fractional part removed, leaving the integer part. This rounds toward 0: `trunc()` is equivalent to `floor()` for positive x, and equivalent to `ceil()` for negative x. If x is not a float, it delegates to `x.__trunc__`, which should return an Integral value.
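
To see the three rounding functions side by side, here is a small comparison on one positive and one negative value (the values 2.7 and -2.7 are arbitrary):

```python
import math

for x in (2.7, -2.7):
    print(x, "ceil:", math.ceil(x), "floor:", math.floor(x), "trunc:", math.trunc(x))

# Same results collected in a dict: value -> (ceil, floor, trunc)
rounding_table = {x: (math.ceil(x), math.floor(x), math.trunc(x)) for x in (2.7, -2.7)}
```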

![](1.webp)

- Where n = set size, the total number of items in the sample
- Where r = subset size, the number of items to be selected from the sample

In [None]:
# perm(n,k) = n!/(n-k)!
import math
# math.perm()
math.perm(3,2)
# All permutations of the letters a, b, c taken two at a time are six (ab, ba, ac, ca, bc, cb)

In [None]:
# sample {x,y,z} taken two at a time -> {xy,xz,yz,yx,zx,zy}
# when k is greater than n there are no permutations, so perm() returns 0
math.perm(3,5)


In [None]:
from math import *
def per(n, k):
    result = factorial(n)//factorial(n-k)
    return result
per(4,2)

In [None]:
# comb(n,k) = n!/(k!(n-k)!)
import math
math.comb(4,2)
# All combinations of the letters a, b, c, d taken two at a time are six (ab, ac, ad, bc, bd, cd)

In [None]:
math.comb(4,5)
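
The counts returned by `math.perm()` and `math.comb()` can be checked by actually listing the arrangements. A minimal sketch using `itertools` from the standard library, with the letters a, b, c as sample data:

```python
import math
from itertools import combinations, permutations

letters = "abc"
perms = ["".join(p) for p in permutations(letters, 2)]   # order matters
combs = ["".join(c) for c in combinations(letters, 2)]   # order does not matter
print(perms, len(perms) == math.perm(3, 2))
print(combs, len(combs) == math.comb(3, 2))
```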

![](2.png)

In [None]:
import math
math.lcm(5,10)

In [None]:
# LCM -> in data structure
# math.lcm() -> available in Python 3.9 and later

![](3.png)

The greatest common divisor (GCD) of two integers is the largest number that divides both of them with a remainder of zero. It has many mathematical applications. Python has a built-in `gcd()` function in the math module which can be used for this purpose.

In [None]:
import math
print(math.gcd(39,27), math.gcd(100,50))

LCM stands for Least Common Multiple. It is a concept of arithmetic and number system. The LCM of two integers a and b is denoted by LCM (a,b). It is the smallest positive integer that is divisible by both "a" and "b".

In [None]:
import math
math.lcm(20,30) # 60. math.lcm() requires Python 3.9 or later
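
On Python versions older than 3.9, where `math.lcm()` is missing, the LCM can be derived from the GCD. The helper below is a hypothetical function written for this notebook, not part of the `math` module.

```python
import math

def lcm_via_gcd(a, b):
    """Hypothetical helper: LCM from the identity lcm(a, b) * gcd(a, b) == |a * b|."""
    if a == 0 or b == 0:
        return 0
    return abs(a * b) // math.gcd(a, b)

print(lcm_via_gcd(20, 30), lcm_via_gcd(5, 10))
```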

### c. Power and Logarithmic Functions of Math Module

In [None]:
# Example: pow(a, b) returns a**b. It is available in the math module as well as a Python built-in function
# math.pow() converts its arguments to float and always returns a float, while ** and the built-in pow() keep integers exact
import math
a = 2
b = 5
print(a**b , pow(a,b), math.pow(a,b))

In [None]:
# Example: The sqrt(x) function returns a number y such that y² = x;
import math
print(math.sqrt(25), math.sqrt(34), math.sqrt(3623))

In [None]:
# Example: The exp(x) function returns e**x, where e is Euler's number (2.718281828459045)
import math
x = 3
print(math.e ** x, math.exp(x))

In [None]:
# Example: The log(x, base) function returns the logarithm of x to the given base. Default base is e
# Logarithm is the inverse function to exponentiation 

print(math.log(8), math.log(8, math.e), math.log(8, 2), math.log(8, 10))
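
For base 2 and base 10 the module also offers dedicated functions, which are usually more accurate than the two-argument form of `log()`. A brief sketch, with the input values chosen for illustration:

```python
import math

print(math.log2(8), math.log10(1000))      # dedicated base-2 and base-10 functions
print(math.log(1000, 10))                  # two-argument form; may be off by a rounding error
roundtrip = math.exp(math.log(8))          # exp() undoes log(), up to floating-point error
print(math.isclose(roundtrip, 8))
```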


### d. Trigonometric and Hyperbolic Functions of Math Module
- The word trigonometry comes from the Greek words trigonon (“triangle”) and metron (“to measure”). 
- Trigonometry is the branch of mathematics dealing with the relations of the sides and angles of triangles and with the relevant functions of any angles. 
- Trigonometric functions are used in obtaining unknown angles and distances from known or measured angles in geometric figures.
- Note: Hyperbolic functions are analogues of the ordinary trigonometric functions, but defined using the hyperbola rather than the circle. 

In [None]:
# Six functions of an angle: sin(), cos(), tan(), asin(), acos(), atan().
# The angle given to these functions should be in radians. A circle has 360 degrees and 2pi radians
import math
print(math.sin(0), math.sin(3.14), math.sin(90))
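
Because these functions expect radians, `math.sin(90)` is the sine of 90 radians, not of 90 degrees. A minimal sketch of converting first with `math.radians()`:

```python
import math

angle_deg = 90
angle_rad = math.radians(angle_deg)        # 90 degrees is pi/2 radians
sine = math.sin(angle_rad)
print(angle_rad, sine)
print(math.degrees(math.pi))               # converting back: pi radians is 180 degrees
```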

In [None]:
# Examples of Hyperbolic Functions: sinh(), cosh(), tanh(), asinh(), acosh(), atanh()
import math
math.sinh(3.14)

![](4.png)

![](5.png)

## 4. The `random` Module
- The `random` module is used to perform random actions such as generating random numbers, picking a random value from a list or string, etc.
#### [Read Python Documentation for details about `random` module](https://docs.python.org/3/library/random.html#module-random)

In [None]:
#import random module
import random

# use dir() to get the list of complete functions in random module
print("Existing functions in Random module: \n\n", dir(random))
print(random.__doc__)

### a. The `random.random()` Function
- This function `random.random()` returns a random float value in the interval [0,1), i.e., 0 is inclusive while 1 is not

In [None]:
import random
random.random()

#### Question:
Create a sequence of 10 random floating point numbers between 0-30.

In [None]:
floatSequence = []
for i in range(1,11):
    floatSequence.append(random.random()*30)
print(floatSequence)

In [None]:
rv = random.random()
rv

### b. The `random.uniform()` Function
- This function `random.uniform(a, b)` returns a random float value between a and b (the endpoint b may or may not be included, depending on floating-point rounding)

In [None]:
rv = random.uniform(85, 100)
rv

### c. The `random.randint()` and `random.randrange()` Functions
- We have seen the use of the built-in `range()` function that provides a range object, using which you can generate a list of numbers in that specific range
- The `random.randint(start, stop)` returns one random integer (with start and stop both inclusive).
- The `random.randrange(start, stop=None, step=1)` returns one random integer (with stop NOT inclusive). 

In [None]:
# Note the stop value is also inclusive, which is unlike what we expect in Python
import random
# random.randint(0, 5)
random.randint(10001,99999)

In [None]:
# randrange() follows the range() convention and does not include the endpoint
random.randrange(0, 15,2)
# possible values: 0, 2, 4, ..., 14

### d. The `random.choice()` Function
- This function is passed a non-empty sequence and returns a random element from that sequence
```
random.choice(seq)
```

In [None]:
import random

# select a random element from a list
list1 = ['Ehtisham','Ali', 'Ayesha', 'Dua','Adeen', 'Ahmed', 'Fizza', 'Azka']
print("Random element from list: ", random.choice(list1))
  
# select a random character from a string
string = "HappyLearning"
print("Fetching Random item from string: ", random.choice(string))
  
# select a random item from a tuple
tuple1 = (1, 2, 3, 4, 5)
print("Fetching Random element from Tuple: ",random.choice(tuple1))

**With this background, one should be able to explore other methods as and when required**

In [None]:
# random.random() -> float value in [0,1), 1 is not included
# random.uniform(a,b) -> float value between a and b
# random.randint(a,b) -> integer value in [a,b], both endpoints included
# random.randrange(a,b) -> integer value in [a,b), b is not included
# random.choice(sequence) -> return random item from sequence
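
When results need to be reproducible (for example, while debugging), the generator can be seeded. The sketch below assumes an arbitrary seed value of 42 and simulates five rolls of a die twice.

```python
import random

random.seed(42)                                   # 42 is an arbitrary seed value
first_run = [random.randint(1, 6) for _ in range(5)]
random.seed(42)                                   # re-seeding replays the same sequence
second_run = [random.randint(1, 6) for _ in range(5)]
print(first_run, second_run, first_run == second_run)
```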

## Bonus Info:
- Before we discuss Python's `time`, `datetime`, and `calendar` modules, let me set the stage with a brief discussion of the concept of time and time zones:

### Calendar Time:
- The time measured from some fixed reference point is called real time, and one category of it is calendar time. Some famous reference points and their corresponding calendars are:
    - **Hijri Calendar** (AH) measures time from the year of Hijrat, when prophet Muhammad (Peace be upon Him) migrated from Mecca to Madina
    - **Gregorian Calendar** (AD) measures time from the traditionally reckoned birth year of Jesus Christ. AD stands for Anno Domini, Latin for "in the year of the Lord"
    - **UNIX Calendar** measures time from the UNIX epoch (00:00:00 UTC on 1 January 1970)

## 5. The `time` Module


In [None]:
import time

# use dir() to get the list of complete functions in time module
print("Existing functions in time module: \n\n", dir(time))

##  Getting the Local Time
- To get the current time in a more readable form, we use the `localtime()` function. It returns a `struct_time` object holding the time according to your system's local time zone.

In [None]:
import time
local_time = time.localtime()
print("Time:",local_time)
print("Current year:", local_time.tm_year)
print("Current hour:", local_time.tm_hour)
print("Current minute:", local_time.tm_min)
print("Current second:", local_time.tm_sec)

## Getting the Formatted Time
- When we use the `asctime()` method, we get something much more readable.

In [None]:
time.asctime()

In [None]:
# You can also provide a tuple or a struct_time structure as an argument.
time.asctime(time.localtime())

**(i) The `time.sleep(seconds)` method is used to delay execution for a given number of seconds. The argument may be a floating point number for subsecond precision.**

In [None]:
import time
print("This is printed immediately.")
time.sleep(5)
print("This is printed after 5 seconds.")

**(ii) The `time.time()` method returns the current time in seconds since UNIX Epoch (00:00:00 UTC on 1 January 1970).**

In [None]:
import time
seconds = time.time()
seconds

**(iii) The `time.ctime(seconds)` method takes seconds passed since epoch as argument and returns a string representing local time.**

In [None]:
import time
gettime = time.time()
print(f"Time in seconds : {gettime}")
tm = time.ctime(gettime)
print(tm)

In [None]:
import time
dtg1 = time.ctime(0)
dtg1

In [None]:
import time
seconds = time.time()
dtg2 = time.ctime(seconds)
dtg2

In [None]:
import time
seconds = time.time()
dtg2 = time.ctime(3434643565.34554)
dtg2

**With this background, one should be able to explore other methods as and when required**

In [None]:
# time.localtime() -> current local time of the machine, returned in struct_time format
# time.asctime() -> current time but in string format
# time.sleep(seconds) -> delay execution for the given number of seconds
# time.time() -> return current time in seconds since the UNIX epoch
# time.ctime(seconds) -> return time in string format for the given seconds since the epoch
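
Two more functions from the same module are often useful: `time.strftime()` for custom formatting and `time.perf_counter()` for measuring short intervals. A minimal sketch follows; the format pattern and the 0.1 second sleep are arbitrary choices.

```python
import time

# Format a struct_time with an explicit pattern; gmtime(0) is the UNIX epoch in UTC
epoch_text = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(0))
print(epoch_text)

start = time.perf_counter()          # high-resolution clock meant for measuring intervals
time.sleep(0.1)
elapsed = time.perf_counter() - start
print(f"slept for about {elapsed:.2f} seconds")
```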

## 6. The `datetime` Module
- The Python datetime module offers functions and classes for working with date and time parsing, formatting, and arithmetic. 
- The `datetime` module can support many of the same operations as `time` module, but provides a more object oriented set of types, and also has some limited support for time zones.
#### [Read Python Documentation for details about `datetime` module](https://docs.python.org/3/library/datetime.html#module-datetime)

In [None]:
# import datetime module
import datetime

# use dir() to get the list of complete functions in datetime module
print("Existing functions in datetime module: \n\n", dir(datetime))

In [None]:
#The smallest year number allowed in a date or datetime object. MINYEAR is 1.
datetime.MINYEAR

In [None]:
#The largest year number allowed in a date or datetime object. MAXYEAR is 9999.
datetime.MAXYEAR
# current year -> 2022

**(i) The `datetime.datetime.today()` and `datetime.datetime.now()` methods return a datetime object as per the time zone of the system**

In [None]:
dtg = datetime.datetime.today()
dtg

In [None]:
dtg = datetime.datetime.now()
print(dtg)
print(type(dtg))

**(ii) Let us explore some commonly used attributes related with the `datetime` object.**
- `dtg.year:` returns the year
- `dtg.month:` returns the month
- `dtg.day:` returns the day of the month
- `dtg.hour:` returns the hour
- `dtg.minute:` returns the minutes
- `dtg.second:` returns the seconds
- `dtg.microsecond:` returns the microseconds

In [None]:
dtg.year

In [None]:
dtg.day

In [None]:
dtg.hour

In [None]:
dtg.microsecond

**(iii) The `datetime.datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])` method is used to create a `datetime` object**

In [None]:
import datetime
dtg = datetime.datetime(2021,12,31)
print(dtg)
print(type(dtg))

In [None]:
dtg = datetime.datetime(2021, 12, 31, 4, 30, 54, 678)
print(dtg)

**(iv)  The `datetime.time([hour[, minute[, second[, microsecond[, tzinfo]]]]])` constructor returns a `time` object.**

In [None]:
t1 = datetime.time(10, 15, 54, 247)
print(t1)
print(type(t1))

**With this background, one should be able to explore other methods as and when required**

### What is Timedelta in Python?
- A `timedelta` represents a duration, i.e., the difference between two date, time, or datetime instances, to microsecond resolution.
- Use `timedelta` to add or subtract weeks, days, hours, minutes, seconds, milliseconds, and microseconds from a given date and time.

<img src="images/14.png">

In [None]:
from datetime import datetime, timedelta
print(f"Current Date and time : {datetime.now()}")
# adding  3 weeks and 4 days and 5 hours
newDateTime = datetime.now() + timedelta(weeks=3, days=4, hours=5)
newDateTime

#### Example 1: Calculate the difference between two dates

In [None]:
import datetime
current_date = datetime.datetime.now()
givenDate = datetime.datetime(year=2020,month=3,day=23)
date = current_date-givenDate
print(date)
print(type(date))
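
The resulting `timedelta` can be inspected through its `days` and `seconds` attributes and its `total_seconds()` method. The sketch below uses two fixed, arbitrary dates so that the output is the same on every run.

```python
from datetime import datetime

start = datetime(2021, 12, 24, 6, 0)      # arbitrary example dates
end = datetime(2021, 12, 31, 12, 0)
gap = end - start                          # subtracting two datetimes gives a timedelta

print(gap)                                 # 7 days, 6:00:00
print(gap.days, gap.seconds, gap.total_seconds())
```

Note that `seconds` holds only the part of the duration left over after whole days, while `total_seconds()` covers the entire duration.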

#### Example 2: Calculate Future Datetime
Let’s see how to use timedelta class to calculate future dates by `adding four weeks` to a given date.

In [None]:
from datetime import datetime, timedelta
currentDate = datetime.now()
print(f"Current Date : {currentDate}")
# adding four weeks to the current date
FutureDate = currentDate + timedelta(weeks=4)
print(f"Future Date : {FutureDate}")

In [None]:
# datetime.datetime.now()/today() -> return current date and time as a datetime object
# datetime.datetime(year, month, day, hour, minute, second) -> return an object of datetime type
# datetime.time() -> return an object of datetime.time type
# datetime.timedelta(weeks, days, hours, minutes) -> return a duration that can be added to or subtracted from a datetime
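
Converting between `datetime` objects and strings is a frequent task. A short sketch using `strftime()` (object to string) and `strptime()` (string to object); the format patterns are illustrative choices.

```python
from datetime import datetime

dtg = datetime(2021, 12, 31, 4, 30, 54)
text = dtg.strftime("%Y/%m/%d %H:%M")                                     # datetime -> string
parsed = datetime.strptime("2021-12-31 04:30:54", "%Y-%m-%d %H:%M:%S")   # string -> datetime
print(text)
print(parsed, parsed == dtg)
```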

## 7. The `calendar` Module
- This module allows you to output calendars like the Unix `cal` program, and provides additional useful functions related to the calendar. By default, these calendars have Monday as the first day of the week, and Sunday as the last.
#### [Read Python Documentation for details about `calendar` module](https://docs.python.org/3/library/calendar.html#module-calendar)

In [None]:
import calendar

# use dir() to get the list of complete functions in calendar module
print("Existing functions in calendar module: \n\n", dir(calendar))

In [None]:
# calendar() method to print the calendar of whole year
import calendar
cy = calendar.calendar(2022) 
print(cy)

In [None]:
import calendar
# month() method is used to print calendar of specific month

# print calendar of April 2022
c = calendar.month(2022,4) 
print(c)

In [None]:
import calendar
# can check whether the year is a leap year or not
print("2021 is leap year: ", calendar.isleap(2021))

print("2020 is leap year: ", calendar.isleap(2020))
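
Beyond printing calendars, the module can answer simple questions about a month or a range of years. A minimal sketch using `monthrange()` and `leapdays()`, with the years chosen for illustration:

```python
import calendar

first_weekday, days_in_month = calendar.monthrange(2022, 4)   # April 2022
print(calendar.day_name[first_weekday], days_in_month)        # weekday of the 1st, number of days
print(calendar.monthrange(2020, 2)[1])                         # February in a leap year
leap_count = calendar.leapdays(2000, 2022)                     # leap years in the range [2000, 2022)
print(leap_count)
```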

**With this background, one should be able to explore other methods as and when required**

In [None]:
# calendar.calendar(year) -> return a calendar of complete year
# calendar.month(year, months) -> return a month of specific year

## 8. The `os` Module
- Python OS module provides the facility to establish the interaction between the user and the operating system. It offers many useful OS functions that are used to perform OS-based tasks and get related information about operating system.
- This module provides a portable way of using operating system dependent functionality and provides dozens of functions for interacting with the operating system
#### [Read Python Documentation for details about `os` module](https://docs.python.org/3/library/os.html#module-os)

In [None]:
import os
# to get the list of complete functions in OS module
print("Existing functions in OS module: \n\n", dir(os))


## **1- os.name**
- This attribute gives the name of the operating-system-dependent module that Python imported.
- Currently, the registered names are 'posix', 'nt', and 'java'.

In [None]:
import os

In [None]:
os.name

Windows NT is a family of operating systems developed by Microsoft that featured multi-processing capabilities, processor independence and multi-user support. The first version was released in 1993 as Windows NT 3.1, which was produced for servers and workstations.

In [None]:
# os module
import os

### a. The `os.getcwd()` and `os.listdir()` Functions
- `os.getcwd()` returns a string representing the current working directory (CWD).
- `os.listdir(path=None)` returns a list containing the names of all files and folders inside the specified directory, in arbitrary order. It does not include the special entries '.' and '..'. If no path is given, the contents of the CWD are returned.


In [None]:
import os

# getcwd() function is used to return the current working directory
cwd = os.getcwd()
print("Current working directory:\n", cwd )

# listdir() function is used to return the contents of current working directory
mylist = os.listdir(os.getcwd())
print("\nContents of directory: \n", mylist )


In [None]:
path = "/home/dell/Data/Introduction to data analyst and data science for begineers"
mylist = os.listdir(path)
print("\nContents of directory: \n", mylist )

### b. The `os.chdir()` Function
- The `os.chdir(path)` function is used to change the current working directory to the specified path.

In [None]:
import os

print("Get current working directory:\n\n\n", os.getcwd())

path = "/home/dell/Data/Introduction to data analyst and data science for begineers/Helloworld"
os.chdir(path)
# os.chdir(r"F:\Python Programs")  # on Windows, use a raw string so backslashes are not treated as escapes

print("\nGet current working directory again:\n\n\n", os.getcwd())
os.listdir()

In [None]:
# os.listdir()
os.mkdir("PythonFolder")

In [None]:
os.rmdir("PythonFolder")

In [None]:
os.listdir()

### c. The `os.mkdir()` and `os.rmdir()` Functions
- The `os.mkdir(path)` function creates a new directory
- The `os.rmdir(path)` function removes a directory


In [None]:
import os

print(f"Current working directory : {os.getcwd()}")

list1 = os.listdir(os.getcwd())
print("Contents of directory: ", list1)

os.mkdir("ANewDir")




In [None]:
list2 = os.listdir(os.getcwd())
print("Contents of directory: ", list2)

In [None]:

# os.mkdir("ANewDir2")
list3 = os.listdir(os.getcwd())
print("Contents of directory: ", list3)


In [None]:
os.rmdir("ANewDir")   # remove the directory created above
# os.rmdir("ANewDir2")

print(f"Content of directory : {os.listdir(os.getcwd())}")
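
To experiment with these functions without touching real folders, one option is to work inside a temporary directory. The sketch below is one way to do it; the directory name `ANewDir` is reused from above, and `tempfile` is brought in only to provide a throwaway location.

```python
import os
import tempfile

# Work inside a throwaway directory so nothing in the real working directory is touched
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "ANewDir")      # build paths portably instead of hard-coding separators
    os.makedirs(target, exist_ok=True)          # no error if the directory already exists
    created = os.path.isdir(target)
    os.rmdir(target)                            # rmdir only removes empty directories
    removed = not os.path.exists(target)

print(created, removed)
```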

## os.rename()
A file or directory can be renamed using the function `os.rename()`. The rename succeeds only if the user has the privilege to change the file.

In [None]:
!ls

In [None]:
import os 
filename = "Fig1.PNG"
os.rename(filename, "NewFig.PNG")  # os.rename() returns None, so there is nothing to store

In [None]:
os.listdir()

### d. The   `os.system()` Function
- The `os.system(command)` method is used to execute the command in a subshell

In [None]:
help(os.system)

In [None]:
import os


print("\n")
os.system('echo "This is getting more and more interesting"')

print("\n")
os.system('date')

In [None]:
print(os.system('ls'))

In [None]:
os.system('echo "This is getting more and more"')

In [None]:
import os
cmd = 'date'
os.system(cmd)
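
`os.system()` only returns the exit status of the command; its output goes straight to the terminal. When the output itself is needed as a string, the standard library's `subprocess.run()` is a commonly used alternative. In the sketch below, the command is a child Python process, chosen so that the example runs on any platform.

```python
import subprocess
import sys

# Run a child Python process; sys.executable keeps the example portable across platforms
result = subprocess.run(
    [sys.executable, "-c", "print('This is getting more and more interesting')"],
    capture_output=True,   # collect stdout/stderr instead of printing them directly
    text=True,             # decode bytes to str
)
print(result.returncode, result.stdout.strip())
```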

**Students should explore other functions like `chmod()`, `chown()`, `fstat()`, `getpid()`, `getuid()`**

`os.getuid()` returns the current process's real user id, while `os.setuid()` sets the current process's user id. Both are available on Unix only.

os.getpid() method in Python is used to get the process ID of the current process.

In [None]:
# os.name -> name of the OS-dependent module in use (e.g. 'posix' or 'nt')
# os.getcwd() -> return the current working directory (cwd)
# os.listdir() -> return all the files and folders/directories in cwd
# os.chdir(path) -> change the cwd to the given path
# os.mkdir() -> used to create a new directory
# os.rmdir() -> remove a specific (empty) directory
# os.rename() -> rename a file or folder

## 9. The `urllib` Package
- The `urllib` package in Python 3 is a collection of following Python modules used for working with Uniform Resource Locators:
    - `urllib.request` for opening and reading URLs, using variety of protocols
    - `urllib.error` containing the exceptions raised by urllib.request
    - `urllib.parse` for parsing URLs
    - `urllib.robotparser` for parsing robots.txt files

#### [Read Python Documentation for details about `urllib` package](https://docs.python.org/3/library/urllib.html#module-urllib)

In [None]:
os.getcwd()

In [None]:
import urllib
print(dir(urllib))

In [None]:
# os.getcwd()
import urllib.request  # importing the package alone does not guarantee that the request submodule is loaded
try:
    link = "https://githu1b.com/"
    request_url = urllib.request.urlopen(link)
    print(type(request_url))
    print(request_url.read())
except Exception as e:
    print(str(e))
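
The `urllib.parse` module works without a network connection, which makes it easy to try out. A minimal sketch follows; the query parameter names are made up for illustration.

```python
from urllib.parse import urlencode, urlparse

parts = urlparse("https://docs.python.org/3/library/urllib.html#module-urllib")
print(parts.scheme, parts.netloc, parts.path, parts.fragment)

# Build a query string from a dict (the parameter names are made up for illustration)
query = urlencode({"q": "built-in modules", "page": 2})
print(query)
```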

## Create a GitHub Gist

In [None]:
url = "https://raw.githubusercontent.com/bsef19m521/DatasetsForProjects/master/tips.csv"
print(url)

In [None]:
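import urllib.request
try:
    dataset = urllib.request.urlretrieve(url,"tips.csv")
    print(dataset)
except Exception as e:
    # the download can fail for several reasons (bad URL, no network, SSL error)
    print("Download failed:", e)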

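>**`urllib.request.urlopen()` may raise a `URLError` saying `SSL: CERTIFICATE_VERIFY_FAILED`. One workaround is to set the `_create_default_https_context` attribute of `ssl` to `_create_unverified_context`. Note that this turns off certificate verification for HTTPS requests, so use it only with sources you trust.**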

In [None]:
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

### b. The   `urllib.request.urlretrieve()` Function
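- The `urllib.request.urlretrieve(url, filename=None)` function downloads a remote file to disk. If `filename` is omitted, the file is saved to a temporary location; otherwise it is saved under the given name. It returns a tuple of the local file name and the response headers.
- Let us download a public CSV file from GitHub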

In [None]:
import os
os.getcwd()

In [None]:
Link = "https://raw.githubusercontent.com/tuangauss/DataScienceProjects/master/data/history.csv"
import urllib.request
dataset = urllib.request.urlretrieve(Link, './History.csv')

In [None]:
import urllib.request
url ="https://gist.githubusercontent.com/mbejda/45db05ea50e79bc42016/raw/52d5ca99398b495e096f6eace20f5872129633e3/Fortune-1000-Company-Twitter-Accounts.csv"
Data = urllib.request.urlretrieve(url, './Twitter.csv')
print(Data)

In [None]:
os.listdir()

In [None]:
import urllib.request
url1 ="https://raw.githubusercontent.com/adacollege/cs-python-csv/master/students.csv"

urllib.request.urlretrieve(url1, './student.csv')

In [None]:
os.listdir()

# Bonus Part

In [None]:
# import the pandas library to load the downloaded dataset
import pandas as pd

# load and read data using read_csv() function of pandas library 
df = pd.read_csv("Twitter.csv")

# df.head() returns the first five rows of the dataframe
df.head()

## 10. Statistics Module

- This module provides functions for calculating mathematical statistics of numeric (Real-valued) data.

In [None]:
import statistics
# import stat

In [None]:
print(dir(statistics))

### Averages and measures of central location
- These functions calculate an average or typical value from a population or sample.

- mean() : Arithmetic mean (“average”) of data.
- fmean() : Fast, floating point arithmetic mean.
- geometric_mean() : Geometric mean of data.
- harmonic_mean() : Harmonic mean of data.
- median() : Median (middle value) of data.
- median_low() : Low median of data.
- median_high() : High median of data.
- median_grouped() : Median, or 50th percentile, of grouped data.
- mode() : Single mode (most common value) of discrete or nominal data.
- multimode() : List of modes (most common values) of discrete or nominal data.
- quantiles() : Divide data into intervals with equal probability.
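
`quantiles()` is listed above but not demonstrated in this notebook, so here is a minimal sketch. The `scores` list is made-up sample data, and the call relies on the function's default method.

```python
import statistics

scores = [43, 5, 65, 76, 87, 12, 54, 98]

# n=4 asks for quartiles, so three cut points come back
quartiles = statistics.quantiles(scores, n=4)
print(quartiles)

# For this data the middle cut point coincides with the median
print(statistics.median(scores))
```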

#### statistics.mean(data)
Return the sample arithmetic mean of data which can be a sequence or iterable.

In [None]:
list1 = [43, 5, 65, 76, 87]
# matrix = [[2, 3, 4], [6, 7, 8]]
# A flat list like list1 is 1-dimensional; a list of lists (such as matrix above) is 2-dimensional
statistics.mean(list1)

#### statistics.fmean(data)
- Convert data to floats and compute the arithmetic mean.
- This runs faster than the mean() function and it always returns a float. The data may be a sequence or iterable. If the input dataset is empty, raises a StatisticsError.

In [None]:
# example1
statistics.fmean(list1)

In [None]:
# example 2
try:
    print(statistics.fmean([]))
except Exception as e:
    print(e)

#### statistics.geometric_mean(data)
- Convert data to floats and compute the geometric mean.
- The geometric mean indicates the central tendency or typical value of the data using the product of the values (as opposed to the arithmetic mean which uses their sum).
- Raises a StatisticsError if the input dataset is empty, if it contains a zero, or if it contains a negative value. The data may be a sequence or iterable.

<img src="images/6.png" width="450" height="400">

In [None]:
print(list1)
print(statistics.geometric_mean(list1))
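
The definition can be checked by hand: the geometric mean of n values is the n-th root of their product. The sketch below repeats the numbers from `list1` under the name `values` and compares the manual result with the module's.

```python
import math
import statistics

values = [43, 5, 65, 76, 87]

# Geometric mean: the n-th root of the product of the n values
by_hand = math.prod(values) ** (1 / len(values))
from_module = statistics.geometric_mean(values)

print(by_hand, from_module)
print(math.isclose(by_hand, from_module))
```

For positive data that is not all equal, the geometric mean comes out smaller than the arithmetic mean.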

#### statistics.harmonic_mean(data, weights=None)
- Return the harmonic mean of data, a sequence or iterable of real-valued numbers. If weights is omitted or None, then equal weighting is assumed.
- The harmonic mean is the reciprocal of the arithmetic mean() of the reciprocals of the data. For example, the harmonic mean of three values a, b and c will be equivalent to `3/(1/a + 1/b + 1/c)`. If one of the values is zero, the result will be zero.

<img src="images/7.jpg" width="450" height="400">

#### Example no 01: 
Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr. What is the average speed?

In [None]:
statistics.harmonic_mean([40,60])

#### Example no 02:
Suppose a car travels 40 km/hr for 5 km, and when traffic clears, speeds-up to 60 km/hr for the remaining 30 km of the journey. What is the average speed?

In [None]:
statistics.harmonic_mean([40,60], weights =[ 5,30])

#### statistics.median(data)
Return the median (middle value) of numeric data, using the common “mean of middle two” method. If data is empty, StatisticsError is raised. data can be a sequence or iterable.
The median is a robust measure of central location and is less affected by the presence of outliers. When the number of data points is odd, the middle data point is returned:

In [None]:
from statistics import *
median([1, 3, 5])

When the number of data points is even, the median is interpolated by taking the average of the two middle values:

In [None]:
median([1,3,5,7])

#### Note: 
- This is suited for when your data is discrete, and you don’t mind that the median may not be an actual data point.
- If the data is ordinal (supports order operations) but not numeric (doesn’t support addition), consider using `median_low()` or `median_high()` instead.
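
The second note can be illustrated with letter grades, which can be sorted but not added. The `grades` list is made-up sample data.

```python
import statistics

# Ordinal data: letter grades can be sorted but not added together
grades = ["C", "A", "D", "B"]

low = statistics.median_low(grades)
high = statistics.median_high(grades)
print(low, high)

# median() has to average the two middle values, which fails for strings
try:
    statistics.median(grades)
    median_failed = False
except TypeError as error:
    median_failed = True
    print(f"median() failed: {error}")
```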

#### statistics.median_low(data)
Return the low median of numeric data. If data is empty, `StatisticsError` is raised. data can be a sequence or iterable.
The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned.

In [None]:
print(median_low([1, 3, 5]), median_low([1, 4, 5, 7]))

#### statistics.median_high(data)
Return the high median of data. If data is empty, `StatisticsError` is raised. data can be a sequence or iterable.
The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned.

In [None]:
median_high([1, 3, 5]), median_high([1, 3, 5, 7])

#### statistics.mode(data)
Return the single most common data point from discrete or nominal data. The mode (when it exists) is the most typical value and serves as a measure of central location.
If there are multiple modes with the same frequency, `mode()` returns the first one encountered in the data. If the smallest or largest of those is desired instead, use `min(multimode(data))` or `max(multimode(data))`. If the input data is empty, `StatisticsError` is raised.
`mode()` assumes discrete data and returns a single value. This is the standard treatment of the mode as commonly taught in schools.

In [None]:
mode([1, 1, 2, 3, 3, 3, 3, 4])

The mode is the only statistic in this module that also applies to nominal (non-numeric) data.

In [None]:
mode(["red", "blue", "blue", "red", "green", "red", "red"])

#### statistics.multimode(data)
Return a list of the most frequently occurring values in the order they were first encountered in the data. The list holds more than one value if there are multiple modes, and is empty if the data is empty.

In [None]:
multimode('aabbbbccddddeeffffgg')

In [None]:
multimode('')
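
The tie-breaking rule described for `mode()` is easier to see on data with two equally common values. The `votes` list is made-up sample data, and the behaviour shown assumes Python 3.8 or later.

```python
import statistics

votes = ["b", "a", "b", "a", "c"]   # "a" and "b" both appear twice

first_mode = statistics.mode(votes)        # the tied value that appears first in the data
all_modes = statistics.multimode(votes)    # every tied value, in first-seen order

print(first_mode)
print(all_modes)
print(min(all_modes), max(all_modes))
```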

## Measures of spread
These functions calculate a measure of how much the population or sample tends to deviate from the typical or average values.
- pstdev() : Population standard deviation of data.
- pvariance() : Population variance of data.
- stdev(): Sample standard deviation of data.
- variance(): Sample variance of data.

![](images/8.png)

![](images/9.png)

### statistics.pvariance(data, mu=None)
Return the population variance of data, a non-empty sequence or iterable of real-valued numbers. Variance, or second moment about the mean, is a measure of the variability (spread or dispersion) of data. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean.
If the optional second argument mu is given, it is typically the mean of the data. It can also be used to compute the second moment around a point that is not the mean. If it is missing or None (the default), the arithmetic mean is automatically calculated.

In [None]:
from statistics import *
data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]
statistics.pvariance(data)

In [None]:
# If you have already calculated the mean of your data, 
# you can pass it as the optional second argument mu to avoid recalculation:

In [None]:
mu = statistics.mean(data)
print(f"Mean : {mu}")
pv=statistics.pvariance(data, mu)
print(f"Pvariance : {pv}")

In [None]:
mu = statistics.mean(data)
print(f"Mean : {mu}")
pv=statistics.pvariance(data, 1.95)
print(f"Pvariance : {pv}")

### statistics.pstdev(data, mu=None)
Return the population standard deviation (the square root of the population variance). See pvariance() for arguments and other details.

In [None]:
list2 = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]
Mean = mean(list2)
print(f"Mean : {Mean}")


ps=pstdev(list2)

print(f"Pstdev : {ps}")

In [None]:
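list2 = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]
Mean = mean(list2)
print(f"Mean : {Mean}")
# 3.2 is not the mean of list2, so this measures spread around 3.2 instead of around the mean
ps = pstdev(list2, 3.2)
print(f"Pstdev about 3.2 : {ps}")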

### statistics.variance(data, xbar=None)
Return the sample variance of data, an iterable of at least two real-valued numbers. Variance, or second moment about the mean, is a measure of the variability (spread or dispersion) of data. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean.
If the optional second argument xbar is given, it should be the mean of data. If it is missing or None (the default), the mean is automatically calculated.
Use this function when your data is a sample from a population. To calculate the variance from the entire population, see `pvariance()`.

In [None]:
data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
Mean = mean(data)
print(f"Mean : {Mean}")
print(f"Sample Variance : {variance(data)}")
print(f"Population Variance : {pvariance(data)}")

In [None]:
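data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5]
Mean = mean(data)
print(f"Mean : {Mean}")
# xbar is expected to be the mean of data; 1.8 differs from the mean printed above,
# so this result is not the true sample variance
print(f"Sample Variance with xbar=1.8 : {variance(data, 1.8)}")
print(f"Population Variance : {pvariance(data)}")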

### statistics.stdev(data, xbar=None)
Return the sample standard deviation (the square root of the sample variance). See variance() for arguments and other details.    

In [None]:
Data = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]
stdev(Data), variance(Data)
# The variance is the square of the standard deviation
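
The four measures of spread differ only in the divisor and in whether a square root is taken. The sketch below computes them by hand on the same numbers (stored under the name `values`) and prints each next to the module's result.

```python
import math
import statistics

values = [1.5, 2.5, 2.5, 2.75, 3.25, 4.75]
n = len(values)
mean_value = sum(values) / n

# Sum of squared deviations from the mean, shared by both variance formulas
squared_deviations = sum((x - mean_value) ** 2 for x in values)

population_variance = squared_deviations / n      # divide by N
sample_variance = squared_deviations / (n - 1)    # divide by N - 1

print(population_variance, statistics.pvariance(values))
print(sample_variance, statistics.variance(values))
print(math.sqrt(population_variance), statistics.pstdev(values))
print(math.sqrt(sample_variance), statistics.stdev(values))
```

Because the sample formulas divide by N - 1 instead of N, they always come out slightly larger than the population versions for the same data.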

# Example Problem
You grow 20 crystals from a solution and measure the length of each crystal in millimeters. Here is your data:      
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4     
Calculate the sample standard deviation of the length of the crystals. 

In [None]:
# Here write your answer
data = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]
import statistics
statistics.stdev(data)

## Statistics for relations between two inputs
These functions calculate statistics regarding relations between two inputs.
- covariance() : Sample covariance for two variables.
- correlation() : Pearson’s correlation coefficient for two variables.
- linear_regression() : Slope and intercept for simple linear regression.

### statistics.covariance(x, y, /)
Return the sample covariance of two inputs x and y. Covariance is a measure of the joint variability of two inputs.
Both inputs must be of the same length (no less than two), otherwise `StatisticsError` is raised.

<img src="images/11.png" width="450" height="400">

In [None]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(f"Covariance(x,y) : {statistics.covariance(x, y)}")
z = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(f"Covariance(x,z) : {statistics.covariance(x, z)}")
print(f"Covariance(z,x) : {statistics.covariance(z, x)}")

### statistics.correlation(x, y, /)
Return Pearson's correlation coefficient for two inputs. Pearson's correlation coefficient r takes values between `-1` and `+1`. It measures the strength and direction of the linear relationship: +1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 no linear relationship.
Both inputs must be of the same length (no less than two), and neither may be constant; otherwise `StatisticsError` is raised.

<img src="images/10.png" width="450" height="400">

In [None]:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(f"Correlation(x,y) : {correlation(x,y)}")
print(f"Correlation(x,x) : {correlation(x,x)}")

<img src="images/13.png" width="500" height="450">

## Check your Concepts

Try answering the following questions to test your understanding of the topics covered in this notebook:

1. What are modules in Python?
2. What is a Python library?
3. What is the Python Standard Library?
4. What are some popular Python libraries?
5. Where can you learn about the modules and functions available in the Python standard library?
6. How do you install a third-party library?
7. What is a module namespace? How is it useful?
8. What problems would you run into if Python modules did not provide namespaces?
9. How do you import a module?
10. How do you use a function from an imported module? Illustrate with an example.
11. What is the purpose of the `os` module in Python?
12. How do you identify the current working directory in a Jupyter notebook?
13. How do you retrieve the list of files within a directory using Python?
14. How do you create a directory using Python?
15. How do you check whether a file or directory exists on the filesystem? Hint: `os.path.exists`.
16. Where can you find the full list of functions contained in the `os` module?
17. Give examples of 5 useful functions from the `os` and `os.path` modules.


#### What is a Python library?
- A Python library is a collection of related modules.
- It contains bundles of code that can be used repeatedly in different programs.
- It makes Python programming simpler and more convenient, because we don't need to write the same code again and again for different programs.
- Python libraries play a vital role in fields such as Machine Learning, Data Science, and Data Visualization.

#### What is the Python Standard Library?
- The Python Standard Library is a collection of modules that ships with Python. It simplifies programming by removing the need to rewrite commonly used functionality.
- Its modules are used by importing them, usually at the beginning of a script.

#### Exercise 1: Print the current date, year, month, day, time and hour in Python.

In [None]:
import datetime
current = datetime.datetime.now()
print(f"Date : {current}")
print(f"Year : {current.year}")
print(f"Month : {current.month}")
print(f"Day : {current.day}")
print(f"Time : {current.time()}")
print(f"Hours : {current.hour}")

#### Exercise 2: Subtract a week (7 days)  from a given date in Python
- Input : given_date = datetime(2020, 2, 25)
- Output : 2020-02-18

In [None]:
import datetime
givenTime = datetime.datetime(2020, 2, 25)
resTime = givenTime - datetime.timedelta(days=7)
# Print only the date part so the output matches the expected 2020-02-18
print(resTime.date())