<a href="https://vigneashpandiyan.github.io/publications/Codes/" target="_blank" rel="noopener noreferrer">
  <img src="https://vigneashpandiyan.github.io/images/Link.png"
       style="max-width: 800px; width: 100%; height: auto;">
</a>

## Data Manipulation

### PyTorch
PyTorch is an open-source machine learning library. It provides tools for building and training deep learning models, with a focus on flexibility and ease of use. The main data structure in PyTorch is the tensor, which is similar to a NumPy array. The `torch.tensor` function is the basic way to create a tensor from a list or an array.

In [None]:
import torch

# Create a tensor
x = torch.tensor([1.0, 2.0, 3.0])

# Simple tensor operation
y = x * 2
print(y)


Many tensor operations are similar to those done with NumPy arrays.

In [None]:
a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[10, 20], [30, 40]])
result = a + b    # Addition operation

print(result)

In [None]:
a = torch.tensor([2, 4, 6])
b = torch.tensor([1, 2, 3])
result = a * b   # Element-wise multiplication operation
print(result)

In [None]:
a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])
result = torch.mm(a, b)    # Matrix multiplication operation
print(result)
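
For 2-D tensors, the `@` operator (the operator form of `torch.matmul`) should give the same result as `torch.mm`. The sketch below reuses the values from the cell above and contrasts the matrix product with the element-wise product; the variable names are illustrative only.

```python
import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[5, 6], [7, 8]])

mm_result = torch.mm(a, b)   # matrix product, 2-D inputs only
at_result = a @ b            # operator form of torch.matmul
elementwise = a * b          # element-wise product, for contrast

print(mm_result)
print(torch.equal(mm_result, at_result))
print(elementwise)
```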

To reshape a 1D tensor of 6 elements into a 2x3 tensor:

In [None]:
x = torch.arange(6)
reshaped = x.view(2, 3)
print(reshaped)

The following methods enable conversion between NumPy arrays and PyTorch tensors.

In [None]:
import numpy as np

np_arr = np.array([1, 2, 3])
torch_tensor = torch.from_numpy(np_arr)  # Array to tensor
back_to_np = torch_tensor.numpy()        # Tensor to array
print(torch_tensor)
print(back_to_np)
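
One detail worth keeping in mind: `torch.from_numpy` shares memory with the source array, while `torch.tensor` makes a copy. A minimal sketch of the difference, with hypothetical variable names `shared` and `copied`:

```python
import numpy as np
import torch

np_arr = np.array([1, 2, 3])
shared = torch.from_numpy(np_arr)   # shares memory with np_arr
copied = torch.tensor(np_arr)       # independent copy of the data

np_arr[0] = 100                     # modify the NumPy array in place

print(shared)   # reflects the in-place change
print(copied)   # still holds the original values
```

If later code modifies the array in place, the copying form is the safer choice.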

Exercise: Given the tensor 'a', add 3 to each element of 'a' and store the sums in a tensor named 'result'.

```
a = torch.tensor([[1], [2], [3]])
```



In [None]:
a = torch.tensor([[1], [2], [3]])

''' Your code here! '''

print(result)

Exercise: Given the tensor below, compute the sum along the columns and store it in a tensor named 'sum_cols'.

In [None]:
mat = torch.tensor([[1, 2, 3], [4, 5, 6]])

''' Your code here! '''

print(sum_cols)

### Pandas

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data. The module can be installed with:

```
pip install pandas
```
In Colab it is preinstalled, so it can be imported directly:


In [None]:
import pandas as pd
print(pd.__version__) #Print Pandas' version number


Series are to Pandas what Lists are to Python.







In [None]:
import pandas as pd

a = [1, 7, 2] #Python list

myvar = pd.Series(a)  # Series

print(myvar)
print(myvar[1]) #Retrieve by index

In [None]:
s = pd.Series([0, 1, 1, 2, 3, 5, 8])

In [None]:
print(s)

In [None]:
s = pd.Series([0.0, 1, 1, 2, 3, 5, 8])

In [None]:
print(s)

In [None]:
s.values

In [None]:
s.index

In [None]:
for v in s.values:
    print(v)

In [None]:
for i in s.index:
    print(i)

In [None]:
for item in zip(s.index, s.values):
    print(item)

In [None]:
s[0]

In [None]:
s[1]

In [None]:
s[5]

In [None]:
mercury = pd.Series([0.33, 57.9, 4222.6], index=['mass', 'diameter', 'dayLength'])

In [None]:
print(mercury)

In [None]:
mercury['mass']

In [None]:
mercury['dayLength']

In [None]:
mercury.mass

In [None]:
arr = np.random.randint(0, 10, 10)

In [None]:
arr

In [None]:
ind = np.arange(10, 20)

In [None]:
rand_series = pd.Series(arr, index=ind)

In [None]:
print(rand_series)

In [None]:
# mercury = pd.Series([0.33, 57.9, 4222.6], index=['mass', 'diameter', 'dayLength'])

d = {}
d['mass'] = 0.33
d['diameter'] = 57.9
d['dayLength'] = 4222.6


In [None]:
print(d)

In [None]:
mercury = pd.Series(d)

In [None]:
print(mercury)

In [None]:
mercury = pd.Series(d, index=['mass', 'diameter', 'dayLength'])

In [None]:
print(mercury)

In [None]:
mercury = pd.Series(d, index=['mass', 'diameter'])

In [None]:
print(mercury)

Pandas provides two indexers for selecting one or more entries: `.loc[]` selects by index label and `.iloc[]` selects by integer position. Both are used with square brackets, not parentheses.

In [None]:
s = pd.Series([0.0, 1, 1, 2, 3, 5, 8], index=[1, 2, 3, 4, 5, 6, 7])

In [None]:
print(s)

In [None]:
s.loc[4]

In [None]:
s.iloc[4]

In [None]:
s.iloc[0]

In [None]:
s.loc[0]  # Raises KeyError: 0 is not a label in the index 1..7

In [None]:
mercury = pd.Series(d, index=['mass', 'diameter', 'dayLength'])

In [None]:
mercury.loc['mass']

In [None]:
mercury.iloc[0]

In [None]:
mercury.iloc[-1]

In [None]:
mercury.iloc[0:1]

In [None]:
mercury.loc[:'dayLength']
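
The two slices above behave differently at the end point: `.iloc` excludes the stop position, as Python slicing does, while `.loc` includes the stop label. A short sketch, reusing the `mercury` values from earlier:

```python
import pandas as pd

mercury = pd.Series([0.33, 57.9, 4222.6], index=['mass', 'diameter', 'dayLength'])

by_position = mercury.iloc[0:2]             # positions 0 and 1; stop is excluded
by_label = mercury.loc['mass':'diameter']   # both end labels are included

print(by_position)
print(by_position.equals(by_label))
```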

### Simple operations

In [None]:
mass = pd.Series([0.33, 4.87, 5.97, 0.642, 1898, 568, 86.8, 102, 0.0146],
                 index=['Mercury', 'Venus', 'Earth', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto'])

In [None]:
print(mass)

In [None]:
mass[1]  # Positional access on a labeled index; deprecated in recent pandas, prefer mass.iloc[1]

In [None]:
mass.iloc[1]

In [None]:
mass.loc['Earth']

In [None]:
mass['Earth']

In [None]:
mass['Earth': 'Jupiter']

In [None]:
mass[2:5]

In [None]:
mass.iloc[2:5]

In [None]:
mass > 100

In [None]:
mass[mass > 100]

In [None]:
mass[(mass > 100) & (mass < 600)]

In [None]:
mass

In [None]:
mass * 2

In [None]:
mass / 10

In [None]:
np.mean(mass)

In [None]:
np.amin(mass)

In [None]:
np.amax(mass)

In [None]:
np.median(mass)

In [None]:
mass + mass

In [None]:
mass - mass

In [None]:
big_mass = mass[mass > 100]

In [None]:
big_mass

In [None]:
mass

In [None]:
new_mass = mass + big_mass

In [None]:
print(new_mass)

In [None]:
pd.isnull(new_mass)

In [None]:
new_mass[~pd.isnull(new_mass)]
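
Arithmetic between two Series aligns on index labels, which is why labels missing from `big_mass` come out as NaN. The sketch below uses a three-planet subset of the data to show two ways of handling that: `dropna()`, which matches the boolean mask above, and `Series.add` with `fill_value`, which treats a missing label as 0.

```python
import pandas as pd

mass = pd.Series([0.33, 5.97, 1898.0], index=['Mercury', 'Earth', 'Jupiter'])
big_mass = mass[mass > 100]

aligned = mass + big_mass                   # NaN where a label is missing on one side
cleaned = aligned.dropna()                  # same result as aligned[~pd.isnull(aligned)]
filled = mass.add(big_mass, fill_value=0)   # treat missing labels as 0 instead

print(aligned)
print(cleaned)
print(filled)
```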

In [None]:
mass

In [None]:
mass['Moon'] = 0.7346

In [None]:
mass

In [None]:
mass.drop(['Pluto'])

**Task 1**

Collect the diameters of these planets (heavenly bodies) and store them as a Series object. Then, given the two Series objects mass and diameter, compute the density of each planet.

In [None]:
diameter = pd.Series([4879, 12104, 12756, 3475, 6792, 142984, 120536, 51118, 49528, 2370],
                     index=['Mercury', 'Venus', 'Earth', 'Moon', 'Mars', 'Jupiter', 'Saturn', 'Uranus', 'Neptune', 'Pluto'])

In [None]:
density = pd.Series(dtype=float)  # Empty Series; an explicit dtype avoids an object-dtype default

In [None]:
print(density)

In [None]:
mass

In [None]:
diameter

In [None]:
for planet in mass.index:
    density[planet] = mass[planet] / (np.pi * diameter[planet] * diameter[planet] * diameter[planet] / 6)

In [None]:
print(density)

In [None]:
density = mass / (np.pi * np.power(diameter, 3) / 6)

In [None]:
density
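
The expression above divides mass by the volume of a sphere written in terms of its diameter, $V = \pi d^3 / 6$. The notebook does not state units; the sketch below assumes masses in units of $10^{24}$ kg and diameters in km, and converts to kg/m³ under that assumption so the result can be sanity-checked against familiar values.

```python
import numpy as np
import pandas as pd

# Assumed units (not stated in the notebook): mass in 10^24 kg, diameter in km
mass = pd.Series([5.97, 0.642], index=['Earth', 'Mars'])
diameter = pd.Series([12756, 6792], index=['Earth', 'Mars'])

mass_kg = mass * 1e24
diameter_m = diameter * 1e3
volume_m3 = np.pi * diameter_m ** 3 / 6   # sphere volume from diameter
density_kg_m3 = mass_kg / volume_m3

print(density_kg_m3.round(0))
```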

In [None]:
mass['PlanetX'] = 6

In [None]:
density = mass / (np.pi * np.power(diameter, 3) / 6)

In [None]:
density

**Task 2**

Given this density Series, replace all NaN values with the mean density of all planets.

In [None]:
density_mean = np.mean(density)

for key in density.index:
    if pd.isnull(density[key]):
        density[key] = density_mean

In [None]:
print(density)

In [None]:
density[pd.isnull(density)] = np.mean(density)

In [None]:
print(density)
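
Pandas also has a built-in for this pattern. A minimal sketch with toy values (not the planet data): `Series.mean` skips NaN by default, and `fillna` returns a new Series with the gaps replaced.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only
density = pd.Series([5.5, np.nan, 1.3, np.nan], index=['A', 'B', 'C', 'D'])

filled = density.fillna(density.mean())   # mean() ignores NaN by default

print(filled)
```

Unlike the in-place assignments above, this leaves the original `density` unchanged.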

**Task 3**

Compare Dictionary with Series:
- checking if some key is present
- summing values
- computing std

In [None]:
my_dict = {}
N = 1000000
for i in range(N):
    my_dict[i] = i%10

In [None]:
my_series = pd.Series(my_dict)

In [None]:
M = 10000

In [None]:
arr = np.random.randint(0, N, M)

In [None]:
%%timeit
for i in arr:
    i in my_dict

In [None]:
%%timeit
for i in arr:
    i in my_series

In [None]:
%%timeit
sum(my_dict.values())

In [None]:
import numpy as np

In [None]:
%%timeit
np.sum(my_series)

In [None]:
%%timeit
mean = sum(my_dict.values()) / N
variance = sum((x - mean)**2 for x in my_dict.values()) / N  # Divide by N for the population variance
std = variance ** 0.5

In [None]:
%%timeit
np.std(my_series)
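
Two details are easy to miss in this comparison. The `in` operator on a Series tests index labels (like dictionary keys), not values. And `Series.std` defaults to the sample standard deviation (`ddof=1`), while NumPy defaults to the population version (`ddof=0`). A small sketch on a toy Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# Membership: `in` tests index labels, like the keys of a dict
print('a' in s)          # 'a' is an index label
print(10 in s)           # 10 is a value, not a label
print(10 in s.values)    # search the values explicitly

# Built-in reductions
total = s.sum()
pop_std = s.std(ddof=0)      # population std, matching NumPy's default
sample_std = s.std()         # sample std (ddof=1), the pandas default
print(total, pop_std, sample_std)
```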

### Stock Market case-study

In [None]:
import pandas as pd
from io import StringIO

# Your data as a string
data = """Date,Close
01-Jan-2019,10910.10
02-Jan-2019,10792.50
03-Jan-2019,10672.25
04-Jan-2019,10727.35
07-Jan-2019,10771.80
08-Jan-2019,10802.15
09-Jan-2019,10855.15
10-Jan-2019,10821.60
11-Jan-2019,10794.95
14-Jan-2019,10737.60
15-Jan-2019,10886.80
16-Jan-2019,10890.30
17-Jan-2019,10905.20
18-Jan-2019,10906.95
21-Jan-2019,10961.85
22-Jan-2019,10922.75
23-Jan-2019,10831.50
24-Jan-2019,10849.80
25-Jan-2019,10780.55
28-Jan-2019,10661.55
29-Jan-2019,10652.20
30-Jan-2019,10651.80
31-Jan-2019,10830.95
01-Feb-2019,10893.65
04-Feb-2019,10912.25
05-Feb-2019,10934.35
06-Feb-2019,11062.45
07-Feb-2019,11069.40
08-Feb-2019,10943.60
11-Feb-2019,10888.80
12-Feb-2019,10831.40
13-Feb-2019,10793.65
14-Feb-2019,10746.05
15-Feb-2019,10724.40
18-Feb-2019,10640.95
19-Feb-2019,10604.35
20-Feb-2019,10735.45
21-Feb-2019,10789.85
22-Feb-2019,10791.65
25-Feb-2019,10880.10
26-Feb-2019,10835.30
27-Feb-2019,10806.65
28-Feb-2019,10792.50
01-Mar-2019,10863.50
05-Mar-2019,10987.45
06-Mar-2019,11053.00
07-Mar-2019,11058.20
08-Mar-2019,11035.40
11-Mar-2019,11168.05
12-Mar-2019,11301.20
13-Mar-2019,11341.70
14-Mar-2019,11343.25
15-Mar-2019,11426.85
18-Mar-2019,11462.20
19-Mar-2019,11532.40
20-Mar-2019,11521.05
22-Mar-2019,11456.90
25-Mar-2019,11354.25
26-Mar-2019,11483.25
27-Mar-2019,11445.05
28-Mar-2019,11570.00
29-Mar-2019,11623.90
01-Apr-2019,11669.15
02-Apr-2019,11713.20
03-Apr-2019,11643.95
04-Apr-2019,11598.00
05-Apr-2019,11665.95
08-Apr-2019,11604.50
09-Apr-2019,11671.95
10-Apr-2019,11584.30
11-Apr-2019,11596.70
12-Apr-2019,11643.45
15-Apr-2019,11690.35
16-Apr-2019,11787.15
18-Apr-2019,11752.80
22-Apr-2019,11594.45
23-Apr-2019,11575.95
24-Apr-2019,11726.15
25-Apr-2019,11641.80
26-Apr-2019,11754.65
30-Apr-2019,11748.15
02-May-2019,11724.75
03-May-2019,11712.25
06-May-2019,11598.25
07-May-2019,11497.90
08-May-2019,11359.45
09-May-2019,11301.80
10-May-2019,11278.90
13-May-2019,11148.20
14-May-2019,11222.05
15-May-2019,11157.00
16-May-2019,11257.10
17-May-2019,11407.15
20-May-2019,11828.25
21-May-2019,11709.10
22-May-2019,11737.90
23-May-2019,11657.05
24-May-2019,11844.10
27-May-2019,11924.75
28-May-2019,11928.75
29-May-2019,11861.10
30-May-2019,11945.90
31-May-2019,11922.80
03-Jun-2019,12088.55
04-Jun-2019,12021.65
06-Jun-2019,11843.75
07-Jun-2019,11870.65
10-Jun-2019,11922.70
11-Jun-2019,11965.60
12-Jun-2019,11906.20
13-Jun-2019,11914.05
14-Jun-2019,11823.30
17-Jun-2019,11672.15
18-Jun-2019,11691.50
19-Jun-2019,11691.45
20-Jun-2019,11831.75
21-Jun-2019,11724.10
24-Jun-2019,11699.65
25-Jun-2019,11796.45
26-Jun-2019,11847.55
27-Jun-2019,11841.55
28-Jun-2019,11788.85
01-Jul-2019,11865.60
02-Jul-2019,11910.30
03-Jul-2019,11916.75
04-Jul-2019,11946.75
05-Jul-2019,11811.15
08-Jul-2019,11558.60
09-Jul-2019,11555.90
10-Jul-2019,11498.90
11-Jul-2019,11582.90
12-Jul-2019,11552.50
15-Jul-2019,11588.35
16-Jul-2019,11662.60
17-Jul-2019,11687.50
18-Jul-2019,11596.90
19-Jul-2019,11419.25
22-Jul-2019,11346.20
23-Jul-2019,11331.05
24-Jul-2019,11271.30
25-Jul-2019,11252.15
26-Jul-2019,11284.30
29-Jul-2019,11189.20
30-Jul-2019,11085.40
31-Jul-2019,11118.00
01-Aug-2019,10980.00
02-Aug-2019,10997.35
05-Aug-2019,10862.60
06-Aug-2019,10948.25
07-Aug-2019,10855.50
08-Aug-2019,11032.45
09-Aug-2019,11109.65
13-Aug-2019,10925.85
14-Aug-2019,11029.40
16-Aug-2019,11047.80
19-Aug-2019,11053.90
20-Aug-2019,11017.00
21-Aug-2019,10918.70
22-Aug-2019,10741.35
23-Aug-2019,10829.35
26-Aug-2019,11057.85
27-Aug-2019,11105.35
28-Aug-2019,11046.10
29-Aug-2019,10948.30
30-Aug-2019,11023.25
03-Sep-2019,10797.90
04-Sep-2019,10844.65
05-Sep-2019,10847.90
06-Sep-2019,10946.20
09-Sep-2019,11003.05
11-Sep-2019,11035.70
12-Sep-2019,10982.80
13-Sep-2019,11075.90
16-Sep-2019,11003.50
17-Sep-2019,10817.60
18-Sep-2019,10840.65
19-Sep-2019,10704.80
20-Sep-2019,11274.20
23-Sep-2019,11600.20
24-Sep-2019,11588.20
25-Sep-2019,11440.20
26-Sep-2019,11571.20
27-Sep-2019,11512.40
30-Sep-2019,11474.45
01-Oct-2019,11359.90
03-Oct-2019,11314.00
04-Oct-2019,11174.75
07-Oct-2019,11126.40
09-Oct-2019,11313.30
10-Oct-2019,11234.55
11-Oct-2019,11305.05
14-Oct-2019,11341.15
15-Oct-2019,11428.30
16-Oct-2019,11464.00
17-Oct-2019,11586.35
18-Oct-2019,11661.85
22-Oct-2019,11588.35
23-Oct-2019,11604.10
24-Oct-2019,11582.60
25-Oct-2019,11583.90
27-Oct-2019,11627.15
29-Oct-2019,11786.85
30-Oct-2019,11844.10
31-Oct-2019,11877.45
01-Nov-2019,11890.60
04-Nov-2019,11941.30
05-Nov-2019,11917.20
06-Nov-2019,11966.05
07-Nov-2019,12012.05
08-Nov-2019,11908.15
11-Nov-2019,11913.45
13-Nov-2019,11840.45
14-Nov-2019,11872.10
15-Nov-2019,11895.45
18-Nov-2019,11884.50
19-Nov-2019,11940.10
20-Nov-2019,11999.10
21-Nov-2019,11968.40
22-Nov-2019,11914.40
25-Nov-2019,12073.75
26-Nov-2019,12037.70
27-Nov-2019,12100.70
28-Nov-2019,12151.15
29-Nov-2019,12056.05
02-Dec-2019,12048.20
03-Dec-2019,11994.20
04-Dec-2019,12043.20
05-Dec-2019,12018.40
06-Dec-2019,11921.50
09-Dec-2019,11937.50
10-Dec-2019,11856.80
11-Dec-2019,11910.15
12-Dec-2019,11971.80
13-Dec-2019,12086.70
16-Dec-2019,12053.95
17-Dec-2019,12165.00
18-Dec-2019,12221.65
19-Dec-2019,12259.70
20-Dec-2019,12271.80
23-Dec-2019,12262.75
24-Dec-2019,12214.55
26-Dec-2019,12126.55
27-Dec-2019,12245.80
30-Dec-2019,12255.85
31-Dec-2019,12168.45
"""

# Read data into pandas DataFrame
df = pd.read_csv(StringIO(data))

# Save to CSV file
df.to_csv("nifty.csv", index=False)

In [None]:
nifty = pd.read_csv('nifty.csv', index_col=0).iloc[:, 0]

In [None]:
nifty

In [None]:
nifty.head(25)

In [None]:
nifty.tail(25)

In [None]:
np.mean(nifty)

In [None]:
np.median(nifty)

In [None]:
np.std(nifty)

What fraction of days did the market close higher than the previous day's close?

In [None]:
nifty.iloc[0]  # First close, selected by position

In [None]:
nifty.iloc[1]

In [None]:
nifty.iloc[1] - nifty.iloc[0]

In [None]:
nifty[1:]

In [None]:
nifty[:-1]

In [None]:
nifty[1:] - nifty[:-1]  # Aligns on dates, so each value is subtracted from itself; use .values for positional differences

In [None]:
np.sum((nifty.values[1:] - nifty.values[:-1]) > 0) / (len(nifty) - 1)  # There are len(nifty) - 1 day-to-day changes
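
The same day-over-day comparison can be sketched with `Series.diff`, which subtracts the previous row by position and so avoids the index-alignment issue. The prices below are toy values, not the Nifty data, and `fraction_up` is an illustrative name.

```python
import pandas as pd

# Toy closing prices for illustration only
close = pd.Series([100.0, 101.5, 101.0, 102.0, 103.0],
                  index=['d1', 'd2', 'd3', 'd4', 'd5'])

change = close.diff()            # first entry is NaN: there is no previous day
up_days = (change > 0).sum()     # NaN > 0 is False, so it is not counted
fraction_up = up_days / (len(close) - 1)

print(fraction_up)
```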

**Tasks**

1. Compute the moving average of the last 5 days

2. Subset the data to include only data for Fridays

In [None]:
nifty

In [None]:
nifty.index[0]

In [None]:
d = pd.Timestamp(nifty.index[0])

In [None]:
d.dayofweek

In [None]:
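new_index = pd.to_datetime(nifty.index, format='%d-%b-%Y')  # Parse the date strings into timestamps

In [None]:
new_nifty = pd.Series(nifty.values, index=new_index)  # Pass .values so the data is not reindexed to NaN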

In [None]:
new_nifty

In [None]:
new_nifty.index[0]

In [None]:
new_nifty.rolling('5D').mean()  # Mean over the trailing 5 calendar days
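
A time-based window and a row-based window can give different answers, because a window of 5 calendar days that spans a weekend holds fewer than 5 trading days. The sketch below uses toy prices on consecutive business days to compare the two; which one matches "the last 5 days" depends on how the task is read.

```python
import pandas as pd

# Toy prices on consecutive business days (weekends are skipped)
idx = pd.bdate_range('2019-01-01', periods=8)
prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], index=idx)

by_rows = prices.rolling(5).mean()      # last 5 observations; NaN until 5 exist
by_time = prices.rolling('5D').mean()   # observations within the trailing 5 calendar days

print(pd.DataFrame({'rows': by_rows, 'time': by_time}))
```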

In [None]:
dow = new_nifty.copy()
for i in dow.index:
    dow[i] = i.dayofweek

In [None]:
dow

In [None]:
new_nifty[dow == 4]
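
The loop that builds `dow` can also be written without iteration, since a `DatetimeIndex` exposes `dayofweek` directly (Monday is 0, so Friday is 4). A minimal sketch on four dates taken from the data above:

```python
import pandas as pd

idx = pd.to_datetime(['01-Jan-2019', '04-Jan-2019', '07-Jan-2019', '11-Jan-2019'],
                     format='%d-%b-%Y')
prices = pd.Series([10910.10, 10727.35, 10771.80, 10794.95], index=idx)

fridays = prices[prices.index.dayofweek == 4]   # keep rows whose date is a Friday

print(fridays)
```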