# Array Manipulation and File Handling

### Case Study: Movie Rating System

**Problem Statement:** 

A movie review website needs to create a database to store their data and perform operations on it. Create a dummy model with 100 users and 1000 movies to explain how it will work.

**Topics Covered:**
- NumPy
    - numpy.ndarray
    - Statistical operations using NumPy
    - Mathematical operations using NumPy
    
**Tasks to be performed:**
1. Generate 1000 movie IDs starting from 1301
2. Create a matrix movie_matrix to store user ratings such that
    - There are 100 users
    - Each user can review as many movies as they want
    - Each rating is between 0 and 10 (both inclusive)
3. Ten movie experts also contribute reviews, and 50 new movies and their reviews have to be added
4. Create a final_movie_rating matrix with four columns: 'Movie ID', 'Average rating', 'Number of ratings', and 'Standard deviation of ratings'
5. Rescale the average movie ratings to the range 0 to 10, so that the minimum rating maps to 0, the maximum to 10, and the other values fall proportionally in between
6. Display the movies rating-wise, highest to lowest

In [None]:
import numpy as np

In [None]:
np.set_printoptions(formatter={'float_kind':'{:.3f}'.format})  #Display all float values with 3 digits after the decimal point

#### Task 1: Create a NumPy array of 1000 movie IDs starting from 1301

In [None]:
movie_id = np.arange(1301,2301)
print(movie_id[:10])
movie_id.size

[1301 1302 1303 1304 1305 1306 1307 1308 1309 1310]


1000

In [None]:
user_id = []
for i in range(20201,20301):
  user_id.append(i) 
user_id = np.array(user_id)

In [None]:
user_id.size #checking the number of elements in numpy array

100
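The loop above can also be written with `np.arange`, which builds the range directly as an array. The following is a minimal sketch; the names `user_id_loop` and `user_id_arange` are introduced here only for the comparison.

```python
import numpy as np

# Loop-style construction, equivalent to the cell above
user_id_loop = np.array([i for i in range(20201, 20301)])

# Same IDs without a Python loop (the stop value is exclusive)
user_id_arange = np.arange(20201, 20301)

print(user_id_arange.size)                           # 100
print(np.array_equal(user_id_loop, user_id_arange))  # True
```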

#### Other ways to create a numpy array

In [None]:
#Creating a Single-Dimensional Array

arr = np.array([2,3,7]) #Calling the Array Function
print(arr)

[2 3 7]


In [None]:
#Creating a Multi-Dimensional Array

arr = np.array([[10,20,30],[40,50,60]])
print(arr)
print(type(arr))

[[10 20 30]
 [40 50 60]]
<class 'numpy.ndarray'>


In [None]:
#Using np.empty()

arr = np.empty([2,2], dtype=complex) #Returns a new array with specified shape & type without initializing its entries (the contents are arbitrary, not random numbers)
print(arr)

[[0.00000000e+000+0.00000000e+000j 0.00000000e+000+0.00000000e+000j]
 [0.00000000e+000+5.59282311e-321j 8.42714149e-300+7.96686663e-300j]]


In [None]:
#Using np.full()

X = np.full((2,3), 5) #Returns an array with specified shape & filled with specified value
print(X)

[[5 5 5]
 [5 5 5]]


In [None]:
#Using np.zeros()

arr = np.zeros([2,3]) #Returns a new array with specified shape filled with zeroes
print(arr)

[[0.000 0.000 0.000]
 [0.000 0.000 0.000]]


In [None]:
#Using np.ones()

arr = np.ones([3,5]) #Returns a new array with specified shape filled with ones
print(arr)

[[1.000 1.000 1.000 1.000 1.000]
 [1.000 1.000 1.000 1.000 1.000]
 [1.000 1.000 1.000 1.000 1.000]]


In [None]:
#Using np.eye()

X = np.eye(5) #Returns the 5x5 identity matrix (ones on the main diagonal, zeros elsewhere)
X

array([[1.000, 0.000, 0.000, 0.000, 0.000],
       [0.000, 1.000, 0.000, 0.000, 0.000],
       [0.000, 0.000, 1.000, 0.000, 0.000],
       [0.000, 0.000, 0.000, 1.000, 0.000],
       [0.000, 0.000, 0.000, 0.000, 1.000]])

In [None]:
#Using np.linspace()

arr = np.linspace(0, 20, 5) #Returns 5 evenly spaced values from 0 to 20 (both endpoints included)
print(arr)

[0.000 5.000 10.000 15.000 20.000]
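`np.linspace` and `np.arange` are easy to confuse. The sketch below contrasts them on the same interval; the variable names are chosen here for illustration only.

```python
import numpy as np

# linspace: you choose the number of points; the stop value is included by default
points = np.linspace(0, 20, 5)

# arange: you choose the step; the stop value is excluded
steps = np.arange(0, 20, 5)

# retstep=True also returns the spacing that linspace computed
_, spacing = np.linspace(0, 20, 5, retstep=True)

print(points.tolist())  # [0.0, 5.0, 10.0, 15.0, 20.0]
print(steps.tolist())   # [0, 5, 10, 15]
print(spacing)          # 5.0
```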


#### Task 2: Create a matrix movie_matrix to store user ratings such that
- There are 100 users
- Each user can review as many movies as they want
- Each rating is between 0 and 10 (both inclusive)
- Movies not reviewed by a user have the value -1

In [None]:
import random
movie_matrix = []
for user in range(100):
  movies_rated_by_me = np.full(1000,-1)
  num_movies_rated = random.randint(0,999)
  #random.seed(num_movies_rated)
  movies_that_i_will_rate = random.sample(range(0,1000),num_movies_rated)
  #user ratings for the movies they chose to rate
  for index in movies_that_i_will_rate:
    movies_rated_by_me[index] = random.randint(0,10)
  movie_matrix.append(movies_rated_by_me)



In [None]:
movie_matrix = np.array(movie_matrix)
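The same kind of matrix can be generated without Python loops. The sketch below is an alternative, not the method used above: it assumes NumPy's `default_rng` generator, an arbitrary seed, and a per-user random rating fraction in place of `random.sample`, so the number of rated movies per user is distributed differently. The name `movie_matrix_vec` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily so reruns match

n_users, n_movies = 100, 1000

# Draw a rating from 0 to 10 (inclusive) for every user/movie pair
ratings = rng.integers(0, 11, size=(n_users, n_movies))

# Each user rates a random fraction of the movies; everything else becomes -1
rate_fraction = rng.random((n_users, 1))             # one fraction per user
is_rated = rng.random((n_users, n_movies)) < rate_fraction
movie_matrix_vec = np.where(is_rated, ratings, -1)

print(movie_matrix_vec.shape)  # (100, 1000)
```

Fixing the seed also makes the outputs reproducible from run to run, which the `random`-based cell above does not guarantee.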

Attributes of the numpy ndarray class

In [None]:
print(movie_matrix)
print('Shape of the array',movie_matrix.shape) #100 users(rows) and 1000 movies(columns)
print('Number of elements in the array',movie_matrix.size)
print('Number of dimensions in the array',movie_matrix.ndim)

[[-1 -1 -1 ... -1 -1  8]
 [ 6  7 -1 ... -1  2  8]
 [ 0  0  5 ...  4  0  6]
 ...
 [-1 -1  2 ... -1 -1 -1]
 [ 1  7 -1 ... -1  2 -1]
 [-1 -1 10 ... -1 -1  6]]
Shape of the array (100, 1000)
Number of elements in the array 100000
Number of dimensions in the array 2
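Besides `shape`, `size`, and `ndim`, an ndarray also exposes its element type and memory footprint. A small sketch, assuming an `int8` array (an illustrative choice that is wide enough for values from -1 to 10):

```python
import numpy as np

ratings = np.full((100, 1000), -1, dtype=np.int8)

print(ratings.dtype)      # int8
print(ratings.itemsize)   # 1 (bytes per element)
print(ratings.nbytes)     # 100000 (size * itemsize)
```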


#### Task 3: Add the reviews of 10 movie experts and 50 more movies

In [None]:
#Expert movie reviews

expert_matrix = []
for user in range(10):
  movies_rated_by_me = np.full(1000,-1)
  num_movies_rated = random.randint(0,999)
  #random.seed(num_movies_rated)
  movies_that_i_will_rate = random.sample(range(0,1000),num_movies_rated)
  #expert ratings for the movies they chose to rate
  for index in movies_that_i_will_rate:
    movies_rated_by_me[index] = random.randint(0,10)
  expert_matrix.append(movies_rated_by_me)
expert_matrix = np.array(expert_matrix)
print(expert_matrix.shape)
expert_matrix

(10, 1000)


array([[-1, -1, -1, ..., -1, -1,  3],
       [-1, -1,  2, ..., -1, -1,  4],
       [ 0, -1,  4, ...,  3, -1, -1],
       ...,
       [-1,  2, -1, ..., 10, -1, -1],
       [ 3,  0,  1, ..., -1, -1,  9],
       [-1, -1, -1, ..., -1, -1, -1]])

In [None]:
#adding these expert reviews with original reviews
movie_matrix = np.vstack([movie_matrix,expert_matrix]) # vstack is used to stack arrays vertically
movie_matrix.shape

(110, 1000)

In [None]:
new_movies_matrix = []
for user in range(110):
  movies_rated_by_me = np.full(50,-1)
  num_movies_rated = random.randint(0,49)
  #random.seed(num_movies_rated)
  movies_that_i_will_rate = random.sample(range(0,50),num_movies_rated)
  #user ratings for the new movies they chose to rate
  for index in movies_that_i_will_rate:
    movies_rated_by_me[index] = random.randint(0,10)
  new_movies_matrix.append(movies_rated_by_me)
new_movies_matrix = np.array(new_movies_matrix)
print(new_movies_matrix.shape)
new_movies_matrix

(110, 50)


array([[ 3,  6, 10, ..., -1, -1, -1],
       [-1,  4, -1, ..., -1, -1, -1],
       [ 9,  9,  3, ...,  0, 10,  9],
       ...,
       [-1,  9,  0, ...,  6,  7,  2],
       [-1, -1,  4, ..., -1,  4, -1],
       [-1, -1,  5, ..., -1, -1, -1]])

In [None]:
new_movies_id = np.append(movie_id,np.arange(2301,2351))
new_movies_id.size

1050

In [None]:
#adding the reviews of new movies with original reviews
movie_matrix = np.hstack([movie_matrix,new_movies_matrix]) # hstack is used to stack arrays horizontally
print(movie_matrix.shape) #100 users + 10 experts(rows) and 1000 + 50 movies(columns)

(110, 1050)


We can see now that we have 110 users and 1050 movies.
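Stacking only works when the shared dimension matches: `vstack` needs equal column counts and `hstack` needs equal row counts. The toy arrays below are made up for illustration and show the same two steps at a smaller scale, along with the equivalent `np.concatenate` calls.

```python
import numpy as np

users = np.zeros((3, 4), dtype=int)      # 3 users, 4 movies
experts = np.ones((2, 4), dtype=int)     # 2 experts, same 4 movies
new_movies = np.full((5, 2), 7)          # 5 raters, 2 new movies

# vstack adds rows: column counts must match
all_raters = np.vstack([users, experts])
# hstack adds columns: row counts must match
all_movies = np.hstack([all_raters, new_movies])

# np.concatenate with an explicit axis gives the same results
same_rows = np.concatenate([users, experts], axis=0)
same_cols = np.concatenate([all_raters, new_movies], axis=1)

print(all_raters.shape, all_movies.shape)  # (5, 4) (5, 6)
```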

#### Indexing and Slicing in NumPy

In [None]:
#Indexing 
a = np.arange(1,10)
print("Accessed Element:",a[5]) #Accessing a Single Element

Accessed Element: 6


In [None]:
print('Given Array:',a) 
print('Retrieved Elements:',a[3:])

Given Array: [1 2 3 4 5 6 7 8 9]
Retrieved Elements: [4 5 6 7 8 9]


In [None]:
print('Given Array:',a) 
print('Retrieved Elements:',a[1:7:2])

Given Array: [1 2 3 4 5 6 7 8 9]
Retrieved Elements: [2 4 6]


In [None]:
#Slicing
print('Given Array:',a) 
slice1 = slice(2,7,2) #slice() function to access multiple elements
print(a[slice1])
print(a[2:7:2]) #without using slice function

Given Array: [1 2 3 4 5 6 7 8 9]
[3 5 7]
[3 5 7]


In [None]:
#Indexing
arr = np.array([[2,4,5],[6,8,1],[4,1,6],[9,1,4]]) #Creating a Multi-Dimensional Array
print("Original Array:\n", arr)
print('Retrieved Element:', arr[0,1]) #In a 2-dimensional array we can access elements using [rows,column] indexes 

Original Array:
 [[2 4 5]
 [6 8 1]
 [4 1 6]
 [9 1 4]]
Retrieved Element: 4


In [None]:
#Slicing
print("Original Array:\n", arr)
print("Sliced Array1:\n",arr[0:3,0:2]) #Slicing the first 3 rows & 2 columns from arr
print("Sliced Array2:\n",arr[:3,:]) #Slicing the first 3 rows 
print("Sliced Array3:\n",arr[:,:2]) #Slicing the first 2 columns 

Original Array:
 [[2 4 5]
 [6 8 1]
 [4 1 6]
 [9 1 4]]
Sliced Array1:
 [[2 4]
 [6 8]
 [4 1]]
Sliced Array2:
 [[2 4 5]
 [6 8 1]
 [4 1 6]]
Sliced Array3:
 [[2 4]
 [6 8]
 [4 1]
 [9 1]]
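Arrays can also be indexed with a boolean mask, which is how unrated entries (-1) are filtered out in Task 4. The sketch below uses made-up values and also shows one difference worth knowing: a basic slice is a view of the original data, while boolean indexing returns a copy.

```python
import numpy as np

ratings = np.array([7, -1, 3, -1, 10])   # -1 marks "not rated"

mask = ratings >= 0          # boolean array: True where a rating exists
rated = ratings[mask]        # keeps only the entries where mask is True
print(rated.tolist())        # [7, 3, 10]

# A basic slice is a view: writing through it changes the original array
grid = np.arange(6).reshape(2, 3)
first_row = grid[0, :]
first_row[0] = 99
print(grid[0, 0])            # 99

# Boolean indexing returns a copy, so the original stays untouched
rated[0] = 0
print(ratings[0])            # 7
```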


#### Statistical Functions in NumPy

In [None]:
#amin() & amax() - Return the minimum & the maximum of the elements in the given array, optionally along a given axis


arr = np.array([[1,2,3],[4,5,6],[7,8,9]])

print(np.amin(arr)) #Minimum of all elements (no axis given)

print(np.amin(arr,1)) #Minimum along axis 1 (one value per row)

print(np.amax(arr)) #Maximum of all elements (no axis given)

print(np.amax(arr,1)) #Maximum along axis 1 (one value per row)

1
[1 4 7]
9
[3 6 9]
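The `axis` argument decides which dimension is collapsed: with no axis the whole array is reduced to one value, `axis=0` reduces down the rows (one result per column), and `axis=1` reduces across the columns (one result per row). A short sketch on the same array, with result names chosen here for illustration:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

overall_min = np.amin(arr)          # no axis: minimum of all elements
col_min = np.amin(arr, axis=0)      # minimum of each column
row_min = np.amin(arr, axis=1)      # minimum of each row
col_max = np.amax(arr, axis=0)      # maximum of each column

print(overall_min)        # 1
print(col_min.tolist())   # [1, 2, 3]
print(row_min.tolist())   # [1, 4, 7]
print(col_max.tolist())   # [7, 8, 9]
```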


In [None]:
arr = np.array([[4,9,2],
              [3,1,2]])
print('mean of the array is',arr.mean())
print('Standard deviation of the array is',arr.std()) #std() - Returns the Standard Deviation of the given data. It is the square-root of variance
print('median of the array is',np.median(arr))
print('minimum value in the array is',arr.min())

mean of the array is 3.5
Standard deviation of the array is 2.6299556396765835
median of the array is 2.5
minimum value in the array is 1


In [None]:
#mean() - Returns the Mean of the given data

arr = np.array([[0,5,7],
              [4,9,1],
              [3,1,7]])
print(np.mean(arr))

print(np.mean(arr,0)) #Mean along axis - 0

print(np.mean(arr,1)) #Mean along axis - 1

4.111111111111111
[2.333 5.000 5.000]
[4.000 4.667 3.667]


#### Task 4: Create final_movie_rating matrix with four columns i.e., 
- 'Movie ID'
- 'Average rating'
- 'Number of ratings'
- 'Standard deviation of ratings'

In [None]:
final_movie_rating = []
for i in range(1050):
  #each movie's rating
  m = movie_matrix[:,i] # taking all rows and the ith column
  m = m[m>=0]
  #total_rate = m.sum()
  total_num_rating = m.size
  rating = m.mean()
  standard_deviation = m.std()
  final_movie_rating.append([new_movies_id[i],rating,total_num_rating,standard_deviation])

In [None]:
final_movie_rating = np.array(final_movie_rating)
final_movie_rating

array([[1301.000, 4.651, 63.000, 3.381],
       [1302.000, 4.830, 53.000, 3.149],
       [1303.000, 4.548, 62.000, 2.911],
       ...,
       [2348.000, 4.667, 63.000, 2.917],
       [2349.000, 5.576, 66.000, 3.234],
       [2350.000, 4.210, 62.000, 3.054]])
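The per-movie loop can also be expressed with whole-array operations. The sketch below is one possible vectorized version on a tiny made-up matrix (`toy_matrix` and `toy_ids` are hypothetical stand-ins for `movie_matrix` and `new_movies_id`). It also handles a movie with no ratings explicitly, yielding NaN for its mean and standard deviation and 0 for its count.

```python
import numpy as np

# Tiny stand-in: 3 users (rows) x 2 movies (columns); -1 means "not rated"
toy_matrix = np.array([[ 4, -1],
                       [ 8, -1],
                       [-1, -1]])
toy_ids = np.array([1301, 1302])

rated = toy_matrix >= 0
counts = rated.sum(axis=0)                            # ratings per movie
totals = np.where(rated, toy_matrix, 0).sum(axis=0)   # sum of valid ratings

# Divide only where a movie has at least one rating; leave NaN otherwise
means = np.full(toy_ids.size, np.nan)
np.divide(totals, counts, out=means, where=counts > 0)

# Population standard deviation, matching ndarray.std() with default ddof=0
sq_dev = np.where(rated, (toy_matrix - means) ** 2, 0.0)
variances = np.full(toy_ids.size, np.nan)
np.divide(sq_dev.sum(axis=0), counts, out=variances, where=counts > 0)
stds = np.sqrt(variances)

summary = np.column_stack([toy_ids, means, counts, stds])
print(summary[0].tolist())  # [1301.0, 6.0, 2.0, 2.0]
```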

#### Mathematical Operations using NumPy

In [None]:
#add() - Used to add arguments element-wise

a = np.array([[2,0],
              [4,9]])
b = np.array([[2,0],
              [4,9]])
print(np.add(a,b))
print("Another way to perform element-wise operation\n",a+b)
print("1 added to Matrix 'a' element-wise\n ",a+1)

[[ 4  0]
 [ 8 18]]
Another way to perform element-wise operation
 [[ 4  0]
 [ 8 18]]
1 added to Matrix 'a' element-wise
  [[ 3  1]
 [ 5 10]]


In [None]:
#subtract() - Used to subtract arguments element-wise

a = np.array([[10,20,30]])
b = np.array([[10,10,10]])
np.subtract(a,b)


array([[ 0, 10, 20]])

In [None]:
#multiply() - Used to multiply arguments element-wise

a = np.array([[0,5,7],
              [4,9,1],
              [3,1,7]])
b = np.array([[10,10,10]])
np.multiply(a,b)

array([[ 0, 50, 70],
       [40, 90, 10],
       [30, 10, 70]])
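The multiplication above works even though `a` is 3x3 and `b` is 1x3, because NumPy broadcasts the smaller array across the larger one. A brief sketch of the rule; the `row` and `col` arrays are illustrative.

```python
import numpy as np

a = np.array([[0, 5, 7],
              [4, 9, 1],
              [3, 1, 7]])        # shape (3, 3)
row = np.array([[10, 10, 10]])   # shape (1, 3): reused for every row of a
col = np.array([[1], [2], [3]])  # shape (3, 1): reused for every column of a

by_row = a * row
by_col = a * col

print(np.broadcast(a, row).shape)  # (3, 3)
print(by_col.tolist())             # [[0, 5, 7], [8, 18, 2], [9, 3, 21]]
```

Dimensions are compatible when they are equal or when one of them is 1; otherwise NumPy raises a `ValueError`.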

In [None]:
#divide() - Used to divide arguments element-wise

a = np.array([[20,24,32],
              [4,8,18],
              [2,54,78]])
b = np.array([[2]])
np.divide(a,b)

array([[10.000, 12.000, 16.000],
       [2.000, 4.000, 9.000],
       [1.000, 27.000, 39.000]])

In [None]:
#around() - Used to round off the array elements

arr = [7.52, 21.58, 42.81, 43.72, 14.23, 32.64, 51.91, 2.50, 1.99] 
print("Input array : \n",arr) 

print ("\nRounded values : \n",np.around(arr)) 
print ("\nRounded values after decimal 1 : \n",np.around(arr,1)) 
print ("\nRounded values before decimal 1: \n",np.around(arr,-1)) 

Input array : 
 [7.52, 21.58, 42.81, 43.72, 14.23, 32.64, 51.91, 2.5, 1.99]

Rounded values : 
 [8.000 22.000 43.000 44.000 14.000 33.000 52.000 2.000 2.000]

Rounded values after decimal 1 : 
 [7.500 21.600 42.800 43.700 14.200 32.600 51.900 2.500 2.000]

Rounded values before decimal 1: 
 [10.000 20.000 40.000 40.000 10.000 30.000 50.000 0.000 0.000]
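Note that 2.50 rounds to 2 in the output above, not 3: `np.around` rounds exact halves to the nearest even value. A negative second argument rounds to the left of the decimal point (tens, hundreds, and so on). A short sketch with illustrative inputs:

```python
import numpy as np

halves = np.array([0.5, 1.5, 2.5, 3.5])

# Exact halves go to the nearest even integer
rounded_halves = np.around(halves)
print(rounded_halves.tolist())   # [0.0, 2.0, 2.0, 4.0]

# decimals=-1 rounds to the nearest ten
rounded_tens = np.around([43.72, 51.91], -1)
print(rounded_tens.tolist())     # [40.0, 50.0]
```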


In [None]:
#floor() - Returns, for each element, the largest integer that is not greater than that element

arr = [7.5, 21.5, 42.8, 43.7, 14.2, 32.6, 51.9, 2.5, 1.9] 
print("Input array : \n",arr) 

print("Floored Values:\n",np.floor(arr)) 

Input array : 
 [7.5, 21.5, 42.8, 43.7, 14.2, 32.6, 51.9, 2.5, 1.9]
Floored Values:
 [7.000 21.000 42.000 43.000 14.000 32.000 51.000 2.000 1.000]


In [None]:
#ceil() - Returns, for each element, the smallest integer that is not less than that element

arr = [7.5, 21.5, 42.8, 43.7, 14.2, 32.6, 51.9, 2.5, 1.9] 
print("Input array : \n",arr) 

print("Ceiling Values:\n",np.ceil(arr)) 

Input array : 
 [7.5, 21.5, 42.8, 43.7, 14.2, 32.6, 51.9, 2.5, 1.9]
Ceiling Values:
 [8.000 22.000 43.000 44.000 15.000 33.000 52.000 3.000 2.000]


#### Task 5: Convert the final movie ratings to have range from 0 to 10, such that the minimum rating converts to 0 and maximum to 10, and the other values in between

In [None]:
#Converting the average of movie ratings to have the range 0 to 10
x = final_movie_rating[:,1]
old_range = x.max()-x.min()
new_range = (10 - 0)
final_movie_rating[:,1] = ((x - x.min())*(new_range/old_range) + 0)

In [None]:
final_movie_rating

array([[1301.000, 5.524, 50.000, 2.951],
       [1302.000, 1.706, 52.000, 2.869],
       [1303.000, 4.218, 57.000, 3.254],
       ...,
       [2348.000, 4.649, 49.000, 3.089],
       [2349.000, 6.364, 51.000, 2.774],
       [2350.000, 3.785, 50.000, 3.128]])

In [None]:
final_movie_rating.shape

(1050, 4)
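The rescaling in Task 5 is min-max scaling: each value x becomes (x - min) / (max - min) * 10. The sketch below wraps it in a reusable function; `min_max_scale` and its parameters are hypothetical names, and mapping a constant input to `new_min` is a choice made here to avoid division by zero.

```python
import numpy as np

def min_max_scale(values, new_min=0.0, new_max=10.0):
    """Linearly rescale `values` so min -> new_min and max -> new_max."""
    values = np.asarray(values, dtype=float)
    old_min, old_max = values.min(), values.max()
    if old_max == old_min:
        # All values identical: the scale is undefined, so map everything to new_min
        return np.full_like(values, new_min)
    return (values - old_min) * (new_max - new_min) / (old_max - old_min) + new_min

scaled = min_max_scale([2.0, 4.0, 5.0, 6.0])
print(scaled.tolist())  # [0.0, 5.0, 7.5, 10.0]
```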

#### Task 6: Display the movies rating-wise, highest to lowest

In [None]:
sorted_films = final_movie_rating[final_movie_rating[:,1].argsort()[::-1]]

In [None]:
print(sorted_films)

[[1588.000 10.000 54.000 2.974]
 [1731.000 9.816 41.000 3.077]
 [1717.000 9.575 46.000 3.179]
 ...
 [2018.000 0.589 44.000 2.599]
 [2140.000 0.374 50.000 3.005]
 [1971.000 0.000 53.000 2.485]]
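The one-line sort above combines three steps: `argsort` on the rating column gives row indices in ascending order, `[::-1]` reverses them, and indexing the matrix with those indices reorders whole rows. A small sketch with made-up values:

```python
import numpy as np

table = np.array([[1301, 4.5],
                  [1302, 9.0],
                  [1303, 0.5]])

order = table[:, 1].argsort()   # row indices from lowest to highest rating
descending = order[::-1]        # reverse to get highest first
ranked = table[descending]      # integer-array indexing reorders whole rows

print(order.tolist())           # [2, 0, 1]
print(ranked[:, 0].tolist())    # [1302.0, 1301.0, 1303.0]
```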


**savetxt()** writes a NumPy array to a text file.

In [None]:
np.savetxt('ranked_movie_data.csv',sorted_films, delimiter=',')

**genfromtxt()** reads a text file into a NumPy array.

In [None]:
data = np.genfromtxt('ranked_movie_data.csv',delimiter = ',')

In [None]:
data

array([[1588.000, 10.000, 54.000, 2.974],
       [1731.000, 9.816, 41.000, 3.077],
       [1717.000, 9.575, 46.000, 3.179],
       ...,
       [2018.000, 0.589, 44.000, 2.599],
       [2140.000, 0.374, 50.000, 3.005],
       [1971.000, 0.000, 53.000, 2.485]])
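By default `savetxt` writes every value in scientific notation and without column names. The sketch below shows one way to control both with the `fmt` and `header` parameters; the temporary file path and the column names are assumptions made for this example.

```python
import os
import tempfile
import numpy as np

# First two rows of the ranked output shown above
ranked = np.array([[1588, 10.0, 54, 2.974],
                   [1731, 9.816, 41, 3.077]])

path = os.path.join(tempfile.mkdtemp(), "ranked_movie_data_demo.csv")

# fmt controls how each value is written; header is written as a '#' comment line
np.savetxt(path, ranked, delimiter=",", fmt="%.3f",
           header="movie_id,avg_rating,num_ratings,std_rating")

# genfromtxt treats lines starting with '#' as comments, so the header is skipped
loaded = np.genfromtxt(path, delimiter=",")

print(loaded.shape)                 # (2, 4)
print(np.allclose(loaded, ranked))  # True
```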

**To learn more about NumPy, see the [NumPy quickstart guide](https://numpy.org/devdocs/user/quickstart.html).**