
In [None]:
from math import sin, cos, pi
def to_cartesian(polar_vector):
  '''
  input: (length, angle) -> polar coordinates, angle in radians
  output: (x,y) -> cartesian coordinates
  '''
  length, angle = polar_vector[0], polar_vector[1]
  # cosine gives the horizontal component and sine the vertical component,
  # i.e., the x and y coordinates
  return (length*cos(angle), length*sin(angle))

angle = 37*pi/180 # converting degrees to radians. More details below.
to_cartesian((5, angle))

(3.993177550236464, 3.0090751157602416)

### **Why is the angle converted to radians?**

Radians and degrees are both units used to measure angles, but they express the size of the angle in different ways:

* Degrees are based on dividing a circle into 360 equal parts. Therefore, a full circle corresponds to an angle of 360 degrees.

* Radians are based on the radius of the circle. In this system, a full circle corresponds to an angle of 2π radians (approximately 6.28318 radians).

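Python's `math.sin` and `math.cos` expect their argument in radians, so an angle given in degrees has to be converted first. To convert an angle from degrees to radians, you multiply by 2π/360, which simplifies to π/180. This is because there are π radians in 180 degrees (half a circle), so each degree is π/180 radians.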
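
As a small illustration of the conversion described above, the sketch below compares the manual π/180 factor with the standard library's `math.radians`. The helper name `deg_to_rad` is made up for this example and is not part of the notebook.

```python
from math import pi, radians, degrees

def deg_to_rad(deg):
    # Hypothetical helper: each degree is pi/180 radians.
    return deg * pi / 180

manual = deg_to_rad(37)   # same conversion the notebook does by hand
builtin = radians(37)     # standard-library equivalent

print(manual, builtin)

# Going the other way: half a circle (pi radians) back to degrees.
half_circle_in_degrees = degrees(pi)
print(half_circle_in_degrees)
```

Using `math.radians` avoids retyping the factor, but both approaches should agree up to floating-point rounding.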

In [None]:
from math import atan2, sqrt

def distance(vector):
  '''
  Length of the vector via the Pythagorean theorem: c^2 = a^2 + b^2
  '''
  return sqrt(vector[0]**2 + vector[1]**2)

def to_polar(vector):
  '''
  input: (x,y) -> cartesian coordinates
  output: (length, angle) -> polar coordinates, angle in radians
  '''
  x, y = vector[0], vector[1]
  angle = atan2(y,x)
  return (distance(vector), angle)

to_polar((-2,3))

(3.605551275463989, 2.158798930342464)
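
One way to sanity-check the two conversions is a round trip: converting a point to polar coordinates and back should return the original point, up to floating-point error. The sketch below restates both functions in compact form so that it runs on its own.

```python
from math import atan2, cos, sin, sqrt

def to_polar(vector):
    # (x, y) -> (length, angle in radians)
    x, y = vector
    return (sqrt(x**2 + y**2), atan2(y, x))

def to_cartesian(polar_vector):
    # (length, angle in radians) -> (x, y)
    length, angle = polar_vector
    return (length * cos(angle), length * sin(angle))

original = (-2, 3)
round_trip = to_cartesian(to_polar(original))

# Expect values very close to (-2, 3); tiny rounding differences are normal.
print(round_trip)
```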

### **How atan2 works**

**atan2(y, x)** is a function that computes the arc tangent of $y/x$ in radians, but unlike **atan(y/x)**, it takes two arguments instead of one. The inputs are the coordinates of a point $(x, y)$ that is not the origin. The result is the angle of the vector from the origin to this point, with respect to the positive X-axis.

Here's how it differs from simple atan:

* **atan2** gives a result in the range $(-\pi, \pi]$. This means it can return a unique angle for every unique point in the plane, except the origin (which doesn't have a well-defined angle).

* **atan(y/x)** can only give a result in $(-\pi/2, \pi/2)$. This is because it doesn't know about the signs of $x$ and $y$ individually, only their ratio.

* **atan2** is also defined when $x$ is zero (resulting in $\pi/2$ or $-\pi/2$ depending on whether $y$ is positive or negative), whereas **atan(y/x)** is undefined when $x$ is zero because the division $y/x$ fails.

In the context of programming and mathematics, **atan2** is commonly used over **atan** when converting Cartesian coordinates $(x, y)$ to polar coordinates $(r, \theta)$. This is because it can handle cases that **atan** cannot, such as when $x$ is zero.
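
The differences listed above can be seen on the point $(-2, 3)$ and on points of the Y-axis. This is a minimal sketch; the variable names are chosen only for this example.

```python
from math import atan, atan2, pi

x, y = -2, 3  # a point in the second quadrant

angle_atan = atan(y / x)    # sees only the ratio -1.5, so the quadrant is lost
angle_atan2 = atan2(y, x)   # sees the signs of x and y separately

print(angle_atan)    # a negative angle (fourth quadrant)
print(angle_atan2)   # an angle between pi/2 and pi (second quadrant)

# On the Y-axis the ratio y/x cannot even be formed in Python.
try:
    atan(1 / 0)
    ratio_failed = False
except ZeroDivisionError:
    ratio_failed = True

up = atan2(1, 0)     # straight up
down = atan2(-1, 0)  # straight down
print(ratio_failed, up, down)
```

For this point the two results differ by π, which is exactly the quadrant information that the single ratio $y/x$ cannot carry.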

### **PageRank**

The **PageRank** algorithm was introduced by Google to rank web pages in its search results. Pages with a higher **PageRank** score are treated as more important and rank higher.

Here is how **eigenvalues** and **eigenvectors** are used in **PageRank**:

* The **link matrix** is constructed. Each row of the matrix represents a page, and each column represents a page that links to the page in the row. The value in a cell starts as the number of links from the page in the column to the page in the row; each column is then divided by its sum so that every column adds up to 1 (a column-stochastic matrix).
* The dominant **eigenvector** of the **link matrix** is found. For a column-stochastic matrix the largest **eigenvalue** is 1, and the **eigenvector** that belongs to it, scaled so that its entries sum to 1, holds the **PageRank** of every page. Mixing in a damping factor makes this **eigenvector** unique.
* The **PageRank** vector is calculated iteratively. Starting from any vector of non-negative scores that sum to 1, the vector is repeatedly multiplied by the **link matrix** (power iteration). It converges to the dominant **eigenvector** after a number of iterations.
* The pages with the highest **PageRank** are ranked highest in the search results.
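
The link between the eigenvector view and the iterative view can be checked numerically. The sketch below assumes a small four-page column-stochastic matrix and a damping factor of 0.85, both chosen for illustration; it compares the dominant eigenvector from `np.linalg.eig` with the result of plain power iteration.

```python
import numpy as np

# Illustrative link matrix: column j holds the outgoing links of page j,
# and every column sums to 1.
M = np.array([[0, 0, 1, 1],
              [0.5, 0, 0, 0],
              [0.5, 1, 0, 0],
              [0, 0, 0, 0]])

d = 0.85                       # damping factor (assumed value)
N = M.shape[1]
M_hat = d * M + (1 - d) / N    # damped matrix, still column-stochastic

# Eigenvector route: pick the eigenvalue with the largest real part.
eigenvalues, eigenvectors = np.linalg.eig(M_hat)
k = np.argmax(eigenvalues.real)
principal = eigenvectors[:, k].real
ranks_eig = principal / principal.sum()   # rescale so the entries sum to 1

# Power-iteration route: start uniform and multiply repeatedly.
v = np.full(N, 1 / N)
for _ in range(100):
    v = M_hat @ v

print(eigenvalues.real.max())
print(ranks_eig)
print(v)
```

Both routes should give the same vector up to numerical tolerance. Power iteration is usually preferred for large graphs because it needs only matrix-vector products.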


In [1]:
import numpy as np

def pagerank(M, num_iterations: int = 100, d: float = 0.85):
    """PageRank: The trillion dollar algorithm.
    
    Parameters
    ----------
    M : numpy array
        column-stochastic link matrix where M_i,j is the weight of the link from 'j' to 'i',
        such that for all 'j' sum(i, M_i,j) = 1
    num_iterations : int, optional
        number of iterations, by default 100
    d : float, optional
        damping factor, by default 0.85
        
    Returns
    -------
    numpy array
        a vector of ranks such that v_i is the i-th rank from [0, 1],
        v sums to 1
    
    """
    N = M.shape[1]
    v = np.random.rand(N, 1)
    v = v / np.linalg.norm(v, 1)  # make it a stochastic vector
    M_hat = (d * M + (1 - d) / N)
    for i in range(num_iterations):
        v = M_hat @ v
    return v


# The link matrix, i.e., the column-normalized adjacency matrix.
# Here, each column holds the outgoing links of one page, and each row holds the incoming links of one page.
# Assume there are 4 pages, and the link relationship is the same as the previous example.
M = np.array([[0, 0, 1, 1], 
              [0.5, 0, 0, 0], 
              [0.5, 1, 0, 0], 
              [0, 0, 0, 0]])

v = pagerank(M, 100, 0.85)

print(v)


[[0.38694178]
 [0.20195025]
 [0.37360797]
 [0.0375    ]]


We have four pages, and the PageRank algorithm has assigned the following ranks:

Page 0 (first page): 0.38694178  
Page 1 (second page): 0.20195025  
Page 2 (third page): 0.37360797  
Page 3 (fourth page): 0.0375  

These ranks represent the "importance" or "relevance" of each page. The page with the highest PageRank score (in this case, Page 0) is considered the most important within this network of four pages. No page links to Page 3, so it receives only the baseline score (1 - d) / N = 0.15 / 4 = 0.0375.
These scores imply that if a user followed a random link with probability 0.85 and otherwise jumped to a page chosen uniformly at random, they would spend approximately 38.7% of their time on Page 0, 20.2% on Page 1, 37.4% on Page 2, and only 3.75% on Page 3.
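
The "time spent" reading corresponds to the scores being a stationary distribution: one more step of the random surfer leaves them unchanged. The sketch below checks this for the printed values, assuming the same link matrix and damping factor of 0.85 as in the cell above.

```python
import numpy as np

# Link matrix and damping factor, restated so the check runs on its own.
M = np.array([[0, 0, 1, 1],
              [0.5, 0, 0, 0],
              [0.5, 1, 0, 0],
              [0, 0, 0, 0]])
d = 0.85
N = M.shape[1]
M_hat = d * M + (1 - d) / N

# Scores copied from the printed output (rounded to 8 decimals).
v = np.array([0.38694178, 0.20195025, 0.37360797, 0.0375])

# One more surfer step: follow a link with probability d, else jump anywhere.
step = M_hat @ v

is_stationary = np.allclose(step, v, atol=1e-6)
baseline = (1 - d) / N   # score of a page that nothing links to

print(is_stationary)
print(baseline)
```

The tolerance of `1e-6` only accounts for the rounding of the printed scores.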