### What is NumPy?

- **Fundamental Package for Scientific Computing:**
  - NumPy is the essential package for scientific computing in Python.
  - Provides support for large, multi-dimensional arrays and matrices.

- **Library Components:**
  - Offers a multidimensional array object.
  - Includes derived objects such as masked arrays and matrices.
  - Provides a collection of routines for fast array operations.

- **Array Operations:**
  - **Mathematical:** Supports a wide range of mathematical operations.
  - **Logical:** Facilitates logical operations on arrays.
  - **Shape Manipulation:** Allows for efficient reshaping and resizing of arrays.
  - **Sorting and Selecting:** Includes functions for sorting and selecting elements.
  - **I/O Operations:** Supports input and output operations.
  - **Discrete Fourier Transforms:** Offers tools for performing discrete Fourier transforms.
  - **Basic Linear Algebra:** Provides basic linear algebra operations.
  - **Statistical Operations:** Includes basic statistical functions.
  - **Random Simulations:** Facilitates random number generation and simulations.

- **Core Object - ndarray:**
  - The central feature of NumPy is the `ndarray` object.
  - Encapsulates n-dimensional arrays of homogeneous data types.
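
As a minimal sketch of what "homogeneous" means in practice (the variable names `mixed` and `grid` are illustrative, not from the original notes), mixing integers and a float in one array upcasts every element to a single dtype:

```python
import numpy as np

# Mixing ints and a float: NumPy picks one dtype that can hold every value.
mixed = np.array([1, 2, 3.5])

# A 2 x 3 array: two dimensions, six elements, one shared dtype.
grid = np.array([[1, 2, 3], [4, 5, 6]])

print(mixed.dtype)                        # float64
print(grid.ndim, grid.shape, grid.size)   # 2 (2, 3) 6
```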

### NumPy Arrays vs. Python Sequences

- **Fixed Size:**
  - NumPy arrays have a fixed size at creation, unlike Python lists, which can grow dynamically.
  - Changing the size of an ndarray creates a new array and deletes the original.

- **Homogeneous Data Type:**
  - All elements in a NumPy array must be of the same data type.
  - Elements are the same size in memory.

- **Efficiency in Operations:**
  - NumPy arrays support advanced mathematical and other operations on large data sets.
  - Such operations are typically more efficient and require less code than using Python's built-in sequences.

- **Wide Adoption in Scientific and Mathematical Packages:**
  - Many scientific and mathematical Python-based packages use NumPy arrays.
  - These packages often support Python-sequence input but convert such input to NumPy arrays for processing.
  - They frequently output results as NumPy arrays.
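
The differences listed above can be sketched as follows. The sample prices and variable names are made up for illustration; `np.append` is used here only to show that "growing" an array produces a new one.

```python
import numpy as np

prices_list = [100.0, 102.0, 104.0]
prices_arr = np.array(prices_list)

# A list grows in place; np.append returns a new array and leaves the original untouched.
prices_list.append(106.0)
longer_arr = np.append(prices_arr, 106.0)

# Element-wise arithmetic: a comprehension for the list, one expression for the array.
doubled_list = [p * 2 for p in prices_list]
doubled_arr = longer_arr * 2

print(prices_arr.shape, longer_arr.shape)  # (3,) (4,)
print(doubled_arr)
```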

### Use Cases of NumPy in Algo Trading

NumPy is an essential library for algorithmic trading due to its efficiency in numerical computations and array manipulations. Here are some key use cases of NumPy in algo trading:

1. **Data Preparation and Cleaning**
   - **Data Transformation:** Convert raw trading data into structured formats for analysis.
     ```python
     import numpy as np
     # Each record must be a tuple; a list of lists is not read as rows of fields.
     raw_data = [(100, '2023-06-01'), (102, '2023-06-02'), (104, '2023-06-03')]
     structured_data = np.array(raw_data, dtype=[('price', 'f4'), ('date', 'U10')])
     print(structured_data)
     ```
   - **Handling Missing Values:** Fill or interpolate missing data points in price series.
     ```python
     prices = np.array([100, 102, np.nan, 105])
     cleaned_prices = np.where(np.isnan(prices), np.nanmean(prices), prices)
     print(cleaned_prices)
     ```

2. **Technical Analysis**
   - **Calculating Indicators:** Compute moving averages, Bollinger Bands, RSI, etc.
     ```python
     prices = np.array([100, 102, 104, 103, 105, 107])
     window = 3
     moving_avg = np.convolve(prices, np.ones(window)/window, mode='valid')
     print(moving_avg)
     ```
   - **Trend Analysis:** Detect trends and patterns in price movements.
     ```python
     prices = np.array([100, 101, 102, 103, 104, 105])
     gradient = np.gradient(prices)
     print(gradient)
     ```

3. **Portfolio Management**
   - **Risk Assessment:** Calculate covariance matrices and perform risk analysis.
     ```python
     returns = np.array([[0.01, 0.02, -0.01], [0.03, 0.01, 0.02], [0.01, -0.02, 0.03]])
     cov_matrix = np.cov(returns, rowvar=False)
     print(cov_matrix)
     ```
   - **Optimization:** Solve `cov_matrix @ w = mean_returns` for unconstrained mean-variance weights (short positions allowed), then rescale so the weights sum to 1.
     ```python
     import numpy.linalg as la
     returns = np.array([[0.01, 0.02], [0.03, 0.01], [0.01, -0.02]])
     mean_returns = np.mean(returns, axis=0)
     cov_matrix = np.cov(returns, rowvar=False)
     weights = la.solve(cov_matrix, mean_returns)
     weights /= np.sum(weights)
     print(weights)
     ```

4. **Statistical Analysis**
   - **Simulating Price Paths:** Generate synthetic price data for backtesting.
     ```python
     np.random.seed(42)
     n_steps = 1000
     steps = np.random.choice([-1, 1], size=n_steps)
     random_walk = np.cumsum(steps)
     print(random_walk)
     ```
   - **Hypothesis Testing:** Perform statistical tests to validate trading strategies.
     ```python
     from scipy import stats
     returns = np.array([0.01, 0.02, -0.01, 0.03, -0.02])
     t_stat, p_value = stats.ttest_1samp(returns, 0)
     print(t_stat, p_value)
     ```

5. **Performance Measurement**
   - **Sharpe Ratio Calculation:** Measure risk-adjusted returns.
     ```python
     returns = np.array([0.01, 0.02, -0.01, 0.03, -0.02])
     risk_free_rate = 0.01
     sharpe_ratio = (np.mean(returns) - risk_free_rate) / np.std(returns)
     print(sharpe_ratio)
     ```
   - **Drawdown Analysis:** Identify and analyze maximum drawdowns in the portfolio.
     ```python
     prices = np.array([100, 110, 105, 115, 108])
     peak = np.maximum.accumulate(prices)
     drawdown = (prices - peak) / peak
     max_drawdown = np.min(drawdown)
     print(max_drawdown)
     ```
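
The technical-analysis item above names Bollinger Bands but only shows a moving average. A minimal sketch of the bands follows, assuming a 3-bar window and the common two-standard-deviation width (both are illustrative choices, not from the original notes). It relies on `sliding_window_view`, which is available in NumPy 1.20 and later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

prices = np.array([100, 102, 104, 103, 105, 107], dtype=float)
window = 3       # illustrative look-back length
num_std = 2.0    # common convention; an assumption here

# Each row of `windows` holds `window` consecutive prices.
windows = sliding_window_view(prices, window)
middle = windows.mean(axis=1)   # rolling mean (the middle band)
spread = windows.std(axis=1)    # rolling population standard deviation

upper = middle + num_std * spread
lower = middle - num_std * spread
print(middle)
print(upper)
print(lower)
```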






### Creating NumPy Arrays

In [1]:
# np.array
import numpy as np
np.array([1,2,3])

array([1, 2, 3])

In [2]:
# 2D and 3D
np.array([[1,2,3],[1,2,3]])

array([[1, 2, 3],
       [1, 2, 3]])

In [3]:
np.array([[[1,2,3],[1,2,3]],[[1,2,3],[1,2,3]]])

array([[[1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3]]])

In [10]:
# dtype
np.array([1,2,3],dtype=str)

array(['1', '2', '3'], dtype='<U1')

In [12]:
# np.arange

np.arange(0,12,2)

array([ 0,  2,  4,  6,  8, 10])

In [14]:
# with reshape
np.arange(0,12).reshape(2,2,3)

array([[[ 0,  1,  2],
        [ 3,  4,  5]],

       [[ 6,  7,  8],
        [ 9, 10, 11]]])

In [22]:
# np.ones and np.zeros
np.ones((6))

array([1., 1., 1., 1., 1., 1.])

In [23]:
np.zeros((4,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [25]:
# np.random
np.random.random((3,4))

array([[0.96055385, 0.99323529, 0.76163768, 0.86443853],
       [0.24955442, 0.98569528, 0.92630859, 0.30797149],
       [0.96983587, 0.83431066, 0.14330608, 0.94973802]])

In [28]:
# np.linspace
np.linspace(1,100,25)

array([  1.   ,   5.125,   9.25 ,  13.375,  17.5  ,  21.625,  25.75 ,
        29.875,  34.   ,  38.125,  42.25 ,  46.375,  50.5  ,  54.625,
        58.75 ,  62.875,  67.   ,  71.125,  75.25 ,  79.375,  83.5  ,
        87.625,  91.75 ,  95.875, 100.   ])

In [34]:
# np.identity
np.identity(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
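
Two of the constructors above are easy to confuse: `np.arange` takes a step and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value by default. The sketch below contrasts them and also shows `np.random.default_rng`, NumPy's newer generator interface, with an arbitrary seed (42) so the random array is reproducible.

```python
import numpy as np

# arange: start, stop (excluded), step
by_step = np.arange(0, 1, 0.25)

# linspace: start, stop (included by default), number of points
by_count = np.linspace(0, 1, 5)

# Seeded generator: the same seed gives the same array on every run.
rng = np.random.default_rng(42)
sample = rng.random((3, 4))

print(by_step)       # 0, 0.25, 0.5, 0.75 (the stop value 1 is excluded)
print(by_count)      # 0, 0.25, 0.5, 0.75, 1 (the stop value is included)
print(sample.shape)  # (3, 4)
```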

### Array Attributes

In [55]:
a1 = np.array([1,2,3],dtype=str)
a2 = np.array([[1,2,3],[1,2,3]])
a3 = np.array([[[1,2,3],[1,2,3]],[[1,2,3],[1,2,3]]])


In [40]:
# ndim
a3.ndim

3

In [46]:
# shape
a3.shape

(2, 2, 3)

In [49]:
# size
a3.size

12

In [57]:
# itemsize
a1.itemsize

4
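
The attributes shown so far are related to each other. The sketch below rebuilds `a3` with an explicit `dtype=np.int64` (an assumption, so the byte counts do not depend on the platform's default integer) and checks those relations.

```python
import numpy as np

a3 = np.array([[[1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3]]], dtype=np.int64)

# ndim is the length of shape; size is the product of the shape entries.
assert a3.ndim == len(a3.shape)
assert a3.size == 2 * 2 * 3

# nbytes (memory used by the elements) is size * itemsize.
print(a3.itemsize)  # 8 bytes per int64 element
print(a3.nbytes)    # 96
```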

In [None]:
# dtype
a1.dtype

dtype('<U1')




### Changing Datatype

In [58]:
# astype
a3.astype(np.int64)

array([[[1, 2, 3],
        [1, 2, 3]],

       [[1, 2, 3],
        [1, 2, 3]]])
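
`astype` is more informative when the target type actually differs from the source. A small sketch with made-up closing prices (the names `closes`, `as_int` and `as_float` are illustrative):

```python
import numpy as np

closes = np.array([101.0, 101.8, 102.5])

# float -> int drops the fractional part (truncation, not rounding).
as_int = closes.astype(np.int64)

# Numeric strings -> float, e.g. prices read from a text file.
as_float = np.array(["100.1", "101.5"]).astype(np.float64)

# astype returns a new array; the source keeps its dtype.
print(as_int)        # [101 101 102]
print(as_float)
print(closes.dtype)  # float64
```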

### Array Operations

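In [6]:
# sample data
ohlc_data_day1 = np.array([[100.0, 101.5,  99.5, 101.0],
                           [101.0, 102.0, 100.5, 101.8],
                           [101.8, 103.0, 101.0, 102.5]])
ohlc_data_Day2 = np.array([[200.0, 201.5, 299.5, 201.0],
                           [201.0, 202.0, 200.5, 201.8],
                           [201.8, 203.0, 201.0, 202.5]])

# scalar operations
# arithmetic
ohlc_data_day1 * 2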

array([[200. , 203. , 199. , 202. ],
       [202. , 204. , 201. , 203.6],
       [203.6, 206. , 202. , 205. ]])

In [7]:
# relational
ohlc_data_day1 == 100.0

array([[ True, False, False, False],
       [False, False, False, False],
       [False, False, False, False]])

In [8]:
# vector operations
# arithmetic
ohlc_data_day1 * ohlc_data_Day2

array([[20000.  , 20452.25, 29800.25, 20301.  ],
       [20301.  , 20604.  , 20150.25, 20543.24],
       [20543.24, 20909.  , 20301.  , 20756.25]])
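
Element-wise operations also work between arrays of different but compatible shapes (broadcasting). The sketch below defines its own small array and assumes the four columns are open, high, low and close; that column meaning is an assumption made for illustration.

```python
import numpy as np

ohlc = np.array([
    [100.0, 101.5, 99.5, 101.0],
    [101.0, 102.0, 100.5, 101.8],
    [101.8, 103.0, 101.0, 102.5],
])

# Columns are assumed to be open, high, low, close.
open_, close = ohlc[:, 0], ohlc[:, 3]

# Array with array: element-wise subtraction and division, no loop.
bar_return = (close - open_) / open_

# Array with row: the (4,) row of column means is broadcast across all 3 rows.
centered = ohlc - ohlc.mean(axis=0)

print(bar_return)
print(centered.shape)  # (3, 4)
```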

### Array Functions

In [12]:
ohlc_data_day1 = np.array([
    [100.0, 101.5, 99.5, 101.0],
    [101.0, 102.0, 100.5, 101.8],
    [101.8, 103.0, 101.0, 102.5]
])

ohlc_data_Day2 = np.array([
    [200.0, 201.5, 299.5, 201.0],
    [201.0, 202.0, 200.5, 201.8],
    [201.8, 203.0, 201.0, 202.5],
    [201.8, 203.0, 201.0, 202.5]
])

# mean/median/std/var
np.var(ohlc_data_day1)

0.9333333333333327

In [15]:
# trigonometric functions
np.cos(ohlc_data_day1)

array([[ 0.86231887,  0.56609521,  0.51399138,  0.89200487],
       [ 0.89200487,  0.1015857 ,  0.99952063,  0.29720233],
       [ 0.29720233, -0.78223089,  0.89200487, -0.38779553]])

In [20]:
# dot product: a (3, 4) array times a (4, 4) array gives a (3, 4) result
np.dot(ohlc_data_day1,ohlc_data_Day2)

array([[80862.4 , 81354.5 , 90601.25, 81183.95],
       [81526.14, 82022.4 , 91362.8 , 81850.35],
       [82129.3 , 82629.2 , 92044.1 , 82455.95]])

In [21]:
# log and exponents
np.exp(ohlc_data_day1)

array([[2.68811714e+43, 1.20473052e+44, 1.63042546e+43, 7.30705998e+43],
       [7.30705998e+43, 1.98626484e+44, 4.43195591e+43, 1.62621611e+44],
       [1.62621611e+44, 5.39922761e+44, 7.30705998e+43, 3.27479708e+44]])

In [23]:
# updated sample data, used in the sections that follow
ohlc_data_day1 = np.array([
    [100.1, 101.5, 99.5, 101.0],
    [101.3, 102.9, 100.5, 101.8],
    [101.8, 103.0, 101.0, 102.5]
])

ohlc_data_Day2 = np.array([
    [200.0, 201.5, 299.5, 201.0],
    [201.0, 202.0, 200.5, 201.8],
    [201.8, 203.0, 201.0, 202.5],
    [201.8, 203.0, 201.0, 202.5]
])

# max/min/sum/prod
# axis=0 -> one result per column, axis=1 -> one result per row
np.prod(ohlc_data_day1)

1.1819891119907141e+24

In [25]:
# round/floor/ceil (only the value of the last expression is displayed)
np.round(ohlc_data_day1)
np.ceil(ohlc_data_day1)


array([[101., 102., 100., 101.],
       [102., 103., 101., 102.],
       [102., 103., 101., 103.]])
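
The reductions in this section (`max`, `min`, `sum`, `prod`, `mean`, `var`, and so on) accept an `axis` argument. A short sketch using `np.max` as a representative example; the array is redefined here so the snippet runs on its own.

```python
import numpy as np

ohlc_data_day1 = np.array([
    [100.1, 101.5, 99.5, 101.0],
    [101.3, 102.9, 100.5, 101.8],
    [101.8, 103.0, 101.0, 102.5],
])

# No axis: reduce over every element and return a scalar.
overall_max = np.max(ohlc_data_day1)

# axis=0: collapse the rows, leaving one value per column.
col_max = np.max(ohlc_data_day1, axis=0)

# axis=1: collapse the columns, leaving one value per row.
row_max = np.max(ohlc_data_day1, axis=1)

print(overall_max)  # 103.0
print(col_max)      # one entry per column: 101.8, 103.0, 101.0, 102.5
print(row_max)      # one entry per row: 101.5, 102.9, 103.0
```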

### Indexing and Slicing

In [31]:


ohlc_data_day1 = np.array([
    [100.1, 101.5, 99.5, 101.0],
    [101.3, 102.9, 100.5, 101.8],
    [101.8, 103.0, 101.0, 102.5]
])

ohlc_data_Day2 = np.array([
    [200.0, 201.5, 299.5, 201.0],
    [201.0, 202.0, 200.5, 201.8],
    [201.8, 203.0, 201.0, 202.5],
    [201.8, 203.0, 201.0, 202.5]
])


In [27]:
a5 = [1,2,3,4,5]

In [30]:
a5[1:4]

[2, 3, 4]

In [35]:
ohlc_data_day1[1:,1:3]

array([[102.9, 100.5],
       [103. , 101. ]])

In [53]:
threedarray = np.arange(0,24).reshape(2,3,4)

In [54]:
threedarray

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [69]:
threedarray[:,::2,2:]

array([[[ 2,  3],
        [10, 11]],

       [[14, 15],
        [22, 23]]])
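
One difference from list slicing is worth a sketch: a basic slice of an array is a view onto the same memory, whereas indexing with a boolean mask returns a copy. The array and names below are illustrative.

```python
import numpy as np

arr = np.arange(0, 12).reshape(3, 4)

# Basic slicing returns a view: writing through it changes `arr` as well.
block = arr[1:, 1:3]
block[0, 0] = -1

# A boolean mask selects elements by condition and returns a copy.
evens = arr[arr % 2 == 0]

print(arr[1, 1])                     # -1
print(np.shares_memory(arr, block))  # True
print(np.shares_memory(arr, evens))  # False
```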

### Iterating

In [78]:
ohlc_data_day1 = np.array([
    [100.1, 101.5, 99.5, 101.0],
    [101.3, 102.9, 100.5, 101.8],
    [101.8, 103.0, 101.0, 102.5]
])

ohlc_data_Day2 = np.array([
    [200.0, 201.5, 299.5, 201.0],
    [201.0, 202.0, 200.5, 201.8],
    [201.8, 203.0, 201.0, 202.5],
    [201.8, 203.0, 201.0, 202.5]
])

a6 = [1,2,3,4,5]
for i in np.ravel(ohlc_data_day1):
    print(i)

100.1
101.5
99.5
101.0
101.3
102.9
100.5
101.8
101.8
103.0
101.0
102.5
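
Besides looping over `np.ravel(...)`, a few other iteration patterns may be useful. The sketch below uses a two-row excerpt of the sample data; `row_highs`, `first_three` and `total` are illustrative names.

```python
import numpy as np

ohlc_data_day1 = np.array([
    [100.1, 101.5, 99.5, 101.0],
    [101.3, 102.9, 100.5, 101.8],
])

# Iterating over a 2D array directly yields one row (a 1D array) at a time.
row_highs = [row.max() for row in ohlc_data_day1]

# np.ndenumerate yields (index_tuple, value) pairs in row-major order.
first_three = [(idx, val) for idx, val in np.ndenumerate(ohlc_data_day1)][:3]

# np.nditer visits every element, one scalar at a time.
total = 0.0
for x in np.nditer(ohlc_data_day1):
    total += float(x)

print(row_highs)
print(first_three[0][0])  # (0, 0)
print(total)
```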


### Reshaping

In [81]:
# reshape
np.arange(0,24).reshape(2,4,3)

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [15, 16, 17],
        [18, 19, 20],
        [21, 22, 23]]])

In [88]:
# Transpose
print(np.transpose(ohlc_data_day1))
print(ohlc_data_day1.T)

[[100.1 101.3 101.8]
 [101.5 102.9 103. ]
 [ 99.5 100.5 101. ]
 [101.  101.8 102.5]]
[[100.1 101.3 101.8]
 [101.5 102.9 103. ]
 [ 99.5 100.5 101. ]
 [101.  101.8 102.5]]


In [90]:
# ravel
np.ravel(ohlc_data_day1)

array([100.1, 101.5,  99.5, 101. , 101.3, 102.9, 100.5, 101.8, 101.8,
       103. , 101. , 102.5])
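
Two details of reshaping are sketched below under illustrative names: passing `-1` lets NumPy infer one dimension, and `ravel` returns a view when it can, whereas `flatten` always copies.

```python
import numpy as np

a = np.arange(0, 24)

# -1 asks NumPy to infer that dimension from the total size: 24 / (2 * 3) = 4.
b = a.reshape(2, -1, 3)

# ravel returns a view when it can; flatten always returns a copy.
flat_view = b.ravel()
flat_copy = b.flatten()
flat_view[0] = 99

print(b.shape)       # (2, 4, 3)
print(a[0])          # 99, because a, b and flat_view share one buffer
print(flat_copy[0])  # 0, the copy was taken before the write
```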

### Stacking

In [100]:
# horizontal stacking

ohlc_data_day1 = np.array([
    [100.1, 101.5, 99.5, 101.0],
    [101.3, 102.9, 100.5, 101.8],
    [101.8, 103.0, 101.0, 102.5]
])

ohlc_data_Day2 = np.array([
    [200.0, 201.5, 299.5, 201.0],
    [201.0, 202.0, 200.5, 201.8],
    [201.8, 203.0, 201.0, 202.5]
    
])



np.hstack((ohlc_data_day1,ohlc_data_Day2))

array([[100.1, 101.5,  99.5, 101. , 200. , 201.5, 299.5, 201. ],
       [101.3, 102.9, 100.5, 101.8, 201. , 202. , 200.5, 201.8],
       [101.8, 103. , 101. , 102.5, 201.8, 203. , 201. , 202.5]])

In [101]:
# Vertical stacking
np.vstack((ohlc_data_day1,ohlc_data_Day2))

array([[100.1, 101.5,  99.5, 101. ],
       [101.3, 102.9, 100.5, 101.8],
       [101.8, 103. , 101. , 102.5],
       [200. , 201.5, 299.5, 201. ],
       [201. , 202. , 200.5, 201.8],
       [201.8, 203. , 201. , 202.5]])
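
For 2D inputs, `np.vstack` and `np.hstack` behave like `np.concatenate` along axis 0 and axis 1 respectively. The sketch below checks that on small made-up arrays and adds `np.stack`, which creates a new axis instead.

```python
import numpy as np

day1 = np.array([[100.1, 101.5], [101.3, 102.9]])
day2 = np.array([[200.0, 201.5], [201.0, 202.0]])

# For 2D inputs, vstack joins along axis 0 and hstack along axis 1.
v = np.concatenate((day1, day2), axis=0)
h = np.concatenate((day1, day2), axis=1)

# np.stack adds a new leading axis instead of extending an existing one.
s = np.stack((day1, day2))

print(v.shape, h.shape, s.shape)  # (4, 2) (2, 4) (2, 2, 2)
```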

### Splitting

In [103]:
# horizontal splitting
a7 = np.hstack((ohlc_data_day1,ohlc_data_Day2))
a7

array([[100.1, 101.5,  99.5, 101. , 200. , 201.5, 299.5, 201. ],
       [101.3, 102.9, 100.5, 101.8, 201. , 202. , 200.5, 201.8],
       [101.8, 103. , 101. , 102.5, 201.8, 203. , 201. , 202.5]])

In [104]:
np.hsplit(a7,2)

[array([[100.1, 101.5,  99.5, 101. ],
        [101.3, 102.9, 100.5, 101.8],
        [101.8, 103. , 101. , 102.5]]),
 array([[200. , 201.5, 299.5, 201. ],
        [201. , 202. , 200.5, 201.8],
        [201.8, 203. , 201. , 202.5]])]

In [106]:
# vertical splitting

np.vsplit(a7,3)

[array([[100.1, 101.5,  99.5, 101. , 200. , 201.5, 299.5, 201. ]]),
 array([[101.3, 102.9, 100.5, 101.8, 201. , 202. , 200.5, 201.8]]),
 array([[101.8, 103. , 101. , 102.5, 201.8, 203. , 201. , 202.5]])]
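
`np.hsplit` and `np.vsplit` with an integer require the axis to divide evenly. Two alternatives are sketched below on an illustrative 3 x 8 array: splitting at explicit indices, and `np.array_split`, which allows uneven pieces.

```python
import numpy as np

a7 = np.arange(0, 24).reshape(3, 8)

# A list of indices splits at those column positions: [:, :2], [:, 2:6], [:, 6:].
left, middle, right = np.hsplit(a7, [2, 6])

# hsplit(a7, 3) would raise ValueError because 8 columns do not divide by 3;
# array_split allows uneven pieces instead.
pieces = np.array_split(a7, 3, axis=1)

print(left.shape, middle.shape, right.shape)  # (3, 2) (3, 4) (3, 2)
print([p.shape for p in pieces])              # [(3, 3), (3, 3), (3, 2)]
```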