# Matrix multiplication refresher

In [23]:
# https://www.ibm.com/docs/en/watson-studio-local/1.2.3?topic=notebooks-markdown-jupyter-cheatsheet

In [2]:
# 1D matrices (Vectors)
v1 = [1,2,3]
v2 = [4,5,6]

# 2D matrices
a1 = [
  [1.0, 2.0],
  [3.0, 4.0]
]
a2 = [
  [5.0, 6.0],
  [7.0, 8.0]
]

In NumPy, np.dot() performs matrix multiplication when both arguments are 2D arrays (matrices). For two 1D arrays (vectors), it computes the dot product, which is a scalar.

NumPy also offers a dedicated function for matrix multiplication: np.matmul(), with the @ operator as its shorthand.

np.dot() vs. np.matmul()

The key difference between the two functions becomes apparent with arrays of more than two dimensions. They also differ on scalars: np.dot() accepts a scalar argument, while np.matmul() rejects it.

np.dot():

If both arguments are 1D arrays, it performs a dot product and returns a scalar.

If both are 2D arrays, it performs matrix multiplication.

If the first argument is an N-dimensional array and the second is a 1D array, it computes a sum product over the last axis of the first array and the 1D array.
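The N-dimensional case above is easier to see with concrete shapes. The following is a minimal sketch; the array names `a` and `w` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# A 3D array of shape (2, 3, 4) and a 1D array of length 4
a = np.arange(24).reshape(2, 3, 4)
w = np.ones(4, dtype=int)

# np.dot sums products over the last axis of `a` and the only axis of `w`,
# so the last axis of `a` disappears from the result
result = np.dot(a, w)
print(result.shape)  # (2, 3)

# Same thing written as element-wise multiplication plus a sum over the last axis
manual = (a * w).sum(axis=-1)
print(np.array_equal(result, manual))  # True
```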

np.matmul() (or @):

This function is specifically designed for matrix multiplication.

It follows matrix multiplication rules strictly and does not accept scalar arguments. For 2D inputs, it behaves identically to np.dot().

For arrays with more than two dimensions (e.g., 3D tensors used in deep learning), np.matmul() handles broadcasting in a way that is more predictable and intuitive for batch operations. It treats the last two dimensions as matrices and performs matrix multiplication on them, leaving the leading dimensions for broadcasting.

For most cases involving 2D arrays, np.dot() and np.matmul() will produce the same result. However, np.matmul() is generally the preferred choice for matrix multiplication due to its clearer intent and more consistent behavior with higher-dimensional data, which is common in machine learning.
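To make the higher-dimensional difference concrete, the sketch below compares result shapes for two stacks of matrices and then checks how each function treats a scalar. The names `batch_a` and `batch_b` are placeholders chosen for this illustration.

```python
import numpy as np

batch_a = np.ones((2, 3, 4))
batch_b = np.ones((2, 4, 5))

# matmul pairs up the leading (batch) axis: two independent (3x4) @ (4x5) products
mm = np.matmul(batch_a, batch_b)
print(mm.shape)  # (2, 3, 5)

# dot combines every matrix in batch_a with every matrix in batch_b
d = np.dot(batch_a, batch_b)
print(d.shape)  # (2, 3, 2, 5)

# dot accepts a scalar argument; matmul raises an error for it
print(np.dot(2, np.ones((2, 2))).shape)  # (2, 2)
try:
    np.matmul(2, np.ones((2, 2)))
    scalar_rejected = False
except (ValueError, TypeError):
    scalar_rejected = True
print(scalar_rejected)  # True
```

For batched data, the (2, 3, 5) result from np.matmul() is usually the intended one, which is one reason it is preferred in machine learning code.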

In [7]:
import numpy as np

np_v1 = np.array(v1)
np_v2 = np.array(v2)
np_a1 = np.array(a1)
np_a2 = np.array(a2)

print("Using np.dot():", np.dot(np_v1, np_v2))

# matmul() also works for two 1D arrays and returns their inner product as a scalar
print("Using np.matmul():", np.matmul(np_v1, np_v2))

print("Using np.dot():\n", np.dot(np_a1, np_a2))

print("Using np.matmul():\n", np.matmul(np_a1, np_a2))

print("Using @:\n", np_a1 @ np_a2)

# Element-wise multiplication using multiply()
print("Using np.multiply():", np.multiply(np_v1, np_v2))

print("Using np.multiply():\n", np.multiply(np_a1, np_a2))

# Element-wise multiplication using * operator
print("Using * operator:", np_v1 * np_v2)
print("Using *:\n", np_a1 * np_a2)

Using np.dot(): 32
Using np.matmul(): 32
Using np.dot():
 [[19. 22.]
 [43. 50.]]
Using np.matmul():
 [[19. 22.]
 [43. 50.]]
Using @:
 [[19. 22.]
 [43. 50.]]
Using np.multiply(): [ 4 10 18]
Using np.multiply():
 [[ 5. 12.]
 [21. 32.]]
Using * operator: [ 4 10 18]
Using *:
 [[ 5. 12.]
 [21. 32.]]


In [3]:
array1 = [1,2,3,4]
array2 = [5,6,7,8]
array3 = [
  [1.0, 2.0],
  [3.0, 4.0]
]
array4 = [
  [5.0, 6.0],
  [7.0, 8.0]
]

As with np.matmul() in NumPy, tf.matmul() is TensorFlow's primary function for matrix multiplication. TensorFlow has no tf.dot() function; dot products are computed with other operations, such as tf.tensordot() or element-wise multiplication followed by tf.reduce_sum().

tf.matmul()
tf.matmul() is the main function for matrix multiplication. It works on 2D tensors (matrices) and also handles higher-dimensional tensors as batches of matrices, which is a common task in machine learning.
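The batch behaviour can be sketched with two stacks of matrices. This is an illustrative example only; `batch_a` and `batch_b` are placeholder names, and eager execution (the TensorFlow 2 default) is assumed.

```python
import tensorflow as tf

# Two stacks of matrices: 2 matrices of shape 3x4 and 2 matrices of shape 4x5
batch_a = tf.ones((2, 3, 4))
batch_b = tf.ones((2, 4, 5))

# tf.matmul multiplies the last two dimensions and treats the first as a batch axis
batch_c = tf.matmul(batch_a, batch_b)

print(batch_c.shape)  # (2, 3, 5)
```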

Python

import tensorflow as tf

# Define two 2x2 tensors
A = tf.constant([[1, 2], [3, 4]])
B = tf.constant([[5, 6], [7, 8]])

# Perform matrix multiplication
C = tf.matmul(A, B)

print(C)

Output:

tf.Tensor(
[[19 22]
 [43 50]], shape=(2, 2), dtype=int32)

The Dot Product in TensorFlow
While there isn't a dedicated tf.dot() function, the concept of a dot product is handled in a few ways:

For two 1D tensors (vectors): one option is to reshape the vectors into a 1×N row and an N×1 column and multiply them with tf.matmul(). A more common and more direct way is element-wise multiplication followed by tf.reduce_sum().

Python

import tensorflow as tf

u = tf.constant([1, 2, 3])
v = tf.constant([4, 5, 6])

# This is the most common way to get a dot product in TensorFlow
dot_product = tf.reduce_sum(u * v)

print(dot_product)

Output:

tf.Tensor(32, shape=(), dtype=int32)

The operation u * v performs element-wise multiplication, resulting in [4, 10, 18]. tf.reduce_sum() then sums these elements to produce the final scalar result, 32.
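For comparison, the reshape-and-matmul route can be sketched as follows, together with tf.tensordot(), which is another way to contract two vectors. The intermediate names (`row`, `col`, and so on) are illustrative, and eager execution is assumed.

```python
import tensorflow as tf

u = tf.constant([1, 2, 3])
v = tf.constant([4, 5, 6])

# Reshape u to a 1x3 row and v to a 3x1 column, then matrix-multiply
row = tf.reshape(u, (1, 3))
col = tf.reshape(v, (3, 1))
as_matrix = tf.matmul(row, col)          # shape (1, 1)
dot_via_matmul = tf.squeeze(as_matrix)   # drop the size-1 axes to get a scalar

# tf.tensordot with axes=1 contracts the last axis of u with the first axis of v
dot_via_tensordot = tf.tensordot(u, v, axes=1)

print(int(dot_via_matmul), int(dot_via_tensordot))  # 32 32
```

All three approaches give 32 here; the reduce_sum form avoids the reshaping step.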

In [18]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
tf_v1 = tf.constant(v1)
tf_v2 = tf.constant(v2)
tf_a1 = tf.constant(a1)
tf_a2 = tf.constant(a2)

In [16]:
print("element-wise multiplication:", tf_v1 * tf_v2)
print("dot-product using element-wise multiplication:", tf.reduce_sum(tf_v1 * tf_v2))

# tf_v1 @ tf_v2 is not used here: @ maps to tf.matmul(), which expects tensors of rank 2 or higher
print("matrix multiplication using @:\n", tf_a1 @ tf_a2)
print("matrix multiplication using tf.matmul:\n", tf.matmul(tf_a1, tf_a2))

element-wise multiplication: tf.Tensor([ 4 10 18], shape=(3,), dtype=int32)
dot-product using element-wise multiplication: tf.Tensor(32, shape=(), dtype=int32)
matrix multiplication using @:
 tf.Tensor(
[[19. 22.]
 [43. 50.]], shape=(2, 2), dtype=float32)
matrix multiplication using tf.matmul:
 tf.Tensor(
[[19. 22.]
 [43. 50.]], shape=(2, 2), dtype=float32)


In [9]:
import torch

In PyTorch, there are several ways to perform matrix multiplication and the dot product, similar to NumPy and TensorFlow.

Matrix Multiplication (torch.matmul)
The primary function for matrix multiplication in PyTorch is torch.matmul(). It handles 2D matrices as well as higher-dimensional tensors with batching, making it the most common choice for linear operations in neural networks.

Python

import torch

# Define two 2x2 tensors (matrices)
A = torch.tensor([[1, 2], [3, 4]])
B = torch.tensor([[5, 6], [7, 8]])

# Perform matrix multiplication
C = torch.matmul(A, B)

print(C)

Output:

tensor([[19, 22],
        [43, 50]])

Similar to NumPy, PyTorch also supports the @ operator as a shorthand for torch.matmul().
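The batching behaviour of torch.matmul() can be sketched with a stack of matrices multiplied by a single shared matrix. The names `batch_a` and `weights` are illustrative placeholders.

```python
import torch

# A batch of 2 matrices (each 3x4) and one shared 4x5 matrix
batch_a = torch.ones(2, 3, 4)
weights = torch.ones(4, 5)

# The 2D matrix is broadcast across the batch dimension
out = torch.matmul(batch_a, weights)
print(out.shape)  # torch.Size([2, 3, 5])

# The @ operator gives the same result
same = torch.equal(out, batch_a @ weights)
print(same)  # True
```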

The Dot Product (torch.dot)
Unlike TensorFlow, PyTorch has a dedicated function, torch.dot(), specifically for calculating the dot product of two 1D vectors. This function will raise an error if you try to use it on tensors with more than one dimension.

Python

import torch

# Define two 1D tensors (vectors)
u = torch.tensor([1, 2, 3])
v = torch.tensor([4, 5, 6])

# Calculate the dot product
dot_product = torch.dot(u, v)

print(dot_product)

Output:

tensor(32)

As with TensorFlow and NumPy, you can also compute the dot product by performing element-wise multiplication followed by a sum.

Python

import torch

u = torch.tensor([1, 2, 3])
v = torch.tensor([4, 5, 6])

# Element-wise multiplication followed by sum
dot_product_manual = torch.sum(u * v)

print(dot_product_manual)
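
Returning to torch.dot(): its 1D-only restriction can be checked directly. The sketch below assumes the error is raised as a RuntimeError, which is how PyTorch typically reports shape problems; `m1` and `m2` are illustrative names.

```python
import torch

m1 = torch.tensor([[1, 2], [3, 4]])
m2 = torch.tensor([[5, 6], [7, 8]])

# torch.dot only accepts 1D tensors, so 2D inputs are rejected
try:
    torch.dot(m1, m2)
    raised = False
except RuntimeError as err:
    raised = True
    print("torch.dot rejected 2D input:", err)

# torch.matmul accepts the same inputs and performs matrix multiplication
product = torch.matmul(m1, m2)
print(product)
```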

In [19]:
import torch
t_v1 = torch.tensor(v1)
t_v2 = torch.tensor(v2)
t_a1 = torch.tensor(a1)
t_a2 = torch.tensor(a2)


In [20]:
t_v1 @ t_v2

tensor(32)

In [21]:
torch.dot(t_v1, t_v2)

tensor(32)