Matrix Chain Multiplication

For normal multiplication the commutative law indicates that \((AB)C = A(BC)\). This is not true from a computational perspective. Depending on the number of calculations it takes more or less time.

The Matrix Multiplication is defined as follows:

../../_images/001-matrix_multiplication.svg

Fig. 6 Matrix Multiplication Source: Wikipedia

A matrix with the size \(4\times 2\) and \(2 \times 3\) will result in a matrix with \(4x3\) elements. In general \(m\times n * n\times p \rightarrow m\times p\)

\[ \begin{align}\begin{aligned}\begin{split} \overset{4\times 2 \text{ matrix}}{\begin{bmatrix} {a_{11}} & {a_{12}} \\ {a_{21}} & {a_{22}} \\ {a_{31}} & {a_{32}} \\ {a_{41}} & {a_{42}} \\ \end{bmatrix}}\end{split}\\\begin{split}\overset{2\times 3\text{ matrix}}{\begin{bmatrix} {b_{11}} & {b_{12}} & {b_{13}} \\ {b_{21}} & {b_{22}} & {b_{23}} \\ \end{bmatrix}}\end{split}\\\begin{split}= \overset{4\times 3\text{ matrix}}{\begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \\ c_{41} & c_{41} & c_{41} \\ \end{bmatrix}} \end{split}\end{aligned}\end{align} \]
\[c_{12} = a_{11}b_{12} + a_{12}b_{22}\]
\[c_{33} = a_{31}b_{13} + a_{32}b_{23}\]

In total there are \(4*3=12\) operations needed$

Example

\[ \begin{align}\begin{aligned}\begin{split} \overset{1 \text{ operation}}{(\overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 1 & 2 \\ \end{bmatrix}}\end{split}\\\begin{split}\overset{2\times 1 \text{ matrix}}{\begin{bmatrix} 3 \\ 4 \\ \end{bmatrix}})}\end{split}\\\begin{split}\overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 5 & 6 \\ \end{bmatrix}}\end{split}\\\begin{split}= \overset{2 \text{ operations}}{ \overset{1\times 1 \text{ matrix}}{\begin{bmatrix} 1*3+2*4 \\ \end{bmatrix}}\end{split}\\\begin{split}\overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 5 & 6 \\ \end{bmatrix}} } = \overset{1\times 1 \text{ matrix}}{\begin{bmatrix} 11 \\ \end{bmatrix}}\end{split}\\\begin{split}\overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 5 & 6 \\ \end{bmatrix}}\end{split}\\\begin{split}= \overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 11*5 & 11*6 \\ \end{bmatrix}}\end{split}\\\begin{split}= \overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 55 & 66 \\ \end{bmatrix}} \end{split}\end{aligned}\end{align} \]
\[1 \times 2 \text{ matrix} * 2 \times 1 \text{ matrix} = 2 \times 1 \text{ matrix} \rightarrow 1 \text{ operation}\]
\[1 \times 1 \text{ matrix} * 1 \times 2 \text{ matrix} = 1 \times 2 \text{ matrix} \rightarrow 2 \text{ operations}\]
\[\text{In Total } = (1*1) + (2*1) = 1+2 = 3 \text{ operations}\]
\[ \begin{align}\begin{aligned}\begin{split} \overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 1 & 2 \\ \end{bmatrix}}\end{split}\\\begin{split}\overset{4 \text{ operations}}{ (\overset{2\times 1 \text{ matrix}}{\begin{bmatrix} 3 \\ 4 \\ \end{bmatrix}}\end{split}\\\begin{split}\overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 5 & 6 \\ \end{bmatrix}}) }\end{split}\\\begin{split}= \overset{2 \text{ operations}}{ \overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 1 & 2 \\ \end{bmatrix}}\end{split}\\\begin{split}\overset{2\times 2 \text{ matrix}}{\begin{bmatrix} 3*5 & 3*6 \\ 4*5 & 4*6 \\ \end{bmatrix}} }\end{split}\\\begin{split}= \overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 1 & 2 \\ \end{bmatrix}}\end{split}\\\begin{split}\overset{2\times 2 \text{ matrix}}{\begin{bmatrix} 15 & 18 \\ 20 & 24 \\ \end{bmatrix}}\end{split}\\=\\\overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 1*15+2*20 & 1*18+2*24 \end{bmatrix}}\\=\\\overset{1\times 2 \text{ matrix}}{\begin{bmatrix} 55 & 66 \end{bmatrix}} \end{aligned}\end{align} \]
\[2 \times 1 \text{ matrix} * 1 \times 2 \text{ matrix} = 2 \times 2 \text{ matrix} \rightarrow 4 \text{ operations} \]
\[1 \times 2 \text{ matrix} * 2 \times 2 \text{ matrix} = 1 \times 2 \text{ matrix} \rightarrow 2 \text{ operations} \]
\[\text{In Total } = (2*2) + (1*2) = 4+2 = 6 \text{ operations}\]

Example 2

\(A\) is a \(10 \times 20\) matrix, \(B\) is a \(30 \times 40\) matrix, \(C\) is a \(50 \times 60\) matrix

All possible combinations: $\((AB)C = A(BC)\)$

Number of operations: $\( (AB)C = (10*20*40) + (10*40*60) = 32000 \text{ operations}\)\( \)\( A(BC) = (30*40*60) + (30*60*20) = 108000 \text{ operations}\)$

Solution

The best ordering can be found using recursive relationship. Let Matrix Chain Multiplication (MCM) denotes a function that returns a minimum number of scalar multiplications. Then MCM can be defined as the best split among all possible choices.

\[MCM(A_1, \dots, A_n) = min_i MCM(A_1, \dots, A_i) \bigotimes MCM(A_{i+1}, \dots, A_n)\]

Using dynamic programming and memoization, the problem can be solved in \(O(n^3)\) time.

Imports

Programatical reflections

def display(matrix_dict: dict, result_size: list, cost: int, order: list):
  """displays the calculation in a nice fashion

  Args:
      matrix_dict (dict): input matrix sizes
      result_size (list): result matrix size
      cost (int): calculation cost
      order (list): calculation order
  """
  _sizes = ""
  for i in matrix_dict:
    _sizes += "{}: [{} x {}] ".format(i, matrix_dict[i][0], matrix_dict[i][1])
  print("{} Matrixes: {}".format(len(matrix_dict), _sizes))
  _order = ""
  for item in order:
    if len(item) > 1:
      _order = "{}({})".format(_order, item)
    else:
      if _order == "":
        _order = "{}".format(item)
      else:
        _order = "({}{})".format(_order, item)
  print("  Result Matrix [{} x {}]".format(result_size[0], result_size[1]))
  print("  Calculation order: {}".format(_order))
  print("  Cost: {}".format(cost))

def calculation(matrix_dict: list, order: list):
  """Doing the inverse of what we should do, calculate the matrix costs and it's resulting size

  Args:
      matrix_dict (list): dict of input matrixes
      order (list): calculation order
  """
  _cost = 0
  _size = [0, 0]
  # loop through matrixes
  for elem in order:
    # not a single element
    if len(elem) > 1:
      # first element ever len 2
      if _size == [0, 0]:
        _size = [matrix_dict[elem[0]][0], matrix_dict[elem[1]][1]]
      else:
        _size = [matrix_dict[elem[0]][0], matrix_dict[elem[1]][1]]
      _cost += matrix_dict[elem[0]][0]*matrix_dict[elem[0]][1]*matrix_dict[elem[1]][1]
    else:
      # first element ever len 1
      if _size == [0, 0]: 
        _size = [matrix_dict[elem][0], matrix_dict[elem][1]]
      else:
        _size = [_size[0], matrix_dict[elem][1]]
      _cost += _size[0]*_size[1]*matrix_dict[elem][1]
  return _size, _cost
matrix_dict = {'A': [10, 20],
               'B': [30, 40],
               'C': [50, 60],
               'D': [70, 80],
               'E': [90, 100]}

cost = 0
cols = 0
rows = 0
order = ['AB', 'C']
display(matrix_dict, [cols, rows], cost, order)
order = ['A', 'BC']
display(matrix_dict, [cols, rows], cost, order)
order = ['AB', 'C', 'D']
display(matrix_dict, [cols, rows], cost, order)
order = ['A', 'BC', 'D']
display(matrix_dict, [cols, rows], cost, order)
order = ['A', 'B', 'CD']
display(matrix_dict, [cols, rows], cost, order)
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: ((AB)C)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: A(BC)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: (((AB)C)D)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: (A(BC)D)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: (AB)(CD)
  Cost: 0
matrix_dict = {'A': [10, 20],
               'B': [30, 40],
               'C': [50, 60],
               'D': [70, 80],
               'E': [90, 100]}

cost = 0
cols = 0
rows = 0
order = ['AB', 'C']
display(matrix_dict, [cols, rows], cost, order)
order = ['A', 'BC']
display(matrix_dict, [cols, rows], cost, order)
order = ['AB', 'C', 'D']
display(matrix_dict, [cols, rows], cost, order)
order = ['A', 'BC', 'D']
display(matrix_dict, [cols, rows], cost, order)
order = ['A', 'B', 'CD']
display(matrix_dict, [cols, rows], cost, order)

# (AB)C
order = ['AB', 'C']
cost = matrix_dict['A'][0]*matrix_dict['A'][1]*matrix_dict['B'][1]+matrix_dict['A'][0]*matrix_dict['B'][1]*matrix_dict['C'][1]
cols = matrix_dict['A'][0]
rows = matrix_dict['C'][1]
display(matrix_dict, [cols, rows], cost, order)

# A(BC)
order = ['A', 'BC']
cost = matrix_dict['B'][0]*matrix_dict['B'][1]*matrix_dict['C'][1]+matrix_dict['B'][0]*matrix_dict['C'][1]*matrix_dict['A'][1]
cols = matrix_dict['A'][0]
rows = matrix_dict['C'][1]
display(matrix_dict, [cols, rows], cost, order)
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: ((AB)C)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: A(BC)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: (((AB)C)D)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: (A(BC)D)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [0 x 0]
  Calculation order: (AB)(CD)
  Cost: 0
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [10 x 60]
  Calculation order: ((AB)C)
  Cost: 32000
5 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] D: [70 x 80] E: [90 x 100] 
  Result Matrix [10 x 60]
  Calculation order: A(BC)
  Cost: 108000
# Calculate the same with the function
matrix_dict = {'A': [10, 20],
               'B': [30, 40],
               'C': [50, 60]}

# (AB)C
order = ['AB', 'C']
calculation(matrix_dict, order)
display(matrix_dict, [cols, rows], cost, order)

# A(BC)
order = ['A', 'BC']
calculation(matrix_dict, order)
display(matrix_dict, [cols, rows], cost, order)
3 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] 
  Result Matrix [10 x 60]
  Calculation order: ((AB)C)
  Cost: 108000
3 Matrixes: A: [10 x 20] B: [30 x 40] C: [50 x 60] 
  Result Matrix [10 x 60]
  Calculation order: A(BC)
  Cost: 108000

Algorithm

def minimal_mcm(chain: list):
  """Calculation of the matrix chain mulltiplication variant with the least cost

  Args:
      chain (list): list of list of matrix sizes [['Name', row-size, col-size],...]

  Returns:
      dict: of {result-row-size. result-col-size, cost, order}
  """
  n = len(chain)

  # single matrix chain has zero cost
  aux = {(i, i): (0,) + chain[i] for i in range(n)}
  # i: length of subchain
  for i in range(1, n):
      # j: starting index of subchain
      for j in range(0, n - i):
          best = float('inf')

          # k: splitting point of subchain
          for k in range(j, j + i):
              # multiply subchains at splitting point
              lcost, lname, lrow, lcol = aux[j, k]
              rcost, rname, rrow, rcol = aux[k + 1, j + i]
              cost = lcost + rcost + lrow * lcol * rcol
              name = '(%s%s)' % (lname, rname)

              # pick the best one
              if cost < best:
                  best = cost
                  aux[j, j + i] = cost, name, lrow, rcol

  return dict(zip(['cost', 'name', 'row-size', 'col-size'], aux[0, n - 1]))

Test

print(minimal_mcm([('A', 10, 20), ('B', 30, 40), ('C', 50, 60)]))

print(minimal_mcm([('A', 10, 20), ('B', 30, 40), ('C', 50, 60), ('D', 70, 80), ('E', 90, 100)]))
{'cost': 32000, 'name': '((AB)C)', 'row-size': 10, 'col-size': 60}
{'cost': 160000, 'name': '((((AB)C)D)E)', 'row-size': 10, 'col-size': 100}