### Theory

In vector matrix multiplication, the number of columns of first matrix should be equal to the number of rows of the second matrix.

If matrix A having m columns and n rows is multiplied with matrix B having n rows and m columns, then the product of the two matrices C is a square matrix of m rows and m columns.

The general rule for product of two matrices is given by

(AB i,j) _{mxm} = ∑(A i,k) _{mxn}(B k,j) _{nxm}

The summation ranges over the interval 1≤ i ≤ m.

In block matrix multiplication, each matrix is divided into blocks of equal sizes. This technique proves to be very efficient for parallel matrix multiplication as each node can compute the product of the individual blocks.

Block matrix multiplication is used in Strassen’s algorithm for fast matrix multiplication.

### Flowchart

**Parallel Matrix Multiplication**

### Implementation

The initialization of the program variables and MPI environment is done initially. The code snippet given below demonstrates how the matrix row data is exchanged between client and server nodes. The server node generates the matrices, creates the divisions based on the ‘rank’ of each node and sends the block data to the client nodes.

```
if(rank==0) {
start = 0;
end = (msize/nproc);
MPI_Bcast(&matrix2[0][0],(msize*msize*3)+40,MPI_INT,0,MPI_COMM_WORLD);
for(i=0;i<nproc;i++){
start = unit * i;
end = unit * (i+1);
if(i!=0){
MPI_Send(&start,1,MPI_INT,i,1,MPI_COMM_WORLD);
MPI_Send(&end,1,MPI_INT,i,2,MPI_COMM_WORLD);
MPI_Send(&matrix1[start][0],
(msize*unit*3)+8,
MPI_INT,i,3,MPI_COMM_WORLD );
}
}
}
else{
MPI_Bcast(&matrix2[0][0],(msize*msize*3)+40,MPI_INT,0,MPI_COMM_WORLD);
MPI_Recv(&start,1,MPI_INT,0,1,MPI_COMM_WORLD,&s);
MPI_Recv(&end,1,MPI_INT,0,2,MPI_COMM_WORLD,&s);
MPI_Recv(&matrix1[start][0],(msize*unit*3)+8,MPI_INT,0,3,MPI_COMM_WORLD,&s);
}
```

The code given below demonstrates the actual matrix multiplication taking place on the client nodes. The results are then sent back to the server node.

```
start = unit * rank;
end = unit * (rank+1);
MPI_Barrier(MPI_COMM_WORLD);
for(i=start;i<end;i++) {
for(j=0;j<msize;j++) {
for(k=0;k<msize;k++) {
matrix[i][j] = matrix[i][j]+(matrix1[i][k] * matrix2[k][j]);
}
}
}
if(rank!=0)
MPI_Send(&matrix[start][0],(msize*unit*2)+6,MPI_INT,0,7,MPI_COMM_WORLD);
The server node assembles the results sent by the client nodes.
if(rank==0){
for(i=1;i<nproc;i++){
start = unit * i;
if(i!=0){
MPI_Recv(&matrix[start][0], (msize*unit*2)+6,
MPI_INT,i,7,MPI_COMM_WORLD,&s);
}
}
```