Loop Interchange 2
In the following program you will see a standard triple-loop implementation (three nested DO statements in F90) of an element-wise multiply-add on three-dimensional arrays. The goal of this exercise is to find out which permutation of the loop indices gives the best performance. The code implements two of the six possible permutations (3! = 6). Familiarize yourself with it, then compile and run it.
program loop_interchange2
  !!
  !! small program to find out the best permutation of i,j,k
  !!
  real*8,allocatable :: A(:,:,:),B(:,:,:),C(:,:,:)
  real*8 :: t1,t2
  integer :: i,j,k
  integer :: nsize

  write(*,*) 'provide an integer (suggested range 100-250)'
  write(*,*) 'larger values can be very memory and time-consuming'
  write(*,*) 'check available memory on your system before playing with larger values'
  read(*,*) nsize

  allocate(A(nsize,nsize,nsize))
  allocate(B(nsize,nsize,nsize))
  allocate(C(nsize,nsize,nsize))

  ! initialisation (rand() is a non-standard compiler extension, e.g. in gfortran)
  call cpu_time(t1)
  DO i=1,nsize
    DO j=1,nsize
      DO k=1,nsize
        A(i,j,k) = 0
        B(i,j,k) = rand()
        C(i,j,k) = rand()
      END DO
    END DO
  END DO
  call cpu_time(t2)
  print*,'initialisation time=', t2-t1

  ! permutation i,j,k
  call cpu_time(t1)
  DO i=1,nsize
    DO j=1,nsize
      DO k=1,nsize
        A(i,j,k) = A(i,j,k) + B(i,j,k)*C(i,j,k)
      END DO
    END DO
  END DO
  call cpu_time(t2)
  print*,A(nsize,nsize,nsize),'ijk', t2-t1

  ! permutation k,j,i
  call cpu_time(t1)
  DO k=1,nsize
    DO j=1,nsize
      DO i=1,nsize
        A(i,j,k) = A(i,j,k) + B(i,j,k)*C(i,j,k)
      END DO
    END DO
  END DO
  call cpu_time(t2)
  print*,A(nsize,nsize,nsize),'kji', t2-t1

end program loop_interchange2
You are then asked to complete the code by adding the four missing permutations and to run the program again.
NOTE: Be sure to reorder the DO statements for the loop indices i, j, k in each permutation; the loop body stays the same.
Report the times you measured in a table like the one below on your page:
permutation | time measured (s) |
i,j,k | |
i,k,j | |
k,i,j | |
k,j,i | |
j,k,i | |
j,i,k | |
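
As a hint for completing the code, one of the missing permutations (i,k,j) could be sketched as below. This is only a minimal stand-alone sketch, not the reference solution: the program name loop_interchange_ikj, the fixed nsize of 64, the use of the standard random_number intrinsic in place of rand(), and the final consistency check are all assumptions made so that the example runs without user input.

```fortran
program loop_interchange_ikj
  ! Sketch of one missing permutation (i,k,j).
  ! Assumption: nsize is fixed here so that no user input is needed.
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nsize = 64
  real(dp), allocatable :: A(:,:,:), B(:,:,:), C(:,:,:)
  real(dp) :: t1, t2
  integer :: i, j, k

  allocate(A(nsize,nsize,nsize), B(nsize,nsize,nsize), C(nsize,nsize,nsize))

  A = 0.0_dp
  ! Assumption: random_number (standard intrinsic) replaces rand()
  call random_number(B)
  call random_number(C)

  call cpu_time(t1)
  ! Loop order i,k,j: i is the outermost loop, j the innermost.
  ! Only the order of the DO statements changes; the body is unchanged.
  DO i=1,nsize
    DO k=1,nsize
      DO j=1,nsize
        A(i,j,k) = A(i,j,k) + B(i,j,k)*C(i,j,k)
      END DO
    END DO
  END DO
  call cpu_time(t2)
  print *, A(nsize,nsize,nsize), 'ikj', t2-t1

  ! Consistency check: A started at zero, so it must now equal B*C
  ! regardless of the loop order used.
  if (maxval(abs(A - B*C)) > 1.0e-12_dp) error stop 'ikj result mismatch'
  print *, 'ikj check passed'
end program loop_interchange_ikj
```

When the loop nest is pasted into the exercise program instead, keep in mind that A is not reset between permutations there, so the printed value of A(nsize,nsize,nsize) grows with each loop nest while the timings remain comparable.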