Latin American School On Computational Materials Science :: Site / LoopInterchangeExerciseNumber1

Loop Interchange

The following two small programs, written in F90 and C, perform the same two loop operations on the matrices a, b and c. Familiarize yourself with them, compile them with your preferred F90 and C compilers, and then run them.

Warning: ifort crashes with optimization enabled because of the call to rand()!

program loop_interchange

  ! loop interchange example
  integer :: n=4000, m=4000   ! global dimensions
  integer :: i, j             ! running indexes
  real, allocatable :: a(:,:), b(:,:), c(:,:)
  double precision :: time1=0.d0, time2=0.d0

  allocate(a(n,m))
  allocate(b(n,m))
  allocate(c(n,m))
  a = 0.0
  ! rand() is a non-standard extension; its scalar result is
  ! assigned to every element of the array
  b = rand()
  c = rand()

  ! first loop: i is the outer index, j the inner one
  call cpu_time(time1)
  do i = 1, n
     do j = 1, m
        a(i, j) = b(i, j) + c(i, j)
     enddo
  enddo
  call cpu_time(time2)
  print*, "control value:", a(n,m), &
       "time to execute  first loop (i,j) is: ", &
       time2-time1
  a = 0.0

  ! second loop: j is the outer index, i the inner one
  call cpu_time(time1)
  do j = 1, m
     do i = 1, n
        a(i, j) = b(i, j) + c(i, j)
     enddo
  enddo
  call cpu_time(time2)
  print*, "control value:", a(n,m), &
       "time to execute second loop (j,i) is: ", &
       time2-time1

end program loop_interchange

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 4000
#define M 4000

int main(void)
{
  int i,j;
  float *a,*b,*c;
  double wall_time1, wall_time2, wall_time;

  /* N x M matrices stored as flat arrays: element (i,j) is at index j+M*i */
  a=(float *)malloc(N*M*sizeof(float));
  b=(float *)malloc(N*M*sizeof(float));
  c=(float *)malloc(N*M*sizeof(float));
  if (a == NULL || b == NULL || c == NULL) {
    fprintf(stderr, "memory allocation failed\n");
    return 1;
  }

  for(i=0;i<N;i++){
    for(j=0;j<M;j++){
      b[j+M*i] = (float)rand()/(float)RAND_MAX;
      c[j+M*i] = (float)rand()/(float)RAND_MAX;
    }
  }

  /* first loop: i outer, j inner (note: clock() measures CPU time) */
  wall_time1=(double)clock()/(double)CLOCKS_PER_SEC;

  for(i=0;i<N;i++) {
    for(j=0;j<M;j++)
      a[j+M*i] = b[j+M*i] + c[j+M*i];
  }
  wall_time2=(double)clock()/(double)CLOCKS_PER_SEC;
  wall_time=(wall_time2-wall_time1);
  printf(" control value: %f\t time to execute  first loop (i,j) is: %e\n",a[1],wall_time);

  /* second loop: j outer, i inner */
  wall_time1=(double)clock()/(double)CLOCKS_PER_SEC;

  for (j=0;j<M;j++){
    for (i=0;i<N;i++)
      a[j+M*i] = b[j+M*i] + c[j+M*i];
  }
  wall_time2=(double)clock()/(double)CLOCKS_PER_SEC;
  wall_time=(wall_time2-wall_time1);
  printf(" control value: %f\t time to execute second loop (j,i) is: %e\n",a[1],wall_time);

  free(a);
  free(b);
  free(c);
  return 0;
}

A typical output for the C version (compiled with gcc 4.1.2-42) is:

$ ./loop-gcc 
 control value: 1.581539       time to execute  first loop (i,j) is:    1.900000e-01
 control value: 1.581539       time to execute second loop (j,i) is:   2.050000e+00

while the Fortran one (compiled with gfortran 4.1.2-42) gives something like:

$ ./loop-gfortran 
 control value:  0.1315453     time to execute  first loop (i,j) is:    2.23166000000000     
 control value:  0.1315453     time to execute second loop (j,i) is:   0.209968000000000

Now answer the following questions:

Which loop is faster in the C version? And in the Fortran one?
Explain why.
Are there any differences between the results obtained with different compilers?
Did you observe any improvement when enabling standard optimization flags (-O2 / -O3)?
On which loop? Do you have any explanation for the observed behavior?
