Unroll

The following program times a simple loop that can easily be unrolled.

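program dirunroll
  ! Times the loop a(i)=b(i)+x*c(i) and reports its speed.
  ! x, flop and i rely on Fortran implicit typing.
  integer,parameter :: N=10000000
  real,dimension(N):: a,b,c
  real:: begin,end
  real,dimension(2):: rtime
  common/saver/a,b,c
    call random_number(b)
    call random_number(c)
    x=2.5
    ! dtime (a compiler extension) returns the CPU time since its previous call
    begin=dtime(rtime)
! this loop can be unrolled
    do i=1,N
      a(i)=b(i)+x*c(i)
    end do
    end=dtime(rtime)
    print *,' my loop time (s) is ',(end)
    ! two floating-point operations (one multiply, one add) per iteration
    flop=(2.0*N)/(end)*1.0e-6
    print *,' loop runs at ',flop,' MFLOP'
    ! print one element of each array so the loop result is actually used
    print *,a(1),b(1),c(1)
end program dirunroll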

A typical output of the program is:

my loop time (s) is   0.115983000000000     
  loop runs at    172.439063910270       MFLOP
   3.058973      0.9975595      0.8245652    
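
The MFLOP figure in this output follows directly from the measured time. Each loop iteration performs two floating-point operations (one multiplication and one addition), so with N = 10000000 and the time t printed above, the arithmetic can be written out as:

```latex
\[
\text{MFLOPS} = \frac{2N}{t} \times 10^{-6}
              = \frac{2 \times 10^{7}}{0.115983\ \text{s}} \times 10^{-6}
              \approx 172.44
\]
```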

Familiarize yourself with the program, then compile it with different Fortran 90 compilers, using the compiler flags that enable loop unrolling:

For the Intel compiler:

-unroll[n]  set maximum number of times to unroll loops.  Omit n to use
            default heuristics.  Use n=0 to disable loop unroller.
-funroll-loops  unroll loops based on default heuristics

For the PGI compiler:

-M[no]unroll[=c:<n>|n:<n>]
                    Enable loop unrolling
    c:<n>           Completely unroll loops with loop count n or less
    n:<n>           Unroll other loops n times
    -Munroll        Completely unroll loops with loop count 1
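
To make concrete what these flags ask the compiler to do, the loop from the program can be unrolled by hand. The following is only a minimal sketch, not what any particular compiler generates: the program name manual_unroll, the small array size n = 10, the unroll factor of 4 and the variable nmain are all assumptions chosen for illustration. The main loop handles four elements per iteration, and a remainder loop handles the elements left over when n is not a multiple of 4.

```fortran
program manual_unroll
  implicit none
  integer, parameter :: n = 10        ! hypothetical small size, not a multiple of 4
  real :: a(n), a_ref(n), b(n), c(n)
  real :: x
  integer :: i, nmain

  call random_number(b)
  call random_number(c)
  x = 2.5

  ! Reference result: the original (rolled) loop
  do i = 1, n
    a_ref(i) = b(i) + x*c(i)
  end do

  ! Loop unrolled by 4: covers the largest multiple of 4 that is <= n
  nmain = n - mod(n, 4)
  do i = 1, nmain, 4
    a(i)   = b(i)   + x*c(i)
    a(i+1) = b(i+1) + x*c(i+1)
    a(i+2) = b(i+2) + x*c(i+2)
    a(i+3) = b(i+3) + x*c(i+3)
  end do

  ! Remainder loop: the last mod(n,4) elements
  do i = nmain + 1, n
    a(i) = b(i) + x*c(i)
  end do

  ! Both versions compute the same values, up to rounding
  print *, 'max difference: ', maxval(abs(a - a_ref))
end program manual_unroll
```

The unrolled version executes fewer loop-control instructions (increment, compare, branch) per element, which is the saving the unroll flags aim for. Whether it pays off depends on the compiler and the machine, which is what the measurements in this exercise are meant to show.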

Compare the performance measured with these flags against the standard optimization level -O3. Run the code at least three times for each case in order to estimate the fluctuation between runs.
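
The repeated runs can also be collected inside a single program. The sketch below is one possible way to do this and is not part of the original exercise: it swaps the dtime extension for the standard system_clock intrinsic (so it measures wall-clock time rather than CPU time), and the names repeat_timing, nruns, t and mflops are assumptions made for illustration.

```fortran
program repeat_timing
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 10000000
  integer, parameter :: nruns = 3          ! hypothetical number of repetitions
  real, allocatable :: a(:), b(:), c(:)
  real :: x
  real(real64) :: t(nruns), mflops(nruns)
  integer(int64) :: t0, t1, rate
  integer :: i, k

  allocate(a(n), b(n), c(n))
  call random_number(b)
  call random_number(c)
  x = 2.5
  call system_clock(count_rate=rate)       ! clock ticks per second

  do k = 1, nruns
    call system_clock(t0)
    do i = 1, n
      a(i) = b(i) + x*c(i)
    end do
    call system_clock(t1)
    ! elapsed wall-clock time of this run, in seconds
    t(k) = real(t1 - t0, real64) / real(rate, real64)
    ! two floating-point operations per iteration
    mflops(k) = 2.0_real64*n / t(k) * 1.0e-6_real64
    ! printing a(1) keeps the loop result in use
    print *, 'run', k, ' time (s):', t(k), ' MFLOPS:', mflops(k), a(1)
  end do

  print *, 'min / mean / max MFLOPS:', &
           minval(mflops), sum(mflops)/nruns, maxval(mflops)
end program repeat_timing
```

The spread between the minimum and maximum gives a rough idea of the run-to-run fluctuation; the first run is often the slowest because the arrays are touched for the first time.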

Check your results against the results provided in the table below. Produce a similar table and publish it on your personal wiki:

compiler   options              Mflops
gfortran   -                    ~177
gfortran   -O3                  ~175
gfortran   -funroll-loops       ~175
ifort      -O3                  ~173
ifort      -O0 -funroll-loops   ~173
ifort      -unroll=8            ~322
pgf90      -O0                  ~314
pgf90      -Munroll=n:8         ~326
pgf90      -Munroll=n:16        ~316
pgf90      -O3                  ~311