Unroll
The following program simply performs a loop that could be easily unrolled.
    program dirunroll
      integer,parameter :: N=10000000
      real,dimension(N):: a,b,c
      real:: begin,end
      real,dimension(2):: rtime
      common/saver/a,b,c
      call random_number(b)
      call random_number(c)
      x=2.5
      begin=dtime(rtime)
      ! this loop can be unrolled
      do i=1,N
        a(i)=b(i)+x*c(i)
      end do
      end=dtime(rtime)
      print *,' my loop time (s) is ',(end)
      flop=(2.0*N)/(end)*1.0e-6
      print *,' loop runs at ',flop,' MFLOP'
      print *,a(1),b(1),c(1)
    end
A typical output of the program is:
     my loop time (s) is 0.115983000000000
     loop runs at 172.439063910270 MFLOP
     3.058973 0.9975595 0.8245652
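To make concrete what "unrolling" means for the loop in the program above, here is a minimal hand-written sketch. It is not what any particular compiler generates. The unroll factor of 4, the array size of 1003, and all names (`unroll_by_hand`, `factor`, `n_main`, `a_ref`) are assumptions chosen for illustration. The main loop does four iterations' worth of work per trip, and a short remainder loop handles the elements left over when the trip count is not a multiple of the factor.

```fortran
program unroll_by_hand
  implicit none
  integer, parameter :: n = 1003      ! assumed size, deliberately not a multiple of 4
  integer, parameter :: factor = 4    ! assumed unroll factor, for illustration only
  real, dimension(n) :: a, a_ref, b, c
  real :: x
  integer :: i, n_main

  call random_number(b)
  call random_number(c)
  x = 2.5

  ! Reference: the rolled loop, as in the original program
  do i = 1, n
     a_ref(i) = b(i) + x*c(i)
  end do

  ! Unrolled main loop: four elements per trip, so the loop counter
  ! is incremented and tested only once for every four updates
  n_main = n - mod(n, factor)
  do i = 1, n_main, factor
     a(i)   = b(i)   + x*c(i)
     a(i+1) = b(i+1) + x*c(i+1)
     a(i+2) = b(i+2) + x*c(i+2)
     a(i+3) = b(i+3) + x*c(i+3)
  end do

  ! Remainder loop: the last mod(n, factor) elements
  do i = n_main + 1, n
     a(i) = b(i) + x*c(i)
  end do

  ! Compare with a small tolerance rather than exact equality, since the
  ! compiler may round the two loops slightly differently
  if (maxval(abs(a - a_ref)) <= 1.0e-5) then
     print *, 'unrolled loop matches rolled loop'
  else
     print *, 'MISMATCH between unrolled and rolled loop'
     stop 1
  end if
end program unroll_by_hand
```

The unrolling flags listed below ask the compiler to do this kind of transformation automatically, so the source can keep the simple rolled form.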
Familiarize yourself with the program, then compile it with different Fortran 90 compilers and with compiler flags that enable loop unrolling:
For the Intel compiler:

    -unroll[n]       set maximum number of times to unroll loops.
                     Omit n to use default heuristics.
                     Use n=0 to disable loop unroller.
    -funroll-loops   unroll loops based on default heuristics
For the PGI compiler:

    -M[no]unroll[=c:<n>|n:<n>]   Enable loop unrolling
        c:<n>                    Completely unroll loops with loop count n or less
        n:<n>                    Unroll other loops n times
    -Munroll                     Completely unroll loops with loop count 1
Compare the performance measured with these flags against the standard optimization level -O3.
Run this simple code at least three times for each case to estimate the fluctuation between runs.
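One way to get a first feel for the run-to-run fluctuation is to repeat the timed loop inside a single executable and report the spread. The sketch below is a variant written under assumptions, not a replacement for the separate runs the exercise asks for. It uses the standard `cpu_time` intrinsic instead of `dtime`, and allocatable arrays instead of the common block. The names `repeat_timing`, `nruns`, and `mflops` are hypothetical.

```fortran
program repeat_timing
  implicit none
  integer, parameter :: n = 10000000
  integer, parameter :: nruns = 3          ! assumed number of repetitions
  real, dimension(:), allocatable :: a, b, c
  real, dimension(nruns) :: mflops
  real :: x, t0, t1, elapsed
  integer :: i, r

  allocate(a(n), b(n), c(n))
  call random_number(b)
  call random_number(c)
  x = 2.5

  do r = 1, nruns
     call cpu_time(t0)
     ! this loop can be unrolled
     do i = 1, n
        a(i) = b(i) + x*c(i)
     end do
     call cpu_time(t1)

     ! Guard against a zero interval on timers with coarse resolution
     elapsed = max(t1 - t0, tiny(1.0))
     mflops(r) = (2.0*n)/elapsed*1.0e-6

     ! Printing an element of a keeps the result of each repetition in use
     print *, ' run ', r, ' loop runs at ', mflops(r), ' MFLOP', a(r)
  end do

  print *, ' min  MFLOP: ', minval(mflops)
  print *, ' max  MFLOP: ', maxval(mflops)
  print *, ' mean MFLOP: ', sum(mflops)/nruns
end program repeat_timing
```

The first repetition typically also pays for touching the memory of `a` for the first time, so it may be slower than the later ones. Separate launches of the original program remain the measurement to report in the table.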
Check your results against the results provided in the table below. Produce a similar table and publish it on your personal wiki:
Compiler | Options | MFLOPS |
---|---|---|
gfortran | - | ~177 |
gfortran | -O3 | ~175 |
gfortran | -funroll-loops | ~175 |
ifort | -O3 | ~173 |
ifort | -O0 -funroll-loops | ~173 |
ifort | -unroll=8 | ~322 |
pgf90 | -O0 | ~314 |
pgf90 | -Munroll=n:8 | ~326 |
pgf90 | -Munroll=n:16 | ~316 |
pgf90 | -O3 | ~311 |