Now that we have it, it's time to take a closer look at the output of the compiler.
This is our Testing Proggy No. 1:
SUB TestfuncA()
REGISTER R01 AS LONG,R02 AS LONG
! NOP
! NOP
! NOP
FOR R01=0 TO 100000
GOSUB Laba
NEXT R01
EXIT SUB
Laba:
GOSUB Labb
RETURN
Labb:
RETURN
END SUB
Now let's first take a look at what we get with PB 8.04:
Executable Size: 21504 Bytes
4023D0 NOP
4023D1 NOP
4023D2 NOP
4023D3 MOV ESI, DWORD 00000000
4023D9 CALL L4023ED
4023DE INC ESI
4023E0 CMP ESI, DWORD 000186A0
4023E6 JLE SHORT L4023D9
4023E8 JMP L4023F4
4023ED CALL L4023F3
4023F2 RET NEAR
4023F3 RET NEAR
No surprise so far. Now let's take a look at the output of PB 9 using
#OPTIMIZE SIZE
#OPTIMIZE SIZE
Executable Size: 24576 Bytes
4024DB NOP
4024DC NOP
4024DD NOP
4024DE MOV ESI, DWORD 00000000
4024E4 CALL L4024F8
4024E9 INC ESI
4024EB CMP ESI, DWORD 000186A0
4024F1 JLE SHORT L4024E4
4024F3 JMP L4024FF
4024F8 CALL L4024FE
4024FD RET NEAR
4024FE RET NEAR
In fact we get exactly the same code, just at different addresses. Note that the
executable size is 24576 Bytes now (it was 21504 Bytes with PB 8.04).
Now we use PB 9 and the new
#OPTIMIZE SPEED
which is the default mode, meaning it is switched ON by default.
What we expect to see is some more NOPs, because the new compiler tries to align loops (etc.) on byte boundaries to increase execution speed. That's why PB 9 programs will often be faster than PB 8 programs,
just by recompiling with the new compiler.
#OPTIMIZE SPEED and Default Mode without #OPTIMIZE
Executable Size: 24576 Bytes
4024EB NOP
4024EC NOP
4024ED NOP ; these are our three NOP's
4024EE MOV ESI, DWORD 00000000
4024F4 NOP ; These are the NOP's from the #OPTIMIZE SPEED
4024F5 NOP
4024F6 NOP
4024F7 NOP
4024F8 NOP
4024F9 NOP
4024FA NOP
4024FB NOP
4024FC NOP
4024FD NOP
4024FE NOP
4024FF NOP
402500 CALL L402514
402505 INC ESI
402507 CMP ESI, DWORD 000186A0
40250D JLE SHORT L402500
40250F JMP L40251B
402514 CALL L40251A
402519 RET NEAR
40251A RET NEAR
We find that the compiler inserts up to 15 NOPs to align our loop perfectly on a 16-byte boundary for best execution speed. In this listing it took 12 of them to move the start of the loop from 4024F4 to 402500. What I've noticed here is that the total file size did not change in this case.
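The padding in the listing above can be checked with a bit of arithmetic. Here is a small sketch in Python (not PowerBASIC, and not a claim about how the compiler works internally; the function name nop_padding is made up for illustration). It assumes each NOP is one byte and that the target is the next 16-byte boundary.

```python
def nop_padding(addr: int, boundary: int = 16) -> int:
    """Number of one-byte NOPs needed so that addr lands on a multiple of boundary."""
    return (-addr) % boundary


# Address right after "MOV ESI, DWORD 00000000" in the #OPTIMIZE SPEED listing
after_mov = 0x4024F4

pad = nop_padding(after_mov)
loop_top = after_mov + pad

print(pad, hex(loop_top))  # 12 0x402500

# Worst case over all 16 possible start offsets: 15 NOPs
print(max(nop_padding(a) for a in range(0x402500, 0x402510)))  # 15
```

An address that already sits on the boundary needs no padding at all, so the number of NOPs you see depends only on where the code before the loop happens to end.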
Where are the times?
Normally in such postings you would like to see times and examples.
In this case I have left them out,
because I found that in small constructed loops like the ones we use for testing,
#OPTIMIZE SPEED can even be a disadvantage compared to the unaligned loop.
The result of the automatic alignment depends heavily on:
a) the overall CPU architecture
b) the cache size of the CPU
c) the size of the loop
With very small loops I saw the code even run slower when alignment was applied inside the loop.
SUB TestfuncA()
'REGISTER R01 AS dword,R02 AS dword
'#register NONE
REGISTER R01 AS LONG,R02 AS LONG
LOCAL D01,D02 AS DOUBLE
! NOP
'#ALIGN 32
FOR R01=0 TO 1000
FOR R02=0 TO 1000
GOSUB Laba
! NOP
! NOP
! NOP
NEXT R02
NEXT R01
EXIT SUB
Laba:
GOSUB Labb
RETURN
Labb:
D01=SIN(D02)
! NOP
! NOP
! NOP
RETURN
END SUB
You can test it on your CPU with this constructed example. This does not, however, play a role in normal programs, as their loops are mostly much bigger and such cache effects do not show up.
Or let me put it like this:
If you write highly optimized programs with very small loops, you may get a speed advantage if you do the alignment manually using #ALIGN and switch off the automatic alignment with #OPTIMIZE SIZE.
If you have a normal, complicated program which is not hand-optimized, you can just forget about it and the compiler will do the best alignment for you.