(Optimization) A look at FOR ... NEXT loops depending on datatypes

Started by Theo Gottwald, August 08, 2007, 10:42:00 AM

Theo Gottwald

In this posting we'll look at how the datatype we use for the loop variable influences the way the compiler builds the loop at the lowest level.

EXAMPLE 1:
We start with our fastest loop: a REGISTER variable which is, however, NOT used inside the loop.
Note that this makes the compiler reverse the loop direction and test with "JNZ" (Jump if Not Zero),
as this is the fastest way of looping.

REGISTER R01 AS LONG

FOR R01=10 TO 200
    !NOP
NEXT R01

' becomes our fastest Loop:

4023D6  MOV ESI, DWORD 000000BF
4023DC  NOP
4023DD  DEC ESI
4023DF  JNZ SHORT L4023DC


Note that you can work around this if you keep track of the order in which you declare your REGISTER variables.
Knowing that ESI holds the first declared REGISTER variable and EDI the second, you can still use these registers from ASM code.
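
The start value 000000BF that gets loaded into ESI in Example 1 is simply the number of passes of FOR R01=10 TO 200. Here is a minimal sketch of that reversed loop, written in Python rather than PowerBASIC so it can be checked quickly. The function names are made up for illustration, and the model assumes the loop runs at least once:

```python
def trip_count(start, stop):
    """Number of passes of FOR i = start TO stop with the default STEP 1."""
    return max(0, stop - start + 1)


def run_reversed(start, stop):
    """Model of: MOV ESI, count / body / DEC ESI / JNZ body.

    Assumes at least one pass, as in the constant loops shown here.
    """
    counter = trip_count(start, stop)   # MOV ESI, DWORD count
    executed = 0
    while True:
        executed += 1                   # loop body (the NOP)
        counter -= 1                    # DEC ESI
        if counter == 0:                # JNZ falls through at zero
            break
    return executed


print(hex(trip_count(10, 200)))   # 0xbf
print(run_reversed(10, 200))      # 191
```

The reversed loop runs the body exactly as often as the loop in the BASIC source, it just never sees the values 10 to 200.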

EXAMPLE 2:
We can still read the REGISTER variable and keep the down-counting loop by accessing the register directly.
I do not really recommend this. I show it to demonstrate that the compiler only checks whether "R01" appears inside the loop.
If it does not, the compiler reverses the loop direction. Keep in mind that ESI then holds the down-counting value (BFh down to 1), not the values 10 to 200.


FOR R01=10 TO 200
    !NOP
    ! MOV EAX,ESI
NEXT R01 

4023D6  MOV ESI, DWORD 000000BF
4023DC  NOP
4023DD  MOV EAX, ESI
4023DF  DEC ESI
4023E1  JNZ SHORT L4023DC



EXAMPLE 3:
In this example we do something really useless: we assign R01 the content of R01.
The point is to see how the compiler changes the loop IF R01 is used inside the loop.

FOR R01=10 TO 200
    !NOP
    R01=R01
NEXT R01

4023D6  MOV ESI, DWORD 0000000A
4023DC  NOP
4023DD  MOV EAX, ESI
4023DF  MOV ESI, EAX
4023E1  INC ESI
4023E3  CMP ESI, DWORD 000000C8
4023E9  JLE SHORT L4023DC


Now we see that this loop is a bit bigger.
Instead of just DEC and JNZ, we now have INC, CMP and JLE (Jump if Less or Equal).
This example also shows that the compiler really does use the REGISTER variable (R01 = ESI register).
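
To put a number on the difference between the two loop shapes, here is a rough sketch in Python (not PowerBASIC) that counts only the loop-control instructions executed in Example 1 and Example 3. The function names are hypothetical, and an instruction count is not a timing: real cycle counts depend on the CPU.

```python
def control_instructions_down(iterations):
    """Example 1 shape: DEC + JNZ on every pass."""
    return 2 * iterations


def control_instructions_up(start, stop):
    """Example 3 shape: INC + CMP + JLE on every pass."""
    executed = 0
    i = start                 # MOV ESI, start
    while True:
        i += 1                # INC ESI
        executed += 3         # INC, CMP, JLE
        if not i <= stop:     # JLE not taken, loop ends
            break
    return executed


print(control_instructions_down(191))    # 382
print(control_instructions_up(10, 200))  # 573
```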

EXAMPLE 4:
Let's now reverse the loop direction manually. I only want to see whether that gets us the quick loop with JNZ.

FOR R01=200 TO 0 STEP -1
    !NOP
    R01=R01
NEXT R01

4023D6  MOV ESI, DWORD 000000C8
4023DC  NOP
4023DD  MOV EAX, ESI
4023DF  MOV ESI, EAX
4023E1  DEC ESI
4023E3  CMP ESI, BYTE 00
4023E6  JGE SHORT L4023DC


No chance. We just get a JGE (Jump if Greater or Equal) instead of the JLE from Example 3.

EXAMPLE 5:
While we are at it, note that you can't do this:

REGISTER R01 as DWORD

FOR R01=200 TO 0 STEP -1
    !NOP
    R01=R01
NEXT R01   


While you might say "the DWORD is always in range", it does not compile.
A DWORD cannot hold negative numbers, and therefore the compiler does not accept a negative STEP in this case.
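
There is also a practical reason why an unsigned counter and a "count down TO 0" loop do not go together. The sketch below, in Python rather than PowerBASIC, models a 32-bit unsigned counter. It only illustrates unsigned wrap-around and says nothing about how the compiler actually makes its decision. The names and the max_passes safety limit are made up for the example.

```python
MASK32 = 0xFFFFFFFF


def dec_dword(value):
    """DEC on an unsigned 32-bit value: 0 wraps around to 0xFFFFFFFF."""
    return (value - 1) & MASK32


def countdown_passes(start, max_passes):
    """Count down from start while 'counter >= 0'.

    For an unsigned value this test is always true, so the loop only
    stops because of the max_passes safety limit.
    """
    counter = start
    passes = 0
    while counter >= 0 and passes < max_passes:
        passes += 1
        counter = dec_dword(counter)
    return passes, counter


print(hex(dec_dword(0)))            # 0xffffffff
print(countdown_passes(200, 1000))  # (1000, 4294966496)
```

With a signed LONG the counter reaches -1 after the pass for 0 and the JGE from Example 4 ends the loop.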

Note that for these loops the optimization is the same whether you declare
REGISTER R01 AS LONG or
REGISTER R01 AS DWORD

It makes no difference to the generated code here.

EXAMPLE 6:
What definitely makes a difference is using EXTENDED (a floating-point datatype, here declared as REGISTER):

REGISTER E01 AS EXTENDED
FOR E01=1 TO 200
    !NOP
NEXT   

4023D6  FILD INTEGER PTR [00406730]
4023DC  FSTP EXT (TBYTE) PTR [EBP+FFFFFF2C]
4023E2  FLD1
4023E4  FSTP EXT (TBYTE) PTR [EBP+FFFFFF38]
4023EA  FLD1
4023EC  FSTP ST, ST(1)
4023EE  NOP
4023EF  FLD EXT (TBYTE) PTR [EBP+FFFFFF38]
4023F5  FLD ST, ST(1)
4023F7  FADDP ST(1), ST
4023F9  FSTP ST, ST(1)
4023FB  FLD ST, ST(0)
4023FD  FLD EXT (TBYTE) PTR [EBP+FFFFFF2C]
402403  FCOMPP
402405  FNSTSW AX
402407  SAHF
402408  JNB SHORT L4023EE


There is not much to say about this, other than that your first 4 EXTENDED variables get an automatic REGISTER assignment, even if they are only declared with LOCAL instead of REGISTER.

EXAMPLE 6 (continued):
That is, unless you specify an explicit #REGISTER NONE. As expected, #REGISTER NONE does not prevent you from explicitly declaring
REGISTER R01 AS LONG, R02 AS LONG as REGISTER variables.

For testing we do this:

REGISTER R01 AS DWORD,R02 AS DWORD
LOCAL E01,E02,E03,E04,E05,E06 AS EXTENDED
R01=R01
    !NOP
FOR E01=1 TO 200
    !NOP
    R01=R01
NEXT
! NOP   

' These 4 instructions make the REGISTER assignment for the EXTENDED variables.
' They are not generated if you do not assign REGISTERS to the EXTENDED variables.
4023D1  FLDZ
4023D3  FLDZ
4023D5  FLDZ
4023D7  FLDZ

4023D9  MOV EAX, ESI
4023DB  MOV ESI, EAX
4023DD  NOP
4023DE  FILD INTEGER PTR [00406730]
4023E4  FSTP EXT (TBYTE) PTR [EBP+FFFFFF14]
4023EA  FLD1
4023EC  FSTP EXT (TBYTE) PTR [EBP+FFFFFF20]
4023F2  FLD1
4023F4  FSTP ST, ST(1)
4023F6  NOP

' This is inside our Loop
4023F7  MOV EAX, ESI
4023F9  MOV ESI, EAX
' Until here

4023FB  FLD EXT (TBYTE) PTR [EBP+FFFFFF20]
402401  FLD ST, ST(1)
402403  FADDP ST(1), ST
402405  FSTP ST, ST(1)
402407  FLD ST, ST(0)
402409  FLD EXT (TBYTE) PTR [EBP+FFFFFF14]
40240F  FCOMPP
402411  FNSTSW AX
402413  SAHF
402414  JNB SHORT L4023F6
402416  NOP

' These 4 instructions reverse the REGISTER assignment for the EXTENDED variables.
' They are not generated if you do not assign REGISTERS to the EXTENDED variables.
402417  FSTP ST, ST(0)
402419  FSTP ST, ST(0)
40241B  FSTP ST, ST(0)
40241D  FSTP ST, ST(0)


EXAMPLE 7:
Now let's take a look at QUAD loops.

LOCAL E01,E02,E03,E04,E05,E06 AS QUAD
FOR E01=1 TO 200
    !NOP
    R01=R01
NEXT 

' becomes

4023D6  FILD INTEGER PTR [00406730]
4023DC  FISTP QUAD PTR [EBP+FFFFFF34]
4023E2  FLD1
4023E4  FISTP QUAD PTR [EBP+FFFFFF3C]
4023EA  FLD1
4023EC  FISTP QUAD PTR [EBP+FFFFFF6C]
'-----------------------------------------------
' Here we're inside the Loop
'-----------------------------------------------
4023F2  NOP
4023F3  MOV EAX, ESI
4023F5  MOV ESI, EAX
'-----------------------------------------------
' Here the Loop-Counter Quad is incremented
'-----------------------------------------------
4023F7  FILD QUAD PTR [EBP+FFFFFF3C]
4023FD  FILD QUAD PTR [EBP+FFFFFF6C]
402403  FADDP ST(1), ST
402405  FISTP QUAD PTR [EBP+FFFFFF6C]
'-----------------------------------------------
' The QUADs are loaded onto the FPU stack with FILD (64-bit integer load) and then compared as floating point
'-----------------------------------------------
40240B  FILD QUAD PTR [EBP+FFFFFF6C]
402411  FILD QUAD PTR [EBP+FFFFFF34]
402417  FCOMPP
402419  FNSTSW AX
40241B  SAHF
40241C  JNB SHORT L4023F2


What we see here is that, as expected, on a 32-bit system QUADs are handled less efficiently
than 32-bit LONGs. No REGISTER allocation is possible for QUADs.


EXAMPLE 8:
Let's take a look at DOUBLE loops. If you have followed me this far, there should not be many surprises any more.

LOCAL E01,E02,E03,E04,E05,E06 AS DOUBLE
R01=R01
    !NOP
FOR E01=1 TO 200
    !NOP
    R01=R01
NEXT


4023D6  FILD INTEGER PTR [00406730]
4023DC  FSTP DOUBLE PTR [EBP+FFFFFF34]
4023E2  FLD1
4023E4  FSTP DOUBLE PTR [EBP+FFFFFF3C]
4023EA  FLD1
4023EC  FSTP DOUBLE PTR [EBP+FFFFFF6C]
'-----------------------------------------------
' Here we're inside the Loop
'-----------------------------------------------
4023F2  NOP
4023F3  MOV EAX, ESI
4023F5  MOV ESI, EAX
'-----------------------------------------------
' Here the Loop-Counter is incremented
'-----------------------------------------------
4023F7  FLD DOUBLE PTR [EBP+FFFFFF3C]
4023FD  FADD DOUBLE PTR [EBP+FFFFFF6C]
402403  FSTP DOUBLE PTR [EBP+FFFFFF6C]
402409  FLD DOUBLE PTR [EBP+FFFFFF6C]
40240F  FCOMP DOUBLE PTR [EBP+FFFFFF34]
402415  FNSTSW AX
402417  SAHF
402418  JBE SHORT L4023F2



EXAMPLE 9:
Let's try a BYTE loop.

LOCAL E01,E02,E03,E04,E05,E06 AS BYTE
LOCAL R01 AS LONG

R01=R01
    !NOP
FOR E01=1 TO 200
    !NOP
    R01=R01
NEXT 

4023D4  MOV DWORD PTR [EBP+FFFFFF78], DWORD 000000C8
4023DE  NOP
4023DF  MOV EAX, ESI
4023E1  MOV ESI, EAX
4023E3  DEC DWORD PTR [EBP+FFFFFF78]
4023E9  JNZ SHORT L4023DE
4023EB  NOP


What we see is that the LONG variable gets an automatic REGISTER assignment. And the loop direction is reversed again, because the loop counter was not used inside the loop.
This happens with the BYTE, WORD, INTEGER, LONG and DWORD datatypes.

EXAMPLE 10:
WORD and INTEGER

Let's start with WORD. A WORD cannot be assigned to a REGISTER, so it is at a disadvantage against a LONG, as is an INTEGER.
The loop code is not much different from the loop code for a LONG, so in these cases you gain nothing by choosing an INTEGER or WORD over a LONG, even if you do not need all the bits.

LOCAL E01,E02,E03,E04,E05,E06 AS WORD
REGISTER R01 AS LONG   

R01=R01
    !NOP
FOR E01=1 TO 200
    !NOP
    R01=E01
NEXT       

' will become:

4023D6  MOV EDI, WORD 0001
4023DB  NOP
4023DC  MOVZX EAX,EDI
4023DF  MOV ESI, EAX
4023E1  INC EDI
4023E4  CMP EDI, WORD 00C8
4023E9  JBE SHORT L4023DB


And finally INTEGER (signed 16-bit). The BASIC code is the same as before, except that we have changed the WORD to INTEGER.

4023D5  NOP
4023D6  MOV EDI, WORD 0001
4023DB  NOP
4023DC  XADD EDI, EAX
4023DF  MOV ESI, EAX
4023E1  INC EDI
4023E4  CMP EDI, WORD 00C8
4023E9  JLE SHORT L4023DB
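
The only difference in the loop test between the two listings is JBE (Jump if Below or Equal, an unsigned compare) for the WORD and JLE (Jump if Less or Equal, a signed compare) for the INTEGER. The sketch below, in Python rather than PowerBASIC, shows how the same 16 bits compare differently under the two readings. The helper names are made up for illustration.

```python
def as_word(bits):
    """Read 16 bits as an unsigned WORD (0..65535)."""
    return bits & 0xFFFF


def as_integer(bits):
    """Read 16 bits as a signed INTEGER (-32768..32767)."""
    bits &= 0xFFFF
    return bits - 0x10000 if bits & 0x8000 else bits


def jbe(a_bits, b_bits):
    """Condition tested by CMP a, b / JBE: unsigned a <= b."""
    return as_word(a_bits) <= as_word(b_bits)


def jle(a_bits, b_bits):
    """Condition tested by CMP a, b / JLE: signed a <= b."""
    return as_integer(a_bits) <= as_integer(b_bits)


print(jbe(0xFFFF, 0x00C8))  # False: 65535 is above 200
print(jle(0xFFFF, 0x00C8))  # True: -1 is less than 200
```

For the counter values 1 to 200 used here both tests give the same answer. They only disagree once the top bit of the 16-bit value is set.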


That's where we end ... but about loops, there is more.