• Welcome to Jose's Read Only Forum 2023.
 

(Optimization) Speed up your ABS()

Started by Theo Gottwald, August 28, 2007, 06:21:46 PM


Theo Gottwald

Often in code you need to be sure that your results are positive.
The easiest way is to use the ABS() function.

Example:

LOCAL R01, R02 AS LONG
R02=-5
R01=ABS(R02)
PRINT R01


You get 5 as the result.

ABS always gives you a non-negative result by removing the negative sign if there is one.

If we look at the disassembly, we see that PB does not make much of a distinction between the data types we use.

If R02 is a LONG, even one held in a REGISTER variable, we get this:

4023EF    MOV DWORD PTR [EBP-5C], EDI
4023F2    FILD LONG PTR [EBP-5C]
4023F5    FABS
4023F7    FISTP QUAD PTR [EBP-5C]
4023FA    MOV EDI, DWORD PTR [EBP-5C]


If R02 is a SINGLE we get:

4023F0    FLD SINGLE PTR [EBP+FFFFFF70]
4023F6    FABS
4023F8    FSTP SINGLE PTR [EBP+FFFFFF70]


The compiler wants to use the FABS mnemonic here, which may be a good idea for floating point,
but the job can be done faster for LONG and INTEGER. We'll take a look at LONG here.

Suppose you want to change the sign of a LONG variable. You can use this:

R01=-R01

The disassembly shows that the compiler knows how to do this fast:


402408    MOV EAX, ESI  ' Transfer our REGISTER VARIABLE into EAX
40240A    NEG  EAX
40240C    MOV ESI, EAX ' Transfer EAX back into our REGISTER VARIABLE


If you instead write

! NEG R01

you save another two instructions. We see in many places that, for technical reasons, the compiler tries to do its work in EAX,
even if we already use REGISTER variables. Anyway, this won't eat a lot of CPU cycles, as these instructions are not floating-point mnemonics.

What if we want to do it like ABS()?
We only want to change the sign of the variable if it is negative.

Let's code it in BASIC:
IF SGN(R01)=-1 THEN R01=-R01

What we get is this:


4023F0    MOV EAX, DWORD FFFFFFFF
4023F5    MOV DWORD PTR [EBP-5C], ESI
4023F8    FILD LONG PTR [EBP-5C]
4023FB    MOV DWORD PTR [EBP-2C], EAX
4023FE    CALL L4038D2
402403    CMP EAX, DWORD PTR [EBP-2C]
402406    JNZ SHORT L40240E
402408    MOV EAX, ESI
40240A    NEG  EAX
40240C    MOV ESI, EAX
.........
4038D2    FTST
4038D4    WAIT: FNSTSW AX
4038D7    FSTP ST, ST(0)
4038D9    SAHF
4038DA    MOV EAX, DWORD 00000000
4038DF    JB  SHORT L4038E4
4038E1    JNZ SHORT L4038E6
4038E3    RET NEAR

4038E4    DEC  EAX
4038E5    RET NEAR

4038E6    INC  EAX
4038E7    RET NEAR



That's not really faster than ABS(). Coding it in BASIC won't help us speed it up.
The problem is that SGN() also uses floating-point mnemonics, which would not be necessary here because we have the data type LONG.

In the end, the sign of an integer variable (LONG) is just one bit, at the left end.
What we do is a "Bit Test" (BT), which tests the sign bit; for LONG variables this is bit 31.
BT copies that bit into the so-called "Carry Flag", so the flag is set if the value was negative.
The next instruction is JNC, "Jump on No Carry". This way we avoid doing anything if the value is already positive (or zero).

MACRO ABS_LNG(P1)
MACROTEMP POS
! BT P1,31
! JNC POS
! NEG P1
POS:
END MACRO 


This way we get an alternative solution, which only works for variables of type LONG and is best used on REGISTER variables.
But in return it is faster.
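To show how the macro above might be used, here is a minimal sketch of a complete program. It assumes PB/CC (because of PRINT and WAITKEY$) and repeats the macro so that it compiles on its own; the test values are only illustrations.

```powerbasic
#COMPILE EXE
#DIM ALL

' The macro from the post, repeated here so the example is self-contained.
MACRO ABS_LNG(P1)
  MACROTEMP POS
  ! BT P1, 31      ' copy the sign bit (bit 31) into the Carry Flag
  ! JNC POS        ' carry clear: value is not negative, skip the NEG
  ! NEG P1         ' carry set: value is negative, flip the sign
POS:
END MACRO

FUNCTION PBMAIN () AS LONG
  REGISTER R01 AS LONG   ' REGISTER variable, as recommended in the post

  R01 = -5
  ABS_LNG(R01)           ' negative input: the sign is flipped
  PRINT R01

  R01 = 7
  ABS_LNG(R01)           ' positive input: the NEG is skipped
  PRINT R01

  WAITKEY$               ' keep the console window open
END FUNCTION
```

Both calls should leave a positive value in R01. One edge case to keep in mind: for -2147483648 the NEG instruction returns the same value, because +2147483648 does not fit into a LONG.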


Kent Sarikaya

Theo, very interesting post/article! Whenever I use absolute and modulus commands in a program, I feel like I am writing or working on some cool code, above the average and it makes me feel good. Now I realize my level of feeling good is at a kindergarten level when I see this sort of well thought out evaluation. You guys on this forum are all professors, I appreciate it a lot!

Theo Gottwald

It is even easier to understand if you know under which conditions floating-point mnemonics use more CPU cycles than the normal integer mnemonics.

A closer look at this would fit into Donald's or Charles' ASM forum.
But it is worth a look if you want to squeeze the last CPU cycle out of your code.

Please note that I am talking about "inner loops" here, in really time-critical programs.
In 99.9% of all cases, where ABS is not called a million or more times per second,
you won't notice any difference from these small changes.

When it is in a subprogram (like mine) that has to do a lot of things while the user is waiting,
every small piece of the puzzle is important.
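Whether the difference matters in a given inner loop is easy to measure. The following is only a rough sketch, assuming PB/CC and the ABS_LNG macro from the first post; the loop count is an arbitrary choice, and the timings will depend on the machine and compiler version.

```powerbasic
#COMPILE EXE
#DIM ALL

' ABS_LNG as posted earlier in this thread.
MACRO ABS_LNG(P1)
  MACROTEMP POS
  ! BT P1, 31
  ! JNC POS
  ! NEG P1
POS:
END MACRO

FUNCTION PBMAIN () AS LONG
  REGISTER R01 AS LONG
  REGISTER I AS LONG
  LOCAL T AS DOUBLE

  ' Loop 1: the built-in ABS()
  T = TIMER
  FOR I = 1 TO 100000000     ' arbitrary loop count, adjust as needed
    R01 = -I
    R01 = ABS(R01)
  NEXT
  PRINT "ABS()   seconds: "; TIMER - T

  ' Loop 2: the macro version
  T = TIMER
  FOR I = 1 TO 100000000
    R01 = -I
    ABS_LNG(R01)
  NEXT
  PRINT "ABS_LNG seconds: "; TIMER - T

  WAITKEY$
END FUNCTION
```

TIMER only has a coarse resolution, so the loop count should be large enough that each loop runs for at least a second or so.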

Petr Schreiber

Hi Theo,

is there any chance "Theo's Advisor" source checker would appear for PB/WIN :)
It would mark parts (line numbers?) of the source which can be optimized better and suggest a solution based on knowledge from already posted workarounds and tweaks.


Bye,
Petr
AMD Sempron 3400+ | 1GB RAM @ 533MHz | GeForce 6200 / GeForce 9500GT | 32bit Windows XP SP3

psch.thinbasic.com

Edwin Knoppert

Wow cool, assembly put in a macro, good call!
:)

Theo Gottwald

Quote: is there any chance "Theo's Advisor" source checker would appear for PB/WIN

Petr,
knowing the PB/Win market, I am quite sure I could not earn a potato with it .-) so it would have to be freeware.

The chances are high if someone else makes it; if you wait for me, the chances are really low :-).
But just take a look at ABS and, for example, SGN().

If you take the ABS code I wrote, you can use the first two instructions to make a faster (much faster) version of SGN().

My hope is that the compiler itself will do these optimizations in a future version, because I think there is no disadvantage in these cases; the compiler just needs to check whether there is a LONG inside the SGN().
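As an illustration of the SGN() idea, here is one possible sketch. The macro name SGN_LNG and its second parameter PRES (the variable that receives the result) are hypothetical and not from the posts above. It starts with the same BT test as ABS_LNG and works only for LONG variables; EAX and EDX are used as scratch registers.

```powerbasic
' Hypothetical macro: PRES receives -1, 0 or 1 depending on the sign of P1.
MACRO SGN_LNG(P1, PRES)
  MACROTEMP DONE
  ! MOV EDX, P1      ' load the value once
  ! MOV EAX, -1      ' assume the value is negative
  ! BT EDX, 31       ' sign bit (bit 31) goes into the Carry Flag
  ! JC DONE          ' carry set: negative, keep -1
  ! XOR EAX, EAX     ' not negative: result is 0 so far
  ! TEST EDX, EDX    ' is the value zero?
  ! JZ DONE          ' yes: keep 0
  ! INC EAX          ' no: the value is positive, result is 1
DONE:
  ! MOV PRES, EAX    ' store the result
END MACRO
```

A call would look like SGN_LNG(R01, R02) with both variables declared AS LONG. No floating-point instruction is involved, which is the point of the exercise.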