• Welcome to Jose's Read Only Forum 2023.
 

(Optimization) Faster MIN/MAX for LONG and DWORD

Started by Theo Gottwald, January 02, 2007, 09:40:49 AM



Theo Gottwald

Based on Ian's version from the discussion here (Post), I've made some MIN/MAX macros.

They are not for "daily use", as they may need more memory than the normal code generated by the compiler.
The advantage of these ASM versions is that they avoid conditional jumps.

As Ian says: "... they can be used to replace statements like 'If x <= 0 Then X = 1'. They are particularly useful in pixel-based graphic manipulation, where a single, often-used function call may use 30,000,000 of them."
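As a sketch of Ian's example, here is a C translation (C rather than PowerBASIC, only so it can be compiled and checked anywhere) of 'If x <= 0 Then x = 1' for a signed 32-bit value. For integers this is the same as MAX(x, 1), so it reuses the mask idea of A_MAXL2 with P1 = x and P2 = 1. The name at_least_one is hypothetical, and whether a C compiler emits SETcc or a conditional jump for the comparison is up to the compiler.

```c
#include <stdint.h>

/* Hypothetical helper: branchless "If x <= 0 Then x = 1", i.e. MAX(x, 1).
   Unsigned arithmetic wraps like the 32-bit register in the ASM macros. */
static int32_t at_least_one(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t mask = (uint32_t)(1 < x) - 1u;     /* 0 if x > 1, else all ones   */
    return (int32_t)(ux + ((1u - ux) & mask));  /* x, or x + (1 - x) = 1       */
}
```

For example, at_least_one(-5) and at_least_one(0) both give 1, while at_least_one(42) gives 42.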

This is a good example of the ASM "SETx" mnemonics and of how to code to avoid conditional jumps, which can slow down otherwise fast code (especially when the CPU mispredicts them).
The code is attached as an include file and can be downloaded.
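How the tst step works: SUB EDX, EAX leaves the difference between the new value and the result so far in EDX and sets the flags; SETL (or SETGE, SETB, SETAE) writes 0 or 1 into CL; SUB ECX, 1 turns that into a mask of 0 or &HFFFFFFFF; AND keeps or clears the difference; and ADD yields either EAX unchanged or EAX + (EDX - EAX) = EDX. A minimal C sketch of the signed MAX and MIN steps follows, one statement per instruction. The names max_l2 and min_l2 are hypothetical; unsigned arithmetic is used so the subtraction wraps like the CPU register, and the final cast back to int32_t assumes the usual two's-complement conversion.

```c
#include <stdint.h>

/* Signed MAX step, one statement per instruction of the ASM "tst" body. */
static int32_t max_l2(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t diff = ub - ua;            /* sub edx, eax: wraps like the register   */
    uint32_t mask = (uint32_t)(b < a);  /* xor ecx, ecx + setl cl: 1 if b < a      */
    mask -= 1u;                         /* sub ecx, 1: 0 or 0xFFFFFFFF             */
    diff &= mask;                       /* and edx, ecx: keep or clear difference  */
    return (int32_t)(ua + diff);        /* add eax, edx: a, or a + (b - a) = b     */
}

/* Signed MIN step: identical except SETGE replaces SETL. */
static int32_t min_l2(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a;
    uint32_t ub = (uint32_t)b;
    uint32_t mask = (uint32_t)(b >= a) - 1u;  /* setge cl / sub ecx, 1 */
    return (int32_t)(ua + ((ub - ua) & mask));
}
```

Chaining the step, as A_MAXL5 does with its CALLs, amounts to max_l2(max_l2(max_l2(max_l2(p1, p2), p3), p4), p5).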


'--------------------------------------------------
' signed LONG only: ERG = MAX(P1..P5); MACROTEMP keeps tst unique per use
MACRO A_MAXL5(ERG,P1,P2,P3,P4,P5)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov edx, P4
! call tst
! mov edx, P5
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setl cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' signed LONG only: ERG = MAX(P1..P4); MACROTEMP keeps tst unique per use
MACRO A_MAXL4(ERG,P1,P2,P3,P4)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov edx, P4
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setl cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' signed LONG only: ERG = MAX(P1..P3); MACROTEMP keeps tst unique per use
MACRO A_MAXL3(ERG,P1,P2,P3)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setl cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
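' signed LONG only: ERG = MAX(P1, P2)
MACRO A_MAXL2(ERG,P1,P2)
! mov eax, P1
! mov edx, P2
! xor ecx, ecx
' EDX = P2 - P1, flags compare P2 with P1
! sub edx, eax
' CL = 1 if P2 < P1 (signed), else 0
! setl cl
' ECX = 0 if P2 < P1, else &HFFFFFFFF
! sub ecx, 1
' EAX = P1 + ((P2 - P1) AND ECX) = MAX(P1, P2)
! and edx, ecx
! add eax, edx
! mov ERG, eax
END MACRO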
'-------------------------------------------------------------------------------------------
' signed LONG only: ERG = MIN(P1..P5); MACROTEMP keeps tst unique per use
MACRO A_MINL5(ERG,P1,P2,P3,P4,P5)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov edx, P4
! call tst
! mov edx, P5
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setge cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' signed LONG only: ERG = MIN(P1..P4); MACROTEMP keeps tst unique per use
MACRO A_MINL4(ERG,P1,P2,P3,P4)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov edx, P4
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setge cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' signed LONG only: ERG = MIN(P1..P3); MACROTEMP keeps tst unique per use
MACRO A_MINL3(ERG,P1,P2,P3)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setge cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' signed LONG only: ERG = MIN(P1, P2)
MACRO A_MINL2(ERG,P1,P2)
! mov eax, P1
! mov edx, P2
! xor ecx, ecx
! sub edx, eax
! setge cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
! mov ERG, eax
END MACRO
'--------------------------------------------------
' unsigned DWORD only: ERG = MAX(P1..P5); MACROTEMP keeps tst unique per use
MACRO A_MAXD5(ERG,P1,P2,P3,P4,P5)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov edx, P4
! call tst
! mov edx, P5
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setb cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' unsigned DWORD only: ERG = MAX(P1..P4); MACROTEMP keeps tst unique per use
MACRO A_MAXD4(ERG,P1,P2,P3,P4)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov edx, P4
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setb cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' unsigned DWORD only: ERG = MAX(P1..P3); MACROTEMP keeps tst unique per use
MACRO A_MAXD3(ERG,P1,P2,P3)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setb cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
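' unsigned DWORD only: ERG = MAX(P1, P2)
MACRO A_MAXD2(ERG,P1,P2)
! mov eax, P1
! mov edx, P2
! xor ecx, ecx
! sub edx, eax
' CL = 1 if P2 < P1 (unsigned, carry flag), else 0
! setb cl
' ECX = 0 if P2 < P1, else &HFFFFFFFF
! sub ecx, 1
! and edx, ecx
! add eax, edx
! mov ERG, eax
END MACRO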
'-------------------------------------------------------------------------------------------
' unsigned DWORD only: ERG = MIN(P1..P5); MACROTEMP keeps tst unique per use
MACRO A_MIND5(ERG,P1,P2,P3,P4,P5)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov edx, P4
! call tst
! mov edx, P5
! call tst
! mov ERG, eax

EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setae cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' unsigned DWORD only: ERG = MIN(P1..P4); MACROTEMP keeps tst unique per use
MACRO A_MIND4(ERG,P1,P2,P3,P4)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov edx, P4
! call tst
! mov ERG, eax

EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setae cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' unsigned DWORD only: ERG = MIN(P1..P3); MACROTEMP keeps tst unique per use
MACRO A_MIND3(ERG,P1,P2,P3)
MACROTEMP tst
! mov eax, P1
! mov edx, P2
! call tst
! mov edx, P3
! call tst
! mov ERG, eax
EXIT MACRO
tst:
! xor ecx, ecx
! sub edx, eax
! setae cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
!ret
END MACRO
'--------------------------------------------------
' unsigned DWORD only: ERG = MIN(P1, P2)
MACRO A_MIND2(ERG,P1,P2)
! mov eax, P1
! mov edx, P2
! xor ecx, ecx
! sub edx, eax
! setae cl
! sub ecx, 1
! and edx, ecx
! add eax, edx
! mov ERG, eax
END MACRO
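The "signed only" and "unsigned only" comments matter because the same 32-bit pattern compares differently: &HFFFFFFFF is -1 as a LONG but 4294967295 as a DWORD. The L and D macros differ only in the SETx condition (SETL/SETGE test the sign and overflow flags, SETB/SETAE test the carry flag). A small C sketch of the two MAX steps makes the difference visible; max_long2 and max_dword2 are hypothetical names.

```c
#include <stdint.h>

/* Signed step, like A_MAXL2 (SETL): compares as LONG. */
static int32_t max_long2(int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)(b < a) - 1u;  /* 0 if b < a, else all ones */
    return (int32_t)((uint32_t)a + (((uint32_t)b - (uint32_t)a) & mask));
}

/* Unsigned step, like A_MAXD2 (SETB): compares as DWORD. */
static uint32_t max_dword2(uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)(b < a) - 1u;  /* 0 if b < a, else all ones */
    return a + ((b - a) & mask);
}
```

With the same bit patterns, max_long2(-1, 1) returns 1, while max_dword2(0xFFFFFFFF, 1) returns 0xFFFFFFFF.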

Donald Darden

Call instructions take over twice the clock cycles of a conditional jump, and in some cases involving RET instructions the time required is even longer. The net result is that by trying to avoid inline conditional jumps, you have created code that will be even slower. You also have to consider whether your alternate method requires more instructions, as those also take up time.
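One way to drop the CALL/RET overhead while still avoiding conditional jumps is to repeat the tst body inline in place of each `! call tst`, which is what the two-argument macros already do. The C sketch below shows the same idea for five values; max_step and max_l5 are hypothetical names, and `static inline` only invites the compiler to paste the step in place, it does not force it. Whether this beats a plain branching version still has to be measured on the target CPU.

```c
#include <stdint.h>

/* One branchless MAX step (the body of "tst"), meant to be inlined. */
static inline int32_t max_step(int32_t a, int32_t b)
{
    uint32_t mask = (uint32_t)(b < a) - 1u;  /* setl cl / sub ecx, 1 */
    return (int32_t)((uint32_t)a + (((uint32_t)b - (uint32_t)a) & mask));
}

/* Same result as A_MAXL5, with each step expanded in place. */
static int32_t max_l5(int32_t p1, int32_t p2, int32_t p3, int32_t p4, int32_t p5)
{
    int32_t m = max_step(p1, p2);
    m = max_step(m, p3);
    m = max_step(m, p4);
    return max_step(m, p5);
}
```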

Execution time under Windows is harder to determine than under DOS, because many activities are going on at once within the Windows environment, and code (and registers) have to be swapped in and out in a continuing cycle so that everything that is supposed to be running gets some CPU time. The user also influences the time required through the chosen prioritization scheme, where the current focus is, how many and which other tasks are being performed at the same time, and the number of currently running processes and the time they require.

In other words, what you can achieve in terms of optimization is somewhat limited, because many of the factors involved are outside the scope of your program. Shaving a few msecs off here and there is not nearly as significant as you might think. Gone are the days when you had to make a lumbering PC attempt to fly. Now the problem is that we are loading our PCs down with many things to do at once, in an operating environment that makes it possible (though more difficult and involved) to write capable applications for a more critical and demanding public.

Theo Gottwald

Speed tests these days depend on the CPU architecture.
Did you know that the new Intel Core Duo has a small loop cache that can give very small loops a boost?

I tested on my A64, but even that is already yesterday's CPU architecture.
New CPUs need other optimizations.

Simple testing and timing of subprograms may or may not give meaningful results, depending on the cache architecture: if you test a subroutine 1000 times, it is in the cache from the second call on, which may not be the case in a normal program environment.
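A minimal way to see this effect is to time the first pass over a buffer separately from later passes over the same buffer, instead of reporting only the total for many repetitions. The sketch assumes standard C's clock(), whose resolution is coarse, so a real measurement needs a large buffer; timed_pass and clamp_min1 are hypothetical names, and clamp_min1 is the branchless MAX(x, 1) step used in the macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Branchless MAX(x, 1), the same mask trick as A_MAXL2. */
static int32_t clamp_min1(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t mask = (uint32_t)(1 < x) - 1u;
    return (int32_t)(ux + ((1u - ux) & mask));
}

/* Run one pass over buf and return the CPU seconds reported by clock(). */
static double timed_pass(int32_t *buf, size_t n)
{
    clock_t start = clock();
    for (size_t i = 0; i < n; i++)
        buf[i] = clamp_min1(buf[i]);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling timed_pass once on freshly filled data and again on the same buffer gives a cold and a warm figure; a large gap between them suggests that a repeated-call benchmark is mostly measuring warm caches.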

To that extent I agree with you that optimization efforts can be misleading in one way or another.
In this case, you need to read this post to see how we came up with this code:

http://www.powerbasic.com/support/forums/Forum4/HTML/013932-3.html