It's nice to do something really small for a change. Here are some pieces of assembler which will bed well into any PB program.
PI
DIM d AS DOUBLE
! fldpi
! fstp qword ptr d
MSGBOX STR$(d)
Radians to Degrees
DIM d AS DOUBLE
d=2
! fld qword ptr d
! push dword 180
! fild dword ptr [esp]
! add esp,4
! fldpi
! fdivp st(1),st(0)
! fmulp st(1),st(0)
! fstp qword ptr d
MSGBOX STR$(d)
Degrees to Radians
DIM d AS DOUBLE
d=60
! fld qword ptr d
! push dword 180
! fldpi
! fild dword ptr [esp]
! add esp,4
! fdivp st(1),st(0)
! fmulp st(1),st(0)
! fstp qword ptr d
MSGBOX STR$(d)
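Since the degrees-to-radians sequence is repeated in most of the snippets that follow, it could be wrapped once in a function. This is only a sketch and has not been tested: the name DegToRad is made up for illustration, and the result is returned straight from the FPU stack with fstp qword ptr function.

```powerbasic
#COMPILE EXE
#DIM ALL

' Hypothetical helper: degrees in, radians out (deg * pi / 180).
FUNCTION DegToRad(BYVAL deg AS DOUBLE) AS DOUBLE
  ! fld qword ptr deg        ' st0 = deg
  ! push dword 180
  ! fldpi                    ' st0 = pi, st1 = deg
  ! fild dword ptr [esp]     ' st0 = 180, st1 = pi, st2 = deg
  ! add esp,4
  ! fdivp st(1),st(0)        ' st0 = pi/180, st1 = deg
  ! fmulp st(1),st(0)        ' st0 = deg * pi/180
  ! fstp qword ptr function
END FUNCTION

FUNCTION PBMAIN () AS LONG
  MSGBOX STR$(DegToRad(60))  ' pi/3, roughly 1.0472
END FUNCTION
```

A function call costs more than the inline sequence, so this trades a little speed for less repetition.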
Cosine from degrees
DIM d AS DOUBLE
d=60
! fld qword ptr d
! push dword 180
! fldpi
! fild dword ptr [esp]
! add esp,4
! fdivp st(1),st(0)
! fmulp st(1),st(0)
! fcos
! fstp qword ptr d
MSGBOX STR$(d)
Sine from Degrees
DIM d AS DOUBLE
d=60
! fld qword ptr d
! push dword 180
! fldpi
! fild dword ptr [esp]
! add esp,4
! fdivp st(1),st(0)
! fmulp st(1),st(0)
! fsin
! fstp qword ptr d
MSGBOX STR$(d)
Tangent from Degrees
DIM d AS DOUBLE ' degrees
DIM t AS DOUBLE ' tangent
d=-44
! fld qword ptr d
! push dword 180
! fldpi
! fild dword ptr [esp]
! add esp,4
! fdivp st(1),st(0)
! fmulp st(1),st(0)
! fptan
! fcomp ' dump the 1.0 that fptan pushes on top of the tangent
! fstp qword ptr t
MSGBOX STR$(d)+$CR+STR$(t)
Sine and Cosine pair from Degrees
DIM d AS DOUBLE ' degrees
DIM c AS DOUBLE ' cosine
DIM s AS DOUBLE ' sine
d=60
! fld qword ptr d
! push dword 180
! fldpi
! fild dword ptr [esp]
! add esp,4
! fdivp st(1),st(0)
! fmulp st(1),st(0)
! fsincos
! fstp qword ptr c
! fstp qword ptr s
MSGBOX STR$(d)+$CR+STR$(s)+$CR+STR$(c)
Degrees from Sine Cosine Pair
DIM d AS DOUBLE ' degrees
DIM s AS DOUBLE ' sine or y coordinate
DIM c AS DOUBLE ' cosine or x coordinate
s=10
c=-10
! fld qword ptr s
! fld qword ptr c
! fpatan
! fldpi
! fdivp st(1),st(0)
! push dword 180
! fimul dword ptr [esp]
! add esp,4
! fstp qword ptr d
MSGBOX STR$(s)+$CR+STR$(c)+$CR+STR$(d)
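For checking the fpatan result against plain PB, here is a rough quadrant-aware equivalent built on ATN. It is a sketch only: the name Atan2Deg and the variable piVal are invented for the example, and it has not been tested.

```powerbasic
#COMPILE EXE
#DIM ALL

' Hypothetical reference version: angle in degrees of the point (x, y).
FUNCTION Atan2Deg(BYVAL y AS DOUBLE, BYVAL x AS DOUBLE) AS DOUBLE
  LOCAL piVal AS DOUBLE, a AS DOUBLE
  piVal = 4 * ATN(1)
  IF x > 0 THEN
    a = ATN(y / x)                 ' right half plane: ATN is enough
  ELSEIF x < 0 THEN
    IF y >= 0 THEN
      a = ATN(y / x) + piVal       ' upper left quadrant
    ELSE
      a = ATN(y / x) - piVal       ' lower left quadrant
    END IF
  ELSE
    a = SGN(y) * piVal / 2         ' on the y axis
  END IF
  FUNCTION = a * 180 / piVal
END FUNCTION

FUNCTION PBMAIN () AS LONG
  MSGBOX STR$(Atan2Deg(10, -10))   ' same inputs as above: s=10, c=-10 gives 135
END FUNCTION
```

All of that branching is what the single fpatan instruction replaces.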
Thanks Charles,
are these short assembly poems :) faster than native PB functions?
Thanks,
Petr
Chord from angle in degrees
DIM a AS DOUBLE
DIM c AS DOUBLE
a=60
! fld qword ptr a
! push dword 360
! fldpi
! fild dword ptr [esp]
! fdivp st(1),st(0)
! fmulp st(1),st(0)
! fsin
! mov dword ptr [esp],2
! fimul dword ptr [esp]
! add esp,4
! fstp qword ptr c
MSGBOX STR$(a)+$CR+STR$(c)
Distance of a point in space
DIM x AS DOUBLE ' x coordinate
DIM y AS DOUBLE ' y coordinate
DIM z AS DOUBLE ' z coordinate
DIM d AS DOUBLE ' distance
x=200
y=100
z=200
! fld qword ptr x
! fld st(0)
! fmulp st(1),st(0)
! fld qword ptr y
! fld st(0)
! fmulp st(1),st(0)
! faddp st(1),st(0)
! fld qword ptr z
! fld st(0)
! fmulp st(1),st(0)
! faddp st(1),st(0)
! fsqrt
! fstp qword ptr d
MSGBOX STR$(x)+$CR+STR$(y)+$CR+STR$(z)+$CR+STR$(d)
Phi, the Golden Ratio / Fibonacci number
Calculation: (sqr(5)-1)/2, which comes to 0.618..., the reciprocal of the golden ratio 1.618...
DIM d AS DOUBLE ' phi
! push dword 5
! fild dword ptr [esp]
! add esp,4
! fsqrt
! fld1
! fsubp st(1),st(0)
! fld1
! fld1
! faddp st(1),st(0)
! fdivp st(1),st(0)
! fstp qword ptr d
MSGBOX STR$(d)
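A quick plain-PB cross-check of the value, as a sketch (the variable names phiSmall and phiBig are just for illustration). It also shows how the 0.618... form relates to the 1.618... form: their product is 1.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
  LOCAL phiSmall AS DOUBLE, phiBig AS DOUBLE
  phiSmall = (SQR(5) - 1) / 2   ' 0.618..., what the assembler above computes
  phiBig   = (SQR(5) + 1) / 2   ' 1.618..., the golden ratio itself
  ' The third line shown should be 1 (give or take rounding).
  MSGBOX STR$(phiSmall) + $CR + STR$(phiBig) + $CR + STR$(phiSmall * phiBig)
END FUNCTION
```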
Yes Petr, these should certainly be faster than Compiled Basic since they combine several operations on the FPU stack. But the real gains to be had are in array processing where you can do all the iterations using registers for indexing, and keep intermediate stuff on the FPU stack for reuse. This cuts down reading variables from RAM.
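To illustrate the array-processing point, here is a minimal, untested sketch that sums the squares of an array of doubles. The pointer lives in ESI, the counter in EDX, and the running total never leaves the FPU stack until the end. The function name SumSquares and its parameters are made up for the example.

```powerbasic
#COMPILE EXE
#DIM ALL

' Hypothetical example: sum of squares of els doubles starting at firstEl.
FUNCTION SumSquares(BYREF firstEl AS DOUBLE, BYVAL els AS LONG) AS DOUBLE
  DIM p AS DOUBLE PTR
  p = VARPTR(firstEl)
  ! mov esi,p                ' esi walks through the array
  ! mov edx,els              ' edx counts down the remaining elements
  ! fldz                     ' st0 = running total
next1:
  ! dec edx
  ! jl done1
  ! fld qword ptr [esi]      ' st0 = x, st1 = total
  ! fmul st(0),st(0)         ' st0 = x*x
  ! faddp st(1),st(0)        ' total = total + x*x, pop
  ! add esi,8                ' next double
  ! jmp next1
done1:
  ! fstp qword ptr function
END FUNCTION

FUNCTION PBMAIN () AS LONG
  DIM a(2) AS DOUBLE
  a(0) = 1 : a(1) = 2 : a(2) = 3
  MSGBOX STR$(SumSquares(a(0), 3))   ' 1 + 4 + 9 = 14
END FUNCTION
```

The only memory reads inside the loop are the array elements themselves.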
Sounds good,
thanks a lot!
Petr
Thanks Charles,
Is this a good implementation?
FUNCTION ACOS( BYVAL rad AS DOUBLE ) AS DOUBLE ' ACOS = ASIN( SQR(1-rad^2) )
IF rad >= 1 THEN FUNCTION = 0 : EXIT FUNCTION
IF rad <= -1 THEN FUNCTION = -1 : EXIT FUNCTION
' FUNCTION = pi/2 - ATN(rad / SQR(1 - rad * rad))
! FINIT
! FLD rad
! FST ST(1)
! FMUL
! FST ST(1)
! FLD1
! FSUBR
! FSQRT
! FST rad
! FLD rad
! FST ST(1)
! FMUL
! FST ST(1)
! FLD1
! FSUBR
! FSQRT
! FDIVR rad
! FST ST(1)
! FLD1
! FPATAN
! FST FUNCTION
END FUNCTION
Thanks Mike,
This is my best effort for ACOS, exploiting the FPU stack to maximum advantage. You will find it gives sensible results with weird values like -1, which yields pi. Anything less than -1 gives 0.
FUNCTION acos(BYVAL v AS DOUBLE) AS DOUBLE
! fld qword ptr v
! fld1
! fld st(1)
! fmul st(0),st(0)
! fsubp st(1),st(0)
! fsqrt
! fxch st(1)
! fpatan
! fstp qword ptr function
END FUNCTION
And here is ASIN
(Omitting the fxch instruction)
FUNCTION asin(BYVAL v AS DOUBLE) AS DOUBLE
! fld qword ptr v
! fld1
! fld st(1)
! fmul st(0),st(0)
! fsubp st(1),st(0)
! fsqrt
! fpatan
! fstp qword ptr function
END FUNCTION
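A small harness for comparing both functions with the classic ATN formulas might look like this. It is a sketch that assumes it is compiled in the same source file as the acos and asin functions above; the variable names are only for illustration, and the ATN formulas are valid for values strictly between -1 and 1.

```powerbasic
FUNCTION PBMAIN () AS LONG
  LOCAL v AS DOUBLE, chkAcos AS DOUBLE, chkAsin AS DOUBLE
  v = 0.5
  chkAsin = ATN(v / SQR(1 - v * v))      ' asin(0.5) = pi/6, roughly 0.5236
  chkAcos = 2 * ATN(1) - chkAsin         ' acos(0.5) = pi/3, roughly 1.0472
  MSGBOX "acos:" + STR$(acos(v)) + " check:" + STR$(chkAcos) + $CR + _
         "asin:" + STR$(asin(v)) + " check:" + STR$(chkAsin)
END FUNCTION
```

The assembler versions work because fpatan takes two arguments: with sqr(1-v*v) and v as the pair, swapping which one is on top of the stack switches between the arcsine and the arccosine.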
I listen to this podcast by Steve Gibson and Leo Laporte, and know that Steve Gibson programs only in assembly. I thought you guys might be interested in some of the links he provides and some programs he wrote.
http://www.grc.com/smgassembly.htm
It would be neat to have a speed test. Take optimized assembly code that, let's say, counts to one million, record the start and end times, and display how long it took. Then do the same thing in PowerBASIC, C and other languages to see the differences in performance.
I would have loved to have seen his ChromoZone Demo, but as he says, Windows NT based systems won't support it and that includes XP. But his AM puzzle works fine.
As an experiment I wrote part of the wndproc of Viewer5 in assembler to see what it was like. The coding was very straightforward, but I came to the view that there would be little advantage over the compiler. Assembler comes into its own when complete algorithms can be executed within the CPU registers, minimising accesses to RAM.
In these cases, it is inevitable that the compiler will be outperformed. But taking measurements is not as easy as it sounds. The main problem is that the operating system is always multitasking in the background, so the timer, measuring either absolute time or CPU clocks, always comes back with a different result. Furthermore the CPUs all have slightly different architectures, and varying ability to execute blocks of instructions in parallel (at the opcode level), and that is before cache sizes and multiple cores are taken into consideration.
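One crude way to live with that jitter is to run the same loop several times and keep the fastest time, on the reasoning that background activity can only make a run slower. The sketch below is untested and makes a few assumptions: it relies on PB's TIMER function (seconds since midnight), it ignores a run that crosses midnight, and it counts to 100 million rather than one million so that the loop lasts long enough to register.

```powerbasic
#COMPILE EXE
#DIM ALL

FUNCTION PBMAIN () AS LONG
  LOCAL i AS LONG, rep AS LONG, n AS LONG
  LOCAL t0 AS DOUBLE, t AS DOUBLE, best AS DOUBLE
  best = 1E30                       ' larger than any real timing
  FOR rep = 1 TO 5
    t0 = TIMER
    n = 0
    FOR i = 1 TO 100000000          ' the code under test goes here
      n = n + 1
    NEXT
    t = TIMER - t0
    IF t < best THEN best = t       ' keep the least disturbed run
  NEXT
  MSGBOX "Best of 5 runs:" + STR$(best) + " s" + $CR + "Count:" + STR$(n)
END FUNCTION
```

The same frame can wrap an assembler loop, so both versions are measured in the same way.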
So I think the best strategy is to take a theoretical count of the CPU clocks for each instruction, and arrange instructions that are likely to be run in parallel so they can do so. For example, you can set the FPU to work calculating an arctangent which may take 100s of clocks and get on with some other stuff in the CPU. You know that the FPU will not require any other resources while it is performing this calculation. But the CPU itself is smart enough to know that while it is doing things with say the EAX and ECX registers, it can do other things on the EDX, EBX and other registers in parallel for as long as one group does not depend on the other.
Multiple levels of parallelism have been developed to the extreme, as a result of the rivalry between Intel and AMD and the demands of the ever resource-hungry MS operating systems.
Steve's Gibson Research Corporation (www.grc.com) is well known for its very useful Shields Up! tool (https://www.grc.com/x/ne.dll?bh0bkyd2), but there are a lot of other useful tools there.
Theo, I have tried this at home (with my Opera browser) and all tests went fine.
Is this really a trustworthy site?
Wanted to try it at the office tomorrow.
Edwin, Steve Gibson is a security expert among his many talents. He offers many tools for free and does all he can to promote security on the web. He has warned Microsoft about potential problems in security before they happen and then his warnings proved true.
He also has a really nice password generator on his site, if you need very safe passwords.
Thanks Charles, for the explanations and code. I guess I won't worry about speed till it becomes an issue; so far that has not been the case.
I like Steve Gibson's development philosophy, which I am sure many of us share - Small is Beautiful!
Advocating the use of assembler goes along with this. What we need is a better methodology for using assembler on larger, multifaceted projects, while maintaining the accessibility of a high level language.
With assembler the big picture is quickly lost in the stream of instructions. Good annotation is the key, and in the long run, possibly superior to a high level language alone, since both intention and methodology can be freely expressed.
>Is this really a trustworthy site?
Edwin, I have used Shields Up! for several years.
It's one of those things I do on any newly installed PC.
If someone asked me for a trustworthy site, I might pick www.grc.com before Microsoft .-)).
Finally, as I do not know any of the site owners, I cannot swear that any site that is not my own is safe for sure.
After all, even TOR (the anonymizer) could be an affiliate of the NSA or of Google or both ... :-)
Charles,
In graphing, there is often a need to clamp values so they do not exceed known bounds. These functions must be called on every iteration, with the attendant loss of speed.
PB has functions for doing it, but my guess is this can be done better with ASM?
'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
FUNCTION ClampFloat( sVar AS SINGLE, Minimum AS SINGLE, Maximum AS SINGLE ) AS SINGLE
FUNCTION = MAX( Minimum, MIN( Maximum, sVar ) ) ' Clamp a FLOAT variable between Minimum & Maximum values
END FUNCTION
'¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤'
FUNCTION ClampInt( sVar AS LONG, Minimum AS LONG, Maximum AS LONG ) AS LONG
FUNCTION = MAX&( Minimum, MIN&( Maximum, sVar ) ) ' Clamp an INTEGER variable between Minimum & Maximum values
END FUNCTION
Used like this for example:
MinorColorIndex = CLNG( (maxIndexf * ClampFloat( (1.0 - 13*lineDensity), 0.0, 1.0) ) )
I could probably convert the code to utilize a MACRO function.
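A macro version might look something like the following. This is an untested sketch that assumes PB's single-line MACRO form; the name ClampM and its parameter names are made up for illustration. Because a macro is expanded in place, the function call overhead disappears, but the MAX/MIN work is still done every time.

```powerbasic
#COMPILE EXE
#DIM ALL

' Hypothetical single-line macro: expands in place, no call overhead.
MACRO ClampM(v, vMin, vMax) = MAX(vMin, MIN(vMax, v))

FUNCTION PBMAIN () AS LONG
  LOCAL x AS SINGLE
  x = 1.7
  ' 1.7 clamps to 1, -0.3 clamps to 0
  MSGBOX STR$(ClampM(x, 0.0, 1.0)) + $CR + STR$(ClampM(-0.3, 0.0, 1.0))
END FUNCTION
```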
The best way to speed things up would be to prepare the unclamped data set in an array, then clamp it in bulk using assembler. This would give you two kinds of efficiency: firstly traversing the array can be done without loading an indexing variable each time, secondly, the max and min values can be held in registers, further reducing RAM accesses.
PS: removed 2 colons in the wrong place.
A very literal example:
arrayptr=varptr(myarray)
! mov esi,arrayptr
! mov ecx,0
! mov edx,elements
! mov ebx,maxval
! mov eax,minval
do1:
! cmp ecx,edx
! jge edo1
! cmp [esi+ecx*4],eax
! jge eif1
! mov [esi+ecx*4],eax
eif1:
! cmp [esi+ecx*4],ebx
! jle eif2
! mov [esi+ecx*4],ebx
eif2:
! inc ecx
! jmp do1
edo1:
This can be further optimised by eliminating the SIB (scale-index-base) bytes, which are not hardcore silicon but microcoded.
arrayptr=varptr(myarray)
! mov esi,arrayptr
! mov edx,elements
! mov ebx,maxval
! mov eax,minval
do1:
! dec edx
! jl edo1
! cmp [esi],eax
! jge eif1
! mov [esi],eax
eif1:
! cmp [esi],ebx
! jle eif2
! mov [esi],ebx
eif2:
! add esi,4
! jmp do1
edo1:
OK.
So just to be clear, MyArray would be declared as:
DIM MyArray(n,3) as LONG/SINGLE
For i = 1 to n
MyArray(i,1) = Value To Clamp
MyArray(i,2) = Min Value
MyArray(i,3) = Max Value
Next
arrayptr=varptr(myarray)
call Clamp( arrayptr ) ' Clamp all n values in one call.
As I look at the different ways in which this function is used in my code, I realize I could take this approach in some instances, but in others the result is immediately used to calculate something else.
But that is not a problem, as I could just call the function with a one element array, right?
You just need a single element array, assuming the clamp values are the same for all the elements.
The function might look like this:
minmax( byref myarray(0) as long, byval elements as long, byval minval as long, byval maxval as long)
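Filled out into a complete program, the integer version could look roughly like this. It is an untested sketch that simply wraps the second loop above in a SUB with that parameter list; the test values in PBMAIN are arbitrary.

```powerbasic
#COMPILE EXE
#DIM ALL

' Clamp els LONG values, starting at aa, to the range minval..maxval.
SUB minmax(BYREF aa AS LONG, BYVAL els AS LONG, _
           BYVAL minval AS LONG, BYVAL maxval AS LONG)
  DIM arrayptr AS LONG PTR
  arrayptr = VARPTR(aa)
  ! mov esi,arrayptr
  ! mov edx,els
  ! mov ebx,maxval
  ! mov eax,minval
do1:
  ! dec edx
  ! jl edo1                  ' counter went below zero: finished
  ! cmp [esi],eax
  ! jge eif1
  ! mov [esi],eax            ' below minimum: store minimum
eif1:
  ! cmp [esi],ebx
  ! jle eif2
  ! mov [esi],ebx            ' above maximum: store maximum
eif2:
  ! add esi,4                ' next LONG
  ! jmp do1
edo1:
END SUB

FUNCTION PBMAIN () AS LONG
  DIM a(2) AS LONG
  a(0) = -7 : a(1) = 3 : a(2) = 12
  minmax(a(0), 3, 0, 10)
  MSGBOX STR$(a(0)) + $CR + STR$(a(1)) + $CR + STR$(a(2))   ' 0, 3, 10
END FUNCTION
```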
If this fits your program I can produce an equivalent floating point minmax. It is slightly more tricky.
>assuming the clamp values are the same for all the elements.
yes I see, perfect!
Thanks Charles.
Why are floating point versions more tricky?
Well, there's a bit of shuffling to do on the FPU stack, then you'll find that PB inline assembler does not support the direct comparator instructions FCOMI and FCOMIP, so opcodes have to be used for those. (Without them the older FCOM and FCOMP require a transfer of FPU flags into the CPU flag register.) Another important point is that these comparisons report their result in the carry and zero flags, as an unsigned integer compare does, not in the sign and overflow flags, so jae (jump above or equal) and jbe (jump below or equal) must be used instead of jge and jle.
Anyway here is how it works out: tested on an array of 2 elements.
#COMPILE EXE
#DIM ALL
SUB minmax(_
BYREF aa AS SINGLE,_
BYVAL els AS LONG,_
BYVAL minval AS SINGLE,_
BYVAL maxval AS SINGLE _
)
DIM arrayptr AS SINGLE PTR
arrayptr=VARPTR(aa)
! mov esi,arrayptr
! mov edx,els
! fld dword ptr maxval
! fld dword ptr minval
do1:
! dec edx
! jl edo1
! fld dword ptr [esi]
' fcomip st(0),st(1) : compare value with minval and pop
! db &hdf,&hf1
! jae eif1
! fst dword ptr [esi]
eif1:
! fld dword ptr [esi]
' fcomip st(0),st(2) : compare value with maxval and pop
! db &hdf,&hf2
! jbe eif2
! fxch
! fst dword ptr [esi]
! fxch
eif2:
! add esi,4
! jmp do1
edo1:
! fcomp st(0) ' dump
! fcomp st(0) ' dump
END SUB
FUNCTION PBMAIN () AS LONG
DIM a(1) AS SINGLE
a(0)=-1: a(1)=1
minmax(a(0), 2 , -.5, .5 )
MSGBOX STR$ (a(0))+$CR+STR$(a(1))
END FUNCTION