 

Let's Talk Assembler

Started by Donald Darden, April 16, 2007, 08:52:27 AM



Donald Darden

Assembly is pretty much the universal language when it comes to PCs, because the vast majority of PCs today run either Intel or AMD processors, and these all support the x86 instruction set.

There are many tools available for writing Assembly code, and even though you
may have never directly written a line of assembly or studied it, if you have ever programmed in any programming language, you have indirectly done your share.
This is because compilers generally take high level code and render it into suitable assembly code.

But assembly code is not understood by the computer, which only works with groups of bits that we call bytes, words, dwords (for double words), and quads.
When an Assembler converts the assembly code into numeric values, what we call binary format, we have the final conversion necessary for the computer to execute.  There are several types of binary files, which generally go by the extensions of BIN, OBJ, DLL, COM, and EXE.  A BIN file is just a binary file, with no specific purpose identified.  An OBJ file is a linkable file that can be used to
create an EXE file.  OBJ files are often library files, which you obtain or create, and can be produced by virtually any compiler language.  This is a form of static linking, and the contents of the OBJ files then become a permanent part of your final EXE file.  A DLL (Dynamic Link Library) file is an updated concept that replaces the OBJ file.  Links to it are formed in your final program, and the DLL
is then loaded and joined to your program at run time.  This minimizes the size
of your final program and allows common modules, used in various programs, to
be shared between multiple programs.  This can increase the number of programs that your computer can accommodate at one time, minimize the size of your programs on disk, and cause them to make better use of available memory.
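To make the dynamic linking idea concrete, here is a minimal sketch in PowerBasic of calling a routine that lives in a DLL.  MessageBoxA and USER32.DLL are standard Windows items, but treat the exact declaration below as illustrative rather than copied from the official include files:

   DECLARE FUNCTION MessageBoxA LIB "USER32.DLL" ALIAS "MessageBoxA" _
           (BYVAL hWnd AS LONG, lpText AS ASCIIZ, lpCaption AS ASCIIZ, _
            BYVAL uType AS LONG) AS LONG

   FUNCTION PBMAIN
       LOCAL lResult AS LONG
       ' USER32.DLL is loaded and joined to the program at run time;
       ' the EXE carries only the reference, not the MessageBox code itself.
       lResult = MessageBoxA(0, "Hello from a DLL", "Dynamic linking", 0)
   END FUNCTION

The EXE stays small because the shared code remains in the DLL, which Windows keeps loaded once for every program that uses it.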

COM files are primarily DOS executables.  They assume you only have one segment of memory for everything, and that this is the only program running.  All segment registers are set to point to the same segment.  They also run in the
same segment as any other COM program.  You can usually run COM programs
under Windows, but they cannot take advantage of some of the things that
have changed since the old DOS days, such as NTFS, USB, Long File Names, and
so on.

EXE files are generally multi-segment in nature, meaning they keep a DATA
segment, CODE segment, and STACK segment for starters.  In the original 16-bit
architecture, a segment could have at most 65,536 memory addresses.  With
32-bit and 64-bit CPUs available, the physical barrier to having larger segments
has been removed, but then you might still be constrained by the software tools available.

You do not have all of RAM available to your program when you run under Windows.  Windows will supply your program with the memory requested by the size of your program, your use of DIM and assignment statements.  Then Windows limits your access to what was allocated to your program, and respects the boundaries around other programs as well.  If your program attempts to range outside the allocated memory area, it most likely will result in a GPF (General Protection Fault) error, which will kill your program immediately.  Under
Windows 9x/Me, this would also kill Windows, but from Windows NT to Vista,
it would only kill the immediate program (usually).

Understanding something about file types, and that assembly code is fundamental to all programs written for the PC, should help you a bit in recognizing the role of each.  Some people write a lot of assembler code directly, and many people only consider writing a small bit of it when the efficiency of that type of code is most pronounced.  As a rule of thumb, many programmers are guided by the principle that only ten percent of your code consumes up to 90 percent of your processing time.  If you can identify the specific code that is proving least efficient and consuming the most processing time, and enhance it by refinement, which might include writing portions of it in Assembly code, then you will improve performance significantly.

The problem then is, how do you write assembly code?  How does it work?  Hopefully, this topic will be explored in more depth by contributors to this thread.

Here is a key thing to know:  The x86 architecture, that is the design and function of the "brains" of the computer, is organized around a set of registers.  There are in fact eight registers that you can use in your applications.  Originally these were just 16-bit registers, but the original instruction set and the registers themselves have been extended to include 32-bit registers, and more functional modes.  Rather than cover all eight registers right at this point, let me just explain that whatever you do in your program, most (if not all) of these registers will be involved somehow.  Any instruction that involves a destination and a source, such as ADD [destination], [source], has to have at least one register
designated.  So this could be an add of the contents of one memory address to a register, a register's contents to another register, or a register's contents to a memory address.  But what you do not have is a mode where you can add the contents of one memory address to the contents of another memory address in one instruction.  This means that the x86 instruction set is register centric, not stack centric or memory centric as some other architectures support.
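As a small hedged sketch of that rule, assuming the lines below sit inside PBMAIN and use PowerBasic's inline assembler (introduced properly a little further on), notice that every ADD has a register on at least one side:

       LOCAL x AS LONG, y AS LONG
       x = 10 : y = 32
       ! mov eax, x         ; memory to register is a legal form
       ! add eax, y         ; register plus memory is legal
       ! add y, eax         ; memory plus register is legal too
       ' ! add x, y         ; memory plus memory is NOT legal - one side must be a register
       ! mov x, eax
       ? x                  ' displays 42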

José Roca

 
This is an interesting site for assembler programmers, home of the WinAsm Studio, a free IDE for developing 32-bit Windows and 16-bit DOS programs using Assembly Language.

WinAsm: http://www.winasm.net/

In the forum you can also find some interesting custom controls written in assembler, but usable with PowerBASIC because they provide a .dll version. See, for example, the XXControls:

http://www.winasm.net/forum/index.php?showtopic=568



Theo Gottwald

One of the strengths of PowerBasic - and the original reason why I bought it at the time - was that it has an INLINE ASSEMBLER.

That's why I like the idea of also having discussions on Assembler and low-level optimizations in the forum.

I have read a lot on this topic from you, Donald, at other places (PB-Forum).
Much of it was really interesting.  If you post more of it, Donald, a copy here would be welcome.

Donald Darden

#3
PowerBasic's compiler does support inline assembly code, which can be a great boon to anyone willing or interested in venturing into this area of development.

PowerBasic's method of writing inline assembly statements requires that you use either the word ASM or the exclamation (!) symbol on each line of source code which contains an assembly statement.  The format is like this:

[optional label:]
               ASM         [instruction] [dest] [,source]     [;comment field]

If you elect to use the exclamation mark instead, it would appear like this:

[optional label:]
              !              [instruction] [dest] [,source]     [;comment field]

The "white space" separators between fields are significant, but the actual number
of spaces is immaterial.  For instructions that have both a destination and a source, the comma alone will suffice as a separator.  The semicolon used to mark the comment field can be a single quote mark instead.
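Here is a short sketch, assuming it is placed inside PBMAIN, showing both forms and both comment styles side by side:

       LOCAL nTotal AS LONG
       nTotal = 5
       ASM mov eax, nTotal      ; the ASM keyword form, with a semicolon comment
       ASM add eax, 3
       ! mov nTotal, eax        ' the exclamation form, with a quote mark comment
       ? nTotal                 ' shows 8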

The Assembler used by PowerBasic is TASM (Turbo Assembler from Borland), and can be found on the Internet.  You don't need it, unless you want to have a separate assembler, or you want access to more information and details of assembly programming and conventions used with it specifically.

Note that there are many Assemblers available for the x86 processors, some limited to 16 bit programming, and others capable of 32 bit programming.  These include FASM (Fast Assembler), MASM (Microsoft Assembler, the Cadillac of assemblers), NASM (Netwide Assembler), and others.  The syntax used by each is very similar to the rest.  The big differences are in the extensions and support available for each.

So what sets one Assembler apart from the others?  Well, availability would be
one thing.  You used to have to pay some pretty good money for these.  But
with the adoption of Windows, the development of better high level language compilers and specialized languages, and the myriad of details that programmers had to master, the thrill and habit of writing assembly code sort of went away for a lot of people.  There were too many new things to spend your time doing and learning.  But writing assembly code gives you the advantage of having real control over the computer, and writing lean and mean code is the epitome of the programmer's quest.  Assembly code gets you there better than any other method.

I did mention that assemblers tended to be somewhat costly.  Well, since the bloom has gone off that rose, continued Assembler development has become something of a hobbyist project, covered by open source agreements.  You can now even get a MASM equivalent, named MASM32, as a free download off the network.  And there are those committed to trying to keep the capabilities of Assemblers up to the expanding capabilities of the processors being designed and built by Intel and AMD.  MASM32 is still being adapted, but is somewhat behind current hardware capabilities.

Another alternative is HLA (High Level Assembler), which wraps assembler into a high level language form.  Actually, if you look at HLA and PowerBasic both, you realize that PowerBasic packs a lot of functionality into its approach.

WinASM, which José Roca has mentioned and provided a link to, allows you to write assembly code directly under Windows.  In other words, you have many options for learning and using assembly language under both DOS and Windows.
But you have a similar ability when it comes to Linux, Unix, and the Mac OS.
Learning the assembler language gives you an edge in all those environments.

So, if you have your choice of Assemblers, what ones would you choose?  Well,
that might depend on what you want to do.  For basic and limited use of assembly code, you probably don't need one - PowerBasic alone should suffice. 
TASM would be the best choice for supporting your use of assembly code with the PowerBasic compilers.  MASM and TASM include their own debuggers, and a lot of books feature these two assemblers when teaching assembly programming.
Of course PowerBasic's debugger supports its inline assembler mode as well.

Some of the other factors to consider, would be whether you want to write stand alone programs or libraries in assembler, or if you want to create games or an assembler, compiler, or operating system of your own.  How well versed do you want to become in assembly?  Does a given assembler support 32 bit and possibly 64 bit instructions and address modes?  Does it support floating point and graphical extensions?  Which chip set is it better suited to, or does it model best?  You might even choose the Assembler that your best book on the subject recommends.

Assembly coding may not be the high profile programming tool that it used to be, but there are many sites devoted to the topic still, and searching these out can be profitable.

Now here is something else to consider as well:  To program, you do not need to know or use assembly language, but to understand what has already been coded, you often do.  The reason is that, as stated before, assembly language is the universal computer language.  Existing programs, libraries, and modules can be run through a disassembler, which converts the binary format back into assembly language, and if you know assembly language, you can take a stab at trying to understand it.  (Note:  This process is complicated by the fact that the disassembler may need to be instructed on what portions of the program represent data, and what parts represent actual code.)  Efforts to write decompilers have been far less successful, because every compiler is different, and techniques for rendering a high level language into assembler differ greatly.  The best efforts at decompiling programs involve successfully determining which compiler, compiler version, and supporting tools (such as libraries) were involved in the original program design.

If you want to get started with ASM in PowerBasic, then you can start with the Help file and look up the keyword ASM.  It doesn't really tell you anything about writing assembly code, but it does list all keywords accepted after the ASM or exclamation mark, and helps identify the limits of PowerBasic's ability to understand the assembler syntax.  You will also find a link in the Help file to the Inline Assembler, and that will take you to various other links.  But like any Help file, it helps you with specific questions; it is not really a tutorial in its own right.  However, PowerBasic does have a download section devoted to Assembly Programming, but much of what is available is specific to the PB/DOS compiler, and involves 16-bit code only.

José Roca

Quote
The Assembler used by PowerBasic is FASM (Fast Assembler)

Bob uses TASM, not FASM.

Donald Darden

Thanks, José.  I wonder where I got that misconception?  Anyway, I corrected my post accordingly.  I looked up TASM on the internet, and came upon an order form from Borland to buy their assembler.  I wonder how old that web page is, or whether Borland still exists, or if they (or someone else) now sells TASM?  It said version 5.0.  I have MASM 6.1, and found MASM up to 9.x on the internet.

Donald Darden

#6
I said earlier that the x86 CPUs are register centric, and I explained that this means that the typical operation of the CPU involves moving information into, out of, between, and with respect to the registers provided.

Let's refine our concept of a register a bit by referring to the typical calculator.  It has a display that is attached to a register, and whatever value it shows is what is in that register.  It probably has another register as well, and to distinguish between the two, the one displayed is commonly referred to as the "X" register and the hidden one is the "Y" register.

You would normally enter a number into the X register, then when you press an operation key, such as add (+) or subtract (-), the contents of the X register are transferred to the Y register, and the X register is usually cleared.  Then you enter a second value into the X register, and when you press the equals (=) key or another operation key, the pending operation takes place between Y and X, and the results are returned to the X register.

Now calculator designs vary quite a bit, and in some designs, the Y register might be cleared at this point, the pending operation may be discarded as being complete, or repeated press of the equals key might cause the pending operation to happen again and again.

The automatic shift of the original contents from the X register to the Y register identifies the X and Y registers as a form of a stack.  In more advanced calculators, such as the HP programmable calculators, there may be additional registers associated with the internal stack, such as Z, S, and T.  Certain operations permit the contents of any one register to be copied to or exchanged with another designated register.

Since simple calculators do not address the X and Y registers as such, and the transfer of information between these is handled transparently as the result of other operations, we would think of this type of calculator as stack oriented or centric.  Now many calculators also include some sort of memory register as such, but these would closely resemble what we now consider a register to be.
There are operators that allow us to save to memory, read from memory, add
to memory, subtract from memory, exchange with memory, clear memory, and so on.  Let's just call this the M register to put it on parity with our X and Y registers.  Now we can see that some calculators are not only stack centric, but in at least one aspect, are register centric as well.

Now a programmable calculator has program memory as well, and this is very much like the RAM (random access memory) that we find in computers.  This memory can be divided up into two types:  the portion that actually contains the programmed instructions that we entered via the keyboard or other means, and the concept of additional registers, which we might refer to as the R() registers.  These are frequently numbered registers, such as R1 - R63, or more  appropriately, R(1) to R(63).  These form a fixed size array of registers, but the parentheses are often neglected in direct references because each symbol of a left paren and right paren take up a program memory step, and program memory is limited.  However, these R registers are a true array if they can be referenced indirectly by number from within another register.  That other register would be one of the existing ones, but able to perform index operations with regards to the R() registers.

As you can see, even calculators can have multiple addressing modes.  And we find similar modes in the x86 architecture.  So why would I describe the x86 as being register centric rather than a mix of all three?  The main reason is that the x86 instruction set is principally geared to working with its registers, so that makes it simple.  The other thing is that the use of registers tends to make things go faster through the processor.  But the final word on the subject is that it would be extremely difficult to write good and effective code if you avoided the use of registers altogether.

You can see from the description of registers in a calculator that naming is an important aspect of registers.  There are also a finite number of registers, and they probably have certain key roles to play in normal operations.  That all holds  up as well when describing registers in the x86 CPU.

There are essentially eight registers that you would normally be concerned with in the x86, broken down into two groups.  The first group is the General Purpose registers, and in the 32-bit design, these are called EAX, EBX, ECX, and EDX.  The "E" means that they are 32-bit in length.  The other part of the name, the AX, BX, CX, and DX, were merely 16-bit registers in the old CPU architecture.
For compatibility with older software, the CPU recognizes both types of instructions and both types of registers, using the lower 16 bits of the extended register when a 16-bit instruction is specified.

The General Purpose registers are supported by many instructions, some of which use certain registers in a special manner.  Thus, each general register also has
specific operations for which it is best suited.  The AX, or EAX register are most commonly used for arithmetic and comparative operations.  The BX or EBX register is used as an alternate to the AX or EAX register, and also used as an offset in some addressing operations.  The CX or ECX register is often used as a counter (think FOR loop in this context).  The DX or EDX register is sometimes used in conjunction with the AX or EAX register for integer multiply and divide operations.
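A small hedged sketch of that EAX/EDX pairing, again assuming the lines sit inside PBMAIN:

       LOCAL a AS LONG, b AS LONG, product AS LONG
       a = 1000 : b = 3000
       ! mov eax, a          ; one factor goes in EAX
       ! mov ecx, b          ; the other factor in ECX
       ! mul ecx             ; unsigned multiply: low 32 bits land in EAX, high 32 bits in EDX
       ! mov product, eax    ; 3,000,000 fits in 32 bits, so EDX holds zero here
       ? product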

Because the x86 evolved from an 8-bit architecture originally, where IBM had set the standard for 8-bit bytes in its architecture, and the adoption of the 8-bit ASCII and EBCDIC codes (the last is an old IBM mainframe standard), one of the design criteria for the x86 was to support 8-bit byte operations.  To do this,
the four ?X registers just discussed have instruction addressing modes that just address the lower 8 bits, or the higher 8 bits.  Thus the lower 8 bits of the AX or EAX register are called AL, the upper 8 bits of the AX or EAX register are called AH, and for the other three registers you have BL, BH, CL, CH, DL, and DH.
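To see those byte names in action, here is a short sketch (inside PBMAIN as before) that works on just the upper and lower bytes of AX:

       LOCAL w AS LONG
       w = &H4142            ' AH will hold &H41, AL will hold &H42
       ! mov eax, w
       ! mov al, ah          ; copy the upper byte of AX over the lower byte
       ! mov w, eax
       ? HEX$(w)             ' shows 4141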

That identifies and partially explains the first four registers.  The x86 architecture provides some redundancy, overlap, and override capabilities, so
there is often choice in what registers to use and how when it comes to programming.

The other group of four registers usually involves special addressing modes that are used in conjunction with the other four registers and memory operations.  Two of these are segment registers, where a segment would be a place in memory that marks the start of a sequence of available memory.  It sounds like a pointer, doesn't it?  Well, this is often where pointer values end up.  In the 16-bit form, these are called DS and ES, which stand for Data Segment and Extra Segment, respectively.  In 32-bit mode they keep the same names, DS and ES.

The other two registers in this group are the offset, or index registers, and are called SI (Source Index) and DI (Destination Index) for 16-bit addressing, or
ESI and EDI when used for 32-bit addressing.  Note that DS:SI and ES:DI are used as pairs in 16-bit machines to designate memory addresses above 16 bits in length.

Now a quirk/limitation of the 16-bit architecture is that Intel decided, for some reason, that they would never need a full 32-bit addressing range, so the DS:SI and ES:DI pairs can only address about 1 Megabyte of RAM.  The segment register is shifted left 4 places (effectively multiplied by 16) before its contents are added to that of the index register.  This represents 2^20, or 1,048,576 addresses.  If you remember Expanded RAM, or Extended RAM under DOS, this was additional memory that could be added to the PC, which then required special software drivers to access, such as HIMEM.SYS and EMM386.EXE.
The expanded memory carried you from the 640 KBytes that DOS imposed on you, and carried you up to the 1 Megabyte boundary.  The extended memory was what you could page in above the 1 megabyte boundary limit using a somewhat kludgy method of access.  This is where DOS suddenly really began to show its age.  Fortunately, Windows allows you to access and use much, much more memory than you could under DOS.

Well, that's only 2^20 addresses, and the pointers under PowerBasic are 32 bits wide, which translates into a maximum of 2^32, or 4,294,967,296, addresses.  So do
we still see DS and ESI (or ES and EDI) combined by shifting to create a larger address space?  Actually, no.  Either DS or ES, as a segment register, can support the 32-bit pointer values used by PowerBasic.  You can still add ESI and EDI to either segment register, and in fact you can also add the EBX register if that helps.  But since PowerBasic does not do either of these within the scope of the PowerBasic statements that it translates to assembler, that leaves the ESI and EDI registers essentially unused when just writing BASIC statements.
So as a consequence, PowerBASIC allows you to designate up to two integer values to be assigned to registers (via the #REGISTER and REGISTER instructions), with the first one being placed in ESI and the second in EDI.

Now PowerBasic tells you in its inline assembler section that you should not alter the contents of DS or ES, as they are used by PowerBasic itself.  This is an important matter, and it can really limit you sometimes, especially if you need to support 16-bit mode.  For now, take this as a statement of fact, but we will discuss how to get around it later in this post.

If you want to use ESI or EDI for something else, or you intend to modify the variables in memory that get assigned to ESI or EDI, and you do not want PowerBasic to accidentally overwrite these when it writes the registers back to memory, then it would be best to use #REGISTER NONE to prevent PowerBasic from hindering your separate efforts in assembly code.
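A minimal sketch of that arrangement, assuming #REGISTER may be placed inside the procedure it affects (the Help file gives the exact placement rules):

   FUNCTION PBMAIN
       #REGISTER NONE        ' ask PowerBasic not to place any variables in ESI or EDI
       LOCAL a AS LONG, b AS LONG
       a = 7
       ! mov esi, a          ; ESI is now free for our own purposes
       ! add esi, esi        ; double the value
       ! mov b, esi
       ? b                   ' shows 14
   END FUNCTION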

These are not the only registers in the x86 architecture.  You have the status
register, whose bits are conditioned by the results of various operations, and generally tested with branching statements, such as ! JE somewhere, which will jump somewhere if the EQUAL flag is set.  ! JNE would jump somewhere if the EQUAL flag is clear (zero, meaning not equal or not true).  You rarely manipulate
the status register directly, but since it can be pushed to and from the stack,
and since other registers can also be pushed to and from the stack, it is effectively possible to get it into another register and directly change it, then put it back.
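Here is a hedged sketch of flag testing, assuming the usual PowerBasic rule that inline jumps may target an ordinary line label in the same procedure:

       LOCAL n AS LONG, flag AS LONG
       n = 5 : flag = 0
       ! mov eax, n
       ! cmp eax, 5          ; the compare sets the status flags
       ! jne NotFive         ; the jump is taken only if the EQUAL (zero) flag is clear
       ! mov flag, 1
    NotFive:
       ? flag                ' shows 1, because the jump was not taken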

You have the IP register, which is the location of the current programmed instruction (Instruction Pointer) being executed.  This is used for the different
jump and CALL statements, and it defines where you are in the execution of your program.  A copy is placed on the stack automatically when you do a CALL, so that the return (RET or RETF) command knows where to go back to later.  Again, you aren't expected to directly change the contents of the IP register
(and can cause serious issues if you try), but anything that can be put on the stack can be accessed via another register, so it can't be said that it can't be done (or hasn't been done).  Again, with 32 bit machines, this would be the EIP register.

You should have some idea of what a stack is now, and it may not surprise you that the stack is supported by a Stack Pointer (SP), which marks the current top of the stack in memory (the most recently pushed item).  The stack is not fixed in memory, in fact you can have multiple stacks involved, one for each program perhaps.  But there is only one stack pointer, and it always points to your stack when you are executing your program.  The stack is where parameters are placed when calling SUBs and FUNCTIONs in PowerBasic, but these are all part of a stack frame that PowerBasic sets up and maintains for you before and after you make that call.  When you look at assembly code, you will often see where
information is being read from memory using the stack pointer (SP) with an offset.  This is the method by which those parameters are accessed once you are in the procedure (a procedure being either a SUB or a FUNCTION).  The offset value used determines the parameter you are currently accessing.  So, in essence, the stack acts as sort of an array with the SP being the pointer and the offset the index into the array.  PowerBasic automatically determines the offset required for accessing any given parameter.  Again, with the move to 32 bit architecture, the SP becomes ESP, to support the larger address space.

And then there is the Base Pointer, or BP register.  Again, this sounds like a register that was planned to be used with pointers.  Perhaps it was.  But a programmer found a better use for it - a register to capture a snapshot of the stack pointer.  If you do a ! MOV BP, SP (or ! MOV EBP, ESP), you get whatever the stack pointer is currently pointing to.  So anything you add to the stack with a PUSH statement is not reflected as a change in the base pointer, and you can
POP values from the stack and still not cause a change in the base pointer.  The base pointer then serves to ensure that, when compared to the stack pointer,
that all the PUSH and POP instructions have effectively cancelled themselves out.  And there is something else you can use the BP register for, which is to force the SP register back to its original value at any time.  If you pushed a lot of values or parameters on the stack, call some process, come back from that process, then do a !MOV SP, BP (or !MOV ESP,EBP), you effectively nullify all the previous pushes, cancelling them at once.
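Here is a small sketch of that trick in inline form, assuming the console compiler so that ? prints the expression.  The LOCAL variables are only written after EBP has been restored, since PowerBasic normally addresses its LOCALs through EBP:

       LOCAL spLow AS LONG, spHigh AS LONG
       ! push ebp            ; preserve PowerBasic's base pointer before borrowing it
       ! mov ebp, esp        ; take the snapshot of the stack pointer
       ! push eax            ; pile two extra values onto the stack
       ! push ecx
       ! mov ecx, esp        ; ECX = stack pointer with the two pushes in place
       ! mov esp, ebp        ; one move discards both pushes at once
       ! mov edx, esp        ; EDX = stack pointer after the cleanup
       ! pop ebp             ; give PowerBasic back its base pointer
       ! mov spLow, ecx      ; only now touch the LOCALs, with EBP restored
       ! mov spHigh, edx
       ? spHigh - spLow      ' shows 8, the two pushes that were cancelled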

However, there is only one Base Pointer as well, and if I use it in my process, and someone else uses it in their process, what happens to the present or previous contents of the base pointer?  Here is where it gets a bit tricky.  Before you do a ! MOV BP, SP, to put the contents of the stack pointer into the base pointer, you do a ! PUSH BP, which saves the contents of the BP on the stack.
Then your ! MOV BP, SP sets the Base Pointer to the current top of the stack.  Now the first thing on the stack that the base pointer points to is its previous contents.  The next thing is usually the return address to get back to whatever process called it.  Above that in the stack (memory offset from the stack pointer) are the
parameters that were passed when this call was made.  Above that is whatever called before, which has not yet been returned from.

Now since the base pointer and the stack pointer initially are set equal by the
! MOV command, we have the option of using either as our stack reference point.  If we use BP plus an offset, we get a comparatively static method of accessing parameters.  If we use SP plus an offset, we get the same effect, unless we perform any further PUSH or POP instructions, which would change the contents of SP, and cause us to require a different offset to reach the same parameter.

Thus, if someone is not intending on using any PUSH or POP instructions in his procedure, he may resort to using SP and offsets to reach passed parameters, and may not use the BP pointer at all.  This makes for faster programming.  But
it also means that any register changes will possibly affect the calling program on a return.  Sometimes this is what you want, often it is not.

On the other hand, if you want the freedom to use any register and/or the stack, then you would not only place BP on the stack, move the SP to BP, but you would push other registers on the stack as well, including the status register and any registers that would be affected by your process.  Now registers have to be popped off the stack in the exact opposite order as they are pushed on there, and the pops have to match the pushes on count, and this is an area where mistakes are sometimes made.  To deal with this, later CPUs had the PUSHA and POPA instructions added (PUSHAD and POPAD in 32-bit code), which save all the general registers onto the stack, restore them all from the stack, and make this part of the process simple.  Note that if you are using the Inline Assembler under PowerBasic, you do not normally need to concern yourself with pushing and popping the registers.  PowerBasic thoughtfully takes care of that for you when you call a procedure, or exit from one.

But, remember the previous discussion about how PowerBasic uses the contents of DS and ES for its own purposes?  And that if you empower the use of the ESI, and even the EDI registers for integer variables, that these should be considered off limits as well (unless you mean to change the content of the assigned variables)?  Well, now you have a case where you might want to save the contents of two to four specific registers, to restore them before the end of the current procedure.  You can use specific PUSH and POP instructions to handle this.  But there can be a serious downside:  What happens if you interlace BASIC statements with assembly code statements?  PowerBasic may
need reference to whatever it has in the DS and ES registers during the BASIC statements, but you planned to put other values in those registers and use them with your assembly code.  The chances are strong that you will point to the wrong place at one point or another, and blooey!  Your code blows up.  To prevent this, either leave DS and ES intact, or do not interlace your assembly code with PowerBasic statements.  You can of course PUSH a register and POP a register at the beginning and end of each segment of assembly code, and no harm done, but it makes your code bulkier and kills some of the efficiency you were looking for.  Don't worry about changing DS and ES inside a procedure, as the save and restore sequence that PowerBasic wraps around a procedure should protect the original contents on exit.
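As a hedged sketch of that bracketing, here EBX stands in for a register the compiler cares about (the segment registers themselves are best left alone entirely):

       LOCAL n AS LONG
       n = 100
       ! push ebx            ; EBX belongs to the compiler - save it before borrowing it
       ! mov ebx, n
       ! add ebx, 23
       ! mov n, ebx
       ! pop ebx             ; hand EBX back exactly as we found it
       ? n                   ' shows 123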

This concludes my discussion of registers for the moment.  There are other registers in the CPU, such as control registers, test registers, debug registers, floating point registers, and so on.  These would be advanced topics, not necessary for an understanding of the basic principles of assembly coding.

Note that PowerBasic uses the term "optimizing compiler" when describing its products, but optimization is sometimes a matter of perspective.  They are optimized to produce extremely fast compiled code, and generally create code that is small and runs fast.  But to write a fast compile process, you have to risk not writing the absolute smallest and most efficient code, because the analysis time required would become prohibitively long.  Instead of having a finished compile in the matter of a moment or two, or within minutes for a really large program, it could take much, much longer, which would make your efforts to develop new code and debug it even more tedious.

So the PowerBasic compilers represent a good compromise in terms of development and performance.  Once you get your program running properly,
and you want to optimize it further, that is when your understanding of assembly code could pay off.  In this regard, be aware that integer arithmetic, particularly the use of LONG numeric types, will do the most to optimize your code under PowerBasic.  But in looking at the finished code with a disassembler, you may find that PowerBasic used a large number of floating point operations,
which it will do in cases where it was not sure how you intended to do something at first glance.  Any assembly instruction that begins with the letter "F" is likely a floating point instruction, and deserves your attention when it comes to further optimization, since floating point instructions are very inefficient timewise when compared to the integer mathematics that the CPU can perform on its own with the available registers as described here.







Charles Pegge

#7
Donald, I did some assembler code last year for parsing.  Its latest version incorporates
INSTR functionality with case insensitivity.

I thought it might be of interest to you, as you were discussing INSTR in the general section.  Hope you can follow my code, I have annotated it quite intensively.  It is surprising how important the annotation is, even for an author to follow his/her own assembler code, especially after a year's lapse.



It is in the Windows source code section:

http://www.jose.it-berater.org/smfforum/index.php?topic=684.0


Incidentally, FreeBasic also has an inline assembler, but no exclamation marks are needed with this one.


Donald Darden

Charles, I read your post yesterday, and am interested in examining your code.  Right now though, I am in considerable pain, and need to go see a doctor, so I will catch up with you later.

For those interested in the discussion about performing searches, I should point out that PowerBasic compilers now support two commands for this purpose:
  REGEXPR statement
  REGREPL statement

The INSTR() function in PowerBasic works quite well, and allows you to search from left to right, right to left, for a specific word or phrase, or one of a selection of characters.  However, it can sometimes be difficult to apply any one tool or approach to a particular need, so having a choice of tools can be very helpful.
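A quick hedged illustration of those three kinds of INSTR searches (check the Help file for the exact meaning of a negative start position):

       LOCAL t AS STRING, p AS LONG
       t = "the quick brown fox jumps over the lazy dog"
       p = INSTR(t, "the")          ' left to right: the first "the", at position 1
       ? p
       p = INSTR(-1, t, "the")      ' a negative start searches from the right end
       ? p
       p = INSTR(t, ANY "xz")       ' ANY matches the first of any listed character
       ? p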

Don't be too alarmed about my mention of pain, it's been building up for a few months, and has finally reached a point where I can't escape it, even in sleep.
So I will give the doctors a chance to do their worst.

Charles Pegge

Hope you will be feeling better soon Donald. Being in pain is a pain!

On the subject of assembler in general, I find that of all the computer languages I have encountered, assembler is one of the most satisfying.
Hexadecimal better still!  Maybe it's my hardware roots, or the knowledge that the code carries no wasted cycles whatsoever and you know, or should know, exactly what the CPU is doing at each step.

Donald Darden

When I first studied computers in the Navy, they were all based on 6-bit bytes, described in octal, instead of the current 8 bits.  Some of the computers had 12 or 18 bit words, and the CDC 1604A had 48 bit words which contained two 24-bit instructions.  The art of programming in raw numeric form was called writing in Machine Language.  The use of letter groups to represent operations like ADD,
SUB, MOV, CMP, and JMP were called MNEMONICS, which means "memory aid".
The development of Assemblers took the mnemonics and replaced them with the necessary numerical equivalents.  There was a one-to-one correspondence between the numerical form, which you call hexadecimal (meaning base 16), and
the corresponding mnemonic for that instruction.  But as time went on, new pseudo and macro codes were added to the assembler language to give it more power.  Various data types and strings of data could be appended with such statements, and the ability to identify structures.  You could label portions of your program and reference them using various instructions, and the assembler was able to correlate these to actual locations in memory.

The machines of that era were far less powerful, and generally had fewer instructions and methods of addressing than found in the x86 architecture.  The complexity of the x86 means that creating a comprehensive assembler is a bit of a challenge.  Some adaptations avoid or overlook some portions of the x86 capabilities.  However, a clever programmer who knows the missing portions and is comfortable in writing hexadecimal code can emulate that instruction by using the hexadecimal value in place of the instruction.  You also find the converse on occasion - a disassembler that is unable to translate some binary data back into a valid instruction, and presents it in hexadecimal form instead.  If you know the hexadecimal values and the mnemonics or the way the instruction works, you can get through these problem areas with greater ease.

I personally found it beneficial to move on to using an assembler when they became available, but I remember with some fondness my earlier work with machine language routines, and the laborious effort of counting out the correct number of steps for forward and backward relative memory address references.

Incidentally, after a long day in pain, where I was subjected to a sedative, two pain killers, and a muscle relaxant, my pain has largely disappeared.  The tests for other causes were negative, so the doctors concluded it was due to some disc disease, pinched nerves, muscle spasms, and feedback from being in pain for so long.  It got really bad for a while there, but I guess it was just necessary to break the pain cycle.

Charles Pegge

Glad you are feeling better Donald. You will have to avoid long sessions on the computer and move around to keep the back supple.

One warning when using Freebasic under Linux: The compiler and the assembler are so tightly integrated that they both share the same variable name space. I happen to use a lot of short variable names so I fall into this pitfall quite easily. This morning, it was a variable called DI, which is one of the x86 index registers.

So any of the variable names synonymous with x86 registers have to be avoided, unless you intend to access those registers, of course.

These include (from my head!)
al,bl,cl,dl,
si,di, ds,es,sp,
ax,bx,cx,dx,
eax,ebx,ecx,edx,esi,edi

This is a problem that does not occur on the MS windows version of FreeBasic.
So it only becomes apparent when you move from MS to Linux. I am told that this is a problem in the GCC compiler, used by Freebasic (version 0.16) at the back end.


Donald Darden

#12
That's very true, and from the top of my head, you should include all extended register names (those starting with "E", such as EAX).  However, this is only necessary if you intend to employ assembly language.  Let it be known that single character names, these being "A" to "Z", create no problem, and that two character names, with the second character being "0" through "9", are quite safe.  This actually supports early BASIC naming conventions quite well.

Also, double identical letters are safe, these being "AA" to "ZZ", and that already gives you 62 possible variable names without requiring any new rules.  Another 26 can be obtained by using triple identical letters, "AAA" to "ZZZ".  And if you want to get fancy, you can have certain letter-digit mixes such as AA1, or A1A, etc., that will not accidentally duplicate any keyword in Basic or Assembly coding.

With all these patterns possible in your naming convention, you can afford to set aside some groups of names for certain types of variables.  In my case, I use
A to L for integer variables, M to T for floating point variables, and U to Z for temporary throwaway variables, like FOR loop counters.  For strings, I use double-identical letters AA to ZZ, and for temporary strings, I use something else, such as letter-digit.

It's a very simple naming convention, and I don't use it very rigorously, especially when writing brief, sample code to just illustrate a point.  There I might just end up naming variables A to Z, and AA to ZZ, and define them as needed.  The distinction that single letters represent numeric types, and double letters represent strings, is the convention I stick to most strongly.

There are a couple of other addressing issues to consider here.  In assembly language programming, you have your external devices attached indirectly to the CPU through a chipset, and they appear in the CPU as additional address spaces,
which are numbered, in exactly the same manner as RAM memory is.  But these are called I/O ports, rather than memory locations.  A single device may have a number of ports associated with it, where some of those ports serve to control the device, and some are used to exchange data with the device.  Some will even return status about the state of the device, but these may also be classified as control ports.

Manufacturers generally define the port address space allowed for their devices, and follow certain conventions and guidelines that have evolved over the decades.  There are overlaps, and inconsistencies, and conflicts.  The more devices you attach to your computer, the greater the risk of conflict between different devices.  A real problem was the limit that Intel set on Interrupt lines, which were used to alert the CPU whenever a device requested attention.  With many more devices needing interrupt service, but having to share existing interrupt lines, a lot of problems were encountered.

Windows finally brought an end to this, and the abandonment of some of the old style parallel and serial devices has helped.  New bus standards, such as USB (Universal Serial Bus), were designed to let multiple devices all work with the PC without enduring the same conflicts.  We now classify devices that used the old style buses as legacy devices, and while it might still be possible to use them with your newer PC and updated OS, you run the risk of experiencing some of the old problems associated with them.

But now there is a void.  Most information on programming in assembly language assumes that you are working with a computer that deals with legacy devices.
This is because most books on assembly language were written when the only PCs around were those that came equipped with those old devices.  There are not many new books on assembly language available, and for the latest information, you pretty much are dependent on online sites that benefit from the work of others who still see assembly coding as the way to go.

Another part of the void is that device manufacturers are faced with a requirement to write device drivers for the devices that they design, build, and sell.  A device driver is exactly that.  It standardizes the way that the device is "seen" and works with a specific operating system.  Different drivers are required when working with different operating systems, and each operating system sets what the device interface must look like in order to be integrated with that operating system.  So there is a device interface standard for DOS, another one for advanced Windows, yet a third for Linux and Unix, and so on.  The big problem then is that your device manufacturer may not have produced a device driver for your device that will enable it to work under the operating system of your choice.  He may have decided that your OS did not represent a big enough market to justify the cost and time it would take to make a suitable driver.

Device I/O (Input and Output) is slow compared to program execution speeds, mostly because of mechanical factors, and there may be little real gain in speed by trying to interface with them in assembly language.  And as you can see, trying to work with devices directly, or deal with device drivers, can make your job extremely hard.  The consequence is that many programmers that use Assembly code only use it to address data while it is present in RAM.  The effort to read it from a device, such as a hard drive, or write it out to another device, is left to the higher level language, in this case PowerBasic, to deal with transparently.

The flexibility of deciding what part of your programming will be done in Basic or other high level language, and what part will be done in assembly, can make for a very good mix.  You are not forced to do what you don't feel comfortable trying to do, or feel may be beyond your skill and knowledge.  And the consequence is that you only have to learn as much assembly language as you want to try and use.  This is where a language like PowerBasic or FreeBasic really pays off, because they support the easy integration of assembly language into your program when and where you want it.  So we will not be addressing device I/O
at this point.  The information already provided should help you identify the problem, and let you search further elsewhere for that type of information, but much of it will be outmoded and obsolete, because it will be directed at supporting legacy devices.

Getting back to the use of registers in the CPU, you will note that you have some registers that allow you to address the lower 8 bits and the upper 8 bits directly, or the whole word of a 16-bit register, or the whole word of a 32-bit register.  But there is what might be considered a gap:  You cannot address the upper 16-bit word of a 32-bit register, and there is no way to refer to the lower or higher order bytes within that upper 16-bit word either.

Well, the plain fact is, that doubling 16-bit registers to 32-bits does nothing for anyone that is interested in byte or character processing.  That is still just 8 bit chunks of memory if you use bytes, ASCII or EBCDIC code, or 16 bits if you resort to Unicode.  The only time you might want more byte space in a register is when you want to compare consecutive bytes in memory, say for a string search.  But the problem is, when you search by 16-bit or 32-bit groups, the basic instructions associated with an efficient search, the automatic increment/decrement of indirect addresses, are going to advance by 16-bit or 32-bit chunks of memory as well.  So if you try to test for 2 bytes at once, you will increment or decrement your address count by 2 bytes at once, effectively only checking every other byte as the start of a possible match.  With 32-bit registers, you only start checking with every fourth byte.  Another problem is that the most effective way to test for string matches is if you are looking for exact matches.  This means the exact same case, and the exact same group separations and punctuation.

If a long word is split with a hyphen, say at the end of a line, so that it is actually written as pre-[cr][lf][tab]columbian, where the codes for carriage return, line feed, and tab are represented within the square brackets, then that would not match with "pre-columbian" or "precolumbian", unless we somehow allow for white spaces and possible hyphenation in our pattern.  In assembler, trying to allow for such variances would be extremely hard, and not supportable with a simplified search method.  Search techniques of this nature would be hard in a higher level language as well, and coming up with a fast, efficient, and thorough search method is something of an elusive goal.  Many people look for existing code that will do this for them, and a part of the problem for the assembler programmer is that the x86 instruction set is not optimized enough in this regard to make it somewhat easier or more efficient to create such search techniques.

But you are somewhat stubborn, and want to be convinced on your own, so you ask:  But how do I get access to the upper 16 bits of a 32 bit register?  How would I treat this as two separate 8-bit bytes if I want to?  The answer that is most common is to swap the upper 16 bits with the lower 16 bits, and then do what you want with the lower 16 bits before swapping them back.  Sounds simple, and the x86 does have an XCHG (exchange) instruction for switching any two 8-bit, 16-bit, or 32-bit references.  But it won't work, because we have no way to refer to the upper 16 bits of a register as either a source or destination.
But there is a way to do it, and this involves the rotate left or rotate right instructions provided.  If you rotate a register left 16 places, or rotate it right 16 places, you effectively swap the upper and lower 16 bits.  Now do what you want, and rotate the register back 16 bits to restore the upper and lower
portions to their original positions.  Another way would be to store the register into a 32-bit address space, then reference each byte or 16 bit segment separately.  And a third way would be to push a 32-bit register onto the stack and then pop off two 16-bit words into 16-bit registers.
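Here is a short sketch of the rotate approach, inside PBMAIN as before:

       LOCAL v AS LONG
       v = &H12345678
       ! mov eax, v
       ! rol eax, 16         ; rotate left 16 bits: the upper and lower halves trade places
       ! mov v, eax
       ? HEX$(v)             ' shows 56781234
       ! mov eax, v
       ! rol eax, 16         ; rotate once more to put the halves back
       ! mov v, eax
       ? HEX$(v)             ' shows 12345678 again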

If this is not that common, then what is the point of discussing these techniques?  The real point is to get you thinking about the nature of having
registers to deal with.  How many bits are there?  What does shifting have to do with the contents?  Numerically, how does shifting affect the contents?  What is the purpose of the Carry flag?  What happens when I use an ADD command, and later use an ADC command?  How about when I use SUB, and later use a SBB or
SBC (usually, only one of these is supported) command?  What happens if register pairs are used with a shift or rotate command?  What difference is there between a shift and a rotate anyway?  If you can't answer these questions, then you obviously still have something to learn about the CPU registers.  But
don't spend a lot of time looking in books for the answers.  Most books rush through their description of the registers, because there is so much detail involved, and they have to get through one chapter quickly in order to start the next.  You could write a book just on the registers if you wanted to.  No, the best way to learn registers is to execute some assembly instructions in step mode, to note changes that take place in the CPU registers while in the Debug mode with the PowerBasic IDE, and to return the modified value to a variable and print it in a Basic statement so that you can observe any change to it.

This would be an example of testing a left shift instruction and its effects on an integer variable with assembly code:
       LOCAL a AS LONG
       a = 12345
       ? a                  ' shows the value before the shift
       ! MOV eax, a         ; load the variable into EAX
       ! SHL eax, 1         ; shift left one bit, which doubles the value
       ! MOV a, eax         ; store the result back into the variable
       ? a                  ' now shows 24690
You can add other commands if you want, or change the ones you see here.
For PB/CC, you can add these if you like:
   FUNCTION PBMAIN
       COLOR 15,1
       CLS
       ... the above
       WAITKEY$
   END FUNCTION

Remember that you want to save this with some file name, then on the toolbar under Run you want to do a Compile and Debug.  The Debugger window will open up, and you want to move it down and to the side a bit so you can see the toolbar and your code above.  Then on the toolbar you want to select the icon to bring up the CPU registers, which starts off with eax and its present contents, followed by the remaining registers.  Then along the lower portion of the toolbar, you will see three small blue rectangles with dotted lines and arrows above them.  The hover legend when you put your mouse cursor on each will read "Skip over call", "Step into code", and "Step out of code".  They perform differently, and either of the first two would allow you to single-step this simple example, but as a general rule, the middle one is best when stepping assembly language.  By clicking on that button repeatedly, you will advance one line in your source code for each click.  With Basic statements and multiple statements per line, all the multiple statements get executed at once (a good argument for not using multiple statements in code that needs to be debugged).  Since you can only have one instruction per line with assembly, you will see the effects of each instruction as it affects the contents of the CPU registers.  You have the option of bringing up the Variable Watch window and watching PowerBasic variables at the same time, to see any changes there as well, which would be a good alternative to relying on the ? (which stands for PRINT in the console compiler, and MSGBOX in the windows compiler).

After you perform this simple program, you might want to play with the ! SHL
statement a bit.  Try changing the eax to ax, al, or ah, and restep the program each time to see what happens.  Try to figure out why the results either changed or did not change in the second print (or MSGBOX) statement.

Then replace SHL with SHR, which is the right shift instruction, and repeat the steps above.  You can also change the value assigned to the variable a, perhaps even use a negative number there.  Play is the best way to learn what your PC can do and how it really works.

When you tire of the SHL and SHR instructions, you can also play with ROL and
ROR, which are another type of shift instruction, and you can also look at how to use two registers in conjunction with each other, by loading each with a value (that's what the !MOV instruction does), then using commands like ADD, SUB, AND, OR, and XOR, with one register serving as the destination and the other as the source.  You will quickly learn with these added instructions, that the source and destination registers have to be the same size, meaning the same number of bits, or you will get an assembler error.
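For instance, a short sketch along those lines:

       LOCAL a AS LONG, b AS LONG
       a = 12 : b = 10
       ! mov eax, a
       ! mov ecx, b
       ! add eax, ecx        ; both operands are 32 bits, so this assembles
       ! xor ecx, ecx        ; XORing a register with itself is the classic way to zero it
       ' ! add eax, cx       ; mixing a 32-bit and a 16-bit register would be rejected
       ! mov a, eax
       ? a                   ' shows 22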

Theo Gottwald

#13
Quote
and what part will be done in assembly, can make for a very good mix.

I have a suggestion on this.
You can use the Profile Instruction on the program, then you see:
1. Which subroutines get how often called
2. how much of the time is used in which subprogram

Then we know, it would not make much sense to waste our time on sub's which are only called once.

We take a look at those sub's which use the most time.
And maybe those which use a lot time and are often called.

That's a general rule:
The more often a Sub is called, the more rewarding optimization will be in the end.

Sub's in Inner Loops for example are mostly rewarding.
Sometimes we may even think of using a GOSUB instead of a SUB in such cases, depending on the total runtime.
We have discussed this in another topic.

Another thing is, if strings are involved.  Then it's getting a bit more tricky.
Maybe that's something for an extra topic at times: ASM optimization when strings are used.


Donald Darden

#14
Hey, this post is not exclusively mine.  Anybody who wants to talk about any aspect of Assembly programming is welcome to do so.  Just take off with it.  And don't be afraid to criticize what I write either.  If I make a mistake, it can be fixed.

There is a large number of things that the x86 CPU design supports, and most assemblers follow the Microsoft Assembler mnemonics when it comes to what the various operations, registers, and data types are named.  However, it is possible that a certain assembler may lack support for some specific area in the CPU that
is the target machine for the program being developed.  Further, the family of x86 processors has evolved, and later models have more features than the earlier ones do.  To help you in this regard, to avoid doing something that does not fit with a given CPU's capability, you can usually designate the CPU type that you intend for the program to run on.  Since later processor models generally support features of earlier models, targeting an earlier model may mean that your program can run on more processors than if you target a later one.

By trial and error, I've found that there are some operations that are allowed by the x86 processor that are not supported with the Inline Assembler in PowerBasic.  You get an error when you try to use them.  Now there are three things you can do when this happens:  (1)  You can try to work around the problem by attempting a different instruction or sequence of instructions that should give you the same effect,  (2)  You can figure out what the hex code sequence is for the instruction that is not supported, enter this as a sequence of data bytes,  or (3)  You can tell PowerBasic support about your problem, and see if they will address it and fix it for you in their next release.

The first method is often the most expedient way to deal with this problem.  Now if you try an instruction, and it does not work, how can you be sure that it is not something that you did, or a misunderstanding on your part?  Well, you can consult a good book on assembler and try to puzzle it out yourself, or post a question on a forum devoted to assembly coding.  Or you can attempt the same thing in a standalone assembler and see if it works there or not.  If it works, you can find out what the hex code sequence is and use that with the inline assembler, which was your second option above.  And if it does not work under a standalone assembler, you can try it with yet another assembler, or conclude that what you are trying to do is simply incorrect or unsupported.
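
As an illustration of the second option, here is a sketch that emits an instruction as raw opcode bytes.  It assumes the inline assembler accepts a DB directive, and it uses CPUID (encoded as the two bytes 0F A2) purely as an example of an instruction you might otherwise have trouble entering:

LOCAL vendor1 AS LONG
  ! push ebx                'EBX may be in use by the compiler, so preserve it
  ! mov eax, 0              'CPUID function 0 returns the vendor string
  ! db &H0F, &HA2           'the raw opcode bytes for CPUID
  ! mov vendor1, ebx        'the first four characters of the vendor string arrive in EBX
  ! pop ebx
  ? HEX$(vendor1)

If DB is not accepted either, you are back to options (1) and (3).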

Now previously, I mentioned that the registers in the 32-bit CPUs have the same names as the 16-bit registers, but with an added "E" in front.  This is generally true, but if you look at the registers in the PowerBasic Debugger, you will see that the CS, DS, ES, FS, GS, and SS registers don't have the leading "E", but otherwise are shown as 32 bits long.  That's fine, just something that you need to be aware of.  The CS register is the Code Segment register, and in the 16-bit architecture it is used in conjunction with the IP (Instruction Pointer).  The DS and ES registers were previously explained.  The FS and GS registers were introduced with the move to the 32-bit architecture, and as far as I know, do not have a predesignated purpose.  The SS register points to the stack segment, and is used in conjunction with the SP (stack pointer) in 16-bit mode.

The question you have to ask yourself is, why does this guy keep describing things in 16-bit mode?  What do these things do in 32-bit mode?  Well, to tell you the truth, I'm not exactly sure.  I haven't tried enough things yet to have all the answers, and the books I have tend to be a bit vague on details in some areas.  There is still lots for me to learn as well.

This is a small sequence of PUSH and POP instructions for you to study:
   
LOCAL a AS LONG
   ! push ebp               'save the current contents of ebp to the stack
   ! push esp               'save the current stack pointer on the stack as well
   ! pop eax                 'get the stack pointer off the stack into the EAX reg.
   ! mov a, eax             'copy the stack pointer value from EAX into the "a" var.
   ! pop ebp                 'and restore ebp to its rightful register
   ? a

If you compile and debug this code, and step through it, you can have the A variable tracked in the variable watch window, and keep the CPU Registers display open to see any changes there.  As you step the code, make sure the ERR value in the variable watch window continues to indicate 0 (zero).  If you get an error, PowerBasic will clear some of the registers and variables.

Now we are going to try something a little different:

! push ebp               'save the current contents of ebp onto the stack
  ! push esp               'save the current contents of esp onto the stack
  ! pop esp                'restore esp contents from the stack to the esp reg.
  ! pop ebp                'restore ebp contents from the stack to the ebp reg.



Again, step the code and it should work fine, and no ERR should occur.

But now we are going to show you something that does not work:
 

! push ebp               'save the current contents of ebp onto the stack
  ! push esp               'save the current contents of esp onto the stack
  ! pop ebp                'restore esp contents from the stack to the ebp reg.
  ! pop ebp                'restore ebp contents from the stack to the ebp reg.


This time the PowerBasic debugger will give you an ERR 24 when you get to the third instruction.  What does this mean?  Well, an ERR 24 is associated with TCP and UDP connections, according to the Help file, so it obviously is not a normal PowerBasic error for this code.  It must be coming from the Assembler.

Apparently, the Assembler is alerting you to the fact that you pushed the contents of one register onto the stack and attempted to pop it back into another.  The Assembler probably meant it as a warning, just in case you did not mean to do this, but PowerBasic took it as a fatal error.  It is also worth keeping in mind that PowerBasic itself uses EBP to reach local variables, so leaving the wrong value in EBP is a good way to get into trouble all by itself.

This then indicates that some techniques used by programmers with a standalone assembler are not going to work as expected under PowerBasic.  But if this is true, then why didn't the Assembler (and PowerBasic) complain when we popped the stack pointer back into the EAX register in the previous sample?  The answer must be that the Assembler is selective about which operations or registers it is concerned with.  EAX is often used for all manner of purposes, but EBP and ESP are registers of some concern, and there may be validity checks in place for them.

So, the inline assembler may be trying to watch out to make sure that you do not do something stupid, and in the process, may keep you from doing something too clever.  But if it is a valid instruction, you won't really know unless you step the code and watch the ERR state for any change.

The displayed Registers in the Debugger also show you a set of FLAGS.  This may be labeled EFLAGS in the 32-bit nomenclature.  These correspond to various states; some of them feed back into arithmetic operations (the carry flag, for instance), and they determine how the conditional branch instructions behave.  I could not find too many write-ups on the bits in the FLAGS register, so I will present this one as a guide:
     bits 31 - 12 (bits 15 - 12 in the 16-bit design) - not covered here
     bit  11 - Overflow Flag (abbreviated OV or OF; when clear, shown as NO)
     bit  10 - Direction Flag (set is down, or DN; clear is UP)
     bit   9 - Interrupt Flag (set is Enable Interrupts = EI; clear is Disable Interrupts = DI)
     bit   8 - Trap Flag (used for single-stepping)
     bit   7 - Sign Flag (set is negative = NG; clear is positive or plus = PL)
     bit   6 - Zero Flag (set is zero = Z or ZR; clear is nonzero = NZ)
     bit   5 - unused
     bit   4 - Auxiliary Flag (indicates a carry/borrow between nybbles in BCD operations)
     bit   3 - unused
     bit   2 - Parity Flag (set if the low byte of the result has an even number of 1 bits = PE; clear if odd = PO)
     bit   1 - reserved (always reads as 1)
     bit   0 - Carry Flag (set if carry = C or CY; clear if no carry = NC)


The capability to perform BCD (Binary Coded Decimal) arithmetic uses each hex digit (4 bits, or nybble) to count only between 0 and 9, forcing an early carry (or borrow).  This is helpful for simple adds and subtracts of decimal values, but the need to do more complicated math really requires numbers in binary or floating point form.  Consequently, BCD is rarely used or taught anymore.
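
A small sketch of what that looks like in practice, assuming the inline assembler accepts DAA and MOVZX (if not, this is one of the cases discussed earlier):

LOCAL a AS LONG
  ! mov al, &H27            'the packed BCD value 27
  ! add al, &H35            'a plain binary add gives &H5C, which is not valid BCD
  ! daa                     'decimal adjust corrects AL to &H62, i.e. BCD 62 = 27 + 35
  ! movzx eax, al           'zero-extend so the upper bits of EAX are clean
  ! mov a, eax
  ? HEX$(a)                 'should display 62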

The parity flag was intended for serial communications, to set or verify that each byte or word was sent and received without any bits being dropped.  However, as most serial communications involved dial-up connections and the use of modems, which had their own parity and error checking capabilities, the use of the parity instructions in the x86 has also been largely ignored.

The conditional branch instructions all start with "J", such as JE (Jump if Equal) or JNE (Jump if Not Equal).  The one exception is the LOOP instruction, which decrements the contents of the CX (or ECX) register and then branches only if the result is not zero.  So you do not need to look at the flag bits directly; just use an appropriate branch instruction, which checks the corresponding bit for you.
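
Here is a short sketch of LOOP in action; it assumes the inline assembler lets a jump target an ordinary PowerBasic line label, and that the backward branch is short enough to stay within LOOP's limited range:

LOCAL total AS LONG
  ! xor eax, eax            'clear the running total
  ! mov ecx, 10             'the counter that LOOP will count down
AddNext:
  ! add eax, ecx            'add the current counter value to the total
  ! loop AddNext            'decrement ECX and branch back while it is not zero
  ! mov total, eax
  ? total                   'should display 55, the sum of 10 down to 1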

The carry bit allows the result of an addition or subtraction operation to be extended across multiple words or registers.  The overflow flag is the signed counterpart of the carry flag: it gets set when a signed result is too large or too small to fit in the destination.  You need to read up on its use to understand it better.
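
For example, a 64-bit addition can be built from two 32-bit halves, with ADC picking up the carry from the low half (the values here are only chosen to force a carry):

LOCAL loA AS DWORD, hiA AS DWORD, loB AS DWORD, hiB AS DWORD
LOCAL loSum AS DWORD, hiSum AS DWORD
loA = &HFFFFFFFF??? : hiA = 0
loB = 1 : hiB = 0
  ! mov eax, loA
  ! add eax, loB            'add the low halves; the carry out of bit 31 lands in the carry flag
  ! mov loSum, eax
  ! mov eax, hiA
  ! adc eax, hiB            'add the high halves plus that carry
  ! mov hiSum, eax
  ? HEX$(hiSum, 8) + HEX$(loSum, 8)   'should display 0000000100000000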

Because the carry flag can be set or cleared with an instruction, it often is used to signal whether the result of some operation was successful or not.  Then you can use the JC (Jump if Carry Set) or JNC (Jump if No Carry) as a way to test the outcome.
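
A sketch of that convention, assuming the inline assembler accepts CLC and STC (STC would stand in for whatever operation reports failure by setting the carry flag):

LOCAL ok AS LONG
  ! clc                     'clear the carry flag: the "success" state in this convention
  '...the operation being tested would go here; on failure it would execute STC...
  ! jc Failed               'branch if the carry flag ended up set
  ! mov ok, 1
Failed:
  ? ok                      'displays 1 because the carry flag was left clear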

Because multi-word math is an overriding concern, the x86 leaves the carry flag untouched by INC and DEC (increment and decrement) operations, even though they do update the other arithmetic flags.  If you need to perform an operation that will change the flag states, but you want to retain the current flags for later use, it is not uncommon to push the flags onto the stack, do your secondary operation, then pop the saved flags back into the flag register.
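
In 32-bit code that usually means PUSHFD and POPFD (PUSHF and POPF in 16-bit code); a minimal sketch, assuming the inline assembler accepts them:

  ! pushfd                  'save the current flags on the stack
  ! add eax, 1              'some intermediate work that disturbs the flags
  ! popfd                   'restore the saved flags so a later conditional branch still sees them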

I mentioned before that there should be a pop for every push, and each push has to come before the corresponding pop. This is mostly true, at least when performing sequential programming.  But when you resort to branching, then you may end up with two or more branches having to have their own pop instructions to pull off all the previous pushes.  In other words, every exception follows some rule in its own way.

If you can find it, one of the best books available on 80386 architecture and design is Assembly Language Programming for the Intel 80XXX Family by William B. Giles, copyright 1991 by Macmillan Publishing Co.  It is a hardcover textbook used in college courses, and I found my copy in a used book store.  There may be better or more recent books out there, but assembly language programming has lost favor in the last decade or so.  Any other resource suggestions from readers of this thread would be welcomed.  The 80386 covers much of the essentials found in later CPU architectures, such as the 486, 586, 686, Pentiums, and beyond.