GoAsm Assembler Manual
A "Go" development tool: http://www.GoDevTool.com
Version 0.56.4e by Jeremy Gordon

Introduction

How to use this manual
Why is a new assembler needed?
Versions and updates
Discussion forum
Integrated Development Environments (IDEs)
Legal stuff
Acknowledgements

GoAsm's design

GoAsm's features in a nutshell
Syntax aims and compatibility with other assemblers
Why GoAsm does not type or parameter check
Why GoAsm requires square brackets for writing to, and reading from, memory
Supported mnemonics

Beginners

Make an asm file
Insert some code and data
Assemble the file with GoAsm
Link the object file to make the exe

Basic GoAsm elements

Starting GoAsm
Sections - declaration and use
Data declaration
Code and the starting address
Labels: unique, re-usable and scoped
Short and long code jumps
Accessing labels
Calling (or jumping to) procedures
Calling Windows APIs in 32-bits and 64-bits
Calling Windows APIs using INVOKE
PUSH or ARG pointers to strings and data
Moving pointers to strings and data into registers
Type indicators
Repeat instructions
Using character immediates in code
Numbers and arithmetic
Characters in GoAsm
Operatives

Advanced features

Structures - different types and uses
Unions
Definitions: equates, macros and #defines
Importing: using run-time libraries
Importing: data, by ordinal, by specific Dll
Exporting procedures and data
Automated register/flags save and restore using USES...ENDU
Callback stack frames in 32-bits and 64-bits
Automated stack frames using FRAME...ENDF, LOCALS, and USEDATA
Conditional assembly
Include files (#include and INCBIN)
Merging: using static code libraries
Unicode support
64-bit assembly
x86 compatibility mode (32-bit assembly using 64-bit source)
Sections - some advanced use
Adapting existing source files for GoAsm

Miscellaneous

Special push instructions
Segment overrides
Using source information
Using the location counters $ and $$
Alignment and the use of ALIGN
Using SIZEOF
Using branch hints
Syntax to use FPU, MMX, XMM, SSE, SSE2 and 3DNow! registers
Reporting assemble time
Other GoAsm interrupts
GoAsm list file
GoAsm's error and warning messages
Using GoAsm with various linkers

Appendices

"Hello World 1" (Windows console program)
"Hello World 2" (Windows GDI program -1)
"Hello World 3" (Windows GDI program -2)
"Hello Dialog" (create a dialog and see various ways of writing to it)

"Hello Unicode 1" (draws Unicode characters to console)
"Hello Unicode 2" (draws Unicode characters in dialog and message box)
"Hello Unicode 3" (draws Unicode characters using TextOutW, and also demonstrates Unicode/ANSI switching using the Microsoft Layer for Unicode)

"Hello 64World 1" (simple 64-bit console program)
"Hello 64World 2" (simple 64-bit windows program)
"Hello 64World 3" (switchable 32-bit or 64-bit windows program)

Writing Unicode programs
Writing 64-bit programs
"Run Time Loading" (demonstrates how to use run-time loading in large application running on both W9x/ME and NT/2000/XP/Vista and using both ANSI and Unicode APIs)
Do nothing Linux program by V.Krishnakumar
The windows character set


Selection of Tutorials
Full list

Quick start to making a simple Windows program
For those new to Programming
For those new to Assembly Language
For those new to Windows
For those new to Symbolic Debugging
Understand bits, binary and bytes
Understand hex numbers
Understand finite, negative, signed and two's complement numbers
Understand registers
Understand the stack - Part 1
Understand flags and conditional jumps
Understand reverse storage
Some programming hints and tips
Standardized window and dialog procedure
Understand the stack - Part 2
FAQ "When I click on the GoAsm or GoLink or GoRC icon something just flashes on the screen but nothing else happens".

Introduction

How to use this manual

If you are interested in why I wrote GoAsm, and the legal and licensing stuff, read on.

If you want a quick view in a nutshell of some of GoAsm's features, see "GoAsm's features in a nutshell".

If you are a beginner and want to see how to make a simple Windows program, see the "Beginners" section.

If you would like to see some sample GoAsm code, the appendices contain a simple "Hello World" Windows console program ("Hello World 1"),
a straightforward "Hello World" GDI (with window) program ("Hello World 2"),
and a "Hello World" GDI program making full use of stack frames, structures, locally scoped labels, INVOKE and definitions (macros) ("Hello World 3").
Sample Unicode and 64-bit programs are also listed in the appendices.

If you want to read about the aims which drove my design for GoAsm, see "GoAsm's design".

If you are just interested in how to use GoAsm, see "Basic GoAsm elements" for the basics, "Advanced features" for more advanced ones, or "Miscellaneous" for miscellaneous matters.

If you are new to assembler, welcome to the joys of assembler programming! Make fast, compact working programs. Assembler works well with Windows. It is true that assembler is a low level language but the Windows API (Applications Programming Interface) is a very high level language. The two are very compatible, in 64-bit as well as in 32-bit programming. This document will help you to pick up assembler programming. Have a look at the beginners' section, the appendices and the tutorials for some starter articles.

Why is a new assembler needed?

There are a number of assemblers available, the most popular being Microsoft's MASM, NASM (from a team originally headed by Simon Tatham and Julian Hall), Borland's TASM and Eric Isaacson's A386. For my part, in the context of Windows programming, none of these assemblers can be regarded as perfect. Some have annoying defects. In writing GoAsm I have attempted to produce an assembler which always produces the minimum size code and which has a clear and obvious syntax, reduced source script requirements and extensions to help with programming for Win32 and Win64. This has also given me the opportunity to write a linker GoLink, which is finely tuned to work together with GoAsm.
Others have also tried to escape the mould, most notably René Tournois who wrote the executable-maker Spasm (now called RosAsm) and Tomasz Grysztar with his flat assembler (FASM).

Versions and updates

My intention is to keep GoAsm free from known bugs. So I usually work on bug fixes as soon as I discover them (unless I am on holiday). I usually provide a fix to those who report bugs by sending them (or posting) a copy of GoAsm with a version number which has a suffix letter. Relatively minor bugs are usually accumulated in this way and then eventually published as an update. Such updates can be obtained from my web site at www.GoDevTool.com. A serious bug may result in an immediate publication of a GoAsm update. I am also continuing to enhance GoAsm from time to time: this tends to result in a beta version of GoAsm which is available for test. These test versions are often also available from my web site. Only after testing do these beta changes become an official update.

Discussion forum

There is a GoAsm Assembler and Tools forum run by Hutch as part of the MASM forum, where you can air your views about the "Go" tools, ask questions of me and other users, and check for updates. The forum also provides an opportunity for me to consult with you about proposed enhancements to GoAsm and the other "Go" tools. Earlier forum messages can be viewed in the old forum archive.
The Windows Programming forum run by Emmanuel Zacharakis (Manos) also hosts a forum for the GoAsm Assembler and Tools.

Integrated Development Environments (IDEs)

IDEs are editors which help you to get the correct programming syntax and then run the development tools to create the output files. A full list of IDEs which are suitable for use with GoAsm is available from here.

Legal stuff

Copyright

GoAsm is Copyright © Jeremy Gordon 2001-2007 [MrDuck Software] - all rights reserved.

GoAsm - licence and distribution

You may use GoAsm for any purpose including making commercial programs. You may redistribute it freely (but not for payment nor for use with a program or any material for which the user is asked to pay). You are not entitled to hide or deny my copyright.

Disclaimer

I have made every effort to ensure that the output of GoAsm and its accompanying program AdaptAsm is accurate, but you do use them entirely at your own risk. I cannot accept any liability for them failing to work properly or giving the wrong output nor for any errors in this manual.

Acknowledgements

My grateful thanks to those who have encouraged me to write this program and have offered helpful comments, reports and guidance, in particular:-
Wayne J Radburn, of Gatineau, Québec, Leland M. George of West Virginia, Edgar Hansen of Kelowna, British Columbia, Canada ("Donkey"), Daniel Fazekas of Budapest, Greg Heller of the Congo ("Bushpilot"), René Tournois of Liousville, Meuse, France ("betov"), Ramon Sala of Barcelona, Spain, Bryant Keller of Cartersville, Georgia, Emmanuel Zacharakis (Manos), and Brian Warburton of Weybridge, UK. Thanks also for the support, suggestions and bug reports from grv, Jeff Aguilon, Jonne Ahner, Thomas Hartinger, Martyn Joyce, Kazó Csaba, Dmitry Ilyin, Patrick Ruiz, and from all contributors to the GoAsm and Tools Forum, the WinASM Community Message Board and the Windows Programming forum whom I have not already mentioned.


GoAsm's design

GoAsm's features in a nutshell


Syntax aims and compatibility with other assemblers

The syntax acceptable to the assembler is of central importance to any assembler programmer. It varies between assemblers. GoAsm does not create 16 bit code and it works only in "flat" mode (no segments). Because of this, its syntax is very simple. I have chosen what I regard as the best syntax with the main aim of clarity and consistency. You may disagree with my opinion on this. If so I would be interested to hear your views.
When writing GoAsm I toyed with the idea of making it wholly compatible with the syntax used by other assemblers, but this was simply not possible because of the variations between them, some of which would produce inconsistency. I also decided against making GoAsm fully compatible with any one assembler.
You will recognise syntax from other assemblers. Where possible I have tried to keep close to what I regard as the best syntax from each assembler in general use. You will also recognise some syntax used in "C" programming. I have mainly followed the "C" "preprocessor" syntax since it seems pointless to do otherwise. This also makes GoAsm's use of preprocessor commands compatible with my resource compiler GoRC.

Why GoAsm does not type or parameter check

After some thought I decided that GoAsm ought not to type or parameter check. This, I believe, reduces the size of the source script enormously, and adds to its flexibility and readability. I have concluded that even basic type checking in assembler programming for Windows is not at all essential, and is more trouble than it is worth.
Let me explain.
In type checking the assembler must check that references to areas of memory are made with the correct size and type of data to suit what the area of memory is intended to be used for. This is achieved through a two-stage process. Firstly, when the area of memory is declared, the programmer must allocate it a certain "type". Then, when the area of memory is used, the programmer has again to state the type of memory intended to be used. If there is a mismatch the assembler or compiler will show an error.

Some assemblers, like NASM, do no type checking at all. Others, like A386, do only basic type checking based on the byte, word, dword, qword and tword types. MASM and TASM, like "C", allow you to specify your own types using TYPEDEF and then type-check based on those specified types.
Parameter checking checks that the correct number of parameters are passed to an API and also type checks the parameters. Most assemblers do not parameter check but MASM permits parameter checking when the INVOKE pseudo mnemonic is used.
The overheads required to achieve full type and parameter checking like a "C" compiler are enormous. Just look through a Windows header file and see the long lists of various types allocated to various structures and to the parameters of APIs. Then look at the efforts of the programmer which are required in the source script to ensure that no error is thrown up by the assembler or compiler.

I decided to follow the NASM example and not even offer basic type checking as A386 provides. I have used A386 over many years and have enjoyed its clean syntax, but I have only found its basic type checking a hindrance when programming for Windows. This is because there are often occasions when you want to write to data, or read from data using a different size of data than used to declare the data in the first place.

As for parameter checking, again I have not even tried to offer this since in my view it unnecessarily complicates things. It again requires enormous lists of APIs and parameters to be provided to the assembler or compiler so that it can check that these match what you are giving the API. Miss one and your program does not compile. Take the example of

PUSH 40h,EDX,EAX,[hwnd]
CALL MessageBoxA
Here is a call to an API which takes 4 parameters. Now it is said that you would like the assembler to tell you if you send the wrong number of parameters. But you don't need this warning. Your program would simply crash if you sent the wrong number of parameters, and you are going to test this call aren't you? Yes! There is no hidden, latent, fault here which will not be noticed at testing stage. Then, it is said that you need type checking in case you send the wrong data size to an API as a parameter. I just can't see this. All parameters to APIs are dwords (with one or two exceptions out of thousands). So you won't be sending the wrong size of data to an API.
I agree that it may be possible to send the wrong type of data to an API. For example you might send a constant when it ought to be a handle. Or you might send the contents of a memory address when it ought to be a pointer to a memory address. However, the API simply will not work if there is an error - again there is no hidden, latent, fault here which will not be noticed at testing stage.

Abolishing parameter and type checking not only frees the assembler from a great deal of work, making it faster in operation, but it also frees the programmer from the headache of manipulating header and include files. It provides greater fluidity in memory addressing, since errors will not be thrown up if you want to use data of a size which does not match the size of the data declaration. So in GoAsm even if lParam has been declared as a dword,

MOV [lParam],AL 
is still allowed. And if LOGFONT is a simple structure of dwords, GoAsm is quite happy for example with
MOV B[LOGFONT+14h],1 
which you might want to use to set a font to italic.

By not type and parameter checking I have been able to abolish EXTRN. GoAsm does not need to know the type of symbols which are declared outside the source file (ie. to be found during linking). I hope you will agree this relieves you from a lot of hard work and anguish in having to add those EXTRNs in your larger programs.
The corollary to the abolition of type and parameter checking is that you must tell GoAsm the size of the data to be worked on, if this is not obvious.
So, for example,
MOV [MemThing],23h is an error. To load 23h as a byte into MemThing you need to code MOV B[MemThing],23h. This is because GoAsm will not know at assembly time whether the 23h should be loaded as a byte, word, or dword, all of which are permitted by the MOV instruction.
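As a minimal sketch of this rule, the fragment below stores the same immediate value at three different sizes. MemThing is the label used in the text; its declaration as a dword is an assumption made for illustration, and the W and D indicators are taken to be the word and dword counterparts of the B indicator.

```asm
DATA SECTION
MemThing DD 0           ;assumed declaration: 4 bytes initialised to zero

CODE SECTION
START:
MOV B[MemThing],23h     ;writes 1 byte
MOV W[MemThing],23h     ;writes 2 bytes
MOV D[MemThing],23h     ;writes 4 bytes
MOV [MemThing],AL       ;no indicator needed: the register AL makes the size obvious
XOR EAX,EAX
RET
```

Only the instructions with an immediate operand need the indicator, because a register operand already fixes the size of the operation.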

In some ways the requirement for a type indicator (when the type is not obvious) is helpful. This is because you can see from the instruction itself how much memory is affected by the instruction. You don't have to look up a particular data declaration to see its type in order to see what the instruction will do. So, for example:-

MOV B[MemByte],23h  ;comforting to see this is limited to a byte operation
FLD Q[NUMBER]       ;useful to know real number loaded with double precision
INC B[COUNT]        ;essential to know this can count only up to 255
Another advantage arising from no parameter checking is that there is no need to decorate the names of imports, and in turn there is no need for LIB files at the linking stage when using the companion program GoLink.

Why GoAsm requires square brackets for writing to, and reading from, memory

Assembler programmers have debated for a long time whether, for consistency, all memory addressing should be done using square brackets. The argument is that since you must use square brackets when the address is contained in a register, for example MOV EAX,[EBX], then you should also use square brackets when the address is pointed to by a label, for example MOV EAX,[lParam]. I have followed the debate with interest. MASM and A386 made it optional, so that these two instructions did exactly the same thing:-
MOV EAX,lParam
MOV EAX,[lParam]
However, A386 differentiated between labels with and without colons so that the above was only true if lParam was declared as follows
lParam DD 0
but not if it was declared as:-
lParam: DD 0
In that case MOV EAX,lParam in A386 would act the same way as MOV EAX,OFFSET lParam. Very confusing!
NASM took the plunge by making it a requirement for any memory addressing to be in square brackets. However it still allowed:-
MOV EAX,lParam
In NASM this is the same as MOV EAX,OFFSET lParam for other assemblers.
So when looking at assembler code, without knowing the syntax of the assembler concerned, you can never be really sure what MOV EAX,lParam does. The same instruction can do two entirely different things depending on which assembler is used.
Borland TASM when switched to "Ideal" mode outlawed MOV EAX,lParam altogether, and only allowed
MOV EAX,[lParam]
or
MOV EAX,OFFSET lParam
I tend to agree with this approach. The main aim here is to ensure that coding is unambiguous.
For this reason I have decided that GoAsm must be strict about this question. This avoids all ambiguity. Therefore in GoAsm
MOV EBX,wParam
is completely outlawed, unless wParam is a defined word. In order to get the offset in GoAsm you must use
MOV EBX,ADDR wParam
or if you prefer
MOV EBX,OFFSET wParam
which means the same thing.
In order to address memory in GoAsm you must use
MOV EBX,[wParam]
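The difference between the two permitted forms can be sketched as follows. The declaration of wParam and the value 1234h are assumptions made for illustration only.

```asm
DATA SECTION
wParam DD 1234h         ;assumed declaration for this sketch

CODE SECTION
START:
MOV EBX,ADDR wParam     ;EBX = the address of wParam (no memory is read)
MOV ECX,[wParam]        ;ECX = 1234h, the contents of wParam
MOV EDX,[EBX]           ;EDX = 1234h too, read through the address held in EBX
XOR EAX,EAX
RET
```

In both reading instructions the square brackets mark the memory access, whether the address comes from a label or from a register.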

Supported mnemonics

What is a "mnemonic"?

A mnemonic is an instruction in word form which you use in your assembler source script. GoAsm assembles the mnemonic instructions and converts them into the opcodes which the processor executes. This is sometimes called machine code. The mnemonics themselves are recommended by the processor manufacturers. They are intended to convey in short form as precisely as possible what the instruction does. Although there are now over 550 mnemonics, in everyday use an assembler programmer only uses about 20 or 30 of these on a regular basis. See "For those new to Assembly Language" for a list of the most commonly used mnemonics.
For the sake of transportability of source scripts and consistency in the light of possible updates, all assemblers normally recognise all mnemonics at the level at which they assemble. But the processor knows nothing about the mnemonics and only works in machine code itself. Non-assembler programmers never use mnemonics. A compiler working solely in "C" for example still produces machine code, yet it does not work with mnemonics as such (unless switched to in-line assembly mode).

Which mnemonics are supported by GoAsm?

GoAsm supports all the mnemonics at its level of assembly including the x87 floating point instructions, MMX, SSE, SSE2, 3DNow! and 3DNow! extensions instructions. GoAsm supports the CMP pseudo instructions which may be used with the XMM registers.
GoAsm does not support some mnemonics because they are used solely for 16-bit programming. These are IBTS, IRETW, JCXZ, RETF, and XBTS.
GoAsm does not support the string mnemonics which require additional operands, and where there is an easier mnemonic to use. These are:-
CMPS - use CMPSB or CMPSD
INS  - use INSB or INSD
LODS - use LODSB or LODSD
MOVS - use MOVSB or MOVSD
OUTS - use OUTSB or OUTSD
SCAS - use SCASB or SCASD
STOS - use STOSB or STOSD
XLAT - use XLATB
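As a hedged illustration of one of the supported forms, the sketch below copies a short string with REP MOVSB. The labels Source and Dest and the 6-byte length are assumptions made for this example.

```asm
DATA SECTION
Source DB 'Hello',0         ;6 bytes including the terminating zero
Dest   DB 0,0,0,0,0,0       ;6-byte destination buffer

CODE SECTION
START:
PUSH ESI,EDI                ;preserve the registers used by MOVSB
MOV ESI,ADDR Source         ;ESI = address to copy from
MOV EDI,ADDR Dest           ;EDI = address to copy to
MOV ECX,6                   ;number of bytes to copy
CLD                         ;clear the direction flag so the copy runs upwards
REP MOVSB                   ;copy ECX bytes from [ESI] to [EDI]
POP EDI                     ;restore in reverse order
POP ESI
XOR EAX,EAX
RET
```

MOVSB takes no operands because the source and destination are implied by ESI and EDI, which is why the operand-taking form MOVS is not needed.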

Beginners

1. Make an asm file

The asm file is a file which you make and edit using an ordinary text editor, such as Paws which you can download from my web site, www.GoDevTool.com, or a program like Notepad or Wordpad which comes with Windows. If you use Notepad or Wordpad you should make sure you save the file in a format which adds no control or formatting characters, other than the usual end of line characters (carriage return and line-feed). This is because GoAsm only looks for plain text. You can achieve this by saving the file as a "text" document. If you don't use an extension for the file (the extension is the characters after the "dot") then the editor may give the file a ".txt" extension but you can change this by renaming the file (you can rename the file by right-clicking on the name using Windows Explorer or My Computer).
It may be that you cannot see the extension on your computer, because it may be set that way. To see the extensions of your files from Windows Explorer, choose the menu item "View", "Folder options", then the "View" tab and ensure that the "Hide file extensions for known file types" is not checked. The procedure may differ slightly in different versions of Windows.
It is traditional amongst programmers to give their source scripts an extension which matches the language in which the source code is written. For example you might have an assembler file called "myprog.asm". Similarly you will usually find source code written in the "C" language with the extension ".c" or ".cpp" (for "C++"), ".pas" for pascal and so on. However, there is no magic in these extensions. GoAsm will accept files of any extension or files which do not have an extension.
The .asm file contains your instructions to the processor in words and numbers. These are executed by the processor when the program is run. It is said therefore, that the .asm file contains your "source code" or your "source script".

2. Insert some code and data

As an example let's look at the code and data in a simple 32-bit Windows program which writes "Hello World" to the MS-DOS (command prompt) window (the "console"). This is what you would put into your asm file:-

DATA SECTION
;
KEEP DD 0               ;temporary place to keep things
;
CODE SECTION
;
START:
PUSH -11                ;STD_OUTPUT_HANDLE
CALL GetStdHandle       ;get, in eax, handle to active screen buffer
PUSH 0,ADDR KEEP        ;KEEP receives output from API
PUSH 24,'Hello World (from GoAsm)'    ;24=length of string
PUSH EAX                ;handle to active screen buffer
CALL WriteFile
XOR EAX,EAX             ;return eax=0 as preferred by Windows
RET
Note that anything after a semi-colon is ignored, so you can insert comments. See operatives for other comment forms. See provide good comments and descriptions for the importance of comments.
The first line in this file opens the data section. See sections - declaration and use for the importance of sections and how to use them.
Then we declare a data area of 4 bytes (DD means a "dword" or "doubleword" which is 4 bytes), identify it with a label "KEEP" and initialise it to zero. See data declaration for an explanation of "data" and how to declare it.
Then we open the code section and provide the label "START" to tell the processor where to start executing the instructions. See code and the starting address for an explanation of "code" and the starting address.
The next instruction "PUSH -11" puts minus 11 decimal on the stack ready for the call to the Windows API GetStdHandle on the next line. See understand the stack for an explanation of the stack and the PUSH instruction. See understand finite, negative, signed and two's complement numbers for an explanation of what is meant by "-11 decimal". See for those new to Windows for an explanation of an API. The instruction "CALL" transfers execution to the API and on return from the API, execution continues on the next line. See transferring execution to a procedure.
Then there are five more PUSHes onto the stack. Note that some of these are repeated PUSH instructions separated by commas. See repeat instructions to see how this works. These PUSHes get ready for the call to the API WriteFile. In order, they are PUSH 0 (zero); then the address of KEEP (see accessing labels); then the number 24 decimal which is the length of the string (ie. the words in quotes); then a pointer to the string (see PUSH or ARG pointers to strings); then the register EAX (see understanding registers).
Then the value of zero is put in the register EAX using the instruction XOR EAX,EAX. This does the same as MOV EAX,0 but produces less code. See some programming hints and tips for similar tips.
Finally RET finishes the program by returning to the caller (in this case Windows itself). See an explanation of this.
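To make the repeated PUSH forms concrete, here is a sketch of the same program with one PUSH per line, in the order described above. The parameter descriptions in the comments follow the documented order of WriteFile's parameters; this is an equivalent rewrite for illustration, not a different technique.

```asm
DATA SECTION
KEEP DD 0                         ;temporary place to keep things

CODE SECTION
START:
PUSH -11                          ;STD_OUTPUT_HANDLE
CALL GetStdHandle                 ;handle returned in EAX
PUSH 0                            ;5th parameter: no overlapped structure
PUSH ADDR KEEP                    ;4th: receives the number of bytes written
PUSH 24                           ;3rd: number of bytes to write
PUSH 'Hello World (from GoAsm)'   ;2nd: pointer to the string
PUSH EAX                          ;1st: handle to active screen buffer
CALL WriteFile
XOR EAX,EAX                       ;return eax=0
RET
```

The last parameter is pushed first, so the first parameter ends up on top of the stack when the CALL is made.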

3. Assemble the file with GoAsm

Having put in the code and data to your file you are ready to make your program. This is done in two steps. First you need to assemble your file and then you need to link it. In order to do this you need to open an MS-DOS (command prompt) window. See how to do this. In this case you use the command line:-

GoAsm /fo HelloWorld.obj filename
where filename is the name of your asm file. See starting GoAsm for how to use the command line for GoAsm.
GoAsm makes an "object" file containing your code and data; this file has the extension ".obj" and is in a format suitable for the linker. See more information about the object file.

4. Link the object file to make the exe

The final step is to "link" your program to make the final executable. You can use GoAsm's companion program GoLink to do this. This is what you need in the command line:-
GoLink /console helloworld.obj kernel32.dll
(add "-debug coff" if you want to watch the program in the debugger).

Note that the GetStdHandle and WriteFile calls are to kernel32.dll which is why the name of that Dll appears in the GoLink command line. See for more information about Dlls. See using GoAsm with various linkers for more information about using GoLink and other linkers if you prefer. See the GoLink help file for other GoLink options.

GoLink creates the file HelloWorld.exe. You can then run this program from the MS-DOS (command prompt) window. Type in HelloWorld and press enter. You will see the string you sent to WriteFile is written in the console.

So let's recap by looking back at the lines in your source script.
See that first you asked Windows for a handle to the console window. This was returned by the API GetStdHandle and held in the EAX register. This handle and the string to write were passed to WriteFile. In other words you told Windows to write the specified string to the console. Information about exactly how to use the APIs and the parameters which need to be passed to them is available from Microsoft from the MSDN site (look for the "Platform SDK"). Finally see suggestions how to organise your programming work.


Basic GoAsm elements

Starting GoAsm

The command line syntax is:-

GoAsm [command line switches] filename[.ext]

Where,

filename is the name of the source file

Command-line Switches
/b        beep on error
/c        always put output file in current directory
/d        define a word (eg. /d WINVER=0x400)
/e        empty output file allowed
/fo       specify output path/file (eg. /fo asm\myprog.obj)
/gl       retain leading underscore in external "C" library calls
/h or /?  help
/l        create listing output file
/ms       decorate for mslinker
/ne       no error messages
/ni       no information messages
/nw       no warning messages
/no       no output messages at all
/x64      assemble for AMD64 or EM64T
/x86      assemble 64-bit source in 32-bit compatibility mode

If no extension is given for the input file, GoAsm looks for the file without any extension. If that file is not found then GoAsm looks for the file with an assumed .asm extension.
If no path is given for the input file it is assumed to be in the current directory.
If no filename is given for the output file an object file with the same name as the input file is created. For example MyAsm.asm will create a file called MyAsm.obj.

The directory which receives the output file is chosen as follows, in order of precedence:-

  • the path given with /fo, if /fo is specified; otherwise
  • the current directory, if /c is specified; otherwise
  • the path given for the input file, if a path was given; otherwise
  • the current directory.

If no extension is given for the output file, .obj is added by default. The listing file is given the same name as the output file with the extension .lst and is created in the same directory as the output file.

Sections - declaration and use

Why sections are needed

You must declare a section before you can start coding. The reason for this is that the processor needs to know the attributes of the instructions it is being given. Also the Windows system relies on the attributes to identify parts of your code. Some common attributes are read-only (cannot be written to), read-write (can be written to) and execute (code instructions). Internally the processor deals with the instruction in the most appropriate and speedy manner to suit the attribute. For example, code instructions use the processor code cache, whereas non-code material is regarded as data and may be kept in the data cache.
When you declare a section in your source script, GoAsm automatically sets the attribute of the section. Once this is done you can start to write the code or data in your program.

How to declare a section

In Windows programming we are only interested in four section types: code, data, const, and uninitialised data. You declare code, data or const sections as follows:-

CODE SECTION
DATA SECTION
CONST SECTION or CONSTANT SECTION

The words "code", "data", "const" and "constant" are reserved to section declarations and an error will be signalled if these words are used elsewhere in your source.

GoAsm also allows shortened forms to declare a section as follows:-

CODE
DATA
CONST

You can also use .CODE, .DATA and .CONST if you wish.

GoAsm automatically adds the attributes to suit the processor and Windows. A code section is given the attributes read, execute, code. A data section is given the attributes read, write, initialised data. A const section is given the attributes read, initialised data (you won't be able to write to a const section). Uninitialised data has the attributes read, write, uninitialised data.
Except to add the shared attribute, you can't override these attributes yourself. This is because to do so is pointless in the Windows system which has control over the attributes of the section as loaded and running. For example even if you give a code section the write attribute, Windows will not allow you to write to it. Also Windows will not permit you to execute code in a data section. You can change this behaviour however, by calling the API VirtualProtect at run-time.
In GoAsm you can use a code section to hold read-only data, although there may be a reduction in performance if you do this.
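A call to VirtualProtect might be sketched as below. The labels OldProtect and Patch are hypothetical, 40h is the Windows constant PAGE_EXECUTE_READWRITE, and the parameters are pushed in reverse of their documented order (address, size, new protection, pointer to old protection). This is a minimal 32-bit sketch under those assumptions, to be linked with kernel32.dll, and not a recommended pattern.

```asm
DATA SECTION
OldProtect DD 0             ;receives the previous protection value

CODE SECTION
START:
PUSH ADDR OldProtect        ;lpflOldProtect
PUSH 40h                    ;flNewProtect = PAGE_EXECUTE_READWRITE
PUSH 1                      ;dwSize (protection changes apply to whole pages)
PUSH ADDR Patch             ;lpAddress
CALL VirtualProtect         ;returns non-zero in EAX on success
OR EAX,EAX
JZ >L1                      ;skip the write if the call failed
MOV B[Patch],90h            ;write one byte (a NOP opcode) into the code section
L1:
XOR EAX,EAX
RET

Patch:                      ;hypothetical location to be overwritten
NOP
RET
```

Without the VirtualProtect call the MOV into the code section would be expected to cause an access violation at run-time, as described above.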

    Declaring a section also sets certain switches in GoAsm which affect syntax and coding. The rules are as follows:-

    • All labels in a code section must have a colon. The reason for this is to identify to GoAsm what is a label and what is not, ensuring that misspelt mnemonics and directives will always be reported as an error.
    • Re-usable labels are only permitted in a code section. If you use these label types in a data section they will be regarded as unique labels and put in the symbol table.
    • GoAsm will report an error if you have an instruction which would write to a const section. The const section is intended for initialised data and strings which will not be written over.
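
    As a rough illustration of the last rule (the label Version and its value are invented for this sketch), reading from a const section is accepted, but an instruction which writes to it should be rejected at assemble-time:-
    CONST SECTION
    Version DD 3            ;read-only initialised data (hypothetical label)
    CODE SECTION
    MOV EAX,[Version]       ;reading from the const section is fine
    MOV [Version],EAX       ;writing to it: GoAsm is expected to report an error here
    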

    Uninitialised data section

    GoAsm will make an uninitialised data section in the object file if you declare uninitialised data. This will be called ".bss" to suit other tools. GoAsm names this for you and you cannot change the name because some linkers expect this name. With most linkers, including GoLink, the .bss section does not find its way into the exe. Instead it is merged with a read/write section in the exe. The attributes of the uninitialised data section are read, write, uninitialised data.
    The advantage in declaring uninitialised data, rather than initialised data, is that the executable will be smaller. This is because only the amount of uninitialised data is specified in the executable, and the data itself is not kept in the executable. There is no need to do so since it has no specific value. See declaring ordinary uninitialised data.
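    To make the size difference concrete, here is a small sketch (both labels are hypothetical). Each line reserves 4096 bytes at run-time, but only the first should add 4096 bytes of data to the file:-
    DATA SECTION
    ZEROED_BUFFER  DB 1000h DUP 0   ;initialised: the 4096 zero bytes are stored in the file
    SCRATCH_BUFFER DB 1000h DUP ?   ;uninitialised: only the size is recorded
    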

    Switching to and from sections

    It's easy to switch to a section and back again. Just use either
    CODE SECTION or
    DATA SECTION or
    CONST SECTION
    or their shortened forms as appropriate. You can do this as often as you like through your source script. GoAsm and the linker will concatenate all instructions intended for each section.
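
    A minimal sketch of switching back and forth (the labels Greeting and Counter are hypothetical). The two data declarations should end up next to each other in the data section, and the instructions next to each other in the code section:-
    DATA SECTION
    Greeting DB 'Hello',0      ;goes into the data section
    CODE SECTION
    START:
    MOV ESI,ADDR Greeting      ;goes into the code section
    DATA SECTION
    Counter DD 0               ;added to the data section after Greeting
    CODE SECTION
    MOV EAX,[Counter]          ;added to the code section after the first MOV
    RET
    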

    See also sections - some advanced use on naming sections, shared sections, and section ordering.


    Data declaration

    What is "data"?

    In a way all the instructions given to a processor are "data". But assembler programmers use the word to mean information which is either fixed or which can be changed at run-time and which is not actually executed. Data is of four main types:-
    1. Read-only data specified at assemble-time (when the program is compiled) and which is kept in the const section which has a read-only attribute. This is called "initialised" data because its contents are fixed in your source script. At run-time this data can be read but it cannot be written to. In your source script you can give this data labels so that it can be referenced easily.
    2. Data specified at assemble-time and kept in the data section in the executable. Again the contents of the data will be fixed in your source script, but at run-time this data can be read, or written to and read using the data labels.
    3. Data not specified at assemble-time but held in an area which is reserved for it. This is called "uninitialised data" and the amount reserved is recorded in the executable. In your assembler source script you specify how much data to be reserved. You can give it labels but you cannot initialise its contents. The advantage of this type of data is that it takes up no space in the executable. At loading-time the data is established but it has no particular contents at that time. At run-time the data can be read, or written to and read in the same way as data in the data section.
    4. Data established at run-time either by the program itself or by the system. This type of data is not established at assemble-time from your assembler source script. Instead it is established by the operating system when your code is executed.

    Declaring initialised numerical data

    GoAsm follows the traditional assembler syntax for declaring data in your source script.
    In a data or const section a label need not be terminated in a colon. In a code section this is necessary, to help catch syntax errors. Some examples (using data section):-
    HELLO1 DB 0                 ;one byte with label "HELLO1" set to zero
           DB 0                 ;second byte set to zero
    HELLO2 DW 34h               ;two bytes (a word) set to 34h
    HELLO3 DD 12345678h         ;four bytes (a dword) set to 12345678h
    HELLO4 DD 12345678D         ;four bytes (a dword) set to 12345678 decimal
    HELLO5 DD 1.1               ;four bytes (a dword) set to real number 1.1
    HELLO6 DQ 0.0               ;8 bytes (a qword) set to real number 0.0
    HELLO7 DQ 123456789ABCDEFh  ;8 bytes (a qword) set to 123456789ABCDEFh
    HELLO8 DQ 1234567890123456  ;8 bytes (a qword) set to 1234567890123456 decimal
    HELLO9 DT 1.1E0             ;10 bytes (a tword) set to real number 1.1
    HELLOA DT 123456789ABCDEFh  ;10 bytes (a tword) set to 123456789ABCDEFh
    
    Note that DB, DW, DD and DQ accept numbers in both decimal and hex; DD, DQ and DT accept real numbers too.
    See also declaring real numbers, directly loading the exponent and mantissa and loading a file using INCBIN.

    Declaring more data on each line

    A comma after an initialiser means that another initialiser is expected which declares more data, as follows:-
    Label DB 0,0,0,0             ;four bytes set to zero
          DW 33h,44h,55h,66h     ;four initialised words
          DD 33h,44h,55h,66h     ;four initialised dwords
          DD 1.1,2.2             ;two DD real numbers
          DQ 1.1,2.2             ;two DQ real numbers
          DQ 3333h,4444h         ;two DQ hex numbers
          DT 1.1,2.2             ;two DT real numbers
          DT 5555h,6666h         ;two DT hex numbers
    

    Declaring ordinary uninitialised data

    GoAsm follows the traditional assembler syntax here but like A386 does not require the uninitialised section (the ".bss" section) to be declared. Instead, a simple ? ensures that the data is treated as uninitialised. Some examples (within data or const section text):-
    HELLO1 DB ?   ;one byte with label "HELLO1" recorded as uninitialised
    HELLO2 DW ?   ;two bytes (a word)
    HELLO3 DD ?   ;four bytes (a dword)
    HELLO4 DQ ?   ;8 bytes (a qword)
    HELLO5 DT ?   ;10 bytes (a tword)
    
    Orphaned uninitialised data is not allowed: you cannot mix initialised and uninitialised data so this is an error:-
    DATA6    DD 5 DUP 0
             DB ?
             DB 0
    
    However this is ok:-
    DATA6    DD 5 DUP ?        ;5 dwords for the customer
             DB ?              ;a byte to hold the main course
             DB ?              ;and a byte to hold the sauces
    
    This is to allow you to separate areas of uninitialised data so that each separate area can have its own comment.
    Uninitialised data cannot be declared until a section has been opened. You can declare uninitialised data within code section text, but the labels must end in colons as usual for the code section, for example:-
    HELLO1: DB ?   ;one byte with label "HELLO1" recorded as uninitialised
    HELLO2: DW ?   ;two bytes (a word)
    

    Declaring duplicate data

    GoAsm uses the well established DUP syntax, but does not require any initialiser to be in brackets. Some examples (using data section):-
    HELLO1 DB 2 DUP 0      ;two bytes with label "HELLO1" both set to zero
    HELLO1A DB 800h DUP ?  ;2K buffer not initialised 
    HELLO2 DW 2 DUP 0      ;four bytes all set to zero
    HELLO3 DD 2 DUP ?      ;eight bytes in uninitialised section
    HELLO4 DD 2 DUP 1.1    ;real number 1.1 repeated twice as dwords
    HELLO5 DQ 2 DUP 1.1    ;real number 1.1 repeated twice as qwords
    HELLO6 DQ 2 DUP 333h   ;qword repeated twice
    HELLO7 DT 2 DUP 1.1    ;real number 1.1 repeated twice as twords
    HELLO8 DT 2 DUP 444h   ;tword repeated twice
    
    You can use DUP to declare some data and then initialise each data component individually for example:-
    HELLO300 DB 3 DUP <23,24,25>    ;declare three bytes and set them to 23,24,25
    
    which does the same as:-
    HELLO300 DB 23,24,25            ;declare three bytes and set them to 23,24,25
    
    Although it may seem pointless to do this, the syntax does make it easier to initialise a member of a structure if it contains DUP. See initialising structure members which have DUP data declarations.

    Initialising using character values

    You can initialise to character values by putting characters in quotes, for example:-
    Letters DB 'a'
            DW 'xy'
    Sample  DD 'form'
    ZooDay  DQ 'Saturday'
    
    Unless inserting Unicode strings GoAsm carries out no conversion to the character, so that the actual value inserted in the object file will depend on the current character set at the time of assembly.
    GoAsm does not store the word and dword character declarations above in reverse order in memory. This follows NASM's lead but differs from MASM's handling of such strings. So the first byte in the word above is 'x' and the second is 'y', and the dword at Sample is stored as 'f' then 'o' then 'r' then 'm' when viewed as bytes. This permits you to code
    MOV EDI,ADDR BUFFER
    MOV EAX,[Sample]
    STOSD
    
    which inserts into the buffer the string: form.
    Any bytes not initialised are given a value of zero, for example:-
            DW 'a'        ;first byte is a, second is zero
            DD 'ab'       ;'a' then 'b' then two zero bytes
    
    Repeat character value initialisations are allowed, for example:-
    DD 3 DUP "Hi"
    
    This inserts H then i then two zeroes and this is done three times.
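    Going by the description above, the declaration should produce the same twelve bytes as this spelled-out form (a sketch for illustration only):-
    DB 'H','i',0,0     ;first dword
    DB 'H','i',0,0     ;second dword
    DB 'H','i',0,0     ;third dword
    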

    Declaring strings

    Strings may be in single or double quotes. Some examples are:-
    String1 DB 'This is a string'
            DB 'This is a string with "internal" quotes'
    String2 DB "A string in double quotes"
            DB "I enjoyed the string's contents"
    String3 DB '"A string itself in double quotes"' 
            DB "'A string itself in single quotes'" 
            DB "'A string's own single quotes'"
    

    Declaring more than one string per line

    A comma after a string means that another initialiser is expected which may declare more data, or another string as follows:-
    String1 DB 'This is a string with null terminator',0
            DB 'First string',0,'And another string',0
    String2 DB 22h,"A string's own double quotes",22h
    
    The ASCII values you can use here if you wish are 22h for double quotes and 27h for single quotes.

    Longer strings

    For longer strings you can straddle lines using another DB on the next line, for example:-
    LongString1 DB 'His first program looked like it would be a great success '
                DB 'until he ran it for the first time',0
    LongString2 DB 'His fundamental error:',0Dh,0Ah
                DB 'he did not test it as he went along',0
    
    The ASCII values 0Dh and 0Ah are carriage return and line feed respectively, used to start a new line when the string is drawn on the screen.

    Unicode strings

    In Windows programming you sometimes need to declare Unicode strings in the data or const section, for example in a dialog template. There are several ways to do this in GoAsm and they are described in detail in writing Unicode programs. Briefly, you can use the following methods:-
  • Rely on the basic Unicode format of the source script (GoAsm can read Unicode UTF-16 and UTF-8 files).
  • Use the Lquote symbol used in "C" programming for example:-
                DB L'Hello how are you?'
    
  • Declare Unicode sequence using DUS:-
                DUS 'I am a Unicode string with new line and null terminator',0Dh,0Ah,0
    
    See also overriding using the STRINGS directive.

    Inserting blocks of data

    For blocks of data too large to fit conveniently on one line, you can use INCBIN to load the contents (or part of the contents) of a file. Alternatively you can use DATABLOCK_BEGIN and DATABLOCK_END if it is more convenient to hold the block of data in the source file itself.

    The syntax for a DATABLOCK is as follows:-

    MyBlockData DATABLOCK_BEGIN      ;comment
        .
        . data is inserted here
        .
    DATABLOCK_END
    
    Here all the material between DATABLOCK_BEGIN and DATABLOCK_END is inserted in the output file, and you can then address the data using the label MyBlockData.
    GoAsm regards the data as starting just after the end of the line holding DATABLOCK_BEGIN, and as ending at the end of the last line before DATABLOCK_END.
    The data is inserted in its raw state (no conversion takes place). This means that characters which may not be displayed in an ordinary editor such as spaces or tabs will also be loaded. It also means that the format of the data and the characters which can be used in the data are limited only by the editor you are using to write the source code.
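
    As a sketch only, a short block of help text might be held like this (the label HelpText and the text itself are invented for this example):-
    HelpText DATABLOCK_BEGIN          ;hypothetical label
    Usage: myprog [options]
      -h   show this help
    DATABLOCK_END
    
    and its address obtained in the code section in the usual way:-
    MOV ESI,ADDR HelpText             ;esi points to the first byte of the block
    
    Because the data is raw, the line breaks and any leading spaces on each line between the two markers form part of the data.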

    Initialising using addresses of labels

    Often you need to initialise a dword to the address of a label. In other words after this has been assembled and linked the dword will hold a pointer to the label. The label can be either a data or a code label. For example:-
    MS1 DB 'First string to use',0
    MS2 DB 'Second string to use',0
    Strings DD MS1,MS2            ;Strings to hold address of the strings
    
    then to get ready to use the string MS2 instead of coding
    MOV ESI,ADDR MS2
    
    you can code
    MOV ESI,[Strings+4]
    
    Whole tables can be created using this method and addressed by taking advantage of the * index register multiplier (scaling) for example
    MOV ESI,[Strings+EAX*4]
    
    Here eax, which is zero indexed, holds which string to use. When eax is zero the first string will be used, when eax is one the second string and so on if there are more strings.
    Here is an example using code labels:-
    PROCEDURE_TO_CALL DD FIRSTPROC,SECONDPROC
    MOV ESI,ADDR PROCEDURE_TO_CALL     ;get procedures in esi
    MOV ESI,[ESI+EAX*4]                ;get correct procedure
    CALL ESI                           ;call the procedure (esi now holds its address)
    

    Code and the starting address

    What is "code"?

    Code is made up of the instructions contained in a "code" section, that is, with the attributes "code" and "execute". You tell the processor which code instructions to execute. The processor takes the instructions byte by byte and executes them. Each byte of executable code is called an "opcode".

    What is the "starting address"?

    This is also known as the "entry point". In an ordinary executable (.exe file) this is where execution starts immediately after loading. In a dynamic linked library (.dll file) it is where some execution takes place during the loading process.

    How is execution controlled?

    Once execution starts in your executable at the starting address your program has control of where execution will continue. In practice you divert execution using the CALL, JMP and conditional jump mnemonics.

    How do you set the starting address?

    From the above you can see that unless your source script is data-only it is essential to provide a starting address for your program. In GoAsm this is achieved very simply by giving the starting address a label and then telling the linker what this is. Different linkers take this instruction in different ways.
    My linker GoLink assumes the starting address to be START unless told otherwise. So, to use this default you would have the following line in your source script where you want execution to commence:-
    START:
    
    This can be upper or lower case or a mixture.
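
    A minimal sketch of a source script relying on this default might look as follows. It assumes that a plain RET from the starting address is an acceptable way for the program to finish:-
    CODE SECTION
    START:                 ;GoLink's default starting address
    XOR EAX,EAX            ;eax=0 as the return value
    RET                    ;return to the system
    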

    If you don't want to use START, you can specify the starting address using one of these methods in GoLink's command line or command file:-

    -entry STARTINGADDRESS
    /entry STARTINGADDRESS
    
    If you are using ALINK only the first method works.

    If you are using the MS linker you need to make a slight change to your label. It must be preceded by an underline character. So your label is _START: in your source script. Then you would use one of these instructions to the linker (without the underline character):-

    -ENTRY START
    /ENTRY START
    
    What is happening here is that the MS linker is designed to work with a "C" compiler which will decorate global labels with the underline character. So the linker looks for the label _START, rather than START. Assembler programmers have had to put up with such quirks in Windows tools for many years but now we have our independence!
    See also using GoAsm with various linkers.

    Labels: unique, re-usable and scoped

    What is a label?

    A label is a name which you provide to identify a particular place in data or in code. It is like a bookmark. You can refer to that place and access it by using the label. A data label refers to data, and a code label refers to executable code. A symbol is a label which appears in the symbol table of the object file and which can therefore be seen by the debugger if a debug version of the executable is made.

    Unique labels

    A unique label is one which can only be used once in your source script and in linked object files. It is a label with "global" scope, that is to say, at link-time it can be accessed by other object files. Usually you would provide a name which describes the data or code function, for example NAME_LIST or CALCULATE_RESULT. If you have set your linker to provide debug output, all unique labels are put in the symbol list and passed to the debugger. In GoAsm you make a unique label as follows:-
    NAMEOFLABEL:
    
    This does not output any code, but sets a bookmark called NAMEOFLABEL at the point in data or code where it appears. If you are in a data section, the colon is not obligatory, nor is it obligatory if the label gives the name of an automated stack frame. Therefore the following lines all create unique labels:-
    (in data section)
    HELLO  DB 0        ;label HELLO
    BYE:   DB 0        ;label BYE
    MEAGAIN            ;label MEAGAIN
    (in code section)
    RICE:              ;label RICE
    PEAS: FRAME        ;label PEAS
    BEANS FRAME        ;label BEANS
    
    You can see from this that a single word which is not known to GoAsm to be a directive, mnemonic, data declaration, initialisation of data, or a defined word will be regarded as a label. GoAsm expects a colon after a code section label. This is because there are numerous words which must be used in a code section and if they are misspelt, it is important that an error is declared rather than the word being misconstrued as a label.

    Re-usable labels

    Sometimes you need to label parts of your source script with names which you have used before. GoAsm provides two levels of such re-usable labels which can be used in a code section:-
  • locally scoped re-usable labels beginning with a period, and
  • unscoped re-usable labels made up of digits or a character+digits

    The scope of a label defines from where it can be accessed using its own unmodified name. Let's look at these two types of re-usable labels in turn.

    Locally scoped re-usable labels

    These types of labels are created using a period followed by the label for example
    .looptop           ;label looptop
    .fin               ;label fin
    
    The boundary of the scope of these labels is defined by the unique code labels in the source script. In other words the label can be jumped to provided there is no unique label in the way. So for example:-
    JZ >.fin
    CALCULATE:
    .fin
    RET
    
    here the jump instruction will not find .fin because the label CALCULATE is a unique code label in the way.

    If you want to jump past a unique code label to a locally scoped re-usable label, you can either use another unique code label as the destination of the jump, or you can use an unscoped re-usable label. Or, for advanced use, you can use the locally scoped label within an automated stack frame: see re-usable label scope in automated stack frames.
    Locally scoped re-usable labels are sent to the debugger as symbols together with their "owner". Therefore the symbol sent to the debugger in the above example is CALCULATE.fin.
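
    By contrast, here is a sketch in which the same locally scoped name is re-used successfully in two procedures (the procedure names and instructions are hypothetical). Each jump stays within the scope of its own unique label, so each finds its own .fin:-
    CALCULATE:
    OR EAX,EAX
    JZ >.fin           ;finds the .fin owned by CALCULATE
    ADD EAX,EAX
    .fin
    RET
    ;
    REPORT:
    OR EDX,EDX
    JZ >.fin           ;finds the .fin owned by REPORT
    INC EDX
    .fin
    RET
    
    Following the naming rule above, the two symbols passed to the debugger would be CALCULATE.fin and REPORT.fin.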

    Unscoped re-usable labels

    You will often have very insignificant jump destinations and loops in your code which do not need any name at all. For these you can use a label whose name will not be passed to the debugger as a symbol. This is useful when debugging to limit the symbol table to the most significant names in your code. These labels are made up either of all digits, or one character then one or more digits. You may also use a period as a decimal point which makes it much easier to add new local labels to existing code. The label itself must always end in a colon. Here are examples of unscoped re-usable labels:-
    L1:
    24:
    24.6:
    
    You can even use a single stand-alone colon. You might use this for those extremely insignificant jump destinations in your code.

    Jumping to labels: short and long code jumps

    There are several jump instructions. Some will jump if the flags are in a particular state. These are called the "conditional jump instructions". Then there is the JMP instruction which will always jump to the destination irrespective of the flags. Then there are the LOOP instruction and its conditional variants, which drop through if ecx=1 on entry (LOOP decrements ecx and jumps only while the result is not zero). The CALL instruction jumps and returns afterwards. All these instructions need a label to jump to.

    The direction indicators

    In order to make your source script more readable, GoAsm uses direction indicators to indicate the direction of the jump. The "back" direction indicator is optional. For example, using locally scoped re-usable labels:-
    JZ >.fin         ;jump forward to .fin
    JMP >.exit       ;jump forward to .exit
    LOOP .looptop    ;loop backwards to .looptop
    LOOP <.looptop   ;loop backwards to .looptop (alternative form)
    
    Here is an example using unscoped labels:-
    JZ >L10          ;jump forward to L10
    JNC L3           ;jump backwards to L3
    JNC <L3          ;jump backwards to L3 (alternative form)
    JMP 100          ;jump backwards to 100
    

    Jumps to unique labels

    These are treated differently, depending on whether the jump is made using a conditional jump mnemonic or not.

    Conditional jumps to unique labels

    You can code conditional jumps to unique labels in the same way as you would for jumps to locally scoped or unscoped labels. In other words use the forward indicator ">" if the jump is to a place later on in the source script. Optionally you may use the backward indicator < to show that the jump is to a place earlier in the source script, or you can omit it.
    Basically GoAsm will not permit you to jump out of a file using a conditional jump. So instead of coding:-
    JZ EXTERNALLABEL
    
    you should code
    JNZ >
    JMP EXTERNALLABEL
    :
    
    This is to help with error checking. GoAsm assumes a conditional jump was meant to be to a place inside the existing source script.

    Unconditional jumps to unique labels

    You may use a direction indicator for these jumps if you wish, but you don't have to.
    If you do use a direction indicator, this will tell GoAsm only to look for the label in the source script: GoAsm will not tell the linker to look for the label in other source scripts.
    If you don't use a direction indicator, GoAsm will find the label if it is in the source script, but if not, it will tell the linker to look for the label in other source scripts.
    For example:-
    JMP LABEL               ;look for label in all source scripts
    JMP <INTERNALLABEL1     ;only look for label earlier in source script
    JMP >INTERNALLABEL2     ;only look for label later in source script
    

    Jumps to single colons

    Single colons are treated as unscoped labels and can be used for your most insignificant jumps, for example:-
    :
    CALL PROCESS
    LOOPZ <
    
    or
    CMP EAX,EDX
    JZ >
    CALL PROCESS
    :
    RET
    

    The importance of long or short jumps

    A short jump is coded using the short relative jump form of instruction which is only 2 bytes. This tells the processor to jump back or ahead in the range +127 bytes or -128 bytes. The actual amount of the jump is contained in the second opcode byte, which is why this type of instruction is limited to this range.
    To jump outside this range a longer instruction is needed: 6 bytes for a conditional jump (5 bytes for an unconditional JMP). This is called the long relative jump form of instruction.
    Using short jumps not only tightens up your code but also increases speed of execution because the processor has to read and execute fewer bytes in order to carry out the instruction. This will be important for looped instructions which are executed many times in sequence.

    Whether long or short jumps are coded

    GoAsm always tries to make the smallest possible code, consistent with being a one-pass assembler. Here are the rules which are followed:- GoAsm will show an error if a short jump to a locally scoped or unscoped label is specified but cannot be achieved. This is to ensure that you have not made an error in your source script. For example you might intend to jump only a short distance but have forgotten to add the destination of the jump to your source script.

    Telling GoAsm to code a long jump

    Use either the LONG operator or << or >>, for example
    JZ >>.fin         ;long forward jump to .fin
    JZ LONG >.fin     ;long forward jump to .fin (alternative form)
    JC <<A1           ;long backward jump to A1
    JC LONG A1        ;long backward jump to A1 (alternative form)
    JC LONG <A1       ;long backward jump to A1 (alternative form)
    
    Note that there is no long form of LOOP and its variations, nor of JECXZ. If you need a long jump for these instructions use this instead:-
    DEC ECX
    JNZ LONG L2          ;long jump replacing LOOP
    OR ECX,ECX           ;test for ecx=0
    JZ LONG >L44         ;long jump replacing JECXZ
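
    As a side note on sizes, the sketch below shows how the two forms of a conditional jump would be encoded. The opcode values are the standard x86 ones for JZ, the labels are placeholders, and it is assumed (as the text above implies) that a single > requests the short form:-
    JZ >L1         ;short form: opcode 74h + 1-byte displacement = 2 bytes
    JZ >>L2        ;long form: opcodes 0Fh,84h + 4-byte displacement = 6 bytes
    L1:
    RET
    L2:
    RET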
    

    Accessing labels

    Getting the address of a label

    This must be done using the ADDR or OFFSET operator. In the final executable under Windows this gives the distance to the label from the start of the section, plus the position of the section in virtual memory. In other words, the address of the label in memory when the executable is loaded and running.

    Here are examples using unique labels:-

    MOV ESI,ADDR Process_dabs  ;get in esi the address of the code label Process_dabs
    MOV ESI,ADDR Hello2        ;get in esi the address of the string labelled Hello2
    MOV ESI,ADDR HelloX+10h    ;get in esi the address 16 bytes beyond HelloX
    
    Here is an example using a locally scoped re-usable label:-
    MOV ESI,ADDR CALCULATE.fin ;get in esi the address of the code label .fin in the CALCULATE procedure
    
    Here is an example using a formal structure:-
    MOV ESI,ADDR Lv1.pszText   ;get in esi the address of the pszText member in the formal structure Lv1
    

    Reading data from the place pointed to by a label

    Reading data from the place pointed to by a label is quite different from getting the address of a label. Here you are reading the data value in the area of memory concerned. This must be done using square brackets. Examples are:-
    MOV ESI,ADDR Hello1     ;get in esi the address of the dword Hello1
    MOV EAX,[ESI]           ;get in eax the value of Hello1
    
    or this which does the same thing:-
    MOV EAX,[Hello1]        ;get in eax the value of Hello1
    

    Writing to the place pointed to by a label

    Here you cause a write to data as follows:-
    MOV ESI,ADDR Hello1     ;get in esi the address of the dword Hello1
    MOV [ESI],EAX           ;write the value in eax to Hello1
    
    or this which does the same thing:-
    MOV [Hello1],EAX        ;write the value in eax to Hello1
    

    Reading and writing to labels using displacement

    Suppose you have a simple structure of data declared as follows:-
    PARAM_DATA DD 0     ;+0h
               DD 0     ;+4h
               DD 55h   ;+8h
               DD 0     ;+0Ch
               DD 0     ;+10h
    
    Then you can use the label to read from and write to a particular part of the structure using a displacement value as follows:-
    MOV ESI,ADDR PARAM_DATA
    MOV EAX,[ESI+8h]           ;get in eax value of third dword
    MOV [ESI+8h],EDX           ;and insert edx instead
    
    or this which does the same thing:-
    MOV EAX,[PARAM_DATA+8h]    ;get in eax value of third dword
    MOV [PARAM_DATA+8h],EDX    ;and insert edx instead
    
    The displacement value can be any value up to 0FFFFFFFFh. It can be positive or negative. Non-numeric elements must be separated by the plus sign.
    See more about structures.

    Reading and writing to labels using indexation

    Suppose you have 16 dwords of data declared as follows:-
    PARAM_DATA DD 10h DUP 0
    
    Then you could use indexation (scaling) to multiply the index register to suit:-
    MOV ESI,ADDR PARAM_DATA
    MOV EAX,[ESI+ECX*4]        ;get in eax value of ecx dword
    MOV [ESI+ECX*4],EDX        ;and insert edx instead
    
    or this which does the same thing:-
    MOV EAX,[PARAM_DATA+ECX*4]        ;get in eax value of ecx dword
    MOV [PARAM_DATA+ECX*4],EDX        ;and insert edx instead
    
    You can use a multiplier of 2, 4 or 8, or no multiplier at all (which is the same as a multiplier of 1). The following instructions are all valid:-
    MOV EAX,[BPARAM_DATA+ECX]          ;get in eax value of ecx byte
    MOV EAX,[WPARAM_DATA+ECX*2]        ;get in eax value of ecx word
    MOV [QPARAM_DATA+ECX*8],EDX        ;insert edx at ecx qword
    
    Non-numeric elements must be separated by the plus sign.
    In 32-bit coding, only the general purpose 32-bit registers can be used as an index register - EAX,EBX,ECX,EDI,EDX,ESI, or EBP. You cannot use ESP as an index register.
    In 64-bit coding, you can use the general purpose 32-bit registers or 32-bit addressing versions of the new registers (R8D to R15D). Also you can use the 64-bit extensions of the general purpose registers - RAX,RBX,RCX,RDI,RDX,RSI, or RBP, and the new 64-bit registers R8 to R15. You cannot use RSP as an index register.
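
    As a sketch of indexation used in a loop, the following adds up the 16 dwords at PARAM_DATA declared above (SUM_PARAMS is a hypothetical procedure name):-
    SUM_PARAMS:
    XOR EAX,EAX                       ;running total starts at zero
    XOR ECX,ECX                       ;index of the first dword
    L1:
    ADD EAX,[PARAM_DATA+ECX*4]        ;add the dword at index ecx
    INC ECX
    CMP ECX,10h                       ;all 16 dwords done?
    JNZ L1                            ;no, so loop back
    RET                               ;return with the total in eax
    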

    Reading and writing to labels using indexation and displacement

    Suppose you have 24 dwords of data declared as follows where the final dword in each case holds the result required:-
    PARAM_DATA DD 19h,0,0,22222h
               DD 1Ah,0,0,44444h
               DD 1Bh,0,0,66666h
               DD 1Ch,0,0,88888h
               DD 1Dh,0,0,0AAAAAh
               DD 1Eh,0,0,0CCCCCh
    
    Then you could use indexation (scaling) and displacement as follows:-
    MOV ESI,ADDR PARAM_DATA
    CMP EAX,[ESI+ECX*4]        ;see if there is eax value at ecx dword
    JNZ >L2                    ;no
    MOV EDX,[ESI+ECX*4+0Ch]    ;yes so get the result in edx
    
    or this which does the same thing:-
    CMP EAX,[PARAM_DATA+ECX*4]        ;see if there is eax value at ecx dword
    JNZ >L2                           ;no
    MOV EDX,[PARAM_DATA+ECX*4+0Ch]    ;yes so get the result in edx
    
    You can use a multiplier of 2, 4 or 8, or no multiplier at all (which is the same as a multiplier of 1). The displacement value can be any value up to 0FFFFFFFFh. In your source script it can be positive or negative. Non-numeric elements must be separated by the plus sign.
    In 32-bit coding, only the general purpose 32-bit registers can be used as an index register - EAX,EBX,ECX,EDI,EDX,ESI, or EBP. You cannot use ESP as an index register.
    In 64-bit coding, you can use the general purpose 32-bit registers or 32-bit addressing versions of the new registers (R8D to R15D). Also you can use the 64-bit extensions of the general purpose registers - RAX,RBX,RCX,RDI,RDX,RSI, or RBP, and the new 64-bit registers R8 to R15. You cannot use RSP as an index register.
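
    Note that each record in the table above is four dwords long, so with a multiplier of 4 the index must advance by 4 to move from one record to the next. A minimal sketch of a search loop along those lines follows. FIND_RESULT is a hypothetical name, and returning edx=0 for "not found" is an assumption made only for this sketch:-
    FIND_RESULT:                       ;on entry eax=value to look for
    XOR ECX,ECX                        ;start at the first record
    L1:
    CMP EAX,[PARAM_DATA+ECX*4]         ;does the first dword of this record match?
    JZ >L2                             ;yes
    ADD ECX,4                          ;no, so move on to the next record (4 dwords)
    CMP ECX,18h                        ;past the last of the 24 dwords?
    JNZ L1                             ;no, keep looking
    XOR EDX,EDX                        ;not found, return edx=0
    RET
    L2:
    MOV EDX,[PARAM_DATA+ECX*4+0Ch]     ;get the result from the final dword of the record
    RET
    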

    Calling (or jumping to) procedures

    What is a "procedure"?

    A procedure is a series of code instructions with a label to which execution can be transferred. Other names for this are "function", "routine" or "subroutine". Here is an example of a short procedure:-
    PROCESS_HASH:       ;label to the procedure
    XOR EAX,EAX
    MOV EDX,ESI
    CALL PH23
    MOV EDX,866h        ;return from the procedure with edx=866h
    RET
    

    Transferring execution to a procedure

    Usually execution is transferred to the procedure by the use of the CALL instruction. This instruction causes the processor to PUSH onto the stack the position in code just after the CALL instruction and then execution will continue in the procedure being called. At the end of the procedure there will be a RET. This instruction causes the processor to POP from the stack the position in code immediately after the CALL and then execution will continue from that point.
    Unusually execution can be transferred to the procedure by the use of the JMP instruction. At the end of the procedure there could also be another JMP instruction, as in this example:-
    PROCESS_HASH:
    XOR EAX,EAX
    MOV EDX,ESI
    CALL PH23           ;transfer execution to the PH23 procedure and return
    MOV EDX,866h        ;return from the procedure with edx=866h
    JMP >SOMEWHERE_ELSE
    ;
    START:              ;start place for execution
    JMP PROCESS_HASH
    ;
    

    CALL and JMP to procedure syntax

    The usual way to call or jmp to a procedure is to use its code label, for example:-
    CALL PROCESS_HASH
    JMP PROCESS_HASH
    
    Sometimes the address of the procedure to go to is held in memory pointed to by a label or a register or even held at a known place in memory in which case you can use for example:-
    CALL [PROCADDRESS]
    CALL [PROCTABLE+20h]
    CALL [ESI]
    CALL [ESI+EDX]
    JMP [4000000h]
    
    Sometimes the address of the procedure to go to is held in a register in which case you can use for example:-
    CALL EAX
    JMP EDI
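    As an illustration of a call through memory, here is a sketch of a small table of procedure addresses in 32-bit code. All the names (PROCTABLE as used here, FILL_TABLE, DISPATCH, PROC_ZERO and PROC_ONE) are hypothetical, and the table is filled at run-time using only the forms already shown.

```asm
DATA SECTION
PROCTABLE DD 0,0                ;room for two procedure addresses

CODE SECTION
FILL_TABLE:
MOV EAX,ADDR PROC_ZERO
MOV [PROCTABLE],EAX             ;first entry
MOV EAX,ADDR PROC_ONE
MOV [PROCTABLE+4h],EAX          ;second entry
RET

DISPATCH:                       ;in: ecx=0 or 1
CALL [PROCTABLE+ECX*4]          ;call the procedure whose address is in the table
RET

PROC_ZERO:
XOR EAX,EAX                     ;return with eax=0
RET

PROC_ONE:
MOV EAX,1                       ;return with eax=1
RET
```

    FILL_TABLE must be called once before DISPATCH is used, otherwise the table will still hold zeroes.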
    

    More complex syntax

    Hopefully you will never have to use any of these forms but GoAsm does allow them (using either CALL or JMP):-
    #define Hello PROCESS_HASH
    CALL Hello       ;treated as a call to PROCESS_HASH
    CALL 100h        ;treated as a call to a relative address
    CALL [HELLO3+ECX+EDX*4]
    CALL [HELLO3+ECX+EDX*4+9000h]
    CALL $$          ;a call to the start of the current section
    CALL $+20h       ;a call 20h bytes ahead
    

    CALL and JMP to procedures outside the object file or section

    Some assemblers require you to say in your source script whether a call is to somewhere outside the object file which is being made, using EXTRN. They also require the destination of the call to be marked as GLOBAL or PUBLIC. You don't have to do either of these things with GoAsm because if the destination of the call is not found when assembling, it is assumed to be an external call. Also all labels, other than local ones and those with re-usable names, are assumed to be "global". GoAsm works in the same way when a call or jump is to be made to a code section with another name.

    So if you want to call a procedure in another source script (which will be producing another object file) just call it in the usual way. Similarly if you have a procedure in another executable (usually a Dll) you can do the same.

    For example, suppose you have written My.Dll containing a calculation algorithm you wish to use with the label CALCULATE. You could call it as follows:-

    CALL CALCULATE
    

    In your list of Dlls you give to GoLink you will specify My.Dll. GoLink will first look for the code label CALCULATE in the object files, but will then look in the specified Dlls. Most other linkers look in library files (.lib files) for the functions they contain, which means you have to make a lib file. Either way, in GoAsm syntax there is nothing further for you to do in your source script. If the linker does not find the destination of the call, an error will be shown.
    This form of the call is a relative call using the opcode E8.

    You could also use this form:-

    CALL [CALCULATE]
    
    For this type of call GoAsm uses the opcodes FF15. This is a call to an absolute address. In 32-bit assembly this is a call to a 32-bit address, but in 64-bit assembly it is a call to a 64-bit address.

    See also:-
    using static code libraries
    direct importing by ordinal or specific Dll
    using the C Run-time library


    Calls to Windows APIs - 32-bits and 64-bits

    Calling Windows APIs (which reside in Windows system Dlls) is very simple where there are no parameters, for example in 32-bit Windows you can use:-

    CALL GetModuleHandle
    
    or its more advanced alternative which can be used either for 32-bit or 64-bit Windows:-
    INVOKE GetModuleHandle
    
    There is nothing else to put in the source script. Since the function being called resides outside the executable you are making, it is the linker's job to find the Dll which contains the GetModuleHandle procedure and it will record the name of the Dll in your executable. GoLink does this from a list of Dlls which you supply.

    Most Windows APIs, however, expect to be sent parameters (also known as "arguments") when they are called. It is the programmer's job to ensure that these parameters are sent to the API correctly. The parameters contain the information, or pointers to information, which tell the API what to do. Sometimes the parameters contain addresses of places in memory where the API will insert information.

    How you send the parameters depends on whether you are assembling for 32-bit or 64-bit Windows. This is because they each use different calling conventions, and this affects the way parameters are sent and used. 32-bit Windows uses the standard calling convention (STDCALL) and 64-bit Windows uses the so-called fast calling convention (FASTCALL).

    GoAsm provides ARG and INVOKE which can be used for both platforms. GoAsm creates the correct code to suit the calling convention to be used. If you are writing only for 32-bits you can use PUSH and CALL to send the parameters, but if you want to port your code to 64-bit Windows later, you will need to change these to ARG and INVOKE. In both 32-bit and 64-bit source code you would use CALL to call procedures in your own executables, unless you are sending parameters to them using one of these calling conventions.

    In the STDCALL calling convention used in 32-bit Windows, all the parameters are put on the stack by the caller, and the stack pointer (ESP) is moved to the top of the parameters on the stack. Then the API is called. The API uses the parameters on the stack and before returning it restores the stack to equilibrium by moving the stack pointer to the position it was before the first parameter was put on the stack.
    In the FASTCALL calling convention used in 64-bit Windows, the first four parameters are put in the RCX,RDX,R8 and R9 registers instead of on the stack. However, subsequent parameters are put on the stack. The caller needs to ensure that the stack pointer (in this case RSP) is moved to the top of the parameters as usual, allowing for the first four parameters which are held in registers (this is to permit the API to keep them on the stack as if they had been put there in the first place). Another difference is that the API does not restore the stack into equilibrium before returning from the call (this change makes it easier for a handful of APIs which do not have a fixed number of parameters).

    To enable the same source to be used both for 32-bit and 64-bit programming you would send the parameters using ARG and then call the API using INVOKE, for example:-

    ARG 40h,RDX,RAX,[hwnd]
    INVOKE MessageBoxA
    
    In 32-bit assembly the ARG simply does the same as PUSH, and INVOKE does the same as CALL. GoAsm accepts a PUSH instruction of a 64-bit General Purpose register, so PUSH RDX is treated the same as PUSH EDX. Therefore the above call works on both platforms. In 32-bit assembly it translates as:-
    PUSH 40h,EDX,EAX,[hwnd]
    CALL MessageBoxA
    
    However in 64-bit assembly, the same code translates as:-
    MOV R9,40h
    MOV R8,RDX
    MOV RDX,RAX
    MOV RCX,[hwnd]
    SUB RSP,20h
    CALL MessageBoxA
    ADD RSP,20h
    
    See writing 64-bit programs for more details.

    Calls to Windows APIs - using INVOKE

    It is obviously important to send the parameters to the API in the right order. INVOKE helps you to do this by permitting you to put the parameters after the name of the API like in "C". This also helps when working with Windows documentation which always describes the parameters for APIs using "C" syntax. For example here is how the API MessageBox is described:-
     
    int MessageBox(
        HWND hwnd,            // handle of owner window
        LPCTSTR lpText,       // address of text in message box
        LPCTSTR lpCaption,    // address of title of message box  
        UINT uType            // style of message box
       );
    
    Using INVOKE you can follow the same order, for example:-
    INVOKE MessageBoxA, [hwnd],EAX,EDX,40h     
    
    which is the same as:-
    ARG 40h,RDX,RAX,[hwnd]
    INVOKE MessageBoxA
    
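    Note that ARG (like PUSH) takes the parameters in reverse order, that is with the last parameter in the "C" description first, whereas the parameters after INVOKE are written in the same order as in the "C" description.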

    INVOKE lets you straddle two or more lines using the continuation character:-

    INVOKE CreateWindowExA, WS_EX_OVERLAPPEDWINDOW, ADDR szClassName, \
                            ADDR szWindowName,\
                            WS_OVERLAPPEDWINDOW+THING,\
                            100,16,400,0,0,0,[hInstance],0
    
    Since GoAsm looks at the parameters to INVOKE starting from the end, errors near the end will be found first.

    When using INVOKE, if you like to tuck away your parameters in a defined word then GoAsm will still get them in the correct order, for example:-

    z_function_params=3,2,1
    INVOKE z_function, z_function_params
    
    produces the same code as:-
    ARG 1,2,3
    INVOKE z_function
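    Putting this together, a complete source script using INVOKE might look something like the following sketch. The labels MBTITLE and MBMESSAGE are only examples, and it is assumed that User32.dll (for MessageBoxA) and Kernel32.dll (for ExitProcess) are in the list of Dlls given to GoLink.

```asm
DATA SECTION
MBTITLE   DB 'Hello',0
MBMESSAGE DB 'Click OK',0

CODE SECTION
START:
INVOKE MessageBoxA, 0,ADDR MBMESSAGE,ADDR MBTITLE,40h   ;hwnd of zero means no owner window
INVOKE ExitProcess, 0                                   ;end the process with exit code zero
```

    The parameters follow the order in the "C" description of MessageBox: the owner window, the text, the title and then the style.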
    

    Calls to Windows APIs - ANSI and Unicode versions

    Windows APIs which accept character input or give character output (usually in the form of character strings) tend to have two different versions, an ANSI version and a Unicode version. The ANSI version accepts and/or outputs strings in ANSI, where a single byte of value 0 to 255 represents a single character based on the current character set. These characters are also sometimes called "multibyte" characters. The ANSI version of the API will end in "A" as in the CreateWindowExA example below. The Unicode version accepts and/or outputs strings in Unicode, that is two bytes per character based on the current Unicode character set. These are also sometimes called "wide" characters. The Unicode version of the API will end in "W".
    In your source script you need to specify which API you wish to call by adding A or W at the end of the API name. If, when you link your object file, the linker is unable to find the API in the other executable (or in the .lib files if you are not using GoLink) this is probably because you have forgotten to add the required A or W. Another reason, however, could be that you haven't provided GoLink with the name of the Dll holding the API (or the correct .lib files if you are not using GoLink).
    If you want automatically to make the correct "A" or "W" call depending on whether you are making an ANSI or Unicode version of your application this can be done using Unicode/ANSI switching. See writing Unicode programs for information about this in detail.
    There is no difference between 32-bit assembly and 64-bit assembly in this respect. This is because 64-bit Windows has ANSI and Unicode versions of the APIs just like 32-bit Windows.
     

    PUSH or ARG pointers to strings and data

    Pointers to null terminated strings

    GoAsm supports an extension of PUSH or ARG which is very helpful when programming in Windows. Often in Windows you need to send to an API a parameter which is a pointer to a null-terminated string for example (in 32-bits):-
    MBTITLE   DB 'Hello',0
    MBMESSAGE DB 'Click OK',0
    PUSH 40h, ADDR MBTITLE, ADDR MBMESSAGE, [hwnd]
    CALL MessageBoxA
    
    To make this easier GoAsm permits the use of PUSH or ARG like this:-
    PUSH 40h,'Hello','Click OK',[hwnd]
    CALL MessageBoxA
    
    or, if you were writing source for 32-bit or 64-bit platforms:-
    ARG 40h,'Hello','Click OK',[hwnd]
    INVOKE MessageBoxA
    
    or if you prefer to send parameters after INVOKE:-
    INVOKE MessageBoxA, [hwnd],'Click OK','Hello',40h
    
    You can also use this with Unicode strings as follows:-
    ARG 40h,L'Hello',L'Click OK',[hwnd]
    INVOKE MessageBoxW
    INVOKE MessageBoxW, [hwnd],L'Click OK',L'Hello',40h
    
    When you use any of these forms the string will always be null-terminated. What is happening here is that GoAsm places the string in the const section if there is one (or the data section if there is one, if not, in the code section) and adds a null-terminator. Then GoAsm creates the correct instruction and gives it a pointer to the string. No symbol is made for debugging purposes.

    In 64-bit assembly, GoAsm ensures that Unicode strings are aligned on a word boundary as required by the system.

    Pushing pointers to raw data

    You can do a similar thing with ordinary raw data (in bytes) using the < and > operators. For example:-
    PUSH <23,24,25>                  ;push a pointer to the bytes 23,24,25
    
    or
    PUSH <23,6 DUP 20h,23>           ;push a pointer to the bytes 23,six spaces then 23
    
    or
    PUSH <'Hi',0Dh,0Ah,'There',0>    ;push a pointer to the null terminated string on two lines
    
    You can also use the < and > operators in this way with ARG and after INVOKE. What is happening here is that GoAsm places the data declaration between the < and > operators in the const section if there is one (or the data section if there is one, if not, in the code section). Then GoAsm creates the correct instruction and gives it a pointer to the data. No symbol is made for debugging purposes.
    Note that when using the < and > operators in this way no null terminator is added to strings.

    In 64-bit assembly, GoAsm ensures that data is aligned on a word boundary as would be required by the system if the data contains Unicode strings.
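    To make clearer what the < and > form saves you from writing, here is a rough comparison in 32-bit code. This is only a sketch: the names TWO_LINES and PUSH_BOTH_WAYS are invented, and the two forms are not identical in every respect, since the short form may place its data in the const section and makes no symbol.

```asm
DATA SECTION
TWO_LINES DB 'Hi',0Dh,0Ah,'There',0     ;the data declared by hand

CODE SECTION
PUSH_BOTH_WAYS:
PUSH ADDR TWO_LINES                     ;long form: push a pointer to the named data
PUSH <'Hi',0Dh,0Ah,'There',0>           ;short form: GoAsm makes the data and the pointer
ADD ESP,8                               ;discard the two dwords pushed in this demonstration
RET
```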

    Moving pointers to strings and data into registers

    You can also establish null terminated strings and data and move pointers to them into registers using the following syntax (for example):-
    MOV EAX,ADDR 'This is a string'
    MOV EAX,ADDR <'String',0Dh,0Ah>
    
    When GoAsm deals with this code it places a null terminated string or the data between the < and > operators in the const section if there is one (or the data section if there is one, if not, in the code section). Then GoAsm gives the pointer to the data so created to the instruction. No symbol is made for debugging purposes.
    Note that when using the < and > operators no null terminator is added to strings.
    Note also how this differs from the syntax for inserting character immediates into a register. The difference is in the use of the ADDR operator.

    This works the same way in 64-bit programming except that GoAsm ensures that a Unicode string or data is word aligned in memory as required by the system.


    Using character immediates in code

    GoAsm does not reverse store word and dword character immediates as MASM does. So for example,
    MOV AL,'1'
    MOV AX,'12'          ;regarded as bytes - 1 first then 2
    MOV EAX,'ABCD'       ;regarded as bytes - A first, then B then C then D
    
    This makes it much easier to add short strings to memory eg. to add the extension .fil to a filename in memory you can code:-
     
    MOV [EDI],'.fil'     ;or
    MOV EAX,'.fil'
    MOV [EDI],EAX
    
    and not
    MOV [EDI],'lif.'     ;or
    MOV EAX,'lif.'
    MOV [EDI],EAX
    
    CMP works in the same way for example:-
    CMP AL,'1'
    CMP EAX,'ABCD'
    CMP [EDI],'.fil'
    
    This does not change the usual reverse order of material not in quotes so for example when you want to add a carriage return and then a linefeed to text you can still use:-
    MOV AX,0A0Dh
    STOSW
    
    Here the carriage return (0Dh) which is in AL, is loaded into memory first, then the linefeed (0Ah) in AH is loaded into memory.
    If the string is shorter than the register or memory type, zero bytes are added to fill it, for example,
    MOV EAX,'ABC'        ;codes as A then B then C then zero
    
    When writing source code for Unicode programs you can ensure that character immediates are Unicode or, if necessary, switched between ANSI and Unicode: see using the correct string in quoted immediates and switching quoted strings and immediates.

    In 64-bit programming you can use the 64-bit registers to contain character immediates which are 8 characters long, for example:-

    MOV RAX,'Saturday'
    
    However, an immediate operand to the CMP instruction is limited to 32 bits, so for example
    CMP RAX,'Saturday'
    
    would show an error.
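    As a fuller illustration of adding an extension to a filename in 32-bit code, the following sketch finds the end of a null terminated filename and then writes '.fil' and a new null terminator. The names FILENAME and ADD_EXT are invented for the example, and the buffer is assumed to have enough spare room after the name (here five bytes).

```asm
DATA SECTION
FILENAME DB 'report',5 DUP 0    ;the name, its null terminator and four spare bytes

CODE SECTION
ADD_EXT:
MOV EDI,ADDR FILENAME
XOR AL,AL                       ;look for the null terminator
MOV ECX,-1                      ;no practical limit on the search
CLD                             ;search forwards
REPNZ SCASB                     ;edi ends up one byte past the null
DEC EDI                         ;so point to the null itself
MOV [EDI],'.fil'                ;written as . then f then i then l
MOV B[EDI+4],0                  ;new null terminator after the extension
RET
```

    After the call the buffer holds 'report.fil' followed by a null.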

    Type indicators

    Why they are needed and how they are provided

    Looking at the instruction
    MOV [ESI],20h
    
    This puts the number 20h into a place in memory whose address is contained in the register esi. But what is missing from this instruction is whether the number should be loaded as a byte, as a word or as a dword. In other words should one, two or four bytes of memory be altered? All assemblers require a type indicator in instructions of this sort. The syntax in other assemblers is (using dword as an example):-
    MOV DWORD PTR [ESI],20h       ;MASM
    MOV DWORD [ESI],20h           ;NASM
    MOV D[ESI],20h                ;A386
    
    Of course I have used the A386 syntax in GoAsm since it requires a lot less typing, so the type indicators you can use are:-
    B meaning byte
    W meaning word (two bytes)
    D meaning dword (four bytes)
    Q meaning qword (eight bytes)
    T meaning tword (ten bytes)
    

    Type indicator also required for named memory references

    Like NASM, GoAsm does not type-check, so it will not know the size of this sort of operation:-
    INC [COUNT]
    
    Here GoAsm does not know (and in fact does not care) whether COUNT is a byte, word or dword. Therefore you must give this a type indicator too for example:-
    INC B[COUNT]
    
    Although this is a little more work for the programmer, in fact it can be argued that it makes your source script easier to read and understand, since you can always see the size of the operation from the instruction itself, rather than having to go back to see if COUNT was declared as a byte, word or dword.
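    The following sketch shows why the size matters. The same memory location gives a different result depending on the type indicator used. COUNT is declared here as a dword purely for illustration, and BYTE_OR_DWORD is an invented name.

```asm
DATA SECTION
COUNT DD 0FFh

CODE SECTION
BYTE_OR_DWORD:
INC B[COUNT]            ;only the low byte changes: 0FFh wraps to 0, so the dword is now 0
MOV D[COUNT],0FFh       ;put the original value back
INC D[COUNT]            ;all four bytes take part, so the dword is now 100h
RET
```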

    What instructions require a type indicator?

    Generally all instructions where the size of the operation is not obvious. Some of these examples use named memory references, others memory references pointed to by registers:-
    AND B[MAINFLAG],0FEh
    ADC W[EAX],66h
    ADD D[MEM_AREA],66h
    BT D[EBX],31D
    CMP D[HELLOWORD],0Dh
    DEC D[ECX]
    DIV B[HELLO]
    INC D[EDX]
    MOV B[MEM_AREA],23h
    MOVSX EDX,B[EDI]
    MUL B[HELLO]
    NEG W[ESI]
    NOT D[HELLO3]
    OR B[MAINFLAG],1h
    SETZ B[BYTETEST] 
    SHL W[IAMAWORD],23h
    SHL D[IAMADWORD],CL
    SUB D[EBP+10h],20D
    TEST B[ESP+4h],1h
    XOR D[IMAWORD],11111111h
    
    And in 64-bit programming you might also see, for example
    ADC W[RAX],66h
    BT D[R12],31D
    INC Q[RDX]
    NEG W[R15D]
    

    What instructions do not require a type indicator?

    Where the size of the operation is obvious from the use of a register for example
    AND [MAINFLAG],CL
    CMP [HELLOWORD],EDI
    MOV [IAMABYTE],AL
    MOV [IAMADWORD],ESI
    OR [MAINFLAG],BH
    XCHG CL,[ESI]
    
    Also none of the mmx, xmm or 3DNow! instructions require a type indicator. Several of the x87 floating point instructions do not need a type indicator. Those which do can take more than one operand size. There are also several instructions which can only take one operand size so with these there is no need for a type indicator. For example CALL, JMP, PUSH, and POP always take a dword. See half stack operations for the use of PUSHW and POPW. Also some less common instructions do not need a type indicator, for example ARPL, BOUND, BSF, BSR, CMOV (in all forms), CMPXCHG, and CMPXCHG8B.
     

    Repeat instructions

    Repeat instructions are available for PUSH, POP, INC, DEC, and of course when declaring data, for example:-
    PUSH 0,23h,[hwnd],ADDR lParam,EAX 
    POP EAX,[EBP+2Ch],[hwnd]
    DEC ECX,EDX,D[COUNT]
    INC D[EBP+10h],EDI
    DB 23h,24h,25h
    
    The instructions here are always assembled in left-to-right order.
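    Because of the left-to-right order, a repeated POP which is to restore registers saved by a repeated PUSH must list them the other way round. A short sketch in 32-bit code (the choice of registers is only an example):

```asm
PUSH EBX,ESI,EDI        ;same as PUSH EBX then PUSH ESI then PUSH EDI
;
;code which changes ebx, esi and edi goes here
;
POP EDI,ESI,EBX         ;same as POP EDI then POP ESI then POP EBX
```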
     

    Numbers and arithmetic

    Numbers

    Most assemblers use the following syntax for numbers:-
    66ABCDEh      ;a hex number
    34567789      ;a decimal number
    1100011B      ;a binary number
    1.0           ;a real number
    1.0E0         ;a real number
    
    GoAsm accepts these numbers but also supports numbers in these formats:-
    9999999D      ;a decimal number
    0x456789      ;a hex number
    
    A hex number which begins with a letter (that is A to F, being values 10 to 15 decimal) must begin with a zero, for example:-
    0A789ABCDh
    or
    0xA789ABCD
    

    Arithmetic

    GoAsm can perform limited arithmetic in data declarations, in duplicate amounts in DUP, when declaring and when using definitions, and in operands to code instructions. You are not allowed to use the multiply sign (asterisk) inside square brackets other than when using an index register.

    Be careful using the OR, AND and NOT logical operators, since these are actually mnemonics. Although GoAsm recognises them if you use them in places where mnemonics are not expected, you can use instead | for OR, & for AND, and ! for NOT.

    Arithmetic in brackets is carried out first, otherwise calculations are carried out in strict left-to-right order. Here are some examples:-

    DB 2*3
    DB (2+30h)/(2+1)
    DD (2000h+40h-20h)/2
    DD SIZEOF HELLO/2
    DD 444444h & 226222h
    DB 20h/2 DUP 44h
    DB 6+2 DUP 0
    #define globule (2*3)/2
    DB globule
    DD globule|100h
    DD 2D00h>>8
    DQ 2D00h<<48
    MOV EAX,globule|100h
    MOV EAX,SIZEOF HELLO*2
    MOV EAX,ADDR HELLO+10h
    MOV EAX,0x68+0x69-0x70
    MOV EAX,[MemName+0x68+0x69-0x70]
    MOV EAX,[ESI*4+45000h]
    MOV EAX,[ESI*4+SIZEOF HELLO/2]
    MOV EAX,8+8*2       ;result is 32
    MOV EAX,8+(8*2)     ;result is 24
    
    Divisions are rounded to the nearest whole number, with an exact half rounded up, eg.
    MOV EAX,32/3        ;puts 11 into eax
    MOV EAX,31/3        ;puts 10 into eax
    MOV EAX,10/4        ;puts 3 into eax
    
    GoAsm assumes that all multiplication and division is carried out using unsigned numbers. MUL and DIV are used at compile-time and not their signed counterparts IMUL and IDIV. See understand signed numbers for more about signed numbers.
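    Since the strict left-to-right order differs from the usual rule in "C" and in school arithmetic, it may help to work through a few more cases. These declarations are invented for illustration and follow only the rules given above:

```asm
DATA SECTION
DD 2+3*4            ;left to right: 2+3 is 5, then 5*4 is 20
DD 2+(3*4)          ;brackets first: 3*4 is 12, then 2+12 is 14
DD 100h-10h/2       ;left to right: 100h-10h is 0F0h, then 0F0h/2 is 78h
DD 100h-(10h/2)     ;brackets first: 10h/2 is 8, then 100h-8 is 0F8h
```

    If in doubt, use brackets to make the intended order explicit.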

    Declaring real numbers

    Real numbers are numbers which can contain a representation of a value of less than 1. GoAsm expects all real numbers in the source script to be in the form of a floating point number, that is a number made up of digits which have a point within them. The point must be represented by a "period" (a full stop, ASCII character value 2Eh). The point can be anywhere within the digits. The real number may have a signed decimal exponent at the end of the number (using "e" or "E" following the IEEE Floating Point Standard). The x87 floating point registers of the processor can accept real numbers to 32, 64 or 80 bit resolutions. The 3DNow! and SSE instructions work with 32-bit real numbers and the SSE2 instructions use 64-bit real numbers.
    Sometimes these types are called:-
    32-bit single-precision
    64-bit double-precision
    80-bit extended-precision
    So real numbers can be declared as dwords (32-bit), qwords (64-bit) or twords (80-bit). Here are some examples of real number data declarations:-
    DD 1.6789E3
    DQ 1.6789E3
    DT 1.6789E3
    DD 3 DUP 7.6789E-2
    DQ 678.27896435E3 
    DT 1.2
    
    You may also declare PI directly either as a tword, qword or dword as follows:
    DD PI            ;pi as a dword
    DQ PI            ;pi as a qword
    DT PI            ;pi as a tword
    
    GoAsm tries to achieve maximum accuracy in providing pi by writing a known number directly into the mantissa.

    You can also declare real numbers as follows:-

    PUSH 1.1
    MOV EAX,1.1
    
    Both of these use a 32-bit format for the real number. The first places that number on the stack and the second moves it into the specified register.
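    Here is a sketch of how declared real numbers might be used with the x87 floating point instructions in 32-bit code, in this case to work out the area of a circle. The labels CIRCLE_RADIUS, PI_VALUE, CIRCLE_AREA and GET_AREA are invented for the example, and only memory operands with type indicators are used.

```asm
DATA SECTION
CIRCLE_RADIUS DQ 2.5            ;a 64-bit real number
PI_VALUE      DQ PI             ;pi as a qword
CIRCLE_AREA   DQ 0              ;to receive the result

CODE SECTION
GET_AREA:
FLD Q[CIRCLE_RADIUS]            ;st0=radius
FMUL Q[CIRCLE_RADIUS]           ;st0=radius squared
FMUL Q[PI_VALUE]                ;st0=pi times radius squared
FSTP Q[CIRCLE_AREA]             ;store the result and pop the register stack
RET
```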

    GoAsm's conversion accuracy

    GoAsm uses special algorithms to ensure optimum accuracy in loading the real number data declaration to data. In the case of a tword (80 bit) data declaration the calculation is performed if necessary to a maximum of 92 bits and then rounded down to fit into the 64 bit mantissa. Conversion to a qword (64 bits) is carried out using the maximum available precision (53 bit mantissa) with "near" rounding. Conversion to a dword (32 bits) is carried out using the maximum available precision (24 bit mantissa) with "near" rounding.

    Directly loading the exponent and mantissa

    Instead of using real numbers to load the floating point registers you can declare a tword and load the exponent and mantissa directly using the FLD instruction. In order to do this you will need to know the exponent and mantissa values to load (these can either be calculated or found and checked using one of the fpu panes in GoBug). Suppose, for example you want a representation of pi which is as accurate as possible and you know that this is an exponent of +0002 and a mantissa of +C90FDAA22168C235h. Then you can declare this number using:-
    DIRECT_PI DT 4000C90FDAA22168C235h
    
    and load it using:-
    FLD T[DIRECT_PI]
    
    The most significant bit (bit 79) in this tword declaration is a sign bit indicating whether the real number is positive or negative. In this case the number is positive because the sign bit is not set. The remainder of the first four hex digits contain the exponent. This is biased by a value of +3FFEh in 80 bit real numbers. This permits exponents of between -3FFEh and +4001h to be handled without using the most significant bit (the exponents become 0 to 7FFFh). The remainder of the hex digits contain the mantissa.
    It is much more difficult to load the exponent and mantissa directly using dword and qword data declarations. This is because the division between exponent and mantissa in those types of numbers is not at a 4 bit boundary. This makes it difficult to work out the correct hex numbers to declare, bearing in mind the bias which needs to be applied.
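    Using the same scheme (a bias of +3FFEh, with the mantissa read as a fraction of less than 1) a few other twords can be worked out by hand. This is only a sketch: the labels are invented, and it assumes that GoAsm accepts a leading zero on a tword hex number which begins with a letter, just as it requires for smaller hex numbers.

```asm
DATA SECTION
DIRECT_ONE      DT 3FFF8000000000000000h    ;+1.0: exponent +1 (3FFEh+1), mantissa one half
DIRECT_TWO      DT 40008000000000000000h    ;+2.0: exponent +2 (3FFEh+2), mantissa one half
DIRECT_MINUS_PI DT 0C000C90FDAA22168C235h   ;-pi: as DIRECT_PI but with the sign bit set
AS_QWORD        DQ 0                        ;to receive a 64-bit copy

CODE SECTION
LOAD_DIRECT:
FLD T[DIRECT_TWO]               ;st0=2.0
FSTP Q[AS_QWORD]                ;store it as a 64-bit real number and pop
RET
```

    Setting the sign bit changes the first hex digit of the exponent from 4 to C, which is why the last declaration needs the leading zero.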
     

    Characters in GoAsm

    Strings of characters

    In your source script you will often be relying on character representation for example:-
    Mess DB 'I am a string of characters',0
    PUSH 'This is supposed to be a caret ^'
    MOV EAX,'£$|@'
    
    It must be asked what actual values are loaded by GoAsm when issuing these instructions? At assemble time GoAsm views your source script using Windows file mapping, and then reads it character by character. In other words GoAsm is given the value of the characters in the source script by Windows. When GoAsm loads in the object file strings of the sort shown above, it loads the same value character as given to it by Windows. In the case of conversions from ANSI to Unicode strings, these are passed first through the API MultiByteToWideChar. This means that the value given to GoAsm by Windows will match that in the current character set (code page). Accordingly you need to ensure that the character set used in the computer which runs GoAsm is the character set for which your program is designed to run.

    If you are using a source script which is in a Unicode format (UTF-8 or UTF-16) then the codepage issue disappears. The correct characters are given by their Unicode value.

    Characters specified directly

    Sometimes you will specify characters by their actual values to try to deal with character set variations for example,
    CMP AL,124D       ;see if character is an OR as in some character sets
    JZ >L4            ;yes
    CMP AL,221D       ;see if character is an OR as in some character sets
    JZ >L4            ;yes
    
    Here you have already allowed for a possible variation in the user's own character set. If necessary you can arrange for your code to test the user's character set at run-time, and to test for the correct characters or use the correct strings accordingly. You can also test the language of the user's machine and provide strings in the correct language. The resource APIs provide a way this can be done automatically - see the manual to GoRC, my resource compiler.
     

    Operatives

    These are some operatives which may be used in the source script which have a special meaning to GoAsm.
    ,               - the instruction is not finished, continue
    ; or //         - a comment line - ignore to end of line
    /*.........*/   - continuous comment - ignore between the marks
    \               - the material is continuing on the next line
    - number        - the number is negative
    ! number        - invert the number (like NOT)
    NOT             - invert the number
    ~ number        - same
    +               - the plus sign
    -               - the minus sign
    *               - the multiply sign
    /               - the divide sign
    | or ¦          - bitwise OR
    OR              - bitwise OR 
    &               - bitwise AND
    AND             - bitwise AND
    << number       - bit shift left by the number
    >> number       - bit shift right by the number
    (....)          - perform calculation in brackets first
    

    ## in a definition has a special meaning: see using double hashes in definitions.
     


    Advanced features

    Structures - different types and uses

    What are structures?

    Structures are data areas of a fixed size which hold data in various components (structure members). They can range from very loose arrangements to highly formalised ones with structures within structures (nested structures). They can be data areas established by ordinary data declaration or from STRUCT templates. Structures are very important in Windows programming and GoAsm supports all types.
    See also unions.

    Using simple structures in Windows programming

    Let's take the LV_COLUMN structure which is used to organise the columns in a listview control. The following code sends the LVM_INSERTCOLUMN message (value 101Bh) to the ListView control to make a new column with the index number of the column in eax. The column details are contained in the LV_COLUMN structure. Here is how it might be used in 32-bit code:-
    PUSH ADDR LV_COLUMN,EAX,101Bh,[hListView]
    CALL SendMessageA              ;insert eax column
    
    Now let's look more closely at the LV_COLUMN structure.
    In the Windows header file Commctrl.h (pre-Win_IE 300 version) which contains information about the structure it is described as a structure of six dwords. In one sense therefore the structure can be regarded as 6 dwords which can be declared very simply as follows:-
    LV_COLUMN DD 6 DUP 0
    
    However, in the Windows information, each of the six dwords has a name which gives some idea of what it is used for, which is useful. Also the very first dword is a mask which identifies which of the later members of the structure are valid. This mask is important because a later version of the structure has another two members, and the mask needs to be different. So it might be better to declare the structure in data like this so that the mask can be initialised with a value, and so that you can see the names in your source script:-
    LV_COLUMN
      DD 0Fh       ;+0h mask
      DD 2h        ;+4h fmt=LVCFMT_CENTER=2
      DD 0         ;+8h cx
      DD 0         ;+0Ch pszText
      DD 0         ;+10h cchTextMax
      DD 0         ;+14h iSubItem
    
    Here you can see that, whilst declaring the structure in data, we have taken the opportunity to initialise two of the members with values which will not change. We have also put the offsets, member names and other information in the comments.

    Reading from and writing to the simple structure

    It is very easy to read from and write to the simple structure shown above, for example:-
    MOV EDI,ADDR LV_COLUMN
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV [EDI+0Ch],ESI           ;and give it to the structure
    MOV D[EDI+8h],50D           ;and make the width 50 pixels
    
    or you can use:-
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV [LV_COLUMN+0Ch],ESI     ;and give it to the structure
    MOV D[LV_COLUMN+8h],50D     ;and make the width 50 pixels
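
    If the raw offsets are hard to remember, one option is to give them names using equates. The following is only a sketch: the equate names LVC_CX and LVC_PSZTEXT are made up for this illustration and do not come from the Windows header files:-
    LVC_CX      EQU 8h                    ;hypothetical name for the offset of cx
    LVC_PSZTEXT EQU 0Ch                   ;hypothetical name for the offset of pszText
    MOV ESI,ADDR ColumnText               ;get the column text to use
    MOV [LV_COLUMN+LVC_PSZTEXT],ESI       ;same as [LV_COLUMN+0Ch]
    MOV D[LV_COLUMN+LVC_CX],50D           ;same as [LV_COLUMN+8h]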
    

    More formalised structures using STRUCT

    Some programmers prefer to be more formal when using structures by using a structure template. This is done in two stages. The first stage is to make a template by using STRUCT and give a name to the template. This does not actually declare any data.

    Here is an example of a structure template made with the name LV_COLUMN:-

    LV_COLUMN STRUCT
      mask       DD 0Fh       ;mask
      fmt        DD 2h        ;LVCFMT_CENTER=2
      cx         DD 0
      pszText    DD 0
      cchTextMax DD 0
      iSubItem   DD 0
    ENDS
    
    I have added some comments here to help understand the initialisation of two members of the structure. Note ENDS (literally END STRUCT) marks the end of the template. If you prefer you can also mark the end of the template by giving the structure name again followed by ENDS eg.
    LV_COLUMN ENDS
    
    The second stage is to use the template. You do this by using the template in the data section, usually preceded by a label, for example:-
    Lv1 LV_COLUMN
    
    Here you have declared six dwords using the LV_COLUMN structure template and you have given the structure declaration the label Lv1.
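
    Putting the two stages together, a minimal sketch might look like this. It assumes that the LV_COLUMN template shown above appears earlier in the source script, and that hListView is a dword label of your own holding the handle of the ListView control (it is not defined in this section):-
    DATA SECTION
    Lv1 LV_COLUMN                          ;six dwords, with mask=0Fh and fmt=2 from the template
    CODE SECTION
    PUSH ADDR Lv1,EAX,101Bh,[hListView]    ;LVM_INSERTCOLUMN with the column index in eax
    CALL SendMessageA                      ;insert eax column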

    The symbols created by formalised structures

    In GoAsm, symbols are made for the label of the structure itself and also for each named member of the structure. These can then be referenced directly and can also be passed to the debugger.
    So for example:-
    RECT STRUCT
         left   DD
         top    DD
         right  DD
         bottom DD
    ENDS
    rc RECT
    
    creates the following symbols:-
    rc
    rc.left
    rc.top
    rc.right
    rc.bottom
    

    Reading from and writing to the formalised structure

    Using the formalised structure allows you to be more specific in your source script when reading from and writing to the structure, for example:-
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV [Lv1.pszText],ESI       ;and give it to the structure
    MOV D[Lv1.cx],50D           ;and make the width 50 pixels
    
    or even
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV EDX,ADDR Lv1.pszText    ;get the psztext member
    MOV [EDX],ESI               ;and load the text to use
    MOV EDX,ADDR Lv1.cx         ;get the cx member
    MOV D[EDX],50D              ;and make the width 50 pixels
    
    But there is still nothing to stop you from doing this which is the same thing:-
    MOV ESI,ADDR ColumnText     ;get the column text to use
    MOV [Lv1+0Ch],ESI           ;and give it to the structure
    MOV D[Lv1+8h],50D           ;and make the width 50 pixels
    
    Although it is more complex to set up, the advantage of using the member names is that when you look at your code in the symbolic debugger the symbols in the structure appear in full, showing both the structure label and the member name. This is because GoAsm creates symbols for all the members of the structure and passes these to the linker. As far as I am aware this is unique to GoAsm and other assemblers do not do this.

    Getting the offset of structure members

    Sometimes you need to get the offset of a member within the structure. You do this by referring to the structure by name followed by a period and the name of the member, for example
    POINT STRUCT 
       left  DD 0
       right DD 0
    ENDS
    
    Then
    MOV EBX,POINT.right
    
    This loads the value 4 into EBX, which is the distance of the member from the beginning of the structure.

    This way of getting an offset is sometimes useful to get information sent by Windows in a structure. As an example, the OFNHookProc callback procedure receives from Windows information in a WM_NOTIFY message. The lParam parameter contains a pointer to an OFNOTIFY structure. This is a nested structure with the following form:-

    OFNOTIFY STRUCT
      hdr     NMHDR
      lpOFN   DD
      pszFile DD 
    ENDS
    
    where the NMHDR structure is:-
    NMHDR STRUCT
      hwndFrom DD
      idFrom   DD 
      code     DD
    ENDS
    
    So within your hook procedure you can get the value of the member idFrom in the NMHDR (identifier of the control sending the message), and the value of the member pszFile, as follows:-
    MOV ESI,[EBP+14h]                 ;get the pointer to the OFNOTIFY structure
    MOV EAX,[ESI+OFNOTIFY.hdr.idFrom]
    MOV EDX,[ESI+OFNOTIFY.pszFile]
    
    In fact what is happening here is that OFNOTIFY.hdr.idFrom resolves to a value of 4; OFNOTIFY.pszFile resolves to a value of 10h. These are their correct offsets from the beginning of the OFNOTIFY structure. Of course the structures concerned must be known to GoAsm. This is done by including the structure templates in the assembler source script, somewhere earlier in the file.
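
    As a sketch of the ordering this implies, the NMHDR template has to come before the OFNOTIFY template which uses it. The other members then resolve in the same way (the choice of registers here is arbitrary):-
    ;NMHDR STRUCT ... ENDS                 ;must appear first in the source script
    ;OFNOTIFY STRUCT ... ENDS              ;uses NMHDR, so comes second
    MOV ESI,[EBP+14h]                      ;get the pointer to the OFNOTIFY structure
    MOV ECX,[ESI+OFNOTIFY.hdr.code]        ;offset 8h, the notification code
    MOV EBX,[ESI+OFNOTIFY.lpOFN]           ;offset 0Ch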

    Overriding the initialisation of the structure

    Suppose you have a structure called RECT as follows:-
    RECT STRUCT
        left   DD 10
        top    DD 10
        right  DD 120
        bottom DD 90
    ENDS
    
    You can override the initialisation of the structure when you use the template, using either the < and > operators or the { and } operators, for example
    rc1 RECT <0,20,120,300>
    
    sets the dwords in the data structure to 0, 20, 120 and 300 respectively.
    You can use the question mark and the comma, or just the comma to ignore some members, for example:-
    rc1 RECT <0,?,?,300>
    rc1 RECT <0,,,300>
    
    here you override only the first and fourth members of the structure.

    Using braces you can pick and choose which members to override:-

    rc1 RECT {left=2,top=5}
    
    or you can mix the two methods:-
    rc1 RECT <{left=2,top=5},300h>
    
    When using braces you don't need to specify the full symbol name (in the above example this would be "rc1.left" and "rc1.top"). Instead you only specify the ultimate name ("left" and "top"). The override is also carried out into nested structures, so if you use the same names for members within a nested structure it is possible to initialise several members at once using one brace override.
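
    To illustrate with the RECT template above (left=10, top=10, right=120, bottom=90), members which are not overridden keep the values given in the template. Assuming that behaviour, each declaration below should produce the same data as the plain declaration in its comment (the labels rcA and rcB are used only for this illustration):-
    rcA RECT <0,?,?,300>        ;same data as DD 0,10,120,300
    rcB RECT {left=2,top=5}     ;same data as DD 2,5,120,90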

    Initialising structure members which have DUP data declarations

    If members of a structure are established using DUP, you can either override the initialisation by using a string or by specifying each element within < and > brackets:-
    UP STRUCT
      DB 27 DUP 0
      DB 2  DUP 0
    ENDS
    Pent UP <'My cat was born on 23 April',<23h,4h>>
    
    So, for example here is the GUID structure and a typical initialisation for COM:-
    GUID STRUCT
        Data1 dd ?
        Data2 dw ?
        Data3 dw ?
        Data4 db 8 dup ?
    GUID ENDS
    IID_IShellLink GUID <0000214eeh, 00000h, 00000h, <0c0h, 00h, 00h, 00h, 00h, 00h, 00h, 46h>>
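
    For comparison, and assuming the overrides are applied member by member as described above, the same sixteen bytes could be written out as ordinary data (the label IID_Raw is hypothetical):-
    IID_Raw DD 214EEh                      ;Data1
            DW 0                           ;Data2
            DW 0                           ;Data3
            DB 0C0h,0,0,0,0,0,0,46h        ;Data4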
    

    Some syntax rules when using STRUCT

    One important rule is that, since GoAsm is a one-pass assembler, structure templates must be made in the source script before they are used. This is because GoAsm cannot know in advance how large the structure is going to be. In other respects GoAsm is rather more relaxed with its syntax for STRUCT than other assemblers. STRUC means the same as STRUCT. There is no need to provide any initial values at all, and it does not matter that members are not named, so
    RECT STRUCT
         left   DD
         top    DD
         right  DD
         bottom DD
    ENDS
    
    and
    RECT STRUCT
         left   DD 0
                DD 2 DUP 0
         bottom DD 0
    ENDS
    
    and
    RECT STRUCT DD 4 DUP 0 ENDS
    
    are equally valid structure declarations. However, where members are named they must be on a new line.
    You may reuse the name for structure members, provided the structure's name is different, eg.
    RECT STRUCT
         left   DD 0
         top    DD 0
         right  DD 0
         bottom DD 0
    ENDS
    RECT2 STRUCT
         left   DD 0
         top    DD 0
         right  DD 0
         bottom DD 0
    ENDS
    
    If you use ? in the initialisation of the structure members this has the same effect as using zero. This does not result in the data being recorded as uninitialised, as it would do with an ordinary data declaration, so
    RECT STRUCT
         left   DD ?
         top    DD ?
         right  DD ?
         bottom DD ?
    ENDS
    rc1 RECT
    
    is perfectly valid, but the data will go in the section initialised to zero as if zeroes had been used.

    In a structure template you can make additional data on one line in the usual way so that this would be a structure template of four dwords:-

    RECT STRUCT
         lefttop     DD 0,0
         rightbottom DD 0,0
    ENDS
    

    Repeat structure declarations

    It may be useful to create arrays and tables using structure templates. For example:-
    RECT <>,<>,<>,<>
    
    Creates four RECT structures (four dwords in each). Since no label has been used in front of the RECT, no symbols at all will be created and passed to the debugger. In this example:-
    Buffer RECT <0,0,10,10>,<5,5,20,20>,<8,8,30,30>
    
    an array is made of three RECT structures (four dwords in each) initialised to the values provided. Symbols will only be made for the very first structure. This is to avoid duplication of symbol names.
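
    Since only the first structure gets symbols, the later structures in the array have to be reached by offset. This is a sketch only, relying on each RECT here being four dwords (10h bytes) and on the template offsets described under "Getting the offset of structure members":-
    MOV ESI,ADDR Buffer            ;point to the first RECT
    ADD ESI,10h                    ;step over one RECT (four dwords) to the second
    MOV EAX,[ESI+RECT.right]       ;read right of the second RECT (20 in this example)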

    If you want the members of the array to have unique symbol names you would need to use (for example):-

    Buffer1 RECT <0,0,10,10>
    Buffer2 RECT <5,5,20,20>
    Buffer3 RECT <8,8,30,30>
    
    or
    Buffer RECT3 <0,0,10,10,  5,5,20,20,  8,8,30,30>
    
    where RECT3 is a structure of 3 RECTS.

    If you don't need to initialise the structures you can repeat them using either:-

    Buffer RECT <>,<>,<>
    
    which creates three RECT structures, or
    Buffer RECT,RECT,RECT
    
    which does the same thing.

    You can also use DUP to repeat structures for example:-

    ThreeRects RECT 3 DUP <>
    FiveRects  RECT 5 DUP <23,24,25,26>
    
    In the second example each RECT is initialised to the same value. Initialisation of duplicated structures in this way can only be done at the top level and not in nested structures.

    Nested structures using STRUCT

    Structures can be nested by using a structure within another, so
    RECT STRUCT
         left   DD 0
         top    DD 0
         right  DD 0
         bottom DD 0
    ENDS
    StructTest STRUCT 
        a DD    6
        b RECT
        c DD    7
        d DD    8
    ENDS
    
    Then
    Hello StructTest
    
    Creates seven dwords. The symbols created (and passed to the debugger) are:-
    Hello
    Hello.a
    Hello.b
    Hello.b.left
    Hello.b.top
    Hello.b.right
    Hello.b.bottom
    Hello.c
    Hello.d
    
    and they can be read from or written to in the usual way, for example
    MOV D[Hello.b.left],100h     ;make rectangle start at 256 pixels
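
    The template names can also be used to get offsets into the nested structure when all you have is a pointer, in the same way as described under "Getting the offset of structure members". In this sketch ESI is simply loaded with the address of Hello, but it could be any pointer to a StructTest structure:-
    MOV ESI,ADDR Hello
    MOV EAX,[ESI+StructTest.b.right]     ;offset 0Ch (a, then left and top of b)
    MOV D[ESI+StructTest.c],9            ;offset 14h (a plus the four dwords of b)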
    
    Like structure members, nested structures need not be named, so that this is perfectly valid:-
    StructTest STRUCT 
          DD     6
          RECT
        c DD     7
        d DD     8
    ENDS
    

    Internally nested structures

    Structures can be nested by declaring a structure within a structure, so
    StructTest STRUCT 
        a DD    6
        b STRUCT
          left   DD 0
          top    DD 0
          right  DD 0
          bottom DD 0
          ENDS
        c DD    7
        d DD    8
    ENDS
    
    Then
    Hello StructTest
    
    produces the same result as StructTest in the previous example. The only difference is that the RECT structure is not available for use elsewhere.

    Overriding initialisation in nested structures

    You need carefully to use the < and >