Introduction

A Cross-Assembler is simply an assembler that runs on a different platform than the intended target platform. From here forward, it will simply be referred to as the assembler.

Essentially, the assembler described here doesn't do anything different from other commercial assemblers. In fact, some of the explanations and descriptions from other assemblers are used in this explanation. The assembler still has to examine each line and decipher, from the syntax, what the intent of the programmer is. To be efficient, the assembler needs to do an initial scan (Pass 1) of the source code and extract all of the symbolic references. Then, during a second scan (Pass 2), the assembler can resolve the symbolic references that are used in the Operand.
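The two-pass idea can be sketched in a few lines. This is a Python illustration only, not the GAWK code of the assembler. The instruction-size table, the tuple layout, and the names pass1 and pass2 are all assumptions made for the example.

```python
# Minimal two-pass sketch. SIZES and the (label, mnemonic, operand)
# tuples are illustrative assumptions, not the assembler's real tables.
SIZES = {"NOP": 1, "BRA": 2, "JMP": 3}

def pass1(lines, origin=0x0100):
    """Scan once, recording the address of every label."""
    symbols, pc = {}, origin
    for label, mnemonic, _operand in lines:
        if label:
            symbols[label] = pc
        pc += SIZES[mnemonic]
    return symbols

def pass2(lines, symbols, origin=0x0100):
    """Scan again, replacing symbolic operands with their addresses."""
    resolved, pc = [], origin
    for _label, mnemonic, operand in lines:
        value = symbols.get(operand, operand)
        resolved.append((pc, mnemonic, value))
        pc += SIZES[mnemonic]
    return resolved

program = [
    ("START", "NOP", None),
    (None, "JMP", "DONE"),   # forward reference, unknown until Pass 1 ends
    ("DONE", "NOP", None),
]
symbols = pass1(program)
listing = pass2(program, symbols)
```

The forward reference to DONE is the reason a single scan is not enough: its address is only known after the whole file has been read once.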

The assembler is designed to run in a Cygwin64 window running on a Windows 10 platform. The target platform is a Motorola MC6800 MicroComputer. The assembler uses a Source Code file to create several output files. See the section below for details on the Assembler Files.

I started writing this assembler years ago. It mostly worked at the time, but I wanted to finish it up and possibly use it for something. The assembler is written in GAWK, which runs with a Bourne Shell wrapper, under Cygwin64. This allows me to use the capabilities of Bourne Shell and GAWK. GAWK is the GNU implementation of AWK. I use GAWK, as opposed to AWK or NAWK, so that I can take advantage of the extra features that GAWK has.

An interesting outcome of using Cygwin64 on a Windows 10 platform is that the Cygwin64 file system is visible from Windows. This allows me to use a Windows text editor, rather than a Unix text editor. The only issue with using a Windows text editor is that it ends each line with a Carriage Return (CR) character followed by a Line Feed (LF), rather than with a LF alone. If the extra CR is not removed, it causes a problem with decoding the instruction line. The fix is explained subsequently, in the Syntax section on Field Separators.

Assembler Files

When the MC6800 Microprocessor was first introduced, storage space for extra files was at a premium. Users were often not connected to a big storage device, when developing their programs. They often used paper tape or cassette tape for saving their program data. That meant that users would initially assemble a program, looking for syntax errors, before telling the assembler to create a list file. Even when the users had disk drives in the development system, they had to conserve space.

But times have changed. Now, there is no need to use any external media to store the program and its auxiliary files. Storage space is readily available, as seen by the listing on the right. This is a directory listing for an MC6800 program. The program is a utility from the Motorola Users Group Library, UG001, which is an EXBUG routine, CBCDHX, for converting a hexadecimal character to a binary number. As you can see, there are up to 12 possible files that are used or created by the assembler. They have the same name, but different extensions. There are also a few extra files. These would be the original files. Some of the assembler directories have these extra files, others don't. The total disk space used is about 275K Bytes.

The file without an extension, UG001, is the original source code. This file needs to be copied to a filename with a ".src" extension, UG001.src, prior to assembly. If there is a need to edit the source code, UG001.src should be used. I have added the options (OPTs) for creating S-Record and Intel Hex output files.

The file UG001.inf should be created by the user, prior to assembly. This file contains information needed for the assembler to create a header page in the .rtf file (see the explanations below).

Another file that might be generated during assembly is a UG001.err file. Any errors encountered during assembly will be directed to the "UG001.err" file. However, if there are no errors, the "UG001.err" file will not exist.

For the assembler to operate, the source code file must have a .src extension. If the file name with a .src extension doesn't exist, the assembly will fail. The original source code file, UG001, is not touched by the assembler.

Below is a brief description of each file required by, or created by, the assembler. While you can have other files in your working directory, files with the following extensions should be avoided, as the assembler will overwrite them.


User Generated Files

These are files that are created by the user and used as input to the assembler. Only the source code file is required. All others are optional.

[File Name].src - This is the default for the Source Code file. The original source code probably doesn't have a file extension. But that was because the code was often assembled on machines that didn't need a file extension. The assembler will fail if the file name doesn't contain a ".src" extension.

[File Name].inf - This is an information file that is used during the generation of the ".rtf" file. The file is optional. However, when the file is used, a "title" page is created in the ".rtf" file. The "title" page is used to list things like "Programmer", "Company", "Title", "Hardware/Software requirements", etc. The assembler looks for specific keywords in the ".inf" file. The example shows the keywords that are currently recognized. If you don't want a particular keyword to be included on the "title" page, don't include it in the ".inf" file. Other keywords can be added to this list, but it means modifying the function "RTF_Title_Page( )" in the AWK code.

[File Name].pdf - The PDF file is not generated by the assembler. It is generated by the user, once the RTF file is created. If CutePDF is installed on your PC as a printer, a PDF file can be created directly from the RTF file: simply "print to PDF" from MS Word. For my purposes, I only use the free version.


Assembler Generated Files

These are files that are created by the assembly process. The user has some control over the contents, but, with the possible exception of the ".err" file, these files will always be generated.

[File Name].cln - This is a special file created, and used, by the assembler. The file is, effectively, a copy of the source code file, but cleaned up for easier reading. During assembly, the ".src" file is read during Pass 1, but the ".cln" file is read during Pass 2. For details on this file, see the section on Cleaning.

[File Name].dbg - The assembler reads the Source Code file twice: once to initially resolve the addresses of reference symbols, and again to actually generate the ".lst" file. As each line of source code is read, an entry is made in the Debug File (.dbg). There is one entry per line read in Pass 1 and Pass 2. The entries are labeled P1 and P2, respectively. Each entry includes information about the address mode that the assembler has deciphered. The user can then match their intentions against the assumptions made by the assembler. Incorrect syntax in the source code can cause the assembler to generate an address mode that the user did not intend.

[File Name].err - This file contains any errors generated by the assembly process. If the file exists, there is at least one error, and the contents of the error file are appended to the end of the ".lst" and ".rtf" files. If there are no errors, the error file will not exist. If the error file does not exist, a simple message is added to the ".lst" and ".rtf" files instead.

[File Name].lst - This is the list file that is created during Pass 2 of the assembler. Most users are familiar with this file, as it was always the main assembler listing. It contains Line Number, Address, Opcode Value, Operand Value, Symbol, Op-Code, and Operand for every line of source code. The example shows the format and spacing for each of the elements. Space is allocated for Symbols that are up to eight characters in length, and Operands that are up to 16 characters in length. This accounts for Operands that include an equation. In the case of some Pseudo-Ops, the Operand field can be longer than 16 characters. In that case, any comment will simply be pushed further to the right.

This was the main method of documentation for the assembly language program. See the section on the ".rtf" file for a different method of documentation.

[File Name].sym - During Pass 1, a Symbol table (array) is created. As each symbol is encountered, an entry (symbol address) in the Symbol table is made. After Pass 1, no symbols are added to the Symbol table. Pass 2 then references the Symbol table to resolve any symbolic references in the source code.

As the Symbol table is created, the symbols, and their addresses, are written to a ".sym" file. The ".sym" file can be viewed in a text editor. The symbols appear in the order in which they are encountered and defined. After the assembly is completed (end of Pass 2), the Symbol table is sorted by symbol name. It is then appended to the end of the ".lst" and ".rtf" files. This does not affect the ".sym" file.
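The difference between the two orderings can be shown with a small Python sketch. The symbol names, addresses, and line layout here are made up for illustration; the real ".sym" layout may differ.

```python
# Hypothetical symbols, entered in the order Pass 1 would meet them.
symbols = {}
for name, address in [("START", 0x0100), ("BUFF1", 0x0020), ("LOOP", 0x0105)]:
    symbols[name] = address

# ".sym" style output: the order in which the symbols were defined.
sym_file_lines = ["%-8s %04X" % (n, a) for n, a in symbols.items()]

# Listing appendix style output: sorted by symbol name.
sorted_lines = ["%-8s %04X" % (n, symbols[n]) for n in sorted(symbols)]
```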

[File Name].dat - Contains a map of all of memory, in Hex ASCII format. By default, the file starts at address $0000 and contains one entry for every address through $FFFF (65,536 decimal entries). Each line in the file contains a memory address followed by 16 two-digit hex values. These hex values are the hex representation of the memory contents. At the start of assembly, the "bin_dat" array is initialized to all zeros. Then, during Pass 2, hex values representing the OpCode and Operands are entered in the array. Finally, after the assembly process is complete, the "bin_dat" array is written to the ".dat" file. This is a good file to view to make sure your program is occupying the correct area of memory.
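As a rough picture of such a memory map, here is a Python sketch. The exact column layout of the real ".dat" file is not shown in this manual, so the 4-digit address and single-space separators below are assumptions, as is the name dat_lines.

```python
def dat_lines(memory, start=0x0000, end=0xFFFF):
    """Yield one text line per 16 bytes: an address, then 16 hex bytes."""
    for base in range(start, end + 1, 16):
        row = memory[base:base + 16]
        yield "%04X " % base + " ".join("%02X" % b for b in row)

memory = bytearray(0x10000)               # 65,536 zeroed entries, like "bin_dat"
memory[0x0100:0x0103] = b"\x86\x41\x39"   # LDAA #$41 then RTS, placed at $0100
lines = list(dat_lines(memory))
```

With 16 bytes per line, the full map is 4,096 lines long, and the program bytes show up on the line for address 0100.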

[File Name].rtf - This is a Rich Text Format (RTF) file. It contains the same information that the ".lst" file contains. It can be opened directly in Microsoft Word. The ".rtf" file is initialized between the end of Pass 1 and the start of Pass 2. Initially, the file is created with a header, footer, tab locations, and an optional "Title" page. By default, the document is paged with 68 lines per page. At the top of each page is the column header. The two column header lines are not part of the 68 line count. Many of the RTF parameters can be controlled using the OPT Pseudo-Op.

NOTE: Do not have the ".rtf" file open, while you are assembling source code. This will cause a Windows system error because it can't overwrite an open file. For example:

gawk: cmd. line:1732: fatal: can't redirect to `All_OpCode_Test.rtf' (Device or resource busy)

For the other files, if you use a text editor like EditPad Lite 7 or Notepad++, the open files will automatically be refreshed and you will always be seeing the latest version.


Assembler Generated Files (Optional)

These files are optionally created during the assembly process. For them to be generated, the user needs to include several Pseudo-Ops to initiate their creation. If the appropriate Pseudo-Ops are used, and the files are created, the files are also appended to the end of the ".rtf" file.

[File Name].s19 - When the MC6800 was first introduced, S-Records were written to paper tape from MIKBUG, when a user wanted to save their program. The mechanics of generating S-Records that way are described in Print/Punch Memory Dump paper tape format. However, now the assembler can generate S-Records at the end of the assembly operation, if the user defines the S-Record settings. See the S19 Options in the Pseudo-Op section.
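For readers who have not looked inside an S-Record, the Python sketch below builds a single S1 (16-bit address) data record. It follows the standard Motorola S-Record layout: a byte count, the address, the data, and a checksum that is the ones' complement of the low byte of their sum. The function name s1_record is an assumption for the example and is not taken from the assembler.

```python
def s1_record(address, data):
    """Build one S1 record for a 16-bit address and a short run of bytes."""
    count = len(data) + 3                  # 2 address bytes + data + 1 checksum
    body = bytes([count, (address >> 8) & 0xFF, address & 0xFF]) + bytes(data)
    checksum = (~sum(body)) & 0xFF         # ones' complement of the low byte
    return "S1" + body.hex().upper() + "%02X" % checksum

record = s1_record(0x7AF0, [0x0A, 0x0A, 0x0D] + [0x00] * 13)
```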

[File Name].hex - Contains a block of memory in Intel Hex format. The start/end address and the number of bytes per line can be controlled using the IHEX Pseudo-Op. This file is intended to be used with an E-Prom burner.
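An Intel Hex line has a similar shape. The Python sketch below follows the standard Intel Hex record layout (length, address, record type, data, and a two's complement checksum). The function name ihex_record is an assumption for the example; how the IHEX Pseudo-Op drives the real output is not covered here.

```python
def ihex_record(address, data, rectype=0x00):
    """Build one Intel Hex record; rectype 0x00 is data, 0x01 is end of file."""
    body = bytes([len(data), (address >> 8) & 0xFF, address & 0xFF, rectype])
    body += bytes(data)
    checksum = (-sum(body)) & 0xFF         # two's complement of the low byte
    return ":" + body.hex().upper() + "%02X" % checksum

data_line = ihex_record(0x0030, [0x02, 0x33, 0x7A])
eof_line = ihex_record(0x0000, [], rectype=0x01)
```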

Note: The text editor, Notepad++, has a "language" definition capability. This allows the user to define a particular language, which the editor uses to highlight text. Two of the languages are "S-Record" and "Intel Hex". When viewing a ".s19" or ".hex" file with the correct language selected, the editor will color highlight the file and validate it. Errors in these files will be highlighted in red.

Cleaning

When reading through this manual, you will see references to a Clean Operation or a Clean File. This is an invention of my own and is a feature unique to this assembler.

The assembler automatically creates several files during operation, as detailed in the section above. Cleaning is intended to create an easy, or easier, to read version of the original source file. While most source code for the MC6800 follows a common format, there are some differences between assemblers. Plus, many of the programs are just sloppy, mixing spaces and tabs so that the lines of code look haphazard.

In many cases, the differences can be easily handled by the assembler. For example, the Pseudo-Ops TITL, TITLE, NAM, and NAME all mean the same thing. So this assembler allows for all of them. Some assemblers require the title to be delimited with quotes ("..."). Others just don't care.

But there are differences significant enough that source code from one source might not assemble correctly. So during Pass 1, the assembler creates a Clean file. It adjusts the original source code to fit this assembler. Sometimes it's just throwing away unnecessary blank lines, or replacing strings of spaces with tabs. But sometimes it could be a more involved adjustment. The Clean operation is careful not to eliminate multiple spaces when they are used in an FCC directive.

Some assemblers use a "*" or a ";" to indicate that a comment follows, even though the "*" is also used to specify the current program counter in an Operand. So, to avoid a misinterpretation, I adjust every line of source code so that the "*" is only used to specify the current program counter, and ";" is only used for comments.
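The simplest case of that adjustment, a full-line comment that starts with "*", can be sketched in Python as follows. This only illustrates the idea. The real Clean operation does more (trailing comments, blank lines, tabs), and the function name is an assumption.

```python
def clean_comment_marker(line):
    """Rewrite a full-line '*' comment so that it starts with ';' instead."""
    if line.startswith("*"):
        return ";" + line[1:]
    return line          # a '*' inside an Operand is left alone

cleaned = [clean_comment_marker(l) for l in ["* INPUT ROUTINE", "PUNCH EQU *"]]
```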

If the Clean Operation causes any problems, like generating new errors, then the ".src" file may need some manual cleanup, before you use the assembler.

Operation

Most Unix users are familiar with stream processors like AWK/GAWK, SED, GREP, etc. Their main use is with small command line programs to process a stream of data. I use them that way myself. However, AWK/GAWK works quite nicely as a complete program. There are some operations that are better handled by shell scripting and others by the stream processor.

    Pass 1 and 2 Processing
  • Blank or Comment Line
  • Comment Line with "*"
  • Symbol Only Lines
  • Pseudo-Ops
    • NAM, NAME, TITL, TITLE
    • OPT
    • SPC
    • END
    • EQU
    • ORG
    • RMB
    • FCB
    • FCC
    • FDB
  • Op-Codes
    • op_Relative (16)
    • op_Inh (51)
    • op_IdxExt (12)
    • op_DirExtInh (4)
    • op_ImmDirIndExt (20)
    • op_Imm3DirIndExt (3)

This is a user manual, not a tutorial, so I will not cover the processing of a source code program in minute detail. Assembling a source code file is broken up into several different phases.

Setup - The first phase is Setup. During Setup, the required file names are initialized and multiple arrays are created. The assembler makes very extensive use of GAWK's associative arrays. For example, all of the internal calculations are performed in decimal, but most of the input data will be in Hex, Octal, or Binary. So arrays are set up to convert between number bases with a simple array lookup. In the Hex to Decimal conversion array, for example, the array indexes all start with a "$". So when I go to convert a Hex number, say "$12AF", I simply reference it in the "hextod" array, Dec_Value = hextod["$12AF"].
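The lookup-table approach translates directly to a Python dictionary. In this sketch, which key spellings the table holds (shortest form, zero-padded forms) is an assumption; only the "$" prefix and the names hextod and Dec_Value come from the text above.

```python
# Build a "$"-prefixed hex-to-decimal lookup table once, during setup.
hextod = {}
for n in range(0x10000):
    hextod["$%X" % n] = n         # shortest form, e.g. "$A"
    hextod["$%04X" % n] = n       # zero-padded 16-bit form, e.g. "$000A"
    if n < 0x100:
        hextod["$%02X" % n] = n   # zero-padded 8-bit form, e.g. "$0A"

# Conversion is then a plain lookup, with no arithmetic at the call site.
Dec_Value = hextod["$12AF"]
```

The trade is memory for simplicity: the table is large, but every conversion afterwards is a single indexed read.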

Setup is also the phase where all of the assembler variables and options are initialized. One of the important variables is the GAWK Field Separator. The Field Separator is used by GAWK to separate items on a line. In GAWK the entire line is referenced by the variable $0. Items on that line are then referenced by $1, $2, $3, ...$N. The Field Separator defines how that is done. The assembler uses the definition FS = "[ \t\r]+". This is a regular expression for matching one or more spaces, TABs, or carriage returns. But it is unique in that it doesn't strip leading and trailing whitespace from the line. This allows for splitting lines that may, or may not, have a symbol starting in the first character of the line. For example, the line
INCH   LDAA MTTYSI
would be split into $1=INCH, $2=LDAA, and $3=MTTYSI, and the line
          TST   AECHO
would be split into $1="", $2=TST, $3=AECHO. Note that, in the second case, $1 is equal to an empty (null) string. The separator between line items can be a space, multiple spaces, or tabs.
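The same splitting behaviour can be reproduced with Python's re module, which is handy for experimenting outside the assembler. This is a sketch, not the GAWK code. List index 0 here plays the role of $1, and the function name split_fields is an assumption.

```python
import re

FS = re.compile(r"[ \t\r]+")   # the same pattern the assembler assigns to FS

def split_fields(line):
    """Split one source line; leading separators yield an empty first field."""
    return FS.split(line.rstrip("\n"))

with_symbol = split_fields("INCH   LDAA MTTYSI")
without_symbol = split_fields("          TST   AECHO")
windows_line = split_fields("INCH\tLDAA\tMTTYSI\r\n")
```

Note that the trailing CR of a Windows line is consumed as a separator, so it never ends up attached to the last item; it leaves only an empty final field.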

Pass 1 - Pass 1 is a large "while" loop that reads the source code, line by line. The list on the right shows the ordering of processing for each line. The assembler must look at each line to determine how to process it. For the most part, during Pass 1, we are only interested in Symbols and their addresses.

The listing on the right shows the whole Bourne shell program. It's a little airy (lots of extra unnecessary spaces and blank lines), but that's how I write, and I don't see any need to compress it once it's working. The listing only shows a skeleton of the GAWK program section. That's because the shell portion is really only about 10 percent of the whole program and the GAWK program is quite lengthy. The shell scripting takes care of the Initial Setup, File Name creation, etc. Once everything is set up, it passes the file names to the GAWK program. The data passed to the GAWK program is at the very end of the listing.

The assembler makes extensive use of GAWK's Associative Array capability. An array is a table of values called elements. The elements of an array are distinguished by their indices. In GAWK, the indices may be either numbers or strings. This is because GAWK array subscripts are always strings. When a numeric value is used as a subscript, it is converted to a string value before being used for subscripting.

There will often be references to $0, $1, $2, $3, etc. These are references used in GAWK to distinguish between the different parts of a line. When read, an input line is separated into parts. The characters used to indicate separation are called FS or Field Separators. For the assembler, FS is defined as a "space", "\t" (tab), and "\r" (carriage return). The "\r" is included in the FS list to remove the Carriage Return (CR) character that is added by a Windows text editor.

In the example, $0 specifies the entire input line, whereas $1 only references the first item on the line, BUFF1. $2 references the second item on the line, RMB, and $3 references the third item on the line, 32. The entries from $4 to $N are considered a comment. As each line is read in, $2, which contains an Op-Code or Pseudo-Op, is examined and controls the behaviour of the assembler.

You might note that there is no mark that delimits the comment. While it is common to use a ";" to mark a comment, it isn't necessary. In this instance, everything past the Operand is considered a comment.

Syntax

The syntax, or language requirements, for using the assembler fall into one of two categories:

  1. Requirements for conversing with the host computer's operating system.
    This category varies from service to service. For the purposes of this document, any services description refers to the Cygwin64 environment that is running on a Windows 10 platform.
  2. Requirements for conversing with the M6800 programs.

The emphasis in this section is on the language requirements of the M6800 programs. These requirements are constant, and except for minor variations in format, do not vary from system to system.

The source program is nothing more than a list of instructions that the M6800 is to execute during system operation. All that is required is that the mnemonic instructions, used by the programmer to write the program, be translated into binary machine language acceptable to the M6800. However, if the assembler is to be used to perform the translation, the language and format described in the following paragraphs should be used.

ASCII CHARACTER SET (7-BIT CODE)

             MSB ⇒  0    1    2    3    4    5    6    7
LSB ⇓               000  001  010  011  100  101  110  111
0   0000            NUL  DLE  SP   0    @    P    `    p
1   0001            SOH  DC1  !    1    A    Q    a    q
2   0010            STX  DC2  "    2    B    R    b    r
3   0011            ETX  DC3  #    3    C    S    c    s
4   0100            EOT  DC4  $    4    D    T    d    t
5   0101            ENQ  NAK  %    5    E    U    e    u
6   0110            ACK  SYN  &    6    F    V    f    v
7   0111            BEL  ETB  '    7    G    W    g    w
8   1000            BS   CAN  (    8    H    X    h    x
9   1001            HT   EM   )    9    I    Y    i    y
A   1010            LF   SUB  *    :    J    Z    j    z
B   1011            VT   ESC  +    ;    K    [    k    {
C   1100            FF   FS   ,    <    L    \    l    |
D   1101            CR   GS   -    =    M    ]    m    }
E   1110            SO   RS   .    >    N    ^    n    ~
F   1111            SI   US   /    ?    O    _    o    DEL

The source program is written in an assembler language consisting of the 72 executable instructions and the assembly directives (Pseudo-Ops). The Pseudo-Ops are useful in generating, controlling, and documenting the source program. Some Pseudo-Ops are used to create symbols, or variables, that contain hardware addresses. The RMB Pseudo-Op can be used to reserve a data area. Others, like FCB, FCC, and FDB, generate data areas. The characters recognized by the assembler include A through Z of the alphabet, the integers 0 through 9, and the four arithmetic operators +, -, *, and /. In addition, the following special prefixes and separating characters may be used:

Warning:
If you modify the assembler, "DO NOT" use an apostrophe ANYWHERE in the GAWK code, even in the comments. Contractions, like "can't", need to have the apostrophe eliminated and be written as "cant". Using an apostrophe will generate a very difficult to find GAWK error.

  • # (pound sign, e.g. #1234) specifies the immediate mode of addressing. The number itself, after the #, can be in the form of a Symbol or Equation. Or it can simply be a decimal, hexadecimal, octal, or binary number, as defined below.
  • $ (dollar sign, e.g. $E3) specifies a hexadecimal number.
  • @ (commercial at, e.g. @377) specifies an octal number.
  • % (percent, e.g. %10011100) specifies a binary number that is limited to 8 bits.
  • ' (apostrophe, e.g. 'A,'B,'C) specifies an ASCII literal character. See the warning above about using an apostrophe in the GAWK program.
  • , (comma), SPACE, and Horizontal TAB are used as field separators. Normally a SPACE or Horizontal TAB is used to separate a Symbol ($1) from an Op-Code ($2) and an Op-Code ($2) from the Operand ($3). The comma is used when multiple data items or option specifications are used on a single line. When used this way, no spaces or tabs are used between the items.
  • ; (semi-colon) is used to indicate a comment.
    • When used at the start of a line, the semi-colon indicates a comment line.
    • When used after an OpCode/Operand, the semi-colon indicates that the rest of the line is a comment.
  • * (asterisk) is a little odd and often does double duty. Many assemblers were written that used the asterisk to indicate a comment line. However, the asterisk was also used to specify the current address. The Cleaning process that this assembler does replaces any asterisk that is used to mark a comment with a semi-colon.
    • When used in the Operand ($3), the asterisk indicates the current address. For example, the line PUNCH EQU * would assign the current address to the symbol PUNCH.
    • When used at the start of a line, the asterisk indicates a comment line.
    • Comments that are added after the Operand can contain a leading *, but it is not required.
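The numeric prefixes in the list above can be summarized with a small Python sketch. It only handles a single token and ignores Symbols and Equations; the function name parse_number is an assumption, and the assembler itself uses array lookups rather than this kind of arithmetic.

```python
def parse_number(token):
    """Convert one Operand token to an integer, following the prefix rules."""
    if token.startswith("$"):              # hexadecimal, e.g. $E3
        return int(token[1:], 16)
    if token.startswith("@"):              # octal, e.g. @377
        return int(token[1:], 8)
    if token.startswith("%"):              # binary, limited to 8 bits
        if len(token) - 1 > 8:
            raise ValueError("binary numbers are limited to 8 bits")
        return int(token[1:], 2)
    if token.startswith("'"):              # ASCII literal, e.g. 'A
        return ord(token[1])
    return int(token, 10)                  # no prefix means decimal

values = [parse_number(t) for t in ["$E3", "@377", "%10011100", "'A", "1234"]]
```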

The character set is a subset of ASCII (American Standard Code for Information Interchange, 1968) and includes the ASCII characters $20 (space) through $5F (_). The ASCII code is shown in the table above. The subset occupies columns 2 through 5 of the table.

Address Modes

The MC6800 eight-bit microprocessing unit has seven address modes that can be used by the programmer. However, not all instructions have the ability to use every mode. So the assembler groups them according to the modes they have in common. You can get a good idea of the groupings from the Instruction Set Summary tables. Below is a brief summary of the groupings and their address modes. For a more detailed explanation of the instruction modes, see the Motorola M6800 Programming Reference Manual, M68PRM.

The MC6800 Op-Codes (instructions) are generally grouped by function, like Accumulator/Memory, Index Register/Stack, Jump/Branch, and Condition Code. That's how they are grouped in the Summary section. Note that each of the Op-Codes, in each functional group, may have more than one addressing mode. Some Op-Codes only have one mode, others have four.

However, the assembler is not trying to interpret your code. It's only interested in what you write, on a line-by-line basis. The assembler expects each line to specify exactly what you want to do. This is specified with the Operand's syntax. So the assembler must look at each line of source code and determine which addressing mode to use. To do this, I separate the addressing modes into six different categories. Within the assembler, each of these categories is an array. So each array is initially checked to determine how to deal with the rest of the information on a line of source code.

Below is a list of categories and a brief description of each. This is also the order in which a line of source code is evaluated by the assembler. Listed with each category are its Op-Codes and the Hex value for each Op-Code. In some cases, where it makes a difference, there is also a listing of the number of bytes, in parentheses.

  • Relative (16)
    BRA-$20, BHI-$22, BLS-$23, BCC-$24, BCS-$25, BNE-$26, BEQ-$27, BVC-$28, BVS-$29, BPL-$2A, BMI-$2B, BGE-$2C, BLT-$2D, BGT-$2E, BLE-$2F, BSR-$8D

    The Relative addressing mode is used with most of the branching instructions. These are 2-byte instructions. The first byte of the instruction is the Op-Code. The second byte of the instruction is called the Offset. The Offset is interpreted as a signed 8-bit (two's complement) number: seven bits plus a sign bit. If the MSB (most significant bit) of the Offset is 0, the number is positive, which indicates a forward branch. If the MSB of the Offset is 1, the number is negative, which indicates a backward branch. This allows the user to branch within a range of -126 to +129 (decimal) bytes of the present instruction.


    A program using Branch Instructions

    The example on the right shows the use of Relative instructions for branching forward and backward. In the first branch instruction (BRA NEXT), the address to be branched to is $109. As Relative addressing is used, the Offset is calculated as $109 - $105 = $04, where $105 is the contents of the PC, which points to the next instruction. The Offset is stored as the operand byte of the branch instruction ($20 $04). The second branch instruction (BRA LAST) is a backward branch. The displacement (Offset) is calculated as $105 - $10E = -$09, where $10E is the contents of the PC. As the Offset is a negative number, its 2's complement ($F7) is used as the Offset ($20 $F7).
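The Offset arithmetic from the branch example can be checked with a short Python sketch. The function name branch_offset is an assumption. The branch addresses $103 and $10C are inferred from the PC values $105 and $10E quoted above, since a branch instruction is 2 bytes long.

```python
def branch_offset(branch_address, target_address):
    """Return the Offset byte for a 2-byte Relative branch instruction."""
    next_pc = branch_address + 2           # PC points past the branch itself
    offset = target_address - next_pc
    if not -128 <= offset <= 127:
        raise ValueError("branch target out of range")
    return offset & 0xFF                   # negative values become 2's complement

forward = branch_offset(0x0103, 0x0109)    # BRA NEXT
backward = branch_offset(0x010C, 0x0105)   # BRA LAST
```

Because the Offset is measured from the address after the branch, the reachable range seen from the branch instruction itself is -126 to +129 bytes.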

  • Inherent/Implied (51)
    ABA-$1B, CLRA-$4F/CLRB-$5F, CBA-$11, COMA-$43/COMB-$53, NEGA-$40/NEGB-$50, DAA-$19, DECA-$4A/DECB-$5A, INCA-$4C/INCB-$5C, PSHA-$36/PSHB-$37, PULA-$32/PULB-$33, ROLA-$49/ROLB-$59, RORA-$46/RORB-$56, ASLA-$48/ASLB-$58, ASRA-$47/ASRB-$57, LSRA-$44/LSRB-$54, SBA-$10, TAB-$16, TBA-$17, TSTA-$4D/TSTB-$5D,
    DEX-$09, DES-$34, INX-$08, INS-$31, TXS-$35, TSX-$30,
    NOP-$01, RTI-$3B, RTS-$39, SWI-$3F, WAI-$3E,
    CLC-$0C, CLI-$0E, CLV-$0A, SEC-$0D, SEI-$0F, SEV-$0B, TAP-$06, TPA-$07

    These instructions are referred to as Inherent. They are also known as Implied addressing. Either reference is correct.

    With these instructions, the operation is Inherent in the instruction name and they only require 1 byte of memory. For example, the instruction CLRA says Clear Accumulator-A. CLRB says Clear Accumulator-B. Other instructions may specify the "X" (Index) register or the "S" (Stack) register. The last eight instructions, CLC to TPA, are used to set/clear Condition Code Register bits.

  • Indexed/Extended (13)
    CLR-$6F/$7F, COM-$63/$73, NEG-$60/$70, DEC-$6A/$7A, INC-$6C/$7C, ROL-$69/$79, ROR-$66/$76, ASL-$68/$78, ASR-$67/$77, LSR-$64/$74, TST-$6D/$7D, JMP-$6E/$7E, JSR-$AD/$BD
    Indexed/Extended Mode Instructions

    These instructions can be used with the Indexed or Extended addressing modes. If the Operand contains only a Symbol or numeric value (Decimal || Hex || Octal || Binary), the number is assumed to be a 16 bit address (0 to $FFFF), and Extended mode addressing is assumed. The instruction requires 3 bytes of memory.

    To indicate that you want to use the Indexed mode, the Operand must include a reference to the X-Register. An Indexed instruction Operand can be written in any of the following formats:

    • X | ,X | 0,X - The format X, when used alone, instructs the assembler that the address of the Operand is identical with the contents of the Index Register. This format has the same effect on the assembly as if 0,X had been written.
    • Number,X - Number can be decimal, hexadecimal, octal, or binary. Only values from 0 to $FF are valid.
    • Symbol,X | Expression,X - If a Symbol or an Expression is used, rather than a number, the assembler will find or compute a numerical value of that Symbol or Expression. The source program must then include other statements (Pseudo-Ops) which define a numerical value for the Symbol or which enable the assembler to compute a numerical value for the Symbol or Expression. Only values from 0 to $FF are valid.

  • Direct/Indexed/Extended (4)
    STAA-$97/$A7/$B7, STAB-$D7/$E7/$F7, STS-$9F/$AF/$BF, STX-$DF/$EF/$FF
    Direct/Indexed/Extended Mode Instructions

    This set of instructions is similar to the previous set, which uses the Indexed and Extended addressing modes, with the exception that they can also be used in the Direct mode.

    If the Operand contains just a Symbol or numeric value (Decimal || Hex || Octal || Binary) that is greater than 255 (decimal), the number is assumed to be a 16 bit address (0 to $FFFF), and Extended mode addressing is assumed. The instruction requires 3 bytes of memory.

    If the value in the Operand resolves to a value less than 256, it is assumed that the value is an address in Page 0 (0 to $FF). Then Direct mode addressing is used and only 2 bytes of memory are required.

    Note: If the Pseudo-Op RELOC had been specified, prior to encountering this instruction, and the Operand value resolved to less than 256, the mode would be forced to be Extended. The reason behind forcing the Extended mode is to make the code relocatable.

    To indicate that you want to use the Indexed mode, the Operand must include a reference to the X-Register. An Indexed instruction Operand can be written in any of the following formats:

    • X | ,X | 0,X - The format X, when used alone, instructs the assembler that the address of the Operand is identical with the contents of the Index Register. This format has the same effect on the assembly as if 0,X had been written.
    • Number,X - Number can be decimal, hexadecimal, octal, or binary. Only values from 0 to $FF are valid.
    • Symbol,X | Expression,X - If a Symbol or an Expression is used, rather than a number, the assembler will find or compute a numerical value of that Symbol or Expression. The source program must then include other statements (Pseudo-Ops) which define a numerical value for the Symbol or which enable the assembler to compute a numerical value for the Symbol or Expression. Only values from 0 to $FF are valid.
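The Direct/Indexed/Extended decision described above can be sketched in Python, using STAA as the example instruction. The operand is assumed to be already resolved to a number, and the function and parameter names are assumptions; the Op-Code values are the ones listed for STAA.

```python
STAA = {"direct": 0x97, "indexed": 0xA7, "extended": 0xB7}

def encode_staa(operand_value, indexed=False, reloc=False):
    """Return the object bytes for STAA, given a resolved Operand value."""
    if indexed:                            # Operand referenced the X-Register
        if not 0 <= operand_value <= 0xFF:
            raise ValueError("indexed offset must be 0 to $FF")
        return [STAA["indexed"], operand_value]
    if operand_value < 0x100 and not reloc:
        return [STAA["direct"], operand_value]          # Page 0, 2 bytes
    return [STAA["extended"], operand_value >> 8, operand_value & 0xFF]

page_zero = encode_staa(0x20)
far = encode_staa(0x1234)
forced = encode_staa(0x20, reloc=True)     # RELOC forces Extended mode
via_x = encode_staa(5, indexed=True)
```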

  • Immediate/Direct/Indexed/Extended (20)
    ADDA-$8B/$9B/$AB/$BB, ADDB-$CB/$DB/$EB/$FB, ADCA-$89/$99/$A9/$B9, ADCB-$C9/$D9/$E9/$F9, ANDA-$84/$94/$A4/$B4, ANDB-$C4/$D4/$E4/$F4, BITA-$85/$95/$A5/$B5, BITB-$C5/$D5/$E5/$F5, CMPA-$81/$91/$A1/$B1, CMPB-$C1/$D1/$E1/$F1, EORA-$88/$98/$A8/$B8, EORB-$C8/$D8/$E8/$F8, LDAA-$86/$96/$A6/$B6, LDAB-$C6/$D6/$E6/$F6, ORAA-$8A/$9A/$AA/$BA, ORAB-$CA/$DA/$EA/$FA, SUBA-$80/$90/$A0/$B0, SUBB-$C0/$D0/$E0/$F0, SBCA-$82/$92/$A2/$B2, SBCB-$C2/$D2/$E2/$F2

    This group of instructions can be used the same way as the previous group, with the addition of the Immediate mode. So, for Direct, Indexed, and Extended mode operation, refer to the previous section on Direct/Indexed/Extended instructions.

    With Immediate mode operation, the Operand contains a value less than 256 (decimal), preceded by a "#". The number that follows the "#" can be decimal, hexadecimal, octal, binary, or a symbol that resolves to less than 256.

  • Immediate3/Direct/Indexed/Extended (3)
    CPX-$8C/$9C/$AC/$BC, LDX-$CE/$DE/$EE/$FE, LDS-$8E/$9E/$AE/$BE

    This group of instructions can be used the same way as the previous group, except that the Immediate mode takes a 16-bit value. For Direct, Indexed, and Extended mode operation, refer to the previous sections on Direct/Indexed/Extended instructions and on Immediate/Direct/Indexed/Extended instructions.

    In this small group, an Immediate mode Operand contains a value between 0 and $FFFF (65,535 decimal), preceded by a "#". The number that follows the "#" can be decimal, hexadecimal, octal, binary, or a symbol that resolves to no more than $FFFF. The range is larger than in the previous Immediate mode operation because these instructions deal directly with the X-Register and the Stack Register, which are 16 bits wide. The instruction therefore requires 3 bytes, hence the name Immediate3.
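The difference between the 2-byte Immediate form and the 3-byte Immediate3 form can be sketched as follows. The opcode values are copied from the tables above; the function name encode_immediate and the two dictionaries are hypothetical and cover only a few mnemonics for illustration.

```python
# Immediate-mode opcodes taken from the instruction tables above.
IMMEDIATE_8 = {"LDAA": 0x86, "LDAB": 0xC6, "ADDA": 0x8B, "CMPA": 0x81}
IMMEDIATE_16 = {"CPX": 0x8C, "LDX": 0xCE, "LDS": 0x8E}


def encode_immediate(mnemonic, value):
    """Return the object bytes for '<mnemonic> #value'."""
    if mnemonic in IMMEDIATE_8:
        if not 0 <= value <= 0xFF:
            raise ValueError("8-bit Immediate value must be less than 256")
        return bytes([IMMEDIATE_8[mnemonic], value])
    if mnemonic in IMMEDIATE_16:
        if not 0 <= value <= 0xFFFF:
            raise ValueError("16-bit Immediate value must be 0 to $FFFF")
        # High byte first, then low byte (Motorola order).
        return bytes([IMMEDIATE_16[mnemonic], value >> 8, value & 0xFF])
    raise KeyError("no Immediate mode for " + mnemonic)
```

With this sketch, LDAA #$41 yields the two bytes $86 $41, and LDX #$1234 yields the three bytes $CE $12 $34.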

Pseudo-Opcodes

Pseudo Opcodes (Pseudo-Ops) are not machine instructions. They are directives to the assembler. These directives require various numbers and types of arguments. They will be listed individually below. There may be more Pseudo-Ops included later, but right now I don't find them necessary. In fact, some of the Pseudo-Ops you see now may be removed as their use is evaluated.


NAM || NAME || TITL || TITLE - This Pseudo-Op sets the name for the listing. The argument field is required and must be a string constant, though the null string ("") is legal. This title is printed after every page ejection in the listing. Therefore, if page ejections have not been forced by the PAGE Pseudo-Op, the title will never be printed. The following statement would print the title "Random Bug Generator -- Ver 3.14159" at the top of every page of the listing:


EQU - This Pseudo-Op is used to assign a specific value to a Symbol, thus the Symbol on this line is REQUIRED. Once the value is assigned, it cannot be reassigned by respecifying the Symbol with another EQU statement. In the example on the right, lines 0001, 0002, and 0003 show the Symbols (DECV, HEXV, and BINV) being assigned decimal, hex, and binary values, respectively. Line 0005 shows an attempt to reassign the Symbol DECV. Line 0007 shows an attempt to use the EQU Pseudo-Op without a Symbol being specified.

The expression in the Operand field must not contain forward references. However, it may contain a reference to a previously defined Symbol, the current address "*", or an equation that contains a previously defined Symbol. The equation must be simple, like PIA1+1 or PIA1-1.

Note: This Pseudo-Op is completely resolved in Pass 1, which is when the Symbols need to be defined. Pass 2 only needs to format the line in the output file.
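The EQU rules above (Symbol required, no reassignment, no forward references, simple equations only) can be sketched in Python. This is an assumption-laden illustration rather than the assembler's own Pass 1 code: define_equ, evaluate, term_value, and SymbolError are hypothetical names, and the "$", "@", and "%" prefixes are assumed to mark hex, octal, and binary as they do for FCB.

```python
import re


class SymbolError(Exception):
    """Raised for a missing, duplicate, or undefined Symbol."""


def term_value(term, symbols, pc):
    """Resolve one term: '*', a number, or a previously defined Symbol."""
    if term == "*":
        return pc                       # current address
    if term[0] == "$":
        return int(term[1:], 16)
    if term[0] == "@":
        return int(term[1:], 8)
    if term[0] == "%":
        return int(term[1:], 2)
    if term.isdigit():
        return int(term)
    if term not in symbols:
        raise SymbolError("undefined Symbol (forward reference?): " + term)
    return symbols[term]


def evaluate(expr, symbols, pc):
    """Evaluate TERM, TERM+TERM, or TERM-TERM."""
    match = re.fullmatch(r"\s*([^\s+-]+)\s*(?:([+-])\s*([^\s+-]+))?\s*", expr)
    if not match:
        raise SymbolError("expression too complex: " + expr)
    left, op, right = match.groups()
    value = term_value(left, symbols, pc)
    if op:
        delta = term_value(right, symbols, pc)
        value = value + delta if op == "+" else value - delta
    return value


def define_equ(label, expr, symbols, pc):
    """Add label to the symbol table, refusing blanks and redefinitions."""
    if not label:
        raise SymbolError("EQU requires a Symbol")
    if label in symbols:
        raise SymbolError("Symbol already defined: " + label)
    symbols[label] = evaluate(expr, symbols, pc)
```

Because every term must already be in the table when the line is read, the whole directive can be resolved in a single pass, which matches the note above.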


SPC N - This Pseudo-Op provides N blank lines for formatting the program listing. The Operand field would normally contain the actual number (decimal, hex, octal or binary) equal to the number of lines to be left blank. If no number appears in the Operand field, one blank line is assumed. A symbol or an expression is also allowed.

Normally, the SPC Pseudo-Op does not appear in the listing. In its place will be N blank lines, as specified on the command line. It is, however, assigned a line number. So, you may notice that the line numbers will skip a number.

It is important to note that SPC is an Option as well as a Pseudo-Op. If OPT NSP is specified, all further SPC Pseudo-Ops will be ignored and simply listed in the ".lst" and ".rtf" files. If OPT SPC is subsequently specified, the SPC Pseudo-Op will resume being recognized. See OPTION SPC/NSP below.


PAGE - This Pseudo-Op allows you to insert a "new page" entry into the RTF file. The default page size is 68 lines of code, plus two lines of header. However, with this Pseudo-Op the developer can cause the RTF file to skip to the top of the next page.

The PAGE Pseudo-Op will not be printed in the ".lst" file or the ".rtf" file. However, it will be noted in the ".dbg" file and the ".cln" file.

See the PGSIZE, PGMARG, and PGLINES Options below to change the default page size.


ORG - This Pseudo-Op is used to set the assembly program counter to a particular value. The expression that defines this value may not contain forward references. The default initial value of the assembly program counter is $0000. In the example on the right, lines 0001, 0002, and 0003 set the program address counter to 0 decimal, 100 decimal ($64), and $100, respectively.

If a Symbol is present on the same line as an ORG statement, it is assigned the new value of the assembly program counter. Lines 0001, 0002, and 0003 demonstrate this.


RMB - The Reserve Memory Byte(s) Pseudo-Op is used to reserve a block of storage for program variables or other data. This storage is not initialized in any way, so its value at run time will usually be random. The argument expression (which may not contain forward references) is added to the assembly program counter. The example would reserve 10 bytes of storage called "STORAGE".


FCB - The Form Constant Byte(s) Pseudo-Op allows arbitrary bytes to be spliced into the object code. Its argument is a chain of zero or more expressions, separated by commas, that each evaluate to -128 through 127 (decimal). The expressions can be in:

  • ASCII ('A,'B,'C,..)
  • Decimal (41,42,43,..)
  • Hex ($41,$42,$43,..)
  • Octal (@41,@42,@43,..)
  • Binary (%10101010,%10101011,%10101100,..)

If a comma occurs with no preceding expression, a $00 byte is spliced into the object code. In the example, the sequence of bytes $FE, $FF, $00, $01, $02 could be spliced into the code.
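The FCB operand formats listed above can be illustrated with a small Python parser. It is a sketch only: fcb_bytes and parse_item are hypothetical names, Symbols are left out, and an ASCII comma character ("',") is not handled because the operand is split on commas.

```python
def parse_item(item):
    """Convert one FCB expression to an integer."""
    if item == "":
        return 0                        # empty slot between commas -> $00
    if item[0] == "'":
        if len(item) != 2:
            raise ValueError("ASCII constant needs exactly one character")
        return ord(item[1])
    if item[0] == "$":
        return int(item[1:], 16)
    if item[0] == "@":
        return int(item[1:], 8)
    if item[0] == "%":
        return int(item[1:], 2)
    return int(item)                    # decimal, may be negative


def fcb_bytes(operand):
    """Return the object bytes generated by an FCB operand string."""
    if operand.strip() == "":
        return b""
    out = bytearray()
    for item in operand.split(","):
        value = parse_item(item.strip())
        if not -128 <= value <= 127:
            raise ValueError("FCB value out of range: " + item)
        out.append(value & 0xFF)        # two's complement for negatives
    return bytes(out)
```

Under these assumptions, the operand -2,-1,,1,2 produces the bytes $FE, $FF, $00, $01, $02.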

The FCB Pseudo-Op uses the current Program Counter (PC) address to add the bytes to the code. A Symbol ($1) can be used at the beginning of the reference, but it is not required. The Symbol will be assigned the current PC where the first byte is located. The example shows MTAPE1 being assigned to the start of the string "CRLF0000S1". The $04 is a terminator.

Multiple bytes, separated by commas, can be used. A side effect is that multiple bytes will generate multiple lines in the assembly listing. This is controlled by the SHORT Option. The default is for the assembler to generate multiple lines. Specifying OPT SHORT will suppress all but the first line in a multi-line listing.


FCC - The Form Constant Character(s) Pseudo-Op translates strings of characters into their 7-bit ASCII codes. Any of the characters which correspond to ASCII hexadecimal codes $20 (Space) through $5F (_) can be processed by this directive.

The example shows two ways of using FCC. The first method encloses a text string between two delimiters. In the example, the delimiter is a "Double Quote". Another common delimiter might be "/". The delimiter can be any non-numeric character as long as it is not also included in the string. The ASCII string can include spaces but may not include a carriage return. For a carriage return and line feed, use the FCB Pseudo-Op.

The second example shows the Count, Comma, Text method, where Count specifies how many ASCII characters to generate and the Text begins immediately after the first Comma of the Operand. Should the Count be longer than the Text, spaces will be inserted to fill the count. The maximum count is 255.

Note: If you require Null characters ($00) in the delimited text, a lower case "z" can be used. The example shows a delimited string with a "z" between each word. This will include a Null ($00) character between each word.
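Both FCC forms can be sketched in Python as shown here. The function name fcc_bytes is hypothetical, and two behaviors are assumptions of this sketch rather than documented facts: text longer than the Count is truncated, and characters outside $20 to $5F (other than the "z" marker) are rejected.

```python
def fcc_bytes(operand):
    """Return the ASCII bytes generated by an FCC operand string."""
    if not operand:
        raise ValueError("FCC requires an operand")
    if operand[0].isdigit():
        # Count, Comma, Text form: pad with spaces up to Count.
        count_text, _, text = operand.partition(",")
        count = int(count_text)
        if not 0 < count <= 255:
            raise ValueError("Count must be 1 to 255")
        chars = text[:count].ljust(count)
        delimited = False
    else:
        # Delimited form: the first character is the delimiter.
        end = operand.find(operand[0], 1)
        if end < 0:
            raise ValueError("missing closing delimiter")
        chars = operand[1:end]
        delimited = True

    out = bytearray()
    for ch in chars:
        if delimited and ch == "z":
            out.append(0x00)            # lower case z stands for a Null
        elif 0x20 <= ord(ch) <= 0x5F:
            out.append(ord(ch))
        else:
            raise ValueError("character outside $20-$5F: " + repr(ch))
    return bytes(out)
```

For instance, the operand 5,AB yields "AB" followed by three spaces, and "HIzTHERE" yields the two words separated by a $00 byte.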


FDB - The Form Double Byte(s) Pseudo-Op allows 16-bit words to be spliced into the object code. Its argument is a chain of zero or more expressions separated by commas. If a comma occurs with no preceding expression, a word of $0000 is spliced into the code. The word is placed into memory high byte (MSB) in low address, low byte (LSB) in high address as per standard Motorola order. The sequence of bytes $FE $FF $00 $00 $01 $02 could be spliced into the code with the example.

Note: Acceptable entries are Symbol, Decimal, Hex, Octal, or Binary only. FDB is not for generating bytes from ASCII strings. Use FCB or FCC for generating ASCII bytes or strings.
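The high-byte-first ordering used by FDB can be shown with a short Python sketch. The name fdb_bytes and the optional symbols dictionary are hypothetical, and expressions are limited to single terms for brevity.

```python
def fdb_bytes(operand, symbols=None):
    """Return the object bytes generated by an FDB operand string."""
    symbols = symbols or {}
    out = bytearray()
    for item in operand.split(","):
        item = item.strip()
        if item == "":
            value = 0                   # empty slot -> word of $0000
        elif item[0] == "$":
            value = int(item[1:], 16)
        elif item[0] == "@":
            value = int(item[1:], 8)
        elif item[0] == "%":
            value = int(item[1:], 2)
        elif item.isdigit():
            value = int(item)
        else:
            value = symbols[item]       # KeyError if the Symbol is unknown
        if not 0 <= value <= 0xFFFF:
            raise ValueError("FDB value out of range: " + item)
        out.append(value >> 8)          # MSB goes in the low address
        out.append(value & 0xFF)        # LSB goes in the high address
    return bytes(out)
```

Under these assumptions, the operand $FEFF,,$0102 produces the bytes $FE $FF $00 $00 $01 $02.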


END - This Pseudo-Op tells the assembler that the source program is over. Any further lines of the source file are ignored and not passed on to the listing. If an argument is added to the END statement, the value of the argument will be placed in the execution address slot in the Intel hex object file. The execution address defaults to the program counter value at the point where the END was encountered. Thus, to specify that the program starts at label START, the END statement would be END START.

If end-of-file is encountered on the source file before an END statement is reached, the assembler will add an END statement to the listing and flag it with a * (missing statement) error.

If the END Pseudo-Op is encountered while the S19_Start address is set but the S19_End address is not, the address at the END Pseudo-Op is used as the S19_End address:
if ( S19_Start_Set && !S19_End_Set ) { END_Address = nextmem; }


OPT [Directive] - The "OPTION" Pseudo-Op is intended to be used to override an assembler default. The defaults for the assembler are:

  • Use Page 0 addressing with OpCodes that can optionally use Direct and Extended addressing.
  • SPC directive is recognized.
  • Allow the FCB, FDB and FCC directives to generate multiple object lines.
  • The default ".rtf" paper size is Letter (8-1/2 x 11), Orientation (Portrait), and Margins (Normal) ( 1.0" Top, 1.0" Right, 1.0" Bottom, 1.0" Left ). Each page is set to contain 68 lines of assembled code.
  • Line numbering starts at "1" and is incremented by "1" for each line of source code.
  • The ".s19" and ".hex" files are not generated.

This gives the programmer optional control of the format of assembler output. The "OPT" directive is not translated into machine code. No label should be used with the "OPT" directive. However, if one is given, it is ignored. The options are written in the Operand field. Multiple options can be specified on the same line, separated by commas.

These Options are unidirectional, in that, once set, the Option is in force for the rest of the document. For example, when Option NSP is defined in the source code, all subsequent SPC directives are ignored, from the point in the source code that the Option is specified.

The option directives are:

  • RELOC

    If RELOC is not specified, the default is to use Direct addressing when possible.

    This option directs the assembler to use Extended addressing when referencing Page 0 ($0000 to $00FF) addresses, even if the OpCode is capable of Direct addressing. This allows the data and program to be assembled as relocatable. OpCodes that are forced to use Extended addressing, due to RELOC being specified, are marked with an "R". This option should be specified at the beginning of the source code.

    Note: This is only applicable when an OpCode is capable of using both Direct and Extended mode addressing. If the OpCode is not capable of Direct mode, the assembled code is not marked with an "R".


  • SHORT

    Assembler directives (Pseudo-Ops) such as FCB, FDB or FCC can produce multiple object lines. If SHORT is not specified, the default is to list all FCB, FDB or FCC objects.

    If SHORT is specified, the assembler will list only the first object line of the directive and not the others. Memory space for the unlisted objects is still reserved.


  • SPC/NSP

    By default, the SPC Pseudo-Op is enabled and blank lines will be added as specified.

    These Options allow the user to disable and enable the SPC Pseudo-Op. If the NSP Option is specified, the SPC directive will be ignored. The SPC directive may be disabled and enabled as often as desired in the program. See the description of the SPC directive above.


  • PGSIZE, PGMARG, and PGLINES

    The default settings for the ".rtf" file include a header and footer. The header contains the name of the assembler, the name of the file being assembled, and the current date and time. For example:

    MC6800 CROSS ASSEMBLER, All_OpCode_Test.src Wed, Feb 6, 2019 13:44:47

    The footer contains the name of the source code, that is contained in the "NAME" Pseudo-Op, and the current page number. For example:

    "MC6800 CROSS-ASSEMBLER ALL OP-CODE TEST FILE" - Page 1

    The default ".rtf" paper size is Letter (8-1/2 x 11), Orientation (Portrait), and Margins (Normal) ( 1.0" Top, 1.0" Right, 1.0" Bottom, 1.0" Left ). Each page is set to contain 68 lines of assembled code.

    The OPTions listed below can be set anywhere in your source code because the OPTions are recognized during Pass 1 but not output to the ".rtf" file until Pass 2. However, if you set these options multiple times, only the last setting will be in effect.

    OPT PGSIZE=?                        OPT PGMARG=?
    --------------------------------    ------------------------------------------
    Letter     8.5" W x 11" H           Normal      1" L, 1" R, 1" B, 1" T
    Tabloid    11" W x 17" H            Narrow      0.5" L, 0.5" R, 0.5" B, 0.5" T
    Ledger     17" W x 11" H            Moderate    0.75" L, 0.75" R, 1" B, 1" T
    Legal      8.5" W x 14" H           Wide        2" L, 2" R, 1" B, 1" T
    A4         8.27" W x 11.69" H

    OPT PGLINES=?
    where ? can be just about any number you want. The actual PGLINES value is a function of the PGSIZE (Height), PGMARG (Top and Bottom), and the document font (Courier New, 8pt font). With the default font size, the PGLINES can be calculated by the following formula (in Inches):

    PGLINES = (Page Height - Top Margin - Bottom Margin) * 7.555

    For example, the following is a calculation for Legal paper, Normal margins and the default font:

    PGLINES = (14" - 1" - 1") * 7.555 LPP = 90.66, which rounds down to 90

    The calculated value is rounded down to the nearest integer. If you use a different font, the LPP value will need to be recalculated.

    Note: If you specify a value for PGLINES that is too big for the available space, the default paging mechanism in MS Word, may split the page early. This will cause several pages to only have a couple of lines of data.

    Further, if you have lines that are very long, a line may be split into more than one line. These will be lines that the assembler cannot account for, and the RTF page may advance early, again leaving you with pages that only have a few lines of data.
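The PGLINES calculation described above can be checked with a few lines of Python. The function name pglines is hypothetical, and the 7.555 constant is the value quoted for the default 8pt Courier New font.

```python
import math

LINES_PER_INCH = 7.555   # value quoted for the default font


def pglines(page_height, top_margin, bottom_margin):
    """Lines per page for the given page height and margins (in inches)."""
    printable = page_height - top_margin - bottom_margin
    return math.floor(printable * LINES_PER_INCH)
```

For Legal paper with Normal margins, pglines(14, 1, 1) gives 90, matching the worked example. For Letter paper with Normal margins this sketch gives 67 rather than the stated default of 68, so the assembler's own rounding may differ slightly.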



  • Line Number Control - LINENUM=N,LINEINC=M

    Set the current Line Number to N and Line Number Increment to M, for the assembly listing.

    If you want the listing to start at N and increment by M, from the first line of the listing, include both options on one line as the first line of source code. If you list them separately, an odd incrementing may result.


  • S-Record Control - S19_START=K | S19_END=L | S19_JUMP=M

    The options listed above are for managing the output of S‑Records. With S‑Records, you can create a complete, loadable, copy of your program(s) or program fragment. The assembler does not automatically generate S‑Records unless these options are used.

    The S‑Record format, detailed later, is such that it can be readily loaded into memory. While there are 9 types of S‑Records, only 4 are used by the assembler (S0, S1, S5, and S9).

    By default, the assembler does not create an S-Record file. To have one created, the user must add, as a bare minimum, two Options that define the start address (S19_START) and ending address (S19_END) for the S-Record, as in the example (lines 1, 2, and 3). Those lines will create an S-Record file that starts at $0100 and ends at $0200 (256 bytes). Each of the S1 records will hold 32 bytes of data (the default). The file will contain eight S1 records and will be terminated by a single S9 record. The S9 is mostly just to terminate the load. As shown on lines 4 and 5, the start and end addresses can also be specified on a single line.

    S19_Start and S19_End are used to define the start and end addresses for the S-Records. The S19_End address should be the last address, plus 1. If you use the END Pseudo-Op to terminate your program, the assembler will list the address required by S19_End.

    S19_Jump is used when you want the S-Record file to contain an S9 record. An S9 record contains an address that the S-Record loading program will use as a transfer address when the reading of the S-Record file is completed.

    To generate an S-Record file, the assembler needs to know the start address (K) and the end address (L). If S19_Start and S19_End are not set, an S-Record file will not be created.

    NOTE: The S19_End address should be the last address, plus 1.

    The default record length will be $20 (32 decimal) bytes per record. This is not subject to change.

    The last S‑Record, S9, can include a Jump Address. This can be set by defining the option S19_Jump. If it is not defined, the assembler uses address $0000. The Jump Address is the Starting Execution address. Assuming you are using MIKBUG to read the S‑Records, upon completion of the read operation, the computer will transfer to this address and run the program.

    In the example above, an S-Record file starting at $0100 (256 decimal) and ending at $0200 (512 decimal) will be created. The Starting Execution address is set to $0000.
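How S19_START, S19_END, and S19_JUMP translate into records can be sketched in Python. This is an illustration under stated assumptions, not the assembler's GAWK code: srec_line and make_s19 are hypothetical names, memory is modeled as a dictionary of address to byte, unassembled addresses are filled with $00, and the S0 header carries the text "HDR".

```python
def srec_line(rtype, address, data=b""):
    """Format one S-Record with a 2-byte address field."""
    count = 2 + len(data) + 1           # address + data + checksum bytes
    body = bytes([count, address >> 8, address & 0xFF]) + bytes(data)
    checksum = ~sum(body) & 0xFF        # one's complement of the sum
    return "S%d%s%02X" % (rtype, body.hex().upper(), checksum)


def make_s19(memory, start, end, jump=0x0000, record_len=0x20):
    """Return S0, S1..., S5, S9 lines for addresses start to end - 1."""
    lines = [srec_line(0, 0x0000, b"HDR")]
    s1_count = 0
    for addr in range(start, end, record_len):
        stop = min(addr + record_len, end)
        chunk = bytes(memory.get(a, 0x00) for a in range(addr, stop))
        lines.append(srec_line(1, addr, chunk))
        s1_count += 1
    lines.append(srec_line(5, s1_count))    # count of S1 records
    lines.append(srec_line(9, jump))        # transfer (jump) address
    return lines
```

Calling make_s19(memory, 0x0100, 0x0200) produces eight S1 records of 32 data bytes each, because the end address is the last address plus 1.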


  • IHEX_START=K,IHEX_END=L

    Options for the Intel Hex file can all be specified on the same line, or they can be specified on separate lines. The ordering of the options, or their location in the source code, is not important. This is because the Intel Hex file is created from the bin_dat array after assembly is completed. However, it's usually best to set all of the options at the beginning of the source code.


    Intel Hex file settings.

    To generate an Intel Hex file, the assembler needs to know the address to start (K), the address to end (L), and the number of bytes per record (M). If IHEX_START and IHEX_END are not set, an Intel Hex file will not be created.

    If the options IHEX_START and IHEX_END are set, the default record length (IHEX_Bytes) will be $20 (32 decimal) bytes per record. In the example, an Intel Hex file starting at $0100 (256 decimal) and ending at $0200 (512 decimal) will be created. Each record will be $20 (32 decimal) bytes in length.
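The following Python sketch shows one way the IHEX_START and IHEX_END range could be turned into records. The names ihex_record and make_ihex are hypothetical, memory is modeled as a dictionary standing in for the bin_dat array, unassembled addresses are filled with $00, and the end address is assumed to be the last address plus 1, as it is for S19_END.

```python
def ihex_record(address, rtype, data=b""):
    """Format one Intel Hex record with its two's complement checksum."""
    body = bytes([len(data), address >> 8, address & 0xFF, rtype]) + bytes(data)
    checksum = -sum(body) & 0xFF
    return ":" + body.hex().upper() + "%02X" % checksum


def make_ihex(memory, start, end, record_len=0x20):
    """Return type 00 data records for start to end - 1, then a type 01 record."""
    lines = []
    for addr in range(start, end, record_len):
        stop = min(addr + record_len, end)
        chunk = bytes(memory.get(a, 0x00) for a in range(addr, stop))
        lines.append(ihex_record(addr, 0x00, chunk))
    lines.append(ihex_record(0x0000, 0x01))     # End of Data
    return lines
```

With a range of $0100 to $0200 this yields eight data records of $20 bytes each, followed by the :00000001FF End of Data record.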

S-Records

The assembler uses S-Records as a method of generating a loadable program file. If you look back on the description of the Print/Punch Memory Dump paper tape format, for the MEK6800 D1, you will see it is exactly the same.

An S-Record file consists of a sequence of specially formatted ASCII character strings. An S-Record will be less than or equal to 78 characters in length: Type (2) + Count (2) + Address (8, maximum) + Data (64) + Checksum (2) = 78 characters. In the assembler, only 4 characters of Address are used and the Data is limited to 64 characters (32 bytes), so the maximum length of an S-Record is 74 characters: Type (2) + Count (2) + Address (4) + Data (64) + Checksum (2) = 74 characters.

The general format of an S-Record follows:

  • Type - A 2-character field. These characters describe the type of record (S0, S1, S2, S3, S5, S7, S8, or S9).
  • Count - A 2-character field. These characters when paired and interpreted as a hexadecimal value, display the count of remaining character pairs in the record.
  • Address - A 4, 6, or 8-character field. These characters grouped and interpreted as a hexadecimal value, display the address at which the data field is to be loaded into memory. The length of the field depends on the number of bytes necessary to hold the address. A 2-byte address uses 4 characters, a 3-byte address uses 6 characters, and a 4-byte address uses 8 characters. Note: For the purposes of this assembler, only 2-byte (4-character) addresses will be used.
  • Data - 0 to 64 2-character fields. These characters when paired and interpreted as hexadecimal values represent the memory loadable data or descriptive information.
  • Checksum - A 2-character field. The checksum is an 8-bit field that represents the least significant byte of the one’s complement of the sum of the values represented by the pairs of characters making up the record’s Count, Address, and Data fields.
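The field layout and checksum rule above can be exercised with a small Python parser. The name parse_srecord is hypothetical, and the sketch assumes the 2-byte address form used by this assembler. Spaces are removed first so that the "split apart" form used in the examples can be parsed too.

```python
def parse_srecord(line):
    """Return (type, address, data) for one S-Record, or raise ValueError."""
    text = line.strip().replace(" ", "")
    if len(text) < 10 or text[0] != "S":
        raise ValueError("not an S-Record: " + line)
    rtype = text[:2]
    raw = bytes.fromhex(text[2:])       # Count, Address, Data, Checksum
    count, payload, checksum = raw[0], raw[1:-1], raw[-1]
    if count != len(raw) - 1:
        raise ValueError("Count does not match record length")
    # One's complement of the sum of Count, Address, and Data bytes.
    expected = ~(count + sum(payload)) & 0xFF
    if checksum != expected:
        raise ValueError("bad Checksum: expected %02X" % expected)
    address = int.from_bytes(payload[:2], "big")
    return rtype, address, payload[2:]
```

A record with a single altered digit fails the checksum test and raises ValueError, which is how a loader can detect a damaged file.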

In general, the order of S-Records within a file is of no significance and no particular order may be assumed. However, the assembler generates only four specific types of S-Records, and it generates them in a specific order. The information below details the types. The examples show the ordering.

Each record is terminated with a line feed. If any additional or different record terminator(s) or delay characters are needed during transmission to the target system it is the responsibility of the transmitting program to provide them.


EXAMPLE: Shown on the right is a typical S-record format file.

The file consists of one S0 record, four S1 records, one S5 record and an S9 record.

S0 Record - Split apart, the S0 record looks like this:
S0 06 0000 48 44 52 1B
The S0 indicates it is a Header record. The 06 says that there are 6 character pairs (or ASCII bytes) to follow. The 6 character pairs (6 bytes) are the rest of the record. 0000 (2 bytes) is a dummy address. 48 44 52 (3 bytes) is the ASCII code for "HDR" (abbreviation for Header). And, 1B is the Checksum.

The S0 Record commonly contains information about the program, so that the file can be easily identified. Generally, the header information within the data field is divided into several subfields: the Module Name (20 Bytes), Version Number (2 Bytes), Revision Number (2 Bytes), and Text Description (0-36 Bytes).

S1 Record(s) - There are four S1 records. Split apart, the first S1 record looks like this:
S1 13 0000 28 5F 24 5F 22 12 22 6A 00 04 24 29 00 08 23 7C 2A
The S1 indicates a Data record. The 13 says that there are $13 (19 decimal) character pairs (or ASCII bytes) to follow. The 0000 (2 Bytes) is the address where the 16 bytes of data that follow are to be loaded. The 2A, at the end, is the Checksum.

The next three S-Records are also S1 Records. Split apart, these S1 records look like this:
S1 13 0010 00 02 00 08 00 08 26 29 00 18 53 81 23 41 00 18 13
S1 13 0020 41 E9 00 08 4E 42 23 43 00 18 23 42 00 08 24 A9 52
S1 07 0030 00 14 4E D4 92
The first S1 Record above has an address of 0010 and a Count of $13 (19 decimal) character pairs, of which 16 are data. The second S1 Record above has an address of 0020 and also a Count of $13 (19 decimal) character pairs. The third S1 Record above has an address of 0030 but a Count of only $07 (7 decimal) character pairs, of which 4 are data. The character pair at the end of each S-Record is the Checksum.

S5 Record - Following the S1 Records is a S5 Record. Split apart, the S5 record looks like this:
S5 03 0004 F8
The S5 indicates a Count record. The 03 says that there are $03 (3 decimal) character pairs (or ASCII bytes) to follow. The 0004 (2 Bytes) says that there were four data records previous to this record. The F8, at the end, is the Checksum.

S9 Record - The last record is a Termination record. Split apart, the S9 Record looks like this:
S9 03 0000 FC
The S9 indicates a Termination record. The 03 says that there are $03 (3 decimal) character pairs (or ASCII bytes) to follow. The 0000 (2 Bytes) is the starting execution address. Upon completion of the read operation, the computer will transfer to this address and run the program. The FC, at the end, is the Checksum.

Intel Hex ASCII Format

Intel HEX consists of lines of ASCII text that are separated by line feed or carriage return characters or both. Each text line contains hexadecimal characters that encode multiple binary numbers. The binary numbers may represent data, memory addresses, or other values, depending on their position in the line and the type and length of the line. Each text line is called a record.

Record structure

A record (line of text) consists of six fields (parts) that appear in order from left to right:

  1. Start Character - One character, an ASCII colon ':'.
  2. Hex Byte Count - Two hex digits, indicating the number of bytes (hex digit pairs) in the data field. The maximum byte count is 255 (0xFF). 16 (0x10) and 32 (0x20) are commonly used byte counts.
  3. Address of First Data - Four hex digits, representing the 16-bit beginning memory address offset of the data. The physical address of the data is computed by adding this offset to a previously established base address, thus allowing memory addressing beyond the 64 kilobyte limit of 16-bit addresses. The base address, which defaults to zero, can be changed by various types of records. Base addresses and address offsets are always expressed as big endian values.
  4. Record Type - Two hex digits, 00 to 05, defining the meaning of the data field (see record types below).
  5. Data - A sequence of n bytes of data, represented by 2n hex digits. Some records (See Record Type 01 in example) omit this field (n equals zero). The meaning and interpretation of data bytes depends on the application.
  6. Checksum - Two hex digits, a computed value that can be used to verify the record has no errors.

Checksum Calculation

A record's checksum byte is the two's complement (negative) of the least significant byte (LSB) of the sum of all decoded byte values in the record preceding the checksum. It is computed by summing the decoded byte values and extracting the LSB of the sum (i.e., the data checksum), and then calculating the two's complement of the LSB (e.g., by inverting its bits and adding one).

For example, in the case of the record :0300300002337A1E, the sum of the decoded byte values is 03 + 00 + 30 + 00 + 02 + 33 + 7A = E2. The two's complement of E2 is 1E, which is the checksum byte appearing at the end of the record.

The validity of a record can be checked by computing its checksum and verifying that the computed checksum equals the checksum appearing in the record; an error is indicated if the checksums differ. Since the record's checksum byte is the negative of the data checksum, this process can be reduced to summing all decoded byte values — including the record's checksum — and verifying that the LSB of the sum is zero.
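The checksum calculation and the shortcut validity test can be written in a few lines of Python. The names ihex_checksum and ihex_is_valid are hypothetical. Spaces are stripped so that the spaced-out example lines in this manual can be checked as well.

```python
def ihex_checksum(record_bytes):
    """Two's complement of the LSB of the sum of the decoded bytes."""
    return -sum(record_bytes) & 0xFF


def ihex_is_valid(line):
    """True if the record's bytes, checksum included, sum to zero (mod 256)."""
    text = line.strip().replace(" ", "")
    if not text.startswith(":") or len(text) < 11:
        return False
    raw = bytes.fromhex(text[1:])
    if raw[0] != len(raw) - 5:          # count, 2 address, type, checksum
        return False
    return (sum(raw) & 0xFF) == 0
```

For the record used above, ihex_checksum(bytes.fromhex("0300300002337A")) returns $1E, and ihex_is_valid(":0300300002337A1E") is true.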

Text Line Terminators

Intel HEX records are separated by one or more ASCII line termination characters so that each record appears alone on a text line. This enhances legibility by visually delimiting the records and it also provides padding between records that can be used to improve machine parsing efficiency.

Programs that create HEX records typically use line termination characters that conform to the conventions of their operating systems. For example, Linux programs use a single LF (line feed, hex value 0A) character to terminate lines, whereas Windows programs use a CR (carriage return, hex value 0D) followed by a LF.

Record Types

While the Intel Hex ASCII Format Example, above, only shows two record types, Intel HEX has six standard record types. From the previous diagram, you can see that the Record Type follows the Start Character (:), Byte Count, and the Address. The Byte Count specifies the number of Data Bytes in the record only.

Note that, unlike an S-Record, the Intel HEX Record Type is not the first ASCII pair on the line. It follows the Byte Count and Address fields. Also note that the assembler described in this manual only deals with Record Types 00 and 01. The other Record Types are described for clarity.

  • 00 - Data - Contains data and the 16-bit starting address for the Data. For example, the line:
    :10 0000 00 DB 00 E6 0F 5F 16 00 21 11 00 19 7E D3 00 C3 00 4C
    shows that a Byte Count of $10 (16 decimal) is expected and the Data will start at Address 0000. The Record Type, 00, is then followed by $10 (16 decimal) bytes of Data. The last byte on the line is the Checksum.
  • 01 - End of Data - For example, the line:
    :00 0000 01 FF
    This Record Type must occur exactly once per file, in the last line of the file. The Data field is empty (thus Byte Count is 00) and the Address field is typically 0000.
  • 02 - Extended Segment Address - For example, the line:
    :02 0000 02 12 00 EA
    The data field contains a 16-bit segment base address (thus byte count is always 02) compatible with 80x86 real mode addressing. The address field (typically 0000) is ignored. The segment address from the most recent 02 record is multiplied by 16 and added to each subsequent data record address to form the physical starting address for the data. This allows addressing up to one megabyte of address space.
  • 03 - Start Segment Address - For example, the line:
    :04 0000 03 00 00 38 00 C1
    For 80x86 processors, specifies the initial content of the CS:IP registers. The address field is 0000, the byte count is always 04, the first two data bytes are the CS value, the latter two are the IP value.
  • 04 - Extended Linear Address - For example, the line:
    :02 0000 04 FF FF FC
    Allows for 32 bit addressing (up to 4GiB). The record's address field is ignored (typically 0000) and its byte count is always 02. The two data bytes (big endian) specify the upper 16 bits of the 32 bit absolute address for all subsequent type 00 records; these upper address bits apply until the next 04 record. The absolute address for a type 00 record is formed by combining the upper 16 address bits of the most recent 04 record with the low 16 address bits of the 00 record. If a type 00 record is not preceded by any type 04 records then its upper 16 address bits default to 0000.
  • 05 - Start Linear Address - For example, the line:
    :04 0000 05 00 00 00 CD 2A
    The address field is 0000 (not used) and the byte count is always 04. The four data bytes represent a 32-bit address value. In the case of 80386 and higher CPUs, this address is loaded into the EIP register.
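How Record Types 02 and 04 change the address of later data records can be sketched in Python. The function name physical_addresses and the (type, address, data) tuple layout are hypothetical, and the sketch ignores the 80x86 segment wrap-around details. The assembler described here only emits types 00 and 01, so this is for reading files from other tools.

```python
def physical_addresses(records):
    """Yield (physical address, data) for each type 00 record.

    records is an iterable of (record type, 16-bit address, data bytes).
    """
    base = 0                                    # default base address
    for rtype, address, data in records:
        if rtype == 0x02:
            # Extended Segment Address: segment value times 16.
            base = int.from_bytes(data, "big") * 16
        elif rtype == 0x04:
            # Extended Linear Address: upper 16 bits of a 32-bit address.
            base = int.from_bytes(data, "big") << 16
        elif rtype == 0x00:
            yield base + address, data
        elif rtype == 0x01:
            break                               # End of Data
        # Types 03 and 05 carry start addresses and do not move the base.
```

For example, after the 02 record shown above (segment $1200), a data record at address $0010 loads at physical address $12010.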

Named formats

Special names are sometimes used to denote the formats of HEX files that employ specific subsets of record types. For example:

  • I8HEX files use only record types 00 and 01 (16 bit addresses)
  • I16HEX files use only record types 00 through 03 (20 bit addresses)
  • I32HEX files use only record types 00, 01, 04, and 05 (32 bit addresses)

This information comes from "Microprocessors and Programmed Logic", Second Edition, Kenneth L. Short, 1987, Prentice-Hall, ISBN 0-13-580606-2.