Assembly language programming under Unix is poorly documented. It is generally assumed that no one would ever want to use it because various Unix systems run on different microprocessors, so everything should be written in C for portability.
In reality, C portability is quite a myth. Even C programs need to be modified when ported from one Unix to another, regardless of what processor each runs on. Typically, such a program is full of conditional statements depending on the system it is compiled for.
Even if we believe that all Unix software should be written in C, or some other high-level language, we still need assembly language programmers: Who else would write the section of the C library that accesses the kernel?
In this tutorial I will attempt to show you how you can use assembly language to write Unix programs, specifically under FreeBSD.
This tutorial does not explain the basics of assembly language. There are enough resources about that (for a complete online course in assembly language, see Randall Hyde’s Art of Assembly Language; or if you prefer a printed book, take a look at Jeff Duntemann’s Assembly Language Step-by-Step). However, after finishing this tutorial, any assembly language programmer will be able to write programs for FreeBSD quickly and efficiently.
Chapter 2 – System Calls
2.1. Default Calling Convention
An assembly language program can do that as well. For example, we could open a file:
kernel:
open:
The 5 that we have placed in EAX identifies the kernel function, in this case open.
2.2. Alternate Calling Convention
open:
% brandelf -t Linux filename
2.4. Call Numbers
2.4.1. The syscalls File
N.B.: Not only do FreeBSD and Linux use different calling conventions, they sometimes use different numbers for the same functions.
syscalls.master describes how the call is to be made:
It is the leftmost column that tells us the number to place in EAX.
The rightmost column tells us what parameters to push. They are pushed from right to left.
EXAMPLE 2.1: To open a file, we need to push the mode first, then flags, then the address at which the path is stored.
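To make the push order concrete, here is a toy C model of an i386 stack. It is a sketch for illustration only: the `toy_stack` structure and the function names are invented, and it leaves out the return address that `call kernel` would push. Because the stack grows downward, pushing mode, flags, and path in that order leaves path on top. That is exactly where a C function expects its first argument.

```c
#include <stddef.h>
#include <stdint.h>

/* A toy i386-style stack: it grows downward, so each push first moves
   the stack pointer down and then stores the value. */
struct toy_stack {
    uint32_t mem[16];
    size_t sp;              /* index of the current top of stack */
};

static void toy_init(struct toy_stack *s) {
    s->sp = sizeof s->mem / sizeof s->mem[0];   /* empty: one past the end */
}

static void toy_push(struct toy_stack *s, uint32_t value) {
    s->mem[--s->sp] = value;
}

/* Mirrors the open sequence: push the C arguments right to left
   (mode, flags, path) and return the call number that goes in EAX. */
static uint32_t toy_prepare_open(struct toy_stack *s,
                                 uint32_t path, uint32_t flags, uint32_t mode) {
    toy_push(s, mode);      /* rightmost argument first */
    toy_push(s, flags);
    toy_push(s, path);      /* leftmost argument last, so it is on top */
    return 5;               /* open, per the leftmost column of syscalls.master */
}
```

After `toy_prepare_open`, `mem[sp]` holds the path, `mem[sp + 1]` the flags, and `mem[sp + 2]` the mode. This is the same order the C prototype lists them in.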
Chapter 3 – Return Values
3.1. Man Pages
N.B.: The information presented in the man pages applies to C programs. The assembly language programmer needs additional information.
3.2. Where Are the Return Values?
N.B.: I am aware of one system call that returns the value in EDX: SYS_fork. All others I have worked with use EAX. But I have not worked with them all yet.
TIP: If you cannot find the answer here or anywhere else, study libc source code and see how it interfaces with the kernel.
3.3. Where Is errno?
errno is part of the C language, not the Unix kernel. When accessing kernel services directly, the error code is returned in EAX, the same register the proper return value generally ends up in.
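The following C sketch shows the translation a C library wrapper has to make. It assumes the FreeBSD/i386 behavior, where the kernel sets the carry flag when a call fails and leaves the error code in EAX. The `raw_result` structure is a hypothetical stand-in for that register state; a real libc does this inside its system call stubs, right after int 80h.

```c
#include <errno.h>

/* Hypothetical snapshot of what the kernel hands back: the value in
   EAX, plus whether the carry flag was set. */
struct raw_result {
    long eax;
    int carry;              /* nonzero means the call failed */
};

/* On failure, move the error code from EAX into errno and return -1,
   as C programs expect; on success, return EAX unchanged. */
static long to_c_convention(struct raw_result r) {
    if (r.carry) {
        errno = (int)r.eax;
        return -1;
    }
    return r.eax;
}
```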
Chapter 4 – Creating Portable Code
4.1. Dealing with Function Numbers
%ifdef LINUX
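For comparison, C makes the same compile-time choice with the preprocessor. This is a sketch. The numbers are the i386 values for dup2 and getppid as I understand FreeBSD's syscalls.master and Linux's 32-bit system call table, so verify them against your own system before relying on them. Calls whose numbers agree on both systems need no conditional at all.

```c
/* Call numbers that differ between the two systems (i386 values). */
#ifdef LINUX
#define SYS_dup2    63
#define SYS_getppid 64
#else                       /* FreeBSD */
#define SYS_dup2    90
#define SYS_getppid 39
#endif

/* Call numbers that happen to be the same on both systems. */
#define SYS_exit  1
#define SYS_read  3
#define SYS_write 4
#define SYS_open  5
```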
4.2. Dealing with Conventions
Both the calling convention and the return value (the errno problem) can be resolved with macros:
%ifdef LINUX
4.4. Using a Library
sys.open:
sys.exit:
4.5. Using an Include File
N.B.: This is the approach we will use throughout this tutorial. We will name our include file system.inc, and add to it whenever we deal with a new system call.
We can start our system.inc by declaring the standard file descriptors:
%define stdin 0
Next, we create a symbolic name for each system call:
%define SYS_nosys 0
section .text
We create a macro which takes one argument, the syscall number:
%macro system 1
Finally, we create macros for each syscall. These macros take no arguments.
%macro sys.exit 0
Chapter 5 – Our First Program
We are now ready for our first program, the mandatory Hello, World!
1: %include 'system.inc'
Here is what it does: Line 1 includes the defines, the macros, and the code from system.inc.
Lines 10-13 ask the system to write hbytes bytes of the hello string to stdout.
N.B.: If you have come to Unix from an MS DOS assembly language background, you may be used to writing directly to the video hardware. You will never have to worry about this in FreeBSD, or any other flavor of Unix. As far as you are concerned, you are writing to a file known as stdout. This can be the video screen, a telnet terminal, an actual file, or even the input of another program. Which one it is, is for the system to figure out.
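The heart of the program is a single write to file descriptor 1. Here is a C sketch of the same idea. `say_hello` is a hypothetical helper. It takes the file descriptor as a parameter, so it writes the same way whether that descriptor is the screen, a file, or a pipe, which is the abstraction the note above describes.

```c
#include <string.h>
#include <unistd.h>

static const char hello[] = "Hello, World!\n";

/* Send the whole greeting to fd with one write(2) call; strlen plays
   the role of hbytes. Returns the byte count, or -1 on error. */
static ssize_t say_hello(int fd) {
    return write(fd, hello, strlen(hello));
}
```

Calling `say_hello(STDOUT_FILENO)` prints the greeting and returns 14, the number of bytes written.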
5.1. Assembling the Code
5.1.1. Installing nasm
If you do not have nasm, type:
% su
N.B.: If your system is not FreeBSD, you need to get nasm from its home page. You can still use it to assemble FreeBSD code.
Now you can assemble, link, and run the code:
% nasm -f elf hello.asm
Chapter 6 – Writing Unix Filters
%include 'system.inc'
N.B.: For simplicity's sake, we are ignoring the possibility of an error condition at this time.
% nasm -f elf hex.asm
N.B.: If you are migrating to Unix from MS DOS, you may be wondering why each line ends with 0A instead of 0D 0A. This is because Unix does not use the cr/lf convention, but a “new line” convention, which is 0A in hexadecimal.
%include 'system.inc'
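Here is a C sketch of the output format described here. It assumes that each byte becomes two uppercase hexadecimal digits followed by a space, and that a newline replaces the space after 0A. `hexify` is a hypothetical helper that works on a buffer. `out` must have room for three characters per input byte plus the terminating NUL.

```c
#include <stddef.h>

/* Convert n input bytes to hex text; returns the length written. */
static size_t hexify(const unsigned char *in, size_t n, char *out) {
    static const char digits[] = "0123456789ABCDEF";
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        out[k++] = digits[in[i] >> 4];      /* high nibble */
        out[k++] = digits[in[i] & 0x0F];    /* low nibble */
        out[k++] = (in[i] == 0x0A) ? '\n' : ' ';  /* keep input lines */
    }
    out[k] = '\0';
    return k;
}
```

For the input "Hi" followed by a line feed, the result is the line `48 69 0A`.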
Once you have changed hex.asm to reflect these changes, type:
% nasm -f elf hex.asm
Chapter 7 – Buffered Input and Output
%include 'system.inc'
% nasm -f elf hex.asm
%include 'system.inc'
% nasm -f elf hex.asm
Not bad for a 644-byte executable, is it!
N.B.: This approach to buffered input/output still contains a hidden danger. I will discuss—and fix—it later, when I talk about the dark side of buffering.
7.1. How to Unread a Character
WARNING: This may be a somewhat advanced topic, mostly of interest to programmers familiar with the theory of compilers. If you wish, you may skip to the next chapter, and perhaps read this later.
While our sample program does not require it, more sophisticated filters often need to look ahead. In other words, they may need to see what the next character is (or even several characters). If the next character is of a certain value, it is part of the token currently being processed. Otherwise, it is not.
For example, you may be parsing the input stream for a textual string (e.g., when implementing a language compiler): If a character is followed by another character, or perhaps a digit, it is part of the token you are processing. If it is followed by white space, or some other value, then it is not part of the current token.
This presents an interesting problem: How to return the next character back to the input stream, so it can be read again later?
One possible solution is to store it in a character variable, then set a flag. We can modify getchar to check the flag, and if it is set, fetch the byte from that variable instead of the input buffer, and reset the flag. But, of course, that slows us down.
The C language has an ungetc() function, just for that purpose. Is there a quick way to implement it in our code? I would like you to scroll back up and take a look at the getchar procedure and see if you can find a nice and fast solution before reading the next paragraph. Then come back here and see my own solution.
The key to returning a character back to the stream is in how we are getting the characters to start with:
First we check if the buffer is empty by testing the value of EBX. If it is zero, we call the read procedure.
If we do have a character available, we use lodsb, then decrease the value of EBX. The lodsb instruction is effectively identical to:
mov al, [esi]
inc esi
The byte we have fetched remains in the buffer until the next time read is called. We do not know when that happens, but we do know it will not happen until the next call to getchar. Hence, to “return” the last-read byte back to the stream, all we have to do is decrease the value of ESI and increase the value of EBX:
ungetc:
    dec esi
    inc ebx
    ret
But, be careful! We are perfectly safe doing this if our look-ahead is at most one character at a time. If we are examining more than one upcoming character and call ungetc several times in a row, it will work most of the time, but not all the time (and will be tough to debug). Why?
Because as long as getchar does not have to call read, all of the pre-read bytes are still in the buffer, and our ungetc works without a glitch. But the moment getchar calls read, the contents of the buffer change.
We can always rely on ungetc working properly on the last character we have read with getchar, but not on anything we have read before that.
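The following C model makes both the trick and its limitation visible. It is a sketch under assumptions: `next` plays the role of ESI, `left` plays EBX, and a string stands in for the read system call. All names are made up. Stepping back works because the byte is still in the buffer. It stops working once a refill has overwritten that buffer.

```c
#include <stddef.h>
#include <string.h>

struct inbuf {
    char buf[8];
    char *next;             /* like ESI: the next unread byte */
    size_t left;            /* like EBX: bytes still unread */
    const char *src;        /* stands in for the input file */
};

/* Stand-in for the read procedure: copy up to 8 bytes into buf. */
static void refill(struct inbuf *b) {
    size_t n = strlen(b->src);
    if (n > sizeof b->buf) n = sizeof b->buf;
    memcpy(b->buf, b->src, n);
    b->src += n;
    b->next = b->buf;
    b->left = n;
}

static int my_getchar(struct inbuf *b) {
    if (b->left == 0) {
        refill(b);
        if (b->left == 0) return -1;        /* end of input */
    }
    b->left--;
    return (unsigned char)*b->next++;       /* like lodsb: load, then advance */
}

/* dec esi / inc ebx: valid only for the last byte my_getchar returned. */
static void my_ungetc(struct inbuf *b) {
    b->next--;
    b->left++;
}
```

Given the input "abc", reading a and b, calling `my_ungetc` once, and reading again returns b a second time.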
If your program reads more than one byte ahead, you have at least two choices:
If possible, modify the program so it only reads one byte ahead. This is the simplest solution.
If that option is not available, first of all determine the maximum number of characters your program needs to return to the input stream at one time. Increase that number slightly, just to be sure, preferably to a multiple of 16, so it aligns nicely. Then modify the .bss section of your code, and create a small “spare” buffer right before your input buffer, something like this:
section .bss
You also need to modify your ungetc to pass the value of the byte to unget in AL:
ungetc:
    dec esi
    inc ebx
    mov [esi], al
    ret
With this modification, you can call ungetc up to 17 times in a row safely (the first call will still be within the buffer, the remaining 16 may be either within the buffer or within the “spare”).
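Here is the same idea modeled in C, with a spare area in front of the buffer. Again this is a sketch with invented names. The sizes are deliberately tiny (16 spare bytes and an 8-byte buffer) so that a refill is easy to trigger. Because `sb_ungetc` stores the byte it is given instead of merely stepping back, pushed-back bytes come out correctly even when they land in the spare area after a refill. The caller must not push back more bytes than the spare area holds plus the bytes already consumed from the current buffer.

```c
#include <stddef.h>
#include <string.h>

#define SPARE 16            /* room for pushed-back bytes */
#define BUFSZ 8             /* the input buffer proper */

struct sparebuf {
    char area[SPARE + BUFSZ];   /* spare bytes, then the input buffer */
    char *next;
    size_t left;
    const char *src;            /* stands in for the input file */
};

static void sb_refill(struct sparebuf *b) {
    size_t n = strlen(b->src);
    if (n > BUFSZ) n = BUFSZ;
    memcpy(b->area + SPARE, b->src, n);
    b->src += n;
    b->next = b->area + SPARE;
    b->left = n;
}

static int sb_getchar(struct sparebuf *b) {
    if (b->left == 0) {
        sb_refill(b);
        if (b->left == 0) return -1;
    }
    b->left--;
    return (unsigned char)*b->next++;
}

/* Like the modified ungetc: the byte (AL there, c here) is written in
   front of the current position, so a refill no longer matters. */
static void sb_ungetc(struct sparebuf *b, int c) {
    *--b->next = (char)c;
    b->left++;
}
```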
Chapter 8 – Command Line Arguments
N.B.: If you have come from the MS DOS programming environment, the main difference is that each argument is in a separate string. The second difference is that there is no practical limit on how many arguments there can be.
First, we need to add two new entries to our list of system call numbers:
%define SYS_open 5
Then we add two new macros at the end of the file:
%macro sys.open 0
Here, then, is our modified source code:
%include 'system.inc'
In the .text section we have replaced the references to stdin and stdout with [fd.in] and [fd.out].
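In C, the corresponding setup might look like the sketch below. For illustration, it assumes that the first argument names the input file and the second names the output file, with stdin and stdout as the defaults. The `fds` structure, `open_files`, and the 0644 creation mode are my choices, not details taken from the program above.

```c
#include <fcntl.h>
#include <unistd.h>

struct fds { int in, out; };

/* Returns 0 on success, -1 if a file cannot be opened. */
static int open_files(int argc, char **argv, struct fds *f) {
    f->in = STDIN_FILENO;
    f->out = STDOUT_FILENO;
    if (argc > 1) {
        f->in = open(argv[1], O_RDONLY);
        if (f->in == -1) return -1;
    }
    if (argc > 2) {
        f->out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (f->out == -1) {
            close(f->in);       /* do not leak the input descriptor */
            return -1;
        }
    }
    return 0;
}
```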
Chapter 9 – Unix Environment
9.2. webvars
9.2.1. CGI: A Quick Overview
I have a detailed CGI tutorial on my web site, but here is a very quick overview of CGI:
The web server communicates with the CGI program by setting environment variables.
The CGI program sends its output to stdout. The web server reads it from there.
It must start with an HTTP header followed by a blank line, i.e., the last header line is terminated by two consecutive newline characters.
It then prints the HTML code, or whatever other type of data it is producing.
N.B.: While certain environment variables use standard names, others vary, depending on the web server. That makes webvars quite a useful diagnostic tool.
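To show the shape of such a program, here is a C sketch of what webvars does: print a header line, then an empty line, then every environment variable, one per line. The text/plain content type and the `format_webvars` name are assumptions made for this example. The function formats into a buffer with snprintf and returns the length the full output needs, so a result larger than `cap` means the buffer was too small.

```c
#include <stddef.h>
#include <stdio.h>

static size_t format_webvars(char *out, size_t cap, char *const env[]) {
    int n = snprintf(out, cap, "Content-Type: text/plain\n\n");
    if (n < 0) return 0;
    size_t k = (size_t)n;
    for (size_t i = 0; env[i] != NULL; i++) {
        /* once the buffer is full, keep counting without writing */
        n = k < cap ? snprintf(out + k, cap - k, "%s\n", env[i])
                    : snprintf(NULL, 0, "%s\n", env[i]);
        if (n < 0) return 0;
        k += (size_t)n;
    }
    return k;
}
```

A real CGI program would pass the global `environ` (declared as `extern char **environ;`) and write the buffer to stdout.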
9.2.2. The Code
The code follows. I placed comments and explanations right inside the code:
;;;;;;; webvars.asm ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
Assemble and link it as usual:
% nasm -f elf webvars.asm
Then you need to use your browser to view its output. To see its output on my web server, please go to http://www.int80h.org/webvars/. If you are curious about the additional environment variables present in a password protected web directory, go to http://www.int80h.org/private/, using the name asm and password programmer.
Chapter 10 – Working with Files
One of the first programs I wrote for Unix was tuc, a text-to-Unix file converter. It converts a text file from other operating systems to a Unix text file. In other words, it changes different kinds of line endings to the newline convention of Unix. It saves the output in a different file. Optionally, it converts a Unix text file to a DOS text file.
I have used tuc extensively, but always only to convert from some other OS to Unix, never the other way. I have always wished it would just overwrite the file instead of me having to send the output to a different file. Most of the time, I end up using it like this:
% tuc myfile tempfile
It would be nice to have an ftuc, i.e., fast tuc, and use it like this:
% ftuc myfile
In this chapter, then, we will write ftuc in assembly language (the original tuc is in C), and study various file-oriented kernel services in the process.
At first sight, such a file conversion is very simple: All you have to do is strip the carriage returns, right?
If you answered yes, think again: That approach will work most of the time (at least with MS DOS text files), but will fail occasionally.
The problem is that not all non-Unix text files end their lines with the carriage return / line feed sequence. Some use carriage returns without line feeds. Others combine several blank lines into a single carriage return followed by several line feeds. And so on.
A text file converter, then, must be able to handle any possible line endings:
carriage return / line feed
carriage return
line feed / carriage return
line feed
It should also handle files that use some kind of a combination of the above (e.g., carriage return followed by several line feeds).
10.1. Finite State Machine
10.1.1. The Final State
N.B.: Now that we have expressed our algorithm as a finite state machine, we could easily design a dedicated digital electronic circuit (a “chip”) to do the conversion for us. Of course, doing so would be considerably more expensive than writing an assembly language program.
10.2. Implementing FSM in Software
Another approach is to use an array of function pointers, something like this:
(output[state])(inputchar);
Yet another is to have state be a function pointer, set to point at the appropriate function:
(*state)(inputchar);
call ebx
10.3. Memory Mapped Files
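Here is a small C sketch of the array-of-function-pointers approach applied to line endings. The three states and their rules are my own illustration, not necessarily the machine ftuc uses. CR LF and LF CR each count as one line end, and a lone CR or LF also counts as one. Each handler writes at most one character, so the output is never longer than the input.

```c
#include <stddef.h>

enum { ORDINARY, AFTER_CR, AFTER_LF };

struct sink { char *p; };   /* where converted text is written */

/* Each handler consumes one input character and returns the next state. */
static int ordinary(int c, struct sink *s) {
    if (c == '\r') { *s->p++ = '\n'; return AFTER_CR; }
    if (c == '\n') { *s->p++ = '\n'; return AFTER_LF; }
    *s->p++ = (char)c;
    return ORDINARY;
}

static int after_cr(int c, struct sink *s) {
    if (c == '\n') return ORDINARY;     /* second half of CR LF: emit nothing */
    return ordinary(c, s);              /* anything else starts afresh */
}

static int after_lf(int c, struct sink *s) {
    if (c == '\r') return ORDINARY;     /* second half of LF CR: emit nothing */
    return ordinary(c, s);
}

/* Indexed by state, as in (output[state])(inputchar). */
static int (*const output[])(int, struct sink *) = { ordinary, after_cr, after_lf };

static size_t convert(const char *in, size_t n, char *out) {
    struct sink s = { out };
    int state = ORDINARY;
    for (size_t i = 0; i < n; i++)
        state = (output[state])((unsigned char)in[i], &s);
    return (size_t)(s.p - out);
}
```

For example, the input a, CR, LF, b, CR, c, LF, CR, d converts to a, LF, b, LF, c, LF, d.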
The syscalls.master file lists the POSIX version like this:
This differs slightly from what mmap(2) says. That is because mmap(2) describes the C version.
When we are finished working with a memory-mapped file, we unmap it with the munmap syscall:
TIP: For an in-depth treatment of mmap, see W. Richard Stevens’ Unix Network Programming, Volume 2, Chapter 12.
10.6. ftuc
;;;;;;; open flags
%define SYS_mmap 197
We add the macros for their use:
%macro sys.mmap 0
;;;;;;; Fast Text-to-Unix Conversion (ftuc.asm) ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
WARNING: Do not use this program on files stored on a disk formatted by MS DOS or Windows. There seems to be a subtle bug in the FreeBSD code when using mmap on these drives mounted under FreeBSD: If the file is over a certain size, mmap will just fill the memory with zeros, and then copy them to the file, overwriting its contents.
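For readers who want to experiment in C, here is a sketch of the same in-place strategy. It uses the POSIX calls open, fstat, mmap, munmap, and ftruncate. Two parts of it are my assumptions rather than details of ftuc.asm. The first is the line-ending rules: CR LF and LF CR count as one line end, and a lone CR or LF counts as one. The second is truncating the file after unmapping it. Converting in place is safe because the output never grows, so the write index never passes the read index. `ftuc_file` is a hypothetical name.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Convert the file at path to Unix line endings in place. */
static int ftuc_file(const char *path) {
    int fd = open(path, O_RDWR);
    if (fd == -1) return -1;
    struct stat st;
    if (fstat(fd, &st) == -1) { close(fd); return -1; }
    size_t len = (size_t)st.st_size;
    if (len == 0) { close(fd); return 0; }      /* mmap rejects length 0 */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return -1; }

    size_t w = 0;
    char pending = 0;       /* CR or LF that may pair with the next byte */
    for (size_t r = 0; r < len; r++) {
        char c = p[r];
        int is_end = (c == '\r' || c == '\n');
        if (is_end && pending && c != pending) {
            pending = 0;    /* second half of CR LF or LF CR: drop it */
            continue;
        }
        if (is_end) { p[w++] = '\n'; pending = c; }
        else        { p[w++] = c;    pending = 0; }
    }
    munmap(p, len);
    int rc = ftruncate(fd, (off_t)w);           /* cut off the leftover tail */
    close(fd);
    return rc;
}
```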
Chapter 11 – One-Pointed Mind
As a student of Zen, I like the idea of a one-pointed mind: Do one thing at a time, and do it well.
11.1. CSV
I will illustrate this principle with a specific real-life example I was faced with recently:
N.B.: While it took me 20 minutes to write, it took me almost a day to debug. This was because of the .code problem described in the change log. I am just mentioning this so you do not wonder why the code itself says it was started on one day and updated the next.
This time I decided to let it do a little more work than a typical tutorial program would:
It parses its command line for options;
It displays proper usage if it finds wrong arguments;
It produces meaningful error messages.
Here is its usage message:
Usage: csv [-t
All parameters are optional, and can appear in any order.
The -t parameter declares what to replace the commas with. The tab is the default here. For example, -t; will replace all unquoted commas with semicolons.
I did not need the -c option, but it may come in handy in the future. It lets me declare that I want a character other than a comma replaced with something else. For example, -c@ will replace all at signs (useful if you want to split a list of email addresses into their user names and domains).
The -p option preserves the first line, i.e., it does not delete it. By default, we delete the first line because in a CSV file it contains the field names rather than data.
The -i and -o options let me specify the input and the output files. Defaults are stdin and stdout, so this is a regular Unix filter.
I made sure that both -i filename and -ifilename are accepted. I also made sure that only one input file and one output file may be specified.
To get the 11th field of each record, I can now do:
% csv '-t;' data.csv | awk '-F;' '{print $11}'
The code stores the options (except for the file descriptors) in EDX: the comma in DH, the new separator in DL, and the flag for the -p option in the highest bit of EDX, so a check of its sign gives us a quick decision on what to do.
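The packed option word and the filtering rules can be sketched in C as follows. `pack_options` and `csv_filter` are hypothetical names. The C test of bit 31 stands in for checking the sign of EDX. Two behaviors are assumptions of this sketch: a double quote toggles the quoted state, and quote characters are copied to the output. The description above only says that quoted commas are left alone.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_PRESERVE UINT32_C(0x80000000)   /* bit 31: the -p flag */

/* Bits 8-15 (DH): character to replace; bits 0-7 (DL): new separator. */
static uint32_t pack_options(char comma, char sep, int preserve) {
    uint32_t w = ((uint32_t)(unsigned char)comma << 8) | (unsigned char)sep;
    return preserve ? (w | OPT_PRESERVE) : w;
}

/* Returns the output length; out must hold at least n bytes. */
static size_t csv_filter(const char *in, size_t n, char *out, uint32_t opts) {
    char comma = (char)((opts >> 8) & 0xFF);    /* DH */
    char sep = (char)(opts & 0xFF);             /* DL */
    int skipping = !(opts & OPT_PRESERVE);      /* sign clear: drop line 1 */
    int quoted = 0;
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        char c = in[i];
        if (skipping) {                         /* discard the field names */
            if (c == '\n') skipping = 0;
            continue;
        }
        if (c == '"') quoted = !quoted;
        else if (c == comma && !quoted) c = sep;
        out[k++] = c;
    }
    return k;
}
```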
Here is the code:
;;;;;;; csv.asm ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Much of it is taken from hex.asm above. But there is one important difference: I no longer call write whenever I am outputting a line feed. Yet, the code can be used interactively.
I have found a better solution for the interactive problem since I first started writing this tutorial. I wanted to make sure each line is printed out separately only when needed. After all, there is no need to flush out every line when used non-interactively.
The new solution I use now is to call write every time I find the input buffer empty. That way, when running in the interactive mode, the program reads one line from the user’s keyboard, processes it, and sees its input buffer is empty. It flushes its output and reads the next line.
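Here is a C sketch of this policy. Output collects in a buffer and is written out every time the input buffer runs dry, just before the program might block in read. The `filter_io` structure, the function names, and the 64-byte sizes are illustrative, and error handling is reduced to a minimum.

```c
#include <stddef.h>
#include <unistd.h>

struct filter_io {
    int in_fd, out_fd;
    unsigned char in[64];
    size_t in_pos, in_len;
    unsigned char out[64];
    size_t out_len;
};

static void flush_output(struct filter_io *io) {
    size_t done = 0;
    while (done < io->out_len) {
        ssize_t n = write(io->out_fd, io->out + done, io->out_len - done);
        if (n <= 0) break;          /* errors are not handled in this sketch */
        done += (size_t)n;
    }
    io->out_len = 0;
}

static void put_byte(struct filter_io *io, unsigned char c) {
    if (io->out_len == sizeof io->out) flush_output(io);
    io->out[io->out_len++] = c;
}

static int get_byte(struct filter_io *io) {
    if (io->in_pos == io->in_len) {
        flush_output(io);           /* never sit on output while waiting */
        ssize_t n = read(io->in_fd, io->in, sizeof io->in);
        if (n <= 0) return -1;
        io->in_pos = 0;
        io->in_len = (size_t)n;
    }
    return io->in[io->in_pos++];
}
```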
11.1.1. The Dark Side of Buffering
.loop:
Because this filter works with raw data, it is unlikely to be used interactively.
The image editor will load our filter using the C function popen().
It will read the first row of pixels from a bitmap or pixmap.
It will write the first row of pixels to the pipe leading to the fd.in of our filter.
getchar will find an empty input buffer, so it will call read.
The kernel will suspend our filter until the image editor sends more data to the pipe.
Chapter 12 – Using the FPU
12.1. Organization of the FPU
That said, the assembly language op codes are not push and pop because those are already taken.
12.1.1. The Packed Decimal Format
TIP: You can use it to get decimal places by multiplying the TOS by a power of 10 first.
The remaining 9 bytes store the 18 digits of the number: 2 digits per byte.
80 00 00 00 00 00 01 23 45 67
Alas it is not! As with everything else of Intel make, even the packed decimal is little–endian.
That means our -1234567 is stored like this:
67 45 23 01 00 00 00 00 00 80
Remember that, or you will be pulling your hair out in desperation!
N.B.: The book to read, if you can find it, is Richard Startz’ 8087/80287/80387 for the IBM PC & Compatibles. Though it does seem to take the little–endian storage of the packed decimal for granted. I kid you not about the desperation of trying to figure out what was wrong with the filter I show below before it occurred to me I should try the little–endian order even for this type of data.
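To double-check the byte order, here is a C sketch that builds the 10-byte packed decimal image of an integer. It follows the layout above: two digits per byte, least significant pair first, and the sign in the last byte, where 80h means negative. `to_packed_bcd` is a hypothetical helper that returns -1 for numbers with more than 18 digits.

```c
#include <stdint.h>

static int to_packed_bcd(long long value, uint8_t out[10]) {
    unsigned long long mag = value < 0 ? 0ULL - (unsigned long long)value
                                       : (unsigned long long)value;
    if (mag > 999999999999999999ULL) return -1;     /* over 18 digits */
    for (int i = 0; i < 9; i++) {                   /* lowest digits first */
        unsigned lo = (unsigned)(mag % 10); mag /= 10;
        unsigned hi = (unsigned)(mag % 10); mag /= 10;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    out[9] = value < 0 ? 0x80 : 0x00;               /* sign byte last */
    return 0;
}
```

Encoding -1234567 produces 67 45 23 01 00 00 00 00 00 80, the same bytes listed above.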
12.2. Excursion to Pinhole Photography
12.3. Designing the Pinhole Software
We are now ready to decide what exactly we want our pinhole software to do.
12.3.1. Processing Program Input
One program, ftuc, used the state machine to consider at most two input bytes at a time.
There is no reason for the computer to spit out a number of complaints:
Syntax error: What
12.3.2. Offering Options
Why have two ways of choosing?
This type of choice is usually done with command line parameters.
Given this system, the program may find conflicting options, and handle them this way:
We also need to decide what format our PC option should have.
It may crash the program because we have not designed it to handle huge numbers.
Or, we might say, “Tough! The user should know better.”
12.3.3. The Output
We need to decide what we want our software to send to the output, and in what format.
So, it makes perfect sense to start each line with the focal length as entered by the user.
No, wait! Not as entered by the user. What if the user types in something like this:
Clearly, we need to strip those leading zeros.
What if the user types something like this:
We will slap him in the face, in a manner of speaking:
17459765723452353453534535353530530534563507309676764423 ??? ??? ??? ??? ???
Now, while we are taking these three steps, we also need to watch out for one of two conditions:
0 ??? ??? ??? ??? ???
At this point we have yet another trap to face: Too much precision.
N.B.: I “only” used ten digits in the above example. Imagine the absurdity of going for all 18!
We, therefore, must devise an algorithm to reduce the number of significant digits.
Here is mine (I think it is awkward; if you know a better one, please let me know):
N.B.: The 10000 is only good if you want four significant digits. For any other number of significant digits, replace 10000 with 10 raised to the number of significant digits.
We will, then, output the pinhole diameter in microns, rounded off to four significant digits.
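For comparison, here is a separate way to round to a given number of significant digits in plain C. It is not a translation of the algorithm mentioned above, just a different technique. It counts the digits in front of the decimal point, then scales, rounds half up, and scales back. `round_sig` is a hypothetical name. The sketch handles positive values only and avoids <math.h>, so it needs no extra library.

```c
/* 10^k for k >= 0 by repeated multiplication (exact for small k). */
static double pow10i(int k) {
    double p = 1.0;
    while (k-- > 0) p *= 10.0;
    return p;
}

/* Round a positive x to n significant digits, rounding halves up. */
static double round_sig(double x, int n) {
    if (!(x > 0.0)) return x;           /* sketch: positive values only */
    int d = 0;                          /* 10^(d-1) <= x < 10^d */
    double y = x;
    while (y >= 1.0) { y /= 10.0; d++; }
    while (y < 0.1)  { y *= 10.0; d--; }
    int shift = n - d;                  /* decimal places to keep */
    if (shift >= 0) {
        double s = pow10i(shift);
        return (double)(long long)(x * s + 0.5) / s;
    }
    double s = pow10i(-shift);
    return (double)(long long)(x / s + 0.5) * s;
}
```

For example, `round_sig(1234567.0, 4)` gives 1235000.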
fmul st0, st0
frndint rounds the TOS to the nearest integer. fld1 pushes a 1. fscale shifts the 1 we have on the TOS by the value in st(1), effectively raising 2 to the power of st(1).
Finally, fsqrt calculates the square root of the result, i.e., the nearest normalized f–number.
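The same calculation can be sketched in C. Instead of taking a logarithm, this version searches for the integer k directly. It relies on the fact that rounding log2(v) to the nearest integer gives k exactly when v lies between 2^k divided by the square root of 2 (inclusive) and 2^k times the square root of 2 (exclusive). The function name and the restriction to f-numbers of at least 1 are my own choices.

```c
/* Nearest normalized f-number: sqrt(2^round(log2(f*f))), for f >= 1. */
static double nearest_normalized_fnumber(double f) {
    const double sqrt2 = 1.4142135623730951;
    double v = f * f;           /* fmul st0, st0 */
    int k = 0;
    double p = 1.0;             /* p == 2^k */
    while (v >= p * sqrt2) {    /* raise k until v < 2^k * sqrt(2) */
        p *= 2.0;
        k++;
    }
    /* sqrt(2^k): 2^(k/2) for even k, times sqrt(2) for odd k */
    double root = 1.0;
    for (int i = 0; i < k / 2; i++) root *= 2.0;
    return (k % 2) ? root * sqrt2 : root;
}
```

With f = 5.6, v = 31.36 gives k = 5, and the result is the square root of 32, about 5.657. With f = 8 the result is exactly 8.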
5.6 is a constant. We do not have to have our FPU waste precious cycles. We can just tell it to divide the square of the f–number by whatever 5.6² equals. Or we can divide the f–number by 5.6, and then square the result. The two ways now seem equal.
12.4. FPU Optimizations
In assembly language we can optimize the FPU code in ways impossible in high-level languages, including C.
We can take that idea even further! In our program we are using a constant (the one we named PC).
fld1 ; TOS = 1
We can generalize all these optimizations into one rule: Keep repeat values on the stack!
TIP: PostScript is a stack–oriented programming language. There are many more books available about PostScript than about the FPU assembly language: Mastering PostScript will help you master the FPU.
12.5. pinhole—The Code
;;;;;;; pinhole.asm ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
When we have no more input, it can mean one of two things:
Yet, our main code relies on the carry flag to tell it when to quit, and it works.
12.6. Using pinhole
Our session might look like this:
% pinhole
12.7. Scripting
You have probably seen shell scripts that start with:
#! /bin/sh
#!/bin/sh
...because the blank space after the #! is optional.
The script might look something like this:
#! /usr/local/bin/pinhole -b -i
Because 120 is a medium size film, we may name this file medium.
We can set its permissions to execute, and run it as if it were a program:
% chmod 755 medium
Unix will interpret that last command as:
% /usr/local/bin/pinhole -b -i ./medium
It will run that command and display:
80 358 224 256 1562 11
% ./medium -c
% /usr/local/bin/pinhole -b -i ./medium -c
80 331 242 256 1826 11
% ./medium -b -e > bender
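Here is a C sketch of how such a command line can be assembled from the script's first line, the script path, and the user's arguments. `shebang_argv` is a hypothetical helper, not a system interface. It splits the words after the interpreter into separate arguments, which is what the pinhole example above relies on. Not every Unix does that: Linux, for example, passes everything after the interpreter name as a single argument.

```c
#include <stddef.h>

/* line: the script's first line, split in place. Fills argv with the
   interpreter words, the script path, then the user's arguments, and
   returns how many pointers were stored (at most max). */
static size_t shebang_argv(char *line, const char *script,
                           char *const user_args[],
                           const char *argv[], size_t max) {
    size_t n = 0;
    char *p = line;
    if (p[0] != '#' || p[1] != '!')
        return 0;                           /* not an interpreter script */
    p += 2;
    for (;;) {
        while (*p == ' ' || *p == '\t')
            p++;                            /* the blank after #! is optional */
        if (*p == '\0' || *p == '\n')
            break;
        if (n < max)
            argv[n++] = p;                  /* start of the next word */
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;
        if (*p != '\0')
            *p++ = '\0';                    /* terminate the word in place */
    }
    if (n < max)
        argv[n++] = script;                 /* then the script itself */
    for (size_t i = 0; user_args[i] != NULL; i++)
        if (n < max)
            argv[n++] = user_args[i];       /* then what the user typed */
    if (n < max)
        argv[n] = NULL;
    return n;
}
```

For the first line of medium and the user argument -c, this yields /usr/local/bin/pinhole, -b, -i, ./medium, -c, matching the expanded command shown above.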
Chapter 13 – Caveats
The reason? Both the PC BIOS and MS DOS are notoriously slow when performing these operations.
That is generally a very bad idea in a Unix environment! Let me explain why.
13.2. Unix Is an Abstraction
% program1 | program2 | program3 > file1
N.B.: These are caveats, not absolute rules. Exceptions are possible. For example, if a text editor has determined it is running on a local machine, it may want to read the scan codes directly for improved control. I am not mentioning these caveats to tell you what to do or what not to do, just to make you aware of certain pitfalls that await you if you have just arrived in Unix from MS DOS. Of course, creative people often break rules, and it is OK as long as they know they are breaking them and why.
Appendix A – Assembly Language Pearls
String Length – learn how to calculate the length of a text string in assembly language.
Smallest Unix Program – see how we can shrink the smallest Unix program.
Appendix B – BSD Style Copyright
I have edited it for inclusion in assembly language programs. If you want it, download the nasm-compatible BSD-style copyright, insert your name, and include it in your source code.
