<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
<meta name="author" content="Vid">
<meta name="dcterms.date" content="2004-12-27">
<meta name="description" content="FASM tutorial for beginners, by TAJGA Team. No programming knowledge required.">
<meta name="keywords" content="fasm, flat assembler, asm, assembler, assembly, DOS, MS Windows, TAJGA Team">
<title>PureBASIC Archives — TAJGA FASM Tutorial</title>
<style type="text/css">code{white-space: pre;}</style>
<link rel="stylesheet" href="../../../shared/css/tutorial.css">
<script src="../../../shared/js/highlight.pack.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<!--[if lt IE 9]>
<script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
<![endif]-->
</head>
<body>
<header>
<h1 class="title">TAJGA FASM Tutorial</h1>
<p class="subtitle">A beginners’ FASM tutorial for MS DOS — by Vid, 2004</p>
</header>
<pre class="nohighlight"><code>Current revision: 2017-01-26</code></pre><p>This is a reprint of the <em>TAJGA FASM Tutorial</em> by «<strong>Vid</strong>», from the <strong>TAJGA Team</strong>. License: <a href="#legal-stuff">unrestricted</a> public domain (confirmed by Vid via private messaging). Last edited by the author on 2004-12-27.</p><p>Ported to Markdown, edited and reprinted by Tristano Ajmone <a href="https://github.com/tajmone"><strong>@tajmone</strong></a> (2017-01-24). Original tutorial files downloaded from:</p><ul><li><a href="http://bos.asmhackers.net/docs/FASM%20tutorial/">http://bos.asmhackers.net/docs/FASM%20tutorial/</a></li></ul><div class="alert alert-warn"><p><strong>CHANGES</strong> — Some changes made to original text:</p><ul><li>English text polished to improve readability.</li><li>Some paragraphs rewritten to improve clarity.</li><li>Fixed a few code typos/errors.</li><li>Added or corrected links to mentioned references.</li><li>Styled notes, tips and warnings with CSS Alert Boxes.</li><li>Applied highlighter style to key passages.</li></ul></div>
<div class="alert alert-info"><strong>NOTE</strong> — The original website of the <strong>TAJGA Team</strong> (tajga.kallimagarden.com) is now defunct, but the team now has an <a href="https://github.com/tajga">organization profile</a> and a <a href="http://tajga.github.io/">website</a> on GitHub.</div>
<nav id="TOC">
<h1>Table of Contents</h1>
<ul><li><a href="#introduction">Introduction</a><ul><li><a href="#for-whom">For whom?</a></li><li><a href="#what-os">What OS?</a></li><li><a href="#development-status-of-tutorial">Development status of tutorial</a></li><li><a href="#additional-articles">Additional articles</a></li><li><a href="#legal-stuff">Legal stuff</a></li><li><a href="#translating-this-tutorial">Translating this tutorial</a></li></ul></li><li><a href="#getting-started">1. Getting Started</a></li><li><a href="#first-program">2. First Program</a></li><li><a href="#labels-addresses-and-variables">3. Labels, Addresses and Variables</a><ul><li><a href="#labels">3.1. Labels</a></li><li><a href="#defining-variables">3.2. Defining Variables</a></li><li><a href="#addresses-and-basics-of-segmentation">3.3. Addresses and basics of segmentation</a></li><li><a href="#the-org-directive-explained">3.4. The ‘org’ directive explained</a></li></ul></li><li><a href="#endian-encodings-and-word-registers">4. Endian Encodings and Word Registers</a><ul><li><a href="#endian-encodings">4.1. Endian encodings</a></li><li><a href="#word-registers">4.2. Word registers</a></li><li><a href="#string-output-using-int-21hah9">4.3. String output using int 21h/ah=9</a></li></ul></li><li><a href="#jumps-and-branching">5. Jumps and Branching</a><ul><li><a href="#instruction-pointer">5.1. Instruction pointer</a></li><li><a href="#jumps">5.2. Jumps</a></li><li><a href="#comparing-and-conditonal-jumps">5.3. Comparing and conditonal jumps</a></li></ul></li><li><a href="#bit-arithmetics">6. Bit Arithmetics</a><ul><li><a href="#encoding-numbers-in-bits">6.1. Encoding numbers in bits</a></li><li><a href="#binary-constants">6.2. Binary constants</a></li><li><a href="#bit-operations">6.3. Bit operations</a></li><li><a href="#binary-operations-instructions">6.4. Binary operations instructions</a></li><li><a href="#testing-bits">6.5. Testing bits</a></li></ul></li><li><a href="#arithmetic-instructions-more-on-flags">7. 
Arithmetic Instructions: More on Flags</a><ul><li><a href="#addition-and-substraction">7.1. Addition and substraction</a></li><li><a href="#overflows">7.2. Overflows</a></li><li><a href="#zero-flag">7.3. Zero Flag</a></li><li><a href="#carry-flag-more-binary-arithmetic-instructions">7.4. Carry flag: more binary arithmetic instructions</a></li><li><a href="#some-examples">7.5. Some examples</a></li></ul></li></ul>
</nav>
<h1 id="introduction">Introduction</h1><h2 id="for-whom">For whom?</h2><p>This tutorial is meant for beginners, so almost no programming knowledge is required. Although learning assembly as a first programming language is quite hard, it is possible. You only need to know how to use the command line (<code>command.com</code> in DOS/Win95/98, <code>cmd</code> in WinNT/XP). Some programming knowledge is very helpful, but not necessary.</p><h2 id="what-os">What OS?</h2><p>I decided to write this tutorial for DOS, because it allows you to use your whole machine however you wish, unlike Windoze. I just want this tutorial to cover more assembly topics than Windows allows, so DOS is my only choice. Porting from DOS to Windows isn’t very difficult, if you already know protected mode. <a href="http://win32assembly.programminghorizon.com/tutorials.html">Iczelion’s tutorial</a> covers this, and has also been translated to FASM. I personally don’t like starting to learn assembly under Windows, because too many things are abstracted away from you.</p><h2 id="development-status-of-tutorial">Development status of tutorial</h2><p>This tutorial is very far from being complete, but it already contains enough information to be worth reading for beginners and intermediates.</p><p>Currently, <a href="#arithmetic-instructions-more-on-flags"><strong>Chapter 7</strong></a> is done and is being revised.</p><h2 id="additional-articles">Additional articles</h2><p>There are also some other articles connected with assembly and FASM included in this tutorial package.</p><p>Articles:</p><ul><li><a href="./fasm-preprocessor-guide.html">FASM preprocessor guide</a> (complete)</li></ul><h2 id="legal-stuff">Legal stuff</h2><p>This tutorial comes without any warranty (you are using it at your own risk <strong>:)</strong> ). That also means you can use it however you like, and/or do whatever you like with it. 
If you would like to include the tutorial or some part of it in a project you are working on, or you want to translate it to some other language, or something like that, then it would be nice if you wrote me an <a href="mailto:vid@InMail.sk">e-mail</a> and mentioned me somewhere in your work.</p><h2 id="translating-this-tutorial">Translating this tutorial</h2><p>There were several requests for permission to translate this tutorial. Of course, I’ve granted it; but I realized that it is hard to maintain such a translation: if I edit the already-written text, the maintainer doesn’t always know about it. So I decided to create a change log, in which all changes will be logged. The log will be provided on request.</p><p>I will provide links to translated versions as soon as they are ready.</p><p>Please help me to improve this tutorial by sending all suggestions and error reports to the <a href="https://board.flatassembler.net/topic.php?t=1178">tutorial’s thread on the FASM board</a> or by <a href="mailto:vid@InMail.sk">email</a> if you like.</p><p>Also, if you find that something in this tutorial isn’t explained clearly enough, please tell me what it is, where in the tutorial it is mentioned, and what you don’t understand about it, so I can add it for future readers.</p><p>This tutorial was translated to HTML (from the original plain-text version) by <a href="https://web.archive.org/web/20070128085747/http://decard.net/?"><strong>Decard</strong></a>. He also created a utility to convert FASM source to HTML text with syntax highlighting, used for code blocks in this tutorial. There is also an <a href="https://web.archive.org/web/20060925090450/http://www.decard.net/article.php?body=tajga">online version of this tutorial</a> hosted at his site.</p><h1 id="getting-started">1. Getting Started</h1><p>I assume you have some basic knowledge about what bytes are, and an idea of what ASCII code is. 
Maybe I’ll describe ASCII in a future version of the tutorial.</p><p>First, try to compile an empty source file. Just create an empty “<code>empty.asm</code>” file and type in the command line:</p><pre class="dos"><code>fasm empty.asm empty.bin</code></pre><p>You should see that an “<code>empty.bin</code>” file is created, and its length is zero.</p><p>Now we will create a binary file containing some data. Create a text file containing the following line:</p><pre class="fasm"><code>db 'a'</code></pre><p>and compile it (I hope you already know how). When you look at the created file you should see that it’s 1 byte long and it contains the character “<code>a</code>”.</p><p>Now let’s analyze the source: <code>db</code> is a “<strong>directive</strong>” (a directive is a command to the compiler, remember this!) which means “<strong>define byte</strong>”. So this directive will put a byte into the destination file. The value of the byte should follow this directive. For example, <code>db 0</code> will insert a byte with value 0 into the destination file. But if you wanted to insert a character this way, you would have to remember its ASCII value. Instead, you can enter the character enclosed in straight single quotes (<code>'</code>) and the compiler will “get” its value for you. This is how the above code works.</p><div class="alert alert-info"><p><strong>directive</strong></p><p>A command to the compiler.</p></div><p>Now let’s create a file with more than one character. It will be:</p><pre class="fasm"><code>db '1'
db '2'
db '3'</code></pre><p>I think it should be clear how this works: it stores three bytes into the destination file, which should now contain a single line with <code>123</code>. By the way, you can’t write:</p><pre class="fasm"><code>db '1' db '2' db '3'</code></pre><p>because <mark>every directive must be on a separate line</mark>. But if you want to define more bytes, you can use a single <code>db</code> directive followed by multiple values, separated by commas (<code>,</code>):</p><pre class="fasm"><code>db '1','2','3'</code></pre><p>This will also produce a file with <code>123</code>.</p><p>But what if you wanted to define something longer, for example a file containing <code>This is my first long string in FASM</code>? You could write:</p><pre class="fasm"><code>db 'T','h','i','s' ; etc...</code></pre><p>but this isn’t very nice. For this reason, if you want to define several consecutive characters using <code>db</code>, you can use this form:</p><pre class="fasm"><code>db 'This is my first long string in FASM'</code></pre><p>So you have to enclose the whole text in quotes. You could also write:</p><pre class="fasm"><code>db 'This is my first long string in ','FASM'</code></pre><p>or</p><pre class="fasm"><code>db 'Thi','s is my first lo','ng string in',' FASM'</code></pre><p>etc.</p><div class="alert alert-info"><p><strong>string</strong>, <strong>quoted string</strong></p><p>Text enclosed in quotes is called a “<strong>string</strong>”. In general, a “string” is an array of characters. The term for denoting a string inside source code is “<strong>quoted string</strong>”.</p></div><h1 id="first-program">2. First Program</h1><p>You may wonder why I’m fooling about creating text files when you want to learn assembly. But text files are just some “<strong>arrays</strong>” of bytes. You haven’t learnt just how to create a text file: you learnt how to define a file containing any data you want! 
And this is what a runnable program is — a special “<strong>data</strong>” file, an array of numeric values, called “<strong>machine code</strong>”. You only have to know the meaning of these values <strong>:)</strong>. Of course, it’s very hard to remember all the values and their meanings, and this is what an assembler is for: it translates programs from a human-readable language to machine code. Therefore, you only have to learn this human-readable language <strong>:)</strong>.</p><div class="alert alert-info"><p><strong>machine code</strong></p><p>An array of numeric values that represents instructions to the processor (CPU).</p></div><p>Now we’ll look into DOS <code>.COM</code> programs (<strong>COM files</strong>, occasionally called “<strong>memory image</strong>” — you will learn why later on, when you get into it). These are the simplest executable (runnable) files under DOS and Windows.</p><p>So let’s create our first <code>.COM</code> file, which won’t do anything.</p><pre class="fasm"><code>org 256
int 20h</code></pre><p>Compile this to a <code>.COM</code> file and run it. Nothing should happen. Now let’s look at what those two lines of code mean. (This is going to be funny…)</p><pre class="fasm"><code>org 256</code></pre><p>Right now, I won’t explain what this directive does. Just put this line at the beginning of every <code>.COM</code> file! It doesn’t define any data, it doesn’t even do anything that you can notice. We’ll get back to this later on.</p><pre class="fasm"><code>int 20h</code></pre><p>This is an “<strong>instruction</strong>”. An instruction is a command for the processor, which is stored in the created file as one or more bytes. When you run a <code>.COM</code> executable file, the processor walks through it, decodes its instructions from machine code and does what these instructions tell it to. Instruction <code>int 20h</code> says that this is the end of execution of the file. So, the first instruction in this code tells the processor to stop execution, therefore the executable file does nothing, as you saw.</p><div class="alert alert-info"><p><strong>instruction</strong></p><p>A single command to the processor.</p></div>
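<p>To connect this with the <code>db</code> directive from the previous chapter, here is a small sketch. It assumes the usual x86 encoding, in which <code>int</code> followed by a number is stored as the byte 205 followed by that number (20h is simply another way of writing 32). Under that assumption it should produce the same two bytes as the program above:</p><pre class="fasm"><code>org 256
db 205,32       ; the two bytes "int 20h" is stored as: 205 = int, 32 = 20h</code></pre><p>If you compile both versions and compare the resulting files, they should be identical. An instruction really is just data that has a meaning for the processor.</p>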
<div class="alert"><p><strong>BY THE WAY</strong>: <code>int 20h</code> is NOT the processor instruction which ends the execution of the <code>.COM</code> program. It instructs the processor to call a system procedure. The system procedure to be called is chosen by the number following <code>int</code> — in our case: number <code>20h</code> (it IS a sort of number), which means calling the procedure to end a <code>.COM</code> file.</p><p><code>int</code> could be followed by a different number, and a different system procedure would be called. But right now, we can abstract away from this, forget about it and take <code>int 20h</code> as the instruction to stop a program.</p></div><p>So, “<strong>machine code</strong>” is a set of “<strong>instructions</strong>”. <mark>There is a difference between directives and instructions.</mark> Directives are commands for the compiler — how it should define data, and what data it should define. Instructions are defined data which encodes what the processor will do when you execute the program. For example, <code>db 0,0</code> is a directive which defines two zero bytes, but those bytes act as an instruction if they are executed, because two zero bytes have a special meaning for the processor (don’t worry about their meaning right now). <code>org 256</code> is a directive, not an instruction, because it doesn’t define any data. You will get into this by practice.</p><p>Instruction <code>int 20h</code> is simple, it doesn’t need any arguments (=parameters, or values which change its effect). But what if some instruction DOES need arguments? For this reason the processor has its own “<strong>variables</strong>” (variable is a general term for a space which stores some value). These variables are called “<strong>registers</strong>”. 
The first registers we’ll learn are <code>al</code>, <code>ah</code>, <code>bl</code>, <code>bh</code>, <code>cl</code>, <code>ch</code>, <code>dl</code>, and <code>dh</code>, which are <strong>byte-sized</strong> (they contain a value within the range 0 to 255).</p><div class="alert alert-info"><p><strong>register</strong></p><p>An “internal” processor’s variable.</p></div>
<div class="alert"><strong>BY THE WAY</strong>: <code>int 20h</code> takes its argument in the <code>AL</code> register, but, again, we can abstract from this. And, in fact, value <code>20h</code> is an instruction argument too, but we abstracted from this before. This is what I was talking about when I mentioned that this was “going to be funny”.</div><p>Now, how do we set the value of a register? There is an instruction which does this, for example:</p><pre class="fasm"><code>mov al,10</code></pre><p>This instruction sets the value of the <code>al</code> register to 10. <code>mov</code> stands for “<strong>move</strong>”. The destination of “moving” follows <code>mov</code> (separated with spaces) — in our case, it’s the <code>al</code> register. Then comes the source of “moving”, separated by a comma (<code>,</code>) — in our case, it’s number 10. So this instruction “moves” value 10 to register <code>al</code>. Another example of moving:</p><pre class="fasm"><code>mov al,bl</code></pre><p>This copies the value in the <code>bl</code> register to the <code>al</code> register. It won’t change the value in the <code>bl</code> register. <mark>The source of <code>mov</code> always remains unchanged.</mark></p><div class="alert"><p><strong>NOTE:</strong> You will often find people talking about the <code>mov</code> instruction. But <code>mov</code> is not an instruction, and <code>int</code> is not an instruction either. <code>mov al,bl</code> and <code>int 20h</code> are instructions, for example. <code>mov</code> and <code>int</code> are called “<strong>instruction mnemonics</strong>”. 
But just accept it: everyone calls it “<strong>instruction</strong>”; and probably you will too, after some time — and I probably will too, sorry <strong>:)</strong>.</p>Arguments of an instruction (the part of the instruction without the mnemonic, like <code>al</code> and <code>10</code> in <code>mov al,10</code>) are called “<strong>instruction operands</strong>” (or “<strong>instruction arguments</strong>”).</div>
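<p>As a rough sketch of these terms (the values are arbitrary and chosen only for illustration), the comments below mark the mnemonic and the operands of each instruction:</p><pre class="fasm"><code>org 256
mov al,10       ; mnemonic: mov, operands: al (destination) and 10 (source)
mov bl,al       ; mnemonic: mov, operands: bl and al; al still holds 10 afterwards
int 20h         ; mnemonic: int, operand: 20h</code></pre>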
<div class="alert alert-info"><p><strong>instruction mnemonics</strong> (this term is not so important right now)</p><strong>instruction operand</strong></div><p>Now let’s get to how registers are used. We will use the <code>int 21h</code> instruction, which can do MANY things depending on the value in the <code>ah</code> register. We won’t learn the meaning of every value; right now we will talk only about value 2. If value 2 is in the <code>ah</code> register when instruction <code>int 21h</code> is executed, then the character in <code>dl</code> (more precisely: the character whose ASCII code is in <code>dl</code>) is printed to the screen (console).</p><div class="alert"><strong>NOTE:</strong> if you are using a Windows file manager (like Total Commander), you’ll see a window appear for a very short time and then disappear. Our character is being displayed in this window and you probably won’t notice it. You must run a shell (<code>cmd</code> on XP, <code>command</code> on older Windozes) and run your program from it. Anyway, if you can’t handle this, forget about assembly for a while and learn to use your operating system first — and then, don’t forget to return to assembly!</div><p>Okay, so let’s look at a program which prints the character “<code>a</code>”:</p><pre class="fasm"><code>org 256
mov ah,2
mov dl,'a'
int 21h
int 20h</code></pre><p>Here is its analysis:</p><pre class="fasm"><code>mov ah,2</code></pre><p>This sets the value of the “<code>ah</code>” register to 2 — this should be clear.</p><pre class="fasm"><code>mov dl,'a'</code></pre><p>This moves character “<code>a</code>” into the <code>dl</code> register. (In fact, there is nothing like “character a” in assembly. You might have noticed that I wrote that registers can contain numeric values. Nothing about characters. The way this works is that the compiler translates a character enclosed in quotes into its numeric (ASCII) code, which is then recognized by <code>int 21h</code> as the code for this character. In assembly, character “<code>a</code>” means the ASCII code for character “<code>a</code>”.)</p><pre class="fasm"><code>int 21h</code></pre><p>In this case, when “<code>ah</code>” contains value 2, this prints the character in “<code>dl</code>”.</p><pre class="fasm"><code>int 20h</code></pre><p>And we mustn’t forget to stop execution. Otherwise, the program will most probably crash.</p><div class="alert"><p><strong>NOTE:</strong> In assembly a character enclosed in quotes is the same as the ASCII code of that character.</p></div><p>So, the code for printing multiple characters (“<code>ab</code>”) is:</p><pre class="fasm"><code>org 256
mov ah,2
mov dl,'a'
int 21h
mov dl,'b'
int 21h
int 20h</code></pre><p>We don’t have to set <code>ah</code> to 2 again for the second <code>int 21h</code>, because <code>ah</code> will retain the value of 2 previously set. Also <code>dl</code> will retain its value, therefore the following code:</p><pre class="fasm"><code>org 256
mov ah,2
mov dl,'a'
int 21h
int 21h
int 21h
int 20h</code></pre><p>will print “<code>aaa</code>”.</p><h1 id="labels-addresses-and-variables">3. Labels, Addresses and Variables</h1><p>Okay, let’s get to variables. In the previous chapter I wrote that a variable is a general term for a space which stores some value. Registers, for example, are variables. But there is a limited number of registers (VERY limited: around 8 + a few special ones), and their number is rarely sufficient. For this reason memory (RAM — random access memory) is used.</p><div class="alert"><strong>NOTE:</strong> When someone says “variable” they almost always mean a memory variable.</div><h2 id="labels">3.1. Labels</h2><p>The problem is that you have to know WHERE in memory some value is being stored. A position in memory (called “<strong>address</strong>”) is given by a number. But it’s quite hard to remember this number (address) for every variable.</p><div class="alert alert-info"><p>term: <strong>address</strong></p>A number which gives a position in memory.</div><p>Another problem with addresses is that when you change your program, addresses might change too, so you would have to correct their number everywhere they are used. For this reason addresses are represented by “<strong>labels</strong>”. A label is just some word (not a string, it is not enclosed in quotes) which, in your program, represents an address in memory. When you compile your program, the compiler will replace the label with the proper address. Labels consist of letters (“<code>a</code>” to “<code>z</code>”, “<code>A</code>” to “<code>Z</code>”), digits (“<code>0</code>” to “<code>9</code>”), underscores (“<code>_</code>”) and dots (“<code>.</code>”). But <mark>the first character of a label can’t be a digit or a dot</mark>. Also, a label can’t have the same name as a directive or an instruction (instruction mnemonic). 
Labels are case sensitive in FASM (“<code>a</code>” is NOT the same as “<code>A</code>”).</p><p>Label examples:</p><table><thead><tr class="header"><th>LABEL</th><th>STATUS</th></tr></thead><tbody><tr class="odd"><td><code>name</code></td><td>a label</td></tr><tr class="even"><td><code>a</code></td><td>a label</td></tr><tr class="odd"><td><code>A</code></td><td>a label, different from “<code>a</code>”</td></tr><tr class="even"><td><code>name2</code></td><td>a label</td></tr><tr class="odd"><td><code>name.NAME2</code></td><td>a label</td></tr><tr class="even"><td><code>name._NAME2</code></td><td>a label</td></tr><tr class="odd"><td><code>_name</code></td><td>a label</td></tr><tr class="even"><td><code>_</code></td><td>a label</td></tr><tr class="odd"><td><code>.name</code></td><td>not a label, because it starts with a dot</td></tr><tr class="even"><td><code>1</code></td><td>not a label, because it starts with a digit</td></tr><tr class="odd"><td><code>1st_name</code></td><td>not a label, for the same reason</td></tr><tr class="even"><td><code>name1 name2</code></td><td>not a label, because it contains a space</td></tr><tr class="odd"><td><code>mov</code></td><td>not a label, because “<code>mov</code>” is an instruction mnemonic</td></tr></tbody></table><div class="alert"><strong>NOTE</strong>: Labels starting with a dot have a special meaning in FASM, which you will learn about later.</div>
<div class="alert alert-info"><p>term: <strong>label</strong></p><p>A placeholder for some address; i.e. a placeholder for some number, because an address is a number.</p>In FASM you can use a label the same way as any other number (not really, but it doesn’t really matter for you right now).</div><p>You can define labels using the “<code>label</code>” directive. This directive should be followed by the label itself (the label name). For example:</p><table><thead><tr class="header"><th>DIRECTIVE</th><th>LABEL STATUS</th></tr></thead><tbody><tr class="odd"><td><code>label name</code></td><td>a label definition, it defines label “<code>name</code>”</td></tr><tr class="even"><td><code>label _name</code></td><td>a label definition, it defines label “<code>_name</code>”</td></tr><tr class="odd"><td><code>label label</code></td><td>not a label definition, because a label can’t be named “<code>label</code>”</td></tr></tbody></table><p>This directive defines a label that will then represent the address of the data defined right after it.</p><div class="alert alert-info">directive: <strong>label</strong></div>
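<p>A minimal sketch of how this looks in a source file (the label names <code>exit_point</code> and <code>message</code> are made up for this illustration):</p><pre class="fasm"><code>org 256
label exit_point    ; stands for the address of the "int 20h" instruction below
int 20h
label message       ; stands for the address of the first byte of the string
db 'hello'</code></pre><p>Neither label adds any bytes to the file; each one only gives a name to the address of whatever is defined right after it.</p>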
<div class="alert alert-info"><p><strong>label definition</strong></p>The <strong>label</strong> directive followed by a label name.</div><p>A shorter way to define a label is to write just the label name followed by a colon (<code>:</code>), like this:</p><pre class="fasm"><code>name:
_name:</code></pre><p>But we won’t be using this shorter way right now, in our examples.</p><h2 id="defining-variables">3.2. Defining Variables</h2><p>Now let’s go back to the problem with variables: how to define a variable in memory. The program you create (a compiled program, in machine code) is loaded to memory at execution time, where the processor executes it, instruction by instruction. Look at this program:</p><pre class="fasm"><code>org 256
mov al,10
db 'this is a string'
int 20h</code></pre><p>This program will probably crash, because after the processor executes <code>mov al,10</code> it reaches a string. Inside a program there is no difference between strings and instructions in machine code. Both are translated into an array of numeric values (bytes). There is no way the processor can distinguish whether a numeric value is the translation of a string or the translation of an instruction. In this example, the processor will execute instructions whose numeric representation (in machine code) is the same as the ASCII representation of the string “<code>this is a string</code>”. Now look at this:</p><pre class="fasm"><code>org 256
mov al,10
int 20h
db 'this is a string'</code></pre><p>This program will not crash, because before reaching the bytes defined by the string the processor reaches instruction <code>int 20h</code>, which ends the program’s execution. Therefore the bytes defined with a string will not be executed, they will just take up some space. This is how you can define a variable — define some data in a place where the processor won’t try to execute it (beyond <code>int 20h</code>, in our case).</p><p>Here is some code with a byte-sized variable of value 105:</p><pre class="fasm"><code>org 256
mov al,10
int 20h
db 105</code></pre><p>The last line defines a byte variable containing 105.</p><p>Now, how can we access a variable? First, we must know the address of the variable. For this purpose we can use a label (described above, re-read it if you have forgotten):</p><pre class="fasm"><code>org 256
mov al,10
int 20h
label my_first_variable
db 105</code></pre><p>So we already know the address of the variable: it’s represented by the label <code>my_first_variable</code>. Now, how do we access it? You might think that you could do this:</p><pre class="fasm"><code>mov al,my_first_variable</code></pre><p>but you can’t! Remember, I told you that a label (<code>my_first_variable</code> in our case) stands for the address of the variable. So this instruction asks for the address of the variable to be moved to the <code>al</code> register, not the variable’s contents. To access the contents of a variable (or the contents of any memory location) you must enclose its address in square brackets (<code>[</code> and <code>]</code>). Therefore, to access the contents of our variable, and copy its value to <code>al</code>, we use:</p><pre class="fasm"><code>mov al,[my_first_variable]</code></pre><p>Now we will define two variables:</p><pre class="fasm"><code>org 256
&lt;some instructions&gt;
int 20h
label variable1
db 100
label variable2
db 200</code></pre><p>To copy the value of <code>variable1</code> to <code>al</code> we use:</p><pre class="fasm"><code>mov al,[variable1]</code></pre><p>To copy <code>al</code> to <code>variable1</code> we use:</p><pre class="fasm"><code>mov [variable1],al</code></pre><p>To set the value of <code>variable1</code> (more precisely: to set the value of the variable which is stored at the address represented by <code>variable1</code>) to <code>10</code> we could try:</p><pre class="fasm"><code>mov [variable1],10</code></pre><p>but this will cause an error (try it yourself if you wish). The problem here is that you know that you are changing the variable at address <code>variable1</code> to <code>10</code>. But what is the size of the variable? In the previous two cases the size could be determined because you used the <code>al</code> register, which is byte-sized, so the compiler decided that the variable at <code>variable1</code> is byte-sized too, because you can’t move between operands with different sizes. But in this case, value 10 could be of any size, so the compiler can’t decide the memory size of the variable. To solve this we use “<strong>size operators</strong>”. We will talk about two size operators for now: <code>byte</code> and <code>word</code>. You can put the size operator before the instruction operand when accessing it, to let the compiler know what the variable size is:</p><pre class="fasm"><code>mov byte [variable1],10</code></pre><p>Another way to do it is:</p><pre class="fasm"><code>mov [variable1], byte 10</code></pre><p>In this case the compiler knows that the value <code>10</code> being moved is byte-sized, so it decides that the variable is byte-sized too (because we can move a byte-sized value only to a byte-sized variable).</p><p>But it would be hard to always remember and always write the size of a variable when you access it. For this reason you can assign the size of the variable to its label when you define it. 
Just write the size operator after the label’s name in its definition:</p><pre class="fasm"><code>label variable1 byte
db 100</code></pre><p>or</p><pre class="fasm"><code>label variable1 word
dw 1000</code></pre><p>now, every time you use <code>[variable1]</code> it will have the same meaning as <code>byte [variable1]</code> (or <code>word [variable1]</code> in the second example). So <code>mov [variable1],10</code> will work — in the first case, it will store value <code>10</code> into the byte at address <code>variable1</code>; in the second case, it will store it into a word.</p><div class="alert alert-info"><strong>size operator</strong></div>
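<p>As a minimal sketch of the above (the label name <code>character</code> is made up for this illustration), a complete <code>.COM</code> program could store a value into a sized label without any size operator in the instruction, and then print it:</p><pre class="fasm"><code>org 256
mov [character],'A'  ; size is taken from the label: a byte is stored
mov ah,2
mov dl,[character]   ; dl is a byte register, so a byte is read
int 21h              ; print the character in dl
int 20h              ; end the program
label character byte
db 'a'</code></pre><p>If the label carries its size as shown, this should print “<code>A</code>”, because the first instruction overwrites the initial <code>'a'</code> before it is read.</p>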
<div class="alert alert-warn"><p><strong>NOTE:</strong> <mark>You can’t move values between variables of different size</mark>:</p><pre class="fasm"><code>mov byte [variable1], word 10</code></pre><p>or</p><pre class="fasm"><code>mov [variable1],al
...
label variable1 word
dw 0</code></pre></div>
<div class="alert alert-warn"><p><strong>NOTE:</strong> <mark>You can’t access two memory locations in one instruction</mark> (except for some special instructions). This is wrong, and it won’t compile:</p><pre class="fasm"><code>mov [variable1],[variable2]</code></pre><p>use this instead:</p><pre class="fasm"><code>mov al,[variable2]
mov [variable1],al</code></pre>This will cause you some problems in the beginning, but it will force you to write faster code — which is the main reason for coding in assembly.</div>
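<p>The same two-step pattern extends to other tasks. As a rough sketch (assuming byte-sized variables named <code>variable1</code> and <code>variable2</code>, as in the earlier examples), exchanging the values of two variables could go through two byte registers:</p><pre class="fasm"><code>org 256
mov al,[variable1]   ; al = 100
mov ah,[variable2]   ; ah = 200
mov [variable1],ah   ; variable1 now holds 200
mov [variable2],al   ; variable2 now holds 100
int 20h
label variable1 byte
db 100
label variable2 byte
db 200</code></pre><p>Every instruction here touches at most one memory location, so the rule from the note above is respected.</p>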
<div class="alert alert-warn"><p><strong>NOTE:</strong> The size operator assigned to a label in its definition has lower priority than a size operator within an instruction for accessing a variable; therefore:</p><pre class="fasm"><code>mov byte [variable],10
label variable word
dw 0</code></pre><p>will access a BYTE, while</p><pre class="fasm"><code>mov [variable],10</code></pre>will access a WORD.</div><p>I think you noticed that having two lines to define one variable is too much. There is a shorter way to define variables:</p><pre class="fasm"><code>variable1 db 100</code></pre><p>which is the same as</p><pre class="fasm"><code>label variable1 byte
db 100</code></pre><p>Notice that the size of the variable is defined too. In general, if a data definition (using the <code>db</code> or <code>dw</code> directive) is preceded by a label, then it will define this label too, and assign to the label the same size as the defined data. It can be used with words too:</p><pre class="fasm"><code>variable2 dw 100</code></pre><p>An example of variables usage:</p><pre class="fasm"><code>org 256
mov ah,2
mov dl,[character_to_write]
int 21h
int 20h
character_to_write db 'a'</code></pre><h2 id="addresses-and-basics-of-segmentation">3.3. Addresses and basics of segmentation</h2><p>Now we will discuss addresses a little further. I told you that an address is a number (<strong>!</strong>) which refers to a position in memory. You’ve learnt how to represent this number with labels, so that their numeric addresses are managed by the compiler. But you still don’t know anything about the format of this number. I will try to explain it a little in this chapter.</p><p>As you probably know, data in memory are stored in “<strong>bits</strong>” which can have value <code>0</code> or <code>1</code>. You can consider memory as a (one dimensional) array of bits. 8 consecutive bits make one <strong>byte</strong>. An address is the number (index, position in array) of a byte. For example address “<code>0</code>” is the address of the first bit of memory (or address of the first byte), address “<code>1</code>” is the address of the ninth bit (or address of the second byte) of memory, etc. The easiest way to comprehend it is to think of memory as a (one dimensional) array of bytes.</p><p>Addresses in <code>.COM</code> files are word-sized numbers, so</p><pre class="fasm"><code>label var1
<some data>
mov al,var1</code></pre><p>is wrong. It may work if <code>var1</code> is less than 256 (so it fits into a byte sized register), but as a general rule store addresses in word-sized variables; we’ll talk about them later on.</p><p>Now, some address examples. Check this file:</p><pre class="fasm"><code>label variable1
db 10
label variable2
db 20
label variable3
db 30</code></pre><p>here the address represented by <code>variable1</code> is <code>0</code>, whereas <code>variable2</code> stands for <code>1</code>, and <code>variable3</code> for <code>2</code>.</p><p>OK, this looks nice except that it isn’t true, at all! The problem is that usually there are multiple programs loaded in memory at the same time (operating system, mouse driver, your program, etc.). In this context the program would have to know WHERE in memory it will be loaded so it can access its variables. For this reason addresses are “<strong>relative</strong>”. It means that for each loaded program a region of memory called a “<strong>segment</strong>” is reserved. All memory addresses accessed by this program are going to be relative to the beginning of this region. So <code>[0]</code> doesn’t mean the first byte of memory, but the first byte of the segment.</p><div class="alert alert-info"><p><strong>segment</strong></p>A consecutive region of memory reserved for a program.</div><p>How does this work? The processor has a few special registers (segment registers) which hold the address of the segment (ie: the address of the first byte of the segment). Every time you access memory in your program the content of a segment register is added to the address you provided; therefore <code>mov al,[0]</code> accesses the first byte of your segment.</p><div class="alert"><strong>NOTE:</strong> I have told you that memory addresses in <code>.COM</code> programs are words. That means they can be in the range 0 to 65535. Therefore the maximal size of a segment is 65536 bytes. This can be “tricked” by changing the content of segment registers, but don’t bother with this right now.</div>
<div class="alert"><strong>NOTE:</strong> A segment is a region in memory. But the term “segment” is often used to indicate the beginning address of this region. Sad but true.</div><p>So an absolute address in memory has two parts: the segment (exactly: the address of the segment’s beginning) and a word-sized value called “<strong>offset</strong>”, which is the address relative to the segment’s beginning.</p><div class="alert alert-info"><p><strong>offset</strong></p>An address relative to a segment, or address “inside” a segment.<br />
(the first definition is more exact, but the second is easier to comprehend)</div>
<div class="alert alert-warn"><strong>IMPORTANT:</strong> I stated that labels represent the address of a variable. As a matter of fact, labels in FASM represent the offset of a variable. This is why FASM is called “<strong>flat</strong>” assembler; you’ll understand this later on (much much later <strong>:)</strong>).</div><p>I won’t get deeper into segment registers, or into how a segment’s beginning address is stored in them (there IS a difference). Right now, take segment registers as some kind of black box that works even if we ignore how.</p><h2 id="the-org-directive-explained">3.4. The ‘org’ directive explained</h2><p>When your program is loaded, it often needs some external info from the program that launched it. The best example is command line arguments; or it may need to know WHO launched it; etc. This information must, of course, be stored in the same segment as the program. In <code>.COM</code> files this data (passed to your program by the program that launched it) is stored in the first 256 bytes of the segment. Therefore, your program is loaded from offset 256 onward.</p><div class="alert"><strong>NOTE:</strong> The 256-byte structure at the beginning of a <code>.COM</code> program’s segment is called “<strong>PSP</strong>”, which stands for “<strong>program segment prefix</strong>”</div><p>Now imagine this <code>.COM</code> program:</p><pre class="fasm"><code>mov al,[variable1]
int 20h
variable1 db 0</code></pre><p>(notice: no <code>org 256</code> directive). Instruction <code>mov al,[variable1]</code> takes up 3 bytes, <code>int 20h</code> takes up 2 bytes, therefore <code>variable1</code> will stand for offset 5. Therefore instruction <code>mov al,[variable1]</code> is <code>mov al,[5]</code>. So this instruction accesses the 6th byte of the segment (the first byte is at offset 0). But I already told you that the first 256 bytes of the segment store some information, and that your program is loaded beyond them, from offset 256 onward. So you don’t want <code>variable1</code> to be 5, you want it to be 256+5. And this is what the <code>org</code> directive does: it sets the “<strong>origin</strong>” of the file’s addresses. <code>org 256</code> tells FASM to add 256 to the offset held by every label defined beyond this directive (before the next <code>org</code> directive). And this is exactly what we want in <code>.COM</code> files.</p><p>Therefore the previous code example won’t access the variable you want, it will access something in the PSP (first 256 bytes of the segment). To make it work properly use:</p><pre class="fasm"><code>org 256
mov al,[variable1]
int 20h
variable1 db 0</code></pre><div class="alert"><strong>NOTE:</strong> <code>org</code> affects labels at definition-time (for example at <code>label variable byte</code> or <code>variable db 0</code>), not when they are used (like at <code>mov ax,[variable]</code>). That means that if you change the addresses’ “origin” via the <code>org</code> directive after defining some label, that label will still hold the same value before and beyond the <code>org</code> directive.</div><p>I won’t tell you anything about the data contained in the PSP, you don’t have to worry about it for now.</p><h1 id="endian-encodings-and-word-registers">4. Endian Encodings and Word Registers</h1><h2 id="endian-encodings">4.1. Endian encodings</h2><p>We should already have a precise idea about byte variables. You already know they are 8 bits wide (not so important now) and that they can contain a numeric value ranging from <strong>0</strong> to <strong>255</strong>. Regarding word variables, you know that they are 16 bits wide and they contain a value ranging from 0 to 65535.</p><!-- NOTE: THIS PART WAS REVISED TO MAKE IT CLEARER --><p>As you may have noticed, a word has the same size as two bytes. Now let’s deal with how values are stored in two bytes. Both bytes can contain a value ranging from <strong>0</strong> to <strong>255</strong>. From their combination we get <strong>256*256</strong>, that is <strong>65536</strong> possible values. But how is a value actually stored in two bytes?</p><p>Let’s say one of the bytes (<strong>byte#1</strong>) holds value 0. The other byte (<strong>byte#2</strong>) can hold a value from 0 to 255. In this case we can store numbers ranging from 0 to 255 in our word. Now let’s suppose that <strong>byte#1</strong> holds 1; we can store in the other byte a value 0-255, which gives us numbers 256 to 511. When <strong>byte#1</strong> contains 2, we can store 256 other possible values in the other byte, which gives us numbers 512 to 767; and so on.
In total, we have 256*256 combinations which, as I said, amounts to 65536.</p><p>It is like with decimal numbers: every digit is a value 0 to 9, and the “true” value of a digit depends on its position. The last digit holds value 0 to 9, the previous one holds 10*(0 to 9), the one before that 100*(0 to 9), and so on.</p><p>It’s the same with words: one of the two bytes holds value 0 to 255, the other one holds value 256*(0 to 255). The byte holding 0..255 is called “<strong>low order byte</strong>”, the other one (holding 256*(0..255)) is called “<strong>high order byte</strong>”.</p><div class="alert alert-info">terms: <strong>low order byte</strong>, <strong>high order byte</strong></div><p>Examples (word value = high order byte : low order byte)</p><pre class="nohighlight"><code> WORD | HOB | LOB |
-------------------
0 = 0 : 0
1 = 0 : 1
255 = 0 : 255
256 = 1 : 0
257 = 1 : 1
511 = 1 : 255
512 = 2 : 0
513 = 2 : 1 ( 513 / 256 = 2 | 513 mod 256 = 1 )
65535 = 255 : 255 ( 65535 / 256 = 255 | 65535 mod 256 = 255 )</code></pre><p>One last problem remains: the order of these bytes (ie: which comes first, the low order byte or the high order byte?). This is handled differently on different computers. On IBM PCs (and compatibles) the low order byte comes first, and the high order byte second. For example, with:</p><pre class="fasm"><code>label variable
dw 0</code></pre><p>then <code>byte [variable]</code> is the low order byte, and <code>byte [variable + 1]</code> is the high order byte. (The <code>+ 1</code> addition to <code>variable</code>’s offset is carried out by the compiler: the value of <code>variable</code> is constant, so <code>variable + 1</code> is constant as well. It means the next byte beyond <code>variable</code>’s offset. I think this should be clear enough to need no further explanation).</p><div class="alert"><strong>NOTE:</strong> When the low order byte comes first, then it’s called “<strong>little endian encoding</strong>”; when it’s the high order byte that comes first, then it’s called “<strong>big endian encoding</strong>”. But these terms are not important, especially for a beginner ASM coder.</div><h2 id="word-registers">4.2. Word registers</h2><p>Besides the byte registers (like <code>al</code>, <code>ah</code>, <code>dl</code>…) the processor also has some word registers, of course. As you know, a word is a combination of two bytes, and it’s the same with registers. Word registers are a combination of byte registers. The first word registers we’ll learn are <code>ax</code>, <code>bx</code>, <code>cx</code> and <code>dx</code>.</p><p><code>ax</code> is the combination of <code>al</code> and <code>ah</code>, where <code>al</code> is the low order byte, and <code>ah</code> the high order byte. The same goes for the rest: <code>bx</code> = <code>bh:bl</code>, <code>cx</code> = <code>ch:cl</code>, <code>dx</code> = <code>dh:dl</code>.</p><p>If you were to “<strong>emulate</strong>” a hypothetical register <code>ex</code> in memory it would be:</p><pre class="fasm"><code>label ex word
el db 0
eh db 0</code></pre><p><code>el</code> would be the low order byte, so it comes first.</p><div class="alert alert-info"><p>terms: <strong>word register</strong></p>word registers: <code>ax</code>, <code>bx</code>, <code>cx</code>, <code>dx</code></div>
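<p>A small sketch of how this emulated register could be used (the values come from the table in Chapter 4.1; <code>ex</code>, <code>el</code> and <code>eh</code> are just labels, not real registers):</p><pre class="fasm"><code>org 256
mov [el],1    ; low order byte  = 1
mov [eh],2    ; high order byte = 2
mov ax,[ex]   ; reads both bytes as one word: ax = 2*256 + 1 = 513
int 20h
label ex word
el db 0
eh db 0</code></pre><p>After the third instruction <code>al</code> should hold 1 and <code>ah</code> should hold 2, which matches the <code>ax</code> = <code>ah:al</code> relation.</p>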
<div class="alert"><strong>NOTE:</strong> The letters <code>a</code>, <code>b</code>, <code>c</code> and <code>d</code> stand for “<strong>accumulator</strong>”, “<strong>base</strong>”, “<strong>counter</strong>” and “<strong>data</strong>”; this has nothing to do with alphabetical order. The real order of these registers is <code>ax</code>, <code>cx</code>, <code>dx</code>, <code>bx</code>; but it is not important until you want to generate/change machine code yourself.</div><p>Now, if you want to set the value in register <code>ax</code> to 52 you use:</p><pre class="fasm"><code>mov ax,52</code></pre><p>but you could also use:</p><pre class="fasm"><code>mov al,52
mov ah,0</code></pre><p>To set <code>dx</code> to 12345:</p><pre class="fasm"><code>mov dx,12345</code></pre><p>but it could also be done (no reason to do it this way in real coding, this is just to demonstrate word to byte:byte relations):</p><pre class="fasm"><code>mov dh,48
mov dl,57</code></pre><p>because 48 is equal to “12345 / 256” (rounded down), and 57 is “12345 modulo 256” (modulo is the remainder after division).</p><div class="alert"><p><strong>NOTE:</strong> You know that the instruction operand can be a number (numeric constant), like “<code>0</code>”, “<code>256</code>”, “<code>12345</code>” etc. But every assembler I know of allows you to place an expression as operand. During compilation, the value of the expression is evaluated and the expression is “replaced” by its result. So <code>mov dx,(1 + 5)</code> is the same as <code>mov dx,6</code>. Therefore the previous code example could be better written as</p><pre class="fasm"><code>mov dh,12345 / 256
mov dl,12345 mod 256</code></pre>(“<code>/</code>” is the division operator, <code>mod</code> (modulo) is the operator which returns the remainder of a division. You don’t have to know these operators right now, but you should already know something about expressions).</div><p>The processor also has other word registers: <code>sp</code>, <code>bp</code>, <code>si</code>, <code>di</code>. But you can’t directly access the byte parts of these registers, you must access the whole word. This is a limitation of the processor, so there’s nothing you can do about it. For example, if you want to set the high order byte of <code>si</code> to 17 you must do it this way:</p><pre class="fasm"><code>mov ax,si
mov ah,17
mov si,ax</code></pre><p>So first you copy the value of <code>si</code> to <code>ax</code> (note that this overwrites whatever was in <code>ax</code>). The high order byte of <code>ax</code> can be accessed directly (it’s the <code>ah</code> register), so you set it to <code>17</code>. The low order byte of the word remains unchanged. Then you copy back the value from <code>ax</code> to <code>si</code>. Now the word’s high order byte has been changed to <code>17</code>, while its low order byte remains unchanged.</p><div class="alert"><strong>NOTE:</strong> Register <code>sp</code> always has a special function; <code>bp</code> usually has a special function (in code generated by most (all?) non-assembly compilers). Registers <code>si</code> and <code>di</code> can be used whenever you want. This means you shouldn’t change <code>sp</code> and <code>bp</code> unless you know what you are doing.</div><h2 id="string-output-using-int-21hah9">4.3. String output using int 21h/ah=9</h2><p>This should belong to <a href="#labels-addresses-and-variables"><strong>Chapter 3</strong></a>, about addresses, but you need to know the <code>dx</code> register which is explained here.</p><p>Here we will talk about another usage of <code>int 21h</code>. You should already know that when <code>ah</code> contains 2 then <code>int 21h</code> prints the character stored in <code>dl</code>. But if we wanted to display a long text we would have to set <code>dl</code> for every character, and this would be a bad method. Wouldn’t it be better if we just stored the string we want to display somewhere in the file (like we did in <a href="#getting-started"><strong>Chapter 1</strong></a>) and then just displayed it from there?</p><p>For this we can use <code>int 21h</code> with value <code>9</code> in <code>ah</code> and the string’s address in the <code>dx</code> register. Something like:</p><pre class="fasm"><code>mov ah,9
mov dx,address_of_string
int 21h</code></pre><p>But another problem pops up: how to determine the length of the string, ie: the number of characters to display from the given address. There are different methods to achieve this; we will talk about the simplest one, the one used by <code>int 21h/ah=9</code>. It relies on a special character, which is reserved as the end-of-string marker. With <code>int 21h/ah=9</code>, it’s the “<code>$</code>” character. So, to store the string “<code>Hello World</code>”, you define “<code>Hello World$</code>”, where “<code>$</code>” means end of string. Example of displaying a string:</p><pre class="fasm"><code>org 256
mov ah,9
mov dx,text_to_display
int 21h
int 20h
label text_to_display
db 'Hello World$'</code></pre><p>This program will print “<code>Hello World</code>”.</p><p>This method of marking the end of a string has a limitation: you can’t print the “<code>$</code>” character. For example:</p><pre class="fasm"><code>org 256
mov ah,9
mov dx,text_to_display
int 21h
int 20h
label text_to_display
db 'It costed 50$, maybe more$'</code></pre><p>will of course print only “<code>It costed 50</code>”. This can be worked around this way:</p><pre class="fasm"><code>org 256
mov ah,9
mov dx,text1
int 21h
mov ah,2
mov dl,'$'
int 21h
mov ah,9
mov dx,text2
int 21h
int 20h
label text1
db 'It costed 50$'
label text2
db ', maybe more$'</code></pre><p>The first part (first <code>int 21h</code>) will print “<code>It costed 50</code>”, then <code>int 21h/ah=2</code> will print “<code>$</code>” and the second <code>int 21h/ah=9</code> will print “<code>, maybe more</code>”. We won’t deal any further with this limitation for now; this was just to round out the explanation.</p><p>Let’s now take a closer look at <code>int 21h/ah=9</code>. As you might have realized already, it will print every character (exactly: it treats every byte as the ASCII code of one character) from the address contained in <code>dx</code> until the first “<code>$</code>” character after the address in <code>dx</code>.</p><div class="alert"><p><strong>NOTE:</strong> Some of the ASCII codes 0 to 31 have a special meaning for <code>int 21h/ah=9</code>. These codes have characters assigned to them (smiling faces, diamonds etc.) but <code>int 21h/ah=9</code> doesn’t print them; it does something else instead. For example, the character with ASCII code 7 will produce a short beep. Try this:</p><pre class="fasm"><code>org 256
mov ah,9
mov dx,text
int 21h
int 20h
label text
db 'Beep',7,'$'</code></pre>It should print “<code>Beep</code>” and then beep.</div><p>Other common values are 13 and 10: <code>13</code> causes the cursor to return to the first column of the current row; <code>10</code> causes the cursor to move down one row (if the bottom of the screen is reached, then the screen is scrolled). So a combination of these two causes the cursor to move to the first column of the next row. These two should (but don’t always) work in any order, but you should always place <code>13</code> first. Together, these two characters are often called EOL (end of line). Try this example:</p><pre class="fasm"><code>org 256
mov ah,9
mov dx,text
int 21h
int 20h
label text
db 'Line 1',13,10,'Line 2$'</code></pre><p>it should print:</p><pre class="nohighlight"><code>Line 1
Line 2</code></pre><div class="alert"><strong>NOTE:</strong> ASCII code 13 is called “CR” (carriage return) and code 10 is called “LF” (line feed).</div><p>Another example on addresses (previous chapter), but with word registers. Check by yourself whether you understood <a href="#labels-addresses-and-variables"><strong>Chapter 3</strong></a>:</p><pre class="fasm"><code>org 256
mov ah,9
mov dx,[address_of_text]
int 21h
int 20h
text db 'Hello World$'
address_of_text dw text</code></pre><p>Here we load the <code>dx</code> register with the contents of the <code>address_of_text</code> variable, which holds value <code>text</code>, and (as we already know) <code>text</code> is a placeholder for the offset of the ‘<code>Hello World$</code>’ string. Thus the word-sized variable <code>address_of_text</code> holds the offset of that string. Therefore, loading <code>dx</code> with the contents of <code>address_of_text</code> will load it with the offset of the string we want to print. I hope you got it.</p><h1 id="jumps-and-branching">5. Jumps and Branching</h1><p>You should know a little about how instructions are processed by the processor. It fetches an instruction in machine code, executes it, and then moves to the next instruction. This is repeated until instruction <code>int 20h</code> is reached. In this chapter we will learn something about instructions which change this behaviour.</p><h2 id="instruction-pointer">5.1. Instruction pointer</h2><p>The processor loads the first instruction (it determines the number of bytes the instruction consists of), executes it and then moves to the next instruction. But how does this mechanism work? The processor has a special word register “<code>ip</code>” which holds the address of the instruction currently executed. After the instruction is executed, the processor adds its size to “<code>ip</code>” and executes the instruction located at the (new) address in “<code>ip</code>”. This mechanism works like this:</p><ul><li>Loop:<ul><li>Execute instruction at <code>ip</code></li><li><code>size</code> = size of instruction at <code>ip</code></li><li><code>ip</code> = <code>ip</code> + <code>size</code></li></ul></li><li>Until instruction <code>int 20h</code> is found</li></ul><div class="alert"><strong>NOTE:</strong> As with other pointers, “<code>ip</code>” doesn’t hold the full address of the instruction, just the offset part. But we shouldn’t worry about this right now.</div>
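<p>To make the loop above more concrete, here is a sketch of a short program annotated with the value of <code>ip</code> at each step. The offsets assume the program starts at offset 256 and that each of these instructions is encoded in 2 bytes, which is the case for their usual 16-bit encodings:</p><pre class="fasm"><code>org 256
mov ah,2     ; ip = 256, size 2, so ip becomes 258
mov dl,'a'   ; ip = 258, size 2, so ip becomes 260
int 21h      ; ip = 260, size 2, so ip becomes 262
int 20h      ; ip = 262, the program ends here</code></pre>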
<div class="alert"><strong>NOTE:</strong> “<code>ip</code>” stands for “<strong>instruction pointer</strong>”.</div><h2 id="jumps">5.2. Jumps</h2><p>The <code>ip</code> register is not like the other registers (<code>ax</code>, <code>ah</code>, <code>bp</code>, …). Its contents can’t be changed using the <code>mov</code> instruction. <code>mov ip,5</code> doesn’t work. But there is a special instruction which can change the value of the <code>ip</code> register: it’s the <code>jmp</code> instruction (“<code>jmp</code>” = “<strong>jump</strong>”). This instruction has one operand, the new address for the <code>ip</code> register. So <code>jmp 5</code> has an effect like <code>mov ip,5</code> would if it was an instruction. Example:</p><pre class="fasm"><code>org 256
jmp Start
text db 'Text to output$'
Start:
mov ah,9
mov dx,text
int 21h
int 20h</code></pre><p>The first instruction sets the value of <code>ip</code> to the address of the <code>mov ah,9</code> instruction (its address is held in label <code>Start</code>). Thus the processor won’t try to execute the bytes defined by the “<code>Text to output</code>” string and this program will work.</p><div class="alert"><strong>NOTE:</strong> Of course, when <code>ip</code> is changed by the <code>jmp</code> instruction, then the size of this instruction is NOT added to it.</div><h2 id="comparing-and-conditonal-jumps">5.3. Comparing and conditional jumps</h2><p>If you can write code in any language you should already know about branching, ie: conditional execution of some parts of code. For example, suppose you want a value not greater than 10 in <code>al</code>. If the value in the <code>al</code> register is > 10, you will set <code>al</code> to 10. This is branching: if some condition is true then something is executed, otherwise it is not executed. The assembly implementation of this mechanism is that when a condition is false you jump over the conditional code, and when the condition is true you just continue the execution. It is as if this C code:</p><pre class="c"><code>if (condition)
ConditionalCode(); // this can be any C code, not just function call</code></pre><p>would be writen this way:</p><pre class="c"><code>if (!condition) goto LabelAfterConditionalCode;
ConditionalCode(); // this can be any C code, not just function call
LabelAfterConditionalCode:</code></pre><p>The first problem is how to decide whether a condition is true. In assembly, there is an instruction which can compare two operands. It is <code>cmp</code>. Its operands follow the same rules as <code>mov</code>’s operands (almost every instruction follows these or very similar rules). Some examples of comparing:</p><pre class="fasm"><code>cmp ax,bx ; compare value of "ax" to value of "bx"
cmp al,byte [SomeLabel] ; compare value of "al" to byte at SomeLabel
cmp ax,5 ; compare value of "ax" to 5
cmp ax,al ; wrong, operands have different size</code></pre><p>This instruction checks whether the first operand is equal to, greater than or less than the second one.</p><div class="alert alert-info">instruction: <strong>cmp</strong></div><p>OK, we can compare, but how are the results of comparison stored? The CPU (the processor) has a special register called “<strong>flags register</strong>” in which it stores results of comparison (and some other things too). This register (like <code>ip</code>) can’t be accessed with <code>mov</code> or similar instructions; its value is set by the <code>cmp</code> instruction. Right now you don’t have to bother HOW the result of comparison is stored in this register; you would need to understand bit arithmetics for that.</p><div class="alert alert-info">register: <strong>flags</strong></div><p>OK, we can compare, and we know that the result is stored in <code>flags</code>. The only thing we need now is the conditional jump itself. A conditional jump is a jump which is taken only when a condition you specified is true (in the flags register). It will be best explained with an example. We compare <code>ax</code> to <code>bx</code> (<code>cmp ax,bx</code>). A conditional jump can jump if <code>ax</code> < <code>bx</code>, or when <code>ax</code> = <code>bx</code>, or when <code>ax</code> >= <code>bx</code> etc.
These jumps are (op1 is first operand of <code>cmp</code>, op2 is second):</p><ul><li><code>je</code> — jump if op1 = op2 (op1 is “<strong>equal to</strong>” op2)</li><li><code>ja</code> — jump if op1 > op2 (op1 is “<strong>greater than</strong>” op2)</li><li><code>jb</code> — jump if op1 < op2 (op1 is “<strong>less than</strong>” op2)</li><li><code>jae</code> — jump if op1 >= op2 (op1 is “<strong>greater than or equal to</strong>” op2)</li><li><code>jbe</code> — jump if op1 <= op2 (op1 is “<strong>less than or equal to</strong>” op2)</li></ul><p>Example code (don’t try to compile it, it is not a <code>.COM</code> executable, it’s just a snippet of code):</p><pre class="fasm"><code>cmp ax,10
jbe AX_lesser_than_10
mov ax,10
AX_lesser_than_10:</code></pre><p>this piece of code will check whether the value in <code>ax</code> is less than or equal to 10, and if not (if the value in <code>ax</code> is greater than 10) it will set <code>ax</code> to 10. The corresponding C code is:</p><pre class="c"><code>if (ax > 10) ax=10;</code></pre><p>or, more similar to our assembly version:</p><pre class="c"><code>if (ax <= 10) goto AX_lesser_than_10;
ax=10;
AX_lesser_than_10:</code></pre><p>Another example: get maximum of <code>{ax,bx}</code> and store it in <code>ax</code>:</p><pre class="fasm"><code>cmp ax,bx
jae AX_already_contains_greater_value
mov ax,bx
AX_already_contains_greater_value:</code></pre><p>So we compare <code>ax</code> to <code>bx</code>; if it is greater or equal then it already contains the greater value, so we don’t need to change anything. If <code>ax</code> is less than <code>bx</code> then we must move <code>bx</code>’s value (=greater value) to <code>ax</code>.</p><p>A more complicated version: store maximum of <code>{ax,bx}</code> in <code>cx</code>:</p><pre class="fasm"><code>cmp ax,bx
ja AX_bigger
mov cx,bx
jmp done
AX_bigger:
mov cx,ax
done:</code></pre><p>here we compare <code>ax</code> to <code>bx</code>, then if <code>ax</code> is not greater than <code>bx</code> the jump won’t take place and we continue with <code>mov cx,bx</code>, which stores the greater value in <code>cx</code>, as desired, and then <code>jmp done</code> skips the instructions used in case <code>ax</code> is greater. Otherwise, if <code>ax</code> is greater than <code>bx</code>, then <code>ja AX_bigger</code> takes place, so the next instruction is <code>mov cx,ax</code> which moves the greater value (ie: the one in <code>ax</code>) to <code>cx</code>. As you can see, the code was divided into two “branches”: one for <code>ax</code>><code>bx</code>, the other for <code>ax</code><=<code>bx</code>. Finally, both branches reach the instruction beyond <code>done:</code>, and at this point <code>cx</code> always holds the greater value. By the way, there could be <code>jae</code> instead of <code>ja</code>, because for the case when <code>ax</code>=<code>bx</code> both branches have the same effect.</p><div class="alert alert-info">instructions: <strong>je, ja, jb, jae, jbe</strong></div><p>But what can we do if we want to jump when operands are NOT equal? We could do something like this:</p><pre class="fasm"><code>cmp ax,bx
je Same
jmp NotSame
Same:
...
NotSame:</code></pre><p>but this is not needed because there are instructions which jump when the condition is false. These are <code>jne</code>, <code>jna</code>, <code>jnb</code>, <code>jnae</code> and <code>jnbe</code>. Instruction <code>jne</code> jumps when operands are not equal, <code>jna</code> when first operand is not greater than second operand, etc. Therefore:</p><pre class="fasm"><code>cmp ax,bx
jne NotSame
...
NotSame:</code></pre><p>where the <code>...</code> part is executed only if the value in <code>ax</code> is equal to the value in <code>bx</code>.</p><div class="alert"><strong>NOTE:</strong> <code>jna</code> is the same as <code>jbe</code>, <code>jnb</code> is the same as <code>jae</code>, <code>jb</code> is the same as <code>jnae</code>, and <code>ja</code> is the same as <code>jnbe</code>.</div>
<div class="alert alert-info">instructions: <strong>jne, jna, jnb, jnae, jnbe</strong></div>
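<p>As a small illustration of these negated jumps (only a sketch: the label <code>in_range</code> and the limit 100 are made up for this example), here is one way to limit the value in <code>ax</code> to at most 100 using <code>jna</code>:</p><pre class="fasm"><code>cmp ax,100
jna in_range    ;ax is not above 100 (jna is the same as jbe), nothing to do
mov ax,100      ;ax was above 100, so replace it with the limit
in_range:       ;here ax is never greater than 100</code></pre><p>Only one label is needed, because the jump simply skips the one instruction that should run only when <code>ax</code> is above 100.</p>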
<div class="alert alert-warn"><strong>IMPORTANT:</strong> Many instructions change the <code>flags</code> register, not just <code>cmp</code>, so conditional jumps should come right after <code>cmp</code>, with no instructions between them.</div><h1 id="bit-arithmetics">6. Bit Arithmetics</h1><p>This is what most tutorials usually start with. After reading this you will be confused, it’s normal. You’ll master this through practice. Return to this chapter whenever needed. So let’s go.</p><h2 id="encoding-numbers-in-bits">6.1. Encoding numbers in bits</h2><p>You know that computers use “<strong>bits</strong>”, which are variables that can contain one of two possible values: <code>0</code> or <code>1</code>. When a bit’s value is <code>0</code>, we say that it’s “<strong>clear</strong>”; when it’s <code>1</code>, we say that it’s “<strong>set</strong>”.</p><div class="alert alert-info"><p>terms:</p><p><strong>bit</strong>: a variable containing <code>0</code> or <code>1</code>.</p><p><strong>clear bit</strong>: a bit containing <code>0</code>.</p><strong>set bit</strong>: a bit containing <code>1</code>.</div><p>Now, how can we store a number in these bits? It’s similar to storing a word in two bytes (<a href="#word-registers"><strong>Chapter 4.2</strong></a>, re-read it). One bit contains a <code>0</code> or <code>1</code> value, therefore a number that consists of just one bit can only contain values 0 and 1. When we add another bit, we can still store 0 and 1 in the first bit, but we have another bit which now can hold 2*(0 or 1). A further bit holds 4*(0 or 1), and then 8*(0 or 1), etc.</p><p>Like I said before, a byte consists of 8 bits. So it can hold a value of:</p><pre class="nohighlight"><code>1*(0 or 1) + 2*(0 or 1) + 4*(0 or 1) + 8*(0 or 1) + 16*(0 or 1) + 32*(0 or 1) + 64*(0 or 1) + 128*(0 or 1)</code></pre><p>which is a value ranging from 0 (when all bits are <code>0</code>) to <code>1+2+4+8+16+32+64+128</code> = 255 (when all bits are <code>1</code>). 
Can you see it?</p><p>It is similar with a word, except you have 16 bits instead of 8; check it yourself if you wish.</p><p>Now some terms: the bit which holds 1*(0 or 1) is <strong>bit#0</strong>; the next one, which holds 2*(0 or 1), is <strong>bit#1</strong>; and so on until <strong>bit#7</strong>, which holds 128*(0 or 1). So bits are enumerated starting from 0, not from 1 as you might expect. Bit#0 is called the “<strong>low order bit</strong>”, the highest bit (which holds the greatest value) is the “<strong>high order bit</strong>”. For example, the high order bit of a byte value is bit#7, and the high order bit of a word value is bit#15.</p><div class="alert alert-info">terms: <strong>low-order bit</strong>, <strong>high-order bit</strong></div>
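<p>As a worked example of this encoding (the particular bits are picked arbitrarily), take a byte in which only bit#0, bit#2 and bit#7 are set. Its value is the sum of what those bits hold:</p><pre class="nohighlight"><code>1*1 + 2*0 + 4*1 + 8*0 + 16*0 + 32*0 + 64*0 + 128*1 = 1 + 4 + 128 = 133</code></pre>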
<div class="alert alert-warn"><strong>IMPORTANT</strong>: Bits are enumerated starting from 0, not from 1, so the first bit is bit#0.</div><p>A number encoded this way (in bits) is called a “<strong>binary number</strong>”.</p><h2 id="binary-constants">6.2. Binary constants</h2><p>You have been using numeric constants before, probably without realizing you were using them. These numeric constants were just numbers you wrote in a source file which was assembled into a binary file. Examples of numeric constants are: “<code>0</code>”, “<code>50</code>”, “<code>-100</code>”, “<code>123456</code>”.</p><p>You used them here:</p><pre class="fasm"><code>db 5
mov al,20
cmp ax,5
db 'Some string',0
org 256</code></pre><p>These numbers were decimal numbers, the type which is normally used by people. The assembler then translated them to binary form. But sometimes you want to specify numbers directly in binary format. Of course you don’t have to manually translate them to decimal, you can specify them directly in binary. Here are some examples of binary numbers: <code>0</code>, <code>101011</code>, <code>1101011</code>, <code>11111</code>, etc. To distinguish them from decimal numbers, every binary number must end with the “<code>b</code>” character, therefore: “<code>0b</code>”, “<code>101011b</code>”, “<code>1101011b</code>”, “<code>11111b</code>” etc. Here the first binary digit (the first bit, the first <code>0</code> or <code>1</code>) is the high-order bit, and the last one is the low-order bit. So if you write “<code>1101</code>”, then bit#0 = 1, bit#1 = 0, bit#2 =1, bit#3 = 1.</p><p>Example table:</p><table><thead><tr class="header"><th style="text-align: left;">decimal</th><th style="text-align: left;">binary</th></tr></thead><tbody><tr class="odd"><td style="text-align: left;">0</td><td style="text-align: left;"><code>0b</code></td></tr><tr class="even"><td style="text-align: left;">1</td><td style="text-align: left;"><code>1b</code></td></tr><tr class="odd"><td style="text-align: left;">2</td><td style="text-align: left;"><code>10b</code></td></tr><tr class="even"><td style="text-align: left;">3</td><td style="text-align: left;"><code>11b</code></td></tr><tr class="odd"><td style="text-align: left;">4</td><td style="text-align: left;"><code>100b</code></td></tr><tr class="even"><td style="text-align: left;">5</td><td style="text-align: left;"><code>101b</code></td></tr><tr class="odd"><td style="text-align: left;">6</td><td style="text-align: left;"><code>110b</code></td></tr><tr class="even"><td style="text-align: left;">7</td><td style="text-align: left;"><code>111b</code></td></tr><tr class="odd"><td style="text-align: left;">10</td><td 
style="text-align: left;"><code>1010b</code></td></tr><tr class="even"><td style="text-align: left;">15</td><td style="text-align: left;"><code>1111b</code></td></tr><tr class="odd"><td style="text-align: left;">16</td><td style="text-align: left;"><code>10000b</code></td></tr></tbody></table><p>I could teach you a way to translate numbers between decimal and binary forms, but you won’t need it just now anyway, and plenty of other tutorials are full of such information.</p><p>Binary numeric constants are just another way to express some number. Writing “<code>5</code>” is the same as writing “<code>101b</code>”. For example, this will work too:</p><pre class="fasm"><code>org 100000000b
mov ah,1001b
mov dx,string
int 21h
int 20h
string db 'Hello world written using binary constants$'</code></pre><p><code>org 100000000b</code> is the same as <code>org 256</code>, and <code>mov ah,1001b</code> is the same as <code>mov ah,9</code>. (DOS function 9 prints the string up to a <code>$</code> character, so the string must end with one.)</p><h2 id="bit-operations">6.3. Bit operations</h2><p>Now let’s think about what we can do with a bit (which can hold a <code>0</code> or <code>1</code> value).</p><p>First, we can “<strong>set</strong>” it (set its value to <code>1</code>) or “<strong>clear</strong>” it (set its value to <code>0</code>). Then we can “<strong>flip</strong>” its value (from <code>0</code> to <code>1</code>, from <code>1</code> to <code>0</code>); this operation is also called “<strong>bit complement</strong>” (<code>0</code> is the complement of <code>1</code>, and <code>1</code> is the complement of <code>0</code>). And that is probably all.</p><p>Now, what can we do with 2 bits? You can think of bits as boolean values, which can be either true (<code>1</code>) or false (<code>0</code>). Now, what operations can we perform on boolean values? If you programmed before you’ll probably know the answer.</p><p>First of all, there is <code>and</code> (like “<code>a and b</code>” where “<code>a</code>” and “<code>b</code>” are boolean values). When we have two boolean values, the result of <code>and</code>ing them is true only when they are both true, otherwise the result is false. (See the Table below.)</p><p>Then comes <code>or</code>. As you know, the result of <code>or</code>ing two values is true when at least one of them is true. And finally, less well known, there is <code>xor</code>, which means “<strong>exclusive or</strong>” (the previous one was “<strong>inclusive or</strong>”, but everyone calls it just “<strong>or</strong>”). 
The result of <code>xor</code>ing is <code>1</code> when one operand is <code>1</code> and the other is <code>0</code>.</p><p>Here is the Table:</p><table><thead><tr class="header"><th style="text-align: center;">A</th><th style="text-align: center;">B</th><th style="text-align: center;">A and B</th><th style="text-align: center;">A or B</th><th style="text-align: center;">A xor B</th></tr></thead><tbody><tr class="odd"><td style="text-align: center;">0</td><td style="text-align: center;">0</td><td style="text-align: center;">0</td><td style="text-align: center;">0</td><td style="text-align: center;">0</td></tr><tr class="even"><td style="text-align: center;">0</td><td style="text-align: center;">1</td><td style="text-align: center;">0</td><td style="text-align: center;">1</td><td style="text-align: center;">1</td></tr><tr class="odd"><td style="text-align: center;">1</td><td style="text-align: center;">0</td><td style="text-align: center;">0</td><td style="text-align: center;">1</td><td style="text-align: center;">1</td></tr><tr class="even"><td style="text-align: center;">1</td><td style="text-align: center;">1</td><td style="text-align: center;">1</td><td style="text-align: center;">1</td><td style="text-align: center;">0</td></tr></tbody></table><div class="alert"><strong>NOTE:</strong> There are 16 possible operations on two bits, but we won’t talk about all of them.</div><p>Now the interesting part: In the early days, processor designers didn’t like having lots of instructions on their processors. But as you saw, we defined 3 operations for a single bit and 3 for two bits. So they found a way to achieve the operations on a single bit by using the operations on two bits. Remember, the operations on a single bit were: setting it to <code>0</code>, setting it to <code>1</code> and flipping its value (<code>0</code>-><code>1</code>, and <code>1</code>-><code>0</code>). How?</p><p>First let’s talk about clearing a bit (setting its value to <code>0</code>). 
Note that the result of <code>and</code> is <code>0</code> whenever at least one of the operands is <code>0</code>. So if we <code>and</code> any bit (<code>0</code> or <code>1</code>) with <code>0</code> we always get <code>0</code>, and when we <code>and</code> with <code>1</code> the bit will remain unchanged. And this is what we wanted. Setting a bit (to <code>1</code>) works similarly. The result of <code>or</code>ing is <code>1</code> when at least one operand is <code>1</code>. So <code>or</code>ing any bit with <code>1</code> will always produce <code>1</code>, and <code>or</code>ing with <code>0</code> will leave the bit unchanged.</p><p>How can we flip a bit? The result of <code>xor</code>ing is <code>1</code> when one operand is <code>1</code> and the other is <code>0</code>. So <code>xor</code>ing any value with <code>1</code> will always produce that value’s complement, and <code>xor</code>ing it with <code>0</code> will leave the bit unchanged. This last one is not so obvious, so you better look at it in the Table.</p><h2 id="binary-operations-instructions">6.4. Binary operations instructions</h2><p>First of all, you know that the smallest registers we have are the 8-bit (byte) registers. Also the smallest part of memory that we can access is one byte (8 bits). For this reason, the instructions used for binary operations will operate on two 8-bit numbers instead of on two bits. What will be the result? <strong>Bit#0</strong> of the result will be the result of the operation between <strong>bit#0</strong> of the first argument and <strong>bit#0</strong> of the second argument. <strong>Bit#1</strong> of the result will be the result of the operation on <strong>bits#1</strong> of the arguments, etc. You’ll see it.</p><p>Our first operation will be an “<code>and</code>”. Example:</p><pre class="fasm"><code>mov al,00010001b
mov bl,00001001b
and al,bl</code></pre><!-- NOTE: EXAMPLE CORRECTED: "bl,00000001b" => "bl,00001001b" --><p>First we load <code>al</code> with <code>00010001b</code>, so its <strong>bits #0</strong> and <strong>#4</strong> contain <code>1</code>, the remaining bits contain <code>0</code>. Then we load <code>bl</code> with <code>00001001b</code>, so its <strong>bits #0</strong> and <strong>#3</strong> contain <code>1</code>, the others contain <code>0</code>. When we <code>and</code> <code>al</code> with <code>bl</code> (this is how asm coders usually describe it) it works as <code>al = al and bl</code> would, ie: the result of <code>and</code>ing <code>al</code> with <code>bl</code> is stored in <code>al</code>.</p><p>So what’s the final result (in <code>al</code>)? <strong>Bit#0</strong> of <code>al</code> contained <code>1</code> and was <code>and</code>ed with <code>1</code>. “<code>1 and 1</code>” is <code>1</code>, so <strong>bit#0</strong> in <code>al</code> will be <code>1</code>. <strong>Bits #1</strong> to <strong>#2</strong> and <strong>#5</strong> to <strong>#7</strong> would be “<code>0 and 0</code>” which is <code>0</code>. <strong>Bit#3</strong> would contain “<code>0 and 1</code>” which is <code>0</code> too. <strong>Bit#4</strong> will contain “<code>1 and 0</code>” which is <code>0</code> again. So the result will be <code>00000001b</code>.</p><p>A better way to write the previous code would be:</p><pre class="fasm"><code>mov al,00010001b
and al,00001001b</code></pre><p>(I used <code>bl</code> in the previous example only to simplify referencing the second number in the text).</p><p>Now, an example of <code>or</code>ing:</p><pre class="fasm"><code>mov al,00010001b
or al,00001001b</code></pre><p>… the result will be <code>00011001b</code>. (see: <code>or</code> description, in previous section).</p><p>And of <code>xor</code>ing:</p><pre class="fasm"><code>mov al,00010001b
xor al,00001001b</code></pre><p>… the result will be <code>00011000b</code>: bits <code>xor</code>ed with <code>0</code> will stay unchanged, bits <code>xor</code>ed with <code>1</code> will be flipped (to their complement).</p><div class="alert alert-info">instructions: <strong>and, or, xor</strong></div><p>These instructions take the same arguments as <code>mov</code>, ie: the first argument can be a memory variable or a register, the second one can be a memory variable, a register or a constant. <mark>Both arguments must be of the same size, and only one of the arguments can be a memory variable.</mark></p><h2 id="testing-bits">6.5. Testing bits</h2><p>If you have programmed before, you probably already know about boolean variables (occasionally called “logical”). They can hold two values: <code>TRUE</code> or <code>FALSE</code>. You can see that they can be stored in a bit quite nicely: <code>1</code> for <code>TRUE</code>, and <code>0</code> for <code>FALSE</code>.</p><div class="alert alert-info">term: <strong>boolean variable</strong></div><p>The problem here is that the smallest data directly accessible is a byte (8 bits). As you know, you can access a byte register or a byte memory variable, not a bit. It’s truly this way: there is no instruction which can access just one bit. (Of course there are some, you just don’t need to know about them right now <strong>:)</strong>)</p><p>But when you work with boolean variables you want to access just one single bit, not all 8 bits or more. There are some tricks to achieve this:</p><p>Use only one bit of the whole byte and leave the other bits cleared. Thus if you want to verify if the bit is <code>0</code>, you just check if the whole byte is equal to <code>0</code>. If it isn’t, then our bit is <code>1</code>. Example:</p><pre class="fasm"><code>cmp [byte_boolean_variable],0
je byte_boolean_variable_is_false
jnz byte_boolean_variable_is_true</code></pre><p>where <code>byte_boolean_variable</code> is a byte variable with only one bit used. When this variable is <code>0</code> then its value is <code>FALSE</code>, otherwise its value is <code>TRUE</code>.</p><p><code>byte_boolean_variable_is_***</code> are labels used for branching, as shown in a previous chapter. By the way, a better “more assembly” way to implement the previous code is:</p><pre class="fasm"><code> cmp [byte_boolean_variable],0
je byte_boolean_variable_is_false
byte_boolean_variable_is_true:
<here value is TRUE>
byte_boolean_variable_is_false:
<here value is FALSE></code></pre><p>because in the first version the <code>jnz</code> conditional jump would always take place, since that instruction is executed only when <code>je</code> didn’t take place. If you don’t understand it, read the previous chapter again.</p><p>But this approach leaves 7 bits unused, and this is a waste of space. (Not in case of a single variable, but surely so with an array of similar variables). Clearly, we can “<strong>pack</strong>” 8 boolean variables into a single byte (8 bits). The only problem is setting and reading it.</p><p>First, we’ll set all 8 bits (boolean variables) using the <code>mov</code> instruction.</p><pre class="fasm"><code>mov [eight_booleans],00000000b</code></pre><p>this would set all variables to zero (clear them). If we want to set some of them to one, we just set the bits in which they are stored.</p><pre class="fasm"><code>mov [eight_booleans],00010100b</code></pre><p>this will set variables in <strong>bits #2</strong> and <strong>#4</strong>, and leave all others clear.</p><p>Now, how to <strong>clear</strong> one bit and leave all others unchanged? We handled this before: we can do it by <code>and</code>ing:</p><pre class="fasm"><code>and [eight_booleans],11110111b</code></pre><p>… this will clear <strong>bit#3</strong> (<code>and</code>ed with <code>0</code> so the result will be <code>0</code>), and leave all other bits unchanged (<code>and</code>ed with <code>1</code> so they will remain unchanged). 
This will clear <strong>bits #3</strong> and <strong>#5</strong>:</p><pre class="fasm"><code>and [eight_booleans],11010111b</code></pre><p>All this should be clear to you if you understood <a href="#binary-operations-instructions"><strong>Chapter 6.4</strong></a>.</p><p>Now, how to <strong>set</strong> one of the variables:</p><pre class="fasm"><code>or [eight_booleans],00001000b</code></pre><p>… this sets <strong>bit#3</strong> to <code>1</code> (<code>or</code>ing with <code>1</code> always yields <code>1</code>) and leaves the others unchanged (<code>or</code>ing with <code>0</code> leaves them unchanged).</p><p>And, of course, using <code>xor</code> we can <strong>flip</strong> bit(s):</p><pre class="fasm"><code>xor [eight_booleans],00001000b</code></pre><p>… will flip <strong>bit#3</strong> and leave the others unchanged.</p><p>This was just a reminder; now let’s deal with how to check the value of bits. Checking the value of a bit is called “<strong>bit testing</strong>”.</p><div class="alert alert-info">term: <strong>bit testing</strong></div>
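<p>To see the reminders above working together, here is a small sketch of a complete program that sets, flips and clears packed booleans one after another. The definition of <code>eight_booleans</code> as a single byte at the end is an assumption made for this example; the comments show the value the byte should hold after each instruction.</p><pre class="fasm"><code>org 256
mov [eight_booleans],00000000b  ;all eight booleans FALSE
or [eight_booleans],00001000b   ;set bit#3, byte is now 00001000b
or [eight_booleans],00100000b   ;set bit#5, byte is now 00101000b
xor [eight_booleans],00000001b  ;flip bit#0, byte is now 00101001b
and [eight_booleans],11110111b  ;clear bit#3, byte is now 00100001b
int 20h
eight_booleans db 0             ;assumed definition: one byte holding 8 booleans</code></pre><p>Each instruction touches only the bits selected by its constant, so the booleans stored in the other bits are never disturbed.</p>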
<!-- NOTE: FOLLOWING PARAGRAPH WAS REVISED --><p>You often need to test the value of some boolean variable and then do something (jump somewhere) if it is (or isn’t) TRUE. We did this with a byte-sized boolean variable using the <code>cmp</code> instruction, but it is impossible to use <code>cmp</code> for testing just a single bit of a byte. For this reason, there is a <code>test</code> instruction. It takes the same arguments as <code>mov</code>, <code>xor</code>, <code>and</code>, <code>cmp</code>, etc.</p><p>It <code>and</code>s its operands and then sets flags accordingly, so that if the result of <code>and</code>ing them was <code>0</code> then <code>je</code> will jump, otherwise (if the result wasn’t zero) <code>je</code> won’t jump (and <code>jnz</code> will).</p><pre class="fasm"><code>test arg1,arg2</code></pre><p>… acts similarly to:</p><pre class="fasm"><code>and arg1,arg2
cmp arg1,0</code></pre><p>… but it doesn’t modify <code>arg1</code>, and you use the <code>jz</code> (jump if zero) and <code>jnz</code> (jump if not zero) conditional jumps. <code>jz</code> jumps if the result of the virtual <code>and</code>ing (testing) is zero. Similarly, <code>jnz</code> jumps if the result is not zero (ie: at least one of the tested bits is non-zero).</p><div class="alert"><strong>NOTE:</strong> In fact, <code>jz</code> is the same instruction as <code>je</code>, and <code>jnz</code> is the same as <code>jne</code>; therefore, in our <code>and</code>/<code>cmp</code> example, using <code>jz</code> would be the same as using <code>je</code>.</div>
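<p>The difference between <code>test</code> and <code>and</code> can be seen in a short sketch like this one (the value loaded into <code>al</code> is arbitrary, chosen only for illustration):</p><pre class="fasm"><code>mov al,00101000b
test al,00001000b  ;al still holds 00101000b, jnz would jump here
and al,00001000b   ;al now holds 00001000b, jnz would jump here too</code></pre><p>Both instructions give the same answer to a following conditional jump, but only <code>and</code> throws away the other bits of <code>al</code>.</p>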
<div class="alert alert-info">instruction: <strong>test</strong></div><p>An example of using <code>test</code>:</p><pre class="fasm"><code>test [eight_booleans],00001000b
jz bit_3_is_clear
bit_3_is_set:
<...>
bit_3_is_clear:
<...></code></pre><!-- TEXT CORRECTED TO MATCH CODE: `je` => `jz` --><p>… all bits except <strong>bit#3</strong> of <code>eight_booleans</code> are <code>and</code>ed against <code>0</code> (but <code>eight_booleans</code> remains unmodified), which means they are cleared, only the value of <strong>bit#3</strong> will remain. The result of this operation will be zero (and <code>jz</code> will jump) only if <strong>bit#3</strong> is <code>0</code>. If it’s <code>1</code>, the result of the operation will be <code>00001000b</code>, not <code>0</code>, so <code>jz</code> won’t jump.</p><p>Now a slightly more difficult example:</p><pre class="fasm"><code>test [eight_booleans],00101000b
je bits_3_and_5_clear
bits_3_and_5_not_both_clear:
<...>
bits_3_and_5_clear:
<...></code></pre><p>… <strong>bits #3</strong> and <strong>#5</strong> of <code>eight_booleans</code> will remain, so the result of the operation will be <code>0</code> (and <code>je</code> will jump) only when both these bits are <code>0</code>. If at least one of these bits is <code>1</code> the result won’t be <code>0</code> (it can be <code>00001000b</code>, <code>00100000b</code> or <code>00101000b</code>) and <code>je</code> won’t jump. But testing two bits at once is not usual practice, at least not for beginners; I gave this example just to provide a better picture of how <code>test</code> works.</p><h1 id="arithmetic-instructions-more-on-flags">7. Arithmetic Instructions: More on Flags</h1><p>In this chapter you will learn how to perform basic math operations in assembly language. Then we’ll look deeper into how the processor carries them out, and in doing so you’ll learn more about flags.</p><h2 id="addition-and-substraction">7.1. Addition and subtraction</h2><p>The simplest case of addition is addition by one, called “<strong>increment</strong>”. For example, if we increment a variable holding value 5, it will contain 6, etc.</p><p>The instruction to perform increment is <code>inc</code> (its name should be obvious). It has one operand, which tells what should be incremented (ie: to what will 1 be added). The operand can be a register or a memory variable. It can’t be a constant, obviously, because such an instruction (even if it existed) wouldn’t have any effect. Example of increment:</p><pre class="fasm"><code>mov ax,5
inc ax ;increment (add 1 to) value in ax
;here ax holds value 6</code></pre><p>… I think it should be self-explanatory (if it isn’t, you’ll see later why).</p><p>Subtracting 1 from a value is called “<strong>decrement</strong>”. Decrement is the opposite of increment. The instruction which performs decrement is <code>dec</code>. Example:</p><pre class="fasm"><code>mov ax,5
inc ax ;increment (add 1 to) value in ax
;here ax holds value 6
dec ax ;decrement (subtract 1 from) value in ax
;here ax holds value 5 again</code></pre><!-- FIXED: "(substracting 0)" => " (subtracting 1)" -->
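<p>Since the operand of <code>inc</code> and <code>dec</code> can also be a memory variable, the same thing can be sketched without a register. The variable name <code>counter</code> and its word size are made up for this example:</p><pre class="fasm"><code>org 256
inc [counter]  ;counter now holds 6
dec [counter]  ;counter holds 5 again
dec [counter]  ;counter now holds 4
int 20h
counter dw 5   ;hypothetical word-sized variable, starts at 5</code></pre>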
<div class="alert alert-info"><p>terms: <strong>increment</strong> (adding 1), <strong>decrement</strong> (subtracting 1)</p>instructions: <strong>inc</strong>, <strong>dec</strong></div><p>If you want to add or subtract more than 1, you can use more <code>inc</code>s or <code>dec</code>s, but that is a rather ugly way to do it: it requires more typing, and the code gets big and slow. So there is an instruction which can add any value; this instruction is <code>add</code>. It takes two arguments, the first one is the <em>destination</em> (ie: the value being added to), and the second one is the value to be added. Argument types are the same as for <code>mov</code>: the first one can be a register or a memory variable, the second one can be a constant, a register or a memory variable (only if the first one isn’t a memory variable! Always remember: <mark>a single instruction can’t access two memory locations</mark>). Example:</p><pre class="fasm"><code>mov ax,5
add ax,5
;here ax contains 10</code></pre><p>Another example:</p><pre class="fasm"><code>mov ax,5
mov bx,5
add bx,[five]
add ax,bx
;here ax contains 15, bx contains 10
five dw 5</code></pre><p>The instruction for subtracting is <code>sub</code>. It’s the exact opposite of <code>add</code>, but it’s used the same way:</p><pre class="fasm"><code>mov ax,15
mov bx,10
sub bx,[five]
sub ax,bx
;here ax contains 10, bx contains 5
five dw 5</code></pre><h2 id="overflows">7.2. Overflows</h2><p>There are some cases with addition and subtraction which I haven’t yet mentioned. For example, what if you try to add 10 to a byte-sized variable holding 250 (the biggest number a byte-sized variable can hold is 255)? In such cases, we say that an <code>overflow</code> has occurred.</p><p>But the question is, what happens to the result of an operation that has overflowed? When the upper limit of a variable is crossed, the result of the operation is whatever remains after counting past the maximum value and starting again from zero (for a byte, that is the true result minus 256). We can say that the operation will be “<strong>wrapped</strong>” from the maximum value to the minimum value. For example:</p><pre class="nohighlight"><code>byte 255 + 1 = 0
byte 255 + 2 = 1
byte 254 + 3 = 1
byte 250 + 10 = 4
byte 255 + 255 = 254
word 65535 + 1 = 0
word 65535 + 65535 = 65534
etc. </code></pre><p>There is also another case, when the result of the operation falls below the lower limit (which is 0 for all variable sizes). In this case the result of the operation will be wrapped from the lower limit to the upper limit. This case is called <code>underflow</code>. For example:</p><pre class="nohighlight"><code>byte 0 - 1 = 255
byte 0 - 255 = 1
byte 254 - 254 = 0
byte 254 - 255 = 255
etc.</code></pre><div class="alert"><strong>NOTE:</strong> The word <code>overflow</code> is usually used for both <code>overflow</code> and <code>underflow</code>.</div>
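<p>The wrapping described above can be tried directly with registers. This is only a sketch with arbitrarily chosen registers; the comments show the values the registers should hold afterwards:</p><pre class="fasm"><code>mov al,250
add al,10     ;250+10 = 260 doesn't fit in a byte, al holds 260-256 = 4
mov bl,0
sub bl,1      ;0-1 falls below zero, bl wraps to 255
mov cx,65535
add cx,1      ;the same with a word: cx wraps to 0</code></pre>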
<div class="alert alert-info">Terms: <strong>Overflow</strong>, <strong>Underflow</strong></div><p>We also need to know how to check if an overflow has occurred after performing an operation, to prevent bugs. For this purpose, flags are used. I already mentioned flags in <a href="#comparing-and-conditonal-jumps"><strong>Chapter 5.3</strong></a>. We used flags for checking the results of comparison at conditional jumps, and I also said that there shouldn’t be any instructions between a comparison and its jumps because many instructions change the flags (of course, you can place an instruction there if you are sure it won’t change any needed flag). The arithmetic instructions <code>add</code> and <code>sub</code> use a bit of the flags register called <code>CF</code> (carry flag). If an overflow occurs, they set it to <code>1</code>, otherwise they set it to <code>0</code>. You can test the carry flag with the conditional jumps <code>jc</code> and <code>jnc</code> (see <a href="#comparing-and-conditonal-jumps"><strong>Chapter 5.3</strong></a> about conditional jumps). <code>jc</code> jumps if the carry flag is set, <code>jnc</code> jumps if the carry flag is not set. Here is an example of testing overflows:</p><pre class="fasm"><code>add ax,bx
jc overflow
no_overflow:
sub cx,dx
jc underflow
no_underflow:</code></pre><div class="alert alert-info"><p><strong>carry flag (CF)</strong>: one bit (flag) of the “flags” register.</p>conditional jump instructions: <strong>jc</strong>, <strong>jnc</strong></div><h2 id="zero-flag">7.3. Zero Flag</h2><p>Instructions <code>inc</code> and <code>dec</code> don’t set <code>CF</code>, so you can’t test for overflows using <code>CF</code> with them. But there is another rule that can be used to detect overflows with <code>inc</code> and <code>dec</code>. This rule is that when the result of an operation is zero, the flag called “<strong>zero flag</strong>” (<code>ZF</code>) is set. This flag is tested with the <code>jz</code> (jump if zero flag is set) and <code>jnz</code> (jump if zero flag is clear) conditional jump instructions.</p><p>With this you can create loops, ie: repeat some part of the code several times.</p><p>For example, the following code:</p><pre class="fasm"><code> org 256
mov cx,5
here:
mov dl,'a'
mov ah,2
int 21h
dec cx
jnz here
int 20h</code></pre><p>… will write:</p><pre class="nohighlight"><code>aaaaa</code></pre><div class="alert"><p><strong>NOTE:</strong> You can optimize the previous code example to:</p><pre class="fasm"><code> org 256
mov cx,5
mov dl,'a'
mov ah,2
here:
int 21h
dec cx
jnz here
int 20h</code></pre>since the values of <code>dl</code> and <code>ah</code> aren’t changed anywhere in the loop, we don’t need to set them each time the loop repeats.</div><p>It’s not only <code>inc</code> and <code>dec</code> that set <code>ZF</code> if the result is zero (and clear it otherwise). All basic arithmetic instructions do this. Besides these two, you’ve learned the following arithmetic instructions so far: <code>add</code>, <code>sub</code>, <code>and</code>, <code>xor</code> and <code>or</code>. So, after any of these instructions, <code>ZF</code> tells you if the destination (first argument) of the operation holds 0. For example, you can use this behavior to check if the value of a register is 0. So far, you’ve learned to do this with:</p><pre class="fasm"><code>cmp ax,0
jz ax_is_zero</code></pre><p>But you can also do it using “<code>or</code>”:</p><pre class="fasm"><code>or ax,ax
jz ax_is_zero</code></pre><p>… <code>or</code> won’t change <code>ax</code>, because <code>1</code> <code>or</code>ed with <code>1</code> is <code>1</code>, and <code>0</code> <code>or</code>ed with <code>0</code> is <code>0</code>. (Read again <a href="#bit-arithmetics"><strong>Chapter 6</strong></a> if you aren’t following this.) Btw, this was used on older computers because such code is faster and a few bytes smaller than with <code>cmp</code>.</p><h2 id="carry-flag-more-binary-arithmetic-instructions">7.4. Carry flag: more binary arithmetic instructions</h2><p>I mentioned the carry flag a little in connection with overflows. But <code>CF</code> is really a general-purpose flag because it can be tested easily (<code>jc</code>, <code>jnc</code> and a few more), and its value can be easily set. You will find many more uses of <code>CF</code> later on.</p><p>How to set <code>CF</code>? There are two instructions for this: <code>stc</code> and <code>clc</code>. <code>stc</code> stands for “<strong>SeT Carry</strong>”, and it “sets” the carry flag (ie: sets its value to <code>1</code>), so <code>jc</code> performs a jump, and <code>jnc</code> doesn’t, etc., etc. (you should understand this already). Instruction <code>clc</code> (CLear Carry) clears <code>CF</code>.</p><p>Once we know how to work with <code>CF</code>, we can learn the rest of the bit arithmetic operations. First, let’s look at <code>shl</code>. It shifts the bits of a register to the left, ie: the value of bit#0 moves to bit#1, the value of bit#1 moves to bit#2, and so on. The highest bit (bit#7 in a byte, bit#15 in a word) is moved to <code>CF</code>. The lowest bit (bit#0) becomes <code>0</code>. 
This way (if the highest bit was zero) we have multiplied the shifted register by 2.</p><p>Before shifting:</p><pre class="nohighlight"><code>|| bit#7 | bit#6 | bit#5 | bit#4 | bit#3 | bit#2 | bit#1 | bit#0 ||</code></pre><p>After shifting:</p><pre class="nohighlight"><code>|| bit#6 | bit#5 | bit#4 | bit#3 | bit#2 | bit#1 | bit#0 | 0 || ; ( CF = bit7 )</code></pre><p>Let me explain why the number is multiplied by 2. If you remember the beginning of <a href="#bit-arithmetics"><strong>Chapter 6</strong></a>, you know that a number before shifting is:</p><pre class="nohighlight"><code>128*bit#7 + 64*bit#6 + 32*bit#5 + 16*bit#4 + 8*bit#3 + 4*bit#2 + 2*bit#1 + bit#0</code></pre><p>… so after shifting it becomes:</p><pre class="nohighlight"><code>128*bit#6 + 64*bit#5 + 32*bit#4 + 16*bit#3 + 8*bit#2 + 4*bit#1 + 2*bit#0</code></pre><p>… which is:</p><pre class="nohighlight"><code>2*(64*bit#6 + 32*bit#5 + 16*bit#4 + 8*bit#3 + 4*bit#2 + 2*bit#1 + bit#0)</code></pre><p>Therefore, if the highest bit is zero the number is multiplied by two. This way we can easily multiply by powers of two (<code>2</code>, 2^2=<code>4</code>, 2^3=<code>8</code>, 2^4=<code>16</code>, etc.). Furthermore, the highest bit is stored in <code>CF</code>, so we can test with <code>jc</code> and <code>jnc</code> if the multiplication overflowed.</p><p>Usually we want to shift more than once (multiply by 4, 8, 16, …), so <code>shl</code> takes a second argument, which tells how many times we want to shift. If we shift by a number greater than 1, <code>CF</code> will contain only the last bit that was shifted out; the bits discarded before it are lost, so after such a shift <code>CF</code> alone doesn’t tell you whether the multiplication overflowed. 
If you are a beginner, don’t worry too much about checking for overflows, you probably won’t do it anyway <strong>:)</strong> (and therefore your program will probably contain bugs).</p><p>There is one limitation to <code>shl</code>: its arguments don’t follow the same rules as the other instructions you’ve learned (<code>mov</code>, <code>add</code>, etc.). <mark>The first argument can be a register or a memory location, but the second one can only be a numeric constant or the <code>CL</code> register</mark> (really, no other!).</p><div class="alert"><strong>NOTE:</strong> Originally, on the 8086 (that’s the first of the 80x86 series known as x86, like the 286 or 486), <code>shl</code> could shift only by one or by <code>CL</code>. There was no shifting by any other constant, and so for example <code>shl ax,3</code> was compiled into 3 <code>shl</code>s. Fortunately shifting by a constant was added with the 80186 and 80286, so it is OK now.</div><p>We’ve dealt with left shifting, but there is also another type of shifting, ie: shifting to the right. I hope you can by now imagine what it does, so I’ll drop just a few notes about it. The instruction that performs this is <code>shr</code> (shift right). Its effect is division by two (or by powers of two) with the remainder thrown away. When shifting right by one, the remainder (<code>0</code> or <code>1</code>) is then found in <code>CF</code>. When shifting right by more than one, <code>CF</code> holds only the last bit that was shifted out, so it tells you whether that one bit was <code>1</code>, not whether the whole remainder was <code>0</code>.</p><h2 id="some-examples">7.5. Some examples</h2><p>At last we are able to print a number on the screen. It’s a pity that we can only write it in binary form. So here is our task: Write a program that prints a number in binary. 
For now, we will hardcode the number into the program, ie: <code>mov</code>e it into some register as a constant. Here is the source:</p><pre class="fasm"><code>org 100h
mov bx,65535 ;we store in bx the number we want to display
;(because it's not used by DOS services we use)
mov cx,16 ;we are displaying 16 digits (bits)
;display one digit from BX each loop
display_digit:
shl bx,1
jc display_one
;display '0'
mov ah,2
mov dl,'0'
int 21h
jmp continue
;display '1'
display_one:
mov ah,2
mov dl,'1'
int 21h
;check if we want to continue
continue:
dec cx
jnz display_digit
;end program
int 20h</code></pre><p>I hope you understand this; it’s quite simple. On each pass through the loop we shift the <code>BX</code> register left by one, so the upper bit is moved to <code>CF</code>. Then we print ‘<code>0</code>’ or ‘<code>1</code>’ depending on the value of <code>CF</code> (previously the upper bit of the number), and we keep looping until we’ve printed 16 digits (because a word has 16 bits). Here is an example of stepping through the code, this time with <code>BX</code> starting at <code>1100101000001011b</code> (the values shown are those at the end of each pass):</p><!-- FIXED: "Start: CF = 16" => "Start: CX = 16" --><pre class="nohighlight"><code>Start: CX = 16, BX = 1100101000001011b
Pass1: CX = 15, BX = 1001010000010110b, CF = 1
Pass2: CX = 14, BX = 0010100000101100b, CF = 1
Pass3: CX = 13, BX = 0101000001011000b, CF = 0
...
Pass14: CX = 2, BX = 1100000000000000b, CF = 0
Pass15: CX = 1, BX = 1000000000000000b, CF = 1
Pass16: CX = 0, BX = 0000000000000000b, CF = 1</code></pre><p>In my opinion, if you made it up to this point, having (generally) understood everything, you can consider yourself more than just a beginner — congratulations!!! There is still much to learn to become a well-armed assembly programmer, but now you have a solid grounding from which to start expanding your knowledge – with or without use of this tutorial. (But there are several parts which will be explained in further detail, which are hard to find in any tutorial).</p>
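<p>If you want one more small exercise on the shifting from section 7.4, here is a minimal sketch (not part of the original examples) of multiplying by 4 with an overflow check. It shifts by one twice and tests <code>CF</code> after each shift. The starting value <code>20000</code> and the labels <code>too_big</code> and <code>finish</code> are just made up for illustration, so change them as you like.</p><pre class="fasm"><code>org 100h
mov ax,20000 ;example value (an assumption, pick your own)
shl ax,1 ;ax = 40000, this still fits into a word, CF = 0
jc too_big
shl ax,1 ;80000 doesn't fit into a word, so CF = 1
jc too_big
;no overflow: display 'K' (as in "OK")
mov ah,2
mov dl,'K'
int 21h
jmp finish
;overflow: display '!'
too_big:
mov ah,2
mov dl,'!'
int 21h
;end program
finish:
int 20h</code></pre><p>With <code>20000</code> the second shift overflows, so this prints ‘<code>!</code>’. Try a smaller value such as <code>10000</code> and it should print ‘<code>K</code>’ instead, because 40000 still fits into 16 bits.</p>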
</body>
</html>