Last semester, I took a course on x86 (32 bit) MASM assembly language. It was objectively a difficult course and about a third of the students dropped. I had a strong approach from the beginning and was able to ace the course. This post describes my strategy for conquering x86 assembly.

Table of Contents
What makes Assembly hard?
Setting up the Development Environment
Including the Irvine libraries
Floor the gas on studying early - use my flashcards
Thoughts on Code Style
Additional Resources
One of my projects
Final Thoughts

What makes Assembly hard?

Higher level languages share a lot of syntax. It’s not hard to grasp Python if you’ve already mastered Java, or approach Ruby if you understand Javascipt. You’ll see similarities in loops, variable declarations, and functions. You just have to alter the phrasing a bit, but most of the code can be transcribed line-for-line. Assembly, however, deals with processor instructions on the most basic level. A for-loop only exists insofar as you repeatedly jump to a specific line in the code. You can’t just operate on any variable; you have to manually move values into registers to operate them and remove them to memory locations to store them. You can’t just compare variables, you have to store them in registers, execute a compare instruction, then check what flags are set in a flags register to branch. There are more steps for every given process and every movement of information has to be managed by the programmer.

Another major stumbling block is that resources for x86 masm assembly are rather sparse. You can’t simply google an error message and find a Stackoverflow answer. You need to look at textbooks and references and figure out the problem yourself.

What makes Assembly fun?

Assembly can really be satisfying to program, once you overcome the initial learning curve. Getting a grasp of how a processor functions is valuable. At such a low level, there is huge flexibility with what you can do. You will improve your binary arithmetic and really own data sizes. You can make highly efficient processes. There were points while writing assembly that I truly felt in touch with the processor, in a way I haven’t with any other language. I became one with my computer.

Setting up the Development Environment

Except for the very end of the course, I had one groupmate, Luke, who was, fortunately, as interested in conquering Assembly as I was. We diverged from what the professor recommended and created what is, in my opinion, a better system for writing single-file Assembly programs.

The basic development environment is available here on my github.

The professor recommended we use Visual Studio Community Edition to develop assembly. While VS will assemble, and you can find a plugin for syntax highlighting, it’s slow, and it’s annoying to create a whole new project when you just want to assemble a 50 line piece of code to test something. I found it way more straightforward to develop in VSCode.

However, you do need to get VS Community Edition to get the assembler. Luke worked out how to get the actual Assembler out of the Visual Studio files. I can’t share the Assembler itself in the linked development environment, due to Microsoft’s restrictions, but the instructions for where to find the MASM assembler and linker are in the Github readme on the Development environment we created. That repo and readme should provide everything you need to get a VSCode Assembly Development Environment up and running in less than half an hour.

Including the Irvine libraries

We used the textbook Assembly Language for x86 Processors by Kip Irvine as a textbook. I highly recommend purchasing the Irvine textbook. It is easy to follow, much better than any free resources I found online. You can find it on Irvine’s website.

You’ll also want the Irvine libaries, which are available on his Github, along with sample projects. Here’s the zip of the Irvine libraries. I did include this in the development environment but please note, they are only licensed for educational purposes.

The libraries wrap Windows functions to, for example, change the color or move the cursor in the console, read input, or create a popup window. They are very convenient for creating actual functional console applications. And the Irvine textbook provides detailed instructions for using them.

This convenient website catalogs most of the functions, structures, and macros that Irvine provides.

Floor the gas on studying early

I realized early that Assembly was tough and there was a lot of new information to understand. So for the first 2 or 3 weeks, I studied every day and I studied efficiently. I read the textbook and created hundreds of Anki cards from it. I reviewed all the cards daily. I wrote small programs to test things out. Anki is so efficient at slamming information into your memory; I can’t recommend it enough.

Here are my flashcard decks. You will of course have to download Anki, the magnificent open source flash card program. Feel free to use them in your own studies:

After the initial push, I didn’t have to review the cards much. I’d glance through them before a quiz or test, and I created one last deck to shore up some areas before the final exam, but that initial push set me up for success and made the rest of the class a lot easier.

The cards are flawed in numerous ways but they’re free, and, considering the lack of study materials out there, are probably some of the better study resources you’ll find for beginning MASM anywhere on the internet.

Thoughts on Code Style

The projects we created for this course ran several thousand lines. This quickly becomes very difficult to manage. Having good style is, in my opinion, critical for navigating the code file as it grows.

Over the course of the semester, I developed a system for code style that worked out really well for us.

Descriptions for Procedures

Create descriptions following every procedures explaining what registers they use for parameters and what registers they alter as ‘return’ values. This is critical for being able to easily re-use procedures.

Here’s an example of a simple procedure that takes the al register as a parameter and returns in that same register. The most common source of bugs, in my experience, was unintentionally altering registers.

    capitalize_if_letter proc
        ; ---------------------------------------------------------------------------
        ; capitalize_if_letter
        ;
        ; changes al to a capital letter, if it's a letter. Otherwise stays the same.
        ;   RECEIVES    al: a character to test
        ;   RETURNS     al: a character, capitalized if a letter
        ; ---------------------------------------------------------------------------
        cmp al, 'a'
        jl finish_up
        cmp al, 'z'
        jg finish_up
        and al, 11011111b
        finish_up:
        ret
    capitalize_if_letter endp

Don’t miss an opportunity to add semantic information

Use every opportunity to add semantic information to code.

- Labels should be long and desciptive
- Identifiers should be long and descriptive
- Procedure names should be long and descriptive

Here’s an example of how long names and compartmentalizing into many procedures can create readable code, even in x86 assembly:

    broadcast_until_done proc
        ; -------------------------------------------------------------
        ; Creates the starting packet and initiates the broadcast loop.
        ; -------------------------------------------------------------
        pushad
        call Clrscr
        call reset_network
        call reset_metrics_and_create_initial_packet
        while_there_are_active_packets:
            cmp active_packets, 0
            jle no_more_active_packets
            call print_time 
            call transmit_all_nodes
            call increment_time
            call print_time
            call receive_all_nodes
            jmp while_there_are_active_packets
        no_more_active_packets:
            call print_final_metrics
        call ReadChar
        call Clrscr
        popad 
        ret
    broadcast_until_done endp

Use indent-based code folding.

Indent everything so that it can fold, and so you can read it.

Instructions inside procedures should be tabbed over
Tab over the areas that will be looped, as if it were a code block
Compartmentalize as much as possible into procedures
Group like procedures and create visually obvious dividers between sections of procedures. Have the procedures tabbed more than the dividers to allow for folding on the dividers.

Here’s an example of how a 2743 line project can be displayed in just 35 lines due to folding on indentation levels. The Print Procedures section is unfolded, as is the Print Errors subsection. I made frequent use of the VS Code shortcut ctrl + k ctrl + 0 to refold everything after making edits in a particular location.

A cropped screenshot of a large assembly project in VSCode, partially unfolded

Use high-level pseudcode in comments

Especially when you’re debugging, use pseudocode similar to a high level language in adjacent comments to clarify what you’re doing. This is how I tracked down most bugs. I also used it whenever I wanted my group mate to be able to quickly understand what a procedure I wrote was doing.

I imitated C++ style code, with an addition. When describing a register in the comments, I add a double colon (::) and then a semantic name describing the data that register is holding on that particular line. That way, I could track what was supposed to be going on in registers as I worked.

Here’s an example of a heavily commented procedure.

  ; ========================== Loop Procedures
    .data
        str_node_print_transmit byte " Processing Transmit Queue for node : ", 0
    .code
    transmit_all_nodes proc
        ; ----------------------------------------------------------------------------------------
        ; Loops over all the nodes, printing their names. Calls node_transmit method on each node.
        ; ----------------------------------------------------------------------------------------
        pushad

        mov cl, 0                               ; cl::nodes_iterated = 0
        mov esi, offset network                 ; esi::index = &nodes
        mov edx, offset str_node_print_transmit ; edx::str_node = &str_node_print

        do_while_nodes_left_to_process:
            call printString                    ; print(edx::str_node)
            mov al, (NODE ptr [esi]).id         ; al::node_id = esi::&node->id
            call printChar                      ; print(al::node_id)

            call db_node_print_queues

            call printCrlf                           ; print('\n')
            call node_transmit
            movzx ebx, (NODE ptr[esi]).n_connections  ; ebx::TRANSFER_BUFFER_count = esi::&node->n_connections
            add esi, sizeof NODE                ; esi::end_of_node = esi::index + sizeof(NODE)              
            imul ebx, sizeof CONNECTION         ; ebx::node_TRANSFER_BUFFERs_size = ebx::cnxt_count * sizeof(CONNECT)   
            add esi, ebx                        ; esi::&next_node = esi::&node + ebx::node_TRANSFER_BUFFERs_size         
            add cl, 1                           ; ecx::nodes_iterated += 1                                 
            cmp cl, nodes_count                 ; if(cl::nodes_iterated > nodes_count)                       
            jge finish_up                       ;    break;
            jmp do_while_nodes_left_to_process  ; else continue;

        finish_up:

        popad
        ret 
    transmit_all_nodes endp

Additional Resources

Here are the resources I found most useful while coding.

One of my projects

I am including the third project I did for this class with this post. It was an “operating system simulator”. There are two lists of ‘jobs’, a run list and a hold list. Each job has a priority and a run time. When the step command is executed, jobs in the run list have their time decremented until they are completed. You can execute the command help in the program for a list of available commands. Jobs do not actually do anything real. The program was more about exploring input parsing handling variable input arguments, which was accomplished with a COMMAND and COMMAND_PARAM structure.

A cropped screenshot of the console output of Operating System Simulator

I think it is a good example of the code style I have described in this post, and I hope you find it interesting.

Download it here: OS_simulator_project.asm

Final Thoughts

I don’t know how many pure x86 MASM Assembly projects I’ll create in the future. But I definitely expect to read plenty of Assembly while debugging C and C++ code. C++ also allows for in-lining assembly and I look forward to the first opportunity I have to make a procedure blazing fast by in-lining.

I also believe that knowing MASM will translate into other Assembly languages. At some point, maybe I’ll program a microcontroller to do something nifty. Maybe I’ll try to learn ARM. Having these fundamentals from MASM will make such a transition much easier.

In general, I learned a lot about computer architecture from this course and I’m a much stronger programmer because of it.

Table of Contents