Assembly Language – Command Line Parsing Part 2

So, we’ve already seen how the stack works and how we can read the name of the currently executing binary off the stack. Now it’s time to actually parse command line arguments that come after the name of the binary. Reading the arguments is a little bit more complicated because we do not know in advance how many there will be.

Luckily for us, the kernel gives us a little help. The shell splits the command line on whitespace, and before execution of our program begins the kernel copies the resulting arguments into memory as null-terminated strings. On Linux these strings end up one after another in a contiguous chunk of memory. The kernel then puts a pointer to each string on the stack, with the pointer to the binary name first. Finally, it puts the number of command line arguments it saw on top of the stack.
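To make the layout concrete, here is a minimal sketch that peeks at the stack without popping anything. It assumes the standard x86-64 Linux process entry layout, where the argument count sits at (%rsp) and one 8-byte pointer per argument follows it. The program simply exits with the argument count as its status code.

```asm
.section .text

.globl _start
_start:

movq (%rsp), %rdi     # number of arguments, including the binary name
movq 8(%rsp), %rsi    # pointer to the binary name (unused here)
movq 16(%rsp), %rdx   # pointer to the first argument, or 0 if there is none (unused here)

movq $60, %rax        # exit, using the argument count in rdi as the status
syscall
```

If this is assembled and linked with GNU as and ld into a binary called argc (a hypothetical name), running ./argc a b and then echo $? should print 3: two arguments plus the binary name.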

So, the first thing we do is read the number of command line arguments off the stack. Then we read the address of the first argument, which is also the start of the block of argument strings. Then we iterate over this block of memory, printing each string as we see it. We know how many arguments to expect, so when we have seen that many we quit.

OK, it’s time to look at the code!

.equ NULL, 0

.section .data
NEW_LINE: .byte 10
.section .text

.globl _start
_start:

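popq %rbx # The number of command line arguments, including the binary name

decq %rbx # Do not count the binary name

cmpq $0, %rbx
je exit_with_error # No arguments were given

popq %r13 # The name of the currently executing binary
popq %r13 # The address of the first argument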

movq $0, %r8   # number of nulls seen so far
movq $0, %r9   # number of characters since last null 
movq $0, %r10  # number of characters up to last null
movq $0, %r12  # number of characters seen so far

loop_start:
movb (%r13,%r12,1),%al # load a single byte, since only al is compared

incq %r9
incq %r12

cmp $NULL,%al
jne loop_start 

movq %r9, %rdx
movq %r13, %rsi
addq %r10, %rsi
movq $1, %rax
movq $1, %rdi
syscall

movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall

addq %r9, %r10
movq $0,%r9

incq %r8
cmpq %r8,%rbx
je exit

jmp loop_start

exit:

movq $60, %rax
movq $0, %rdi
syscall

exit_with_error:
movq $60, %rax
movq $-1, %rdi
syscall

First we pop the number of command line arguments off the stack into the rbx register. We decrement this value with decq because it includes the name of the binary itself. If the result is zero, no arguments were given and we exit with an error. Then we pop the pointer to the binary name, which we won’t be using. After that we pop the address of the first argument into r13. The remaining arguments follow it in memory.

Next we set up four different registers as counters we will use when looping over the command line arguments. As the strings in memory are null-terminated we can keep track of them via null characters. So, the counters are: the number of nulls we have seen so far, the number of characters we have seen since the last null, the number of characters that came before the last null and the total number of characters we have seen overall.

movq $0, %r8   # number of nulls seen so far
movq $0, %r9   # number of characters since last null 
movq $0, %r10  # number of characters up to last null
movq $0, %r12  # number of characters seen so far

Now the loop itself begins. The first part of this loop examines the block of memory beginning at r13 one byte at a time, using r12 as the index, until we see a null character. We increment r9 and r12 as we go.

loop_start:
movb (%r13,%r12,1),%al # load a single byte, since only al is compared

incq %r9
incq %r12

cmp $NULL,%al
jne loop_start 

Whenever we do see a null character we proceed to the next section. This is where we write the current string to the terminal.

movq %r9, %rdx
movq %r13, %rsi
addq %r10, %rsi
movq $1, %rax
movq $1, %rdi
syscall

movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall

The register r9 contains the number of characters since the last null, including the null we have just read, so that is the length of the current string plus its terminator. This means the terminating null is written along with the string, but it is invisible on a terminal. The memory address of the start of the block of memory is in r13 and the number of characters that we saw up to the last null is in r10, so the memory address of the start of this string is the sum of those two values. We also print a newline to make our output a little prettier.

Once we have printed the current string, we update our other counters and check to see if we have read all the command line arguments.

addq %r9, %r10
movq $0,%r9

incq %r8
cmpq %r8,%rbx
je exit

jmp loop_start

First we update the value in r10 to contain the number of characters up to the null we have just seen by adding on the value in r9. Then we reset r9 to zero. Register r8 keeps track of the number of nulls we have seen so far, so we increment it and compare against rbx. If they are equal, we jump straight to the exit. Otherwise we jump back to the start of the loop and carry on.
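The program above depends on the argument strings sitting back to back in memory. As a hedged alternative, here is a minimal sketch that follows the pointer the kernel leaves on the stack for each argument instead. It assumes the standard x86-64 Linux stack layout (count at the top, then one 8-byte pointer per argument), and the labels next_arg, find_null and print_arg are names chosen for this sketch.

```asm
.section .data
NEW_LINE: .byte 10

.section .text

.globl _start
_start:

movq (%rsp), %rbx     # number of arguments, including the binary name
leaq 16(%rsp), %r12   # address of the stack slot pointing at the first argument
decq %rbx             # do not count the binary name
jz exit_with_error    # decq sets the zero flag when the result is zero

next_arg:
movq (%r12), %rsi     # pointer to the current argument string
movq $0, %rdx         # length of the current string so far

find_null:
cmpb $0, (%rsi,%rdx,1)
je print_arg
incq %rdx
jmp find_null

print_arg:
movq $1, %rax         # write
movq $1, %rdi         # stdout
syscall               # rsi = string, rdx = length without the null

movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall

addq $8, %r12         # move on to the next pointer
decq %rbx             # one fewer argument left to print
jnz next_arg

movq $60, %rax
movq $0, %rdi
syscall

exit_with_error:
movq $60, %rax
movq $-1, %rdi
syscall
```

This version needs only two registers of bookkeeping, rbx for the arguments left and r12 for the current pointer slot, and it does not write the terminating null. The trade-off is one extra memory read per argument to fetch its pointer.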

So, we now know how to parse our command line arguments. The kernel also copies the current environment variables into memory and leaves pointers to them on the stack, after the argument pointers. These are a little bit harder to parse, so we will ignore them for now.
