So, we’ve already seen how the stack works and how we can read the name of the currently executing binary off the stack. Now it’s time to actually parse command line arguments that come after the name of the binary. Reading the arguments is a little bit more complicated because we do not know in advance how many there will be.
Luckily for us, the kernel gives us a little help. The shell splits the command line on whitespace and hands the pieces to the kernel when it launches our program. Before execution of our program begins, the kernel copies these arguments into memory as null-terminated strings; in practice on Linux they sit one after another in a single contiguous chunk. Then it puts the address of each string on the stack, starting with the name of the binary. Finally, it puts the number of command line arguments, counting the binary name, onto the stack.
So, the first thing we do is read the number of command line arguments off the stack. Then we skip the address of the binary name and read the address of the first argument, which is also the start of the block of strings. Then we iterate over this block of memory, printing each string as we see it. We know how many arguments to expect, so when we have seen that many we quit.
OK, it’s time to look at the code!
.equ NULL, 0
.section .data
NEW_LINE: .byte 10
.section .text
.globl _start
_start:
popq %rbx
decq %rbx
cmpq $0, %rbx
je exit_with_error
popq %r13 # The name of the currently executing binary
popq %r13
movq $0, %r8 # number of nulls seen so far
movq $0, %r9 # number of characters since last null
movq $0, %r10 # number of characters up to last null
movq $0, %r12 # number of characters seen so far
loop_start:
movb (%r13,%r12,1),%al # load one byte; only al is tested below
incq %r9
incq %r12
cmp $NULL,%al
jne loop_start
movq %r9, %rdx
movq %r13, %rsi
addq %r10, %rsi
movq $1, %rax
movq $1, %rdi
syscall
movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall
addq %r9, %r10
movq $0,%r9
incq %r8
cmpq %r8,%rbx
je exit
jmp loop_start
exit:
movq $60, %rax
movq $0, %rdi
syscall
exit_with_error:
movq $60, %rax
movq $-1, %rdi
syscall
First we pop the number of command line arguments off the stack into the rbx register. We decrement this value with decq because it will include the name of the binary itself. We check that there is a non-zero number of command line arguments, and exit with an error if there are none. Then we pop the address of the binary name, which we won’t be using. After that we pop the memory address of the first actual argument into r13.
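The pops above rely on the layout of the stack at entry. As a minimal sketch, assuming the standard Linux x86-64 entry layout, the same values can also be read in place with offsets from rsp instead of popping them. The use of r14 and the choice to exit with the argument count as the status are illustrative choices made here, not part of the program above.

```asm
.section .text
.globl _start
_start:
    # At entry: (%rsp) holds the argument count, 8(%rsp) the address of
    # the binary name, 16(%rsp) the address of the first argument.
    movq (%rsp), %rbx        # argument count, including the binary name
    movq 8(%rsp), %r14       # address of the binary name (unused here)
    movq 16(%rsp), %r13      # address of the first argument, or 0 if none

    # Exit with the number of real arguments as the status code,
    # so the value is easy to inspect from the shell.
    decq %rbx
    movq %rbx, %rdi
    movq $60, %rax           # exit system call
    syscall
```

Assuming the file is saved under the hypothetical name args.s, it can be built with as args.s -o args.o followed by ld args.o -o args. Running ./args a b c and then echo $? should print 3.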
Next we set up four different registers as counters we will use when looping over the command line arguments. As the strings in memory are null-terminated we can keep track of them via null characters. So, the counters are: the number of nulls we have seen so far, the number of characters we have seen since the last null, the number of characters that came before the last null and the total number of characters we have seen overall.
movq $0, %r8 # number of nulls seen so far
movq $0, %r9 # number of characters since last null
movq $0, %r10 # number of characters up to last null
movq $0, %r12 # number of characters seen so far
Now the loop itself begins. The first part of this loop indexes into the memory location beginning at r13 until we see a null character, incrementing r9 and r12 as we go.
loop_start:
movb (%r13,%r12,1),%al # load one byte; only al is tested below
incq %r9
incq %r12
cmp $NULL,%al
jne loop_start
Whenever we do see a null character we proceed to the next section. This is where we write the current string to the terminal.
movq %r9, %rdx
movq %r13, %rsi
addq %r10, %rsi
movq $1, %rax
movq $1, %rdi
syscall
movq $1, %rdx
movq $NEW_LINE, %rsi
movq $1, %rax
movq $1, %rdi
syscall
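Both calls use the write system call: rax holds the system call number (1 for write), rdi the file descriptor (1 for standard output), rsi the address of the bytes to write and rdx the number of bytes to write. The register r9 contains the number of characters since the last null, so that is the length of the current string (it includes the terminating null, because we increment r9 before testing for it). The memory address of the start of the whole block is in r13 and the number of characters that we saw up to the last null is in r10, so the memory address of the start of this string is the sum of those two values. We also print a newline to make our output a little prettier.
Once we have written out the current string, we update our other counters and check to see if we have read all the command line arguments.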
addq %r9, %r10
movq $0,%r9
incq %r8
cmpq %r8,%rbx
je exit
jmp loop_start
First we update the value in r10 to contain the number of characters up to the null we have just seen by adding on the value in r9. Then we reset r9 to zero. For example, with the arguments ab and cde, r9 is 3 after the first null (two characters plus the null), so r10 becomes 3 and the second string starts at r13 plus 3. Register r8 keeps track of the number of nulls we have seen so far, so we increment it and compare against rbx. If they are equal, we jump straight to the exit. Otherwise we jump back to the start of the loop and carry on.
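The loop above scans one block of memory and relies on the argument strings sitting next to each other. As an alternative sketch, not the approach used in this post, the program below follows each argument's own address on the stack and measures each string separately, so it does not depend on that layout. The label names (next_arg, find_null, print_arg, done, newline) and the register choices are made up for this example.

```asm
.section .data
newline: .byte 10

.section .text
.globl _start
_start:
    movq (%rsp), %rbx            # argument count, including the binary name
    leaq 16(%rsp), %r12          # address of the stack slot for the first argument
    decq %rbx                    # do not count the binary name
    jz done                      # no arguments: nothing to print

next_arg:
    movq (%r12), %rsi            # address of the current argument string
    xorq %rdx, %rdx              # length of the current string so far
find_null:
    cmpb $0, (%rsi,%rdx,1)       # is the byte at rsi + rdx the terminating null?
    je print_arg
    incq %rdx
    jmp find_null

print_arg:
    movq $1, %rax                # write system call
    movq $1, %rdi                # standard output
    syscall                      # rsi = string, rdx = length without the null

    movq $1, %rax                # write system call again, for the newline
    movq $1, %rdi
    leaq newline(%rip), %rsi
    movq $1, %rdx
    syscall

    addq $8, %r12                # move to the next address on the stack
    decq %rbx
    jnz next_arg

done:
    movq $60, %rax               # exit system call
    xorq %rdi, %rdi              # status 0
    syscall
```

Built the same way as the main program, running it with the arguments foo and bar should print foo and bar on separate lines. Unlike the version above it does not write the terminating null to the terminal, and it exits with status 0 when no arguments are given.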
So, we now know how to parse our command lines. The kernel also copies the current environment variables into memory and leaves pointers to them on the stack, after the argument pointers. These are a little bit harder to parse, so we will ignore them for now.