With all Macs now shipping with Intel microprocessors, you’d think it would be easy to port assembly code written for Windows to OS X. Bzzt. Unfortunately, not. The problem lies with OS X’s calling conventions on 32-bit Intel machines.

Calling conventions are agreed-upon standards at the assembly level, dictating how registers and the stack are used for passing parameters and returning results. Basically, all code in a single program must agree on the same convention, otherwise you have a communication breakdown and bad things happen. You wouldn’t want a function that passes a parameter in one register calling a function that expects it in another. Unfortunately, OS X deviates from the standard Windows (cdecl) calling conventions.

The biggest difference is that the stack must be 16-byte aligned at the point of a function call. Apple wanted to do this because all Macs, coming late to the Intel party, can assume they’re running on a processor with SSE, Intel’s vector instruction set, similar in concept to the PowerPC AltiVec. OS X uses SSE for all floating point math by default, for a performance boost. SSE, however, has some instructions that require 16-byte alignment. Apple just said, screw it, we have no legacy code, let’s just require the stack to be 16-byte aligned so we can use SSE.

This is a very subtle change. If you’re just calling a function with 1 argument on Windows (with cdecl conventions), you just push the argument, call the function, and pop the argument:

pushl arg1
call _function
addl $4, %esp // Pop arg1

On OS X, you have to make sure the stack is 16-byte aligned, by adding padding bytes, if necessary:

// Assume 16-byte alignment right here
subl $12, %esp // Padding bytes
pushl arg1
call _function
// Still 16-byte aligned, 12 for padding plus 4 for arg1
addl $16, %esp // Pop 16 bytes for arg1 and padding
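
The padding arithmetic above can be sketched in C. This is only an illustration, not anything Apple ships: call_padding is a hypothetical helper name, and it assumes you already know where %esp sits relative to a 16-byte boundary before you start setting up the call.

```c
#include <assert.h>

/* Hypothetical helper, for illustration only.
 *
 * esp_mod16: value of %esp modulo 16 before any padding or arguments are
 *            pushed (0 means %esp is already 16-byte aligned).
 * arg_bytes: total bytes of arguments that will be pushed.
 *
 * Returns the number of padding bytes to subtract from %esp first, so that
 * %esp is a multiple of 16 when the call instruction executes.
 */
unsigned call_padding(unsigned esp_mod16, unsigned arg_bytes)
{
    assert(esp_mod16 < 16);
    /* The stack grows down, so padding and arguments are both subtracted:
       (esp_mod16 - padding - arg_bytes) must be 0 modulo 16. */
    return (esp_mod16 + 16u - (arg_bytes % 16u)) % 16u;
}
```

With an aligned stack and one 4-byte argument, call_padding(0, 4) gives the 12 bytes used in the example above. If %esp is 8 bytes past a boundary instead, call_padding(8, 4) gives 4.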

So, if you’re porting Windows code that doesn’t care about alignment, you basically have to pore through all of your code and change how each function is being called to make sure the proper padding is being added. This is a very daunting task and quite error-prone. No wonder they cancelled VBA on Intel machines.

Now this is where it gets weird. If you don’t align to 16 bytes, your code may not break. If the called function does not use SSE, it doesn’t really care if the stack is 16-byte aligned, and it’ll just work. The padding is added before the parameters, and from the called function’s perspective, arg1 is in the same place on the stack either way. And since most functions don’t use floating point, your code may just work. For a while, at least. Then all of a sudden, you get a weird run-time exception. Most likely, an illegal instruction, or EXC_BAD_INSTRUCTION in gdb terminology. The offending instruction will most likely be:

movdqa %xmm0,32(%esp)

Here’s one of those SSE instructions that requires 16-byte alignment, for performance reasons. In the above case, if the stack pointer (%esp) isn’t aligned to a 16-byte address, you get an exception. The reason you usually see this instruction is that it’s part of a function prolog that’s saving the SSE registers on the stack. %xmm0 just happens to be the first register the compiler likes to save.
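
To see why the alignment of %esp decides whether that instruction faults, here is a rough C sketch of the check the processor is effectively making. The function name and the sample stack addresses in the note below are made up for illustration; the only real rule is that the memory operand of movdqa must sit on a 16-byte boundary.

```c
#include <stdint.h>

/* Hypothetical model, for illustration only: returns 1 if a movdqa to
 * disp(%esp) would fault because the effective address is not a multiple
 * of 16, and 0 if the access is aligned. */
int movdqa_would_fault(uint32_t esp, uint32_t disp)
{
    uint32_t effective = esp + disp;   /* address of the 16-byte store */
    return (effective & 0xFu) != 0;    /* low 4 bits must all be zero */
}
```

Since the displacement 32 is itself a multiple of 16, the store is aligned exactly when %esp is. With a made-up aligned stack pointer of 0xbffff000, movdqa_would_fault(0xbffff000, 32) returns 0. With 0xbffff00c, which is what you might have after a caller skipped the padding, it returns 1.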

So, unfortunately, you really do need to go through pain to port Windows code to the 16-byte aligned calling conventions of OS X. The good news is that OS X calling conventions work on Windows. They just waste a few bytes on the stack.

Stay tuned for part 2, where I tell you I just lied, and show you an easier way.