Posts Tagged ‘x86-64’

Unaligned access

Misalignment is not an error on x86 processors; it only incurs a performance penalty. The exceptions are a few instructions added in recent years. MOVDQA, for example, is an SSE2 instruction that requires its memory operand to be aligned on a 16-byte boundary.

Textbooks usually teach that a CPU which disallows unaligned access raises a bus error when it encounters one.

Yet a Linux process that passes a misaligned address to MOVDQA receives SIGSEGV (segmentation fault) rather than SIGBUS (bus error), on both ia32 and x86-64:

laptop /tmp $ cat a.c
int main ()
{
    char X[32];
    asm ("pxor %%xmm0,%%xmm0; movdqa %%xmm0,%0" : "=m"(X[1]) :: "xmm0");
    return 0;
}
laptop /tmp $ gcc -msse2 a.c
laptop /tmp $ ./a.out
Segmentation fault
laptop /tmp $ kill -l $?
SEGV

x86-64 (and ia32 beginning with the 80486SX) supports disallowing any misaligned access*. In that case, a normal instruction raises SIGBUS, but instructions which inherently require alignment (e.g. MOVDQA) still raise SIGSEGV. It’s not so consistent.

* It is normally disabled. To enable it, set the AC bit in FLAGS:

pushf
orl $0x40000,(%esp)    # use (%rsp) on x86-64
popf


Linux’s vsyscall

It is obvious that querying the current time can in no way be done completely in userspace. However, strace does not record any system call used by the time function in Linux x86_64.

Let’s disassemble glibc:

$ objdump -d /lib64/libc-2.9.so | fgrep -A5 '<time>:'
000000000008a510 <time>:
   8a510:       48 83 ec 08             sub    $0x8,%rsp
   8a514:       48 c7 c0 00 04 60 ff    mov    $0xffffffffff600400,%rax
   8a51b:       ff d0                   callq  *%rax
   8a51d:       48 83 c4 08             add    $0x8,%rsp
   8a521:       c3                      retq

It seems glibc is redirecting the function call to something fixed at virtual address 0xffffffffff600400. But what is there?

Then I found out it was the so-called vsyscall (virtual system call) mechanism, which Linux uses to make certain system calls as fast as possible. A vsyscall does not execute the syscall instruction, so strace does not see it.

The vsyscalls are part of the kernel, but the kernel pages containing them are executable with userspace privileges. And they’re mapped to fixed addresses in the virtual memory[1].

There are currently 3 vsyscalls in Linux x86_64: gettimeofday, time and getcpu. Their locations in the virtual memory can be found with the VSYSCALL_ADDR macro defined in /usr/include/asm/vsyscall.h:

#ifndef _ASM_X86_VSYSCALL_H
#define _ASM_X86_VSYSCALL_H

enum vsyscall_num {
    __NR_vgettimeofday,
    __NR_vtime,
    __NR_vgetcpu,
};

#define VSYSCALL_START (-10UL << 20)
#define VSYSCALL_SIZE 1024
#define VSYSCALL_END (-2UL << 20)
#define VSYSCALL_MAPPED_PAGES 1
#define VSYSCALL_ADDR(vsyscall_nr) (VSYSCALL_START+VSYSCALL_SIZE*(vsyscall_nr))

#endif /* _ASM_X86_VSYSCALL_H */

NOTE: We do not need to use vsyscalls explicitly. The corresponding glibc wrappers (for getcpu, it’s sched_getcpu) already take advantage of them.

[1] I really hate Microsoft’s use of the term ‘virtual memory’ to refer to swap files on disk! It once confused me so much...


Scilab

On my Core 2 running 64-bit Linux, the Scilab build insists on compiling one of the Fortran sources with “gfortran -march=athlon64 -mfpmath=sse -msse2 -m3dnow …”, regardless of the FFLAGS environment variable. (The other Fortran files do honor FFLAGS…)

It does not matter much if the build insists on using “-march=athlon64”, but “-m3dnow” is a real problem, since Intel processors have never supported 3DNow!


mov %edi, %edi

Here is a simple C function:

long foo (unsigned a, unsigned b)
{
    return ((long)b<<32)|a;
}

Compile it with a GCC targeting x86-64 with optimizations enabled (-O2, for example), and you get the following instructions (in AT&T-style assembly):

foo:
        movq    %rsi, %rax
        mov     %edi, %edi
        salq    $32, %rax
        orq     %rdi, %rax
        ret

Pay attention to the second instruction, ‘mov %edi, %edi’. Literally it means assigning the value of register edi to register edi. Five years ago, anybody would have agreed that this instruction does nothing, like a nop. But in an x86-64 system, this is not the case.

In x86-64 assembly, any instruction with a 32-bit register as its destination zeroes the higher 32 bits of the corresponding 64-bit register at the same time. Consequently, the function of ‘mov %edi, %edi’ is zeroing bits 32 to 63 of register rdi while leaving the lower 32 bits (i.e., register edi) unchanged.

One may want to rewrite it with a more intuitive ‘and’ instruction:

andq $0xffffffff, %rdi

But this does NOT assemble! $0x00000000ffffffff is not representable as a sign-extended 32-bit immediate, and 64-bit immediates are currently allowed only in mov instructions whose destination is a general-purpose register (such a mov is usually written explicitly as movabsq). So if one must use ‘and’, one needs something like this:

movl $0xffffffff, %eax
andq %rax, %rdi

Recall the zeroing rule for operations on 32-bit registers: ‘movl $0xffffffff, %eax’ is equivalent to ‘movabsq $0xffffffff, %rax’…

X86-64 assembly really is too ugly, at least in this sense…

Reference
[1] Gentle Introduction to x86-64 Assembly
