Linux’s vsyscall
By chys on January 26th, 2009It is obvious that querying the current time can in no way be done completely in userspace. However, strace does not record any system call used by the time function in Linux x86_64.
Let’s disassemble glibc:
$ objdump -d /lib64/libc-2.9.so | fgrep -A5 '<time>:' 000000000008a510 <time>: 8a510: 48 83 ec 08 sub $0x8,%rsp 8a514: 48 c7 c0 00 04 60 ff mov $0xffffffffff600400,%rax 8a51b: ff d0 callq *%rax 8a51d: 48 83 c4 08 add $0x8,%rsp 8a521: c3 retq
It seems glibc is redirecting the function call to something fixed at virtual address 0xffffffffff600400. But what is there?
Then I found out it was the so-called vsyscall (virtual system call) mechanism, which Linux used as an effort to make certain system calls as fast as possible. This does not involve the syscall instruction and is therefore ignored by strace.
The vsyscalls are part of the kernel, but the kernel pages containing them are executable with userspace privileges. And they’re mapped to fixed addresses in the virtual memory[1].
There are currently 3 vsyscalls in Linux x86_64: gettimeofday, time and getcpu. Their locations in the virtual memory can be found with the VSYSCALL_ADDR macro defined in /usr/include/asm/vsyscall.h:
#ifndef _ASM_X86_VSYSCALL_H
#define _ASM_X86_VSYSCALL_H
enum vsyscall_num {
__NR_vgettimeofday,
__NR_vtime,
__NR_vgetcpu,
};
#define VSYSCALL_START (-10UL << 20)
#define VSYSCALL_SIZE 1024
#define VSYSCALL_END (-2UL << 20)
#define VSYSCALL_MAPPED_PAGES 1
#define VSYSCALL_ADDR(vsyscall_nr) (VSYSCALL_START+VSYSCALL_SIZE*(vsyscall_nr))
#endif /* _ASM_X86_VSYSCALL_H */
NOTE: We do not need to use vsyscalls explicitly. The corresponding glibc wrappers (for getcpu, it’s sched_getcpu) already take advantage of them.
[1] I really hate Microsoft’s use of the term ‘virtual memory’ to refer to swapping files in disks! It once confused me so much..
Related posts:

