Posts Tagged ‘Linux’
Unaligned access
By chys on December 26th, 2009
Misalignment is not an error (only incurs a performance penalty) on x86 processors except for a few new instructions added in recent years. MOVDQA, for example, is an SSE2 instruction requiring alignment on 16-byte boundaries.
Textbooks have normally taught us we get a bus error if a CPU which disallows unaligned access actually encounters one.
But we observe a Linux process passing misaligned addresses to MOVDQA receives SIGSEGV (segmentation fault) instead of SIGBUS (bus error), on both ia32 and x86-64.
laptop /tmp $ cat a.c
int main ()
{
char X[32];
asm ("pxor %%xmm0,%%xmm0; movdqa %%xmm0,%0" : "=m"(X[1]) :: "xmm0");
return 0;
}
laptop /tmp $ gcc -msse2 a.c
laptop /tmp $ ./a.out
Segmentation fault
laptop /tmp $ kill -l $?
SEGV
x86-64 (and ia32, beginning with the 80486SX) supports disallowing any misaligned access*. In that case, a normal instruction raises SIGBUS, but instructions which inherently require alignment (e.g. MOVDQA) still raise SIGSEGV. It’s not so consistent.
* It is normally disabled. To enable it, set the AC bit in FLAGS:
pushf
orl $0x40000,(%esp)    # use (%rsp) on x86-64
popf
Floating point exception
By chys on December 1st, 2009
It is already confusing enough that “floating point exception” may mean “division by zero” in integer arithmetic. It turns out it can also mean “overflow” in some cases, as in the following program (it’s difficult in C, so I had to use assembly):
#include <asm/unistd.h>
.text
.globl _start
_start:
mov $1, %eax
mov $1, %edx
div %eax
mov $__NR_exit_group, %eax
int $0x80
(Type “gcc -m32 -nostdlib a.S” to compile and link.)
In this program, EDX:EAX (0x100000001) divided by EAX (0x1) cannot be represented as a 32-bit integer, so the division overflows. x86 CPUs raise the same “division by zero” interrupt (int 0) in such cases, and “division by zero” is displayed as “floating point exception” in Linux…
PS. The same assembly program in Intel style:
.code
.startup
MOV EAX,1
MOV EDX, 1
DIV EAX
MOV EAX, __NR_exit_group
INT 80H
Ack is a good alternative to grep
By chys on September 6th, 2009
When I grep -r something in a Subversion-controlled directory, I get a lot of results under the .svn directories.
In this sense, Git is better than Subversion. (Git creates only one .git directory, and stores data in some compressed format which gets ignored by grep -r.)
So I have switched to ack when working with Subversion. Ack is written in Perl and claims to be “aimed at programmers with large trees of heterogeneous source code.”
I can’t PGO compile Firefox
By chys on July 17th, 2009
A normal build of Firefox for Linux is reportedly even slower than the Win32 binary running under Wine.
The reason is reportedly that the pre-compiled binary for Windows uses PGO (profile-guided optimization), which is usually not enabled under Linux. Sure, the fact that GCC does not generate code as efficient as VC’s may also be a reason.[1]
Firefox also supports PGO on Linux. However, I failed to get it working (with 3.5.1): the profile-generating binary always segfaults.
Other people have encountered the same problem, even with the official PKGBUILD from Arch Linux. It is said to be a compiler problem.
Well, gave up. Maybe I’ll try it again some time later, with a more “stable” version of GCC probably.
[1] This statement only applies to the 32-bit platform. It seems GCC does a very good job on x86-64.
Profile-guided optimization is a relatively new feature. GCC has supported it since version 4.0, Microsoft VC since 2005, and Intel C/C++/Fortran since version 9 (?).
A typical PGO-enabled build requires three steps:
(1) Build a profile-generating binary;
(2) Run the binary, which automatically collects useful data – branch probability, etc.
(3) Rebuild the program, using the data (“profile”) from Step 2.
With PGO, Internet Explorer reportedly gains an improvement of 8%, and Firefox 11% in JavaScript.
Tags: browser, dev, GCC, Linux, optimization
install vs. cp; and mmap
By chys on May 8th, 2009
If we hand-write a Makefile, we should always stick to install instead of using cp for the installation commands. Not only is it more convenient, but it also does things right (cp does things wrong).
For example, if we attempt to update /bin/bash, which is currently running, with “cp ... /bin/bash”, we get a “text busy” error. If we attempt to update /lib/libc.so.6 with “cp ... /lib/libc.so.6”, then we either get “text busy” (in ancient versions of Linux) or break each and every running program within a fraction of a second (in recent versions of Linux). install does the right thing in both situations.
The reason why cp fails is that it simply attempts to open the destination file in write-only mode and write the new contents. This causes problems because Linux (and all contemporary Unices as well as Microsoft Windows) uses memory mapping (mmap) to load executables and dynamic libraries.
The contents of an executable or dynamic library are mmap’d into the linear address space of relevant processes. Therefore, any change in the underlying file affects the mmap’d memory regions and can potentially break programs. (MAP_PRIVATE guarantees that changes made by processes to those memory regions are handled by COW without affecting the underlying file. By contrast, POSIX leaves it to implementations whether COW should be used if the underlying file is modified. In fact, for the sake of efficiency, in Linux such modifications are visible to processes even though MAP_PRIVATE may have been used.)
There is an option MAP_DENYWRITE which disallows any modification to the underlying file, designed to avoid the situations described above. Executables and dynamic libraries are all mmap’d with this option. Unfortunately, it turned out MAP_DENYWRITE became a source of DoS attacks, forcing Linux to ignore this option in recent versions.
Executables are mmap’d by the kernel (in the execve syscall). For kernel code, MAP_DENYWRITE still works, and therefore we get “text busy” errors if we attempt to modify the executable.
On the other hand, dynamic libraries are mmap’d by userspace code (for example, by loaders like /lib/ld-linux.so). This code still passes MAP_DENYWRITE to the kernel, but newer kernels silently ignore the option. The bad consequence is that you can break the whole system when you think you’re only upgrading the C runtime library.
Then, how does install solve this problem? Very simple: it unlinks the file before writing the new one. Then the old file (no longer present in directory entries but still on disk until the last program referring to it exits) and the new file have different inodes. Programs started before the upgrade (which continue using the old file) and those started after it (which use the new version) will both be happy.
Dynamic library symlinks
By chys on April 20th, 2009
I happened to be asked about this, so here is a chance to write it down.
Take giflib as an example: It installs three .so files in /usr/lib: libgif.so, libgif.so.4, libgif.so.4.1.6.
The third file is the true library, and the first two are both symlinks to the third. We append the version to the filename apparently because we want to allow multiple versions to coexist in one system.
It is easy to understand why we need libgif.so (otherwise “gcc ... -lgif” is going to fail).
The number appended to the second symlink, namely 4 here, is the ABI version. It does not have to be part of the full version, though it usually is. (glibc, the most important library in a working Linux system, is an exception.)
When linking a program against the library, we specify -lgif on the command line, and the linker follows the symlink and finds libgif.so.4.1.6. However, the library name recorded in the executable is libgif.so.4 instead. (This name is specified by -Wl,-soname when making the library.) Consequently, if we later make a binary-compatible upgrade to the library and remove the older version, the executable still works. But if the upgrade is a major one (potentially binary-incompatible but still source-level compatible), we must either keep the older library or recompile the executable. If we don’t make these symlinks and simply use one file libgif.so, we will have hard-to-debug segmentation faults instead of missing-library error messages in such cases.
[Gentoo] Logrotating emerge.log
By chys on April 4th, 2009
It seems the reason why emerge.log is not logrotate’d is that log analyzers (qlop and genlop) expect a full log. I’m not sure about qlop, but since I use only genlop, it seems okay to add it to /etc/logrotate.d: genlop supports reading logs from multiple files and also from compressed files (gzip and bzip2).
Create a logrotate configuration file. /etc/logrotate.d/emergelog:
/var/log/emerge.log {
compresscmd /bin/bzip2
uncompresscmd /bin/bunzip2
compressext .bz2
rotate 100
create 660 portage portage
delaycompress
daily
size 2M
}
I configured my logrotate to compress using lzma by default, but genlop recognizes only gz and bz2, so I need to explicitly specify bzip2 (or gzip) here.
A wrapper for genlop is also necessary. /usr/local/bin/genlop:
#!/bin/bash
# Pass every emerge.log* file (current and rotated) to the real genlop via -f.
f=()
for x in /var/log/emerge.log*; do
    f+=(-f "$x")
done
exec /usr/bin/genlop "${f[@]}" "$@"
Ext4 data loss
By chys on March 19th, 2009
I, too, have experienced data loss on ext4 partitions.
There was some problem with my X that hung the system about once a week (I was upgrading my X system and drivers too aggressively), so sometimes I had to hard-reboot my computer (even magic SysRq does not respond in such cases). I lost most or all of my KDE settings after the reboots, and once also all my Thunderbird settings. This never happened until I migrated to ext4.
So I converted all my ext4 partitions back to ext3 and downgraded the kernel to 2.6.27, which Gentoo considers stable. Loss of KDE settings is no big deal, but I really don’t want a kernel bug to erase my code, or homework that is due tomorrow, or the diary I’ve kept for eight years.
Anyway, the performance of ext4 is really good.
References – Other recent reports of ext4 data losses:
[1] https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
[2] http://www.h-online.com/open/Ext4-data-loss-explanations-and-workarounds--/news/112892
[3] http://www.h-online.com/open/Possible-data-loss-in-Ext4--/news/112821
[4] http://cookinglinux.cn/ext4-lose-data.html (Chinese)
Linux’s vsyscall
By chys on January 26th, 2009
It is obvious that querying the current time can in no way be done completely in userspace. However, strace does not record any system call used by the time function in Linux x86_64.
Let’s disassemble glibc:
$ objdump -d /lib64/libc-2.9.so | fgrep -A5 '<time>:'
000000000008a510 <time>:
   8a510:  48 83 ec 08           sub    $0x8,%rsp
   8a514:  48 c7 c0 00 04 60 ff  mov    $0xffffffffff600400,%rax
   8a51b:  ff d0                 callq  *%rax
   8a51d:  48 83 c4 08           add    $0x8,%rsp
   8a521:  c3                    retq
It seems glibc is redirecting the function call to something fixed at virtual address 0xffffffffff600400. But what is there?
Then I found out it was the so-called vsyscall (virtual system call) mechanism, which Linux uses in an effort to make certain system calls as fast as possible. It does not involve the syscall instruction and is therefore invisible to strace.
The vsyscalls are part of the kernel, but the kernel pages containing them are executable with userspace privileges. And they’re mapped to fixed addresses in the virtual memory[1].
There are currently 3 vsyscalls in Linux x86_64: gettimeofday, time and getcpu. Their locations in the virtual memory can be found with the VSYSCALL_ADDR macro defined in /usr/include/asm/vsyscall.h:
#ifndef _ASM_X86_VSYSCALL_H
#define _ASM_X86_VSYSCALL_H
enum vsyscall_num {
__NR_vgettimeofday,
__NR_vtime,
__NR_vgetcpu,
};
#define VSYSCALL_START (-10UL << 20)
#define VSYSCALL_SIZE 1024
#define VSYSCALL_END (-2UL << 20)
#define VSYSCALL_MAPPED_PAGES 1
#define VSYSCALL_ADDR(vsyscall_nr) (VSYSCALL_START+VSYSCALL_SIZE*(vsyscall_nr))
#endif /* _ASM_X86_VSYSCALL_H */
NOTE: We do not need to use vsyscalls explicitly. The corresponding glibc wrappers (for getcpu, it’s sched_getcpu) already take advantage of them.
[1] I really hate Microsoft’s use of the term ‘virtual memory’ to refer to swap files on disk! It once confused me so much.
Difference between dup(0) and open("/dev/fd/0", …);
By chys on January 14th, 2009
I believe APUE (2nd ed.; Sec. 3.16) is not correct.
APUE says fd = open("/dev/fd/0", mode); is equivalent to fd = dup (0);, and mode is completely ignored. It seems this is the case in Solaris, but wrong in Linux. (I don’t have access to other Unices at this moment.)
A test program:
01 #include <unistd.h>
02 #include <fcntl.h>
03 #include <stdio.h>
04 int main ()
05 {
06 close (0);
07 printf ("%d\n", open ("a.txt", O_RDONLY)); // Should be 0
08 //int f2 = open ("/dev/fd/0", O_WRONLY);
09 int f2 = dup(0);
10 printf ("%d\n", f2);
11 write (f2, "Hello world\n", 12);
12 return 0;
13 }
Let’s run the program with an empty a.txt. Certainly the write function in Line 11 is going to fail.
Now, let’s comment out Line 9 and uncomment line 8 and try it again.
First I ran it on Solaris: the write call still failed, which is the behavior APUE describes.
Try it again in Linux – It was successful!
It seems that in Linux, /dev/fd/0 is considered by open as nothing but a normal symlink to a.txt. So it returns a completely new descriptor instead of a duplicate of the old.
Let’s try it again with a shell script:
rm -f a.txt
touch a.txt
exec 0<a.txt
exec 3>/dev/fd/0
echo 'Hello world' >&3
cat a.txt
Run it on Linux (with DASH or BASH): both output ‘Hello world’.
Run it on Solaris (with Bourne shell and BASH): both failed, outputting nothing (Bourne shell) or failing with ‘Bad file number’ (BASH).
Conclusion:
(1) Solaris handles /dev/fd/.. specially, as APUE tells us;
(2) Linux simply considers /dev/fd/0 a symlink to the actual file.
(I’ll try later how Linux handles open("/dev/fd/0",mode) if the descriptor is an anonymous pipe or socket or something else that a normal symlink is unable to link to.)
Kernels used in the above tests:
Linux: Linux desktop 2.6.28-gentoo #4 SMP Mon Jan 12 17:39:23 CST 2009 x86_64 Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz GenuineIntel GNU/Linux
Solaris: SunOS caesar 5.8 Generic_117350-51 sun4u sparc SUNW,Ultra-80 Solaris

