Posts Tagged ‘Linux’
A problem with pipes in Python 3
By chys on November 14th, 2011
The most disturbing change from Python 2 to 3 definitely is not the print() function; nor that some functions which used to return lists now return iterators; nor the removal of __cmp__; but the transition to Unicode.
I’m completely supportive of the transition per se, but I’m disappointed that they’re trying to compel us to use Unicode by dropping useful functionality for byte streams/8-bit strings. For example, bytes has no format or % in Python 3.
I have some code like this:
proc = subprocess.Popen((....), stdout=subprocess.PIPE)
for line in proc.stdout:
    ...
I found that, on Linux, this code snippet is almost 10 times slower in Python 3 than in Python 2. Then I strace‘d the code and found Python 3 is passing length 1 to read, incurring thousands of times more system calls than Python 2. Are you kidding me? I was forced to use something like proc.stdout.read(...).
I understand this is not the direct result of the transition to Unicode, but it is somehow related.
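The workaround mentioned above, reading with proc.stdout.read(...) instead of iterating line by line, might look roughly like the sketch below. The helper name iter_lines_chunked, the 64 KiB chunk size and the small child command are all made up for illustration; they are not from the original code. Note that on a buffered stream, read(n) may wait until n bytes have arrived or the child closes the pipe, so this favors throughput over latency.

```python
import subprocess
import sys


def iter_lines_chunked(stream, chunk_size=65536):
    """Yield lines (bytes, newline kept) from a binary stream read in big chunks."""
    pending = b""
    while True:
        chunk = stream.read(chunk_size)  # one large read instead of many 1-byte reads
        if not chunk:                    # empty result means EOF
            break
        pending += chunk
        pieces = pending.split(b"\n")
        pending = pieces.pop()           # last piece has no newline yet
        for piece in pieces:
            yield piece + b"\n"
    if pending:                          # final line without a trailing newline
        yield pending


# Hypothetical child process that writes three lines to its stdout.
child_cmd = [sys.executable, "-c",
             "import sys; sys.stdout.buffer.write(b'a\\nb\\nc')"]
proc = subprocess.Popen(child_cmd, stdout=subprocess.PIPE)
lines = list(iter_lines_chunked(proc.stdout))
proc.stdout.close()
proc.wait()
print(lines)
```

Running the same program under strace should show far fewer read calls than the per-byte pattern described above, although the exact count depends on the Python version and its buffering defaults.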
A hack to strace -f
By chys on March 25th, 2011
I have a multithreaded program which I would like to strace for debugging purposes. My program sometimes calls (fork and exec) an external program, which in turn calls a setuid program.
Because my program is multithreaded, I cannot omit the “-f” flag (also trace child threads and processes) when using strace. And because all children, including the setuid program, are traced, setuid fails. (Yes, I am aware that strace claims it is possible to trace setuid programs, but the trick does not work for me, probably because the setuid program is not directly executed by strace.)
Fortunately, the clone system call has many useful flags. It works fine for me when I substitute calls to fork() with:
(pid_t) syscall (__NR_clone, CLONE_UNTRACED|SIGCHLD, NULL);
(Yes, SIGCHLD, not CLONE_SIGCHLD. It’s not a typo.)
I guess there may be better solutions, without modifying the program being traced?
Tags: dev, Linux, multithread
Concatenate PDF files in Linux
By chys on December 26th, 2010
Some people recommend using convert or gs. However, there is a major problem with them: all text and vector graphics become raster graphics.
pdftk (PDF ToolKit) is a better solution – it keeps text and vector graphics. We just have to use this command:
pdftk *.pdf cat output result.pdf
PDF Shuffler, written in Python and based on poppler, also does the trick, and has a nice GUI.
However, there are drawbacks for both pdftk and PDF Shuffler:
- pdftk only supports ASCII filenames. So it’s a bit inconvenient for non-English users like me.
- PDF Shuffler is way too slow. I tried concatenating several files (approx. 1000 pages), and it kept running for more than 10 minutes before I hit Ctrl-C; pdftk finished the same task in just a few seconds.
Tags: Linux
Convert PDF to images in Linux
By chys on December 11th, 2010
Just use convert, the universal image converter shipped with ImageMagick.
convert a.pdf a.png
And we get as many PNG files as there are pages in the PDF. The converted files are named a-0.png, a-1.png, …
We can also use it the other way around:
convert a.jpg b.png c.gif abc.pdf
This will combine the three images into one PDF file. Very flexible.
man 3 sleep
By chys on October 17th, 2010
The man page of sleep(3) used to say:
sleep() makes the calling process sleep until seconds seconds have elapsed or a signal arrives which is not ignored. [Old version; color added]
Now it says:
sleep() makes the calling thread sleep until seconds seconds have elapsed or a signal arrives which is not ignored. [New version]
At last, they did the right thing. We have waited for at least 5 years. This change happened some time between versions 3.23 and 3.26 of the man-pages project.
There is indeed no reason to use “process” anymore, at least since the emergence of Linux 2.6 and NPTL. As far as I know, this very sentence has confused many newbie Linux programmers who are not familiar with the history of Linux multithreading, and has led some to firmly believe there is no “real” thread or threads are actually processes under Linux, a statement which was probably right for the obsolete implementation LinuxThreads, but definitely wrong today.
Tags: Linux, multithread
Switch from syslog-ng to rsyslog
By chys on July 16th, 2010
I just think it is funny for a package as fundamental as the system logger to depend on glib (part of the GTK+ project). (Of course, I don’t mean glib is a bad library. I’m not implying anything like that.)
Of course there are other reasons, but they are less important:
- Many distributions also prefer rsyslog to syslog-ng because it’s more powerful and scalable. [1] [2]
- Several people complained about the performance of syslog-ng [3], although the performance of syslog is normally not an issue for individual users.
- Third, I don’t understand why the author of syslog-ng prefers listening on /dev/log as a stream socket (local counterpart of TCP) instead of a datagram one (local counterpart of UDP). (I know it can be changed in the configuration file.) None of the common issues of UDP is present in local datagram sockets: too short length limit; out of order; unreliability. On the contrary, syslog needs unidirectional channels carrying boundary-maintained messages, for which nothing can be more suitable.
- Finally, licensing issues, upstream responsiveness, etc. They are less important for individual users, though.
Sure, syslog-ng also has its advantages. For example, I like the format of its configuration files.
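To illustrate the point about boundary-maintained messages, here is a small sketch that compares the two local socket types using socket.socketpair. It does not touch /dev/log or any real logger; the function names and the two sample messages are invented for the demonstration.

```python
import socket


def demo_datagram():
    """Two sends on a local datagram socket arrive as two separate messages."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        a.send(b"first")
        a.send(b"second")
        # Each recv returns exactly one message, however large the buffer is.
        return [b.recv(1024), b.recv(1024)]
    finally:
        a.close()
        b.close()


def demo_stream():
    """On a local stream socket the same two sends are just a run of bytes."""
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        a.send(b"first")
        a.send(b"second")
        a.shutdown(socket.SHUT_WR)   # signal EOF so the read loop terminates
        data = b""
        while True:
            chunk = b.recv(1024)
            if not chunk:
                break
            data += chunk
        # The boundary between the two messages is not preserved.
        return data
    finally:
        a.close()
        b.close()


dgram_messages = demo_datagram()
stream_bytes = demo_stream()
print(dgram_messages)
print(stream_bytes)
```

With a stream socket, the receiver has to find message boundaries itself (for example by splitting on newlines), which is exactly the work a datagram socket makes unnecessary.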
CPU Frequency Governor
By chys on May 21st, 2010
Kernel documentation recommends “conservative” for laptops, citing latency reasons.
However, Intel explicitly recommends “ondemand” in its powertop. So does at least one Intel kernel developer.
OK. I have been using “conservative,” but I have decided to switch to “ondemand” from now on.
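For reference, a quick way to check which governor is active is to read the cpufreq files in sysfs. The sketch below assumes the usual layout under /sys/devices/system/cpu/cpu0/cpufreq; the path and the helper name read_governor are assumptions of this example, and some systems (virtual machines, for instance) do not expose cpufreq at all.

```python
from pathlib import Path

# Assumed location of the cpufreq files for the first CPU.
CPUFREQ_DIR = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_governor(cpufreq_dir=CPUFREQ_DIR):
    """Return (current, available) governors, or None if cpufreq is not exposed."""
    current = cpufreq_dir / "scaling_governor"
    available = cpufreq_dir / "scaling_available_governors"
    if not current.is_file():
        return None
    names = available.read_text().split() if available.is_file() else []
    return current.read_text().strip(), names


info = read_governor()
print(info)
```

Switching is done by writing the governor name (for example ondemand) into scaling_governor as root, once for each CPU.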
NULL can be a valid address
By chys on April 21st, 2010
It is only a convention to consider NULL (0) as an invalid pointer. Technically, the operating system or hardware does not really care if a pointer is zero or not, although operating systems may restrict the use of valid null pointers as they may be a security hole.
Consider this program:
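#include <stdio.h>
#include <sys/mman.h>
int main ()
{
  /* MAP_FIXED with address 0 asks the kernel to map a page at NULL itself. */
  int *p = mmap (0, 4096, PROT_READ|PROT_WRITE,
                 MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
  /* The result is deliberately not checked against MAP_FAILED. */
  *p = 2554;
  printf ("p=%p; *p=%d\n", (void *) p, *p);
  return 0;
}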
It attempts to make NULL (0) a valid address and then write to it. On Linux, it runs without error as root, but crashes as a normal user.
Note: Calling mmap with NULL as its first argument usually means the kernel will choose an address. However, if MAP_FIXED is also specified, it instead instructs the kernel to use the very address 0. Only privileged processes are allowed to do so; a non-privileged process only gets EPERM (Permission denied).
This also explains why MAP_FAILED is equal to (void *)-1 instead of NULL.
Removed some distribution-specific patches
By chys on March 24th, 2010
With all due respect to Gentoo developers, I really hate the patches they made for coreutils, especially the one to have uname parse /proc/cpuinfo.
The result is that uname -a displays more info, specifically the model of the CPU, on Gentoo than on other Linux distributions. Generally this is not a bad thing. But I do have concerns:
(1) In my view the utility uname should remain a simple wrapper of the system call of the same name. If CPU/vendor info really needs to be returned by uname, it is better to add it to the kernel. This is also part of the reason why upstream rejected this patch.
(2) If I am used to finding CPU info from uname, I will likely forget the more orthodox method (cat /proc/cpuinfo). A job interviewer may not be that patient to listen to my explanation about the patch.
Extract Deb files from command line
By chys on February 2nd, 2010
Debian and its derivatives use the .deb format to distribute their packages. To extract them, use ar. Yes, the very program we programmers use to make static libraries.
ar x sudo_1.6.9p17-2_i386.deb
Or we can directly extract things from data.tar.gz contained in the .deb file:
ar p sudo_1.6.9p17-2_i386.deb data.tar.gz | tar -xzf -
No longer a user of Debian GNU/Linux, I still have to remember how to extract .deb files. I frequently need to cross-compile a 64-bit version of my program on a 32-bit system, and vice versa; but I don’t want to cross-compile by myself the many libraries on which my program depends. Instead, I find it a good idea to download the right .deb file from the Debian Packages Repository and pick out the .so files.
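The reason ar works here is that a .deb file is an ordinary ar archive. As a rough illustration, the sketch below lists the members of an ar archive by walking its 60-byte member headers. It builds a tiny fake archive in memory instead of reading a real package; the helper names and the placeholder payloads are invented for the example, and GNU-style long file names are not handled.

```python
import io

AR_MAGIC = b"!<arch>\n"


def list_ar_members(fileobj):
    """Return [(name, size)] for each member of an ar archive."""
    if fileobj.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = fileobj.read(60)             # fixed-size member header
        if len(header) < 60:
            break                             # end of archive
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Name is bytes 0-15, space padded; GNU ar also appends a "/".
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        fileobj.seek(size + size % 2, 1)      # skip data; it is padded to even length
    return members


def make_ar_member(name, payload):
    """Build one member (header + data) for the in-memory demo archive."""
    header = "{:<16}{:<12}{:<6}{:<6}{:<8}{:<10}`\n".format(
        name, 0, 0, 0, "100644", len(payload)).encode("ascii")
    padding = b"\n" if len(payload) % 2 else b""
    return header + payload + padding


# Fake archive with the member names found in a typical .deb file.
demo = (AR_MAGIC
        + make_ar_member("debian-binary", b"2.0\n")
        + make_ar_member("control.tar.gz", b"abc")
        + make_ar_member("data.tar.gz", b"xyz12"))
members = list_ar_members(io.BytesIO(demo))
print(members)
```

For real work the ar command shown above is simpler; this only shows why it applies to .deb files at all.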
Tags: compression, Debian, Gentoo, Linux
