Posts Tagged ‘Linux’

A problem with pipes in Python 3

The most disturbing change from Python 2 to 3 definitely is not the print() function; nor that some functions which used to return lists now return iterators; nor the removal of __cmp__; but the transition to Unicode.

I’m completely supportive of the transition per se, but I’m disappointed that they’re trying to compel us to use Unicode by dropping useful functionality for byte streams/8-bit strings. For example, bytes has no format() method in Python 3, and it had no % operator either until Python 3.5 restored it.
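As a rough illustration of what formatting byte strings looks like: the sketch below assumes Python 3.5 or later for the % line, and falls back to formatting a str and encoding it, which also works on older 3.x. The header name and value are made up for the example.

```python
# Byte-string formatting in Python 3.
# The % operator on bytes needs Python 3.5+; bytes still has no format().
key = b"Content-Length"  # hypothetical header name, for illustration only
value = 42

# Python 3.5+: printf-style formatting directly on bytes.
line = b"%s: %d\r\n" % (key, value)

# Works on any Python 3: format as str, then encode.
line_via_str = "{}: {}\r\n".format(key.decode("ascii"), value).encode("ascii")

# bytes has never gained a format() method.
has_format = hasattr(bytes, "format")
```

Both approaches produce the same bytes; the second one costs a decode/encode round trip, which is exactly the kind of detour the post complains about.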

I have some code like this:

proc = subprocess.Popen((....), stdout=subprocess.PIPE)
for line in proc.stdout:
    ...

I found that, on Linux, this code snippet is almost 10 times slower in Python 3 than in Python 2. Then I strace’d the code and found that Python 3 passes a length of 1 to read, incurring thousands of times more system calls than Python 2. Are you kidding me? I was forced to use something like proc.stdout.read(...).

I understand this is not the direct result of the transition to Unicode, but it is somehow related.
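One possible workaround, sketched under the assumption that the one-byte reads come from an unbuffered pipe object: ask Popen for a buffered pipe explicitly through its bufsize parameter. The child command and the 64 KiB buffer size below are arbitrary choices for the example, not values from my real code. (As far as I know, newer Python 3 releases buffer by default, so this mainly matters on old interpreters.)

```python
import subprocess
import sys


def count_lines(bufsize=64 * 1024):
    """Iterate over a child's stdout through an explicitly buffered pipe."""
    # Hypothetical child process: prints 1000 short lines.
    child = [sys.executable, "-c", "for i in range(1000): print(i)"]
    # A positive bufsize makes proc.stdout a buffered reader, so line
    # iteration is served from large read() calls instead of tiny ones.
    proc = subprocess.Popen(child, stdout=subprocess.PIPE, bufsize=bufsize)
    count = 0
    with proc.stdout:
        for _line in proc.stdout:
            count += 1
    proc.wait()
    return count
```

Running the same function under strace with bufsize=0 and with a large bufsize should make the difference in the number of read calls visible.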


A hack to strace -f

I have a multithreaded program which I would like to strace for debugging purposes. My program sometimes calls (fork and exec) an external program, which in turn calls a setuid program.

Because my program is multithreaded, I cannot omit the “-f” flag (also trace child threads and processes) when using strace. And because all children, including the setuid program, are traced, setuid fails. (Yes, I am aware that strace claims it is possible to trace setuid programs, but the trick does not work for me, probably because the setuid program is not directly executed by strace.)

Fortunately, the clone system call has many useful flags. It works fine for me when I substitute calls to fork() with:

(pid_t) syscall (__NR_clone, CLONE_UNTRACED|SIGCHLD, NULL);

(Yes, SIGCHLD, not CLONE_SIGCHLD. It’s not a typo: the low byte of the clone flags holds the signal sent to the parent when the child exits.)

I guess there may be better solutions, without modifying the program being traced?


Concatenate PDF files in Linux

Some people recommend using convert or gs. However, there is a major problem with them – all text and vector graphics become raster graphics.

pdftk (PDF ToolKit) is a better solution – it keeps text and vector graphics. We just have to use this command:

pdftk *.pdf cat output result.pdf

PDF Shuffler, written in Python and based on poppler, also does the trick, and has a nice GUI.

However, there are drawbacks for both pdftk and PDF Shuffler:

  • pdftk only supports ASCII filenames. So it’s a bit inconvenient for non-English users like me.
  • PDF Shuffler is way too slow. I tried concatenating several files (approx. 1000 pages), and it kept running for more than 10 minutes before I hit Ctrl-C; pdftk finished the same task in just a few seconds.


Convert PDF to images in Linux

Just use convert, the universal image converter shipped with ImageMagick.

convert a.pdf a.png

And we get as many PNG files as there are pages in the PDF. The converted files are named a-0.png, a-1.png, …

We can also use it the other way around:

convert a.jpg b.png c.gif abc.pdf

This will combine the three images into one PDF file. Very flexible.


man 3 sleep

The man page of sleep(3) used to say:

sleep() makes the calling process sleep until seconds seconds have elapsed or a signal arrives which is not ignored. [Old version; emphasis on “process” added]

Now it says:

sleep() makes the calling thread sleep until seconds seconds have elapsed or a signal arrives which is not ignored. [New version]

At last, they did the right thing; we have waited for at least 5 years. This change happened some time between versions 3.23 and 3.26 of the man pages.

There has been no reason to say “process” here since the emergence of Linux 2.6 and NPTL. As far as I know, this very sentence has confused many newbie Linux programmers who are not familiar with the history of Linux multithreading. It has led some to firmly believe that there are no “real” threads under Linux, or that threads are actually processes. That was probably right for the obsolete LinuxThreads implementation, but it is definitely wrong today.
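A quick way to see that threads are not separate processes, sketched in Python and assuming Linux with Python 3.8 or later (for threading.get_native_id()): two threads of one program report the same process ID but different kernel thread IDs.

```python
import os
import threading


def thread_ids():
    """Return (pid, kernel thread id) for the main thread and one worker."""
    ids = {}

    def worker():
        # Runs in a second thread of the same process.
        ids["worker"] = (os.getpid(), threading.get_native_id())

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    ids["main"] = (os.getpid(), threading.get_native_id())
    return ids
```

With NPTL, getpid() is shared by all threads of a process, while each thread has its own kernel-level ID.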


Switch from syslog-ng to rsyslog

I just think it is funny for a package as fundamental as a system logger to depend on glib (part of the GTK+ project). (Of course, I don’t mean glib is a bad library. I’m not implying anything like that.)

Of course there are other reasons, but they are less important:

  • Many distributions also prefer rsyslog to syslog-ng because it’s more powerful and scalable. [1] [2]
  • Several people have complained about the performance of syslog-ng [3], although the performance of syslog is normally not an issue for individual users.
  • I don’t understand why the author of syslog-ng prefers listening on /dev/log as a stream socket (the local counterpart of TCP) instead of a datagram one (the local counterpart of UDP). (I know it can be changed in the configuration file.) None of the common issues of UDP is present in local datagram sockets: a short length limit, out-of-order delivery, and unreliability. On the contrary, syslog needs unidirectional channels that carry messages with their boundaries preserved, for which nothing can be more suitable.
  • Licensing issues, upstream responsiveness, etc. They are less important for individual users, though.
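To illustrate the point about message boundaries, here is a minimal sketch using a connected pair of local datagram sockets instead of the real /dev/log. The message contents are made up; the only claim is that each send arrives as exactly one recv.

```python
import socket


def send_two_messages():
    """Send two log-like messages over a local datagram socket pair."""
    receiver, sender = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with receiver, sender:
        # Hypothetical syslog-style payloads.
        sender.send(b"<13>first message")
        sender.send(b"<13>second")
        # Each recv() returns one whole datagram, never a fragment
        # and never two messages glued together.
        return [receiver.recv(4096), receiver.recv(4096)]
```

With a stream socket the receiver would have to find the boundaries itself, since two sends may come back from a single recv.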

Sure, syslog-ng also has its advantages. For example, I like the format of its configuration files.


CPU Frequency Governor

Kernel documentation recommends “conservative” for laptops, citing latency reasons.

However, Intel explicitly recommends “ondemand” in its powertop. So does at least one Intel kernel developer.

OK. I had been using “conservative”, but I have decided to switch to “ondemand” from now on.
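For checking which governor is currently active, a small sketch that reads the cpufreq files under sysfs. It assumes the usual /sys/devices/system/cpu layout; the function name and the root parameter are my own, and the function simply returns an empty dict on machines without cpufreq support.

```python
from pathlib import Path

# Usual sysfs location of per-CPU cpufreq settings on Linux.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def read_governors(root=CPUFREQ_ROOT):
    """Map each CPU directory name (e.g. 'cpu0') to its current governor."""
    governors = {}
    for path in sorted(root.glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu = path.parts[-3]  # .../cpuN/cpufreq/scaling_governor
        governors[cpu] = path.read_text().strip()
    return governors
```

Switching is the reverse operation: as root, write the governor name into the same scaling_governor file.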



NULL can be a valid address

It is only a convention to consider NULL (0) an invalid pointer. Technically, neither the operating system nor the hardware really cares whether a pointer is zero, although operating systems may restrict mapping address zero because a valid null pointer can be a security hole.

Consider this program:

#include <stdio.h>
#include <sys/mman.h>

int main ()
{
    int *p = mmap (0, 4096, PROT_READ|PROT_WRITE,
        MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    *p = 2554;
    printf ("p=%p; *p=%d\n", p, *p);
    return 0;
}

It attempts to make NULL (0) a valid address and then write to it. On Linux, it runs without error as root, but crashes as a normal user.

Note: Calling mmap with NULL as its first argument usually means the kernel will choose an address. However, if MAP_FIXED is also specified, it instead instructs the kernel to use the very address 0. Only privileged processes are allowed to do so; a non-privileged process only gets EPERM (Permission denied).

This also explains why MAP_FAILED is equal to (void *)-1 instead of NULL.
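The MAP_FAILED check can be sketched from Python with ctypes, calling the C library’s mmap directly. This is only an illustration under several assumptions: Linux, a libc reachable through ctypes.CDLL(None), and MAP_FIXED being 0x10 (its value on x86 and ARM; the mmap module does not export it). The function name is my own. Instead of dereferencing the result, it compares it against (void *)-1 and reports errno.

```python
import ctypes
import mmap

# Assumption: Linux value on x86 and ARM; other architectures may differ.
MAP_FIXED = 0x10


def try_map_page_zero():
    """Try to map one page at address 0; return (mapped, errno)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mmap.restype = ctypes.c_void_p
    libc.mmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int,
                          ctypes.c_int, ctypes.c_int, ctypes.c_long]
    libc.munmap.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    # MAP_FAILED is (void *)-1, i.e. the highest possible address.
    map_failed = ctypes.c_void_p(-1).value

    addr = libc.mmap(None, mmap.PAGESIZE,
                     mmap.PROT_READ | mmap.PROT_WRITE,
                     MAP_FIXED | mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                     -1, 0)
    if addr == map_failed:
        return False, ctypes.get_errno()
    # Success: ctypes reports address 0 as None. Unmap it again at once.
    libc.munmap(addr, mmap.PAGESIZE)
    return True, 0
```

Because a successful mapping at address 0 returns 0, a NULL return value could not double as the error indicator; that is why the comparison has to be against MAP_FAILED.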


Removed some distribution-specific patches

With all due respect to Gentoo developers, I really hate the patches they made for coreutils, especially the one to have uname parse /proc/cpuinfo.

The result is that uname -a displays more info, specifically the model of the CPU, on Gentoo than on other Linux distributions. Generally this is not a bad thing. But I do have concerns:

(1) In my view the utility uname should remain a simple wrapper around the system call of the same name. If CPU/vendor info really needs to be returned by uname, it is better to add it to the kernel. This is also part of the reason why upstream rejected this patch.

(2) If I am used to finding CPU info from uname, I will likely forget the more orthodox method (cat /proc/cpuinfo). A job interviewer may not be patient enough to listen to my explanation about the patch.
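For reference, this is what the system call itself reports, seen through Python’s os.uname(), which is a thin wrapper as well. The helper name is my own; the five field names are the ones Python exposes.

```python
import os


def uname_fields():
    """Return the fields reported by the uname system call as a dict."""
    info = os.uname()
    return {
        "sysname": info.sysname,    # e.g. the kernel name
        "nodename": info.nodename,  # host name
        "release": info.release,    # kernel release
        "version": info.version,    # kernel build/version string
        "machine": info.machine,    # hardware architecture
    }
```

There is an architecture field but nothing about the CPU model, so anything beyond this has to come from somewhere else, such as /proc/cpuinfo.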


Extract Deb files from command line

Debian and its derivatives use the .deb format to distribute their packages. To extract them, use ar. Yes, the very program we programmers use to make static libraries.

ar x sudo_1.6.9p17-2_i386.deb

Or we can directly extract things from data.tar.gz contained in the .deb file:

ar p sudo_1.6.9p17-2_i386.deb data.tar.gz | tar -xzf -
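Since a .deb is just an ar archive, its member list can also be read without ar. The sketch below parses the common ar layout (an 8-byte magic string, then a 60-byte header before each member, with data padded to an even length). The function name is my own, and it ignores GNU ar’s special long-name entries, which the simple member names in a .deb should not need.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def list_ar_members(data):
    """Return [(name, size), ...] for the members of an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(data):
        header = data[offset:offset + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Name: 16 bytes, space padded; GNU ar appends a trailing '/'.
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        # Size: 10 bytes of decimal text, space padded.
        size = int(header[48:58].decode("ascii"))
        members.append((name, size))
        # Member data is padded to an even number of bytes.
        offset += HEADER_SIZE + size + (size % 2)
    return members
```

Called on the bytes of a .deb file, this should list debian-binary, a control archive, and a data archive, the last of which is what the tar command above unpacks.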

Although I am no longer a user of Debian GNU/Linux, I still have to remember how to extract .deb files. I frequently need to cross-compile a 64-bit version of my program on a 32-bit system, and vice versa, but I don’t want to cross-compile all the libraries my program depends on myself. Instead, I find it a good idea to download the right .deb file from the Debian Packages Repository and pick out the .so files.
