Posts Tagged ‘dev’

ELF Symbol table

I’ve recently been doing some hacking (for food). I need to modify some object files (.o) generated by GCC before they’re passed to linker. (Using elfutils)

I thought I was doing everything right, but ld (GNU linker) insisted on crashing with segmentation fault. Finally, eu-elflint told me what was wrong – I was unaware of the ELF requirement that LOCAL symbols must come before GLOBAL symbols in a symtab (symbol table), and that sh_info is the boundary between LOCAL and GLOBAL symbols.

Many thanks to authors of elfutils, especially for their eu-elflint!

Tags: , , , ,

A hack to strace -f

I have a multithreaded program which I would like to strace for debugging purpose. My program sometimes calls (fork and exec) an external program, which in turn calls a setuid program.

Because my program is multithreaded, I cannot omit the “-f” flag (also trace child threads and processes) when using strace. And because all children, including the setuid program, are traced, setuid fails. (Yes, I am aware that strace claims it is possible to trace setuid programs, but the trick does not work for me, probably because the setuid program is not directly executed by strace.)

Fortunately, the clone system call has many useful flags. It works fine for me when I substitute calls to fork() with:

(pid_t) syscall (__NR_clone, CLONE_UNTRACED|SIGCHLD, NULL);

(Yes, SIGCHLD, not CLONE_SIGCHLD. It’s not a typo.)

I guess there may be better solutions, without modifying the program being traced?

Tags: , ,

Acquire and Release Semantics

The concept of acquire and release semantics is important for multi-threaded programs that run on more than one physical core or processor. MSDN has a clear and concise explanation of then.

Consider the following code example:

a++;
b++;
c++;

From another processor’s point of view, the preceding operations can appear to occur in any order. For example, the other processor might see the increment of b before the increment of a.

[T]he InterlockedIncrementAcquire routine uses acquire semantics to increment a variable. If you rewrote the preceding code example as follows:

InterlockedIncrementAcquire(&a);
b++;
c++;

other processors would always see the increment of a before the increments of b and c.

Likewise, the InterlockedIncrementRelease routine uses release semantics to increment a variable. If you rewrote the code example once again, as follows:

a++;
b++;
InterlockedIncrementRelease(&c);

other processors would always see the increments of a and b before the increment of c.

The operation of acquiring a lock must have acquire semantics; and the operation of releasing a lock must have release semantics. This is probably where they get their names.

Tags: ,

The Tuesday following the first Monday in November

Martin Luther King Day falls the third Monday of January; Daylight Saving Time begins on the second Sunday of March; Father’s Day falls on the third Sunday in June; …

To represent these dates in a program, the intuitive method is to use tuples (3,1,1), (2,0,3), (3,0,6), respectively. But probably this is not the best idea. The number of representable observations is limited. For instance, we cannot represent Election Day (Tuesday following the first Monday of November) or Memorial Day (last Monday of May).

Nevertheless, it is still possible to represent all the above in three small integers, by replacing the first integer in the tuple with the earliest possible day of month. For instance,

Martin Luther King Day = the Monday between January 15 and 21, represented by (15,1,1)

Election Day = the Tuesday between November 2 and 8, represented by (2,2,11)

Thanksgiving (before 1939) = last Thursday of November = the Thursday between November 24 and 30, represented by (24,4,11)

Thanksgiving (modern) = fourth Thursday of November = the Thursday between November 22 and 28, represented by (22,4,11)

Tags:

PDP-endian

Today I checked <endian.h> and found an unfamiliar line:

#define __PDP_ENDIAN 3412

I knew there were little-endian machines (e.g. all Intel CPUs*) and big-endian machines (e.g. PowerPC†), but was really unaware of the so-called PDP-endian. It’s word-wise big-endian, and within a word‡, little-endian.

char c[5] = {};
*(int32_t *)c = 0x61626364;
puts (c);

ASCII supposed, this program segment produces “ABCD” on a big-endian machine, and “DCBA” on a little-endian machine. It seems, on a PDP-11, it should output “BADC”.

Little-endian is friendly to programmers (in some sense). Big-endian is intuitive. But what’s PDP-endian for?

* Itanium supports both little-endian and big-endian. Windows and Linux for Itanium both use little endian.
† PowerPC supports both little-endian and big-endian. Macintosh used big endian.
‡ The term “word” here stands for two bytes. This usage is believed to be wrong though used by Intel, but I just don’t find a good substitute.

Tags:

Ack is a good alternative to grep

When I grep -r something in a Subversion-controlled directory, I get a lot of results under the .svn.

In this sense, Git is better than Subversion. (Git creates only one .git directory, and stores data in some compressed format which gets ignored by grep -r.)

So I have switched to ack when working with Subversion. Ack is written in Perl, claimed to be “aimed at programmers with large trees of heterogeneous source code.”

Tags: ,

To Risk Life or Job

From my[confined]space

Code

A comment says:

On the other hand, if you make your code confusing as hell, you will have much better job security.


I have been assuming some maniac serial killer would be maintaining it while doing my recent (private, but to be public) project, but still find my own codes could be confusing.

Anyway, I am glad that this did not happen until it reached about 70 files and 8,000 lines – the threshold used to be 2 files and 300 lines. The project now has about 110 files and 14,000 lines.

Tags: ,

I can’t PGO compile Firefox

A normal build of Firefox for Linux is reportedly even slower than the Win32 binary running under Wine.

The reason is reportedly that the pre-compiled binary for Windows uses PGO (profile guided optimization), which is usually not enabled under Linux. Sure, the fact that GCC does not generate as efficient codes as VC may also be a reason.[1]

Firefox also supports PGO in Linux. However, I failed at this (3.5.1). The profile-generating binary always segfaults.

Other people have encountered the same problem, even with the official PKGBUILD from Arch Linux. It is said to be a compiler problem.

Well, gave up. Maybe I’ll try it again some time later, with a more “stable” version of GCC probably.

[1] This statement only applies to the 32-bit platform. It seems GCC does a very good job on x86-64.


Profile-guided optimization is a relatively new feature. GCC began supporting it starting version 4.0; Microsoft VC 2005; and Intel C/C++/Fortran 9 (?).

A typical PGO-enabled building requires three steps:

(1) Build a profile-generating binary;
(2) Run the binary, which automatically collects useful data – branch probability, etc.
(3) Rebuild the program, using the data (“profile”) from Step 2.

With PGO, Internet Explorer reportedly gains an improvement of 8%, and Firefox 11% in JavaScript.

Tags: , , , ,

Premature optimization

is the root of all evil, according to Donald Knuth.

Prof. Knuth wrote in his paper Structured Programming with Go To Statements:

Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

From time to time, I knew I should have first used a naïve algorithm, not beginning optimizing until I was sure they worked, but I have always been violating the rule.

Tags:

liblzmadec sucks

Both the executable lzma and the library liblzmadec are included in the package lzma-utils (in Gentoo).

If I compress a file a.txt like this:

lzma a.txt

then liblzmadec can decompress a.txt.lzma perfectly.

However, if I compress it like this:

lzma < a.txt > a.txt.lzma

Then liblzmadec fails.


LZMA is a compression algorithm whose application rate has been rapidly growing in the past year. In most cases, it has significantly better compression ratio compressed to bzip2 and gzip, and decompresses significantly faster. However, its compression process is extremely slow (probably 10 times longer the time than bzip2).

One of the algorithms supported by 7zip is LZMA. In Linux we usually use LZMA Utils instead.

GNU provided the tarball of some versions of coreutils in LZMA (in addition to the traditional GZIP), although they have recently switched to xz.

Tags: ,