Archive for March, 2009

Gentoo begins to mark gcc-4.3 stable

Already done on amd64. It seems it’s going to happen on all other arches very soon.

I have been waiting for this for more than six months…

GCC 4.3 cleaned up its standard headers so that they no longer implicitly include one another, and this has caused many compilation errors – most notably in code that uses functions from <cstdlib> and <cstring> without including those headers. (It’s not GCC’s fault; it’s the coders’.)

I switched my default compiler from 4.2.4 to 4.3.2 just a few days ago, so I’m not a real hacker – hackers* always live on the bleeding edge. I reported only two bugs exposed by GCC 4.3 – there would have been more had I switched earlier.

* A hacker is different from a cracker! Those who illegally and/or immorally crack computer systems or proprietary software should be called crackers.

VIM sessions

This post basically provides the same info as Reference [1] does.

I’m tired of typing :tabe over and over every time I return to a project that has many files.

Sure, there is no reason for an omnipotent editor* not to support sessions. The command :mksession (:mks) saves the session info to a file (Session.vim by default) that can be loaded with :source the next time we use VIM. Use set sessionoptions to tell VIM what should or should not be saved.

I’m too lazy to type all these commands again and again, so I added the following lines to ~/.vimrc:

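" sesdir: on load, cd to the directory that holds the session file.
set sessionoptions=sesdir,folds,tabpages
" :SL loads Session.vim from the current directory.
com! SL source Session.vim
" :SX writes all buffers, saves the session, and quits.
com! SX call SessionSaveAndExit()
" The ! lets ~/.vimrc be sourced again without a redefinition error.
function! SessionSaveAndExit()
    wa
    mks!
    qa
endfunction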

Whenever I resume work, I start VIM and type :SL. When I’m finished, I type :SX, and VIM saves all my files and the session and then exits.

I dislike defining too many keyboard shortcuts, since I frequently press the wrong keys and have no idea what has just happened (fortunately we are using VIM instead of the original Vi, so we can undo and redo easily). That is why I use com (short for command) instead of nmap.

* I don’t mean to offend Emacsers – Emacs is an OS rather than an editor…

References
[1] lambda.oasis: Vim Sessions
[2] VIM on-line help, of course (e.g. type :help sessionoptions in VIM)

Ext4 data loss

I, too, have experienced data loss on ext4 partitions.

There was some problem with my X setup that hung the system about once a week (I was upgrading X and its drivers too aggressively), so I sometimes had to hard-reboot my computer (even magic SysRq did not respond in such cases). I lost most or all of my KDE settings after those reboots, and once all my Thunderbird settings as well. This never happened before I migrated to ext4.

So I converted all my ext4 partitions back to ext3 and downgraded the kernel to 2.6.27, which Gentoo considers stable. Loss of KDE settings is no big deal, but I really don’t want a kernel bug to erase my code, or homework that is due tomorrow, or the diary I’ve kept for eight years.

Anyway, the performance of ext4 is really good.

References – Other recent reports of ext4 data losses:
[1] https://bugs.edge.launchpad.net/ubuntu/+source/linux/+bug/317781/comments/45
[2] http://www.h-online.com/open/Ext4-data-loss-explanations-and-workarounds–/news/112892
[3] http://www.h-online.com/open/Possible-data-loss-in-Ext4–/news/112821
[4] http://cookinglinux.cn/ext4-lose-data.html (Chinese)

Every C programmer should learn some assembly

I am more convinced of this now.

One of the most frequently asked questions in C is the difference between a pointer and an array. A newbie in C often finds it “mission impossible” to tell the following four declarations apart:
char p1[][8] = { "Hello", "world" };
char *p2[8] = { "Hello", "world" };
char (*p3)[8] = p1;
char **p4 = p2;

And it really is difficult to explain clearly in a few words. However, if one knows some assembly, one can check the assembly listing generated by a compiler, and at least the difference between p1 and p2 becomes straightforward:

p1:
    .string "Hello"
    .zero 2
    .string "world"
    .zero 2
.LC0:
    .string "Hello"
.LC1:
    .string "world"
p2:
    .long .LC0
    .long .LC1
    .zero 24
p3:
    .long p1
p4:
    .long p2

(I prefer the AT&T-style assembly)

I feel so lucky that I had learned some NES assembly before starting C. So for me “pointer” has always been a very natural concept, and clearly different from an array. Many poor freshman undergrads have to begin with C++ without any knowledge of assembly, C, or any other language – I would have gone crazy had I been in such a situation.

UTF-8

UTF-8 is known for being self-synchronizing (self-segregating) by design, which makes it very robust against occasional errors. If one byte is accidentally missing from a string encoded in GB18030, it can happen that the whole rest of the string becomes broken and unreadable. In UTF-8, however, a bad or missing byte typically breaks only the character it belongs to, because a decoder can always tell from a byte’s high bits whether it starts a character or continues one.

For programmers, self-synchronization can mean more than just robustness, for example:

We know that, generally speaking, strstr cannot be used for strings in multi-byte encodings (the final byte of one character and the first byte of the next can happen to match the needle) – we have to either convert them to wchar_t strings and then use wcsstr, or use a more complicated substring search algorithm that takes care of multi-byte characters (Microsoft’s _mbsstr, for example).

However, for UTF-8 strings, strstr is safe and works as expected, so long as the two parameters are both valid UTF-8. It is not difficult to figure out why: lead bytes and continuation bytes occupy disjoint value ranges, and a valid needle always begins with a lead byte (or an ASCII byte), so a match can only begin at a character boundary in the haystack.
