Posts Tagged ‘GCC’
Counterpart of __assume in GCC
By chys on July 24th, 2010
Microsoft Visual C++ has a keyword __assume, which is used to pass a hint to the compiler for optimization. It can be useful when optimizing hotspots in a program. For example, if we know a loop will run at least once, we can add an __assume hint:
__assume (0 < n);
for (i=0; i < n; i++)
    /* .... */
This eliminates the comparison of n against zero before entering the loop body for the first time, without changing it to a do-while structure, which is usually less preferable than a for loop. Another typical use is to tell the compiler the default branch of a switch-case statement is unreachable.
GCC does not have a direct counterpart. But now that GCC 4.5.0 has introduced __builtin_unreachable, we can do in GCC everything we do with __assume in MSVC. My tests show the following macro works:
#define __assume(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
(The “do { ... } while (0)” statement is just a small trick in defining macros. Actually it’s been so well known and widely used that it can hardly be called a trick anymore.)
GCC itself will use C++
By chys on June 1st, 2010
It’s been announced on their mailing list that a subset of C++ will be allowed in the implementation of GCC itself.
In general, I think they’re doing the right thing. The only concern I have is, by doing so, whether they’re making bootstrapping or porting more complicated, which I will probably be doing frequently. Are they still going to allow bootstrapping on a system with only a C compiler, or are they assuming a C++ compiler is nowadays as fundamental as a C compiler?
Tags: GCC
What exactly “-march=native” means
By chys on April 12th, 2010
GCC 4.2 introduced “-march=native”. With this option, GCC automatically optimizes for the local computer.
I used to think GCC would simply replace it with an appropriate “-march=....” based on results of CPUID, so, for example, it would be equivalent to “-march=core2” on a Core 2 CPU.
Actually GCC does more than I guessed. We can observe what GCC actually replaces it with:
- Run
gcc -march=native -c -o /dev/null -x c -
in one terminal;
- Open another terminal and type
ps af | grep cc1
Then we will see how carefully GCC detects our CPU.
I tried this using GCC 4.4.1 on a computer equipped with a Core i3 CPU, and it turns out “-march=native” means
-march=core2 -mcx16 -msahf -mpopcnt -msse4.2 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=256 -mtune=core2
on this computer. It says:
- We can use SSE4.2 instructions, CMPXCHG16B, SAHF/LAHF (in 64-bit mode), and POPCNT, in addition to all instructions available on a Core 2;
- The L1 and L2 caches are 32 KiB and 256 KiB, respectively; each L1 cache line is 64 bytes;
- Use instructions that are most suitable for Core 2 (sure, Core i3 is different from Core 2, but it is too new to have specific optimization support in compilers).
So, the conclusion: always use “-march=native” when compiling code for local use.
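One way to see the effect from inside a program is to test the macros GCC predefines for each enabled instruction set. The following is a minimal sketch, assuming GCC or a compatible compiler on x86; built_with_sse42, built_with_popcnt and count_bits are hypothetical names.

```cpp
// GCC predefines __SSE4_2__ and __POPCNT__ when the corresponding
// instruction sets are enabled, for example through -march=native on
// a CPU that supports them.
bool built_with_sse42()
{
#ifdef __SSE4_2__
    return true;
#else
    return false;
#endif
}

bool built_with_popcnt()
{
#ifdef __POPCNT__
    return true;
#else
    return false;
#endif
}

// __builtin_popcount is a GCC builtin. With POPCNT enabled the compiler
// may emit the POPCNT instruction; otherwise it uses a generic fallback.
unsigned count_bits(unsigned x)
{
    return static_cast<unsigned>(__builtin_popcount(x));
}
```

Compiling the same file with and without “-march=native” and comparing what the first two functions return gives a quick check of what the option enabled on a given machine.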
Tags: GCC
An Rvalue Reference Issue
By chys on November 13th, 2009
I’m now convinced it was way too premature to try to take advantage of C++0x features (r-value references, etc.) in tiary (where the compiler supports them).
With GCC 4.3.4, even the following innocent function leads to segmentation fault:
#include <string>
#include <utility>
std::string && my_move (std::string &str)
{
    std::string && tmp = std::move (str);
    return tmp;
}
In GCC 4.4, this function simply casts the non-const lvalue-reference parameter to an r-value reference and returns it, which I think is correct. In 4.3, however, tmp refers to a temporary object on stack, move-constructed from str.
Then I replaced std::string with std::list<int> and tried again. This time, GCC (4.3.4) itself segfaults. Ooops..
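For comparison: under the rules finally adopted in C++11, an rvalue reference can no longer bind to an lvalue, and a named reference such as tmp is an lvalue, so the cast is usually written explicitly at the return. A minimal sketch, assuming a C++11 compiler; my_move_cast and take are hypothetical names.

```cpp
#include <string>
#include <utility>

// Casts the lvalue reference to an rvalue reference without creating
// any temporary object; the result still refers to the caller's string.
std::string &&my_move_cast(std::string &str)
{
    return static_cast<std::string &&>(str);
}

// Move-constructs a new string from src, leaving src in a valid but
// unspecified state.
std::string take(std::string &src)
{
    std::string dst(my_move_cast(src));
    return dst;
}
```

This matches the behavior described for GCC 4.4: the returned reference refers to the original object, not to a temporary on the stack.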
std::hash<std::string>
By chys on October 4th, 2009
TR1 requires std::tr1::hash (std::hash in C++0x) to be instantiable for integer/floating point types, std::string and std::wstring. (C++0x added std::error_code, std::thread::id, std::bitset, std::u16string, std::u32string, and std::vector<bool>.)
But for strings, every call to std::hash<string>::operator()(std::string) incurs an unnecessary copy construction, which can be expensive in implementations where std::basic_string does not use COW.
Developers of GCC are apparently aware of this, and they added specializations std::hash<const std::string &> and std::hash<const std::wstring &> starting from GCC 4.3.
However, I still guess we cannot easily benefit from this since we will need to write something like this:
std::unordered_set<std::string, std::hash<const std::string &>>
(In C++0x it’s no longer required to insert a space between the two greater-than characters.)
Too ugly and inconvenient to use, unless our program is really time critical.
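A portable way to get the same effect without relying on GCC’s extra specializations is a small user-defined functor that takes the string by const reference, hidden behind a typedef. This is only a sketch, assuming a C++0x-capable standard library; StringRefHash and StringSet are hypothetical names, and the multiply-by-31 hash is a placeholder, not what libstdc++ uses.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical hash functor: the argument is a const reference, so
// hashing never copies the string.
struct StringRefHash {
    std::size_t operator()(const std::string &s) const
    {
        std::size_t h = 0;
        for (std::string::size_type i = 0; i < s.size(); ++i)
            h = h * 31 + static_cast<unsigned char>(s[i]);
        return h;
    }
};

// The typedef keeps the long template name out of the rest of the code.
typedef std::unordered_set<std::string, StringRefHash> StringSet;
```

With the typedef in place, the rest of the program only ever mentions StringSet.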
I can’t PGO compile Firefox
By chys on July 17th, 2009
A normal build of Firefox for Linux is reportedly even slower than the Win32 binary running under Wine.
The reason is reportedly that the pre-compiled binary for Windows uses PGO (profile-guided optimization), which is usually not enabled under Linux. Sure, the fact that GCC does not generate code as efficient as VC’s may also be a reason.[1]
Firefox also supports PGO in Linux. However, I failed at this (3.5.1). The profile-generating binary always segfaults.
Other people have encountered the same problem, even with the official PKGBUILD from Arch Linux. It is said to be a compiler problem.
Well, gave up. Maybe I’ll try it again some time later, with a more “stable” version of GCC probably.
[1] This statement only applies to the 32-bit platform. It seems GCC does a very good job on x86-64.
Profile-guided optimization is a relatively new feature. GCC has supported it since version 4.0; Microsoft VC since 2005; and Intel C/C++/Fortran since version 9 (?).
A typical PGO-enabled build requires three steps:
(1) Build a profile-generating binary;
(2) Run the binary, which automatically collects useful data – branch probability, etc.
(3) Rebuild the program, using the data (“profile”) from Step 2.
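With GCC, the three steps above can be sketched as follows. The flags -fprofile-generate and -fprofile-use are GCC’s; the file name hot.cpp, the binary name hot and the function count_small are hypothetical, and the function is meant to be called from a main() that feeds it representative input.

```cpp
// Build steps with GCC (file and binary names are hypothetical):
//   (1) g++ -O2 -fprofile-generate hot.cpp -o hot
//   (2) ./hot     (run on representative input; profile data is
//                  written out when the program exits)
//   (3) g++ -O2 -fprofile-use hot.cpp -o hot

// A function with a data-dependent branch. The profile collected in
// step 2 tells the compiler how often the condition holds, which it
// can use in step 3 when laying out and optimizing the loop.
int count_small(const int *v, int n, int limit)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < limit)
            ++count;
    }
    return count;
}
```

The profile is only as good as the run in step 2, so the input used there should resemble real workloads.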
With PGO, Internet Explorer reportedly gains an improvement of 8%, and Firefox 11% in JavaScript.
Tags: browser, dev, GCC, Linux, optimization
Gentoo begins to mark gcc-4.3 stable
By chys on March 30th, 2009
Already done in amd64. It seems it’s going to happen to all other arches very soon.
I have been waiting for this for more than six months…
GCC 4.3 eliminated some implicit inclusions among headers, and therefore has caused many compilation errors – most notably missing <cstdlib> and <cstring>. (It’s not GCC’s fault; it’s the coders’.)
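The fix on the coder’s side is simply to include what is used. A minimal sketch; parse_and_measure is a hypothetical function that exists only to call one function from each header.

```cpp
#include <cstdlib>   // std::atoi
#include <cstring>   // std::strlen

// Parses text as an integer and reports the length of the text.
// Code like this could compile with older GCC versions even without
// the two includes above, if another standard header happened to pull
// them in; with GCC 4.3 the includes must be spelled out.
int parse_and_measure(const char *text, std::size_t *length)
{
    *length = std::strlen(text);
    return std::atoi(text);
}
```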
I switched my default compiler from 4.2.4 to 4.3.2 just a few days ago, so I’m not a real hacker – hackers* always live on the bleeding edge. I reported only two bugs exposed by GCC 4.3 – should have been more had I switched earlier..
* A hacker is different from a cracker! Those who illegally and/or immorally crack computer systems or proprietary software should be called crackers.
GCC #pragma pack bug
By chys on December 1st, 2008
#pragma pack is accepted by many C/C++ compilers as the de facto standard syntax for controlling alignment.
However, there is an old bug in GCC that has been reported many times, first in 2002, and is still not fixed.
#include <cstdio>
using namespace std;
#pragma pack(1)
template <typename T>
struct A {
    char a;
    int b;
};
A<int> x;
#pragma pack(2)
A<char> y;
#pragma pack(4)
A<short> z;
int main()
{
    printf ("%d %d %d\n", (int) sizeof x, (int) sizeof y, (int) sizeof z);
    return 0;
}
This gives 5 6 8 instead of 5 5 5 as we may expect. (VC++ and ICC both give the more reasonable 5 5 5.)
This example is not too bad. What is worse is that this bug can break programs that use the STL. Here is an example:
a.cpp:
#include <cstdio>
#include <map>
using namespace std;
#pragma pack(1)
void foo (const map<short,const char*> &x)
{
    for (map<short,const char*>::const_iterator it=x.begin();
         it!=x.end(); ++it)
        printf ("%d %s\n", it->first, it->second);
}
b.cpp:
#include <map>
using namespace std;
void foo (const map<short,const char *> &);
int main()
{
    map<short, const char *> x;
    x[0] = "Hello";
    x[1] = "World";
    foo (x);
}
Compile a.cpp and b.cpp separately and link them together. This program segfaults if compiled with GCC, but works well with ICC or VC++.
In conclusion, for better portability and/or reliability, never use #pragma pack unless absolutely necessary. If really unavoidable, always push and pop immediately before and after the structure definition. (If the program is intended to be compiled by GCC/ICC only, it is better to use the more reliable GCC-specific __attribute__((__packed__)).)
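The two recommendations can be sketched as follows, assuming a compiler that accepts both #pragma pack(push/pop) and GCC’s __attribute__ syntax (GCC and ICC do). The struct names are hypothetical, and the sizes in the comments assume a 4-byte int with 4-byte alignment.

```cpp
// Packing is switched on immediately before the definition and
// restored immediately after it, so nothing else is affected.
#pragma pack(push, 1)
struct PackedByPragma {
    char a;
    int b;
};                      // 5 bytes under the stated assumptions
#pragma pack(pop)

// Defined after the pop: default alignment applies again.
struct Unpacked {
    char a;
    int b;
};                      // typically 8 bytes (3 bytes of padding)

// GCC-specific alternative: the attribute is attached to this one
// type and does not depend on any surrounding pragma state.
struct PackedByAttribute {
    char a;
    int b;
} __attribute__((__packed__));   // 5 bytes under the stated assumptions
```

Keeping the push and pop adjacent to the definition also keeps the pragma from leaking into headers included later, which is how the std::map example above goes wrong.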
PS. It seems Sun CC (shipped with Solaris) also has this bug. It fails for the first example here, but for the second it works well. I don’t know how it manages to align pair<short,const char *> correctly…

