Posts Tagged ‘C/C++’

Integer division

C89 and C++98 say the result of an integer division where the divisor and/or dividend is negative is implementation defined. This reflects that early hardware implemented integer divisions differently.

According to C89/C++98, we may have either (-3)/2 == -1 (round toward zero) or (-3)/2 == -2 (round toward negative infinity).

It appears round toward zero has become the overwhelming de facto standard now, adopted by both hardware and software vendors. Now both C and C++ explicitly require round toward zero in their new standards (C99 and C++2011*).

Division of negative integers has always been a complicated problem. Fortran mandated the same round-toward-zero mode much earlier than C/C++; so did Java. Python, on the other hand, has required round toward -∞ (i.e. (-3)//2 == -2) from its beginning. Everybody, nevertheless, agrees that (a/b)*b + a%b == a should always hold (for nonzero b, using each language's own division and remainder operators).
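To make the two rounding modes concrete, here is a minimal sketch of Python-style floored division built on top of the truncating / and % that C99 and C++11 guarantee. The helper names floor_div and floor_mod are my own, not anything from a standard library, and b is assumed to be nonzero.

```cpp
// Hypothetical helpers: floored quotient and remainder on top of the
// truncating operators. Assumes b != 0 and that a / b does not overflow.
int floor_div(int a, int b)
{
    int q = a / b;   // truncates toward zero in C99/C++11
    int r = a % b;   // zero, or same sign as a
    // A nonzero remainder whose sign differs from the divisor's means the
    // truncated quotient is one above the floored quotient.
    if (r != 0 && ((r < 0) != (b < 0)))
        --q;
    return q;
}

int floor_mod(int a, int b)
{
    int r = a % b;
    // Shift the remainder so that it takes the sign of the divisor.
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;
    return r;
}
```

With these, floor_div(-3, 2) gives -2 and floor_mod(-3, 2) gives 1, and the identity floor_div(a, b)*b + floor_mod(a, b) == a holds just as it does for the built-in pair.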

* C++0x has yet to be officially approved. Hopefully it will be approved within this year and known as C++2011. I’m using this name prematurely.


Reference to array

A reference to an array lets a function template deduce the array's size from its argument. I recall Microsoft uses this trick in some of its headers. I believe it’s something like this:

template <size_t _Size> inline
char *strcpy (char (&dst)[_Size], const char *src)
{
    strcpy_s (dst, _Size, src);
    return dst;
}
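The same deduction works without strcpy_s, which is not available everywhere. Below is a small portable sketch; array_length and bounded_copy are hypothetical names of mine, not Microsoft's.

```cpp
#include <cstddef>

// N is deduced from the type of the array argument.
template <typename T, std::size_t N>
std::size_t array_length(T (&)[N])
{
    return N;
}

// Copies at most N - 1 characters and always terminates dst.
// Silent truncation is a choice made for this sketch only.
template <std::size_t N>
char *bounded_copy(char (&dst)[N], const char *src)
{
    std::size_t i = 0;
    for (; i + 1 < N && src[i] != '\0'; ++i)
        dst[i] = src[i];
    dst[i] = '\0';
    return dst;
}
```

Given char buf[4], a call such as bounded_copy(buf, "abcdefg") needs no explicit size argument. Passing a plain char * instead of an array simply fails to compile, which is the whole point of the trick.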


const in C and C++

The const keyword has “constant” and “read-only variable” semantics in C++, but only “read-only variable” semantics in C.

To illustrate this difference, try compiling the following code:

const int x = 2;
int y[x];

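This is not legal C at file scope, because x is not a constant expression in C: const only makes it a read-only variable. (Inside a function, C99 would accept the declaration, but only as a variable-length array.)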

But it is legal C++, because const also has constant semantics and thus x is a compile-time constant.
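A short C++ sketch of what the constant semantics buys us: x can be used wherever an integral constant expression is required. Tag, TagX and y_length are illustrative names only.

```cpp
const int x = 2;
int y[x];   // fine in C++: x is usable in a constant expression

// x is accepted in other places that demand a compile-time constant.
static_assert(sizeof(y) == 2 * sizeof(int), "y has exactly x elements");

template <int N>
struct Tag
{
    static const int value = N;
};

typedef Tag<x> TagX;   // x as a template argument

int y_length()
{
    return static_cast<int>(sizeof(y) / sizeof(y[0]));
}
```

None of these uses would be accepted for a const int whose initializer is only known at run time.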


NULL can be a valid address

It is only a convention to consider NULL (0) as an invalid pointer. Technically, the operating system or hardware does not really care if a pointer is zero or not, although operating systems may restrict the use of valid null pointers as they may be a security hole.

Consider this program:

#include <stdio.h>
#include <sys/mman.h>

int main ()
{
    int *p = mmap (0, 4096, PROT_READ|PROT_WRITE,
        MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    *p = 2554;
    printf ("p=%p; *p=%d\n", p, *p);
    return 0;
}

It attempts to make NULL (0) a valid address and then write to it. On Linux, it runs without error as root, but crashes as a normal user.

Note: Calling mmap with NULL as its first argument usually means the kernel will choose an address. However, if MAP_FIXED is also specified, it instead instructs the kernel to use the very address 0. Only privileged processes are allowed to do so; a non-privileged process only gets EPERM (Permission denied).

This also explains why MAP_FAILED is equal to (void *)-1 instead of NULL.
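As a consequence, the result of mmap should be compared against MAP_FAILED, never against NULL. A minimal sketch, assuming a Linux-like system where MAP_ANONYMOUS is available; map_page is a hypothetical wrapper of mine.

```cpp
#include <cstddef>
#include <sys/mman.h>

// Hypothetical wrapper: maps anonymous memory and lets the kernel choose
// the address (no MAP_FIXED). Returns true on success and stores the
// mapping address in addr.
bool map_page(std::size_t bytes, void *&addr)
{
    addr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    // Address 0 can be a legitimate result, so failure is signalled
    // by MAP_FAILED, which is (void *)-1.
    return addr != MAP_FAILED;
}
```

The caller releases the mapping with munmap once it is done with it.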


The clock() function

The ISO C standard specifies that

The clock function determines the processor time used.

It is clear that the result should be processor time instead of wall-clock time (real time).

It turns out that clock() in Microsoft C does return the wall-clock time instead of processor time.

I understand that Microsoft probably was not intentionally trying to violate the standard. Processor time is meaningless in DOS (and DOS does not provide any mechanism to measure it, afaik), so many programs used clock() to measure real time, even when they made lots of system calls, disk accesses, etc. (which would make processor time differ significantly from real time on a time-sharing system). Microsoft probably intended to maintain this “compatibility.” But is this really necessary? They could have corrected it during the migration from single-tasking DOS to time-sharing Windows, or from 16-bit Windows 3 to 32-bit Windows 95/NT. Compared to the other huge differences, one more “incompatibility” would hardly have mattered, would it?
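One way to see which kind of time a given clock() reports is to measure a sleep with both clock() and a wall clock. This is only a sketch: Timing and time_a_sleep are names I made up, and std::chrono requires C++11.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Timing
{
    double cpu_seconds;    // as reported by std::clock()
    double wall_seconds;   // as reported by std::chrono::steady_clock
};

// Hypothetical helper: sleep for d and report both measurements.
Timing time_a_sleep(std::chrono::milliseconds d)
{
    const std::clock_t c0 = std::clock();
    const std::chrono::steady_clock::time_point w0 =
        std::chrono::steady_clock::now();

    std::this_thread::sleep_for(d);   // uses almost no processor time

    const std::clock_t c1 = std::clock();
    const std::chrono::steady_clock::time_point w1 =
        std::chrono::steady_clock::now();

    Timing t;
    t.cpu_seconds = static_cast<double>(c1 - c0) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(w1 - w0).count();
    return t;
}
```

If clock() follows the standard, cpu_seconds should stay far below wall_seconds for a long sleep. If it measures wall-clock time, as described above for Microsoft C, the two numbers should come out close to each other.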


An Rvalue Reference Issue

I’m now convinced it was premature to try to take advantage of C++0x features (r-value references, etc.) in tiary (where the compiler supports them).

With GCC 4.3.4, even the following innocent function leads to segmentation fault:

#include <string>
#include <utility>

std::string && my_move (std::string &str)
{
    std::string && tmp = std::move (str);
    return tmp;
}

In GCC 4.4, this function simply casts the non-const lvalue-reference parameter to an r-value reference and returns it, which I think is correct. In 4.3, however, tmp refers to a temporary object on the stack, move-constructed from str, so the function returns a reference to an object that is destroyed as soon as the function returns.
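For comparison, here is a sketch of the same function written as a pure cast, with no intermediate named reference. It rests on one point that postdates the compilers discussed here: as far as I understand the rules finally standardized in C++11, a named rvalue reference is itself an lvalue, so "return tmp;" is rejected there and an explicit cast is needed anyway.

```cpp
#include <string>

// Only casts; no temporary std::string is ever created, so the
// returned reference refers to the caller's own object.
std::string &&my_move(std::string &str)
{
    return static_cast<std::string &&>(str);
}
```

Binding the result to a reference leaves str untouched. Only constructing or assigning a std::string from the result actually moves the contents.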

Then I replaced std::string with std::list<int> and tried again. This time, GCC (4.3.4) itself segfaults. Ooops..


std::hash<std::string>

TR1 requires std::tr1::hash (std::hash in C++0x) to be instantiable for integer/floating point types, std::string and std::wstring. (C++0x added std::error_code, std::thread::id, std::bitset, std::u16string, std::u32string, and std::vector<bool>.)

But for strings, every call to std::hash<string>::operator()(std::string) incurs an unnecessary copy construction, which can be expensive in implementations where std::basic_string does not use COW.

Developers of GCC are apparently aware of this, and they added specializations std::hash<const std::string &> and std::hash<const std::wstring &> starting from GCC 4.3.

However, I still guess we cannot easily benefit from this since we will need to write something like this:

std::unordered_set<std::string, std::hash<const std::string &>>

(In C++0x it’s no longer required to insert a space between the two greater-than characters.)

Too ugly and inconvenient to use, unless our program is really time critical.
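Another way around the copy, without relying on a GCC-specific specialization, is to supply our own hash functor that takes its argument by const reference. A minimal sketch; ConstRefStringHash and StringSet are hypothetical names, and the djb2-style hash is only a stand-in, not a recommendation.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical functor: the argument is taken by const reference,
// so calling it never copies the string.
struct ConstRefStringHash
{
    std::size_t operator()(const std::string &s) const
    {
        std::size_t h = 5381;   // djb2 starting value
        for (std::string::size_type i = 0; i < s.size(); ++i)
            h = h * 33 + static_cast<unsigned char>(s[i]);
        return h;
    }
};

// A typedef hides the long type name from the rest of the program.
typedef std::unordered_set<std::string, ConstRefStringHash> StringSet;
```

The typedef takes care of most of the ugliness, since the hash type only has to be spelled out once.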


null pointer to member

The most straightforward implementation of a pointer to member is to store the offset:

struct Struct
{
    int a;
    int b;
};

The internal value of &Struct::a is 0 and &Struct::b is sizeof(int).

This leads to the illusion that &Struct::a, a valid pointer, is equal to a null pointer. Not only would that violate the standard, it would also break a lot of existing code.

The current (pre-0x) orthodox method of defining a cast from user-defined class to bool is not using operator bool (which leads to some unwanted consequences), but something like this:

class MyClass
{
   /* ....... */
   struct BooleanConvert { int val; };
   operator int BooleanConvert::* () const
   {
       return ( /* true */ ) ? &BooleanConvert::val : 0;
   }
};

It is hard to read, but the idea is simple: return a valid pointer (to member) if true, a null pointer (to member) if false.
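Here is a complete, compilable sketch of the idiom. Handle, fd_ and is_valid are made up for illustration, and the pointer-to-member type is hidden behind a typedef, which should be equivalent to spelling it out in the operator name.

```cpp
// Hypothetical class: "true" when it holds a non-negative descriptor.
class Handle
{
    struct BooleanConvert { int val; };
    typedef int BooleanConvert::*bool_type;   // pointer to member of a private type

    int fd_;

public:
    explicit Handle(int fd) : fd_(fd) {}

    // Valid pointer to member if true, null pointer to member if false.
    operator bool_type() const
    {
        return fd_ >= 0 ? &BooleanConvert::val : 0;
    }
};

// The conversion works in boolean contexts such as if, !, && and ?:.
bool is_valid(const Handle &h)
{
    return h ? true : false;
}
```

Unlike operator bool, this does not let a Handle slip into arithmetic: something like "int n = Handle(3);" does not compile, because a pointer to member does not convert to int.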

Q: It seems both pointers evaluate to 0. How do they distinguish them?

A: In practice (at least with GCC), a null pointer to data member is represented, internally, by a value of -1 instead of 0!

So,

#include <cstdio>
#include <cstddef>
using namespace std;

struct Struct
{
    int a;
};

int main()
{
    union
    {
        int Struct::*ptr;
        void * val;
    };
    ptr = 0;
    printf ("%p\n", val);
}

the above program would output 0xffffffff (32-bit systems).

Fortunately, this is not a nuisance in practice. We can safely ignore this detail (except when writing a compiler, of course). It seems that, apart from unions or raw memory access, there is no way to convert pointers to members to or from integers and normal pointers. (C-style casts, reinterpret_cast, etc. all reject such conversions.)
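The raw memory access can also be done with memcpy, which avoids reading an inactive union member. This is a sketch under one loud assumption: a pointer to data member has the size of std::ptrdiff_t, as in the ABI GCC uses on Linux. Other compilers may use a different size, in which case the static_assert fires. raw_value is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstring>

struct Struct
{
    int a;
    int b;
};

// Hypothetical helper: copies the object representation of a pointer
// to data member into an integer so that it can be inspected.
std::ptrdiff_t raw_value(int Struct::*p)
{
    static_assert(sizeof(int Struct::*) == sizeof(std::ptrdiff_t),
                  "assumes an offset-sized pointer to data member");
    std::ptrdiff_t raw;
    std::memcpy(&raw, &p, sizeof raw);
    return raw;
}
```

Under that assumption, raw_value(&Struct::a) is the offset 0 while a null pointer to member comes out as -1, so the two never compare equal.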


String literals

Have type const char [] in C++, but are, as an exception to the general rule, allowed to convert to type char * (deprecated in C++98/03 and removed in C++11).

Have type char [] in C (even in C99, although const has been part of the language since C89), but it is undefined behavior to modify them. (The standard explicitly allows storing them in read-only memory, and letting identical string literals share storage.)

It seems to me that it is also possible to modify the C standard to align with C++ without altering the behavior of any existing C code. (On the other hand, we cannot modify the C++ definition of the type of string literals without affecting existing code.)
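The C++ side of this can be checked at compile time. A small sketch, assuming a C++11 compiler for decltype and static_assert; greeting is just an illustrative function.

```cpp
#include <type_traits>

// A string literal is an lvalue of type "array of N const char",
// where N counts the terminating '\0'.
static_assert(std::is_same<decltype("abc"), const char (&)[4]>::value,
              "string literals are arrays of const char in C++");

// The ordinary array-to-pointer conversion yields const char *.
const char *greeting()
{
    return "hello";
}

// char *p = "hello";   // the special conversion: deprecated in C++98/03,
//                      // ill-formed since C++11
```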


Rvalue reference

This new feature in C++0x was rather confusing to me until yesterday, when I suddenly realized that my code could be more efficient if we had rvalue references.

In my understanding, the main practical use of rvalue references is to eliminate spurious copies by introducing “move” semantics in addition to the existing “copy” semantics.

Suppose we have a map object: map<int,SomeComplexType> my_map;

The most intuitive statement to add something to it is my_map[key] = value;.

In current C++, a copy assignment must be triggered here, potentially unnecessary and expensive. (“Copy” semantics.)

If value will not be used later (esp. it’s a temporary object), we may want to “move” instead of “copy” it into the map. (“Move” semantics.)
[Sure, we can use value.swap (my_map[key]); if swapping is efficient (e.g. STL strings & containers). But this is rather unreadable.]

In C++0x, with rvalue references, we can distinguish them easily:

  1. Use "copy" semantics in SomeComplexType::operator = (const SomeComplexType &);
  2. Use "move" semantics in SomeComplexType::operator = (SomeComplexType &&); (Should we call it a "move assignment"?)

Now the compiler automatically chooses between the "copy" or "move" semantics for my_map[key] = value;, depending on whether value is an rvalue or not.

It is also possible to force the "move" semantics: my_map[key] = std::move (value);

What std::move does is accept either an lvalue or rvalue reference, and return it as an rvalue reference.
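The overload selection described above can be observed with a toy type that counts which assignment operator runs. Payload and its counters are hypothetical, standing in for SomeComplexType.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for SomeComplexType that counts assignments.
struct Payload
{
    std::string data;
    static int copies;
    static int moves;

    Payload() {}
    explicit Payload(const std::string &s) : data(s) {}
    Payload(const Payload &) = default;
    Payload(Payload &&) = default;

    // "Copy" semantics: the source is left untouched.
    Payload &operator=(const Payload &other)
    {
        data = other.data;
        ++copies;
        return *this;
    }

    // "Move" semantics: steal the source's contents.
    Payload &operator=(Payload &&other)
    {
        data = std::move(other.data);
        ++moves;
        return *this;
    }
};

int Payload::copies = 0;
int Payload::moves = 0;
```

With std::map<int, Payload> my_map and a named Payload value, my_map[1] = value; bumps copies, while my_map[2] = std::move(value); and my_map[3] = Payload("tmp"); both bump moves.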


Microsoft Visual C++ supports, as a non-standard extension, binding temporary objects to non-const (lvalue) references. This extension cannot substitute for rvalue references:

string a = "Hello";
string b = a;

If we use move semantics in string::string (string &), then a will be empty after b's construction. This usually is not what we desire.


Again, my main concern about C++0x is that it's going to be too complicated to learn.


Reference:
A Brief Introduction to Rvalue References
