Posts Tagged ‘C/C++’
const in C and C++
By chys on May 31st, 2010
The const keyword has “constant” and “read-only variable” semantics in C++, but only “read-only variable” semantics in C.
To illustrate this difference, try compiling the following code:
const int x = 2; int y[x];
At file scope, this is not legal C, because x is not a compile-time constant semantically. (Inside a function, C99 accepts it only by treating y as a variable-length array.)
But it is legal C++, because const also has constant semantics and thus x is a compile-time constant.
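A minimal sketch of the C++ side of this difference, with an enumerator added as one spelling that both languages accept as a constant. The names kSize, z, y_length and z_length are made up for illustration.

```cpp
#include <cstddef>

const int x = 2;   // const object initialized from a constant expression
int y[x];          // OK in C++: x can be used as an array bound

// An enumerator is a compile-time constant in both C and C++.
enum { kSize = 2 };
int z[kSize];

// Number of elements in each array, computed from the array types.
std::size_t y_length () { return sizeof (y) / sizeof (y[0]); }
std::size_t z_length () { return sizeof (z) / sizeof (z[0]); }
```

Compiled as C++, both arrays have two elements; compiled as C, only the z declaration is accepted at file scope.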
Tags: C/C++
NULL can be a valid address
By chys on April 21st, 2010
It is only a convention to consider NULL (0) as an invalid pointer. Technically, the operating system or hardware does not really care if a pointer is zero or not, although operating systems may restrict the use of valid null pointers as they may be a security hole.
Consider this program:
#include <stdio.h>
#include <sys/mman.h>
int main ()
{
    int *p = mmap (0, 4096, PROT_READ|PROT_WRITE,
                   MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    *p = 2554;
    printf ("p=%p; *p=%d\n", p, *p);
    return 0;
}
It attempts to make NULL (0) a valid address and then write to it. On Linux, it runs without error as root, but crashes as a normal user.
Note: Calling mmap with NULL as its first argument usually means the kernel will choose an address. However, if MAP_FIXED is also specified, it instead instructs the kernel to use the very address 0. Only privileged processes are allowed to do so; a non-privileged process only gets EPERM (Permission denied).
This also explains why MAP_FAILED is equal to (void *)-1 instead of NULL.
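The program above crashes for a normal user only because it never checks the result of mmap. A hedged sketch of a checked variant follows, assuming Linux and a 4096-byte page as in the original. The function name try_map_page_zero is made up for illustration, and writing through address 0 remains undefined behavior as far as the language itself is concerned.

```cpp
#include <cerrno>
#include <sys/mman.h>

// Tries to map one page at address 0 and write to it.
// Returns 0 on success, or the errno value reported by mmap on failure.
int try_map_page_zero ()
{
    void *addr = mmap (0, 4096, PROT_READ | PROT_WRITE,
                       MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED)   // failure is (void *)-1, never NULL
        return errno;

    volatile int *p = static_cast<volatile int *> (addr);
    *p = 2554;                // address 0 is now a valid, writable location
    munmap (addr, 4096);
    return 0;
}
```

A caller can compare the return value against EPERM to tell "not privileged" apart from other failures, which would be impossible if failure were reported as NULL.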
The clock() function
By chys on November 26th, 2009
The ISO C standard specifies that
The clock function determines the processor time used.
It is clear that the result should be processor time instead of wall-clock time (real time).
It turns out that clock() in Microsoft C does return the wall-clock time instead of processor time.
I do understand that Microsoft was probably not trying to violate the standard intentionally. It is meaningless to talk about processor time in DOS (nor does DOS provide any mechanism to measure processor time, afaik), and many programs used clock() to measure real time even if there were lots of system calls, disk accesses, etc. (which would make processor time differ significantly from real time in time-sharing systems). Probably Microsoft intended to maintain this “compatibility.” But is this really necessary? They could have corrected this either during the migration from single-task DOS to time-sharing Windows, or from 16-bit Windows 3 to 32-bit Windows 95/NT. Next to the other huge differences, one more “incompatibility” would hardly have mattered, would it?
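One way to see which behavior a given C library has is to time a sleep with both clock() and a wall clock. The sketch below assumes a C++11 compiler; Timing and time_a_sleep are made-up names. Where clock() measures processor time, cpu_seconds should stay near zero while wall_seconds covers the whole sleep; where it measures real time, the two should be close.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Timing
{
    double cpu_seconds;    // elapsed time according to clock()
    double wall_seconds;   // elapsed time according to a monotonic wall clock
};

// Sleeps for the given number of milliseconds and reports both clocks.
Timing time_a_sleep (int milliseconds)
{
    const std::clock_t cpu_start = std::clock ();
    const std::chrono::steady_clock::time_point wall_start =
        std::chrono::steady_clock::now ();

    std::this_thread::sleep_for (std::chrono::milliseconds (milliseconds));

    const std::clock_t cpu_end = std::clock ();
    const std::chrono::steady_clock::time_point wall_end =
        std::chrono::steady_clock::now ();

    Timing t;
    t.cpu_seconds = static_cast<double> (cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds =
        std::chrono::duration<double> (wall_end - wall_start).count ();
    return t;
}
```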
An Rvalue Reference Issue
By chys on November 13th, 2009
I’m now convinced it was way too premature to try to take advantage of C++0x features (r-value references, etc.) in tiary (where the compiler supports them).
With GCC 4.3.4, even the following innocent function leads to segmentation fault:
#include <string>
#include <utility>
std::string && my_move (std::string &str)
{
    std::string && tmp = std::move (str);
    return tmp;
}
In GCC 4.4, this function simply casts the non-const lvalue-reference parameter to an r-value reference and returns it, which I think is correct. In 4.3, however, tmp refers to a temporary object on stack, move-constructed from str.
Then I replaced std::string with std::list<int> and tried again. This time, GCC (4.3.4) itself segfaults. Ooops..
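For comparison, here is a sketch of the same cast without the intermediate variable. Under the rules as finally standardized in C++11, a named rvalue reference such as tmp is an lvalue, so the original return statement would not bind to the return type there. The name my_move_fixed is made up for illustration.

```cpp
#include <string>
#include <utility>

// Returns its argument as an rvalue reference without creating a temporary.
std::string && my_move_fixed (std::string &str)
{
    return std::move (str);   // same as static_cast<std::string &&> (str)
}
```

The returned reference refers to the caller's own string, so nothing is moved or copied until the caller actually constructs or assigns from it.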
std::hash<std::string>
By chys on October 4th, 2009
TR1 requires std::tr1::hash (std::hash in C++0x) to be instantiable for integer/floating point types, std::string and std::wstring. (C++0x added std::error_code, std::thread::id, std::bitset, std::u16string, std::u32string, and std::vector<bool>.)
But for strings, every call to std::hash<string>::operator()(std::string) incurs an unnecessary copy construction, which can be expensive in implementations where std::basic_string does not use COW.
Developers of GCC are apparently aware of this, and they added specializations std::hash<const std::string &> and std::hash<const std::wstring &> starting from GCC 4.3.
However, I still guess we cannot easily benefit from this since we will need to write something like this:
std::unordered_set<std::string, std::hash<const std::string &>>
(In C++0x it’s no longer required to insert a space between the two greater-than characters.)
Too ugly and inconvenient to use, unless our program is really time critical.
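One way to hide the ugliness without relying on the GCC-specific specializations is a small hasher of our own that takes its argument by const reference, wrapped in a typedef. The sketch below is only an illustration: StringRefHash and StringSet are made-up names, and the hash function (64-bit FNV-1a) is a stand-in chosen for brevity, not what any standard library uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>

// Hypothetical hasher: 64-bit FNV-1a over the bytes of the string.
// The argument is taken by const reference, so no copy is made.
struct StringRefHash
{
    std::size_t operator() (const std::string &s) const
    {
        std::uint64_t h = 14695981039346656037ULL;    // FNV offset basis
        for (std::string::size_type i = 0; i < s.size (); ++i) {
            h ^= static_cast<unsigned char> (s[i]);
            h *= 1099511628211ULL;                     // FNV prime
        }
        return static_cast<std::size_t> (h);
    }
};

// The long template argument list is spelled out only once.
typedef std::unordered_set<std::string, StringRefHash> StringSet;
```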
null pointer to member
By chys on July 19th, 2009
The most straightforward implementation of a pointer to member is to store the offset:
struct Struct
{
    int a;
    int b;
};
The internal value of &Struct::a is 0 and &Struct::b is sizeof(int).
With this implementation, &Struct::a, a valid pointer, would appear equal to a null pointer. Not only would this violate the standard, it would also break a lot of existing code.
The current (pre-0x) orthodox method of defining a cast from user-defined class to bool is not using operator bool (which leads to some unwanted consequences), but something like this:
class MyClass
{
    /* ....... */
    struct BooleanConvert { int val; };
    operator int BooleanConvert::* () const
    {
        return ( /* true */ ) ? &BooleanConvert::val : 0;
    }
};
It is hard to read, but the idea is simple: return a valid pointer (to member) if true, a null pointer (to member) if false.
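A complete, compilable sketch of the idiom may make it easier to follow. The class Handle, its valid_ flag, the SafeBool typedef and the helper is_usable are all made up for illustration; only the shape of the conversion operator comes from the snippet above.

```cpp
// Hypothetical class; only the conversion operator matters here.
class Handle
{
    struct BooleanConvert { int val; };
    typedef int BooleanConvert::*SafeBool;   // pointer to member of a private type

    bool valid_;

public:
    explicit Handle (bool valid) : valid_ (valid) {}

    // Non-null pointer to member if valid, null pointer to member otherwise.
    operator SafeBool () const
    {
        return valid_ ? &BooleanConvert::val : 0;
    }
};

// The object can be tested in a condition like a plain bool.
bool is_usable (const Handle &h)
{
    return h ? true : false;
}
```

Unlike operator bool, this conversion does not let a Handle slip into arithmetic such as h + 1, because a pointer to member does not convert to an integer.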
Q: It seems both pointers evaluate to 0. How are they distinguished?
A: In practice, a null pointer to member is represented, internally, by a value of -1 instead of 0!
So,
#include <cstdio>
#include <cstddef>
using namespace std;
struct Struct
{
    int a;
};
int main()
{
    union
    {
        int Struct::*ptr;
        void * val;
    };
    ptr = 0;
    printf ("%p\n", val);
}
the above program would output 0xffffffff (32-bit systems).
Fortunately, this causes no trouble. We can safely ignore this detail (except when writing a compiler, of course). It seems that, other than through unions or raw memory access, there is no way to convert pointers to member to or from integers and normal pointers. (C-style casts, reinterpret_cast, etc. all reject such conversions.)
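The same inspection can be done with memcpy and memcmp instead of a union, which avoids reading an inactive union member. This is only a sketch: MemberPtr, same_representation and dump are made-up names, and the bytes printed depend on the compiler and ABI, so no particular output is promised.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>

struct Struct
{
    int a;
    int b;
};

typedef int Struct::*MemberPtr;

// True if the two pointers to member have byte-for-byte identical storage.
bool same_representation (MemberPtr lhs, MemberPtr rhs)
{
    return std::memcmp (&lhs, &rhs, sizeof (MemberPtr)) == 0;
}

// Prints the raw bytes of a pointer to member in hexadecimal.
void dump (const char *label, MemberPtr p)
{
    unsigned char bytes[sizeof (MemberPtr)];
    std::memcpy (bytes, &p, sizeof p);
    std::printf ("%s:", label);
    for (std::size_t i = 0; i < sizeof bytes; ++i)
        std::printf (" %02x", static_cast<unsigned> (bytes[i]));
    std::printf ("\n");
}
```

Calling dump once with a null MemberPtr and once with &Struct::a shows how a given compiler keeps the two apart.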
Tags: C/C++
String literals
By chys on July 5th, 2009
- In C++, string literals have type const char [], but are, as an exception to the general rule, allowed to convert to type char * (though this is deprecated).
- In C, they have type char [] (even in C99, although const has been part of the language since C89), but it is undefined behavior to modify them. (The standard explicitly allows storing them in read-only memory, and overlapping identical string literals.)
It seems to me that it is also possible to modify the C standard to align with C++ without altering the behavior of any existing C code. (On the other hand, we cannot modify the C++ definition of the type of string literals without affecting existing code.)
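The C++ side can be checked at compile time. The sketch below assumes a C++11 compiler (for decltype and static_assert); greeting and literal_size are made-up names. Pointing at a literal through const char * is the spelling that stays valid in both languages.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue of type "array of N const char" in C++.
static_assert (std::is_same<decltype ("abc"), const char (&)[4]>::value,
               "\"abc\" has type const char[4]");

// Valid in both C and C++, and never invites modification of the literal.
const char *greeting () { return "hello"; }

// sizeof sees the array type, including the terminating '\0'.
std::size_t literal_size () { return sizeof ("abc"); }
```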
Tags: C/C++
Rvalue reference
By chys on June 28th, 2009
The new feature in C++0x was rather confusing to me until yesterday when I suddenly realized that my code could be more efficient if we had rvalue references.
In my understanding, the main practical use of rvalue references is to eliminate spurious copies by introducing “move” semantics in addition to the existing “copy” semantics.
Suppose we have a map object: map<int,SomeComplexType> my_map;
The most intuitive statement to add something to it is my_map[key] = value;.
In current C++, a copy assignment must be triggered here, potentially unnecessary and expensive. (“Copy” semantics.)
If value will not be used later (especially if it’s a temporary object), we may want to “move” instead of “copy” it into the map. (“Move” semantics.)
[Sure, we can use value.swap (my_map[key]); if swapping is efficient (e.g. STL strings & containers). But this is rather unreadable.]
In C++0x, with rvalue references, we can distinguish them easily:
- Use "copy" semantics in SomeComplexType::operator = (const SomeComplexType &);
- Use "move" semantics in SomeComplexType::operator = (SomeComplexType &&); (Should we call it a "move assignment"?)
Now the compiler automatically chooses between the "copy" or "move" semantics for my_map[key] = value;, depending on whether value is an rvalue or not.
It is also possible to force the "move" semantics: my_map[key] = std::move (value);
What std::move does is accept either an lvalue or rvalue reference, and return it as an rvalue reference.
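The overload selection described above can be made visible with a small sketch. The payload member and the two static counters are made up for illustration; a real SomeComplexType would of course hold something more interesting than a string.

```cpp
#include <map>
#include <string>
#include <utility>

struct SomeComplexType
{
    std::string payload;
    static int copies;   // how many copy assignments have run
    static int moves;    // how many move assignments have run

    SomeComplexType () = default;
    explicit SomeComplexType (std::string p) : payload (std::move (p)) {}
    SomeComplexType (const SomeComplexType &) = default;
    SomeComplexType (SomeComplexType &&) = default;

    // "Copy" semantics: the source is left untouched.
    SomeComplexType &operator = (const SomeComplexType &other)
    {
        payload = other.payload;
        ++copies;
        return *this;
    }

    // "Move" semantics: the source's contents are taken over.
    SomeComplexType &operator = (SomeComplexType &&other)
    {
        payload = std::move (other.payload);
        ++moves;
        return *this;
    }
};

int SomeComplexType::copies = 0;
int SomeComplexType::moves = 0;
```

With a map<int,SomeComplexType> my_map, the statement my_map[key] = value; bumps copies, while my_map[key] = std::move (value); or an assignment from a temporary bumps moves.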
Microsoft Visual C++ supports, as a non-standard extension, binding temporary objects to non-const (lvalue) references. This extension cannot substitute for rvalue references:
string a = "Hello";
string b = a;
If we use move semantics in string::string (string &), then a will be empty after b's construction. This usually is not what we desire.
Again, my main concern about C++0x is that it's going to be too complicated to learn.
Reference:
A Brief Introduction to Rvalue References
wprintf(“%s”,…)
By chys on June 28th, 2009
Microsoft and GNU interpret %s differently in the wide-string version of the printf-family functions (wprintf, etc.)
Microsoft: “when used with wprintf functions, specifies a wide-character string.”
C99 and GNU: “If no l modifier is present: The const char * argument is expected to be a pointer to an array of character type (pointer to a string).”
Fortunately, both accept “%ls” for wide strings.
Unfortunately, the only supported format specifier for multi-byte (narrow) strings in C99 is “%s”, which Microsoft interprets differently.
Fortunately, the specifier that Microsoft recommends for multi-byte strings, “%hs”, is also accepted by many other C libraries, though undocumented. Such acceptance is very reasonable: the unknown prefix h is simply ignored. It seems such acceptance is necessary in order to strictly conform to the wording of C99. (I tested it with GNU and Solaris C libraries.)
Specifier | Microsoft wprintf | GNU wprintf           | C99
%s        | Wide              | Narrow                | Narrow
%S        | Narrow            | Wide (deprecated)     |
%hs       | Narrow            | Narrow (undocumented) |
%ls       | Wide              | Wide                  | Wide
To draw a conclusion:
- Everybody agrees that, in wprintf, “%ls” specifies a wide string. (I’m not sure whether VC6 supports it.)
- There is no consensus on the specifier for multi-byte strings. The best practical choice is “%hs”.
This table and conclusion also apply to the “%c” family.
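A short sketch of the uncontroversial case, using swprintf so that the result can be inspected instead of printed. The function name bracket_wide and the 64-element buffer are made up for illustration; only “%ls” is used, since that is the one specifier all three columns agree on.

```cpp
#include <cwchar>
#include <string>

// Formats a wide string with "%ls" and returns it wrapped in brackets.
std::wstring bracket_wide (const wchar_t *text)
{
    wchar_t buffer[64];
    int written = std::swprintf (buffer, sizeof buffer / sizeof buffer[0],
                                 L"[%ls]", text);
    if (written < 0)   // an error occurred or the output did not fit
        return std::wstring ();
    return std::wstring (buffer,
                         static_cast<std::wstring::size_type> (written));
}
```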
Tags: C/C++, portability
I hate the “c…” headers
By chys on June 19th, 2009
What’s the reason for using <cstdio> instead of <stdio.h>? Merely to appear more standards-compliant?
Framers of the C++ standard probably wished to “clean” the global namespace by pulling everything into std. Unfortunately, many implementations (Microsoft, GNU, etc.) instead put all those symbols in both the global and std namespaces, rendering this argument invalid in practice.
Even more unfortunately, a few other well-known implementations (e.g. Solaris) actually followed the standards.
Actually I lost some points in a course for exactly this reason: the TA failed to compile on Solaris a program of mine that compiled fine on Linux. In that program I included <cstdio> but forgot to prepend std:: to two printf’s. Since then, I have always used <name.h> rather than <cname>, even though it is “deprecated.”
To write strictly conforming programs, we need to remember which symbols are macros and which are not. The C++ standard lists the symbols that are macros:
[Note: the names defined as macros in C include the following:
assert, errno, offsetof, setjmp, va_arg, va_end, and va_start. -end note]
They had the rarely-used setjmp here, but omitted three very important ones which the C standard says should be macros. Let’s look into the header stdio.h provided by glibc:
/* Standard streams. */
extern struct _IO_FILE *stdin; /* Standard input stream. */
extern struct _IO_FILE *stdout; /* Standard output stream. */
extern struct _IO_FILE *stderr; /* Standard error output stream. */
/* C89/C99 say they're macros. Make them happy. */
#define stdin stdin
#define stdout stdout
#define stderr stderr
They’re not in std either.
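Putting the two rules together, a sketch of code that should compile with <cstdio> on both kinds of implementation: functions are qualified with std::, while stdout, being a macro, is written without it. The function names format_answer and print_answer are made up for illustration, and std::snprintf assumes a C++11 library.

```cpp
#include <cstdio>
#include <string>

// Functions from <cstdio> are qualified with std::.
std::string format_answer (int n)
{
    char buf[32];
    std::snprintf (buf, sizeof buf, "answer=%d", n);
    return std::string (buf);
}

// stdout is a macro, so it is written unqualified.
void print_answer (int n)
{
    std::fprintf (stdout, "%s\n", format_answer (n).c_str ());
}
```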
Tags: C/C++
