10/25/2010

Integer Variable Length in C/C++ on Different Platforms

When writing code for multiple platforms (in terms of both OS and CPU architecture), a typical challenge is keeping the code independent of the exact byte size of each C/C++ integer type on each platform.

What does the ANSI C standard say about this problem?

The standard defines 5 standard signed integer types, each with a corresponding unsigned type:
- signed char
- short int
- int
- long int
- long long int

It also specifies limits for these types in limits.h.

But it does not explicitly state the exact byte size of each type.

The common understanding is that the standard requires:
sizeof(short int) <= sizeof(int) <= sizeof(long int) <= sizeof(long long int)

So what does the documentation of popular compilers say about this?

MSDN has an article for Visual Studio 10 that describes the exact size of each integer type.

From that article we can see:
sizeof(short int) = 2
sizeof(int) = 4
sizeof(long int) = 4
sizeof(long long int) = 8

And these sizes hold on both 32-bit and 64-bit platforms.

To help programmers be aware of the exact size of the integer types they are using, VS 10 provides some additional integer types:
__int8, __int16, __int32, __int64 and their unsigned counterparts.

In fact, C99 also defines fixed-width integer types in stdint.h:
uint8_t/int8_t
uint16_t/int16_t
uint32_t/int32_t
uint64_t/int64_t

What about scanf() and printf()? The format strings for these types are defined in the standard header inttypes.h. For example, this is inttypes.h for Visual Studio.

And here is a good summary of how to use format strings to deal with integer types in C/C++.


[Reference]
1. stdint.h in C99
2. Integer Types in VS10
3. ANSI C99 Spec
4. Integers in C99

10/14/2010

Alexa and Its Ranking List

I recently read some material about the page view rankings of some web sites. It said that the source of the ranking data is alexa.com.

It's good to know the source of referenced data, but how reliable is this data source?

I had a look at that website to learn how the ranking list is generated. Here is my understanding:

1. Alexa’s traffic rankings are based on data collected from Alexa Toolbar and other, diverse sources over a rolling 3 month period.

2. A site’s ranking is based on a combined measure of Reach and PageViews.
- Reach is determined by the number of unique Alexa users who visit a site on a given day.
- PageViews are the total number of Alexa user URL requests for a site. However, multiple requests for the same URL on the same day by the same user are counted as a single pageview.

3. Sites with relatively low traffic will not be accurately ranked by Alexa. Traffic rankings of 100,000+ should be regarded as not reliable. Conversely, the closer a site gets to #1, the more reliable its rank. This is because Alexa only uses data sampled from Alexa Toolbar users, who are in fact just a small portion of all Internet users.

So it seems that the ranking list should not be treated as very authoritative, since very few people use the toolbar. But why is it so popular and important to many VCs? I guess it's mainly due to the lack of better alternatives.

Better data providers would be web browser vendors like Microsoft, Mozilla and Google. But obviously, they are not willing to share the data they collect with the community, because of privacy concerns and potential legal issues.

[Reference]
1. How Reliable Are Your Traffic Ranking?
http://www.alexa.com/faqs/?p=139

2. How are Alexa’s traffic rankings determined?
http://www.alexa.com/faqs/?p=134

3. About the Alexa Traffic Rankings
http://www.alexa.com/help/traffic_learn_more