On User Credentials for Web Sites

At the end of 2011, several serious password leaks hit leading Chinese internet companies, including CSDN (a leading technology community), Tianya (a leading discussion community) and RenRen (a leading social network). These leaks had a big impact on the daily web life of many Chinese internet users. As a technical person, I did some investigation and summarized my findings here, both to learn how to avoid such disasters if I were the product owner and to make my own Internet accounts more secure.

Part I – Technical Background

Plain Text vs. Hashed Text
- Storing passwords as plain text is dangerous if user data leaks, yet it seems that almost all popular web sites do store them that way. At least that is true in China.
- Hashing transforms the plain text into a string that is meaningless to people and almost impossible to convert back to the original text. For storing user passwords, it is more secure than plain text.
- Typical hash algorithms are: MD5, SHA1, SHA256, SHA512, SHA-3
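To make the idea concrete, here is a minimal sketch that hashes one password with each of the algorithms listed above, using Python’s standard hashlib module. The helper name hash_password is my own, and SHA-3 is represented by its 256-bit variant as an assumption.

```python
import hashlib

def hash_password(password: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of the password under the named algorithm."""
    hasher = hashlib.new(algorithm)
    hasher.update(password.encode("utf-8"))
    return hasher.hexdigest()

# The same input always gives the same digest, and the digest length
# depends only on the algorithm, not on the password.
for name in ("md5", "sha1", "sha256", "sha512", "sha3_256"):
    print(name, hash_password("password", name))
```

Note that the output is deterministic: anyone who hashes the same popular password with the same algorithm gets the same string, which is exactly what the attacks in the next section exploit.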
Attacking Hashes
- With an ideal hash algorithm, it is impossible to convert hashed text back to the original text directly, but attackers can still recover passwords using dictionary or brute-force approaches.
- Dictionary: the attacker precomputes the hash values of popular passwords for a specific hash algorithm and compares them with the hashed text.
- Brute-force: the attacker enumerates all possible passwords, hashes each one, and compares the result with the hashed text.
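Both attacks can be sketched in a few lines. The following is only an illustration under simple assumptions: the leaked hashes are unsalted SHA-256, the word list is a tiny made-up sample, and the brute-force search is limited to short lowercase passwords. All function names are hypothetical.

```python
import hashlib
import itertools
import string

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def build_dictionary(candidates):
    """Precompute a hash -> password table for popular passwords."""
    return {sha256_hex(password): password for password in candidates}

def dictionary_attack(target_hash, table):
    """Look the leaked hash up in the precomputed table."""
    return table.get(target_hash)

def brute_force(target_hash, alphabet=string.ascii_lowercase, max_length=4):
    """Try every string over the alphabet up to max_length characters."""
    for length in range(1, max_length + 1):
        for combo in itertools.product(alphabet, repeat=length):
            candidate = "".join(combo)
            if sha256_hex(candidate) == target_hash:
                return candidate
    return None

table = build_dictionary(["123456", "password", "qwerty"])
print(dictionary_attack(sha256_hex("qwerty"), table))
print(brute_force(sha256_hex("abc"), max_length=3))
```

The dictionary lookup is instant but only finds passwords in the list. The brute-force search finds anything in its search space, but the number of candidates grows exponentially with the password length.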
Defending Against Hash Attacks
- Defending against dictionary-based attacks
* Use multiple hash functions together: there are only a few popular hash algorithms, so precomputing and storing a dictionary of popular passwords is cheap. But if you apply multiple hash functions in some order, the attacker has to cover a huge space of possible combinations, which makes the attack slow and impractical. Alternatively, you can hash the plain password text multiple times with the same algorithm.
* Write your own hash function, so that the attacker can’t do the precomputation.
* Add salt to the plain user password before hashing it.
- Defending against brute-force attacks
* Adopt a heavy (deliberately slow) hash function, for example the bcrypt algorithm.
* Write your own hash algorithm.
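The first and the heavy-hashing defenses above can be sketched as follows. This is a rough illustration, not a recommendation of specific parameters: the order of algorithms and the round counts are arbitrary assumptions, and because bcrypt is not part of Python’s standard library, PBKDF2 (hashlib.pbkdf2_hmac) stands in here as a different but also deliberately slow function.

```python
import hashlib

def chained_hash(password: str) -> str:
    """Apply several hash functions in a fixed order: MD5, SHA-1, SHA-256."""
    data = password.encode("utf-8")
    for name in ("md5", "sha1", "sha256"):
        # Feed the hex digest of one round into the next algorithm.
        data = hashlib.new(name, data).hexdigest().encode("ascii")
    return data.decode("ascii")

def repeated_hash(password: str, rounds: int = 1000) -> str:
    """Hash the password many times with the same algorithm."""
    digest = password.encode("utf-8")
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()

def slow_hash(password: str, salt: bytes, iterations: int = 100_000) -> str:
    """Heavy hashing with PBKDF2, used here as a stand-in for bcrypt."""
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return derived.hex()

print(chained_hash("password"))
print(repeated_hash("password"))
print(slow_hash("password", b"example-salt"))
```

The point of the iteration count is that a legitimate login pays the cost once, while an attacker pays it for every guess.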
Rainbow Table
- It is a variant of the naive dictionary-based attack that reduces the space needed to store the precomputed dictionary, at the cost of more CPU time during precomputation and lookup.
- It is based on the idea of a hash chain: chain a series of texts by alternating a hash function and a reduction function, and store just the head and tail of each chain. The intermediate texts can be recomputed during lookup.
- A rainbow table reduces the hash chain’s collision problem by using a different reduction function at each position in the chain.
- A detailed description can be found at: Wikipedia on Rainbow Table.
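A toy version of the chain idea may help. The sketch below is my own illustration under strong simplifying assumptions: the password space is only the 4-digit strings "0000" to "9999", the hash is MD5, the chain length is 50, and the position-dependent reduction function is an arbitrary choice. Only the head and tail of each chain are stored.

```python
import hashlib

PASSWORD_DIGITS = 4   # toy password space: "0000" .. "9999"
CHAIN_LENGTH = 50     # number of hash/reduce steps per chain

def h(text: str) -> str:
    return hashlib.md5(text.encode("ascii")).hexdigest()

def reduce_hash(hash_hex: str, position: int) -> str:
    """Map a hash back into the password space, differently per position."""
    value = (int(hash_hex, 16) + position) % (10 ** PASSWORD_DIGITS)
    return str(value).zfill(PASSWORD_DIGITS)

def build_chain(start: str) -> str:
    """Walk a chain from its head and return its tail."""
    text = start
    for position in range(CHAIN_LENGTH):
        text = reduce_hash(h(text), position)
    return text

def build_table(starts):
    """Store only tail -> head for each chain."""
    return {build_chain(start): start for start in starts}

def lookup(target_hash: str, table):
    # Assume the target hash sits at each possible position in turn,
    # starting from the end of the chain, and walk forward to the tail.
    for position in range(CHAIN_LENGTH - 1, -1, -1):
        text = reduce_hash(target_hash, position)
        for later in range(position + 1, CHAIN_LENGTH):
            text = reduce_hash(h(text), later)
        if text in table:
            # Recompute the chain from its head to find the actual password.
            candidate = table[text]
            for step in range(CHAIN_LENGTH):
                if h(candidate) == target_hash:
                    return candidate
                candidate = reduce_hash(h(candidate), step)
    return None

table = build_table(["1234", "0000", "9876"])
print(len(table), "chains stored, each covering up to", CHAIN_LENGTH, "passwords")
```

A lookup succeeds only if the password happens to lie on one of the stored chains, so a real table needs many chains to cover the password space.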
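Salt for Hashing
- Essentially, it is a simple trick that keeps simple or popular passwords from producing well-known hash values: some extra value is added to the original plain text before hashing it.
- In effect, adding salt serves a similar purpose to multiple hashing: both make dictionaries precomputed for the plain algorithm useless.
- Salt can be static (a single fixed value) or dynamic (a different value for each password, for example derived from the plain password text or generated randomly per user and stored with the hash).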
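As an illustration, here is a minimal sketch of hashing with a dynamic, randomly generated salt. The 16-byte salt size, the use of SHA-256, and the function names are my assumptions. The salt is not secret; it is stored next to the hash so that the same computation can be repeated at login.

```python
import hashlib
import hmac
import os

def hash_with_salt(password: str, salt: bytes = None):
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def verify(password: str, salt: bytes, expected_digest: str) -> bool:
    """Recompute the salted hash and compare it in constant time."""
    _, digest = hash_with_salt(password, salt)
    return hmac.compare_digest(digest, expected_digest)

salt_a, digest_a = hash_with_salt("123456")
salt_b, digest_b = hash_with_salt("123456")
print(digest_a != digest_b)               # same password, different stored hashes
print(verify("123456", salt_a, digest_a))
```

With a per-user salt, two users with the same popular password no longer share the same stored hash, so one precomputed dictionary cannot be reused across accounts.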

Part II – End User’s Perspective

Given the background above, how can an end user make passwords more secure?
- Avoid short passwords
Short passwords are easy to crack with either dictionary or brute-force approaches.
- Avoid simple or popular passwords (some popular passwords are listed in the reference section)
Dictionary-based attacks crack simple or popular passwords efficiently. This is why some web sites require your password to contain non-alphabetical characters.
- Use a different password for each web site
Otherwise, one weak web site may expose all your online assets to attackers. To manage this large number of passwords, you may consider defining some rules for them. For example:
* define a password base: tqbfjotld (from: the quick brown fox jumps over the lazy dog)
* define a rule to adapt the base for a specific site: gmailtqbfjotld for Gmail, csdntqbfjotld for CSDN
- Change your password often
Change the password base and the per-site rule from time to time.
- Adopt password management software
If it is hard for you to track many passwords for different web sites, you can use popular password management software such as KeePass.

Part III – Developer’s Perspective

Here I summarize some tips on developing user password related features.
1. Writing your own hash function
It is very challenging (if not impossible) to write an ideal cryptographic hash function that meets the “ideal” criteria:
- no two different inputs have the same hash value
- infeasible to recover the input from the hash value
That is probably one reason why there are very few cryptographic hash algorithms. But you can write a sub-ideal algorithm (your own version, not known to others) on top of a near-ideal one such as MD5 or SHA1. One simple way to do this is to apply another function H of your own to the input before hashing it with MD5. You can give up the first criterion but keep the second one. To keep the second one, H can perform a lossy conversion, for example dropping the middle letter of the input text. Since some information is dropped during the conversion, it is infeasible to completely recover the original input.
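The idea of a lossy pre-processing function H followed by MD5 can be sketched like this. The names drop_middle and custom_hash are hypothetical, and the middle-letter rule is just the example from the paragraph above.

```python
import hashlib

def drop_middle(text: str) -> str:
    """Lossy step H: remove the middle character of the input."""
    if not text:
        return text
    middle = len(text) // 2
    return text[:middle] + text[middle + 1:]

def custom_hash(password: str) -> str:
    """Apply H first, then hash the result with MD5."""
    return hashlib.md5(drop_middle(password).encode("utf-8")).hexdigest()

print(drop_middle("abcde"))
print(custom_hash("abcde") == custom_hash("abXde"))
```

This shows the trade-off of giving up the first criterion: passwords that differ only in the dropped letter produce the same hash, so the site would accept either of them at login.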
2. Enforce strict password rules
To keep users from choosing popular or simple passwords, web site developers may consider enforcing some restrictions on valid passwords:
- Enable a blacklist filter that forbids popular passwords.
- Check the password length and forbid short passwords.
- Reject simple text: a password should contain lower case and upper case letters, numbers and other types of characters.
- The password should not contain user name information.
- The password should not equal any previous password in the user’s history.
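These rules map naturally onto a small validation function. The sketch below is one possible shape, with assumptions of my own: the blacklist is a tiny illustrative sample, the minimum length of 8 is arbitrary, and previous passwords are passed in as plain strings for simplicity (a real system would more likely compare against stored hashes).

```python
COMMON_PASSWORDS = {"123456", "password", "qwerty", "111111"}  # illustrative sample
MIN_LENGTH = 8  # assumed minimum length

def password_problems(password, username, previous_passwords=()):
    """Return a list of rule violations; an empty list means the password passes."""
    problems = []
    if password.lower() in COMMON_PASSWORDS:
        problems.append("blacklisted")
    if len(password) < MIN_LENGTH:
        problems.append("too short")
    character_classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(not c.isalnum() for c in password),
    ]
    if not all(character_classes):
        problems.append("too simple")
    if username and username.lower() in password.lower():
        problems.append("contains user name")
    if password in previous_passwords:
        problems.append("reused")
    return problems

print(password_problems("password", "alice"))
print(password_problems("Tr1cky!Passphrase", "alice"))
```

Returning all violations at once, rather than stopping at the first, lets the sign-up form tell the user everything that needs fixing in one round.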
3. On hashing algorithm
To avoid exposing the actual hashing algorithm, you can consider:
- Don’t use a well known algorithm naively (unmodified)
- Combine multiple algorithms together
- Combine a well known algorithm with your own hashing function
- Hash with salt
4. Secure your transport channel
You always need some transport channel to send the user-provided name and password to your server. Securing this channel is also critical. To this end:
- Prefer HTTPS over HTTP
- Consider client-side encryption (for example, in JavaScript) before transferring the password to the server
5. Defend against online cracking
A typical attacker uses computer programs rather than real humans to try to log in to a web site. To tell whether the user logging in is a program or a human being, you can adopt CAPTCHA in your online system.
To avoid degrading the user experience, you can trigger the CAPTCHA only when the behavior looks suspicious.
- Adopt multiple channel verification
If the current user shows suspicious behavior, such as too many incorrect inputs, an unusual location, or interacting too fast, multiple channel verification can be triggered:
* the user has to provide a security code sent to his mobile phone or email.
* the user needs to wait for some time.
* the user needs to pass a CAPTCHA test.
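One simple way to decide when a login looks suspicious is to count recent failed attempts per user. The sketch below covers only the "too many incorrect inputs" signal; the class name LoginGuard, the threshold of 3 failures, and the 10-minute window are all assumptions for illustration, and the state is kept in memory rather than in a shared store.

```python
import time

MAX_FAILURES = 3       # assumed threshold before extra verification
WINDOW_SECONDS = 600   # assumed look-back window (10 minutes)

class LoginGuard:
    """Tracks failed logins and says when extra verification is needed."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._failures = {}  # user name -> list of failure timestamps

    def record_failure(self, username):
        self._failures.setdefault(username, []).append(self._clock())

    def record_success(self, username):
        # A successful login clears the user's failure history.
        self._failures.pop(username, None)

    def recent_failures(self, username):
        cutoff = self._clock() - WINDOW_SECONDS
        recent = [t for t in self._failures.get(username, []) if t >= cutoff]
        if recent:
            self._failures[username] = recent
        else:
            self._failures.pop(username, None)
        return len(recent)

    def needs_extra_check(self, username):
        """True when a CAPTCHA, delay, or security code should be required."""
        return self.recent_failures(username) >= MAX_FAILURES

guard = LoginGuard()
for _ in range(3):
    guard.record_failure("bob")
print(guard.needs_extra_check("bob"))
```

The login handler would call needs_extra_check before accepting a password and, if it returns True, require one of the verification steps listed above.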
6. Adopt an existing proven ID system
If you don’t want to deal with all the tedious work above, you can consider adopting an existing ID system that is proven to work well. There are many such systems, such as OpenID, OAuth and the QQ Login service.
7. Other authentication related development tips
For other web site security issues, be careful about: SQL Injection, Cross Site Scripting, Session Hijacking.


References

1. Hashing algorithms:
About BCrypt
MD5, SHA1, SHA256, SHA512, SHA-3
Rainbow Table
2. Bad password list:
Top 500 bad passwords
Twitter password black list (see source code)
3. Handbook about web security:
The Google Browser Security Handbook
The Web Application Hacker’s Handbook
4. Web developers’ must know