12/22/2012

The Big Switch: from Edison to Google

I read the Chinese version, so I wrote this post in Chinese …
The book is roughly divided into two parts. The first part describes the age of electrification and its impact on the world, and uses it to introduce the possible impact on human society when computing plus the Internet becomes a utility; the second part surveys today's popular Internet services and where they are heading. Overall, the first part is fairly interesting, while the rest is unremarkable, since there are already too many bestsellers on the same topics.
The impact and contribution of electrification need no elaboration here, but the rise and fall of Edison, the great inventor, within that era is worth reflecting on:
- Edison was full of inspiration and vision; he not only invented the incandescent lamp but also planned and built the whole ecosystem of the electric age.
- Within that ecosystem, Edison not only proved the technical feasibility but also built a complete commercial system.
- However, because he put too much technical faith in direct current, and was tempted by his business model (the more small power plants there were, the more generating equipment he could sell), he turned a blind eye to the advantages of alternating current for power transmission and to the trend toward large, centralized power plants.
Lessons to learn:
- If you design a product feature mainly because it can bring you more revenue, you are very likely on the wrong path.
- The first goal of product design is to solve people's real needs in the most correct way; only on that basis should monetization be considered.
Take the recent 360 vs. Baidu (3B) war: why did Baidu's stock price look so fragile under 360's attack? The main reason is that 360 kept hammering on Baidu's paid search ranking model, which clearly hurts the search engine's user experience as well as its moral legitimacy.
Part of the second half of the book is basically a mash-up of The Long Tail and Microtrends. A few other points worth noting:
- Human society is gradually entering a gift-economy era, where the motivation for an activity is not determined entirely by economic interest; this will become a source of cheap data for Internet service providers.
- In the Internet era, every trace we leave has nowhere to hide; while enjoying the convenience of the Internet, we face a huge risk of privacy leakage.
- The combination of the human brain and intelligent Internet systems will lead humanity into an even more exciting future.

Finally, the author's reflection on great waves of technological change resonates strongly: "Every technological advance spans two generations, and its full force and impact appear only when the second generation of the new era has grown up and pushed the old people who created those technologies into the dustbin of history. That is how technological progress works; everything we use today seems to be taken for granted. By the end of this century, no one will remember life without computers and the Internet, and we will be the ones who carry that last memory away."

11/22/2012

Blue Ocean Strategy

Red Ocean vs. Blue Ocean
Existing markets consist of two kinds of oceans: red oceans and blue oceans.
Red oceans represent all the industries that exist today, the known market space; blue oceans represent the industries that do not yet exist, the unknown market space.
Different markets call for different strategies.
Red Ocean Strategy
Compete for a share of the existing market. The best-known theory is Michael Porter's competitive strategy, which consists of two major frameworks:
Five Forces Analysis
- Bargaining Power of Suppliers
- Bargaining Power of Buyers
- Threat of New Entrants
- Threat of Substitutes
- Rivalry among Existing Competitors
Three Generic Strategies
- Overall Cost Leadership
- Differentiation
- Focus
There are only a few ways to win in a red ocean: drive costs below your competitors'; differentiate your products and services so that customers feel you deliver more value than anyone else; or focus on serving a particular market segment, a particular product category, or a particular geographic area.
Blue Ocean Strategy
- Open up new market space and avoid bloody competition with existing rivals.
The blue ocean strategy consists of a value innovation analysis framework, a methodology for redefining market boundaries, and a procedure for formulating and evaluating the strategy.
The analysis framework is built around the strategy canvas and the four actions framework:
- Strategy Canvas: constructed from the main competing factors in a market and how each product/service in that market scores on those factors. Equivalently, a strategy canvas is the set of value curves (Value Curve) of a market's products and services.
- Four Actions Framework: eliminate factors that carry no value, reduce factors whose value has weakened, raise factors that are becoming more important, and create factors that have never been offered.
When drawing a value curve, the marks of a good curve are:
Focus, so that costs stay low and strengths stand out
Divergence, so that the offering is clearly distinguished from comparable products
A compelling tagline, so that it delivers real value to buyers
Each product on the strategy canvas has its own value curve. The process of discovering and creating a value curve that fits the blue ocean strategy is called Value Innovation: create value for both producer and consumer in a totally newly defined market, ignore competitors in the existing market, and focus on yourself and your product/service.
Reconstructing market boundaries means changing the scope of the existing market to reach more potential customers. The main paths are:
- Look across alternative markets (Alternative Market) and become a substitute for alternative products
- Look across strategic groups (Market Group: High/Low End) and combine features aimed at different segments
- Look across the chain of buyers (Buyer) and reach the real consumers directly
- Look across complementary products and services and build a strong ecosystem
- Look across functional and emotional appeal, adding or removing functional and emotional elements of the existing offering
- Look across time and embrace, or even help create, the trends that are coming
Strategy formulation steps
A typical strategy evaluation and formulation cycle: assess buyer utility -> set a strategic price -> derive the target cost -> address adoption hurdles
A blue ocean strategy must offer exceptional utility to its target customers, create market demand, and give customers a reason to buy;
Set a reasonable price so that target customers get good value for money and real transactions happen; pricing factors include how exceptional the utility is, how hard the offering is to imitate, and the current range of market prices;
Derive the target cost from the planned price and expected margin, then find every way to make that target cost achievable, rather than working the other way around;
Anticipate resistance to the new strategy from employees, partners, and the public, and prepare countermeasures in advance so the strategy can be executed smoothly.
In a blue ocean strategy, pricing is a strategic act; do not compromise on it because of cost. Common ways to squeeze cost: streamline operations, innovate technically, and partner with suppliers. If costs truly cannot be cut further, consider pricing innovation, for example: sell usage rights instead of ownership; take customer equity instead of cash; turn physical spot sales into futures-style transactions.

9/02/2012

Autopilot: Automatic Data Center Management

Managing a large-scale data center automatically, without much human involvement, has always been a challenging task. Industry giants such as Google and Microsoft are pioneers in this area, and very little information leaks out about how they handle such problems. But in 2007, Michael Isard of Microsoft Research published a paper entitled Autopilot: Automatic Data Center Management, which describes the technology that the Windows Live and Live Search services have used to manage their server farms. It is a great opportunity to look at how an industry giant manages tens of thousands of machines with software.
Design Principle
- Fault tolerance: any component can fail at any time, so the system must be reliable enough to keep running automatically with some proportion of its computers powered down or misbehaving
- Simplicity: simplicity is as important as fault tolerance when building a large-scale, reliable, maintainable system. Avoid unnecessary optimization and unnecessary generality.
Datacenter layout
A typical application rack might contain 20 identical multi-core computers, each with 4 direct-attached hard drives. The rack also contains a simple switch that lets the computers communicate locally with the other computers in the rack and, via a switch hierarchy, with the rest of the data center.
Finally, each computer has a management interface, either built into the server design or accessed via a rack-mounted serial concentrator.
The set of computers managed by a single instance of Autopilot is called a cluster.
Autopilot architecture
Autopilot consists of three subsystems:
- Hardware management: maintaining machine/switch/router state, automatic error repair, OS provisioning, etc.
- Deployment: automatically deploying applications and data to specified machines in the data center.
- Monitoring: watching the state of devices and services inside the data center, collecting performance counters, and presenting them in a user-friendly UI.
Hardware Management
- The Device Manager is the core: it maintains replicated state for each device in the data center
- It decides when to reboot, re-image, or retire a physical machine/switch/router
- It periodically discovers new machines through the special management interface, either built into the server design or accessed via a rack-mounted serial concentrator
- It automates the OS installation process through the Provisioning Service
- It automates the error repair process using the Repair Service
- It collects device state from the various Watchdog Services (a simplified repair-escalation sketch follows this list)
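The escalation logic below is a minimal sketch of what a Device Manager might do with watchdog reports; the action names mirror the reboot/re-image/retire decisions mentioned above, while the data structures and thresholds are assumptions, not the paper's actual implementation.

from enum import Enum

class RepairAction(Enum):
    DO_NOTHING = 0
    REBOOT = 1
    REIMAGE = 2
    RETIRE = 3   # take the machine out of rotation for manual service

# Hypothetical in-memory view of the Device Manager's replicated device state.
device_state = {}   # machine_id -> {"healthy": bool, "failed_repairs": int}

def on_watchdog_report(machine_id, error_detected):
    """Escalate repair actions as repeated errors are reported for a machine."""
    state = device_state.setdefault(machine_id, {"healthy": True, "failed_repairs": 0})
    if not error_detected:
        state["healthy"] = True
        state["failed_repairs"] = 0
        return RepairAction.DO_NOTHING
    state["healthy"] = False
    # Escalation policy: reboot first, then re-image, then retire the machine.
    escalation = [RepairAction.REBOOT, RepairAction.REIMAGE, RepairAction.RETIRE]
    action = escalation[min(state["failed_repairs"], len(escalation) - 1)]
    state["failed_repairs"] += 1
    return action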
Deployment
- Each machine is assigned a machine function, which indicates what role it plays and which services will run on it
- Each machine also belongs to a scale unit, a collection of machines that serves as the unit of application/OS updates
- Each machine runs a list of application/Autopilot services, and this list is stored in a service manifest file. Multiple versions of the manifest can be stored on a machine; only one is active, and the others are kept so the machine can switch versions or roll back when an upgrade fails
- The Device Manager maintains, for each machine in the cluster, the list of manifests and which version is active
- The Deployment Service is a multi-node service that stores all the application/data files listed in the service manifests. These files are synced from the external build system.
- An Autopilot operator triggers a new code deployment with a single command to the Device Manager. The DM updates the service manifests of the specified machines and kicks each machine to start syncing bits from the Deployment Service and running them. Each machine then syncs the manifest, downloads the specified application/data files to local disk, and starts them.
- In the normal case, each machine periodically asks the DM which manifests should be on its local disk and fetches any missing files from the Deployment Service (a sketch of this pull loop follows).
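A minimal sketch of the pull-based reconciliation loop described above; get_manifests, fetch, and the local_store object are hypothetical names standing in for the real RPCs and disk layout.

import time

POLL_INTERVAL_SECONDS = 60  # assumed polling period

def sync_loop(machine_id, device_manager, deployment_service, local_store):
    """Periodically reconcile local manifests against what the Device Manager
    says this machine should hold and which version should be active."""
    while True:
        manifests, active = device_manager.get_manifests(machine_id)   # assumed RPC
        for m in manifests:
            if not local_store.has(m):
                local_store.install(m, deployment_service.fetch(m))    # assumed RPC
        if not local_store.is_running(active):
            local_store.switch_to(active)            # stop the old version, start the active one
        local_store.garbage_collect(keep=manifests)  # versions still listed are kept for rollback
        time.sleep(POLL_INTERVAL_SECONDS)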
Monitoring
- Watchdogs constantly probe the status of other services/machines and report back to the Device Manager. Autopilot provides some system-wide watchdogs, but application developers can build their own as long as those services know how to talk to the DM about device status (a toy watchdog sketch follows this list)
- Performance counters are used to record the instantaneous state of components, for example a time-weighted average of the number of requests per second being processed by a particular server.
- The Collection Service forms a distributed collection and aggregation tree for performance counters. It can generate a centralized view of the current state of the cluster's performance counters with a latency of a few seconds.
- All collected information is stored in a central SQL Server for fast, complex querying by end users. The data is exposed to application developers and operators through an HTTP-based service called Cockpit.
- Besides the global view of the data center's status, Cockpit is also responsible for access to some resources (for example, application/data/log files)
- Predefined status queries and abnormal results are combined into an alert service, which can send emails and even place phone calls when critical situations happen.
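A toy watchdog, to make the probe-and-report idea concrete; the Device Manager endpoint, the wire format, and the port-80 health check are all assumptions for illustration only.

import socket
import time

DM_ADDRESS = ("device-manager.example.internal", 9000)  # hypothetical DM endpoint
PROBE_INTERVAL_SECONDS = 30

def probe(machine):
    """Hypothetical health check: try to open the service port and map the
    outcome to a coarse status the Device Manager can act on."""
    try:
        with socket.create_connection((machine, 80), timeout=2):
            return "OK"
    except OSError:
        return "ERROR"

def watchdog_loop(machines):
    """Probe each machine and report its status back to the Device Manager."""
    while True:
        for m in machines:
            status = probe(m)
            with socket.create_connection(DM_ADDRESS, timeout=2) as conn:
                conn.sendall(f"{m} {status}\n".encode())
        time.sleep(PROBE_INTERVAL_SECONDS)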

8/19/2012

Lessons learned from Tencent’s Wechat by Xiaolong Zhang

The presentation is divided into 5 parts:
- Wechat History
- On User
- On Requirement
- On Design
- On Interaction

Highlight points on User
- People are lazy; let them do and click less to reach a goal
- People like fashion; do something really cool to attract them
- People lack patience; do not make them read manuals or tips
- People's time is fragmented; do not give them tasks that need long stretches of uninterrupted time
- People get stupid in crowds; treat them as somewhat stupid, without too much judgment
- People are emotional; they seek inner satisfaction, the feeling of being
- People like uncertainty; they have lots of curiosity about the unknown
- People are social animals; they want to know more people
- Know your users from a psychological perspective

Highlight points on Requirement
- A product is designed to satisfy some desire that lives in people's hearts and daily lives
- Satisfy users; don't put too much moral judgment into your product design
- Purify and abstract the feedback you get from end users; don't just do literally what users tell you
- Try to get to know your target users through Weibo, forums, etc.
- Revolutionary products come out when society changes
- Different people usually share some common requirements; those are the most important things to work on
- Associate feature requirements with psychological desires; people are emotional
- Think at large scale and about massive groups for social products/features
- Focus on a few vital scenarios; ignore the trivial stuff
- Polls and surveys can only help you improve existing features; they can't help you with new products/features
- Feature requirements come from solving problems for yourself and your friends

Highlight points on Design
- Evolve your product gradually; you can't design a perfect product on the first try, and every product has its own life cycle
- Products with a clear DNA survive longer
- Design the product structure first, then focus on the details
- Categorize; make things clean and clear
- Love abstraction; make things simple and easy
- Design from scenarios, not from a feature list
- Be careful about over-design
- Drop features that won't make you and your users excited
- Responsiveness is the king of user experience
- Ship features gradually; don't move too fast or change too much in one step
- Give users the right to choose: core + plugins
- Respect your users: protect their privacy, save their temporary input, sign broadcast messages with a real name, not "system administrator"
- One version for everyone, not one version per region
- Design for the user; the user is the main role, not the design itself
- Make things as natural as possible; don't make people think
- Hide technology from ordinary users
- Focus; less is more

Highlight points on Interaction
- UI serves the features
- Make it simple and clean
- Each screen has its own topic
- Hide numbers

Some Comments
- He emphasized the importance of the product manager too much. Most of the time, whether a product succeeds (especially in China) depends on what product you are going to build and what platform you can leverage. For a social product, existing user data and connections are the most important things.
- The product manager is not God. God determines everything, but a product manager should design the product as desired (explicitly or implicitly) by its users.
- Too much criticism of competitors' product designs, even though those features are exactly what I (as a normal user) think Wechat should add.
- Wechat is the most successful product in the market, but what's the real reason? Its different feature design? The only reason I can see is that it is backed by Tencent, with its huge base of QQ users and its binding to QQ friends.
- He talked a lot about avoiding "over-design", yet also talked a great deal about active design.
- The presentation lacks something like "无为而治" (governing by non-interference): whether a product succeeds, and what the final running system looks like, is determined not only by how the product manager designs it but also by how people interact with it.
- He should also thank the QQ user data, the competitors, the creators of Kik, and the great mobile Internet era.

5/26/2012

The evolution of QZone architecture

Problem scale today
- 550M active users
- Tens of millions of peak online users
- Billions of daily page views
- Petabytes of UGC data
- 100B daily requests
Qzone 1.0 – 3.0 (0 ~ 1M online users, 2004 – 2006)
  • Architecture
    • Special Windows client (embedded HTML)
    • Apache + Cache + MySQL
      * App/CGI calls different data services to cook a result page for each user request
    • One ISP, one service cluster
      * Users from Telecom/Netcom are served by different dedicated servers
      * The app calls data services within the same ISP
  • Problems (v1/v2)
    • Special client -> hard to debug
    • Web server is not scalable
    • 30~40 nodes, max around 500K online users
  • Solutions (v3)
    • Rich client
      • Move some logic from the server to the client
      • The client is Ajax-based, so server logic is simplified
    • Dynamic/static separation
      • Static data is hosted by the lightweight web server qHttpd
      • 100x performance improvement
    • Web server optimization
      • Replace Apache with qzHttp for dynamic logic
      • 3x performance improvement
    • Main page caching
      • Staticize and cache elements of the main page
      • Elements are updated periodically or on demand
Qzone 4.0-5.0 (1M ~ 10M online users)
  • ISP separation problem: dynamic data
    • All dynamic services are hosted within one ISP
    • The other ISPs act as proxies that call these services
      • Dedicated network connections between the proxies and the services
    • Users in another ISP don't call services across ISPs directly
  • ISP separation problem: static data
    • Static:dynamic ratio is about 10:1, so adopt a CDN solution
    • Redirect static requests to ISP-specific static data servers according to the client IP
      • Done by Qzone app logic using the client IP
      • Previously DNS was used for redirection, which caused lots of problems
      • because of local DNS misconfiguration
  • Improve user experience
    • Improve critical services' availability
      • Replicate core services
    • Degrade gracefully ("lossy service") for non-critical services (see the sketch after this section)
      • Skip a service if it times out
      • Or fall back to a default value if it fails or times out
    • Fault-tolerant design from the backend services all the way to the client scripts
      • Default values at the client
      • LVS for the Qzone web servers
      • L5 (F5?) for internal critical services
    • Control the timeout budget for the whole request processing
      • A kind of real-time scheduling algorithm
  • Incremental release
    • Release new features to end users from a small scope to a larger scope
    • Team-internal dogfood
    • Whitelisted (invited) user test
    • Company-wide dogfood
    • VIP external users
    • Roll out globally
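A minimal sketch of the "lossy service" idea from the user-experience section above: every non-critical backend call is guarded by a time budget and degrades to a default value instead of failing the whole page. The backend names, budget, and thread-pool approach are assumptions for illustration.

import concurrent.futures

REQUEST_BUDGET_SECONDS = 0.5   # hypothetical per-call time budget
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)

def call_with_fallback(fn, default, timeout=REQUEST_BUDGET_SECONDS):
    """Call a non-critical backend; on timeout or error, return a default
    value instead of failing the whole request ("lossy service")."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout)
    except Exception:   # includes concurrent.futures.TimeoutError
        return default

def render_main_page(backends):
    """Assemble the main page from several backends, each guarded by the budget."""
    return {
        "profile": call_with_fallback(backends.get_profile, default={}),
        "feeds":   call_with_fallback(backends.get_feeds,   default=[]),
        "ads":     call_with_fallback(backends.get_ads,     default=None),  # simply skip ads
    }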
Qzone 6.0+ (~100M online users)
  • Open platform
    • App/platform separation
    • iFrame-based app model
    • An app's dev/test/deploy is totally separated from the Qzone platform
    • Separation of concerns and parallel evolution paths
  • GEO replication – handle IDC failures
    • One IDC for writes
    • Multiple IDCs for reads
    • A dedicated synchronization protocol
  • Monitoring
    • Bandwidth/latency/error monitoring
    • Problem localization
Comments
  1. All the content is very general, without many details
  2. It does not touch the core problem: how to scale, and how to partition data at such a large scale
  3. Single-IDC writes will cause service availability problems in case of disaster, unless reconfiguration is supported

3/05/2012

Network Address Translation, Network Partitioning and Virtual Private Network


I – Virtual Private Network

A VPN is a private network that uses a public network (usually the Internet) to connect remote sites or users together. The VPN uses “virtual” connections routed through the Internet from the business’s private network to the remote site or employee. By using a VPN, businesses ensure security – anyone intercepting the encrypted data can’t read it.

There are two types of VPNs:
- Remote access VPN: enables users working at home or on the road to access a server on a private network using the infrastructure provided by a public network, such as the Internet. From the user's perspective, the VPN is a point-to-point connection between the computer (the VPN client) and the organization's server.



Diagram from Microsoft Article on VPN

- Site-to-site VPN: enables organizations to have routed connections between separate offices, or with other organizations, over a public network while helping to maintain secure communications. A routed VPN connection across the Internet logically operates as a dedicated wide area network (WAN) link. When networks are connected over the Internet, as shown in the following figure, a router forwards packets to another router across the VPN connection. To the routers, the VPN connection operates as a data-link layer link.

Diagram from Microsoft Article on VPN

Most VPNs rely on tunneling to create a private network that reaches across the Internet. Tunneling is the process of placing an entire packet within another packet before it’s transported over the Internet. That outer packet protects the contents from public view and ensures that the packet moves within a virtual tunnel. Tunneling requires three types of protocols:
- Passenger protocol: the original data (IPX, NetBEUI, IP) that is carried
- Encapsulating protocol: the protocol (GRE, IPsec, L2F, PPTP, L2TP) that is wrapped around the original data
- Carrier protocol: the protocol over which the information (in the passenger protocol) travels

Reference:
http://computer.howstuffworks.com/vpn.htm/printable
http://www.cisco.com/application/pdf/paws/14106/how_vpn_works.pdf
http://technet.microsoft.com/en-us/library/cc779919(v=WS.10).aspx


II – Network Address Translation


NAT is the process of modifying IP address information in IP packet headers while packets are in transit across a traffic-routing device. It is widely used for two purposes:
- Security/firewalling: hide the private network so that outside hosts can't reach internal hosts directly
- Alleviating IPv4 address exhaustion: share the same public IP address among many internal hosts



How NAT works from Cisco

Types of NAT
- Static NAT: one-to-one mapping; the IP is changed but the port is not
- Dynamic NAT: M-to-N mapping; the IP is changed but the port is not
- Overloading (PAT): M-to-one mapping; both IP and port are changed, and the IP is always changed to the same one (a small translation-table sketch follows this list)
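A toy model of the overloading case, to show how one public IP plus a per-connection port can serve many internal hosts; the addresses and port range are made up for illustration.

import itertools

PUBLIC_IP = "203.0.113.7"            # hypothetical public address of the NAT router
_next_port = itertools.count(40000)  # pool of external ports to hand out

_outbound = {}  # (private_ip, private_port) -> public_port
_inbound = {}   # public_port -> (private_ip, private_port), for return traffic

def translate_outgoing(src_ip, src_port):
    """Overloading/PAT: rewrite the private source to the shared public IP plus
    a unique public port, remembering the mapping for replies."""
    key = (src_ip, src_port)
    if key not in _outbound:
        public_port = next(_next_port)
        _outbound[key] = public_port
        _inbound[public_port] = key
    return PUBLIC_IP, _outbound[key]

def translate_incoming(dst_port):
    """Rewrite the destination of a reply back to the original private host."""
    return _inbound.get(dst_port)    # None means no mapping, so drop the packet

# Two internal hosts end up sharing the single public IP:
print(translate_outgoing("192.168.1.10", 51000))  # ('203.0.113.7', 40000)
print(translate_outgoing("192.168.1.11", 51000))  # ('203.0.113.7', 40001)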

Communicating between hosts that are both behind NAT
- Port forwarding: translate the target IP/port of an incoming packet to a new destination when the NAT router processes it
- NAT traversal: lets two nodes that are each behind a NAT device communicate with each other

NAT traversal is a broad area where many technologies and protocols have been invented to solve the same problem. Traditional methods require the help of a third-party host with a public address; some recent techniques require only the two communicating parties.

One example is UDP hole punching: each host behind a NAT first talks to a public server, which establishes an address translation entry in its NAT router; the server then tells each host the other peer's public IP/port, and the two hosts can talk to each other directly over UDP. This assumes the NAT router keeps the same IP/port mapping once it has been established.
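A minimal client-side sketch of that flow; the rendezvous server address and its "ip:port" reply format are assumptions, and a real implementation would add retries and message framing.

import socket

RENDEZVOUS = ("rendezvous.example.com", 3478)   # hypothetical public server

def hole_punch(local_port=0):
    """UDP hole punching client, assuming a cooperating rendezvous server
    that pairs two clients and sends each the other's public address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", local_port))

    # 1. Talking to the public server creates a mapping in our NAT router and
    #    lets the server observe our public (translated) address.
    sock.sendto(b"register", RENDEZVOUS)

    # 2. The server replies with the peer's public address, e.g. "198.51.100.2:40123".
    data, _ = sock.recvfrom(1024)
    host, port = data.decode().split(":")
    peer = (host, int(port))

    # 3. Both sides send to each other's public address; the first packets may be
    #    dropped, but they open the "hole" so later packets get through.
    for _ in range(5):
        sock.sendto(b"punch", peer)
    sock.recvfrom(1024)   # receiving anything means the hole is open
    return sock, peer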

Reference:
1. Anatomy: A Look Inside Network Address Translators (PDF version)
2. How NAT works @ HowStuffWorks3. RFC 1631 – The IP Network Address Translator (NAT)4. Autonomous NAT Traversal5. RFC5128 - State of Peer-to-Peer (P2P) Communication across Network Address Translators (NATs)6. Skype Communication Protocol Internals7. UDP hole punching

III – Network Partitioning


A network partitioning failure splits the network into two or more disjoint parts. Processes of a network-based application within the same part can communicate with each other, but they cannot communicate with processes located in the other parts. It may be caused by the failure of a router at a partition boundary. Because of the tree-structured network infrastructure, such a failure only partitions the network rather than bringing all communication down.

Strategies to handle network partitioning
- Replication: replicate data/processes to several locations to tolerate network partitioning failures. To implement replication and keep the replicas consistent, the typical building blocks are multicast group communication, message ordering, and group membership management.
- Replicated data/processes are good at serving immutable (read-only) operations. To support mutating operations across replicas while keeping them consistent (or making reconciliation easy) when a partition happens, other techniques have been invented, such as disjoint transactions, commutative transactions, and timestamped transactions.
- Stop/Join policy: if partial unavailability is acceptable, we can design the system so that non-primary processes disconnected from the primary stop serving when a partition happens. The stopped processes rejoin the serving group when the connection is restored, going through the normal join procedure to catch up on the updates that happened during the partition.
- Fulfill transaction: non-primary processes do not stop serving; they accept mutation requests and put them into a queue. After the connection is restored, these non-primary processes catch up with the primary and then apply the queued requests (a minimal sketch of this idea follows the list).
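A toy replica illustrating the queue-and-replay ("fulfill transaction") idea; the primary's apply/snapshot API and the reconciliation step are assumptions made up for the sketch.

class PartitionTolerantReplica:
    """Queues writes while disconnected from the primary and replays them
    after the partition heals (names and behavior are illustrative only)."""

    def __init__(self, primary):
        self.primary = primary
        self.connected = True
        self.pending = []        # writes accepted while partitioned
        self.local_state = {}    # local copy, served for reads

    def write(self, key, value):
        if self.connected:
            self.primary.apply(key, value)      # assumed primary API
        else:
            self.pending.append((key, value))   # defer until the partition heals
        self.local_state[key] = value

    def read(self, key):
        return self.local_state.get(key)

    def on_reconnect(self):
        """Catch up with the primary, then replay the writes queued during the partition."""
        self.local_state = self.primary.snapshot()    # assumed primary API
        for key, value in self.pending:
            self.primary.apply(key, value)             # conflicts must be reconciled here
            self.local_state[key] = value
        self.pending.clear()
        self.connected = True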

Reference

1. Surviving Network Partitioning IEEE, 1998

1/08/2012

On User Credentials for Web Site

There were several critical password leaks at the end of 2011 at some leading Chinese internet companies, such as CSDN (a leading technology community), Tianya (a leading discussion community) and RenRen (a leading social network). These leaks had a big impact on many Chinese internet users' daily web lives. So, as a technical guy, I did some investigation and summarize it here, both to avoid such disasters if I were the product owner and to make my own Internet accounts more secure.

Part I – Technical Background


Plain Text vs. Hashed Text
- Storing the plain text of a password is dangerous in case of a user data leak, but it seems that almost all popular web sites do store it. At least that is true in China.
- Hashing is a way to transform the plain text into strings that are meaningless to people and practically impossible to convert back to the original text. It is more secure than plain text for storing user passwords.
- Typical hash algorithms are: MD5, SHA1, SHA256, SHA512, SHA-3
Attacking Hashes
- With an ideal hash algorithm, it is impossible to convert the hashed text back to the original directly, but people can accomplish this with dictionary- or brute-force-based approaches
- Dictionary: the attacker precomputes the hash values of popular passwords with a specific hash algorithm and compares the output with the hashed text
- Brute force: enumerate all possible passwords, compute their hashes, and compare them with the hashed text
Defending Against Hash Attacks
- Defending against dictionary attacks
* Use multiple hash functions together: there are only a few popular hash algorithms, so precomputing and storing a dictionary of popular passwords is cheap. But if you apply multiple hash functions in some order, the attack becomes very slow and impractical because of the huge potential result space. Alternatively, you can hash the plain password multiple times with the same algorithm.
* Write your own hash function, so that the attacker can't do the precomputation.
* Add a salt to the plain password before hashing it (see the sketch after this section).
- Defending against brute-force attacks
* Adopt a heavy (deliberately slow) hashing function, for example the BCRYPT algorithm.
* Write your own hash algorithm.
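A minimal sketch of the salt-plus-repeated-hashing combination, using the standard library's PBKDF2 (a well-known salted, iterated construction) rather than a home-grown function; the iteration count is an arbitrary illustrative choice.

import hashlib
import hmac
import os

def hash_password(password, iterations=200_000):
    """Salted, iterated hashing: a per-user random salt defeats precomputed
    dictionaries/rainbow tables, and many iterations slow down brute force."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def verify_password(password, salt, iterations, digest):
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison

# Example
salt, n, stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, n, stored)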
Rainbow Table
- It is a variant of the naive dictionary-based hash attack that reduces the space needed to store the precomputed dictionary, at the cost of more CPU during precomputation and lookup.
- It is based on the idea of a hash chain: chain a series of texts with alternating hashing and reduction steps, store only the head and tail, and recompute the intermediate texts during lookup.
- Rainbow tables further reduce the hash chain's collision problem by using a different reduction function at each position in the chain (a toy sketch follows this list).
- A detailed description can be found in the Wikipedia article on Rainbow Table.
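A toy hash chain, only to make the hash/reduce alternation and head/tail storage concrete; MD5 appears because the post uses it as an example, and the reduction function and chain length are made up.

import hashlib

CHAIN_LENGTH = 1000          # illustrative; real tables use far longer chains

def hash_fn(text):
    return hashlib.md5(text.encode()).hexdigest()

def reduce_fn(digest, position):
    # Toy reduction: map a digest back to an 8-character password candidate.
    # Using the chain position gives a different reduction per column, which is
    # what distinguishes a rainbow table from a plain hash chain.
    start = position % (len(digest) - 8)
    return digest[start:start + 8]

def chain_tail(start_password):
    """Walk hash -> reduce CHAIN_LENGTH times; only (head, tail) is stored."""
    current = start_password
    for i in range(CHAIN_LENGTH):
        current = reduce_fn(hash_fn(current), i)
    return current

table = {chain_tail(p): p for p in ["password", "123456", "letmein"]}
# Lookup regenerates candidate chains from the target hash until a stored tail
# matches, then replays the matching chain from its head to recover the password.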
Salt for Hashing
- Essentially, it is a simple trick to avoid simple/popular password texts by adding some extra value to the original plain text before hashing it.
- In fact, adding a salt during hashing is a form of multiple hashing.
- The salt can be static (a fixed value) or dynamic (generated from the plain password text).

Part II – End User’s Perspective


Given the previous knowledge, how can you make your passwords more secure as an end user?
- Avoid short passwords
Short passwords are easy to attack with either dictionary- or brute-force-based approaches
- Avoid simple/popular passwords; some popular passwords are listed in the reference section
Dictionary-based attacks crack simple or popular passwords efficiently. This is why some web sites require your password to contain non-alphabetical characters
- Use a different password for each web site
Otherwise, one weak web site may expose all your online assets to attackers. To manage this large number of passwords, you may consider defining some rules for them. For example:
* define a password base: tqbfjotlb (from: the quick brown fox jumps over the lazy dog)
* define a rule that adapts the base for each site: gmailtqbfjotlb for gmail, csdntqbfjotlb for CSDN
- Change your passwords often
Change the previous two rules from time to time
- Adopt password management software
If it is hard for you to track many passwords for different web sites, you can use popular password management software such as KeePass

Part III – Developer’s Perspective


Here I summarize some tips for developing the user-password-related parts of a site.
1. Writing your own hash function
It is very challenging (if not impossible) to write an ideal cryptographic hash function that meets the "ideal" criteria:
- no two different inputs have the same hash value
- it is infeasible to recover the input from the hash value
That is probably one reason there are so few hash algorithms suitable for this purpose. But you can write a sub-ideal algorithm (your own version, not known to others) on top of a near-ideal one, such as MD5 or SHA1. One simple way is to apply another transformation H of your own before hashing with MD5. You can give up the first criterion but keep the second. To keep the second, make the transformation lossy, for example by dropping the middle letter of the input text: since some information is lost in the transformation, it is infeasible to completely recover the original input. A minimal sketch of this idea follows.
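In the sketch below, the pre-transform H is purely illustrative (dropping the middle character, as in the example above), and MD5 is used only because the post names it; a standard, reviewed scheme is still the safer default.

import hashlib

def pre_transform(text):
    """Illustrative private transformation H: drop the middle character.
    It is lossy, so the original input cannot be fully recovered from it."""
    if len(text) < 3:
        return text
    mid = len(text) // 2
    return text[:mid] + text[mid + 1:]

def custom_hash(password):
    """Apply the private transform first, then a well-known hash."""
    return hashlib.md5(pre_transform(password).encode()).hexdigest()

print(custom_hash("hunter2"))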
2. Enforce strict password rules
To prevent users from choosing popular and simple passwords, web site developers may consider enforcing some restrictions on valid passwords:
- Enable a blacklist filter; forbid popular passwords.
- Check the password length; forbid short passwords.
- Reject overly simple text: the password should contain lower-case and upper-case letters, numbers, and other types of characters.
- The password should not contain the user name.
- It should not equal any previous password in the user's history (a simple validator sketch follows).
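A simple validator covering the rules above; the blacklist, minimum length, and plain-text history comparison are illustrative simplifications (a real system would check against stored password hashes).

import re

BLACKLIST = {"123456", "password", "qwerty", "111111"}   # illustrative blacklist
MIN_LENGTH = 8

def validate_password(password, username, history):
    """Return a list of rule violations; an empty list means the password is acceptable."""
    problems = []
    if password.lower() in BLACKLIST:
        problems.append("password is on the common-password blacklist")
    if len(password) < MIN_LENGTH:
        problems.append("password must be at least %d characters" % MIN_LENGTH)
    if not (re.search(r"[a-z]", password) and re.search(r"[A-Z]", password)
            and re.search(r"\d", password) and re.search(r"[^A-Za-z0-9]", password)):
        problems.append("password must mix lower case, upper case, digits and symbols")
    if username and username.lower() in password.lower():
        problems.append("password must not contain the user name")
    if password in history:   # history holds plain strings here only for simplicity
        problems.append("password must differ from previous passwords")
    return problems

print(validate_password("Tr0ub4dor&3", "alice", []))   # -> [] when all rules pass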
3. On hashing algorithm
To avoid exposing the actual hashing algorithm, you can consider:
- Don't adopt a well-known algorithm directly
- Combine multiple algorithms
- Combine a well-known algorithm with your own hashing function
- Hash with a salt
4. Secure your transport channel
You always need some transport channel to send the user-provided name and password to your server, so securing those channels is also critical. To this end:
- Prefer HTTPS over HTTP
- Consider client-side encryption (for example, in JavaScript) before transferring credentials to the server side
5. Defense online cracking
- Adopt CAPTCHA
A typical attacker uses computer programs rather than real humans to try to log in to a web site. To tell whether the user logging in is a computer program or a real human being, you can adopt a CAPTCHA in your online system.
To avoid degrading the user experience, you can trigger the CAPTCHA only when behavior looks suspicious.
- Adopt multi-channel verification
If the current user shows suspicious behavior, such as too many incorrect inputs, an unusual location, or interacting too fast, multi-channel verification can be triggered:
* the user has to provide a security code sent to their mobile phone or email.
* the user needs to wait for some time.
* the user needs to pass a CAPTCHA test.
6. Adopt existing proven ID system
If you don't want to deal with all the tedious work above, you can consider adopting an existing ID system that is proven to work well. There are many such systems, such as OpenID, OAuth and the QQ Login service.
7. Other authentication related developing tips:
For other web site security issues, be careful about: SQL Injection, Cross-Site Scripting, Session Hijacking

[Reference]


1. Hashing algorithms:
About BCrypt
MD5, SHA1, SHA256, SHA512, SHA-3
Rainbow Table
2. Bad password lists:
Top 500 bad passwords
Twitter password black list (see source code)
3. Handbooks about web security:
The Google Browser Security Handbook
The Web Application Hacker's Handbook
4. Web developers' must-knows