7/31/2011

Baidu Tieba Architecture

An architect at Baidu, who is in charge of the technology behind the Tieba product, gave a brief introduction to the back-end technologies of this famous web application at the June Baidu Salon event.
Here are my notes on the talk:

Part I – Application Scale

1. Not just a plain forum, but also photo/video/gaming
2. Includes front end, storage, anti-spam, search and mining
3. Numeric facts
- billions of topics
- tens of billions of posts
- tens of millions of posts for a single hot topic
- petabytes of video data
- 100K+ QPS from client web browsers
- 10K+ update messages forwarded per second (I doubt this number)
- hundreds of services

Part II – Backend Technology: Lightweight Framework

For the common 80% of situations
1. MySQL
- prefer InnoDB over MyISAM, with some modifications (to the disk write pattern, for a 10x perf gain)
- application optimization:
* avoid joins (by breaking normalization?)
* auxiliary index
* data locality
- single node numbers
* thousands of QPS
* hundreds of GB of data
- MySQL clustering
* master/slave for write/read separation
* home-brewed request dispatcher (for easier programming and load balancing)
2. Cache
- hit ratio around 80%
- 10K ~ 100K QPS
- multiple granularities (page, picture, data item etc.)
- challenges: cache updating, write request pressure
3. Flash Disk
- 5x – 10x perf gain without extra effort
- huge improvement in random access
- size limitation: 500G (SSD) vs 10T (HDD)
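
To make the master/slave read/write separation a bit more concrete, here is a minimal sketch of what a request dispatcher could look like. This is my own illustration, not Baidu's implementation: the class name `Dispatcher`, the verb list and the round-robin policy are all assumptions.

```python
import itertools


class Dispatcher:
    """Hypothetical dispatcher: writes go to the master, reads rotate over slaves."""

    # Assumption: statements starting with these verbs are treated as writes.
    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, slaves):
        self.master = master
        # Round-robin over the slaves; fall back to the master if there are none.
        self._read_targets = itertools.cycle(slaves or [master])

    def route(self, sql):
        """Return the node that should execute the given SQL statement."""
        parts = sql.split(None, 1)
        if not parts:
            raise ValueError("empty statement")
        if parts[0].upper() in self.WRITE_VERBS:
            return self.master
        return next(self._read_targets)


dispatcher = Dispatcher("master:3306", ["slave1:3306", "slave2:3306"])
print(dispatcher.route("UPDATE post SET title = 'x' WHERE id = 1"))
print(dispatcher.route("SELECT * FROM post WHERE id = 1"))
```

The application only talks to the dispatcher, which is where the "easier programming" comes from. A real dispatcher would also have to deal with replication lag and node failures, which this sketch ignores.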

Part III – Backend Technology: Heavyweight Infrastructure

For the rare 20% of scenarios
1. Partitioning
- Vertical, partitioned by application
* topics and posts are separated
* relationships (lists) and content are separated
- Horizontal, partitioned by key
2. Message Queue
- Reliable multicast communication system
- Handling mutation requests (I guess)
- Peak TPS: 100K+ (really?)
- It can only solve the update reliability problem, but the speaker seemed to claim that it also solves the scalability problem
3. In-house storage node
- speedup by transforming random writes into batch/append writes
- memory patch: (background merge [mem + disk] in my understanding)
- write-ahead logging for reliability
- highly optimized for the application
4. In-house distributed KV store
- for video storage
- replication (driven by MQ) for reliability
- append only
- petabyte scale
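
The storage node ideas above (append-only write-ahead log, an in-memory patch, and a merge of memory and disk) can be sketched in a few lines. This is only my understanding of the general technique, not the speaker's design: the class `StorageNode`, the JSON file layout and the `patch_limit` parameter are invented for illustration, and the merge runs synchronously here rather than in the background.

```python
import json
import os


class StorageNode:
    """Hypothetical single-node store: WAL + in-memory patch + merged base file."""

    def __init__(self, directory, patch_limit=1000):
        self.wal_path = os.path.join(directory, "wal.log")
        self.base_path = os.path.join(directory, "base.json")
        self.patch_limit = patch_limit
        self.patch = {}  # recent writes that are not yet merged into the base file
        self._replay_log()

    def _load_base(self):
        if not os.path.exists(self.base_path):
            return {}
        with open(self.base_path) as f:
            return json.load(f)

    def _replay_log(self):
        # After a restart, rebuild the memory patch from the write-ahead log.
        if os.path.exists(self.wal_path):
            with open(self.wal_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.patch[key] = value

    def put(self, key, value):
        # Sequential append to the log first, then update memory.
        with open(self.wal_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.patch[key] = value
        if len(self.patch) >= self.patch_limit:
            self.merge()

    def get(self, key):
        if key in self.patch:
            return self.patch[key]
        return self._load_base().get(key)

    def merge(self):
        # One batch write of base + patch, then the log can be truncated.
        merged = self._load_base()
        merged.update(self.patch)
        tmp_path = self.base_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(merged, f)
        os.replace(tmp_path, self.base_path)
        open(self.wal_path, "w").close()
        self.patch.clear()
```

Every `put` is an append, so the disk never sees a random write; the random updates are absorbed in memory and only reach the base file as one batch during the merge.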

Part IV – Backend Technology: Cluster Management

1. Most websites basically follow an SOA architecture
- 100+ standalone small services
- service orchestration for a single user request
2. Challenges in this architecture
- service/data upgrading
- failure handling
- performance variation
3. Service Management
- centralized management of service metadata
- service registration and notification
- hide the service cluster from the application caller
- automatic failure handling and load balancing
- (why service notification, rather than just trying and asking the registry on failure?)
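
As a rough illustration of registration plus notification, here is a minimal in-process sketch. It is not the system described in the talk: `Registry`, `ServiceClient`, the callback-based `watch` and the random load balancing are all hypothetical names and choices of mine.

```python
import random


class Registry:
    """Hypothetical metadata center: tracks live instances and notifies watchers."""

    def __init__(self):
        self._instances = {}  # service name -> set of addresses
        self._watchers = {}   # service name -> list of callbacks

    def register(self, service, address):
        self._instances.setdefault(service, set()).add(address)
        self._notify(service)

    def deregister(self, service, address):
        self._instances.get(service, set()).discard(address)
        self._notify(service)

    def watch(self, service, callback):
        self._watchers.setdefault(service, []).append(callback)
        # Send the current view right away so the watcher starts in sync.
        callback(sorted(self._instances.get(service, ())))

    def _notify(self, service):
        snapshot = sorted(self._instances.get(service, ()))
        for callback in self._watchers.get(service, []):
            callback(snapshot)


class ServiceClient:
    """Hides the cluster from the caller: the caller only asks for 'an instance'."""

    def __init__(self, registry, service, rng=None):
        self._addresses = []
        self._rng = rng or random.Random()
        registry.watch(service, self._on_change)

    def _on_change(self, addresses):
        self._addresses = list(addresses)

    def pick(self):
        # Random choice as a simple load-balancing policy.
        if not self._addresses:
            raise LookupError("no live instance")
        return self._rng.choice(self._addresses)
```

In this sketch the caller keeps a local copy of the instance list that the registry pushes to it, so picking an instance does not involve a registry lookup, and a deregistered instance disappears from the caller's view without the caller having to hit it first.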

Part V – Summary

It seems that there was nothing new in the presentation; all the related technologies are well known. But its value lies in giving us a high-level overview of how one of today’s famous web services is implemented, along with many numeric facts about the product.


7/13/2011

Google Plus – the Inside Out

Wired magazine recently published a great story on the origin and development of Google+. The author has a lot of inside information about Google and the Google+ product, so the story contains many useful insights into Google’s people-centric movement. I have noted some of my understanding and comments here:

1. What’s Google+?

Basically, Google+ is Google’s social networking initiative to make this algorithm-centric giant more people-centric. Currently, it consists of the following major components:
- Stream: Items shared by people you care about (those you have circled in the G+ world), very similar to what Twitter/Facebook provide.
- Spark: Items pushed to you by Google according to the interests you specify.
- Hangout: Web-based multi-user video chat service.
- Circle: A multi-dimensional way to organize your online social networks.

But this is only a very basic introduction; even in its current form, Google+ is more than just these. More details follow.

2. Why Google+?

Google+ is a big product (or product umbrella) mainly driven by Vic Gundotra, SVP of Google’s social division and a former general manager at Microsoft in charge of the .NET/Live developer ecosystem.

The major driving forces of Google’s social efforts come from:
- Challenges from other pioneers such as Facebook. Facebook refused to open its content and connection data to Google while it got more and more popular. People at Google worry that Facebook may use that valuable user-contributed data to build an even better people-centric search engine that beats Google.
- Internet paradigm shift. The Internet and the applications on it are becoming more and more people-centric, which was not the case when Google was founded:

“The internet is nothing but software fabric that connects the interactions of human beings, every piece of software is going to be transformed by this primacy of people and this shift.” -Gundotra, SVP of Google social

3. The History of Google’s Social Efforts

In January 2004, Google launched its social networking service, Orkut, developed as a spare-time project by Orkut Büyükkökten while working at Google.

In 2007, Google started an initiative called OpenSocial to establish an open standard for social applications and platforms.

In 2009, a social-networking-based communication tool called Wave was introduced at Google I/O.

In 2010, a Twitter-like product called Buzz was integrated into Gmail.

None of them has been considered a successful product, but Google’s social networking efforts continue.

In March 2010, only a month after the Buzz debacle, Google’s head of operations, Urs Hölzle, sent out an e-mail evoking Bill Gates’s legendary 1995 Internet Tidal Wave missive to Microsofties. Hölzle acknowledged that the fundamental way people use the internet had changed. He started some social-networking-related projects within Google, and his memo became known as the Urs-Quake.

In May 2010, 50 of Google’s top people gathered to discuss the challenges faced by the search giant. Amit Singhal, one of the company’s most respected search engineers, urged that Google dramatically expand its focus to create a hub of personalization and social activity.

The Google leadership team adopted Singhal’s suggestion and code-named the project Emerald Sea. Gundotra made a pitch to lead the Emerald Sea project, and got the nod. Bradley Horowitz became his co-leader and collaborator.

Google VP of product management Bradley Horowitz (L) and Vic Gundotra, senior vice president of social for Google. (from [1])

4. The Birth of Google+

- It got started just after the May meeting, and covered 18 current Google products, with almost 30 teams working in concert.
- It produced a working prototype 100 days after the May meeting (August 2010).
- It became ready for dogfood around October 2010.
- During dogfood, it got its first 50 users by email invitation, 600+ within around an hour, and 90% of Google employees within one day.
- The first round of dogfood feedback was not very positive, due to the lack of a tutorial and the complexity of the features: hard to comprehend and hard to use.
- It was refactored and re-conceptualized according to the feedback: some features were delayed to a future release, and some were separated out as standalone features, such as the +1 button.
- It rolled out a second round of dogfood with selected people within Google in spring 2011 and got positive feedback.
- It started its field test on June 28, 2011, in which external users can try the product on an invite-only basis.

5. Feature Drill-down and Insights

Stream – ordered shared items from your social graph. It’s a pretty typical social networking feature that is also provided by Twitter, Facebook and Weibo. But it is unique in a few ways:
- It has no limit on the length of an item, while Twitter/Weibo limit it to 140 characters
- It has a +1 button, and items can be commented on, with instant updates for online readers
- It can be filtered by author group, which is a very handy feature when you follow a large number of people

Spark – streamed items from Google according to the topics you explicitly specify. It sounds like a normal search result page, but Google has adjusted the filtering and ranking policy to make it more suitable for sharing in the Google+ world. It favors fresh, socially popular and visual items discovered from the web.

Spark is the way Google tries to understand your unique interests and feed you related information. But it may also be a cover that hides the fact that Google is using private information from your Gmail content and search history to learn more about your interests.

Circle/Sharing – offers a simple means of organizing one’s social network so that your sharing is micro-targeted: you organize your social network into various (possibly overlapping) circles and share items with specific circles. It may be the most important and also the most controversial feature in Google+.

Some people say that it helps them control who will see shared items, but others say that it makes the act of sharing very complicated and the whole social network very hard to manage and understand.

In my personal experience, it’s an over-designed feature. I am forced to think about and select the target audience whenever I want to share something online, which breaks the famous UX design rule: DON’T MAKE ME THINK. Also, it’s very hard for a user to understand exactly who will ultimately see the item they are going to share.

How many people on this planet have enough patience to fully understand this logic and exercise it each time they want to share an interesting item?

The idea of circles and multiple social networks is said to come from research by Paul Adams.

Google claims that it created the concept of circles because it behaves exactly the same way as our real social experience. Let’s assume that it does behave exactly like real social activity; is it actually better to behave the same as reality? I don’t think so. We spend more and more time on online social activities because they are different (in a positive way) from the boring real world. I use various online social services because they make it more convenient to keep in touch with real friends, and because they are more open and make it easier to get to know more friends, especially ones I could not meet in real life. If the online society is the same as the real one, what is the attraction of an online social service? I feel Google’s circle concept makes the online social world more enclosed and more complicated to understand and master.

Some other critics have said “SNS should just do what the virtual world should do, and let the other stuff happen in the real world” and “in real life, you do not choose a circle when you want to convey a message; rather, you choose what to say depending on the situation and the circle you are in”.

I do admit that there are some situations where I don’t want my message to be visible to someone in my social network. But that would be better served by a feature for selecting the users you want to hide your message/status from, i.e., you subtract rather than add. Here, the subtraction is easier to understand and involves less thinking.
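
To show the difference between the two models, here is a tiny sketch of how the audience of a shared item could be computed in each case. The function names and the sample data are made up for illustration; this is not how Google+ actually implements it.

```python
def audience_by_circles(circles, selected):
    """Addition model: the audience is the union of the selected circles."""
    audience = set()
    for name in selected:
        audience |= circles[name]
    return audience


def audience_by_exclusion(contacts, hidden_from):
    """Subtraction model: everyone sees the item except the people you exclude."""
    return set(contacts) - set(hidden_from)


circles = {
    "friends": {"alice", "bob"},
    "work": {"bob", "carol"},
    "family": {"dave"},
}
contacts = set().union(*circles.values())

print(sorted(audience_by_circles(circles, ["friends", "work"])))
print(sorted(audience_by_exclusion(contacts, ["carol"])))
```

With circles, the sharer has to keep the membership of every (possibly overlapping) circle in mind to know the final audience; with exclusion, the only thing to remember is the short list of people left out.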

But circles are a good idea for filtering the stream, especially when you follow many people who post updates at different rates.

6. Misc

- “There are only a few emotions that can effect change at a large organization,” he (Gundotra) explains. “One is greed and another powerful one is fear.” Outright greed is gauche in the Googleplex, so Gundotra prepared a slide deck that mocked up challenges from Google’s competitors (notably, Facebook), illustrating how each company could turn Google upside down.

- Emerald Sea has been the rare initiative in Google where the company was not breaking ground but defensively responding to a competitor’s success. (One engineer has described this process as “chasing taillights,” noting that me-too-ism has never been a strength for Google.) It’s also, claims Gundotra, the most extensive companywide initiative in Google’s history.

- “We put the product to [dog food] before it was fully baked, before we hardened the system and polished it and knew what we were doing,” says Horowitz. “We had no getting-started screen, no intro video. It was hard for people to get their hands around what it is and how to begin interacting with it. It was as if Facebook had been in stealth mode for seven years and then launched in its entirety at once today; it would have been an overwhelming, hard-to-comprehend, hard-to-understand system. The feedback we got was: Simplify.”

- No one expects an instant success. But even if this week’s launch evokes snark or yawns, Google will keep at it. Google+ is not a product like Buzz or Wave where the company’s leaders can chalk off a failure to laudable ambition and then move on. “We’re in this for the long run,” says Ben-Yair. “This isn’t like an experiment. We’re betting on this, so if obstacles arise, we’ll adapt.”

- Because of the pressure, the stakes and the scale, Gundotra insisted that Emerald Sea should be an exception to Google’s usual consensus-based management style.

- “This is a top-down mandate where a clear vision is set out, and then the mode of moving forward is that you answer to Vic,” Rick Klau told me last year. “If Vic says ‘That looks good,’ then it looks good.”

[Reference]