Morningstar’s Stock Tutorial

Personal Investment Choices
- Stock: 股票
- Bond: 债券
- Mutual Found: 共同基金
- Real Estate:不动产
- Bank Saving: 银行储蓄

Understanding Company, Stock/Shareholder and Bond/Creditor
- The main purpose of a Company is to take money from investors (creditors and shareholders) and generate profits on their investments.
- Creditors provide a company with debt capital (in terms of Bond), and Shareholders provide a company with equity capital (in terms of Share). Stock is an ownership interest in a company, while Bond, at their most basic, are loans. When you buy a bond, you become a lender to an institution, and that institution pays you interest. As long as the institution does not go bankrupt, it will also pay back the principal on the bond, but no more than the principal.
- Creditors are typically banks, bondholders, and suppliers. They lend money to companies in exchange for a fixed return on their debt capital, usually in the form of interest payments. Companies also agree to pay back the principal on their loans.
- Shareholders that supply companies with equity capital are typically banks, mutual or hedge funds, and private investors. They give money to a company in exchange for an ownership interest in that business. Unlike creditors, shareholders do not get a fixed return on their investment because they are part owners of the company. When a company sells shares to the public (in other words, “goes public” to be “publicly traded”), it is actually selling an ownership stake in itself.
The Great Compound Interest(复利)
- Compound Interest means making returned interest as investment and it can increase your money in a surprising rapid way. A simple way to know the time it takes for money to double is to use the rule of 72. For example, if you wanted to know how many years it would take for an investment earning 12% to double, simply divide 72 by 12, and the   answer would be approximately six years. The reverse is also true. If you wanted to know what interest rate you would have to earn to double your money in five years, then divide 72 by five, and the answer is about 15%.

Part II – Stock Market and Qualitative Corporate Analysis
Stock Index
-  A stock index is simply the price of a grouping or a composite of a number of different stocks, often with similar characteristics.
- Three of the most widely followed indexes are the Dow Jones Industrial Average, the S&P 500, and the Nasdaq Composite.
- The Dow Jones Industrial Average: it is composed of 30 large stocks from a wide spectrum of industries that are selected by the editors of The Wall Street Journal. It’s basically the average of the price of the 30 stocks, but had been adjusted a lot due to stock split like events.
- The S&P 500: it is a market capitalization weighted average stock index. The company list in maintained by the Standard & Poor’s company, a division of McGraw-Hill.
- The Nasdaq Composite: it is also a market-cap-weighted index, but it includes all companies listed in Nasdaq.

How to do Qualitative Analysis on Business? Ask and try to Answer questions:
- What is the goal of the business?
- How does the business make money?
- How well is the business actually doing?
- How well is the business positioned relative to its competitors?

Analyze Competitive Positioning of Business
- Find a business’s economic moat, which is a long-term competitive advantage that allows a company to earn oversized profits over time.
- Economic Moat Types:
Low Cost (due to scale or core technology)
High Switching Cost (user sticky)
Network Effect (ecosystem, scale/size matters, Matthew effect, winner takes over)
Intangible Assets (government approvals, brand names etc.)

How to Build Economic Moat?
- Creating real or perceived product differentiation
- Driving costs down and being a low-cost leader
- Locking in customers by creating high switching costs
- Locking out competitors by creating high barriers to entry or high barriers to success

Understand Strategic Positioning using Porter’s Five Forces
- Barriers to Entry. How easy is it for new firms to start competing in a market? Higher barriers are better.
- Buyer (Customer) Power. Similar to switching costs, what keeps customers locked in or causes them to jump ship if prices were to increase? Lower power is better.
- Supplier Power. How well can a company control the costs of its goods and services? Lower power is better.
- Threat of Substitutes. A company may be the best widget maker, but what if widgets will soon become obsolete? Also, are there cheaper or better alternatives?
- Degree of Rivalry. Including the four factors above, just how competitive is a company’s industry?

 Are companies beating one another bloody over every last dollar? How often are moats trying to be breached and profits being stolen away?

Porter’s five forces considered together can help you to determine whether a firm has an economic moat. The framework is particularly useful for examining a firm’s external competitive environment.

Part III – Accounting and Quantitative Corporate Analysis

一、财务恒等式:Assets – Liabilities = Equity  
二、Income Statement 与 Statement of Cash Flow的区别
- 根源在于Accrual Accounting(权责发生制)的会计原则,它要求companies to record revenue and expense when corresponding transactions occur, not when cash is exchanged
Operating Activities – 运营活动
Investing  Activities – 投资活动
Financing Activities – 融资活动
Capital Expenditure (a.k.a. CapEx) – 资本支出
Monetary Investment – 货币投资
Dilute – 稀释,用作除数的分母变大
Depreciation – 折旧摊销 (有形资产)
Amortization – 费用摊销 (无形资产)
Retained Interest – 未分配利润
Treasury Stock – 留存股票
- Efficiency
Inventory Turnover
Accounts Receivable Turnover
Accounts Payable Turnover
Asset Turnover
- Liquidity
Current Ration
Cash Ratio
- Leverage
Interest Coverage
- Profitability
Gross Margin
Operating Margin
Net Margin
Return on Assets
Return on Equity
- ROA (Return on Assets) = Net Income / Average Assets
= (Net Income / Revenue) * (Revenue / Average Assets)
= Net Margin * Assets Turnover
- ROE (Return on Equity) = Net Income / Average Equity
= (Net Income / Revenue) * (Revenue / Average Assets) * (Average Assets / Average Equity)
= Net Margin * Assets Turnover *  Assets-Equity Ratio
- ROIC (Return on Invested Capital) = Operating Profit After Tax / Invested Capital
= (Operating Profit * (1 – Tax Rate)) / (Assets – Excess Cash – Non-Interest-Bearing Current Liabilities)

Part IV Stock Investment Analysis and Strategies

Great company is different with great investment, your goal as an investor should be to find wonderful businesses, and invest in them at reasonable prices.

Company Valuation – determine the value of a company.
Measuring Business Value:
Market Capitalization = Outstanding Share Count * Share Price
Enterprise Value = Market Cap + Debt – Cash

There are actually two parts to the value of any business:
- The first part is the current value of all the business’s assets and liabilities, including buildings, employees, inventories, and so forth.
- The second part is the value of the profits the business is expected to make in the future.

There are two broad approaches to stock valuation. One is the ratio-based approach and the other is the intrinsic value approach:
- Valuation ratios compare the company’s market value with some financial aspect of its performance-earnings, sales, book value, cash flow, and so on.
- The ratio-based approach is the most commonly used method for valuing stocks, because ratios are easy to calculate and readily available. The downside is that making sense of valuation ratios requires quite a bit of context.
- Popular ratio-based measures:
Price / Sale
Price / Earning
Price / Book

Cash Return = (Free Cash Flow + Net Interest Expense) / (Enterprise Value)
- The other major approach to valuation tries to estimate what a stock should intrinsically be worth.
- A stock’s intrinsic value is based on projecting the company’s future cash flows along with other factors. You can compare this intrinsic or fair value with a stock’s market price to determine whether the stock looks underpriced, fairly valued, or overpriced.
- However, the main disadvantage is that estimating future cash flows and coming up with a fair value estimate requires a lot of time and effort.

Estimating a stock’s fair value, or intrinsic value using DCF model:
- The main idea behind a DCF model is relatively simple: A stock’s worth is equal to the present value of all its estimated future cash flows.
- Free cash flow represents the cash a company has left over after spending the money necessary to keep the company growing at its current rate.
- Many variables go into estimating those cash flows, but among the most important are the company’s future sales growth and profit margins.
- What cash flow to predicate and discount to present value? dividend payments -> free cash flow, because there are many firms that pay no dividends.
- How to do the discounting?
Present Value of Cash Flow in Year N =  CF at Year N / (1+ R)^N
CF = Cash Flow
R = Required Return (Discount Rate)
N = Number of Years in the Future
- The rate you would use to discount cash flows if using the “cash flow to the firm” method is actually a company’s weighted average cost of capital, or WACC. A company’s WACC accounts for both the firm’s cost of equity and its cost of debt, weighted according to the proportions of equity and debt in the company’s capital structure. Here’s the basic formula for WACC: (Weight of Debt) * (Cost of Debt)  +  (Weight of Equity)*(Cost of Equity)
- Computing the intrinsic value of a company: sum of all discounted (to present) future free cash flows

The math tricks behind DCF
- When counting the sum, we assume the company will generate cash flow all the time, but the growth number varies from near future to far future
- We usually assign specific growth ratios to 5-10 near future but assume equal (relatively) small ratio to long term growth
- Perpetuity Value: estimating the value of all cash flows after some specific year in one lump. It’s in fact the sum of geometric sequence with common ratio: (1+g)/(1+R)

Perpetuity Value = ( CFn * (1 +  g) ) / (R – g)
CFn = Cash Flow in the Last Individual Year Estimated
g = Long-Term Growth Rate
R = Discount Rate, or Cost of Capital
- Perpetuity value should also be discounted by to compute the present intrinsic value of a company

The main problem of DCF model to compute the intrinsic value is that, you have to determine many variable factors such as discount rate, growth rate for near future and long term. The estimating of these parameters are the real challenges for this model. To ensure your assumptions about these parameters make sense, you have to get familiar with those industries.

Stock Investing Strategy – Fat Pitch Strategy: Don’t Rush, Be Patient Till Enough Confident
- Locating Wide Moat Company
- Always Have a Margin of Safety
- Don’t Be Afraid to Hold Cash
- Don’t Be Afraid to Hold Relatively Few Stocks
- Don’t Trade Very Often

Investing Psychology: Mental Stuff that Leads to Investing Mistakes
- overconfidence
- selective memory
- self-handicapping
- loss aversion
- sunk cost
- anchoring: when estimating the unknown, we cleave to what we know
- confirmation bias
- mental accounting
- framing effect
- herding (羊群效应)

Part V Misc Tips and Great Investors
Portfolio Management
- diversification: don’t put your eggs in one basket
- if you own about 12 to 18 stocks, you have obtained more than 90% of the benefits of diversification, assuming you own an equally weighted
- don’t weight each stock equally in your portfolio if you want to outperform market index
- consider including mutual fund to cover area that you are not familiar with

- option is the right (but you can choose to exercise it or not at will when it expires) to sell (put option) or buy (call option) some thing (it’s stock for stock option) at a specific price (stated in the option contract)
- option makes shorting possible

Investing Tips
- Keep It Simple
- Have the Proper Expectations
- Be Prepared to Hold for a Long Time
- Tune Out the Noise
- Behave Like an Owner
- Buy Low, Sell High
- Watch Where You Anchor
- Remember that Economics Usually Trumps Management Competence
- Be Careful of Snakes
- Bear in Mind that Past Trends Often Continue
- Prepare for the Situation to Proceed Faster than You Think
- Expect Surprises to Repeat
- Don’t Be Stubborn (Stubborn VS Patient)
- Listen to Your Gut
- Know Your Friends, and Your Enemies
- Recognize the Signs of a Top
- Look for Quality
- Don’t Buy Without Value
- Always Have a Margin of Safety
- Think Independently

One Sentence Summary
- Invest in long term basing on quantitative and qualitative analysis, don’t speculate if you don’t want to rely on luck.


Notes – Massive-scale online collaboration

There is a popular presentation on ted Titled as massive scale online collaboration given by Luis von Ahn.

Luis is a well known computer scientist who focuses on so called human computation technologies. He is famous for his previous projects CAPTCHA and reCAPTCHA. In fact, the word CAPTCHA is coined by him for “Completely Automated Public Turing test to tell Computers and Humans Apart”  in the paper: CAPTCHA: Using Hard AI Problems for Security .

CAPTCHA is publicly well known since we should already encountered them many times in our daily web life. But reCAPTCHA is not so well known but in fact we should also had faced it many times and this technology is solving some hard AI problems every day.

The motivation behind reCAPTCHA is that, there is about 200M CAPTCHA inputs per day and each input spends a people 10 seconds around. This is really a huge time and intelligence waste, so Luis want to leverage such kind of resource to accomplish some useful work – solving AI problems that can be divided into 10 seconds small chunks.

Fortunately, there do be one such problem – book digitizing: scan real books and turn scanned pictures into text. There are already many OCR (optical character recognition) technologies to do this automatically. But they are not good enough, it’s said that for books older than 50 years ago, OCR can’t handle +30% of them. So we can divide those OCR task into small pieces (usually, one word per piece) and let people solve them while they are doing CAPTCHA on Internet, which is called reCAPTCHA.

How it works? Each time a CAPTCHA is requested, the system send two pictures to people. One picture contains word that the system already knows but the other are not, and the unknown one comes from books that need to be OCRed. When receiving feedback from human, the system check whether the first picture/text matches, if yes, it has some confidence that the second picture/text also matches. To handle those cases that the second pair failed to match, the system will send the same picture multiple times and use the most popular answer as the final result.

This works very well and the reCAPTCHA is acquired by Google in 2009. But Luis didn’t stop there, now he is introducing another great ideas called Duolingo. The problem Duolingo wants to solve is translate the web into different languages and the challenges for this are:
- Lack of bilinguals
- Lack of motivation

The way to solve this problem is:learning by doing for language learners.

When doing language translation exercises, the learners are given real world sentences that come from the web translation problem. This solution is pretty good because it can solve the web translation problem because there are so many language learner in this world, and it also has positive feedback look to solve the motivation challenge:
- Learn with real content, thus learners has good exercise to improve their skills
- Fair business model for language education, thus learners can learn for free since he had contributed some valuable stuff while learning

Luis called his problem as duolingo and I think this project is very promising and super attractive.


VLDB 2011 Trip Report

I attended VLDB 2011 during 29/08 ~ 01/09 in Seattle. Here are some brief observations and reports for this conference. Due to broad area coverage of VLDB, I just focus on SystemSearch and Transaction related materials.
(Disclaimer: due to long paper list and lacking of strong DB background, it may contain misunderstandings and personal biases, feel free to follow up and comment.)

=Basic Info=

VLDB is one of the top conferences in DB community (others are SIGMOD and ICDE) which focus on managing data and system for data management. For VLDB 2011, there are:
- 30 research sessions with 104 papers presented and 4 industrial sessions with 12 papers presented in 5 parallel tracks
- 8 out of 104 research papers are contributed by Microsoft people
- 31 out of 104 research papers are first authored by Chinese people (Domestic + Oversea)
o Mainland: 4
o Hong Kong: 4
o Singapore: 6
o Oversea: 17
Industrial Participation:
- Microsoft hosted the reception dinner on 30/08
- Google/Facebook/EMC had their recruiting/advertising booths at the conference site
Best Paper:
RemusDB: Transparent High Availability for Database Systems

=Topics and Trends=

Hot Topics:
- Graph and Social Data Management: 5 sessions (17 papers), 2 tutorials
- Big Data Analyzing and Infrastructure: 3+ sessions (12+ papers), 1 tutorial
- Streaming and Realtime data analyzing
Traditional system topics for DBMS:
Query Processing session covers:
o Partition the storage and querying processing of native XML database
o Use a new GroupJoin operator to speed up GroupBy and Join query
o Optimize Similarity Join using sensitive hashing
Transaction Processing session covers:
o Scale OLTP system on shared-everything architecture using logical + physical partitioning
o Recovery algorithm implementation and optimization in DBMS where data management and transactional functionalities are separated
o New transaction semantic and isolation level definition for cooperated traditional transactions
o Hyder’s optimistic concurrency control algorithm in fast network and storage settings
It’s amazing that we found there were several (distributed) systems related sessions, where several papers are highly related to some on-going projects in our group:
New Hardware Architecture, it covers
o Main memory based, column/row hybrid storage engine driven by application trace
o Compiling query statement directly into binary native code rather than iterator based execution plan
o Parallel B+ tree algorithm for many core processor
Cloud Computing and High Availability, it covers
o Database storage live migration
o High available Database based on reliable Virtual Machine
Distributed System (2 sessions), it covers
o Selectively partial replicating large-scale web databases
o Paxos based high available datastore
o DBMS like indexing on overlay network
MapReduce and Hadoop, it covers
o Adding data co-location optimization in Hadoop for column-oriented storage application
o Automatic optimize Hadoop program using code analyzing
- GPU-based Architectures and Column-Store Indexing, it covers
o Sparse matrix-vector multiplication by leveraging GPU
o Transaction execution on GPU
o List intersection and index compressing using GPU
So 13 out of 104 research papers in VLDB are in system style. It’s also amazing that all system related sessions are crowded with audiences and there are active Q/A after the presentation. While in other sessions that I happened to attend, there are relatively small numbers of attendees and the session is also pretty quiet. System related publication institutes cover CMU, IBM Research, Intel Research and Yahoo!.

=Notable Papers=

Here I only focus on system, search and transaction related papers.
- RemusDB: Transparent High Availability for Database Systems
Umar Farooq Minhas, Shriram Rajagopalan, Brendan Cully, Ashraf Aboulnaga, Kenneth Salem, Andrew Warfield
This work was rewarded as best paper in VLDB2011 and came from Waterloo University.
The paper proposed the idea of making DBMS high available by leveraging VM HA technology called Remus and doing some DBMS specific performance optimizations for it. The paper first explained why Remus can be used to do DBMS HA without breaking ACID properties and then discussed 4 (3 memory related, 1 network related) DBMS specific optimizations.
To reduce the size of checkpoint synced from active to standby node, they put page content diff, not the whole original page content to checkpoint since most modifications between consecutive checkpoint touch only part of a page.
To avoid checkpointing pages that can be read from disk, they also track disk read operations and put some metadata into checkpoint data and standby server can use these small size metadata to reconstruct those memory pages.
They also implemented an interface to let application developer mark pages not replicated explicitly but didn’t use it in this paper since it has negative performance impact for DBMS software.
The previous optimizations seem not very DBMS specific and are pretty general. Other applications can also benefit from it, so I think it should be optimization work for Remus.
Yet another optimization is DBMS specific: it leverage transaction semantic to avoid Remus’s TCP packet level protecting. In this optimization, the underlying VM only need to protect Transaction related message such as acknowledge to ABORT and COMMIT message from client. This will reduce the latency a lot for irrelevant messages, such as those that comprising the transaction itself.
The ideas seem simple and easy to understand, the results seem very good and the work is done in real world code base: XEN, MySQL and PostgreSQL. These are probably the reasons why it is voted as best paper, although the innovation and technical challenge are not that big in system guy’s eyes.
There are some obvious drawbacks for this work:
o First, it only works with VM, which has some overhead especially for DBMS like applications since it is very I/O sensitive. The paper didn’t mention the overhead of running DBMS inside VM
o It only compare performance with raw Remus, not with other HA technologies, such as MySQL HA cluster. Building HA DBMS using VM may not be the correct way compared with other alternatives.
o Remus’s checkpoint technology don’t have knowledge about the transaction running inside it, so the standby server is consistent with active server in system level, but not transaction level. I.E., the latest state of standby server may not be consistent in terms of ACID, so it can’t be used to serve reading requests under specific isolation level.
- PLP: Page Latch-free Shared everything OLTP
Ippokratis Pandis, Pınar Tozun, Ryan Johnson, Anastasia Ailamaki
This paper aims to improve the salability OLTP system on many core system by combining existing logical (shared everything) and physical (shared nothing) partitioning technologies. The idea seems pretty elegant and the work seems to be very solid in both system’s perspective and DBMS’s perspective.
The meanings of “shared everything” and “shared nothing” in this paper are not the same as in distributed/parallel DBMS settings. They are the technologies used to eliminating the contention bottleneck of OLTP system on many-core platform: the former term refers to the technology of assigning different range of the same shared table to different thread to avoid high level locking among OLTP threads and the later one refers to the technology of partitioning the underlying data pages of one table and assigning each partition to different database instance.
PLP combines these two technologies by a new design called Multi-Rooted B+ Tree:
o Each logical partition has its own sub B+ tree as index, which is similar to shared nothing design
o The underlying data pages are shared among all logical partitions, which is similar to shared everything design
o Transaction manager will divide each transaction into a DAG of tasks, each task is within partition boundary and assigned to dedicated thread for that partition
Thus, this new design remains the benefit of contention free among transaction threads, low cost of repartitioning/rebalancing and eliminated the need for distributed transaction for cross-partition transactions.
But the work is based on a research prototype called Shore-MT which is built by WISC/EPFL, if it’s on top of popular open source DBMS such as MySQL or PostgreSQL, the work will be more convincing and making bigger real world impact.
- Using Paxos to Build a Scalable, Consistent, and Highly Available Datastore
Jun Rao (LinkedIn), Eugene Shekita (IBM Research – Almaden), Sandeep Tata (IBM Research – Almaden)
There was a paper talks about using Paxos to build reliable data store in NSDI 2011 and here comes the similar story for DB. But it’s not a transactional storage, just a key/value style structured storage. And also, the system architecture and the protocol are very similar to that of PacificA.
The system, which is called Spinnaker, is a replicated range partitioned reliable structured storage that providing put/get like operations. The master is based on Apache Zookeeper.
The replication protocol is essentially a combination of two phase commit, majority consensus and group commit. It differs with PacificA on that it only requires majority members’ ack before doing the real commit at partition leader node. Follower recovery is simple and straight forward – learning to catch up with leader state. As for leader recovery, it uses Zookeeper to choose the follower that has the highest prepare number as new leader.
And another trivial difference with PacificA is that it allows reading at follower nodes by providing so called time line consistency semantic.
Replication and consistency is always a hot topic in DB conferences, both VLDB and SIGMOD has dedicated session.
- Column Oriented Storage Techniques for MapReduce
Avrilia Floratou (University of Wisconsin-Madison), Jignesh Patel (University of Wisconsin-Madison), Eugene Shekita (IBM Research – Almaden), Sandeep Tata (IBM Research – Almaden)
This paper presented several techniques to build column oriented structured storage and analyze engine on top of Hapood.
One is for storage enhancement:
o One column is stored as one file in HDFS
o Multiple related columns’ files are collated together by adding a new data placement policy for HDFS
Second technology is lazy record deserialization. The author argued that modern analytical applications are processing more and more complex data types, deserializing them from byte stream is pretty expensive. But most applications only need to process part of the whole records. So they proposed an idea:
o Deserialize only small part of a complex object to determine whether the object is needed to be processed
o Fully deserialize objects that need to be processed
Another optimization is using some specific technology to compress record, for example dictionary based schema to compress text string.
These works seems very trivial and incremental improvement for Hadoop system.
- CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop
Mohamed Eltabakh (IBM Research – Almaden), Yuanyuan Tian (IBM Research – Almaden), Fatma Ozcan (IBM Research – Almaden), Rainer Gemulla (Max-Planck-Institut für Informatik), Aljoscha Krettek (IBM Germany), John McPherson (IBM Research – Almaden)
This paper adds a new data placement policy for HDFS in Hadoop and uses it to speed up Join and Sessionized query like log processing tasks.
They observed that many log processing jobs need to process data partitions from different HDFS files, so placing correlated data partitions from different files will speed up the processing since it eliminate many data shuffle and remote I/O.
And also, although it starts from different angle, this paper convers only part of the work in previous paper.
- Automatic Optimization for MapReduce Programs 
Eaman Jahani (University of Michigan), Michael Cafarella (University of Michigan), Christopher Ré (University of Wisconsin-Madison)
This paper proposed the idea of using static code analyzing to improve the performance of unmodified Hadoop jobs.
But this work only uses analyzing result to do storage related optimization:
o Using index to pre-prune useless record for mapping function by selection analyzing
o Pruning useless field for map/reduce by projection analyzing
o Application level compression optimization
They didn’t do any query plan wide optimization using code analyzing result, there seems to be a lot of promising future works here.
- Where in the World is My Data?
Sudarshan Kadambi (Bloomberg), Jianjun Chen (Yahoo!), Brian Cooper (Google), David Lomax (Yahoo!), Raghu Ramakrishnan (Yahoo!), Adam Silberstein (Yahoo!), Erwin Tam (Yahoo!), Hector Garcia-Molina (Stanford University)
This paper proposed an idea to replicate structured data table at record level rather than traditionally table/partition level. The basic reasoning behind this idea is that: most popular website contains global data, but access at different geographical sites have different focus of the global data. They call this idea as selective replication.
In their design, record replicas are divided into 3 types:
- Master, where write/update operation can be executed, asynchronously notify full replicas about the update/write
- Full, where read operation can be executed
- Stub, where only primary key is stored and R/W operations are forwarded to proper other replica
Given this setting, the paper focus on the placement of the 3 types of replicas and optimize it for bandwidth (forwarding/replicating) savings. They introduced static/dynamic placement policy and defined a language to specify replica placement constraints (for example, total replica, forced full replica sites etc)
The major difference between static and dynamic placement policies are that dynamic method can leverage historical access pattern data to adjust the placement and make better bandwidth cost. To reduce the bookkeeping cost of store/analyze historical access data, the dynamic policy is simplified as: promoting to a full replica when we see a read at a stub replica; demoting to a stub replica when full replica is notified to update but not read for a period of time.
Their experiment shows that in a 10% remote friend setting, bandwidth used can be improved by 2x.
The value of stub is that it can reduce one message round trip and avoid master hot spot in case that the placement is not optimal.
- Cross row transaction is not supported
- Require a primary key for each record
- Fast Set Intersection in Memory
Bolin Ding University of Illinois at UrbanaChampaign, Arnd Christian K¨onig Microsoft Research
This paper described a fast intersection algorithm for in memory set. The basic idea is: use machine word to encode set elements and use bitwise-AND to accomplish intersection. The author reported about 3x performance gain compared with inverted index based set intersection.
The main problems of this work are:
o it requires complicated and costly preprocessing and dynamically updating the set is also not easy
o it’s only applicable for in-memory big/small set intersection and the result scale should be small
- Efficiently Compiling Efficient Query Plan for Modern Hardware
Thomas Neumann (Technische Universität München)
This paper described a new DB query processing architecture that compiles query statement into machine code directly using LLVM. Current DB query processing is based on iterator model and the advantage of this model is the flexibility and pipelining. But the disadvantages are: 1, it will call next() for each record for each iterator, which results lots of function calls; 2, usually, the next() function calls are virtual functions, this makes the function call cost more expensive; 3, poor code locality for one query execution. So the author tried compiling query plan directly into machine code and the previous 3 drawbacks are eliminated.
But the problems of this approach are that the compiling cost, whether is LLVM code optimizer is good enough for DB query and how to adopt existing query optimization technologies in this method.
- Efficient Parallel Lists Intersection and Index Compression Algorithms using Graphics Processing Units
Naiyong Ao, Fan Zhang, Di Wu, Douglas S. Stones, Gang Wang, Xiaoguang Liu, Jing Liu, Sheng Lin
This work comes from Baidu-Nankai joint lab and it aims speeding up index encode/decode and serving by leveraging GPU. I am not familiar with GPU programming, so skip the content here.
Industrial Sessions:
Inspector Gadget: A Framework for Custom Monitoring and Debugging of Distributed Dataflows
o Yahoo! Reported a tool used to monitor and debug query processing dataflow.
Jaql: A Scripting Language for Large Scale Semistructured Data Analysis
o IBM reported a script language over hadoop called Jaql to do large scale semi-structured data analyze which is used in its InfoShpere product.
Tenzing – A SQL Implementation on the MapReduce Framework
o Google reported their HIVE copycat called Tenzing, which extended standard SQL with support for advance analysis. The paper also contains some MapReduce enhancement.
o This is probably the hottest paper in VLDB2011 and many famous DBMS gurus are crowded in the meeting room, probably due to the hot debate on Map/Reduce VS Parallel DBMS several years ago.
o Google adopted many technologies learned from Parallel DBMS and Dryad to improve Map/Reduce in order to build a low latency SQL compatible query analyzing engine, which “partially answered the previous debate” (Google presenter’s words).
- Citrusleaf: A Real-Time NoSQL DB which Preserves ACID
o A company called Citrusleaf reported their real-time NoSQL DB that supports ACID, which is called Citrusleaf and is also widely used in some of the world’s largest real-time bidding networks.


Baidu Tieba Architecture

An architect of Baidu, who is being in charge  of the technology of Tieba product, gave a brief introduction on the back end technologies of this famous web application in the June activity of Baidu Salon.
Here are some notes on this speech:

Part I – Application Scale

1. Not just simple plain forum, but also photo/video/gaming
2. Includes front end, storage, anti-spamming, searching and mining
3. Numeric facts
- Bs of topics
- 10Bs of posts
- 10Ms of posts for single hot topic
- Ps of video data
- 100K+ QPS from client web browser
- 10K+ per second update message forwarding (I doubt this number)
- 100s service

Part II – Backend Technology: lightweight framework

For 80% common situations
- prefer InnoDB than MyIASM with some modification (on disk writing pattern, with 10x perf gain)
- application optimization:
* avoid joining (by break normalization?)
* auxiliary index
* data locality
- single node numbers
* Ks of QPS
* 100Gs of Data
- mySql clustering
* master/slave for write/read separation
* home brewed request dispatcher (for easier programming and load balancing)
2 Cache
- hit ratio around 80%
- 10k ~ 100k QPS
- multiple granularity (page, picture, data item etc.)
- challenge: cache updating, writing request pressure
3 Flash Disk
- 5x – 10x perf gain without extra effort
- huge improve on random access
- size limitation: 500G (SSD) vs 10T (HDD)

Part III – Backend Technology: Heavyweight Infrastructure

For 20% rare scenarios
1. Partitioning
- Virtual, partitioned by application
** topic and post are separated
** relationship(list) and content are separated
- Horizontal, partitioned by key
2. Message Queue
- Reliable multicast communication system
- Handling mutation requests (I guess)
- Peak tps:100K+ (really?)
- It can only solve updating reliability problem, but seems that the speaker claims it also solves the scalability problem
3. In house storage node
- speedup by transforming random write to batch/append write
- memory patch: (background merge [mem + disk] in my understanding)
- write ahead logging for reliability
- highly optimized for application
4. In house distributed KV store
- for video storage
- replication (driven by MQ) for reliability
- append only
- Peta bytes scale

Part IV Backend Technology – Clustering Management

1. Most website are basically SOA architecture
- 100+ standalone small services
- service orchestration for single user request
2. Challenges in this architecture
- service/data upgrading
- failure handling
- performance variation
3. Service Management
- service metadata center management
- service registration and notification
- hide service cluster from application caller
- auto failure handling and load balancing
- (why service notification but not just try and ask registry if failed?)

Part V – Summary

It seems that there is nothing new in the presentation, all related technologies are well known. But its value lies in the fact that it gave us a high level overview of how today’s various famous web service is implemented and many numeric facts about this product.


0. Baidu Salon site
1. Speaker introduction and video
2. Speech ppt


Google Plus – the Inside Out

The Wired magazine recently published a great story on the origination and development of Google+. The author has many inside information about Google and the Google+ product, so the story contains many useful and insightful information about Google’s people-centric movement. I noted some of my understanding and comments here:

1. What’s Google+ ?

Basically, Google+ is Google’s social networking initiative to turn this algorithm-centric giant to be more people-centric. Currently, it consists of the following major components:
- Stream: Stuffs that are shared by people you care about (circled by you in G+ world), very similar to what twitter/facebook provide.
- Spark: Stuffs that pushed to you by Google according to your specified interesting.
- Hangout: Web based multi-user video chat service.
- Circle: A multi-dimensional way to organize your online social networks.

But this is just the very basic introduction, Google+ is more than just those even in today’s service. More detail follows.
2. Why Google+ ?

The Google+ is big product (or product umbrella) mainly driven by Vic Gundotra, SVP of Google Social Division, a former general manager at Microsoft in charging .Net/Live developer ecosystem.

The major driving forces of Google’s social efforts come from:
- Challenges from other pioneers such as Facebook. Facebook refused to open its content and connection data to Google while it gets more and more popular. People in Google worry that Facebook may use those valuable user contributed data to build a even better people-centric search engine that beats Google.
- Internet paradigm shift. The Internet and application in it become more and more people centric, which is not the same as when Google’s founded:

“The internet is nothing but software fabric that connects the interactions of human beings, every piece of software is going to transformed by this primacy of people and this shift.” -Gundotra, SVP of Google social
3. The History of Google’s Social Efforts

January, 2004, Google launched it’s social networking service – Orkut, developed as spare time project by Orkut Büyükkökten while working at Google.

2007, Google start a initiative called Open Social to establish a open standard for social applications and platforms.

2009, a social networking based communication tool called Wave was introduced during Google I/O.

2009, a twitter like product called Buzz is integrated into Gmail.

Non of them had been considered as a successful product, but Google’s social networking efforts continues.

March 2010, only a month after the Buzz debacle, Google’s head of operations, Urs Hölzle, sent out an e-mail evoking Bill Gates’s legendary 1995 Internet Tidal Wave missive to Microsofties. Hölzle acknowledged that fundamental way people use the internet has changed. He did started some social networking related projects within Google and his memo became known as the Urs-Quake.

May 2010, 50 of Google’s top people gathered together to discuss the challenges faced by the search giant. Amit Singhal, one of the company’s most respected search engineers, urged that Google dramatically expand its focus to create a hub of personalization and social activity.

The Google leadership team adopted Singhal’s suggestion and code named the projects as: Emerald Sea. Gundotra made a pitch to lead the Emerald Sea project, and got the nod. Bradley Horowitz became his co-leader and collaborator.

Google VP of product management Bradley Horowitz (L) and Vic Gundotra, Senior vice president of social for Google. (from [1])
4. The Birth of Google+

- It got started just after the May meeting, and covered 18 current Google products, with almost 30 teams working in concert.
- It produced a working prototype 100 days after the May meeting (August 2010).
- It became ready for dogfood around October 2010.
- It got its first 50 users by email invitation, 600+ in around one hour, 90% of Google employe within one day during dogfood.
- The first round of dogfood feedback is not very positive due to lacking of tutorial and feature complication – hard to comprehend and hard to use .
- It is refactored and re-conceptualized according to feedback: some features are delayed to future release, some are separated out as other standalone features, such as the +1 button.
- It rolled out the second round dogfood with selected people within Google in Spring, 2011 and got positive feedback.
- It started its field test @ June 28, 2011, where external users can experience this product in invite-only way.
5. Feature Drill down and Insights

Stream – ordered shared items from you social graph. It’s a pretty typical social networking feature that is provided by twitter, facebook and weibo. But it has its uniqueness:
- It has no limitation on the word count of item, while Twitter/Weibo limits it to 140
- It has +1 button and can be commented with instant update to online readers
- It can be filtered by author groups, which is a very handy feature when you follow large amount of people

Spark – streamed items from Google according to the topics you explicitly specified. Sounds like a normal search query result page but Google had adjusted the filtering and ranking policy to make it more suitable for sharing in Google+ world. It favors more on fresh, social popular and visual items discovered from the web.

spark is the way Google try to understand your unique interests and feed you with related information. But it may also be the cover that hide the facts that Google is using the privacy related information from Gmail content and your search history to know more about your interests.

Circle/Sharing – offers a simple means of organizing one’s social network so that your sharing is micro-targeted: you organize your social network into various (maybe overlapped) circles and share items to specific circles. It may be the most important and also most controversial feature in Google+.

Some people said that it help them control who will see shared items but others said that it makes sharing action very complicated and the whole social network become very hard to manage and understand.

In my personal experience, it’s a over designed feature. I am forced to think/select what’s the target audience when I want to share something online, which break the famous UX design rule: DON’T MAKE ME THINK. And also, it’s very hard for a user to understand thoroughly exactly who will ultimately see the item I am going to share.

How many people on this planet has enough patient to fully understand this logic and exercise it each time when he want to share an interesting item?

The idea of circle and multiple social network is said to be the result of the following research result:

View more documents from Paul Adams

Google claims that it create the idea and concept of circle because it behaves exactly the same way as our real social experience. Let’s assume that it does behave exactly as real social activity, but will it better to behave the same as reality? I don’t think so. We spend more and more time on online social activities because it’s different (in positive way) from the boring real society. For me, I use various online social service because it’s more convenient for me to keep in touch with real friends and it’s more open and easier for me to get know more friends, especially those that aren’t available in real life. If the online society is the same as the real one, what’s the attractiveness of the online social service? I feel Google’s circle concept is making the online social more enclosed, more complicated to understand and master.

There are some other critics said “SNS just do what virtual world should do, let some other stuff happen in real world” and “in real life, the circle is not chosen when you want to convey some message, rather, you choose what to say when you are in different situation and different circle”.

I do admit that there are some situations that I didn’t want my message to be visible to some one in my social network. But it’s better to be fulfilled by a feature: selecting what’s the target user you want to hide your message/status from, I.E., you need to do minus rather than addition. Here, the minus operation is easier to understand and involves less thinking.

But circle is a good idea for streaming stuff filtering especially when you follow many people and they have different message updating cycle.

6. Misc

- “There are only a few emotions that can effect change at a large organization,” he (Gundotra) explains. “One is greed and another powerful one is fear.” Outright greed is gauche in the Googleplex, so Gundotra prepared a slide deck that mocked up challenges from Google’s competitors (notably, Facebook), illustrating how each company could turn Google upside down.

- Emerald Sea has been the rare initiative in Google where the company was not breaking ground but defensively responding to a competitor’s success. (One engineer has described this process as “chasing taillights,” noting that me-too-ism has never been a strength for Google.) It’s also, claims Gundotra, the most extensive companywide initiative in Google’s history.

- “We put the product to [dog food] before it was fully baked, before we hardened the system and polished it and knew what we were doing,” says Horowitz. “We had no getting-started screen, no intro video. It was hard for people to get their hands around what it is and how to begin interacting with it. It was as if Facebook had been in stealth mode for seven years and then launched in its entirety at once today — it would have been an overwhelming, hard-to-comprehend, hard-to-understand system. The feedback we was got was: Simplify.”

- No one expects an instant success. But even if this week’s launch evokes snark or yawns, Google will keep at it. Google+ is not a product like Buzz or Wave where the company’s leaders can chalk off a failure to laudable ambition and then move on. “We’re in this for the long run,” says Ben-Yair. “This isn’t like an experiment. We’re betting on this, so if obstacles arise, we’ll adapt.”

- Because of the pressure the stakes and the scale, Gundota insisted that Emerald Sea should be an exception to Google’s usual consensus-based management style.

- “This is a top-down mandate where a clear vision is set out, and then the mode of moving forward is that you answer to Vic,” Rick Klau told me last year. “If Vic says ‘That looks good,’ then it looks good.”



Things Learned from “from big idea to thriving business in 8 short years”

Just read a great story from a programmer on building his own business little by little: My Startup Story: from Big idea to Thriving Business in 8 Short Years. Its greatness lies on not how big his business is, but on how his business gets bigger and bigger and on how he deal with various problems encountered in the long journey. Here are some of the lessons I learned from reading this story:

1. dream big but execute step by step;

2. make plan but adopt adjustment agilely;

3. solve real problems from users;

4. care about user feedback;

5. passion for technology;

6. continuously improving;

7. have business sense;

8. listen from others;

9. think beyond today and be aware of future crisis;

10. forget about yesterday’s success and dare to restart again;

11. retrospect on existing products and dig deeper;

12. build ecosystem


QConBeijing 2011 Slides

QCon Beijing 2011 had been hold from 08/04 ~ 10/04, it contains many great presentations about scalable website architecture, engineering management and performance tune tips. Here is the ppt download links: http://idning-ebook.googlecode.com/svn/trunk/Qconf2011/

If you can read Chinese, following is another list:






Memory Issues on Multicore Platform

On multi-core platform, pure computing is cheap since there are many processing unit and memory capacity may also not be problem since it's becoming larger and larger. But memory bandwidth remains the bottleneck all the time because it's a bus that is shared by all CPU cores. So efficient memory management is very critical for a scalable application on multicore CPU.

In this article I will point out some memory related problems regarding multicore architecture and also some solutions.

Part I - Memory Contention

Memory Contention means that different cores share a common data region(in main memory and cache) that needs to be synchronized among them. Synchronizing data among different cores has big performance penalty because bus traffic contention, locking cost and cache miss. To deal with such problem, there are two strategies:

1. Don't Share Writable State Among Cores

To minimize memory bus traffic, you should minimize core interactions by minimizing shared locations/data, even if the shared data is not protected by lock but some hardware level atomic instructions such as InterlockedExchangeAdd64 on win32 platform.

The patterns that tend to reduce lock contention also tend to reduce memory traffic, because it is the shared writable state that requires locks and generates contention. In practice, letting each thread work on its own local copy of the data and merging the data after all threads are done can be a very effective strategy.

Let's see two parallel versions of sum calculation program on an eight-core computer. One version uses a shared global variable protected by InterlockedExchangeAdd64() to track all intermediate results among all threads. The other version gives each thread a private partial sum variable that's not shared at all and the final sum is computed as the sum of all these partial sums.

From the console output we can see clearly that, the private partial sum solution is 20x faster than the other one.

Use Global - Total Sum is:49999995000000, used ticket:904.
Use Local - Total Sum is:49999995000000, used ticket:47.

So, even if we just share one variable protected by hardware atomic instructions, the performance penalty could be very significant.

The general rule for efficient execution on a single core is to pack data tightly, so that it has as small a footprint as possible. But on a multi-core processor, packing shared data can lead to a severe penalty from false sharing. Generally, the solution is to pack data tightly, give each thread its own private copy to work on, and merge results afterwards.

2. Avoid False Sharing introduced by Core Cache

Good performance depends on processors fetching most of their data from cache instead of main memory. For sequential programs, modern caches generally work well without too much thought, though a little tuning helps. The smallest unit of memory that two processors interchange is a cache line or cache sector.

Even if we follows the strategy 1 and let each thread access its private data/state, different thread on different cores may also share the same cache line. This is called "false sharing". Avoiding false sharing may require aligning variables or objects in memory on cache line boundaries. 

Let's use a parallel number increaser to see what's the performance penalty of false sharing. In the first version, each thread will modify some thread specific number variables, which are aligned together (so will be packed in the same cache line). In the second version, those variables are located on non-continuous places.

The performance related number would be:
Total Time:2012 (for first version)
Total Time:468 (for second version)

We can see that, false sharing introduced about 5x performance penalty. Avoiding false sharing may require aligning variables or objects in memory on cache line boundaries, so that each core accesses a private cache line that is not shared with others.

Part II - Heap Contention

Most developers manage memories using standard C library malloc/free or standard C++ library new/delete, some of them using OS APIs, for example, HeapAlloc()/HeapFree() on windows platform.

C/C++ standard memory management routines are implemented using platform specific memory management APIs, usually based on the concept of Heap. These library routines (whether is single thread version or multi-thread version) allocate/free memory resource on a single heap, which is usually called CRT heap. It's a global resource that is shared and contended among threads within a process.

This heap contention is one of the bottle neck of multi-threading applications that are memory intensive. The solution is to use thread local/private heap to do memory management, thus the resource contention is eliminated. On windows platform, this means that you need to create a dedicated heap using HeapCreate() for each thread and pass the returned heap handle to HeapAlloc()/HeapFree() functions.

Let's see this Global Heap Vs Local Heap example on Windows platform

On an 8-core system, perf test result using 8 threads are:
8 core time:59282, use global heap? true.
8 core time:20112, use global heap? false.

Using private heap will get around 3x perf gain.

- On windows platform, heap_no_serialization flag can be set when creating a heap, this means that there will be no synchronization cost when accessing it from multiple threads. But it turns out that setting this flag to thread private heap will be very slow on vista and later operating system.
- The reason is that in vista, Microsoft refactored the heap manager code, where some extra data structure and code are removed who is no longer part of the common case for handling heap API calls.
- Heap_no_serialization and some debug scenarios will disable Low Fragment Heap feature, who is now the de facto default policy for heaps and thus highly optimized.

Part III - Dynamic Creation/Free of C++ Object

Operator New/Delete are functions, which are the C++ version of malloc/free and responsible for create/release memory only. It has global version ::operator new and class level version (static member) class-name::operator new.

But New/Delete Operator will handle object construction and deconstruction besides memory management. It's a language operator just like +, - * / and others. New/Delete operator will call global operator new/delete or class specific operator new/delete if requested class has such operator functions.

In order to fully parallelize your application that may use some STL containers, you might need to write your own allocator to leverage thread private heap or some memory pools. Thus, your business logic is the same as single core version and contention bottle neck is eliminated at the same time.

Here is the example on writing your own operator new/delete and allocator.


Hehalem Architecture

Cache Organization and Memory Management of the Intel Nehalem Computer Architecture

Cross-Platform Get Cache Line Size

Understanding and Avoiding Memory Issues with Multi-core Processors

Thread/Data placement for better/consistent performance on Multi-Core/NUMA Achitecture

Parallel Memory Management(Allocate/Free) Intensive Applications on Multi-core system
English Version - http://www.codeproject.com/KB/cpp/rtl_scaling.aspx
Chinese Version - http://blog.csdn.net/arau_sh/archive/2010/02/22/5317919.aspx

Intel Guide for Developing Multithreaded Applications

Windows Heap Management/Performance

Memory Optimization for the entire C++ program

C++ Dynamic Memory Management Techniques

Understanding Operator New and Operator Delete

C++ Standard Allocator - Introduction and Implementation

Improve Performance by Allocator using Pooled Memory

Improving STL Allocators

Anatomy of the Linux slab allocator


Tips for Smart Pointers in C++

Part I - Brief Summary for Various Smart Pointers

1. auto_ptr
- RAII and transfer-of-ownership semantics based, but no shared-ownership
- Managed heap object will be owned by one and only one
- Assignment/Copy Construction will transfer ownership
- Can be compiled with STL containers, but wrong semantic

2. scoped_ptr
- RAII semantic based, but no shared-ownership, nor transfer-of-ownership semantics
- Managed heap object will be owned by one and only one pointer
- Assignment/Copy Construction are forbidden
- Can't be compiled with STL containers

3. shared_ptr
- Reference count based
- Managed heap object could be owned by multiple smart pointers
- Assignment/Copy Construction will add ownership
- To avoid memory leak, don't construct temporary shared_ptr object on function call parameter
- Can't construct a shared_ptr object from this pointer (Causes double deletion) 

4. intrusive_ptr
- Basically the same as shared_ptr
- Shared ownership of objects with an embedded reference count
- Can be constructed from an arbitrary raw pointer of type T *
- Try shared_ptr first, if  it isn't obvious whether intrusive_ptr better fits your needs

5. weak_ptr
- Just reference, no ownership, no RAII, no shared-ownership, no transfer of ownership
- Linked to a shared_ptr object and known by it
- Shared_ptr will reset weak_ptr when it decides to destroy the dynamic object owned by it
- It's a safe(no need to worry the dangling reference) way to reference a dynamic object but don't own it
- A nice feature of weak_ptr is that, it can access the internal state of corresponding shared_ptr object 

6. unique_ptr
- C++0x  introduced a new scoped_ptr like pointer: unique_ptr to replace auto_ptr.
- It hide assignment operator and copy constructor
- Transfer-of-ownership can be done using std::move() explicitly

These smart points are only suitable for single dynamic object, for object array, use other smart pointers whose name ended as "_array".
Part II - Tips for shared_ptr

1. shared_ptr VS weak_ptr
- shared_ptr owns some heap object
- weak_ptr points some heap object 

2. Handling this Pointer

It's safe to construct a shared_ptr object from a newly created heap object since it's not managed by any other shared_ptr object yet. But when you want to pass this pointer to a function that expects a shared_ptr object, you will encounter a tricky problem because most likely, the heap object is already created and managed by other shared_ptr objects.

The problem is that, in general, you can't create a shared_ptr from an existing raw pointer - the new shared_ptr you create won't "know" about the other instances that refer to the same object and you'll get multiple-deletes.

2.1. Use enable_shared_from_this from boost library

You can derive from enable_shared_from_this and then you can use "shared_from_this()" instead of "this" to spawn a shared pointer to your own self object.

How it's implemented?
- Add a weak_ptr member to point to an existing shared_ptr object that manages this object
- When shared_ptr object get constructed from raw pointer to a this kind of object, it will properly set the weak_ptr inside that object
shared_from_this() will construct a safe shared_ptr object from the weak_ptr member
- In boost shared_ptr implementation, the "sp_enable_shared_from_this()" function will get called in shared_ptr's constructor. In this function, if the passed in dynamic object derives from enable_shared_from_this, it will set the weak_ptr member using itself.

If you adopt this method, you should be careful not creating such object on stack. Because when creating object on stack, the object is not managed by any shared_ptr, so no shared_ptr's constructor gets called and the corresponding weak_ptr member won't get set properly.

2.2 If you know that your object is long lived, you can do the following:

struct null_deleter
template void operator()(T *) {}

Then in your code, just return a shared_ptr(this, null_deleter()).

3. Handling Null Valued shared_ptr Object.

When you are using shared_ptr in your code, sometimes you need a NULL equivalent stuff to represent a pointer that didn't point anything meaningful.

Generally speaking, you have the following choices:
  • Return iterators and the end iterator if not found
  • Boost::optional
  • Silly return codes
Out of all the options boost::optional & exceptions (when there really are exceptional circumstances) are the best methods, if you are dealing with containers return an iterator to end and test for the end iterator.

Returning Zero/Null for smart pointers is acceptable in some cases too, when the other alternatives don't make sense. Consider the following code:

class some_class_name{
template<typename T> operator shared_ptr<T>() { return shared_ptr<T>(); }
} nullPtr;

Use this template function when any boost::shared_ptr<> typed null pointer is needed.


smart pointers overview




shared_ptr for this pointer