Disciplines in Microsoft Engineering Team

I really want this blog to be a place to express my own ideas and thoughts, but I don't refuse to reference other people's great ideas, especially when they are really helpful to me or to potential readers.

The following content is copied from an MSDN blog post named "Product Development Disciplines at Microsoft"; I have only highlighted some lines.

"Over the last several months in my role here in China, I have given talks at several leading universities and met with many of the leading faculty and students working on technologies related to the Data Platform. I’ve also spoken at several industry conferences, meeting with customers, partners, analysts and other industry folks. There are many topics that come up at these meetings – changing technology trends, distributed development, the tremendous growth of Asia etc. But one topic that seems to come up more than almost any other is the question of how we organize and conduct our product development in Microsoft. I suppose this is only natural – Microsoft is one of the most successful software companies in the world, and the software industry here in this region is poised for tremendous growth, so it makes sense that people in the industry are eager to learn from our experience over the last quarter century.
This is actually a very big topic and within Microsoft we have an Engineering Excellence group that actually runs courses that can span several days and provide an overview of Microsoft’s software development methodology, our engineering system, organizational structures, best practices, tools and technologies we use internally to ensure quality, reliability, security, etc., and a variety of related topics. By no means would we claim that we have all this figured out perfectly and have a perfect system, but there is indeed a lot of accumulated knowledge and experience that we can share. And we do actually share this information, in appropriate form, with others in our industry, worldwide and also in this region.
As this is indeed a large topic, I don’t want to get too deep into this here, but I do want to address one aspect of our engineering system – the core disciplines that we organize our R&D teams around and the particular roles that each of these disciplines plays. I want to discuss this because I believe Microsoft does this a little bit differently from the rest of the industry even in the US, and especially here in China there is not a good understanding of these core disciplines and what role each of them plays.
Traditionally, the Microsoft engineering system has consisted of 3 “core” disciplines: “Development”, “Test”, and “Program Management”, also known as Dev/Test/PM for short. I’m going to touch on each of these briefly here, but I like to introduce them in a different order:
PM: When we think of engineering disciplines, most people start with “Dev”. For me however, things really start with the Program Management discipline. At Microsoft, “PM” means many different things, but for me the core essence of the PM role is two things:
1. The first part of the PM’s job is to understand the customer’s requirements and translate that into a functional specification of what we should build. This is where it all begins. If we don’t understand the customer, it is not very likely that we’ll end up building the right thing.
2. The second part of the PM’s job is to work with Dev and Test to translate the initial specification into a living, breathing product.
I find that many people, especially here in China, think “Project Management” when they hear PM. Indeed, Project Management is part of a PM’s job (under #2 above), but it is only a part of the PM’s job. The real skill that a PM brings is the expertise to listen to customers, understand the world from their point of view, and then to design a solution for their problem. This does not just mean giving customers what they ask for literally, but to truly understand them and design a solution that solves their problems even if the customers could never imagine the solution – as the famous saying goes, if we had only listened to customers, we would have looked for a faster horse, not come up with the automobile.
Dev: Of all the engineering disciplines, this one is probably the one people think about the most commonly. Dev is short-hand for “Development”, the folks whose responsibility it is to actually design and build the software that we ship. The essential job of Dev is to take the functional specification produced by PM and translate that into an actual implementation. In the world of mission-critical system-level software, this implementation better be extremely reliable, secure, manageable, scalable and high-performance. And the designs and implementations Dev produces better stand the test of time and last for several versions and years to come.
Test: The test discipline in Microsoft is much misunderstood, certainly externally, but sometimes internally as well. When I first came to Microsoft many years ago, I was (pleasantly) surprised to find that Microsoft had almost as many, if not more, testers as developers. Coming from a company that had a much less developed testing discipline (and where as a result, quality assurance was considerably weak), it took a little while to get used to what the essence of the Test discipline really is. The reality is that, in Microsoft, how fast we can ship software depends on not how quickly we can design and implement it but rather on how quickly we can test it. This is because every piece of software we ship, especially on the systems-software side, has to pass an extremely high quality bar. The Test discipline is really a complex area, and one where we have learned a lot over the years in terms of different types of testing that we employ – unit tests, functional tests, integration tests, stress and long-haul tests, performance tests, security tests, localization tests, etc. The set of tools and techniques we employ in test is truly some of the most impressive and complex – automated test harnesses, automated test generators, automated test failure analyzers, automated security “fuzzers”, fail-point and state-machine based testing.
The three “core” engineering disciplines described above are like the 3 legs of a chair – you need all three of them, and in a balance, to have a proper engineering organization. No one leg can dominate the other – otherwise, you get an organization that may not be in touch with customers’ needs or one that does not pay enough attention to quality. Indeed, the three disciplines are a little bit like the branches of government – they form a system of checks and balances that ensures we understand what customers want, we design and build that with high quality, and we ensure that we deliver a product that meets customer expectations in every regard.
It is also important to emphasize that we aim to attract the best talent to all three core disciplines – the bar is equally high for all the disciplines, it just happens to be that the passion and skill-set for each is a little different:
- PMs usually have a passion for working with customers, conceptualizing what the product should do, and then working with their Dev and Test peers to coordinate all the work to make sure we deliver exactly that.
- Developers have a passion for building top-quality software – software that is innovative, simple, reliable, secure, scalable, high-performance and stands the test of time.
- Testers are passionate about finding all kinds of ways to break software and making sure we find all the issues and bugs before we ship it to customers.
When we interview candidates, a very important part of what we do is find out which discipline the person’s talent and passion really lie in and direct them accordingly. Of course, over the course of one’s career, one’s passion and talent may change, and the person may change disciplines as a result – I myself started in the Dev discipline before switching to PM. This is only natural and we actually encourage that as a way to build better teams.
Other disciplines
It is also important to point out that although the three disciplines mentioned above are what have traditionally been considered the “core” disciplines at Microsoft, there are several other disciplines that are also becoming increasingly important. For example, User Experience (UX) professionals are essential to ensuring that products are intuitive and natural for users to use. A great user experience can make the difference between a product that customers love versus one they merely tolerate. UX is certainly very important for products aimed at end consumers, but it is also important for all our audiences – Developers, IT Professionals, Information Workers.
As we move into the Software+Services era, a variety of disciplines related to architecting, building and running extremely large-scale infrastructure becomes increasingly important. Again, while this has been true for some time for our consumer facing web properties such as MSN and Live, it is now becoming increasingly important for all our product groups as more and more of them take steps to evolve their products along the Software+Services model.
Many candidates I talk to often want to discuss what role at Microsoft would be the best fit for them and how they can grow their careers. The best advice I can think of is to work on a technology and a role that they are really passionate about.
As I mentioned above, we value all the disciplines equally and a well-balanced organization needs great people in all the different roles. While different disciplines appeal to people with different passions and skill-sets, all the disciplines offer opportunities for innovation and great work. And all of them offer opportunities for advancement and leadership. Indeed if you look across the senior levels of Microsoft, there are leaders who emerged from various disciplines – what they shared was a passion for the work they were doing.
I hope this discussion of the different engineering disciplines at Microsoft and the approach we take to them shall be useful for the many people who seem to be interested in this topic. If you have any questions or comments, feel free to post a reply to this entry."


Relevance Measuring in Information Retrieval System

One of the challenges an Information Retrieval (IR) system faces is relevance quality. It's the main factor that determines the end user's happiness. (The other two are latency and corpus size.)

To design and implement an IR system that has high relevance quality, we must have some method of measuring the quality of relevance.

Generally speaking, a measuring system consists of three components:
- Test Corpus (Document Collection for Test purpose)
- Test Query Set (Set of Queries for Test)
- Measuring Parameter (usually a function, used to measure the retrieval result of an IR system for some query in the query set, using the test corpus)

Building the test corpus and query set is another story; here we focus only on the measuring parameter/function.

1. Precision/Recall for un-ranked retrieval result

Precision = #Relevant Documents Retrieved / #Retrieved Documents. It's the percentage of the returned documents that are really relevant to the user query. (In Chinese: 查准率)

Recall = #Relevant Documents Retrieved / #Total Relevant Documents. It's the percentage of the relevant documents in the corpus that are retrieved in the query result. (In Chinese: 查全率)
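
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `precision_recall` and the document IDs are made up for the example, and results are treated as unordered sets.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for an un-ranked result set.

    retrieved: IDs of the documents returned by the system (hypothetical IDs).
    relevant:  IDs of all documents in the corpus judged relevant to the query.
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 4 documents returned, 3 relevant documents exist, 2 of them were returned.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)
```

Here precision is 2/4 and recall is 2/3: half of what was returned is relevant, and two thirds of what is relevant was returned.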

2. NDCG for ranked retrieval result

NDCG stands for Normalized Discounted Cumulative Gain, which is a human rating based measuring system.

Gain - the user assigns a numeric value (a score) that represents how good a returned document is for a specific query.

Cumulative Gain - the user assigns a gain value to each document in the top K returned results (K is written as p in the formulas); the values are assigned individually and independently, and then summed.
 \mathrm{CG_{p}} = \sum_{i=1}^{p} rel_{i}
Discounted Cumulative Gain - each document's relevance score is discounted by a weight that depends on its position in the retrieval result, so a relevant document that appears lower in the list contributes less.
 \mathrm{DCG_{p}} = rel_{1} + \sum_{i=2}^{p} \frac{rel_{i}}{\log_{2}i}
Normalized Discounted Cumulative Gain - this one is easy to understand: scale the final value into [0, 1]. Usually, the DCG score of the ideally ordered document list (ordered by gain score), called IDCG, is used as the normalizing factor. So
 \mathrm{nDCG_{p}} = \frac{DCG_{p}}{IDCG_{p}}
For a concrete example of how to compute the NDCG value of a query result, please see the Wikipedia article on NDCG.
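
As a rough sketch of the three formulas above, the following Python computes DCG and nDCG from a list of gain scores given in returned order. The function names `dcg` and `ndcg` and the sample gains are assumptions for illustration. The ideal ordering here is obtained by sorting the same list of gains, which is one common simplification and not the only choice.

```python
import math


def dcg(gains):
    """DCG_p = rel_1 + sum_{i=2..p} rel_i / log2(i), for gains in rank order."""
    return sum(
        g if rank == 1 else g / math.log2(rank)
        for rank, g in enumerate(gains, start=1)
    )


def ndcg(gains):
    """nDCG_p = DCG_p / IDCG_p, with IDCG_p taken from the sorted gains."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical human ratings (0-3) for the top 6 results of one query.
gains = [3, 2, 3, 0, 1, 2]
print(round(dcg(gains), 3), round(ndcg(gains), 3))
```

The third document (gain 3) is ranked below a document with gain 2, so the score falls short of 1.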

NDCG is widely used in evaluating today's commercial search engines, but the problem is that if the returned documents are ordered by decreasing gain score, the NDCG value reaches its maximum of 1.

This means that NDCG only measures the ranking algorithm of a search engine and can't tell whether the returned documents are highly related to the user's intention. But from the end user's perspective, the perfect result is a set of highly related documents, ordered properly.
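
To make the concern concrete, here is a small sketch that assumes the ideal ordering is built only from the gains of the returned documents. The function names and the gain values are hypothetical. Under that assumption, a well-matched result list and a poorly matched one both score 1 as long as each is sorted by gain.

```python
import math


def dcg(gains):
    """DCG with the rel_1 + sum rel_i / log2(i) form, gains in rank order."""
    return sum(
        g if rank == 1 else g / math.log2(rank)
        for rank, g in enumerate(gains, start=1)
    )


def ndcg_from_returned_only(gains):
    """nDCG where the ideal list is just the returned gains, re-sorted."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


strong_match = [3, 3, 2]  # highly relevant documents, in the right order
weak_match = [1, 0, 0]    # barely relevant documents, also in the right order

print(ndcg_from_returned_only(strong_match))
print(ndcg_from_returned_only(weak_match))
```

Both calls give the same top score, even though the second result list would clearly disappoint a user.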

More technically, a typical query-serving sub-system of an IR system has two phases: one is matching (finding highly related documents), and the other is ranking (ordering the matched documents). NDCG may be a proper tool to measure the ranking phase, but definitely not the matching phase. So I think it is not an ideal measuring mechanism for an IR system.

So tuning the whole system against the NDCG score alone may not be the right direction for improving a search engine.

- The ideal set, which is used to calculate the normalization factor, is the list of the highest-scored documents for the query, ordered properly, not a proper reordering of the returned documents. So the problem I mentioned above doesn't exist.
- But the final effect of this measuring method depends on the test corpus, the test queries, and the predefined gain scores for each query.
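
The first point can be sketched as follows, assuming every judged document for a query has a predefined gain and the ideal list is the top k of that whole judged pool. The names `ndcg_against_judged_pool`, `returned`, and `pool`, and all gain values, are hypothetical.

```python
import math


def dcg(gains):
    """DCG with the rel_1 + sum rel_i / log2(i) form, gains in rank order."""
    return sum(
        g if rank == 1 else g / math.log2(rank)
        for rank, g in enumerate(gains, start=1)
    )


def ndcg_against_judged_pool(returned_gains, all_judged_gains, k):
    """nDCG@k where the ideal list is the k best gains among ALL judged docs."""
    ideal = dcg(sorted(all_judged_gains, reverse=True)[:k])
    return dcg(returned_gains[:k]) / ideal if ideal > 0 else 0.0


returned = [1, 0, 0]        # gains of the returned documents, in returned order
pool = [3, 3, 2, 1, 0, 0]   # gains of every judged document for this query

score = ndcg_against_judged_pool(returned, pool, k=3)
print(round(score, 3))
```

With the ideal list drawn from the whole judged pool, a well-ordered but poorly matched result list scores far below 1, so the metric reflects matching quality as well as ordering. How meaningful the number is still depends on how complete the judged pool is.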