Wednesday, March 25, 2009

The Unreasonable Effectiveness of Data



Alon Halevy, Peter Norvig, and I argue that we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data. See the full article here (IEEE Intelligent Systems, March/April 2009).

Thursday, March 19, 2009

Google and WPP Marketing Research Awards: Improving industry understanding and practices in online marketing



Google and the WPP Group have teamed up to create a new research program with the goal of improving industry understanding of digital marketing. Eleven research awards have been given to universities through the Google and WPP Marketing Research Awards Program, announced by both companies in the fall of 2008. The academic studies will harness WPP client data to explore how online media influences consumer behavior, attitudes, and decision making. The research provides an opportunity for very innovative thinking in an area that is at the crossroads of marketing, computer science, economics, and various mathematical disciplines.

More than 120 entries were received by the deadline for proposals. The awards represent the first round of grants in the three-year program, to which WPP and Google will commit up to $4.6 million to support research on digital marketing. Hal Varian, Google's Chief Economist, served on the decision committee.

The winning projects offer convincing designs for exploring how online and offline marketing influence consumer attitudes, decisions, and purchase behavior. As marketing continues to become more digital and more measurable, the results of these studies will also advance our understanding of how advertising investment should be allocated among media channels.

The researchers and affiliated academic institutions participating in this first round of awarded projects are:

• “Effect of Online Exposure on Offline Buying: How Online Exposure
Aids or Hurts Offline Buying by Increasing the Impact of Offline
Attributes”; Amitav Chakravarti, New York University, Stern School of
Business, Department of Marketing

• “The Interaction Between Digital Marketing Tactics and Sales
Performance Online and Offline”; Elie Ofek, Associate Professor of
Marketing, Harvard Business School and Zsolt Katona, Associate
Professor of Marketing, UC Berkeley, Haas School of Business

• “Are Brand Attitudes Contagious? Consumer Response to Organic
Search Trends”; Donna L. Hoffman, Professor, A. Gary Anderson
Graduate School of Management, University of California Riverside and
Thomas P. Novak, A. Gary Anderson Graduate School of Management,
University of California Riverside

• “Does internet advertising help established brands or niche ("long
tail") brands more?”; Catherine Tucker, Assistant Professor of
Marketing, MIT Sloan School of Management and Avi Goldfarb, Associate
Professor of Marketing, Joseph L. Rotman School of Management,
University of Toronto

• “Marketing on the Map: Visual Search and Consumer Decision Making”;
Nicolas Lurie, Assistant Professor of Marketing, College of
Management, Georgia Institute of Technology and
Sam Ransbotham, Assistant Professor of Information Systems, Carroll
School of Management, Boston College

• “Methods for multivariate metric analysis; identifying change
drivers”; Trevor J. Hastie, Professor, Department of Statistics,
Stanford University

• “Unpuzzling the Synergy of Display and Search Advertising: Insights
from Data Mining of Chinese Internet Users”; Hairong Li, Department of
Advertising, Public Relations, and Retailing, Michigan State
University and Shuguang Zhao, Media Survey Lab, Tsinghua University

• “Optimal Allocation of Offline and Online Media Budget”; Sunil
Gupta, Professor of Business Administration, Harvard Business School;
Anita Elberse, Associate Professor, Harvard Business School; and
Kenneth C. Wilbur, Assistant Professor of Marketing, Marshall School
of Business, University of Southern California

• “Targeting Ads to Match Individual Cognitive Styles: A Market
Test”; Glen Urban, Professor, MIT Sloan School of Management

• “How do consumers determine what is relevant? A psychometric and
neuroscientific study of online search and advertising effectiveness”;
Antoine Bechara, Professor of Psychology and Neuroscience, Department
of Psychology/Brain & Creativity Institute, University of Southern
California and Martin Reimann, Fellow, Department of Psychology/Brain
& Creativity Institute, University of Southern California

• “A Comprehensive Model of the Effects of Brand-Generated and
Consumer-Generated Communications on Brand Perceptions, Sales and
Share”; Douglas Bowman and Manish Tripathi, Professors of Marketing,
Goizueta Business School, Emory University.

You can find more information about the Google and WPP Marketing Research Awards Program on the website.

Wednesday, March 18, 2009

And the award goes to...



Corinna Cortes, Head of Google Research in New York, has just been awarded the ACM Paris Kanellakis Theory and Practice Award jointly with Vladimir Vapnik (Royal Holloway College and NEC Research). The award recognizes their invention in the early 1990s of the soft-margin support vector machine, which has become the supervised machine learning method of choice for applications ranging from image analysis to document classification to bioinformatics.

What is so important about this invention? In supervised machine learning, we create algorithms that can learn a rule to accurately classify new examples based on a set of training examples (e.g. spam or non-spam). There is no single attribute of an email message that tells us with certainty that it is spam. Instead, many attributes have to be considered, forming a vector of very high dimension. The same situation arises in many other practical machine learning tasks, including many that we work on at Google.

To learn accurate classifiers, we need to solve several big problems. First, the rule learned from the training data should be accurate on new test examples, even though it has not seen those examples. In other words, the rule must generalize well. Second, we must be able to find the optimal rule efficiently. Both of these problems are especially daunting for very high dimensional data. Third, the method for computing the rule should be able to accommodate errors in the training data, such as messages that are given conflicting labels by different people (my spam may be your ham).

Soft-margin support vector machines wrap these three problems together into an elegant mathematical package. The crucial insight is that classification problems of this kind can be expressed as finding in very high dimension (or even infinite dimension) the hyperplane that best separates the positive examples (ham) from the negative ones (spam).
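
To make "best separates" concrete, here is a minimal sketch that scores two candidate hyperplanes by their geometric margin (the distance from the hyperplane to the closest training example) and keeps the one with the larger margin. The four toy points, the two candidate hyperplanes, and the helper names are assumptions made up for illustration; a real support vector machine searches over all hyperplanes rather than comparing a hand-picked pair.

```python
import math

def score(w, b, x):
    """Signed value w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any example to the hyperplane.

    A positive result means every example is on its correct side.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * score(w, b, x) / norm for x, y in zip(points, labels))

# Toy data (illustrative only): +1 = ham, -1 = spam, as in the text above.
points = [(2.0, 2.0), (3.0, 2.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]

# Two hand-picked candidate hyperplanes (w, b); both separate the data.
candidates = [((0.5, 0.5), -1.0), ((1.0, 0.0), -1.0)]

margins = [geometric_margin(w, b, points, labels) for w, b in candidates]
best_w, best_b = candidates[margins.index(max(margins))]
print(margins, best_w, best_b)
```

Both candidates classify every toy point correctly, so accuracy on the training set cannot tell them apart; the margin can.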

Remarkably, the solution of this problem does not depend on the dimensionality of the data; it depends only on the pairwise similarities between the training examples, determined by the agreement or disagreement between corresponding attributes. Furthermore, a hyperplane that separates the training data well can be shown to generalize well to unseen data with the same statistical properties.
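
One way to see why only pairwise similarities matter is to write the classifier in its dual form, where a new example is scored by a weighted sum of its similarities to the training examples. The sketch below does this for a two-point toy problem. The data, the function names, and the coefficients `alphas` are assumptions: the coefficients were worked out by hand for this tiny set rather than produced by a solver. With the plain dot product as the similarity function, the dual form gives the same score as the explicit hyperplane.

```python
def linear_kernel(u, v):
    """Similarity between two examples: here simply their dot product."""
    return sum(a * b for a, b in zip(u, v))

def dual_decision(alphas, labels, train, x, kernel, b):
    """Score x using only kernel values against the training examples."""
    return sum(a * y * kernel(xi, x)
               for a, y, xi in zip(alphas, labels, train)) + b

# Toy problem (illustrative only): one ham (+1) and one spam (-1) example.
train = [(2.0, 2.0), (0.0, 0.0)]
labels = [1, -1]

# Hand-derived maximum-margin solution for these two points (an assumption
# of this sketch, not solver output): alpha = 2 / ||x_plus - x_minus||^2.
alphas = [0.25, 0.25]
b = -1.0

# Explicit hyperplane recovered from the dual coefficients: w = sum a*y*x.
w = [sum(a * y * xi[d] for a, y, xi in zip(alphas, labels, train))
     for d in range(2)]

def primal_decision(x):
    """Score x with the explicit weight vector w."""
    return linear_kernel(w, x) + b

query = (3.0, 1.0)
print(dual_decision(alphas, labels, train, query, linear_kernel, b),
      primal_decision(query))
```

Because `dual_decision` touches the data only through `kernel`, the feature vectors never need to be written out explicitly, which is what lets the method work in very high (or infinite) dimension.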

Now, you might be asking how this could be done if the training data is inconsistently labeled. After all, you cannot have the same example on both sides of the separating hyperplane. That's where the soft margin idea comes in: the quadratic optimization program that finds the optimal separating hyperplane can be cleverly modified to "give up" on a fraction of the training examples that cannot be classified correctly.
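
The soft margin idea can be sketched in a few lines. The version below does not solve the quadratic program; it swaps in plain subgradient descent on the equivalent hinge-loss objective, 0.5*||w||^2 + C * sum(max(0, 1 - y*(w.x + b))), which is easier to read. The one-dimensional toy data (including an example that appears with both labels), the function names, and the settings `C`, `lr`, and `epochs` are all assumptions chosen for illustration.

```python
def score(w, b, x):
    """Signed value w.x + b; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack(w, b, x, y):
    """How far an example falls short of the margin (0 if it clears it)."""
    return max(0.0, 1.0 - y * score(w, b, x))

def train_soft_margin(points, labels, C=1.0, lr=0.01, epochs=2000):
    """Full-batch subgradient descent on the soft-margin objective."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5 * ||w||^2 term
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * score(w, b, x) < 1.0:  # example violates the margin
                for d in range(dim):
                    grad_w[d] -= C * y * x[d]
                grad_b -= C * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data (illustrative only): +1 = ham, -1 = spam. The last example
# repeats x = 3.0 with the opposite label, so no hyperplane fits all rows.
points = [(2.0,), (3.0,), (4.0,), (-2.0,), (-3.0,), (-4.0,), (3.0,)]
labels = [1, 1, 1, -1, -1, -1, -1]

w, b = train_soft_margin(points, labels)
slacks = [slack(w, b, x, y) for x, y in zip(points, labels)]
print(w, b, slacks)
```

A slack above 1 marks an example the classifier has given up on: it sits on the wrong side of the hyperplane. Here that is the conflicting last row, while the six consistent rows are still classified correctly. The constant `C` sets how expensive giving up is relative to a wide margin.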

With this crucial improvement, support vector machines became truly practical, and their core ideas have had a huge influence on the development of further learning algorithms for an ever wider range of tasks.

Congratulations to Corinna (and Vladimir) on the well-deserved award.