Thursday, April 16, 2009

Socially Adjusted CAPTCHAs

Posted by Rich Gossweiler, Maryam Kamvar, Shumeet Baluja

Unfortunately, there is a war going on between humans and 'bots. Software
'bots are attempting to generate massive numbers of computer accounts,
which are then sold in bulk to spammers. Spammers use these accounts to
flood email inboxes and discussion boards. Meanwhile, humans are simply
trying to create an account and don't want to spend a lot of time proving
that they are not a program.

Typically we use CAPTCHAs -- we present an image of some distorted text
and then ask the applicant to type in the letters. As image processing gets
more sophisticated, these letter sequences tend to get longer and more
distorted, sometimes to the point where humans fail too.

So we changed the game. We show an image, say an airplane, but it
is randomly rotated and we ask the applicant to rotate it to "up." This
is generally hard for computers but easy for people. Well, for the most
part.

Since computers are good at finding the upright orientation of faces,
skies, text, etc., we sift through our database of images, running
state-of-the-art "up" detectors to remove those images. But of the images
that remain, some are too hard for people to figure out. What is up for a
plate or a piece of abstract art?

So here is where it gets interesting. We show people several images, one
of which is a "candidate" and we see how people do. If everyone rotates
it the same way, it is a keeper. If there is a lot of variation, we
discard it. As a bonus, it turns out that even if the original image was
taken at an angle, it does not matter, since people, in large numbers,
socially adjust the CAPTCHA: their consensus rotation becomes the image's
new "up."
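The keep-or-discard step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each person's answer is recorded as the rotation in degrees they applied to the candidate image, and the function names and the agreement threshold `min_agreement=0.9` are hypothetical. Because angles wrap around (359 and 1 degrees are close), it uses circular statistics instead of an ordinary mean and variance: the mean resultant length R is 1 when everyone agrees and near 0 when answers are spread around the circle, and the circular mean serves as the crowd-adjusted "up."

```python
import math


def circular_consensus(angles_deg):
    """Return (mean_angle_deg, agreement) for a list of rotation answers.

    agreement is the mean resultant length R in [0, 1]: 1.0 means every
    answer points the same way; values near 0 mean answers cancel out.
    """
    if not angles_deg:
        raise ValueError("need at least one answer")
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    agreement = math.hypot(s, c) / len(angles_deg)
    mean_angle = math.degrees(math.atan2(s, c)) % 360.0
    return mean_angle, agreement


def evaluate_candidate(angles_deg, min_agreement=0.9):
    """Hypothetical keep/discard rule for a candidate CAPTCHA image.

    Returns (keep, adjusted_up_deg). When the crowd agrees, the consensus
    rotation is used as the image's "up", even if the photo was tilted.
    """
    mean_angle, agreement = circular_consensus(angles_deg)
    keep = agreement >= min_agreement
    return keep, (mean_angle if keep else None)
```

For example, answers of 358, 2, 5, 1 and 359 degrees give a consensus near 1 degree with R close to 1, so the image is kept; answers of 0, 90, 180 and 270 degrees cancel out and the image is discarded.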

Read the full paper here (posted with the permission of WWW'09).

Wednesday, April 15, 2009

The Grill: Google's Alfred Spector on the hot seat

Posted by Ben Bayer, Google Research

Alfred Spector, Google's VP of Research, tells Computerworld about the ins and outs of research at Google and where it is headed. Read the complete interview here.

Thursday, April 2, 2009

Predicting the Present with Google Trends

Posted by Hal Varian, Chief Economist and Hyunyoung Choi, Decision Support Engineering Analyst

Can Google queries help predict economic activity?

The answer depends on what you mean by "predict." Google Trends and Google Insights for Search provide real-time reports on query volume, while economic data is typically released several days after the close of the month. Given this time lag, it is not implausible that Google queries in a category like "Automotive/Vehicle Shopping" during the first few weeks of March may help predict what actual March automotive sales will be when the official data is released halfway through April.

That famous economist Yogi Berra once said "It's tough to make predictions, especially about the future." This inspired our approach: let us lower the bar and just try to predict the present.

Our work to date is summarized in a paper called Predicting the Present with Google Trends. We find that Google Trends data can help improve forecasts of the current level of activity for a number of different economic time series, including automobile sales, home sales, retail sales, and travel behavior.
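A simple way to try this kind of "predicting the present" yourself is to compare a baseline autoregressive model, which estimates this month's value from last month's, with the same model plus the current month's query index as an extra regressor. The sketch below is an illustration under assumptions, not the model from the paper: it uses a single lag, ordinary least squares via the normal equations, and hypothetical names (`fit_nowcast`, `nowcast`, `sales`, `trends`). `trends[t]` is assumed to be a query index for month t that is available before the official figure `sales[t]` is released.

```python
def _solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_nowcast(sales, trends=None):
    """Least-squares fit of sales[t] ~ b0 + b1*sales[t-1] (+ b2*trends[t]).

    Pass trends=None for the baseline model without query data.
    """
    rows, target = [], []
    for t in range(1, len(sales)):
        row = [1.0, sales[t - 1]]
        if trends is not None:
            row.append(trends[t])
        rows.append(row)
        target.append(sales[t])
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(p)]
    return _solve(xtx, xty)


def nowcast(coef, prev_sales, trend_now=None):
    """Estimate this month's value before the official figure is released."""
    estimate = coef[0] + coef[1] * prev_sales
    if trend_now is not None:
        estimate += coef[2] * trend_now
    return estimate
```

To judge whether the query data actually helps, fit both variants (with and without `trends`) on earlier months and compare their errors on months held out from the fit; in-sample fit alone will always favor the model with more regressors.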

Even predicting the present is useful, since it may help identify "turning points" in economic time series. If people start doing significantly more searches for "Real Estate Agents" in a certain location, it is tempting to think that house sales might increase in that area in the near future.

Our paper outlines one approach to short-term economic prediction, but we expect that there are several other interesting ideas out there. So we suggest that forecasting wannabes download some Google Trends data and try to relate it to other economic time series. If you find an interesting pattern, post your findings on a website and send a link to econ-forecast@google.com. We'll report on the most interesting results in a later blog post.

It has been said that if you put a million monkeys in front of a million computers, you would eventually produce an accurate economic forecast. Let's see how well that theory works ...

Wednesday, March 25, 2009

The Unreasonable Effectiveness of Data

Posted by Fernando Pereira, Google Research

Alon Halevy, Peter Norvig, and I argue that we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data. See the full article here (IEEE Intelligent Systems, March/April 2009).

Thursday, March 19, 2009

Google and WPP Marketing Research Awards: Improving industry understanding and practices in online marketing

Posted by Jeff Walz, University Relations and Anne Bray, Head of Agency, WPP

Google and the WPP Group have teamed up to create a new research program with the goal of improving industry understanding of digital marketing. Eleven research awards have been given to universities through the Google and WPP Marketing Research Awards Program, announced by both companies in the fall of 2008. The academic studies will harness WPP client data to explore how online media influences consumer behavior, attitudes, and decision making. The research provides an opportunity for very innovative thinking in an area that is at the crossroads of marketing, computer science, economics, and various mathematical disciplines.

More than 120 entries were received by the deadline for proposals. The awards represent the first round of grants in the three-year program towards which WPP and Google will commit up to $4.6 million in an effort to support research around digital marketing. Hal Varian, Google's Chief Economist, participated on the decision committee.

The winning projects offer convincing designs for exploring how online and offline marketing influence consumer attitudes, decisions, and purchase behavior. As marketing continues to become more digital and more measurable, the results of these studies will also advance our understanding of how advertising investment should be allocated among media channels.

The researchers and affiliated academic institutions participating in this first round of awarded projects are:

• “Effect of Online Exposure on Offline Buying: How Online Exposure
Aids or Hurts Offline Buying by Increasing the Impact of Offline
Attributes”; Amitav Chakravarti, New York University, Stern School of
Business, Department of Marketing

• “The Interaction Between Digital Marketing Tactics and Sales
Performance Online and Offline”; Elie Ofek, Associate Professor of
Marketing, Harvard Business School and Zsolt Katona, Associate
Professor of Marketing, UC Berkeley, Haas School of Business

• “Are Brand Attitudes Contagious? Consumer Response to Organic
Search Trends”; Donna L. Hoffman, Professor, A. Gary Anderson
Graduate School of Management, University of California Riverside and
Thomas P. Novak, A. Gary Anderson Graduate School of Management,
University of California Riverside

• “Does internet advertising help established brands or niche ("long
tail") brands more?”; Catherine Tucker, Assistant Professor of
Marketing, MIT Sloan School of Management and Avi Goldfarb, Associate
Professor of Marketing, Joseph L. Rotman School of Management,
University of Toronto

• “Marketing on the Map: Visual Search and Consumer Decision Making”;
Nicolas Lurie, Assistant Professor of Marketing, College of
Management, Georgia Institute of Technology and
Sam Ransbotham, Assistant Professor of Information Systems, Carroll
School of Management, Boston College

• “Methods for multivariate metric analysis; identifying change
drivers”; Trevor J. Hastie, Professor, Department of Statistics,
Stanford University

• “Unpuzzling the Synergy of Display and Search Advertising: Insights
from Data Mining of Chinese Internet Users”; Hairong Li, Department of
Advertising, Public Relations, and Retailing, Michigan State
University and Shuguang Zhao, Media Survey Lab, Tsinghua University

• “Optimal Allocation of Offline and Online Media Budget”; Sunil
Gupta, Professor of Business Administration, Harvard Business School;
Anita Elberse, Associate Professor, Harvard Business School; and
Kenneth C. Wilbur, Assistant Professor of Marketing, Marshall School
of Business, University of Southern California

• “Targeting Ads to Match Individual Cognitive Styles: A Market
Test”; Glen Urban, Professor, MIT Sloan School of Management

• “How do consumers determine what is relevant? A psychometric and
neuroscientific study of online search and advertising effectiveness”;
Antoine Bechara, Professor of Psychology and Neuroscience, Department
of Psychology/Brain & Creativity Institute, University of Southern
California and Martin Reimann, Fellow, Department of Psychology/Brain
& Creativity Institute, University of Southern California

• “A Comprehensive Model of the Effects of Brand-Generated and
Consumer-Generated Communications on Brand Perceptions, Sales and
Share”; Douglas Bowman and Manish Tripathi, Professors of Marketing,
Goizueta Business School, Emory University.

You can find more information about the Google and WPP Marketing Research Awards Program on the website.

Wednesday, March 18, 2009

And the award goes to...

Posted by Fernando Pereira, Research Director

Corinna Cortes, Head of Google Research in New York, has just been awarded the ACM Paris Kanellakis Theory and Practice Award jointly with Vladimir Vapnik (Royal Holloway College and NEC Research). The award recognizes their invention in the early 1990s of the soft-margin support vector machine, which has become the supervised machine learning method of choice for applications ranging from image analysis to document classification to bioinformatics.

What is so important about this invention? In supervised machine learning, we create algorithms that can learn a rule to accurately classify new examples based on a set of labeled training examples (e.g., messages labeled spam or non-spam). There is no single attribute of an email message that tells us with certainty that it is spam. Instead, many attributes have to be considered, forming a vector of very high dimension. The same situation arises in many other practical machine learning tasks, including many that we work on at Google.

To learn accurate classifiers, we need to solve several big problems. First, the rule learned from the training data should be accurate on new test examples, even though it has not seen those examples. In other words, the rule must generalize well. Second, we must be able to find the optimal rule efficiently. Both of these problems are especially daunting for very high dimensional data. Third, the method for computing the rule should be able to accommodate errors in the training data, such as messages that are given conflicting labels by different people (my spam may be your ham).

Soft-margin support vector machines wrap these three problems together into an elegant mathematical package. The crucial insight is that classification problems of this kind can be expressed as finding in very high dimension (or even infinite dimension) the hyperplane that best separates the positive examples (ham) from the negative ones (spam).

Remarkably, the solution of this problem does not depend on the dimensionality of the data; it depends only on the pairwise similarities between the training examples, determined by the agreement or disagreement between corresponding attributes. Furthermore, a hyperplane that separates the training data well can be shown to generalize well to unseen data with the same statistical properties.
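In the usual textbook notation (ours, not the post's), with labels y_i in {+1, -1} and a kernel K(x_i, x_j) measuring the similarity of two training examples, the separating-hyperplane problem can be written in its dual form. The training examples appear only through K, which is why the dimensionality of the attribute vectors drops out. This is the hard-margin version, shown as a sketch; the soft-margin variant additionally bounds each multiplier by a constant C.

```latex
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j) \\
\text{subject to}\quad & \alpha_i \ge 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0, \\
f(x) &= \operatorname{sign}\!\Big( \sum_{i=1}^{n} \alpha_i y_i \, K(x_i, x) + b \Big).
\end{aligned}
```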

Now, you might be asking how this could be done if the training data is inconsistently labeled. After all, you cannot have the same example on both sides of the separating hyperplane. That's where the soft margin idea comes in: the quadratic optimization program that finds the optimal separating hyperplane can be cleverly modified to "give up" on a fraction of the training examples that cannot be classified correctly.
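In standard notation, the soft-margin problem is: minimize ½‖w‖² + C Σᵢ ξᵢ subject to yᵢ(w·xᵢ + b) ≥ 1 - ξᵢ and ξᵢ ≥ 0, where the slack ξᵢ lets example i violate the margin at a price controlled by C. Eliminating the slacks gives the equivalent hinge-loss objective ½‖w‖² + C Σᵢ max(0, 1 - yᵢ(w·xᵢ + b)). The sketch below minimizes that objective for a linear kernel by plain full-batch subgradient descent. This solver is swapped in for brevity and is not the quadratic program from the original work; the function names and defaults (`c`, `lr`, `epochs`) are illustrative assumptions.

```python
def train_soft_margin_svm(xs, ys, c=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||w||^2 + c * sum_i max(0, 1 - y_i*(w.x_i + b)).

    xs: list of feature vectors; ys: labels, +1 (ham) or -1 (spam).
    Uses full-batch subgradient descent with a fixed step size.
    """
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        gw = w[:]  # gradient of the regularizer 0.5*||w||^2 is w
        gb = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for k in range(dim):
                    gw[k] -= c * y * x[k]
                gb -= c * y
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Classify x by which side of the hyperplane w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    pos = [(2, 2), (3, 3), (2, 3), (3, 2)]
    neg = [(-2, -2), (-3, -3), (-2, -3), (-3, -2)]
    noisy = (2.5, 2.5)  # sits among the positives but carries the opposite label
    w, b = train_soft_margin_svm(pos + neg + [noisy], [1] * 4 + [-1] * 5)
    print([predict(w, b, x) for x in pos + neg + [noisy]])
```

On this toy data the learned rule classifies the eight clean points correctly and gives up on the conflicting point: it stays on the positive side of the hyperplane, and the optimizer pays a hinge penalty for it rather than bending the boundary.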

With this crucial improvement, support vector machines became really practical, while the core ideas have had huge influence in the development of further learning algorithms for an ever wider range of tasks.

Congratulations to Corinna (and Vladimir) on the well-deserved award.

Wednesday, February 18, 2009

Beyond Web-2.0

Posted by T.V. Raman, Research Scientist

A little over a year ago, I gave a lightning talk at the W3C Technical Plenary in Boston where I looked forward to what came after Web-2.0. The key insight underlying that talk was that the Web was now mature enough for us to build Web technologies purely out of Web parts. Web-2.0 is a result of applying the Web to itself and is therefore better thought of as Web(Web()) or more concisely, Web².

Today, we can build new web artifacts out of existing ones by aggregation (web mashups) and by projection (filtered views), and publish the resulting artifacts on the web by assigning them a URL. This suggests that the web to come potentially consists of the power set of all web content. These ideas and their logical consequences are detailed in an article entitled Toward 2^W, Beyond Web 2.0 in the February 2009 issue of Communications of the ACM. You can also find a slightly more extensive blog post I wrote here.
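The 2^W counting argument can be made concrete with a toy model: treat each existing web resource as an item, an aggregation (mashup) as any combination of items, and a projection (filtered view) as a subset of an artifact. With n resources there are 2^n possible artifacts, and aggregating or projecting any of them yields another member of that same power set. The resource names and the functions `aggregations`, `aggregate`, and `project` are hypothetical, chosen only to illustrate the counting; they do not come from the article.

```python
from itertools import chain, combinations


def aggregations(resources):
    """Return every subset of `resources`; each subset models one possible artifact."""
    items = list(resources)
    return [
        frozenset(combo)
        for combo in chain.from_iterable(
            combinations(items, k) for k in range(len(items) + 1)
        )
    ]


def aggregate(*artifacts):
    """Model a mashup of existing artifacts as the union of their items."""
    return frozenset().union(*artifacts)


def project(artifact, keep):
    """Model a filtered view: keep only the items for which keep(item) is true."""
    return frozenset(item for item in artifact if keep(item))


if __name__ == "__main__":
    web = ["news", "maps", "photos"]  # hypothetical resources
    all_artifacts = aggregations(web)
    print(len(all_artifacts))  # 2**3 = 8 possible artifacts
    mashup = aggregate(frozenset({"news"}), frozenset({"maps"}))
    view = project(mashup, lambda r: r != "maps")
    print(view in all_artifacts)  # a filtered view stays inside the power set
```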