Tuesday, June 23, 2009
Speed Matters
At Google, we've gathered hard data to reinforce our intuition that "speed matters" on the Internet. Google runs experiments on the search results page to understand and improve the search experience. Recently, we conducted some experiments to determine how users react when web search takes longer. We've always viewed speed as a competitive advantage, so this research is important to understand the trade-off between speed and other features we might introduce. We wanted to share this information with the public because we hope it will give others greater insight into how important speed can be.
Speed as perceived by the end user is driven by multiple factors, including how fast results are returned and how long it takes a browser to display the content. Our experiments injected server-side delay to model one of these factors: extending the processing time before and during the time that the results are transmitted to the browser. In other words, we purposefully slowed the delivery of search results to our users to see how they might respond.
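To make that setup concrete, here is a minimal Python sketch of how a server-side delay experiment of this shape could be wired up. The arm names, bucketing scheme, and handler structure are illustrative assumptions, not Google's actual experiment framework; only the 200 ms and 400 ms delay values come from the post.

```python
import hashlib
import time

# Hypothetical arm names and bucketing; only the delay values (200 ms and
# 400 ms) are taken from the experiments described above.
ARM_DELAY_SECONDS = {"control": 0.0, "delay_200ms": 0.2, "delay_400ms": 0.4}

def assign_arm(user_id: str) -> str:
    """Deterministic hash-based bucketing: ~1% of users per delay arm."""
    bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
    if bucket == 0:
        return "delay_200ms"
    if bucket == 1:
        return "delay_400ms"
    return "control"

def run_search(query: str) -> str:
    return f"results for {query!r}"  # stand-in for real result computation

def handle_search(user_id: str, query: str) -> str:
    results = run_search(query)
    # The injected server-side delay: results are held back before delivery.
    time.sleep(ARM_DELAY_SECONDS[assign_arm(user_id)])
    return results
```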
All other things being equal, more usage, as measured by number of searches, reflects more satisfied users. Our experiments demonstrate that slowing down the search results page by 100 to 400 milliseconds has a measurable impact on the number of searches per user of -0.2% to -0.6% (averaged over four or six weeks depending on the experiment). That's 0.2% to 0.6% fewer searches for changes under half a second!
Furthermore, users do fewer and fewer searches the longer they are exposed to the experiment. Users exposed to a 200 ms delay since the beginning of the experiment did 0.22% fewer searches during the first three weeks, but 0.36% fewer searches during the second three weeks. Similarly, users exposed to a 400 ms delay since the beginning of the experiment did 0.44% fewer searches during the first three weeks, but 0.76% fewer searches during the second three weeks. Even if the page returns to the faster state, users who saw the longer delay take time to return to their previous usage level. Users exposed to the 400 ms delay for six weeks did 0.21% fewer searches on average during the five week period after we stopped injecting the delay.
While these numbers may seem small, a daily impact of 0.5% is of real consequence at the scale of Google web search, or indeed at the scale of most Internet sites. Because the cost of slower performance increases over time and persists, we encourage site designers to think twice about adding a feature that hurts performance if the benefit of the feature is unproven. To learn more about how to improve the performance of your website, visit code.google.com/speed. For more details on our experiments, download this PDF.
Monday, June 22, 2009
A new landmark in computer vision
Posted by Jay Yagnik, Head of Computer Vision Research
[Cross-posted with the Official Google Blog]
Science fiction books and movies have long imagined that computers will someday be able to see and interpret the world. At Google, we think computer vision has tremendous potential benefits for consumers, which is why we're dedicated to research in this area. And today, a Google team is presenting a paper on landmark recognition (think: Statue of Liberty, Eiffel Tower) at the Computer Vision and Pattern Recognition (CVPR) conference in Miami, Florida. In the paper, we present a new technology that enables computers to quickly and efficiently identify images of more than 50,000 landmarks from all over the world with 80% accuracy.
To be clear up front, this is a research paper, not a new Google product, but we still think it's cool. For our demonstration, we begin with an unnamed, untagged picture of a landmark, enter its web address into the recognition engine, and poof — the computer identifies and names it: "Recognized Landmark: Acropolis, Athens, Greece." Thanks, computer.
How did we do it? It wasn't easy. For starters, where do you find a good list of thousands of landmarks? Even if you have that list, where do you get the pictures to develop visual representations of the locations? And how do you pull that source material together in a coherent model that actually works, is fast, and can process an enormous corpus of data? Think about all the different photographs of the Golden Gate Bridge you've seen — the different perspectives, lighting conditions, and image qualities. Recognizing a landmark can be difficult for a human, let alone a computer.
Our research builds on the vast number of images on the web, the ability to search those images, and advances in object recognition and clustering techniques. First, we generated a list of landmarks relying on two sources: 40 million GPS-tagged photos (from Picasa and Panoramio) and online tour guide webpages. Next, we found candidate images for each landmark using these sources and Google Image Search, which we then "pruned" using efficient image matching and unsupervised clustering techniques. Finally, we developed a highly efficient indexing system for fast image recognition. The following image provides a visual representation of the resulting clustered recognition model:
In the above image, related views of the Acropolis are "clustered" together, allowing for a more efficient image matching system.
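To give a feel for the shape of this pipeline, here is a toy Python sketch: images reduced to feature vectors, a greedy unsupervised clustering pass, an index of cluster representatives, and recognition by best match. Every function, vector, and threshold here is an illustrative assumption; the actual system matches local image descriptors and uses far more efficient clustering and indexing than this linear scan.

```python
import math

# Every piece here is a stand-in: real landmark recognition matches local
# image descriptors, not whole-image vectors.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_images(features, threshold=0.9):
    """Greedy unsupervised clustering: an image joins the first cluster
    whose representative it matches closely enough, else starts a new one."""
    clusters = []  # list of (representative_vector, member_indices)
    for i, f in enumerate(features):
        for rep, members in clusters:
            if cosine(f, rep) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((f, [i]))
    return clusters

def build_index(landmark_to_features):
    """Keep one entry per cluster representative, tagged with its landmark."""
    index = []
    for name, feats in landmark_to_features.items():
        for rep, _ in cluster_images(feats):
            index.append((rep, name))
    return index

def recognize(query_feature, index):
    """Name the landmark whose representative best matches the query image."""
    return max(index, key=lambda entry: cosine(query_feature, entry[0]))[1]

index = build_index({"Acropolis": [[0.9, 0.1], [0.85, 0.2]],
                     "Eiffel Tower": [[0.1, 0.95]]})
print(recognize([0.88, 0.15], index))  # -> "Acropolis"
```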
While we've gone a long way towards unlocking the information stored in text on the web, there's still much work to be done unlocking the information stored in pixels. This research demonstrates the feasibility of efficient computer vision techniques based on large, noisy datasets. We expect the insights we've gained will lay a useful foundation for future research in computer vision.
If you're interested in learning more about this research, check out the paper.
Monday, June 15, 2009
Large-scale graph computing at Google
Posted by Grzegorz Czajkowski, Systems Infrastructure Team
If you squint the right way, you will notice that graphs are everywhere. For example, social networks, popularized by Web 2.0, are graphs that describe relationships among people. Transportation routes create a graph of physical connections among geographical locations. Paths of disease outbreaks form a graph, as do games among soccer teams, computer network topologies, and citations among scientific papers. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. Mining the web has become an important branch of information technology, and at least one major Internet company has been founded upon this graph.
Despite differences in structure and origin, many graphs out there have two things in common: each of them keeps growing in size, and there is a seemingly endless number of facts and details people would like to know about each one. Take, for example, geographic locations. A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can be extracted from the graph and made useful — provided you have the right tools and inputs. The web graph is similar. The web contains billions of documents, and that number increases daily. To help you find what you need from that vast amount of information, Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it.
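As a small illustration of the "right tools and inputs" point, the Python sketch below runs one Dijkstra implementation over a toy road graph whose edges carry several attributes; swapping the cost function yields the shortest route by distance or the fastest by time. The graph and all its numbers are made up for illustration. Extracting the more than 200 signals mentioned above poses the same kind of graph computation, just at an enormously larger scale.

```python
import heapq

# Toy road graph: every edge carries several attributes. Which question you
# answer (shortest, fastest, most fuel-efficient) is just a choice of cost
# function over those attributes. All numbers are made up.
ROADS = {
    "A": [("B", {"km": 120, "hours": 1.1, "liters": 9.0}),
          ("C", {"km": 90,  "hours": 1.6, "liters": 6.5})],
    "B": [("D", {"km": 70,  "hours": 0.8, "liters": 5.0})],
    "C": [("D", {"km": 150, "hours": 1.4, "liters": 10.0})],
    "D": [],
}

def dijkstra(graph, source, target, cost):
    """Standard Dijkstra; `cost` maps an edge's attribute dict to a number."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, attrs in graph[u]:
            nd = d + cost(attrs)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

print(dijkstra(ROADS, "A", "D", cost=lambda e: e["km"]))     # shortest
print(dijkstra(ROADS, "A", "D", cost=lambda e: e["hours"]))  # fastest
```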
In order to achieve that, we have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology (experts in parallel processing will recognize that the Bulk Synchronous Parallel Model inspired Pregel).
Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing PageRank, for example, takes only about 15 lines of code. Developers of dozens of Pregel applications within Google have found that "thinking like a vertex," which is the essence of programming in Pregel, is intuitive.
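Pregel's API is not public, so the following is only a single-process Python sketch of the vertex-centric model it describes: in each superstep a vertex absorbs the messages sent to it in the previous superstep, updates its state, and sends messages along its out-edges, with a barrier between supersteps. The core loop comes out near the fifteen lines the post mentions.

```python
def pagerank(vertices, out_edges, supersteps=30, d=0.85):
    """Vertex-centric PageRank sketch: each superstep, a vertex sums the
    messages from the previous superstep, updates its rank, and sends its
    rank, split evenly, along its out-edges. Sinks are ignored for brevity."""
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    inbox = {v: [] for v in vertices}
    for step in range(supersteps):
        outbox = {v: [] for v in vertices}
        for v in vertices:                      # the per-vertex compute()
            if step > 0:
                rank[v] = (1 - d) / n + d * sum(inbox[v])
            targets = out_edges.get(v, [])
            for w in targets:
                outbox[w].append(rank[v] / len(targets))
        inbox = outbox                          # barrier between supersteps
    return rank

# Toy graph: each vertex points at the next, forming a cycle.
print(pagerank({"a", "b", "c"}, {"a": ["b"], "b": ["c"], "c": ["a"]}))
```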
We've been using Pregel internally for a while now, but we are beginning to share information about it outside of Google. Greg Malewicz will be speaking at the joint industrial track between ACM PODC and ACM SPAA this August on the very subject. In case you aren't able to join us there, here's a spoiler: The seven bridges of Königsberg — inspiration for Leonhard Euler's famous theorem that established the basics of graph theory — spanned the Pregel river.
Tuesday, June 9, 2009
Google Fusion Tables
Posted by Alon Halevy, Google Research and Rebecca Shapley, User Experience
Database systems are notorious for being hard to use. It is even more difficult to integrate data from multiple sources and collaborate on large data sets with people outside your organization. Without an easy way to offer all the collaborators access to the same server, data sets get copied, emailed and ftp'd--resulting in multiple versions that get out of sync very quickly.
Today we're introducing Google Fusion Tables on Labs, an experimental system for data management in the cloud. It draws on the expertise of folks within Google Research who have been studying collaboration, data integration, and user requirements from a variety of domains. Fusion Tables is not a traditional database system focusing on complicated SQL queries and transaction processing. Instead, the focus is on fusing data management and collaboration: merging multiple data sources, discussion of the data, querying, visualization, and Web publishing. We plan to iteratively add new features to the system as we get feedback from users.
In the version we're launching today, you can upload tabular data sets (right now, we're supporting up to 100 MB per data set, 250 MB of data per user) and share them with your collaborators or with the world. You can choose to share all of your data with your collaborators, or keep parts of it hidden. You can even share different portions of your data with different collaborators.
When you edit the data in place, your collaborators always get the latest version. The attribution feature means your data will get credit for its contribution to any data set built with it. And yes, you can export your data back out of the cloud as CSV files.
Want to understand your data better? You can filter and aggregate the data, and you can visualize it on Google Maps or with other visualizations from the Google Visualization API. In this example, an intensity map of the world shows countries that won more than 10 gold medals in the Summer Olympics. You can then embed these visualizations in other properties on the Web (e.g., blogs and discussion groups) by simply pasting some HTML code we provide you.
The power of data is truly harnessed when you combine data from multiple sources. For example, consider combining data about access to fresh water in various countries with data about malaria rates in those countries, or, as shown here, three sources of GDP data displayed side by side. Fusion Tables enables you to fuse multiple sets of data when they are about the same entities; in database speak, we call this a join on a primary key, except that here the data originates from multiple independent sources. This is just the start; more join capabilities will come soon.
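As a rough illustration of what fusing two independent tables on a primary key means, the Python sketch below joins a made-up fresh-water table with a made-up malaria table on the country column. Fusion Tables does this through its web interface, not through code; the sketch only shows the idea, and all the values in it are invented.

```python
import csv
import io

# Both tables are made up for illustration; "country" plays the role of the
# primary key on which the two independent sources are fused.
WATER_CSV = """country,pct_with_fresh_water
Brazil,97
India,88
Kenya,59
"""

MALARIA_CSV = """country,malaria_cases_per_1000
Brazil,4
India,13
Kenya,166
"""

def read_table(text, key):
    """Load a CSV string into {key_value: row_dict}."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

def join_on_key(left, right):
    """Inner join: keep keys present in both tables, merging their columns."""
    return {k: {**left[k], **right[k]} for k in left.keys() & right.keys()}

fused = join_on_key(read_table(WATER_CSV, "country"),
                    read_table(MALARIA_CSV, "country"))
for country in sorted(fused):
    row = fused[country]
    print(country, row["pct_with_fresh_water"], row["malaria_cases_per_1000"])
```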
But Fusion Tables doesn't require you and your collaborators to stop there. What if you don't agree on all of the values? Or need to understand the assumptions behind the data better? Fusion Tables enables you to discuss data at different granularity levels -- you can discuss individual rows or columns or even individual cells. If a collaborator with edit permission changes data during the discussion, viewers will see the change as part of the discussion trail.
We hope you find Fusion Tables useful. As usual with first releases, we realize there is much missing, and we look forward to hearing your feedback.
Monday, June 8, 2009
Remembering Rajeev Motwani
Posted by Alfred Spector, VP of Research
Many hundreds of us at Google were fortunate to have been educated, advised, and inspired by Professor Rajeev Motwani. Six of us were his PhD students, and many others (including our founders) were advised by or took courses from him. Other Googlers, who were not students at Stanford, had close collegial relationships with him. But no matter what the relationship, we respected Rajeev as a great man. He was not just a mathematically deep computer scientist, and not just an entrepreneurial computer scientist who catalyzed value at the intersection of his work and the real world; he was also a thoughtful, caring, and honorable friend.
The words of just a few of us speak louder than any summary I can make:
Sergey Brin wrote in his blog, “Officially, Rajeev was not my advisor, and yet he played just as big a role in my research, education, and professional development. In addition to being a brilliant computer scientist, Rajeev was a very kind and amicable person and his door was always open. No matter what was going on with my life or work, I could always stop by his office for an interesting conversation and a friendly smile.”
Zoltan Gyongyi wrote, “Not only a great educator and one of the brightest researchers of his generation, Rajeev was also a catalyst of Silicon Valley innovation--Google itself standing as a proof. Moreover, he was a mentor, colleague, role model, friend to many Googlers. I am utterly unable to find words that would properly express my personal gratitude to him and the weight of this loss.”
Mayur Datar wrote, “I was fortunate to have Rajeev as my PhD advisor for five years at Stanford. Beyond graduation, he often helped me with priceless career guidance and professional help in terms of meetings with other people in Silicon Valley. There are only a handful of people I can think of who are such high caliber academics and entrepreneurs. His contributions and impact on the CS theory community, Stanford CS Dept, and Silicon Valley enterprises and entrepreneurs are unfathomable. I still find it hard to come to terms with this horrible reality. My deepest condolences and prayers go out to his family. He will be fondly remembered and dearly missed by all of us!"
An Zhu wrote, “I am both fortunate and honored to have Rajeev as my PhD advisor. The 5 years at Stanford is very memorable to me. I’m eternally grateful for his advice and support throughout. It is indeed a sad day for many, including his students.”
Alon Halevy wrote, “Rajeev was an inspiration to me and my colleagues on so many levels. As a young graduate student, I remember him working on some of the toughest theoretical computer science problems of the day. Later, his taste for good theory and ability to apply it to practice had a huge impact on various aspects of data management research. As a professor, and now as a Googler, I am awed at the amazing stream of high-caliber students that he mentored. As an entrepreneur, he gave me some generous and well-timed advice. And most of all, as a person, his kindness and willingness to help anyone was a true inspiration.”
Vibhu Mittal wrote, “He was a brilliant researcher and a great professor. And yet the only thing that I can remember right now is that he was a fun, generous, helpful guy who was always willing to sit down and chat for a few minutes. I hope wherever he is, he is still doing it. And I hope there’ll be more people like him in this world to help people like us. I wish his family well — words cannot express what I feel for them.”
Gagan Aggarwal wrote, “I feel extremely fortunate to have had Rajeev as my PhD advisor. He was a wonderful advisor--always very flexible and willing to let his students work at their own pace, while making sure that things are going alright and providing guidance when needed. One of the several striking features of Rajeev's research was his ability to translate real life problems into clean, well-motivated, abstract questions (that he would promptly pose to his students). He was for me an eternal source of fresh problems and great ideas, a source I could tap into whenever my own ideas dried up (and was planning to, just last week). It is impossible to come to terms with the fact that I am never going to do this again. Rajeev had an unmatched clarity of thought and perceptiveness that was evident not only in doing research with him but also in the invaluable advice he gave me about career choices and life in general. ...Rajeev took on many diverse roles: teacher, entrepreneur, advisor and friend, and filled them all as only he could have. His passing will leave an impossible-to-fill void among all those whose lives he touched.”
There are more notes from Googlers, among those of many others, on the Stanford blog commemorating Rajeev.
I’d like to close by noting that Rajeev Motwani’s work at the intersection of theory and practice inspired not only the way Google processes information, but also Google's core scientific values: we fundamentally believe in the power of applying mathematical analysis and algorithmic thinking to challenging real-world problems. This philosophy was inherent in Rajeev’s research, the education he gave PhD students, and the advice and classes he provided to many more.
With his untimely death, and those of other influential computer scientists and friends, we are all reminded to seize each day and make the most of it. I think Rajeev would have wanted us to keep this in mind.