
Tuesday, November 26, 2013

Released Data Set: Features Extracted From YouTube Videos for Multiview Learning


“If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.”

Performance of machine learning algorithms, supervised or unsupervised, is often significantly enhanced when a variety of feature families, or multiple views of the data, are available. For example, in the case of web pages, one feature family can be based on the words appearing on the page, and another on the URLs and related connectivity properties. Similarly, videos contain both audio and visual signals, and each modality can in turn be analyzed in a variety of ways. For instance, the visual stream can be analyzed in terms of color and edge distribution, texture, motion, object types, and so on. YouTube videos are also associated with textual information (title, tags, comments, etc.). Each feature family complements the others in providing signals for a prediction or classification task, for example, automatically classifying videos into subject areas such as sports, music, comedy, or games.

We have released a dataset of over 100,000 feature vectors extracted from public YouTube videos. These videos are labeled with one of 30 classes, each class corresponding to a video game (with some amount of class noise): each video shows gameplay of a video game, for teaching purposes for example. Each instance (video) is described by three feature families (textual, visual, and auditory), and each family is broken into subfamilies, yielding up to 13 feature types per instance. Neither video identities nor class identities are released.
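
To make the multiview setup concrete, here is a minimal late-fusion sketch on synthetic data. The class count matches the dataset, but the view dimensions, the nearest-centroid models, and all numbers below are illustrative stand-ins, not anything released with the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the released data: three views (textual, visual,
# auditory) of the same instances and 30 game classes. Dimensions are made up.
n_per_class, n_classes = 20, 30
view_dims = {"text": 40, "visual": 60, "audio": 30}

labels = np.repeat(np.arange(n_classes), n_per_class)
views = {}
for name, dim in view_dims.items():
    centers = rng.normal(size=(n_classes, dim))
    views[name] = centers[labels] + 0.8 * rng.normal(size=(labels.size, dim))

def centroid_scores(X_train, y_train, X_test):
    """Negative squared distance to each class centroid (higher = closer)."""
    centroids = np.stack([X_train[y_train == c].mean(axis=0)
                          for c in range(n_classes)])
    return -((X_test[:, None, :] - centroids[None]) ** 2).sum(axis=-1)

# Late fusion: sum the per-view scores, then pick the best class.
order = rng.permutation(labels.size)
train, test = order[: labels.size // 2], order[labels.size // 2:]
fused = sum(centroid_scores(X[train], labels[train], X[test])
            for X in views.values())
accuracy = (fused.argmax(axis=1) == labels[test]).mean()
```

Each view alone carries a predictive signal; summing the per-view scores is the simplest form of classifier fusion mentioned above.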

We hope that this dataset will be valuable for research on a variety of multiview-related machine learning topics, including multiview clustering, co-training, active learning, and classifier fusion and ensembles.

The data and more information can be obtained from the UCI machine learning repository (multiview video dataset), or from here.

Friday, July 27, 2012

New Challenges in Computer Science Research



Yesterday afternoon at the 2012 Computer Science Faculty Summit, there was a round of lightning talks addressing some of the research problems faced by Google across several domains. The talks pointed out some of the biggest challenges emerging from increasing digital interaction, which is this year’s Faculty Summit theme.

Research Scientist Vivek Kwatra kicked things off with a talk about video stabilization on YouTube. The popularity of mobile devices with cameras has led to an explosion in the amount of video people capture, which can often be shaky. Vivek and his team have found algorithmic approaches to make casual videos look more professional by simulating professional camera moves. Their stabilization technology vastly improves the quality of amateur footage.

Next, Ed Chi (Research Scientist) talked about social media focusing on the experimental circle model that characterizes Google+. Ed is particularly interested in how social interaction on the web can be designed to mimic live communication. Circles on Google+ allow a user to manage their audience and share content in a targeted fashion, which reflects face-to-face interaction. Ed discussed how, from an HCI perspective, the challenge going forward is the need to consider the trinity of social media: context, audience, content.

John Wilkes, Principal Software Engineer, talked about cluster management at Google and the challenges of building a new cluster manager -- that is, an operating system for a fleet of machines. Everything at Google is big, and a consequence of operating at such tremendous scale is that machines are bound to fail. John’s team is working to make things easier for internal users, enabling the system to respond to more requests. There are several hard problems in this domain, such as configuration issues, making it as easy as possible to run a binary, increasing failure tolerance, and helping internal users understand their own needs as well as the behavior and performance of their systems in our complicated distributed environment.

Research Scientist and coffee connoisseur Alon Halevy took to the podium to confirm that he did indeed author an empirical book on coffee, and also talked with attendees about structured data on the web. Structured data comprises hundreds of millions of (relatively small) tables of data, and Alon’s work is focused on enabling data enthusiasts to discover and visualize those data sets. Great possibilities open up when people start combining data sets in meaningful ways, which inspired the creation of Fusion Tables. An example is a map made in the aftermath of the 2011 earthquake and tsunami in Japan, which shows natural disaster data alongside the locations of the world’s nuclear plants. Moving forward, Alon’s team will continue to think about interesting things that can be done with data, and the techniques needed to distinguish good data from bad.

To wrap up the session, Praveen Paritosh did a brief but deep dive into the Knowledge Graph, an intelligent model that understands real-world entities and their relationships to one another -- things, not strings -- which launched earlier this year.

The Google Faculty Summit continued today with more talks, and breakout sessions centered on our theme of digital interaction. Check back for additional blog posts in the coming days.

Friday, May 4, 2012

Video Stabilization on YouTube



One thing we have been working on within Research at Google is developing methods for making casual videos look more professional, thereby providing users with a better viewing experience. Professional videos have several characteristics that differentiate them from casually shot videos. For example, in order to tell a story, cinematographers carefully control lighting and exposure and use specialized equipment to plan camera movement.

We have developed a technique that mimics professional camera moves and applies them to videos recorded by hand-held devices. Cinematographers use specialized equipment such as tripods and dollies to plan their camera paths and hold them steady. In contrast, think of a video you shot using a mobile phone camera. How steady was your hand and were you able to anticipate an interesting moment and smoothly pan the camera to capture that moment? To bridge these differences, we propose an algorithm that automatically determines the best camera path and recasts the video as if it were filmed using stabilization equipment. Specifically, we divide the original, shaky camera path into a set of segments, each approximated by either a constant, linear or parabolic motion of the camera. Our optimization finds the best of all possible partitions using a computationally efficient and stable algorithm. For details, check out our earlier blog post or read our paper, Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths, published in IEEE CVPR 2011.
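
The following toy 1-D sketch illustrates the general idea of re-rendering a video along a smoother camera path. It is not the paper's L1 optimization (which fits constant, linear, and parabolic segments via an efficient solver); a simple low-pass filter stands in for it, and all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# A smooth intended pan (in pixels) corrupted by simulated hand shake.
frames = np.arange(200)
intended = 0.002 * frames ** 2
shaky = intended + rng.normal(scale=3.0, size=frames.size)

def smooth_path(path, radius=10):
    """Moving-average smoothing of a 1-D camera path."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    return np.convolve(np.pad(path, radius, mode="edge"), kernel, mode="valid")

stabilized = smooth_path(shaky)
crop_offsets = stabilized - shaky   # per-frame shift applied when re-rendering

jitter_before = np.abs(np.diff(shaky)).mean()
jitter_after = np.abs(np.diff(stabilized)).mean()
```

The per-frame offsets between the original and smoothed paths correspond to the crop-window motion that a stabilizer applies to each frame.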

The next time you upload your videos to YouTube, try stabilizing them by going to the YouTube editor or directly from the video manager by clicking on Edit->Enhancements. For even more convenience, YouTube will automatically detect if your video needs stabilization and offer to do it for you. Many videos on YouTube have already been enhanced using this technology.

More recently, we have been working on a related problem common in videos shot from mobile phones. The camera sensors in these phones contain what is known as an electronic rolling shutter. When taking a picture with a rolling shutter camera, the image is not captured instantaneously. Instead, the camera captures the image one row of pixels at a time, with a small delay when going from one row to the next. Consequently, if the camera moves during capture, it will cause image distortions ranging from shear in the case of low-frequency motions (for instance an image captured from a driving car) to wobbly distortions in the case of high-frequency perturbations (think of a person walking while recording video). These distortions are especially noticeable in videos where the camera shake is independent across frames. For example, take a look at the video below.
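
A toy simulation can make the row-by-row capture concrete: under a constant horizontal camera motion, later rows are shifted further, shearing a vertical line into a diagonal. The delay and speed parameters below are made up for illustration.

```python
import numpy as np

# Simulated rolling-shutter shear: each row is read out a little later
# than the previous one, so constant motion shifts lower rows further.
height, width = 8, 16
row_delay = 0.001        # seconds between consecutive row readouts (made up)
camera_speed = 500.0     # horizontal camera motion in pixels per second

image = np.zeros((height, width))
image[:, width // 2] = 1.0   # a vertical line in the scene

def rolling_shutter(img, delay, speed):
    out = np.zeros_like(img)
    for r in range(img.shape[0]):
        shift = int(round(speed * delay * r))  # later rows shift more
        out[r] = np.roll(img[r], shift)
    return out

distorted = rolling_shutter(image, row_delay, camera_speed)
line_cols = distorted.argmax(axis=1)  # the line is now sheared diagonally
```

Low-frequency motion like this produces shear; high-frequency shake applies a different shift per row within each frame, producing the wobbly distortions mentioned above.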


Original video with rolling shutter distortions


In our recent paper titled Calibration-Free Rolling Shutter Removal, which was awarded the best paper at IEEE ICCP 2012, we demonstrate a solution to correct these rolling shutter distortions in videos. A significant feature of our approach is that it does not require any knowledge of the camera used to shoot the video. The time delay in capturing two consecutive rows that we mention above is in fact different for every camera and affects the extent of distortions. Having knowledge of this delay parameter can be useful, but difficult to obtain or estimate via calibration. Imagine a video that is already uploaded to YouTube -- it will be challenging to obtain this parameter! Instead, we show that just the visual data in the video has enough information to appropriately describe and compensate for the distortions caused by the camera motion, even in the presence of a rolling shutter. For more information, see the narrated video description of our paper.

This technique is already integrated with the YouTube stabilizer. Starting today, if you stabilize a video from a mobile phone or other rolling shutter cameras, we will also automatically compensate for rolling shutter distortions. To see our technique in action, check out the video below, obtained after applying rolling shutter compensation and stabilization to the one above.


After stabilization and rolling shutter removal


Monday, March 19, 2012

Gamification for Improved Search Ranking for YouTube Topics



In earlier posts we discussed automatic ways to find the most talented emerging singers and the funniest videos using the YouTube Slam experiment. We created five “house” slams -- music, dance, comedy, bizarre, and cute -- which produce a weekly leaderboard not just of videos but also of YouTubers who are great at predicting what the masses will like. For example, last week’s cute slam winning video claims to be the cutest kitten in the world, beating out four other kittens, two puppies, three toddlers and an amazing duck who feeds the fish. With a whopping 620 slam points, YouTube user emoatali99 was our best connoisseur of cute this week. On the music side, it is no surprise that many of music slam’s top 10 videos were Adele covers. A Whitney Houston cover came out at the top this week, and music slam’s resident expert on talent had more than a thousand slam points. Well done! Check out the rest of the leaderboards for cute slam and music slam.

Can slam-style game mechanics incentivize our users to help improve the ranking of videos -- not just for these five house slams -- but for millions of other search queries and topics on YouTube? Gamification has previously been used to incentivize users to participate in non-game tasks such as image labeling and music tagging. How many votes and voters would we need for slam to do better than the existing ranking algorithm for topic search on YouTube?

As an experiment, we created new slams for a small number of YouTube topics (such as Latte Art Slam and Speed Painting Slam) using existing top 20 videos for these topics as the candidate pool. As we accumulated user votes, we evaluated the resulting YouTube Slam leaderboard for that topic vs the existing ranking on youtube.com/topics (baseline). Note that both the slam leaderboard and the baseline had the same set of videos, just in a different order.

What did we discover? It was no surprise that slam ranking performance had high variance in the beginning and gradually improved as votes accumulated. We are happy to report that four of the five topic slams converged within 1,000 votes to a better leaderboard ranking than the existing YouTube topic search. In spite of the small number of voters, Slam achieves better ranking partly because of gamification incentives and partly because it is based on machine learning, using:

  1. Preference judgement over a pair, not absolute judgement on a single video, and,

  2. Active solicitation of user opinion as opposed to passive observation. Due to what is called a “cold start” problem in data modeling, conventional (passive observation) techniques don’t work well on new items with little prior information. For any given topic, Slam’s improvement over the baseline in ranking of the “recent 20” set of videos was in fact better than the improvement in ranking of the “top 20” set.
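
A minimal sketch of turning such pairwise preference votes into a leaderboard: this uses a generic Elo-style rating update, not Slam's actual scoring model, and the video names are made up.

```python
def elo_leaderboard(votes, k=32.0):
    """Rank items from (winner, loser) vote pairs via Elo updates."""
    ratings = {}
    for winner, loser in votes:
        rw = ratings.setdefault(winner, 1000.0)
        rl = ratings.setdefault(loser, 1000.0)
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        ratings[winner] = rw + k * (1.0 - expected_w)
        ratings[loser] = rl - k * (1.0 - expected_w)
    return sorted(ratings, key=ratings.get, reverse=True)

votes = [("kitten", "puppy"), ("kitten", "duck"), ("duck", "puppy"),
         ("kitten", "puppy"), ("duck", "toddler")]
board = elo_leaderboard(votes)
```

Because each vote is a preference judgment over a pair, a handful of votes already induces a useful total order, which is part of why a small voter pool can beat a passive baseline.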

Demographics and interests of the voters do affect slam leaderboard ranking, especially when the voter pool is small. An example is a Romantic Proposals Slam we featured on Valentine’s day last month. Men thought this proposal during a Kansas City Royals game was the most romantic, although this one where the man pretends to fall off a building came close. On the other hand, women rated this meme proposal in a restaurant as the best, followed by this movie theater proposal.

Encouraged by these results, we will soon be exploring slams for a few thousand topics to evaluate the utility of gamification techniques for YouTube topic search. Here are some of them: Chocolate Brownie, Paper Plane, Bush Flying, Stealth Technology, Stencil Graffiti, and Yosemite National Park.

Have fun slamming!

Thursday, February 9, 2012

Quantifying comedy on YouTube: why the number of o’s in your LOL matters



In a previous post, we talked about quantification of musical talent using machine learning on acoustic features for YouTube Music Slam. We wondered if we could do the same for funny videos, i.e. answer questions such as: is a video funny, how funny do viewers think it is, and why is it funny? We noticed a few audiovisual patterns across comedy videos on YouTube, such as shaky camera motion or audible laughter, which we can automatically detect. While content-based features worked well for music, identifying humor from such features alone is AI-complete: humor preference is subjective, perhaps even more so than musical taste.

 Fortunately, at YouTube, we have more to work with. We focused on videos uploaded in the comedy category. We captured the uploader’s belief in the funniness of their video via features based on title, description and tags. Viewers’ reactions, in the form of comments, further validate a video’s comedic value. To this end we computed more text features based on words associated with amusement in comments. These included (a) sounds associated with laughter such as hahaha, with culture-dependent variants such as hehehe, jajaja, kekeke, (b) web acronyms such as lol, lmao, rofl, (c) funny and synonyms of funny, and (d) emoticons such as :), ;-), xP. We then trained classifiers to identify funny videos and then tell us why they are funny by categorizing them into genres such as “funny pets”, “spoofs or parodies”, “standup”, “pranks”, and “funny commercials”.
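
Features of this kind can be sketched with simple regular expressions; the word lists and patterns below are illustrative, not the actual feature set used.

```python
import re

# Illustrative amusement features over comment text: laughter sounds,
# web acronyms, the word "funny", and emoticons.
LAUGHTER = re.compile(r"\b(?:(?:ha){2,}h?|(?:he){2,}h?|(?:ja){2,}|(?:ke){2,})\b", re.I)
ACRONYMS = re.compile(r"\b(?:lo+l|lmao+|rofl)\b", re.I)
FUNNY = re.compile(r"\bfunn(?:y|iest)\b", re.I)
EMOTICON = re.compile(r"(?::\)|;-\))")

def amusement_features(comment):
    """Count occurrences of each amusement-word category in a comment."""
    return {
        "laughter": len(LAUGHTER.findall(comment)),
        "acronym": len(ACRONYMS.findall(comment)),
        "funny": len(FUNNY.findall(comment)),
        "emoticon": len(EMOTICON.findall(comment)),
    }

feats = amusement_features("hahaha that was the funniest thing lol :)")
```

Counts like these, aggregated over a video's comments, become inputs to the classifiers described above.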

 Next we needed an algorithm to rank these funny videos by comedic potential, e.g. is “Charlie bit my finger” funnier than “David after dentist”? Raw view count on its own is insufficient as a ranking metric, since it is biased by video age and exposure. We noticed that viewers emphasize their reaction to funny videos in several ways: e.g. capitalization (LOL), elongation (loooooool), repetition (lolololol), exclamation (lolllll!!!!!), and combinations thereof. If a user writes a “loooooool” vs. a “loool”, does it mean they were more amused? We designed features to quantify the degree of emphasis on words associated with amusement in viewer comments. We then trained a passive-aggressive ranking algorithm using human-annotated pairwise ground truth and a combination of text and audiovisual features. Similar to Music Slam, we used this ranker to populate candidates for human voting for our Comedy Slam.
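
A minimal sketch of scoring the degree of emphasis in a single token (exclamation, capitalization, elongation, repetition); the scoring weights are arbitrary and purely illustrative.

```python
import re

def emphasis_score(token):
    """Heuristic emphasis score: higher means a more emphatic token."""
    score = 0
    base = token.rstrip("!")
    score += len(token) - len(base)          # exclamation: lol!!! -> +3
    if base.isupper() and len(base) > 1:
        score += 2                           # capitalization: LOL
    # Elongation: each extra repeated character adds 1 (loooool -> +4).
    for m in re.finditer(r"(.)\1+", base):
        score += len(m.group(0)) - 1
    # Repetition of a multi-character unit (hahahaha = "ha" x 4 -> +3).
    m = re.fullmatch(r"(.{2,}?)\1+", base)
    if m:
        score += len(base) // len(m.group(1)) - 1
    return score
```

Applied to amusement words in comments, a score like this turns “loooooool” vs. “loool” into a usable ranking feature.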

 So far, more than 75,000 people have cast more than 700,000 votes, making comedy our most popular slam category. Give it a try!

Further reading:
  1. “Opinion Mining and Sentiment Analysis,” by Bo Pang and Lillian Lee.
  2. “A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews,” by Oren Tsur, Dmitry Davidov, and Ari Rappoport.
  3. “That’s What She Said: Double Entendre Identification,” by Chloe Kiddon and Yuriy Brun.

Wednesday, November 2, 2011

Discovering Talented Musicians with Acoustic Analysis



In an earlier post we talked about the technology behind Instant Mix for Music Beta by Google. Instant Mix uses machine hearing to characterize music attributes such as timbre, mood and tempo. Today we would like to talk about acoustic and visual analysis -- this time on YouTube. A fundamental part of YouTube's mission is to allow anyone anywhere to showcase their talents -- occasionally leading to life-changing success -- but many talented performers are never discovered. Part of the problem is the sheer volume of videos: forty-eight hours of video are uploaded to YouTube every minute (that’s eight years of content every day). We wondered if we could use acoustic analysis and machine learning to pore over these videos and automatically identify talented musicians.

First we analyzed audio and visual features of videos being uploaded. We wanted to find “singing at home” videos -- often correlated with features such as ambient indoor lighting, head-and-shoulders view of a person singing in front of a fixed camera, few instruments and often a single dominant voice. Here’s a sample set of videos we found.



Then we estimated the quality of singing in each video. Our approach is based on acoustic analysis similar to that used by Instant Mix, coupled with a small set of singing quality annotations from human raters. Given these data we used machine learning to build a ranker that predicts if an average listener would like a performance.
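
One minimal way to learn a ranker from a small set of pairwise human judgments is a perceptron on feature differences; the synthetic features, the hidden "listener preference" vector, and the model below are hypothetical stand-ins, not the system described above.

```python
import numpy as np

rng = np.random.default_rng(2)

# A hidden preference vector generates pairwise judgments over synthetic
# acoustic feature vectors; we recover a consistent ranker from the pairs.
dim = 5
w_true = rng.normal(size=dim)

pairs = []
while len(pairs) < 200:
    a, b = rng.normal(size=(2, dim))
    margin = w_true @ a - w_true @ b
    if abs(margin) > 0.5:                 # keep only clear-cut judgments
        pairs.append((a, b) if margin > 0 else (b, a))

w = np.zeros(dim)
for _ in range(20):                       # perceptron epochs
    for better, worse in pairs:
        if w @ (better - worse) <= 0:     # misranked pair: update
            w += better - worse

agreement = np.mean([float(w @ (p - q) > 0) for p, q in pairs])
```

The learned weight vector then scores any new performance, which is enough to predict which of two videos an average listener would prefer.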

While machines are useful for weeding through thousands of not-so-great videos to find potential stars, we know they alone can't pick the next great star. So we turn to YouTube users to help us identify the real hidden gems by playing a voting game called YouTube Slam. We're putting an equal amount of effort into the game itself -- how do people vote? What makes it fun? How do we know when we have a true hit? We're looking forward to your feedback to help us refine this process: give it a try*. You can also check out singer and voter leaderboards. Toggle “All time” to “Last week” to find emerging talent in fresh videos or all-time favorites.

Our “Music Slam” has only been running for a few weeks and we have already found some very talented musicians. Many of the videos have fewer than 100 views when we find them.



And while we're excited about what we've done with music, there's as much undiscovered potential in almost any subject you can think of. Try our other slams: cute, bizarre, comedy, and dance*. Enjoy!

Related work by Google Researchers:
“Video2Text: Learning to Annotate Video Content”, Hrishikesh Aradhye, George Toderici, Jay Yagnik, ICDM Workshop on Internet Multimedia Mining, 2009.

* Music and dance slams are currently available only in the US.

Monday, June 20, 2011

Auto-Directed Video Stabilization with Robust L1 Optimal Camera Paths



Earlier this year, we announced the launch of new features on the YouTube Video Editor, including stabilization for shaky videos, with the ability to preview them in real-time. The core technology behind this feature is detailed in this paper, which will be presented at the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2011).

Casually shot videos captured by handheld or mobile cameras suffer from a significant amount of shake. Existing in-camera stabilization methods dampen high-frequency jitter but do not suppress low-frequency movements and bounces, such as those observed in videos captured by a walking person. In contrast, professionally shot videos usually rely on carefully designed camera configurations, specialized equipment such as tripods or camera dollies, and ease-in and ease-out transitions. Our goal was to devise a completely automatic method for converting casual shaky footage into more pleasant and professional-looking videos.



Our technique mimics the cinematographic principles outlined above by automatically determining the best camera path using a robust optimization technique. The original, shaky camera path is divided into a set of segments, each approximated by either a constant, linear or parabolic motion. Our optimization finds the best of all possible partitions using a computationally efficient and stable algorithm.

To achieve real-time performance on the web, we distribute the computation across multiple machines in the cloud. This enables us to provide users with a real-time preview and interactive control of the stabilized result. Above we provide a video demonstration of how to use this feature on the YouTube Editor. We will also demo this live at Google’s exhibition booth in CVPR 2011.

For further details, please read our paper.