The explosive growth of video on the internet, exemplified by the success of video sharing websites such as YouTube, calls for new ways of indexing and searching audiovisual content. A team of European researchers has developed a groundbreaking solution that is finding commercial applications.
Most video search technologies currently rely on semantic annotation, in which videos have to be manually tagged with keywords so they can be found via a text-based search. As most YouTube users will attest, tagging one or two videos in this way is not particularly problematic. However, manually annotating thousands of clips, as content providers and media libraries regularly do, can be extremely time-consuming and costly.
A faster alternative is to use software to automatically extract snippets of a video and create a unique identifier based on a variety of audiovisual features, such as scene, motion and music changes. These so-called digital media fingerprints can then be used to index and search full audio/video content. The technology works well for uncompressed, raw audio and video, but it has not been used effectively with the far more common, space-saving compressed files that are streamed from websites, stored in media libraries or broadcast by TV stations. Until now, that is.
“We wanted to develop a way of indexing and searching compressed video files quickly and easily regardless of their compression format or how or where they are stored,” says Nick Achilleopoulos, who oversaw development of the technology as manager of the EU-funded DIVAS project.
To achieve that goal, the DIVAS researchers developed two advanced software engines: one to create fingerprints from compressed audio and/or video and another to use these unique identifiers to carry out content-based searches of audiovisual material.
Unlike most digital fingerprinting systems, the DIVAS indexing software does not require video to be decompressed first, which reduces the need for processing power and storage space and greatly accelerates indexing. For example, whereas other systems would have to generate a fingerprint from 60 gigabytes of raw video, the DIVAS technology can create one from the 4-gigabyte DVD-quality compressed version, about one-fifteenth as much data. Crucially, it works across most popular video formats, from the MPEG standard used for DVDs and TV broadcasts to Microsoft’s WMV, and also with standalone audio files in formats such as MP3 and AAC.
“The fingerprint extraction software defines audio and video features much as a human viewer perceives audiovisual elements… It builds the fingerprint based on visual features, such as scene changes, the way the camera cuts and moves, the brightness level, and the movement of people and objects,” the project manager explains.
Audio features such as speech and music also form part of the fingerprint – providing crucial additional information to differentiate between visually similar video content like lectures or music concerts.
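To make the idea of a feature-based fingerprint concrete, here is a minimal sketch in Python. It is not the DIVAS algorithm, which works directly on compressed streams and is not described in detail here. It assumes, purely for illustration, that a mean brightness value per frame and an audio energy value per time step are already available. The function names and the threshold are hypothetical.

```python
from typing import List, Tuple


def scene_cuts(brightness: List[float], threshold: float = 30.0) -> List[int]:
    """Return frame indices where mean brightness jumps by more than threshold."""
    return [
        i
        for i in range(1, len(brightness))
        if abs(brightness[i] - brightness[i - 1]) > threshold
    ]


def fingerprint(
    brightness: List[float],
    audio_energy: List[float],
    threshold: float = 30.0,
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Combine a visual and an audio signature into one compact fingerprint.

    Illustrative assumption: the inputs are already-decoded per-frame values,
    whereas the system described in the article reads compressed files.
    """
    cuts = scene_cuts(brightness, threshold)
    # Visual part: gaps (in frames) between consecutive scene cuts.
    gaps = tuple(b - a for a, b in zip(cuts, cuts[1:]))
    # Audio part: 1 where energy rises versus the previous step, else 0.
    audio_bits = tuple(int(b > a) for a, b in zip(audio_energy, audio_energy[1:]))
    return gaps, audio_bits


if __name__ == "__main__":
    frames = [10, 12, 11, 90, 92, 91, 20, 22, 200]
    energy = [0.1, 0.5, 0.4, 0.9]
    print(fingerprint(frames, energy))
```

Two lectures filmed from a fixed camera would produce almost identical visual gaps in a scheme like this, which is why the audio component matters for telling them apart.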
The audiovisual fingerprints, each just a tiny fraction of the size of the original content, are stored in the XML file format in combination with the MPEG-7 multimedia content description standard, creating an easily accessible and rapidly searchable video index.
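As a rough illustration of what storing a fingerprint as XML could look like, the sketch below writes and reads a small record with Python's standard library. The element and attribute names (`Fingerprint`, `SceneGaps`, `title`) are invented for this example. They are not taken from the MPEG-7 schema or from the DIVAS file format, neither of which is reproduced here.

```python
import xml.etree.ElementTree as ET
from typing import List


def fingerprint_to_xml(title: str, scene_gaps: List[int]) -> str:
    """Serialise a fingerprint to a small XML string (hypothetical layout)."""
    root = ET.Element("Fingerprint", {"title": title})
    visual = ET.SubElement(root, "SceneGaps")
    visual.text = " ".join(str(gap) for gap in scene_gaps)
    return ET.tostring(root, encoding="unicode")


def xml_to_scene_gaps(xml_text: str) -> List[int]:
    """Parse the XML produced above back into a list of integers."""
    root = ET.fromstring(xml_text)
    text = root.findtext("SceneGaps") or ""
    return [int(token) for token in text.split()]


if __name__ == "__main__":
    record = fingerprint_to_xml("Demo clip", [3, 2, 17])
    print(record)
    print(xml_to_scene_gaps(record))
```

A text format of this kind keeps the index readable by ordinary XML tools, at the cost of some space compared with a binary encoding.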
“Say you saw a short clip of a TV series and wanted to see more of it but did not know the name. You could easily upload the clip to a DIVAS search engine and then use this to find not only the series, but also the season, episode and the exact minute of a scene the clip is from,” Achilleopoulos explains.
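The lookup Achilleopoulos describes can be sketched as a search for a short signature inside longer, indexed ones. The example below is a simplification under stated assumptions: each episode is represented by one integer per second of video, the clip must match exactly, and the index is a plain dictionary. The names `EpisodeId` and `find_clip` are hypothetical, and a real system would need to tolerate noise and re-encoding.

```python
from typing import Dict, List, Optional, Tuple

EpisodeId = Tuple[str, int, int]  # (series, season, episode)


def find_clip(
    index: Dict[EpisodeId, List[int]], clip: List[int]
) -> Optional[Tuple[EpisodeId, int]]:
    """Return (episode id, offset in seconds) of the first exact match, or None."""
    n = len(clip)
    if n == 0:
        return None
    for episode, signature in index.items():
        # Slide the clip along the episode signature one second at a time.
        for start in range(len(signature) - n + 1):
            if signature[start:start + n] == clip:
                return episode, start
    return None


if __name__ == "__main__":
    index = {
        ("Show", 1, 1): [5, 3, 8, 1, 4, 7],
        ("Show", 1, 2): [2, 9, 6, 1, 4, 4, 8],
    }
    match = find_clip(index, [6, 1, 4, 4])
    if match is not None:
        (series, season, episode), offset = match
        print(series, season, episode, "minute", offset // 60)
```

Because the match returns an offset as well as an episode, the same lookup yields both which programme the clip comes from and where in it the scene occurs.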
One caveat, however, is that the searcher would need an indexed database of video content against which to compare the fingerprinted clip. That makes the approach useful to someone with a large collection of digital movies who wants to find a title in that collection from a trailer seen on the internet. Indeed, the DIVAS team developed an experimental plug-in for the Firefox web browser to that effect.
However, the key commercial market for the technology consists of media companies and internet search providers seeking faster methods of indexing and searching video, production companies scouring the internet for pirated versions of copyrighted works and, interestingly, TV advertisers.
“A lot of companies are interested in monitoring broadcasts to make sure TV stations are airing their adverts in the time slots and with the frequency they pay for. Currently, they do this by recording broadcasts on expensive equipment and even have people watch the TV, but a much cheaper alternative would be to record compressed files and have software automatically create fingerprints of the content. These could then be matched with the advertiser’s content, letting them know precisely when and how often their adverts are shown,” Achilleopoulos says.
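The advert-monitoring scenario amounts to counting how often one fingerprint appears in a long recorded stream, and when. A minimal sketch follows, assuming one integer signature value per second and allowing a small number of mismatches to stand in for broadcast noise. The function name `find_airings` and the tolerance parameter are hypothetical and not part of DIVAS.

```python
from typing import List


def find_airings(
    broadcast: List[int], advert: List[int], max_mismatches: int = 0
) -> List[int]:
    """Return start times (in seconds) at which the advert appears in the broadcast."""
    n = len(advert)
    airings: List[int] = []
    start = 0
    while n > 0 and start + n <= len(broadcast):
        window = broadcast[start:start + n]
        mismatches = sum(1 for a, b in zip(window, advert) if a != b)
        if mismatches <= max_mismatches:
            airings.append(start)
            start += n  # skip past this airing so it is counted only once
        else:
            start += 1
    return airings


if __name__ == "__main__":
    recorded = [0, 1, 7, 7, 2, 0, 5, 7, 7, 2, 3, 7, 9, 2]
    advert = [7, 7, 2]
    print(find_airings(recorded, advert))
    print(find_airings(recorded, advert, max_mismatches=1))
```

The length of the returned list gives the airing frequency, and the start times can be checked against the slots the advertiser paid for.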
Members of the DIVAS consortium are currently in talks with a large advertising firm with a view to deploying the technology commercially to monitor TV broadcasts. They have also been approached by companies looking to use their technology to improve internet searches of their video databases.
Israeli project partner Optibase, meanwhile, has integrated the DIVAS technology into its EZTV internet video delivery system for corporate, government and educational users in local and wide area network environments.
Achilleopoulos notes that the partners are looking for investors to help develop or support additional commercial applications for the DIVAS technology, which, as a modular system, is easy to integrate with existing systems and can be expanded to offer additional functionality.