Achievement Award

Research and Development of Extremely Robust and Fast Media Search Technology

Kunio KASHINO, Hidehisa NAGANO, Takayuki KUROZUMI

  Today, vast quantities of audio and video signals are being collected, stored and distributed every day. An essential task is searching for desired information embedded in potentially unlabeled signals. The basis of this searching is identification of signals, with a stored dictionary and appropriate distance measures. However, unlike the case of text search, quickly searching huge amounts of stored audio or video signals had not been a simple task until about 15 years ago. This was particularly because those signals tend to significantly vary or fluctuate due to noise, distortion, compression or editing.
   The recipients solved this problem by inventing a series of techniques called Robust Media Search (RMS). RMS enables extremely robust identification of audio and video signals, even if they are severely altered due to noise, distortion, compression, or editing. The technology is characterized by (1) representing spatio-temporal local regions of signals with binarized or coarsely quantized features, and (2) quickly spotting any consistently aligned parts of signals through combinatorial use of those localized or region-based features.
   These ideas are now well known and widely used. When the recipients invented the basic RMS scheme in the early 2000s, however, their methods were unique in what was then an unexplored field: at that time, the features for each point in time were usually calculated from the whole signal rather than from local regions.
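   The two characteristics described above can be illustrated with a minimal sketch. It is not the recipients' algorithm: the binary comparison feature, the region size, the inverted index, and the offset voting are all assumptions chosen only to show how coarsely quantized local-region features can be combined to spot a consistently aligned segment. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def region_codes(frames, bands_per_region=3):
    """Yield (time, key) pairs, one per small band-by-time region.

    Hypothetical feature: each region spans a few adjacent bands over two
    consecutive frames and is reduced to a handful of comparison bits.
    """
    for t in range(len(frames) - 1):
        cur, nxt = frames[t], frames[t + 1]
        for b in range(len(cur) - bands_per_region + 1):
            a = cur[b:b + bands_per_region]
            c = nxt[b:b + bands_per_region]
            bits = (
                [int(x > y) for x, y in zip(a, a[1:])]    # across bands, frame t
                + [int(x > y) for x, y in zip(c, c[1:])]  # across bands, frame t+1
                + [int(y > x) for x, y in zip(a, c)]      # across time, per band
            )
            yield t, (b, tuple(bits))


def build_index(frames):
    """Inverted index: region key -> list of database times where it occurs."""
    index = defaultdict(list)
    for t, key in region_codes(frames):
        index[key].append(t)
    return index


def find_offset(index, query_frames):
    """Vote for the database offset at which the most regions line up."""
    votes = Counter()
    for q, key in region_codes(query_frames):
        for d in index.get(key, ()):
            votes[d - q] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]  # (best offset, number of agreeing regions)


def demo(seed=0):
    """Plant a noisy 40-frame excerpt taken from frame 200 and search for it."""
    rng = random.Random(seed)
    database = [[rng.random() for _ in range(8)] for _ in range(500)]
    start = 200
    query = [[x + rng.gauss(0.0, 0.02) for x in frame]
             for frame in database[start:start + 40]]
    return find_offset(build_index(database), query)
```

   In this toy setting, noise corrupts some region codes, but the regions that survive all vote for the same offset, while chance matches scatter across many offsets. That agreement among many small, individually weak features is what the sketch is meant to convey.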
   With the inventions introduced above, together with the further technical development accumulated since, very robust and fast signal search has been realized. For example, with RMS, a single PC can search a whole database containing some tens of thousands of hours' worth of audio or video features in less than 1 second, or detect specified music in an audio signal even if the music is used in the background of a louder main sound and its SNR is as low as -20 dB.
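   To put the -20 dB figure in perspective, the short calculation below converts a decibel value into linear ratios. It assumes the standard definition of SNR as ten times the base-10 logarithm of a power ratio, with the target music as the "signal" and the main sound as the "noise"; the citation itself does not spell out the definition. Under that assumption, -20 dB corresponds to the music carrying 1/100 of the power, or 1/10 of the amplitude, of the sound that masks it.

```python
def power_ratio(snr_db):
    """Signal-to-noise power ratio for an SNR given in decibels."""
    return 10.0 ** (snr_db / 10.0)


def amplitude_ratio(snr_db):
    """Signal-to-noise amplitude ratio for an SNR given in decibels."""
    return 10.0 ** (snr_db / 20.0)


if __name__ == "__main__":
    print(power_ratio(-20.0), amplitude_ratio(-20.0))
```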
   Today, with the ever-increasing amount of accessible media data stored worldwide, the recipients' technology is becoming widely deployed. For example, for copyright management purposes, the music fragments used in TV or radio programs are being identified with RMS, using a music database. This was a revolutionary solution in the area of copyright management, since RMS enabled automatic recognition of music, including very short or background uses, and copyright management efficiency has been greatly improved. Another example is video posting sites, where RMS is used to prevent illicit use of copyrighted materials. Furthermore, RMS has also been utilized in many other applications, including so-called "second screen," whereby an audio or video segment captured by a smartphone can be a trigger to connect to the Internet in order to send or receive related information.
   To summarize, the recipientsf achievement has significantly contributed not only to the research field as a series of pioneering activities regarding basic media processing but also to the everyday world through a wide range of innovative applications in the areas of media content production, distribution, viewing, utilization, and information retrieval, linking the real world and the ICT world using sounds and images as keystones.
Fig.1 Schematic diagram of signal detection by Robust Media Search technology

References

  1. Hidehisa Nagano, Kunio Kashino, Hiroshi Murase: "A Fast Search Algorithm for Background Music Signals Based on the Searches for Numerous Small-Region Spectrograms", IEICE Trans. Inf. & Syst. (JPN Edition), vol.J87-D-II, no.5, pp.1179-1188 (2004).
  2. Takayuki Kurozumi, Hidehisa Nagano, Kunio Kashino: "A Robust Video Search Method for Video Signal Queries Captured in the Real World", IEICE Trans. Inf. & Syst. (JPN Edition), vol.J90-D, no.8, pp.2223-2231 (2007).
  3. Kunio Kashino, Akisato Kimura, Hidehisa Nagano, and Takayuki Kurozumi: "Robust Search Methods for Music Signals Based on Simple Representation", Proc. ICASSP, pp.1421-1424 (2007).
  4. Kunio Kashino: "Audio fingerprinting: techniques and applications", The Journal of the Acoustical Society of Japan, vol.66, no.2, pp.71-76 (2010).
  5. Kunio Kashino, Takayuki Kurozumi, Ryo Mukai: "Media Content Identification Technology", The Journal of IEICE, vol.93, no.4, pp.340-342 (2010).
  6. Ryo Mukai, Takayuki Kurozumi, Takahito Kawanishi, Hidehisa Nagano, Kunio Kashino: "Robust Media Search Technology for Content-Based Audio and Video Identification", E-Letter of Multimedia Communications Technical Committee, IEEE Communications Society (2012).