Google speech recognition and YouTube Closed Captions - what do they have in common?
Some time ago we wrote about a new Google feature that lets you watch several political videos with automatically generated speech transcripts. I was right when I said Google would soon index all audio and video using speech recognition technology. Just a couple of days ago they announced a new service called GAudi that lets users search for text inside videos. So far, search works only within a limited subset of videos, but the service is still in the “lab” stage and I am sure there’s more to come (e.g. all videos on the net transcribed and indexed).
Several folks have pointed out that the speech recognition algorithms Google currently uses are not unique and can lead to some embarrassing moments, but this is where I think YouTube's recently announced Closed Captions feature will come into play.
If you know a little about the insides of speech recognition and translation algorithms (one can argue they are the same from the machine's point of view), you will know that the most advanced ones rely on machine learning (Google's translation service is one example). Two sets of data representing the same thing are fed into the algorithm (e.g. the same text in English and Spanish), and the algorithm improves with every new pair it receives, based on what it learns from comparing the two.
Speech recognition output and Closed Captions are exactly the two corresponding sets of data Google needs to improve its speech recognition quality!
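To make the idea of learning from paired data more concrete, here is a toy Python sketch. It is not Google's algorithm, and every name in it (CorrectionModel, learn, correct) is made up for illustration. It assumes the machine transcript and the human caption have the same number of words, so they can be lined up position by position; real systems need proper alignment and far richer models. The point is only that each new pair of corresponding texts makes the model a little better.

```python
from collections import Counter, defaultdict


class CorrectionModel:
    """Toy model that learns word-level corrections from paired texts."""

    def __init__(self):
        # counts[recognized_word][caption_word] = times seen at the same position
        self.counts = defaultdict(Counter)

    def learn(self, recognized, caption):
        """Update the counts from one (machine transcript, human caption) pair."""
        rec_words = recognized.lower().split()
        cap_words = caption.lower().split()
        # Assumption: equal length means the words line up one to one.
        # Pairs that do not line up are skipped in this sketch.
        if len(rec_words) != len(cap_words):
            return
        for rec, cap in zip(rec_words, cap_words):
            self.counts[rec][cap] += 1

    def correct(self, recognized):
        """Replace each word with the caption word most often paired with it."""
        corrected = []
        for word in recognized.lower().split():
            seen = self.counts.get(word)
            # Words never seen in training are left unchanged.
            corrected.append(seen.most_common(1)[0][0] if seen else word)
        return " ".join(corrected)


model = CorrectionModel()
model.learn("the whether is nice", "the weather is nice")
model.learn("whether report today", "weather report today")
print(model.correct("whether forecast"))  # prints "weather forecast"
```

The more caption pairs the model sees, the more recognition mistakes it can fix, which is the same reason a large pool of captioned videos would be valuable training material.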
By providing Closed Captions for your YouTube video you will also help Google to correctly transcribe every single word in your video.