Imagine that we want to find a scene of someone wearing sunglasses in a video. There are two options: we scrub through the video looking for the exact moment, or we use a semantic search that relies on artificial intelligence to analyze each frame and return the matches.
The second option is what the experiment I am showing you today offers. It was created by David Chuan-en Lin and presented on Twitter.
In his thread he shows some examples of what can be achieved with whichframe.com. We just have to upload the video to that website (or provide a YouTube link) and describe in words (in English) what we are looking for. The system takes care of the analysis and returns the result, although it may take a while to do so.
Search a video *semantically* with AI. https://t.co/9ASZ85Q5AA
Example: Which frame has a person with sunglasses and earphones?
Try searching with text, image, or text + image.
More examples[1/7] pic.twitter.com/y6OI5VDTxc
– David Chuan-en Lin (@chuanenlin) April 17, 2021
We are not limited to searching with text. We can also search with an image, much like a reverse Google image search, or with a combination of text and image.
As for how he built it, he explains:
The query is powered by OpenAI’s CLIP neural network to perform zero-shot image classification and the interface was built with Streamlit.
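To give a rough idea of what this kind of search involves, here is a minimal sketch of the ranking step, not the actual code behind whichframe.com, which the thread does not show. It assumes each frame and the query have already been turned into embedding vectors (in a real system, by CLIP's image and text encoders) and simply orders frames by cosine similarity to the query. The function names, the toy three-dimensional vectors, and the idea of averaging embeddings for a text + image query are all illustrative assumptions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of the same length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_frames(query_embedding, frame_embeddings, top_k=3):
    """Return (frame_index, score) pairs, best match first."""
    scored = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(frame_embeddings)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def combine_queries(text_embedding, image_embedding):
    """Assumption: mix a text and an image query by averaging their unit vectors."""
    text_unit = normalize(text_embedding)
    image_unit = normalize(image_embedding)
    return [(t + i) / 2 for t, i in zip(text_unit, image_unit)]


# Toy stand-ins for real embeddings (hypothetical values, one vector per frame).
frame_embeddings = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.1],
    [0.0, 0.2, 0.9],
]
text_query = [0.0, 1.0, 0.0]

best_index, best_score = rank_frames(text_query, frame_embeddings, top_k=1)[0]
print(best_index)  # prints 1
```

In practice the expensive part is computing the embeddings for every frame, which may be why the site can take a while to respond; the ranking itself is cheap.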
The website does not yet have an HTTPS certificate, and it is probably not prepared to handle thousands of requests per second, but it gives us an idea of what we may soon see on large video platforms.