AI Video Comprehension at Parmonic

Piyush Saggi
|
August 4, 2020
|
Product Updates

It is probably the most popular question we get at Parmonic: "How does it work? Does a computer understand our video? Can a robot watch a video like a human?" First, it's AI, not a robot. But these questions are worth unpacking. To determine whether a computer has video comprehension skills, we have to ask:

What does it mean to “comprehend?”

Comprehension is a multilevel concept. At the outermost level, it means identifying the principal topic of the item at hand. The video equivalent would be, say, creating a title.

Often a title is not enough to guide a viewer. So the next level of comprehension comes into play: semantics. You might pick out a couple of key moments from a video and turn them into subtopics.

Once you direct attention to a subtopic, the third level drills into the content itself: objects, actors, phrases, and so on.

Probably the deepest level of comprehension involves taking all these levels and analyzing every frame of a video. For reference, a typical video is shot at thirty frames per second, so your thirty-minute product demo contains 54,000 frames.
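The arithmetic behind that number is straightforward. Here is a minimal sketch (the function name is illustrative, not part of any Parmonic API):

```python
# Estimate the total frame count of a video from its duration and frame rate.
def frame_count(duration_minutes: float, fps: int = 30) -> int:
    """Return the number of frames: minutes * 60 seconds * frames per second."""
    return int(duration_minutes * 60 * fps)

# The article's example: a 30-minute demo at 30 fps.
print(frame_count(30))  # 30 * 60 * 30 = 54000
```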

That’s a lot of frames!

Humans excel at video comprehension. It is how the species has survived: making sense of the visual and auditory signals around us is what kept cavemen from being eaten by a saber-toothed tiger. What humans do not have is patience or time. Unless you have a viral Netflix sensation, people will not sit down and watch all of your videos. Plus, the number of videos keeps increasing. Recent research shows that

In 2019 businesses will create an equivalent of Netflix every 10 seconds. - Research by Cisco and Parmonic

But machines do not complain. They do not get tired, and they follow an algorithm's instructions without blinking. A computer program can crunch spreadsheets and numbers like there's no tomorrow, and it can "crunch" video the same way. Give a program one thousand videos and it can mark every word spoken and how often each one appears.
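To make the "crunching" concrete, here is a minimal sketch of that last step: counting how often each word appears across a batch of transcripts. This is an illustration of the idea, not Parmonic's actual engine, and the sample transcripts are made up:

```python
import re
from collections import Counter

def word_frequencies(transcripts):
    """Tally how often each word appears across a batch of transcripts."""
    counts = Counter()
    for text in transcripts:
        # Lowercase and split on non-letter characters to normalize words.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Hypothetical transcripts from two short videos.
demo = ["Welcome to the demo.", "This demo covers the new dashboard."]
print(word_frequencies(demo).most_common(3))
```

Scale the input list from two transcripts to a thousand videos and the same few lines still work; the machine never gets bored.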

How we can help

Parmonic’s work and technology are aimed at making the task of “watching” easier for humans by making the most intelligent video engine out there. An engine that can do more than identify faces and objects. We want an engine that can use semantics to fully “comprehend” what is on the screen.

As Geoffrey West writes in Scale, the pace of life is increasing as our society becomes more digitally based. We are building the video AI engine that helps your viewers get to what matters as quickly as possible.

Or maybe one day you can ask your friendly robot, PAR, to watch something for you and deliver a 3-minute synopsis.