Kevin Hsieh†, Ganesh Ananthanarayanan§, Peter Bodik§, Shivaram Venkataraman§*, Paramvir Bahl§, Matthai Philipose§, Phillip B. Gibbons†, Onur Mutlu^†
†Carnegie Mellon University
§Microsoft
*University of Wisconsin
^ETH Zürich
Large volumes of video are continuously recorded by cameras deployed for traffic control and surveillance, with the goal of answering "after the fact" queries such as: identify video frames containing objects of certain classes (cars, bags) in many days of recorded video. Current systems for processing such queries on large video datasets incur either high cost at video ingest time or high latency at query time. We present Focus, a system that provides both low-cost and low-latency querying on large video datasets. Focus’ architecture flexibly and effectively divides the query processing work between ingest time and query time. At ingest time (on live videos), Focus uses cheap convolutional neural network classifiers (CNNs) to construct an approximate index of all possible object classes in each frame (to handle queries for any class in the future). At query time, Focus leverages this approximate index to provide low latency, but compensates for the lower accuracy of the cheap CNNs through the judicious use of an expensive CNN. Experiments on commercial video streams show that Focus is 48X (up to 92X) cheaper than using expensive CNNs for ingestion, and provides 125X (up to 607X) lower query latency than a state-of-the-art video querying system (NoScope).
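The split between ingest time and query time can be illustrated with a minimal sketch. It is not the Focus implementation: the function names, the toy lookup tables standing in for the two CNNs, and the choice to index each frame under the cheap classifier's top-k classes are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Set

# Hypothetical stand-ins for the two classifiers. A real system would run
# CNN inference here; lookup tables keep the sketch self-contained.
GROUND_TRUTH = {0: "car", 1: "bag", 2: "car", 3: "person"}
CHEAP_RANKING = {
    0: ["truck", "car", "bag"],    # cheap model's top guess is wrong for frame 0
    1: ["bag", "person", "car"],
    2: ["car", "truck", "bag"],
    3: ["person", "bag", "truck"],
}


def cheap_cnn(frame_id: int) -> Sequence[str]:
    """Return classes ranked by confidence (assumed cheap, less accurate)."""
    return CHEAP_RANKING[frame_id]


def expensive_cnn(frame_id: int) -> str:
    """Return a single class label (assumed costly, treated as accurate)."""
    return GROUND_TRUTH[frame_id]


def build_index(
    frame_ids: Iterable[int],
    cheap: Callable[[int], Sequence[str]],
    top_k: int,
) -> Dict[str, Set[int]]:
    """Ingest time: index every frame under its top-k cheap-CNN classes."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for frame_id in frame_ids:
        for cls in cheap(frame_id)[:top_k]:
            index[cls].add(frame_id)
    return dict(index)


def query(
    index: Dict[str, Set[int]],
    target_class: str,
    expensive: Callable[[int], str],
) -> List[int]:
    """Query time: verify only the indexed candidates with the expensive CNN."""
    candidates = index.get(target_class, set())
    return sorted(f for f in candidates if expensive(f) == target_class)


if __name__ == "__main__":
    idx = build_index(range(4), cheap_cnn, top_k=2)
    print(query(idx, "car", expensive_cnn))  # [0, 2]
    print(query(idx, "bag", expensive_cnn))  # [1]
```

In this sketch, a larger `top_k` makes the index more likely to contain every true match, at the price of more candidates for the expensive classifier to check at query time. Frame 3 shows the filtering step: the cheap model lists "bag" among its top two classes, so the frame is a candidate for a "bag" query, and the expensive model then rejects it.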