Candid About Cameras — Human Analytics vs Video Analytics
April 22, 2014
This is the golden age for video surveillance systems. The old analog “can’t do much” cameras have been superseded by modern network cameras, which connect by IP and have many programmable features that can be controlled and polled remotely. Instead of using a joystick or array of buttons to call a camera on screen, today’s IP cameras are managed by Video Management Software (VMS), which provides a dashboard on a computer monitor to control and manage your video camera network.
Technology has come a long way for surveillance video cameras. Resolution has increased dramatically, and so have frame rates. Cameras that deliver HDTV resolution at 30 frames per second are now commonly available. With the 2007 introduction of the iPhone, its two-megapixel camera became the most widely used digital camera on the market. Video surveillance camera manufacturers took advantage of the newly available high-quality, low-cost camera hardware and incorporated the same parts into their products.
But there is always a tradeoff with video cameras; the high resolution of camera images and their storage requirements go hand-in-hand. The more resolution a camera has, the greater the storage requirements to record and archive the video footage.
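To make the tradeoff concrete, here is a rough back-of-the-envelope sketch of how recording storage scales with a camera's video bitrate. The function name and the bitrates used below are illustrative assumptions, not figures from any particular camera; real bitrates depend on the codec, the scene and the compression settings.

```python
def storage_gb(bitrate_mbps, hours, cameras=1):
    """Estimate storage in decimal gigabytes for continuous recording.

    bitrate_mbps is an assumed average stream bitrate in megabits per second.
    """
    seconds = hours * 3600
    megabits = bitrate_mbps * seconds * cameras
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes


# Hypothetical comparison: a lower-resolution stream vs. a higher-resolution one
print(storage_gb(1, 24))  # one camera at an assumed 1 Mbps for a day
print(storage_gb(4, 24))  # one camera at an assumed 4 Mbps for a day
```

Whatever the actual numbers are for a given camera, the relationship is linear: four times the bitrate means four times the disk space for the same retention period.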
Watch It Later
In general, surveillance video is recorded, not watched live. Less than 1% of all surveillance video is ever viewed by human eyes. There are too many cameras to watch. Studies have shown that when a security person watches a video monitor, fatigue sets in after 18 minutes. Usually, there is nothing interesting happening on the video screen to hold the attention of the security room operator. When you think about it, why should a human be dedicated to watching live security video?
Ideally, a security operator would only watch live video when an “activity of interest” is occurring. Surveillance video customers would very much like to have a system that can tell them when to watch which camera. This would be a welcome timesaver and an innovation beyond the current model of just storing video to be reviewed, post-mortem, after a security event has ended.
Video analytics was born from the desire to automate the labor-intensive, costly task of manually watching live video, and from the desire to know sooner (instantly) when something bad is happening. It would be ideal if a software algorithm could “watch” (analyze) all the live video footage from all the cameras and tell a human if something needs attention. The good thing about using a computer algorithm to examine video is that it can run 24/7 without taking any coffee breaks. The bad thing about video analytics is that, compared to a human brain, software algorithms are primitive at recognizing suspicious activity.
If a video analytics algorithm is too rough in its analysis, the result can be many false positives. Repeatedly flagging possible problems for a security guard to review can also cause operator fatigue if a high percentage of them are false alarms. When Peter cries wolf too many times, the real wolf may just walk right by him.
It is not realistic to expect a video analytics (VA) system to make much sense of a crowded, dynamic environment such as a train station or the finish line at a marathon. There is too much going on for a computer to reliably detect and report “bad behavior”. I’ve seen VA demonstrations showing how software rules can spot an abandoned suitcase left in the middle of an open floor in an airport. Oh, if only terrorists were that obvious! But they are not. Terrorists do not design their heinous plans for the convenience of VA systems.
It was observant humans who recognized and stopped the shoe bomber, the Times Square bombing attempt and the underwear bomber. The government recognizes the important role watchful citizens can play in stopping acts of terror or crime. Cities and transit systems loudly voice their desire to get tips about suspicious behavior from people. The relatively new model of engaging the public to participate in public safety has been immortalized by the “If you see something, say something” slogan, which is publicized by most major transit systems.
Humans are incredibly perceptive and have an ability to just know when something doesn’t look right. A person’s ability to identify suspicious activity exceeds that of any computer algorithm. As an example, on a hot summer day in Boston, two men boarded a train. Though it was 95 degrees that day, the men were dressed up to their necks in some sort of plastic suits. No computer algorithm would have flagged them as suspicious, but a train rider did, and he used his smartphone to report the suspicious men to transit police. Armed with a photo of the men and the GPS map location of the train, police were able to intercept the train and investigate.
VA is very good at recognizing when rudimentary rules are violated. For example, if no one is supposed to be in a parking lot between 10 PM and 5 AM, a VA system can identify motion in the area and flag that activity. Or if, in a museum, no one is supposed to cross a certain line to get closer to a valuable painting, a VA system can detect when that virtual tripwire has been breached because someone got too close to the painting. But understanding human behavior is a whole ‘nother ballgame. We humans are very good at spotting something out of the ordinary. We take note of subtle differences. This ability could be called “Human Analytics”: a person hears or observes something suspicious and reports it to security.
The introduction of the iPhone allowed surveillance video camera makers to benefit from the new, lower prices of camera technology that resulted from the mass-market manufacturing of iPhones. Well, once again the smartphone has given surveillance video a huge boost! Smartphones are everywhere now, and both iPhone and Android devices are widely carried. The ease of use of smartphones, coupled with their ubiquity, makes them ideal tools for reporting crimes and suspicious activities.
Mass transit systems in cities like Boston, Atlanta and San Francisco, among others, have introduced smartphone apps that let their riders quickly send photos, maps and descriptions of security or safety concerns to the police. It is very useful for a police dispatcher to have a photo, a description and a 2-way chat with the rider. However, the GPS map is perhaps the most valuable piece of information transmitted as part of a rider’s report.
Where am I?
There is not a surveillance camera in the world that knows where it is. Unlike smartphones, which all have GPS chips, there is no such thing as a surveillance camera with a GPS chip. Video cameras are assigned names in a VMS so that a guard knows, roughly, where each camera is. For example, “North Parking Lot camera”, “Front entrance Camera”, etc.
Video cameras all have a “Field of View” (FoV), which is the area they can “see”. Most surveillance cameras have a fixed FoV, but some have movable lenses and are called Pan Tilt Zoom (PTZ) cameras. ELERTS is able to define the Field of View for each camera in a network as a polygon on a map. Even for PTZ cameras, the ELERTS Attention Engine system allows the camera’s viewing area to be stored. These FoV polygons are stored in the ELERTS cloud for future reference. The FoV polygons allow the ELERTS system to know, geographically, where each camera can “see”.
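To picture what a stored FoV polygon might look like, here is a minimal sketch in Python. It is not the ELERTS data model, which this article does not describe. The `CameraFov` class, its field names and the coordinates are all hypothetical, chosen only to illustrate a camera name paired with an ordered list of map vertices.

```python
from dataclasses import dataclass
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


@dataclass(frozen=True)
class CameraFov:
    """Hypothetical record: one camera's viewing area as a map polygon."""
    camera_name: str       # the name the camera carries in the VMS
    polygon: List[LatLon]  # vertices listed in order around the viewing area

    def __post_init__(self):
        if len(self.polygon) < 3:
            raise ValueError("a field-of-view polygon needs at least 3 vertices")

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Return (min_lat, min_lon, max_lat, max_lon) for quick pre-filtering."""
        lats = [p[0] for p in self.polygon]
        lons = [p[1] for p in self.polygon]
        return min(lats), min(lons), max(lats), max(lons)


# Made-up coordinates for illustration only
north_lot = CameraFov(
    "North Parking Lot camera",
    [(42.3601, -71.0592), (42.3601, -71.0586),
     (42.3605, -71.0586), (42.3605, -71.0592)],
)
print(north_lot.bounding_box())
```

A PTZ camera could be handled the same way in this sketch by storing the polygon that covers everywhere the lens can reach, though how any real system treats PTZ coverage is a design decision of that system.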
Right Here, Right Now
When a report is sent from an ELERTS app, the LAT/LON GPS information is sent with the report to the ELERTS cloud. Knowing the location where the report is coming from allows ELERTS to correlate that location with the stored FoV polygons for each camera in the network. In moments, ELERTS can identify nearby surveillance cameras and send that information to the VMS. The VMS can then pop the camera’s feed onto a big screen monitor to show the security dispatcher a live view of a crime or security concern. Stopping crime is better than recording it.
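One common way to correlate a reported location with stored polygons is a point-in-polygon test. The sketch below uses the standard ray-casting method and treats latitude and longitude as flat coordinates, which is a reasonable approximation over the small area a camera covers. It is an illustration under those assumptions, not the ELERTS implementation. The function names, the `FOVS` dictionary and the coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def point_in_polygon(point: LatLon, polygon: List[LatLon]) -> bool:
    """Ray casting: count how many polygon edges a ray from the point crosses."""
    lat, lon = point
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lon1 = polygon[i]
        lat2, lon2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude
            cross_lon = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross_lon:
                inside = not inside
    return inside


def cameras_covering(report: LatLon, fovs: Dict[str, List[LatLon]]) -> List[str]:
    """Names of all cameras whose FoV polygon contains the report location."""
    return [name for name, poly in fovs.items() if point_in_polygon(report, poly)]


# Made-up polygons and report location for illustration only
FOVS = {
    "North Parking Lot camera": [(42.3601, -71.0592), (42.3601, -71.0586),
                                 (42.3605, -71.0586), (42.3605, -71.0592)],
    "Front entrance Camera": [(42.3610, -71.0600), (42.3610, -71.0595),
                              (42.3613, -71.0595), (42.3613, -71.0600)],
}
print(cameras_covering((42.3603, -71.0589), FOVS))
```

In this sketch the returned camera names are what would be handed to the VMS so it can bring those views on screen. A real deployment would likely also want a fallback, such as the nearest camera, when a report falls outside every polygon.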
At the end of the day, this is what a VA system is trying to do as well: determine which camera is the camera of interest and bring it to the attention of a security operator immediately. First responders cannot respond until they know where a problem exists.
As discussed, VA systems are very good at monitoring simple scenarios like noticing when people are walking in a parking lot, 24/7. VA systems may be good at spotting when a car is driving the wrong way down a road. But human intuition is vastly superior as a sensor for trouble involving other humans.
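The simple rules this article credits VA with handling well, such as an off-hours parking lot and a virtual tripwire in front of a painting, really are simple to express. The sketch below is a hypothetical illustration, not any vendor’s algorithm. It assumes that some upstream detector already supplies a timestamp and an object’s position in image coordinates for consecutive frames, and all function names are invented for the example.

```python
from datetime import time


def in_restricted_hours(t, start=time(22, 0), end=time(5, 0)):
    """True if t falls inside a window that may wrap past midnight (10 PM to 5 AM)."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def _side(a, b, p):
    """Sign tells which side of the line through a and b the point p lies on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def crossed_tripwire(prev_pos, curr_pos, wire_a, wire_b):
    """True if the move from prev_pos to curr_pos crosses the wire segment."""
    d1 = _side(wire_a, wire_b, prev_pos)
    d2 = _side(wire_a, wire_b, curr_pos)
    d3 = _side(prev_pos, curr_pos, wire_a)
    d4 = _side(prev_pos, curr_pos, wire_b)
    # The endpoints of each segment must lie on opposite sides of the other
    return d1 * d2 < 0 and d3 * d4 < 0


# Motion detected at 11:30 PM in the lot would be flagged
print(in_restricted_hours(time(23, 30)))
# A visitor stepping across a wire drawn from (0, 0) to (10, 0)
print(crossed_tripwire((5, 1), (5, -1), (0, 0), (10, 0)))
```

Rules like these are a few lines of geometry and clock arithmetic. Nothing comparable exists for “two men in plastic suits on a 95 degree day”, which is the gap that human observers fill.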
Security is always best served as a layered system. Different layers provide different tools and strengths. The best surveillance video networks will incorporate both VA and Human Analytics into their security system.