Patent Issued for Object Recognition for Security Screening and Long Range Video Surveillance

Posted by dtaitano, December 11, 2014, 2:24 pm. Categories: Legal, Surveillance. Tags: analytics, invention, Object Recognition, Patent, security screening.

VerticalNews journalists report that a patent by the inventors Shet, Vinay Damodar (Princeton, NJ); Bahlmann, Claus (Princeton, NJ); Singh, Maneesh Kumar (Lawrenceville, NJ), filed on February 16, 2012, was published online on December 2, 2014.

The assignee for patent number 8903128 is Siemens Aktiengesellschaft (Munich, DE).

News editors obtained the following quote from the background information supplied by the inventors: “The present disclosure relates generally to computer vision, and more particularly, to security screening and long range video surveillance using computer vision.

Patent Abstract
A method of detecting an object in image data that is deemed to be a threat includes annotating sections of at least one training image to indicate whether each section is a component of the object, encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules, training distinct component detectors to each identify a corresponding one of the components based on the annotated training images, processing image data with the component detectors to identify at least one of the components, and executing the rules to detect the object based on the identified components.

“Security screening systems inspect checked and hand baggage, cargo, containers, passengers, etc. for content, such as, explosives, improvised explosive devices (IEDs), firearms, contraband, drugs, etc. They play a key role in the Homeland Defense/Security strategy for increased safety in airports, air and sea traffic. For instance, since August 2010 the government has mandated 100% air cargo screening, with possible extension to sea cargo. State-of-the-art security screening systems require improvement in a number of aspects. This includes (a) efficient and effective automation for improved throughput and focused operator attention and (b) a systems view and integration of various components in screening, e.g., reconstruction, segmentation, detection, recognition, visualization, standards, platform, etc., to achieve an efficient screening workflow.

“A current system for security screening involves two stages. In a first, automated, stage, X-Ray, CT, etc. scan data is obtained and image reconstruction is performed. Resulting images often encode material properties, such as, density or effective atomic number Z_eff. Then, pixels or voxels of suspicious density and Z_eff are identified, and contiguous regions segmented. Statistics of suspicious regions (e.g., mass, volume, etc.) are computed and compared to critical thresholds. In a second stage, identified suspicious regions are manually verified for occurrence of a threat by the human operator. This strategy is employed in many screening systems developed by various scanner vendors. However, these systems require a large amount of operator supervision, due to the large number of false alarms.
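
Editor's illustration (not part of the patent text): the automated first stage described above can be sketched roughly as threshold, segment, then compare region statistics to a limit. The threshold bands, the minimum region size, and every function name below are assumptions made for illustration; a real scanner would calibrate such values per device and per threat class, and would typically work on 3D voxels rather than a small 2D grid.

```python
from collections import deque

# Hypothetical values; not taken from the patent or from any real scanner.
DENSITY_RANGE = (1.4, 1.9)   # assumed suspicious density band
ZEFF_RANGE = (6.5, 8.5)      # assumed suspicious effective atomic number band
MIN_AREA_PIXELS = 3          # assumed critical size threshold


def suspicious_mask(density, zeff):
    """Mark pixels whose density and effective atomic number are both in band."""
    return [
        [DENSITY_RANGE[0] <= d <= DENSITY_RANGE[1]
         and ZEFF_RANGE[0] <= z <= ZEFF_RANGE[1]
         for d, z in zip(d_row, z_row)]
        for d_row, z_row in zip(density, zeff)
    ]


def connected_regions(mask):
    """Group flagged pixels into 4-connected regions with breadth-first search."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def flag_regions(density, zeff):
    """Return regions large enough to be passed to the operator for review."""
    regions = connected_regions(suspicious_mask(density, zeff))
    return [region for region in regions if len(region) >= MIN_AREA_PIXELS]
```

Because a sketch like this looks only at material statistics and region size, any benign item with similar density would be flagged too, which is consistent with the false-alarm burden the inventors describe.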

“Further, there is an increasing need for fast extraction and review, from real-time and archived surveillance video, of activities involving humans, vehicles, packages or boats. This need has been driven by the rapid expansion of video camera network installations worldwide in response to enhanced site security and safety requirements. The amount of data acquired by such video surveillance devices today far exceeds the operator’s capacity to understand its contents and meaningfully search through it. This represents a fundamental bottleneck in the security and safety infrastructure and has prevented video surveillance technology from reaching its full potential.

“Automated video analytics modules operating over video surveillance systems provide one means of addressing this problem, by analyzing the contents of the video feed and generating a description of interesting events transpiring in the scene. However, these modules are inadequate to robustly detect human and vehicular activities in video.”

As a supplement to the background information on this patent, VerticalNews correspondents also obtained the inventors’ summary information for this patent: “According to an exemplary embodiment of the invention, a method of detecting an object in image data that is deemed to be a threat includes annotating sections of at least one training image to indicate whether each section is a component of the object, encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules, training distinct component detectors to each identify a corresponding one of the components based on the annotated training images, processing image data with the component detectors to identify at least one of the components, and executing the rules to detect the object based on the identified components. The pattern grammar may be implemented as instructions in a processor, where executing of the rules is performed by the processor executing the instructions.

“The image data may be output by a security screening device. In at least one embodiment, the training is performed using Adaptive Boosting.
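
Editor's illustration (not part of the patent text): Adaptive Boosting (AdaBoost) builds a strong classifier from many weak ones by re-weighting the training samples after each round. The sketch below is a generic discrete AdaBoost over single-feature threshold rules ("stumps"); the feature vectors, the function names, and the choice of stumps as weak learners are assumptions, since the excerpt does not say which features or weak learners the component detectors use.

```python
import math


def train_stump(X, y, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                preds = [sign if x[f] >= thr else -sign for x in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def adaboost(X, y, rounds=5):
    """Discrete AdaBoost; labels are +1 (component) or -1 (background)."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, f, thr, sign = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, f, thr, sign))
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * (sign if x[f] >= thr else -sign))
             for wi, x, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps; returns +1 or -1."""
    score = sum(alpha * (sign if x[f] >= thr else -sign)
                for alpha, f, thr, sign in ensemble)
    return 1 if score >= 0 else -1
```

In the setting the patent describes, one such classifier would presumably be trained per component (for example, one for blades and one for handles), each from the annotated image sections for that component.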

“In an embodiment, the threatening object is a knife where the annotated sections indicate whether each component is one of a handle, a guard, or a blade of the knife.
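
Editor's illustration (not part of the patent text): the idea of executing predicate rules over component detections can be sketched with the knife example. The patent names the components (handle, guard, blade); the spatial predicates `adjacent` and `between`, the one-dimensional position, the score cutoff, and the pixel gap are all hypothetical choices made here to keep the example small.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Detection:
    label: str    # component name emitted by a detector, e.g. "blade"
    x: float      # centre position along the object's long axis, in pixels
    score: float  # detector confidence in [0, 1]


def adjacent(a, b, max_gap=40.0):
    """Predicate: two components lie within max_gap pixels of each other."""
    return abs(a.x - b.x) <= max_gap


def between(a, b, c):
    """Predicate: b lies strictly between a and c along the axis."""
    return min(a.x, c.x) < b.x < max(a.x, c.x)


def detect_knife(detections, min_score=0.5):
    """Evaluate the assumed rule
        knife(H, G, B) <- handle(H), guard(G), blade(B),
                          adjacent(H, G), adjacent(G, B), between(H, G, B)
    and return the first satisfying (H, G, B) triple, or None.
    """
    by_label = {
        name: [d for d in detections if d.label == name and d.score >= min_score]
        for name in ("handle", "guard", "blade")
    }
    for h, g, b in product(by_label["handle"], by_label["guard"], by_label["blade"]):
        if adjacent(h, g) and adjacent(g, b) and between(h, g, b):
            return (h, g, b)
    return None
```

A hard yes/no rule like this ignores detector confidence beyond the cutoff; the patent's later embodiment attaches uncertainty values to rules instead.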

“In an embodiment, the threatening object is a gun where the annotated sections indicate whether each component is one of a lock, a stock, or a barrel of the gun.

“In an embodiment, the object is a detonator and the annotated sections indicate whether each component is one of a tube and an explosive material.

“In an embodiment, the object is a bomb and the annotated sections indicate whether each component is one of a detonator, explosive material, a cable, and a battery.

“The image data may be X-ray image data. The image data may be computed tomography (CT) image data.

“In an embodiment, training includes determining uncertainty values for each of the rules, converting the rules into a knowledge-based artificial neural network, where each uncertainty value corresponds to a weight of a link in the neural network, and using a back-propagation algorithm modified to allow local gradients over a bilattice specific inference operation to optimize the link weights.
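
Editor's illustration (not part of the patent text): the general idea of treating each rule's uncertainty as a trainable link weight can be shown with a deliberately simplified stand-in. The sketch below combines weighted rule evidence with a probabilistic sum (noisy-OR) and fits the weights by plain gradient descent on squared error. It does not implement the bilattice inference operation or the modified back-propagation the inventors describe; the function names, learning rate, and initial weights are assumptions.

```python
def rule_output(weights, evidence):
    """Combine weighted rule evidence: p = 1 - prod_i (1 - w_i * e_i)."""
    remaining = 1.0
    for w, e in zip(weights, evidence):
        remaining *= 1.0 - w * e
    return 1.0 - remaining


def train_weights(samples, n_rules, lr=0.5, epochs=50):
    """Fit one weight per rule by gradient descent on 0.5 * (p - target)^2.

    samples is a list of (evidence, target) pairs, where evidence[i] in [0, 1]
    is how strongly rule i's body is satisfied and target is 0 or 1.
    """
    weights = [0.5] * n_rules
    for _ in range(epochs):
        for evidence, target in samples:
            p = rule_output(weights, evidence)
            grads = []
            for j, e_j in enumerate(evidence):
                # dp/dw_j = e_j * prod over i != j of (1 - w_i * e_i)
                others = 1.0
                for i, (w, e) in enumerate(zip(weights, evidence)):
                    if i != j:
                        others *= 1.0 - w * e
                grads.append((p - target) * e_j * others)
            # Step against the gradient and keep each weight in [0, 1].
            weights = [min(1.0, max(0.0, w - lr * g))
                       for w, g in zip(weights, grads)]
    return weights
```

With training data in which one rule's evidence tracks the label and another's does not, the first weight drifts toward 1 and the second toward 0, which is the behaviour one would want from learned rule confidences.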

“In an embodiment, the pattern grammar describes a visual pattern of the threatening object by encoding knowledge about contextual clues, scene geometry, and visual pattern constraints.

“In an embodiment, the training of a corresponding one of the component detectors includes performing a physics-based perturbation on one of the annotated training images to generate a new annotated training image and training the distinct component detectors based on the annotated training images and the new annotated training image.

“The perturbation may be a geometric transformation. The performing of the perturbation may include adding another object to be superimposed with a component in the training image to generate the new annotated training image.
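
Editor's illustration (not part of the patent text): both perturbations mentioned above can be sketched on a small image. The sketch assumes pixel values are X-ray transmission fractions in [0, 1], so that overlapping objects multiply (an idealised Beer-Lambert model with a single beam energy). The annotation box layout and function names are hypothetical, and a mirror flip stands in for the broader family of geometric transformations.

```python
def flip_horizontal(image, boxes):
    """Mirror the image left to right and move each annotation box with it.

    boxes are (label, x_min, y_min, x_max, y_max) with inclusive pixel indices.
    """
    width = len(image[0])
    flipped = [list(reversed(row)) for row in image]
    new_boxes = [
        (label, width - 1 - x_max, y_min, width - 1 - x_min, y_max)
        for label, x_min, y_min, x_max, y_max in boxes
    ]
    return flipped, new_boxes


def superimpose(image, clutter, top, left):
    """Overlay a clutter patch by multiplying transmission values.

    The input image is left unchanged; annotations stay valid because the
    underlying component does not move.
    """
    out = [row[:] for row in image]
    for dy, clutter_row in enumerate(clutter):
        for dx, t in enumerate(clutter_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] *= t
    return out
```

Each call yields a new annotated training image from an existing one, so a handful of labelled scans can be expanded into a larger training set for the component detectors.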

“According to an exemplary embodiment of the invention, a method of training a threat detector to detect an object in image data that is deemed to be a threat includes defining a pattern grammar to describe a visual pattern that is representative of the object, encoding the pattern grammar using a plurality of first order predicate based logic rules, and dividing an object into component parts, training distinct component detectors to each detect a corresponding one of the component parts, and generating the threat detector from the rules.

“According to an exemplary embodiment of the invention, a method of detecting an activity in video data includes annotating sections of at least one training video to indicate whether each section is a component of the activity, encoding a pattern grammar describing the object using a plurality of first order logic based predicate rules, training distinct component detectors to each identify a corresponding one of the components based on the annotated training videos, processing video data with the component detectors to identify at least one of the components, and executing the rules to detect the activity based on the identified components.”
