How is the camera market responding to automation?
Existing computer vision solutions, built to automate or augment mundane human tasks such as monitoring CCTV, supervising production lines, or driving vehicles, are still largely based on outdated camera systems. But are these the best tools for the job?
Traditionally, cameras have been the go-to sensor for autonomous applications: they are widely available in the marketplace, offer high-resolution output and, more importantly, are cheap and easy to use. They are also well understood by humans because they generally work in visible light – the same range as the human eye.
Additionally, they offer variations such as multispectral and hyperspectral modalities, which extend beyond visible light into the infrared and ultraviolet spectrums – an attractive prospect. Fundamentally, however, they still capture the world much as a human sees it.
For many use cases this is a good thing: the camera image is easily interpreted by human users and can be used as evidence if misconduct is recorded in a monitored area. But this immediate similarity can be limiting when the end user is a machine. By breaking the link to human-based perception, we open up new possibilities for machine-based computer vision and perception systems.
Getting a fresh perspective
Cameras produce only a 2D image, so depth, and hence the precise location of any detected object, must be estimated from available context and knowledge of the environment within the field of view. In most practical applications this yields only imprecise location estimates, especially in crowded environments where many objects are partially hidden by occlusions or obstructions. It can also lead to the misclassification of people or objects pictured on a poster, billboard, or even the side of a bus, which to a 2D sensor appear real.
If more precise and reliable tracking is needed, images from multiple perspectives can be combined, as in stereo vision systems. We have all seen the capabilities of such technologies in the tracking of players and balls in sports broadcasts.
Such solutions require very precise knowledge and control of individual sensor locations, as well as time synchronisation between them. This is often impractical to achieve and can offset the potential cost advantage of using cameras in the first place.
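To see why that precision matters, consider the basic geometry of stereo depth. The sketch below is a minimal illustration of the standard disparity-to-depth relationship for a rectified stereo pair; the focal length, baseline, and disparity values are invented for the example and do not describe any particular product.

```python
# Minimal sketch: recovering depth from stereo disparity.
# Assumes a calibrated, rectified stereo pair; all numbers are
# illustrative, not taken from any specific sensor.

def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# An object seen with 10 px of disparity by cameras with a
# 700 px focal length and a 0.12 m baseline:
print(depth_from_disparity(10.0, 700.0, 0.12))  # 8.40 m

# The same object with a one-pixel matching error:
print(depth_from_disparity(9.0, 700.0, 0.12))   # ~9.33 m
```

A single pixel of matching error shifts the estimate by nearly a metre even at this short range, and the error grows with distance – which is why the calibration and synchronisation requirements are so demanding.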
On the other hand, 3D sensors such as radar and LiDAR measure depth directly, providing much stronger context for perception applications. This direct measurement of distance is especially useful for the precise detection and tracking of humans, cars and other objects in complex or changing environments.
Additionally, this high-precision tracking can often be achieved with fewer sensors covering greater distances, with inherent robustness to changing lighting, including full darkness, and to adverse weather conditions that might render a camera-based solution inoperable.
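As a simple illustration of how directly measured distance helps, the sketch below groups a toy LiDAR point cloud into discrete objects with off-the-shelf density-based clustering. The point coordinates and thresholds are invented for the example; a production pipeline would use far richer detection and tracking logic.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy LiDAR frame: two small groups of 3D points (x, y, z in metres)
# plus a stray return. All values are invented for illustration.
points = np.array([
    [ 4.9,  1.0, 0.4], [ 5.0,  1.1, 0.9], [ 5.1,  1.0, 1.5],  # ~5 m away
    [12.0, -3.0, 0.5], [12.2, -3.1, 0.6], [12.1, -2.9, 1.1],
    [12.3, -3.0, 1.4],                                         # ~12 m away
    [ 8.0,  6.0, 0.0],                                         # isolated point
])

# Because every point carries a real distance, separating points into
# discrete objects becomes a simple spatial clustering problem.
labels = DBSCAN(eps=0.8, min_samples=3).fit_predict(points)

for label in sorted(set(labels)):
    if label == -1:
        continue  # noise
    cluster = points[labels == label]
    centroid = cluster.mean(axis=0)
    rng = np.linalg.norm(centroid[:2])  # horizontal range to the object
    print(f"object {label}: {len(cluster)} points, "
          f"range {rng:.1f} m, height {cluster[:, 2].max():.1f} m")
```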
Removing the human element
Another strong benefit of moving to other sensor modalities is a reduction in Personally Identifiable Information (PII). This allows LiDAR sensors to be used in situations where a camera could not be placed due to security or privacy concerns. For example, the output of a typical LiDAR cannot uniquely identify faces or read sensitive documents, both of which limit where cameras can be deployed.
This is because each object detected by a LiDAR sensor is typically represented by only a handful of points, whereas a camera needs a much higher-resolution image to reliably detect an object of interest. Because each point in a point cloud carries distance information, these few points are still more than enough to precisely detect, classify and track a wide range of objects.
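To make this concrete, here is a hypothetical sketch of how a handful of tracked points can yield classification-relevant features, such as height and speed, without carrying any identifying detail. All numbers are invented for illustration.

```python
import numpy as np

# Two consecutive frames of the same sparse cluster (x, y, z in metres),
# captured 0.1 s apart. A handful of points is enough to size and track
# the object, but far too few to identify a face or read a document.
frame_a = np.array([[5.0, 1.0, z] for z in (0.2, 0.6, 1.0, 1.4, 1.7)])
frame_b = frame_a + np.array([0.12, 0.0, 0.0])  # moved 0.12 m between frames

height = frame_a[:, 2].max() - frame_a[:, 2].min()   # object height
step = frame_b.mean(axis=0) - frame_a.mean(axis=0)   # centroid shift
speed = np.linalg.norm(step) / 0.1                   # metres per second

print(f"height {height:.1f} m, speed {speed:.1f} m/s")
# ~1.5 m tall, moving at ~1.2 m/s: plausibly a walking person,
# yet nothing here can reveal which person it is.
```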
An evolving technology
The rapid evolution of 3D sensors, especially the increases in both resolution and frame rate, enables more advanced perception algorithms to be run. Not only is it now possible to detect and classify people and objects, but also to determine key characteristics such as orientation, pose, pace and even, by inference, intent. This opens up both new security application areas and previously difficult operating environments, greatly expanding the number of potential use cases. Separating discrete objects in a crowded or partially obscured environment also becomes easier as sensor performance improves.
The reduced dependence on human-readable data also enables efficient edge processing, reducing the need to move large volumes of raw data from sensors to the end user. Instead, the majority of processing can be performed at the edge, significantly reducing the data burden since only the resulting object data needs to be transmitted to the output system. For a single sensor this is perhaps not so important, but in a large deployment over a wide area the savings could be game changing.
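A rough back-of-the-envelope comparison shows the scale of the saving. The figures below are purely illustrative assumptions, not the specifications of any particular sensor.

```python
# Back-of-the-envelope comparison of raw versus edge-processed output.
# All figures are illustrative assumptions, not a product's specs.

POINTS_PER_FRAME = 100_000   # hypothetical LiDAR frame
BYTES_PER_POINT = 16         # x, y, z, intensity as four floats
FPS = 10

raw_bps = POINTS_PER_FRAME * BYTES_PER_POINT * FPS  # raw point stream

# Edge output: say 50 tracked objects per frame, each ~64 bytes of
# position, velocity, class and ID.
object_bps = 50 * 64 * FPS

print(f"raw point cloud: {raw_bps / 1e6:.0f} MB/s per sensor")   # 16 MB/s
print(f"object list:     {object_bps / 1e3:.0f} kB/s per sensor") # 32 kB/s
print(f"reduction:       ~{raw_bps // object_bps}x")              # ~500x
```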
There is no doubt that cameras will remain the primary solution for many security applications. In fact, with cheap and reliable computer vision solutions increasingly embedded into camera products, both their use and their range of applications will inevitably continue to grow.
However, depending on the individual requirements of the application at hand, distance-measuring sensor modalities such as LiDAR will either greatly enhance the capabilities of a security solution or replace cameras outright as, all aspects considered, a cheaper, easier-to-integrate and more reliable option.
For further information please visit https://cronai.ai/