An introduction to the field of computer vision and image recognition, and how Deep Learning is fueling the fire of this hot topic.

Computer Vision is an interdisciplinary field that focuses on how machines or computers can emulate the way the human brain and eyes work together to visually process the world around them.

Research on Computer Vision can be traced back to the 1960s. The 1970s laid the foundations of many computer vision algorithms still used today, including the shift from basic digital image processing towards understanding the 3D structure of scenes, edge extraction and line labelling. Over the years, computer vision has developed many applications: 3D imaging, facial recognition, autonomous driving, drone technology and medical diagnostics, to name a few.

Not to be confused with digital image processing, Computer Vision is concerned with extracting understanding from images. There are two key components to the field: engineering autonomous systems that can perform the tasks of human vision, and developing algorithms and computational models that replicate the inner workings of biological vision and understanding.

People working on early Computer Vision projects believed a camera attached to a computer, describing what it sees, would do the trick. The most cited story here is Marvin Minsky's request at MIT in 1966 for an undergraduate to take on a summer project: link a computer and a camera together, and get the computer to describe the images it saw. However, as we now know, vision and image recognition in humans are so complex that replicating them in intelligent machines or computer systems is an extremely difficult task!

The process of computer vision is built on three principles: sight, pattern recognition and understanding; its tasks span different approaches to acquiring, processing and analysing a wide variety of images. Much success has come in recreating the human eye through a variety of sensors, image processors and cameras that match, if not exceed, our own capabilities.

Our biological neural networks are proficient at extrapolating patterns and recognizing images within split seconds, and we've had the luxury of training our own systems since birth on all the data from the world around us. Neuroscientists at MIT discovered that the human brain can interpret and understand images seen for a mere 13 milliseconds. Teaching a machine the meaning of images, from objects and actions to emotions and settings, will require massive data sets to train artificial neural networks, and replicating these rapid speeds over such big data sets will take enormous processing power.

As humans, we automatically pick out the lines, curves and shapes of whatever we are looking at to help distinguish what an object is, whereas computers see only matrices of numbers. This is where convolutional neural networks come into play. These deep neural networks apply filters (feature detectors) to the image data and build feature maps layer by layer, helping the network distinguish what the object is. For example, filters tuned to vertical and horizontal lines help draw out the shape of an object. As this technology advances, so will computer vision capabilities.
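To make the idea of filters concrete, here is a minimal sketch in NumPy of how a single convolutional filter slides over an image matrix and produces a feature map. The tiny 5x5 "image" and the hand-rolled convolution function are illustrative assumptions, not a real CNN layer; the two kernels are the classic Sobel edge filters.

```python
import numpy as np

# A tiny hypothetical 5x5 grayscale "image": a bright vertical
# stripe (value 9) on a dark background (value 0).
image = np.array([
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
    [0, 0, 9, 0, 0],
], dtype=float)

# Two classic 3x3 edge-detecting filters (Sobel kernels):
# one responds to vertical edges, the other to horizontal ones.
vertical_filter = np.array([[-1, 0, 1],
                            [-2, 0, 2],
                            [-1, 0, 1]], dtype=float)
horizontal_filter = vertical_filter.T

def convolve2d(img, kernel):
    """Valid (no-padding) sliding-window filtering, as in a CNN layer."""
    kh, kw = kernel.shape
    oh = img.shape[0] - kh + 1
    ow = img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Multiply the kernel against each image patch and sum.
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

v_map = convolve2d(image, vertical_filter)    # strong responses at the stripe's edges
h_map = convolve2d(image, horizontal_filter)  # all zeros: no horizontal edges here
```

The vertical-filter feature map lights up exactly where the stripe's left and right edges are, while the horizontal filter stays silent; a CNN learns many such filters automatically and stacks their feature maps through successive layers.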

Some popular uses of computer vision can be found in autonomous car systems, which capture and process images of the world around the car, such as road signs and obstacles. We're also seeing a rise in image search and recognition, from Facebook tagging when you upload your latest pictures and smart CCTV systems, to searching the internet with an uploaded or copied image on Google. The future of computer vision is looking good, and the field has seen increased adoption across a variety of industries. The computer vision hardware and software market is predicted to grow from a reported $6.6BN in 2015 to $48.6BN by 2022.

