Visual perception

The thermal camera detects infrared energy and allows firefighters to ‘see’ in smoke and darkness.

Visual perception refers to the mind’s ability to process and make sense of what the eyes see. Our perception of the image displayed on a Thermal Imaging Camera (TIC) will affect our ability to interpret information and make quick, accurate decisions.

Visual sight and cues

Seeing involves the physical process of receiving visual stimuli through the eyes and detecting light and shapes. In many cases, our attention is drawn to movement, contrast, colour and shapes, including lines, edges and outlines. However, seeing alone does not ensure understanding the significance or context of the visual information. Visual perception should not be confused with visual acuity or the sharpness of our vision. Having perfect eyesight doesn’t necessarily mean having great visual perception.

Detection, Recognition and Identification (DRI) are terms commonly used in fields like surveillance, security and imaging systems. They describe different levels of object identification based on visual or sensor inputs:

Detection: the ability to distinguish the presence of an object from its background.
Recognition: classifying objects, shapes or patterns, such as people, furniture or building features.
Identification: discerning objects in detail, to enable precise decision-making.

Perception of the displayed image

Visual perception involves the cognitive processing of visual stimuli, including interpretation, recognition and understanding. It goes beyond mere observation of the screen display: decision-making integrates contextual information and prior knowledge to derive meaning from visual screen cues. When other sensory inputs are available, they can be combined with visual information to provide a more comprehensive understanding of the environment.

Top-down and bottom-up visual processing

Top-down processing relies on prior knowledge, experiences and training to interpret what is seen, even if the information is unclear. For firefighters, this includes training and experience with the camera in recognising varied fire behaviour, typical building layouts and objects. It can enable us to streamline the process of perception when there are gaps in available information or situations where our senses could overwhelm us. It’s important that training closely resembles operational conditions to enhance top-down processing in real scenarios.

Bottom-up processing focuses on building perceptions based on the current input of information in real time, also known as data-driven processing. There is little need for interpretation as the information we receive is basic, such as lines, edges, size, shape and basic depth cues, to identify objects – what you see is what you get, without the influence of prior knowledge or expectations.

Visual perception skills

If we are trying to locate the seat of the fire, we can focus on relevant visual information while filtering out irrelevant details. ‘Visual attention’ to the energy indicators at and in the doorway provides an immediate cue.

Basic visual perception skills can typically be broken down into several sub-areas, which are relevant to screen interpretation and involve both top-down and bottom-up processing:

Visual closure: The ability to recognise or reconstruct familiar objects when they are only partially visible or obscured. This skill is essential in identifying objects partially hidden behind or under other items, such as furniture or people.

Visual figure-ground: Distinguishing an object from its background or surroundings. Objects are perceived as either figure (the object or element in view) or ground (the background). This generally relies on good thermal contrast of single items and is also crucial for identifying objects in a cluttered environment within the Field of View (FOV).
Visual discrimination: Determining differences or similarities in objects based on size, shade/colour, shape, etc. If we are trying to locate a fire, we are searching for indications of high energy. Examples include the introduction of screen colours at higher temperatures that may give an immediate indication of fire conditions, linked to the screen’s heat ‘colour reference bar’ (temperature scale), and the contrast and movement of convection flow. Edge definition (shapes) and contrast within the displayed image can also assist in identifying objects.
Visual attention: The ability to focus on relevant information while filtering out irrelevant details. For instance, when searching for a person in a cluttered environment, attention is directed towards familiar shapes or shades, while still monitoring fire-behaviour conditions.
Visual spatial relationships: Understanding the relationships of objects within the environment, for example, under, above, behind, in front, etc. This can also be a helpful depth cue.
Visual memory: Recalling visual traits of a form or object. This skill is crucial for scene comprehension and for comparing, recognising and mentally manipulating objects or recalling spatial layouts.
Visual sequential memory: The ability to recall a sequence of viewed objects in the correct order. This can be important for situational awareness and navigation when returning along a previously taken route.
Visual form constancy: Knowing that a form or shape is the same, even if it is closer/further away (larger/smaller) or turned around. This can typically include people, furniture or other items identified in a structure.

Using Gestalt theory, a combination of ‘figure-ground’, ‘similarity’ and ‘closure’ allows us to identify and recognise the fire hose. If searching for the exit, ‘continuity’ leads our eye and camera to follow the hose.

Some of these are the same as, or align with, the Gestalt theory of visual perception – ‘We see the whole as more than the sum of the parts’. The theory explains how the human brain interprets information about relationships and hierarchy in a design or image. Key Gestalt principles that apply to thermal imaging include:

Closure (visual closure): Recognising incomplete objects. This draws on the mind’s ability, based on past experience, to complete images.
Figure-ground: Distinguishing objects from their background.
Proximity: Perceiving closely positioned items as a group or within a common region, such as a dining setting.
Similarity: Our attention may group items based on the same shape, size, greyscale shade or colour regardless of proximity.
Continuity: Following lines or curves from one object to another. This can include hoses or areas that form a path such as typical walkways between furniture or hallways.
Emergence: Recognising a whole object from the arrangement of elements without analysing every detail.
Symmetry and order: The human tendency to simplify complexity when our environment overloads our senses with stimuli. We seek order and sometimes fill in missing information to create a complete picture.

The ‘visual discrimination’ of screen colourisation draws immediate attention to the burning material on the floor.

Human, environmental and technical factors when viewing and using the camera

Human factors: Visual attention can be compromised by information overload, distractions and multitasking, which reduces the ability to process displayed information.

The ability to focus on relevant screen details and anomalies depends on directing the fovea, the narrow, central, high-resolution part of the retina, to different locations on the screen. Individual factors, such as visual acuity, greyscale and colour perception, screen eye relief, viewing angle and other personal preferences, can also influence perception.

Prior experience with similar images enhances interpretation through pattern recognition and familiarity.

Cognitive search strategies can include examining items individually in complex or cluttered scenes, processing multiple items and their similarity simultaneously when objects are easily distinguishable by temperature and other features, and examining scenes over time to perceive changes during dynamic situations.

Training and experience will improve both our top-down and bottom-up processing and our interpretation of the displayed screen image.

Environmental factors: Smoke, airborne particles, fogging and contaminants on the viewing screen, camera lens or mask can restrict or limit visibility. Lighting conditions, screen glare and reflections on the screen’s surface from external light sources can also affect the viewing screen.

Technical factors: The quality of the displayed image is affected by factors that include, but are not limited to, sensor resolution, non-uniformity correction, dynamic range, Noise Equivalent Temperature Difference (NETD), refresh rate and screen display technology, including size, resolution, brightness and contrast.

Screen indicators that can enhance visual perception

The TIC presents a screen image of infrared radiation, otherwise invisible to the eye. The clarity and contrast of the camera’s image play a crucial role in shaping perception.

Camera modes like ‘TI Basic’ use greyscale with colourisation in shades of yellow, orange and red to denote elevated temperatures. Colours matched to the screen’s heat colour reference bar (temperature scale) supplement the greyscale image, and the contrast between greyscale and colour can indicate details. Introducing colour creates a focal point and draws immediate and obvious attention to high heat levels within the FOV.
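The idea of greyscale supplemented by colour at higher temperatures can be illustrated with a minimal sketch. The threshold values, the colour choices and the function names below are hypothetical and chosen only for illustration; they are not taken from any standard or manufacturer, and real cameras define their own scales.

```python
# Hypothetical thresholds in degrees Celsius (illustrative assumptions only).
GREY_MIN_C = 0.0        # assumed bottom of the greyscale range
YELLOW_FROM_C = 150.0   # assumed start of yellow colourisation
ORANGE_FROM_C = 300.0   # assumed start of orange colourisation
RED_FROM_C = 500.0      # assumed start of red colourisation


def colourise(temp_c):
    """Map one pixel temperature to an (R, G, B) display colour."""
    if temp_c >= RED_FROM_C:
        return (255, 0, 0)
    if temp_c >= ORANGE_FROM_C:
        return (255, 128, 0)
    if temp_c >= YELLOW_FROM_C:
        return (255, 255, 0)
    # Below the first colour threshold: white-hot greyscale, clamped to 0..255.
    fraction = (temp_c - GREY_MIN_C) / (YELLOW_FROM_C - GREY_MIN_C)
    level = round(255 * min(max(fraction, 0.0), 1.0))
    return (level, level, level)


def colourise_frame(frame):
    """Apply colourise() to every pixel of a 2D list of temperatures."""
    return [[colourise(t) for t in row] for row in frame]
```

In this sketch a mostly grey frame with a single pixel above the red threshold would show one saturated red point, which is the kind of focal point that ‘visual discrimination’ picks up immediately.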

Even two-dimensional screen cues can still provide some depth and motion perception. This is due to the eye’s ability to detect such things as thermal contrast, texture gradient, overlap, the relative size of objects (closer or further away), linear perspective (parallel lines converge) and motion parallax of objects.
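The relative size cue follows from simple geometry: the further away an object is, the smaller the angle it occupies in the field of view. The short sketch below shows this relationship; the function name and the example dimensions are illustrative assumptions, not values from any camera.

```python
import math


def angular_size_deg(object_height_m, distance_m):
    """Angle (in degrees) that an object of a given height subtends at a given distance."""
    return math.degrees(2 * math.atan(object_height_m / (2 * distance_m)))


# The same doorway (assumed 2 m high) viewed from two distances.
near = angular_size_deg(2.0, 2.0)
far = angular_size_deg(2.0, 8.0)
print(f"near: {near:.1f} degrees, far: {far:.1f} degrees")
```

The doorway occupies a much smaller part of the image from the greater distance, which is why a familiar object that appears small on screen is read as being further away.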

Some cameras may have additional modes, screen palettes and features for particular applications. Cameras with both a thermal and visual sensor may have the ability to provide a thermal overlay over a visual image that highlights thermal contrast in specialist operating modes. Some add additional detail and edge definition by blending a visual input. Others use software refinement of the infrared image alone to add edge definition. These can be useful in enhancing the image for interpretation and perception of objects.
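How manufacturers implement edge definition is not described here. As a generic illustration only, one common image-processing approach is a Sobel gradient filter, which gives high values where neighbouring pixels differ sharply. The sketch below assumes the thermal image is a 2D list of numbers, and the function name is hypothetical.

```python
def sobel_magnitude(frame):
    """Approximate edge strength (|gx| + |gy|) for interior pixels; border pixels stay 0."""
    rows, cols = len(frame), len(frame[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Horizontal change: right-hand column minus left-hand column.
            gx = (frame[r - 1][c + 1] + 2 * frame[r][c + 1] + frame[r + 1][c + 1]
                  - frame[r - 1][c - 1] - 2 * frame[r][c - 1] - frame[r + 1][c - 1])
            # Vertical change: row below minus row above.
            gy = (frame[r + 1][c - 1] + 2 * frame[r + 1][c] + frame[r + 1][c + 1]
                  - frame[r - 1][c - 1] - 2 * frame[r - 1][c] - frame[r - 1][c + 1])
            out[r][c] = abs(gx) + abs(gy)
    return out
```

A uniform area gives zero everywhere, while a boundary between a cool and a warm surface gives a strong response along the boundary, which is the outline the eye uses for figure-ground separation.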

Conclusion

Visual perception of the screen image involves bottom-up and top-down cognitive processes. It allows us to efficiently filter out irrelevant information and focus attention on specific objects or details. It includes interpreting the thermal image, identifying the contrast and form of objects, recognising movement, and maintaining depth perception and spatial awareness. It is a multi-faceted process, can include other sensory inputs, and is essential for navigation, evaluation, problem-solving and achieving objective outcomes safely.

Reference

Anderson, John R. Cognitive Psychology and Its Implications. 7th ed., Worth Publishers, 2010.

About the Author

Gavin Parker

I joined the Country Fire Authority (CFA) as a volunteer member at the age of 16 in 1976 and commenced a full-time career with CFA in January 1995. I have been working in the Latrobe Valley since 2000 and in that time have attended numerous fires and incidents in the power industry, including several significant mine fires. I currently work as a Senior Station Officer at the Traralgon Fire Station on D Platoon. Apart from my normal duties, I have a long-term interest in firefighting aircraft operations, thermal imaging and the coal industry.