Detecting Malicious Code Using Information Flow Analysis

Created December 2025

Like many organizations, the Department of Defense (DoD) relies on software and tools that come from many different supply chains, any of which can be compromised by an adversary. The risk affects most organizations, whether they use third-party software, develop their own, or both, because most software development incorporates externally developed packages (e.g., external libraries). An adversary might break into a contractor’s network and inject malicious code, or a tool may be compromised by a malicious insider (as seen in the XZ Utils backdoor incident of 2024). If undetected, such malware can result in costly compromises of mission-critical systems; in the unprecedented SolarWinds incident of 2020, for example, 18,000 organizations were infected by malicious code, 100 of which were then targeted.

Tracing Potentially Malicious Control and Data Flow

Researchers at the Software Engineering Institute (SEI) developed the Detection of Malicious Code (DMC) tool to prevent potentially devastating attacks from hidden malicious code. These embedded threats, such as sensitive data exfiltration routines, timebombs, logic bombs, remote-access Trojans (RATs), and backdoors, are especially dangerous in mission-critical environments. Our aim was to develop a tool that would report execution traces that would allow a remote attacker to launch an arbitrary process, write arbitrary data to arbitrary files, or perform other sensitive operations. The challenge we faced was detecting these traces without executing the code, particularly when the malicious code is camouflaged within an otherwise entirely legitimate and benign codebase.

Taint analysis can track the flow of potentially unsafe data (taint) from designated system functions that return potentially sensitive information (sources) to critical points where information can be exfiltrated outside of the program (sinks). However, traditional taint analysis is limited in that it conflates all flow paths from a given source to a given sink, so a malicious flow path can be "hidden" by a benign flow path. Standard dynamic analysis detection methods execute the code within a sandbox, an approach that is potentially risky (if the code escapes the sandbox) and that may overlook deeply disguised malicious code. We looked to augment existing detection toolsets when developing the DMC tool.
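The conflation problem described above can be illustrated with a minimal Python sketch. It is not the DMC implementation; the flow graph and the function names in it (`read_file`, `format_report`, `hidden_encoder`, `send_network`) are hypothetical. A reachability check answers only whether some flow exists from the source to the sink, while enumerating paths exposes a second flow that the first one would otherwise hide.

```python
from collections import deque

# Hypothetical data-flow graph: an edge u -> v means data can flow from u to v.
FLOW_GRAPH = {
    "read_file": ["format_report", "hidden_encoder"],
    "format_report": ["send_network"],
    "hidden_encoder": ["send_network"],
    "send_network": [],
}


def reaches(graph, source, sink):
    """Conflating check: is there any flow at all from source to sink?"""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def all_paths(graph, source, sink, prefix=None):
    """Enumerate every acyclic flow path from source to sink."""
    prefix = (prefix or []) + [source]
    if source == sink:
        return [prefix]
    paths = []
    for nxt in graph.get(source, []):
        if nxt not in prefix:  # do not revisit nodes already on this path
            paths.extend(all_paths(graph, nxt, sink, prefix))
    return paths


if __name__ == "__main__":
    print(reaches(FLOW_GRAPH, "read_file", "send_network"))
    for path in all_paths(FLOW_GRAPH, "read_file", "send_network"):
        print(" -> ".join(path))
```

In this toy graph the reachability check yields a single yes/no answer, so the path through `hidden_encoder` is indistinguishable from the expected path through `format_report`. Listing the paths separately gives an analyst two distinct flows to compare against the program's intended behavior.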

Example of Benign and Malicious Flows

The DMC Tool and Process

The DMC tool detects two types of malicious code: (1) exfiltration of sensitive data and (2) timebombs, logic bombs, remote-access Trojans (RATs), and other similar malicious code. We chose to build the tool on LLVM over other options to better scale to real-world programs. The tool performs all analysis at the level of LLVM intermediate representation (IR). While it focuses on C/C++ codebases, it can also support other languages that compile to LLVM IR, and it has some support for binaries via lifting to LLVM IR.

The tool uses taint analysis to track the flow of both sensitive and auxiliary information. The auxiliary information (e.g., whether a filename was specified by a local user or came from network data) helps separate benign flows from malicious flows. Even if there is a benign flow from a given source to a given sink, we still look for additional flows from the same source to the same sink in potentially malicious parts of the codebase. For each flow, we generate a witness trace and identify any restrictions on what data a potential attacker can pass to sensitive sinks (e.g., reading files only in a single folder).
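As a rough illustration of how auxiliary information can separate two flows that share a source and a sink, the following Python sketch attaches a witness trace and a filename origin to each flow. The record layout, the origin labels (`local_user`, `network`), and the flagging rule are assumptions made for this example and do not reflect the DMC tool's actual data model or output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    source: str           # where the sensitive data comes from
    sink: str             # where the data leaves the program
    trace: tuple          # witness trace: ordered steps from source to sink
    filename_origin: str  # auxiliary information (hypothetical labels)


# Assumed rule for this sketch: a filename chosen by network data is suspicious.
UNTRUSTED_ORIGINS = {"network"}


def flag_suspicious(flows):
    """Return the flows whose auxiliary information suggests remote control.

    Each flow is judged on its own, so a benign flow with the same
    (source, sink) pair does not mask a suspicious one.
    """
    return [f for f in flows if f.filename_origin in UNTRUSTED_ORIGINS]


flows = [
    Flow("read_file", "send_network",
         ("main", "export_report", "send"), "local_user"),
    Flow("read_file", "send_network",
         ("main", "handle_packet", "send"), "network"),
]

if __name__ == "__main__":
    for flow in flag_suspicious(flows):
        print(" -> ".join(flow.trace), "| filename from", flow.filename_origin)
```

Both flows have the same source and sink, but only the one whose filename is controlled by network data is flagged, and its witness trace shows the analyst which part of the codebase to inspect.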

The tool uses only static analysis, which analyzes all possible executions abstractly, so there’s no need to execute the potentially malicious code as in dynamic analysis. In principle, static analysis can detect all possible malicious flows, but it may also report false positives. Further human analysis is required to compare the tool’s output flows with the program’s intended functionality and determine whether the code is actually malicious. The DMC tool produces output that concisely and precisely characterizes the potentially malicious behaviors of the codebase, so that a human analyst can quickly and accurately determine whether the behavior is benign or malicious.
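The trade-off described above, covering every possible execution at the cost of possible false positives, can be seen in a toy static taint analysis. This Python sketch uses a hypothetical miniature statement language and is far simpler than analysis over LLVM IR. It never runs the program: at each `if` it analyzes both branches and merges the results.

```python
def analyze(block, tainted=frozenset()):
    """Return (tainted_vars_after_block, sink_reports) for a list of statements."""
    tainted = set(tainted)
    reports = []
    for stmt in block:
        kind = stmt[0]
        if kind == "source":      # ("source", var): var receives sensitive data
            tainted.add(stmt[1])
        elif kind == "assign":    # ("assign", dst, src): dst = src
            _, dst, src = stmt
            if src in tainted:
                tainted.add(dst)
            else:
                tainted.discard(dst)
        elif kind == "sink":      # ("sink", var): var leaves the program
            if stmt[1] in tainted:
                reports.append(stmt[1])
        elif kind == "if":        # ("if", condition_text, then_block, else_block)
            _, _cond, then_block, else_block = stmt
            t_then, r_then = analyze(then_block, tainted)
            t_else, r_else = analyze(else_block, tainted)
            # Join: keep anything that is tainted on either branch.
            tainted = t_then | t_else
            reports += r_then + r_else
    return tainted, reports


# Toy program: the condition is never true at run time, but the analysis
# does not evaluate it, so the flow into the sink is still reported.
PROGRAM = [
    ("source", "secret"),
    ("assign", "msg", "greeting"),
    ("if", "1 == 2", [("assign", "msg", "secret")], []),
    ("sink", "msg"),
]

if __name__ == "__main__":
    _, reports = analyze(PROGRAM)
    print("tainted values reaching a sink:", reports)
```

Because the condition is ignored, the analysis reports `msg` reaching the sink even though that branch can never execute. This is the kind of false positive a human analyst would need to rule out, and the same over-approximation is what lets the analysis catch a flow guarded by a trigger condition that a sandboxed run might never satisfy.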

Software and Tools

Detection of Malicious Code (DMC) Tool

DMC is a tool for detecting potentially malicious behavior in C/C++ codebases using static information-flow analysis.

Detection of Malicious Code: Taint Flow Analysis for Weapons Systems Software

Presentation

Lori Flynn presented these slides and handout at the 2024 Department of Defense Maintenance Symposium Program. The presentation detailed the Detection of Malicious Code (DMC) static taint flow analysis tool and discussed the tool's methods, features, and example output along with possible future work.

Detection of Malicious Code Using Information Flow Analysis

Presentation

Dr. Will Klieber presented this project at the CMU SEI Research Review 2024.

Detection of Malicious Code Using Information Flow Analysis

Poster

Dr. Will Klieber presented this poster at Research Review 2024.
