Cross-file analysis examples
This document provides an overview of Semgrep cross-file (interfile) analysis features through specific examples, such as its use in type inference, class inheritance, constant propagation, and taint analysis. Several examples compare the results of Semgrep Pro Engine and Semgrep OSS Engine.
Tips and tricks for an interactive experience
The following resources can help you to test the code in the sections below. As you work through the examples in this document, try the following:
- Ensure that the Cross-file analysis toggle is enabled on the Playground page.
- Rules you use in Semgrep Pro Engine require the interfile: true key included in the options key.
- The Semgrep cross-file analysis testing repository
  - Clone the repository:
    git clone https://github.com/semgrep/semgrep-pro-tests
  - Follow the instructions in the sections of this document. Generally:
    - To run Semgrep Pro Engine with cross-file (interfile) analysis, run:
      semgrep --pro --config=pro.yaml .
    - To run Semgrep Pro Engine with cross-function (interprocedural) analysis, run:
      semgrep --pro-intrafile --config=pro.yaml .
Taint tracking
Semgrep OSS allows you to search for the flow of any potentially exploitable input into an important sink using taint mode. For more information, see the taint mode documentation.
The examples below compare Semgrep OSS and Semgrep Pro Engine while searching for dangerous calls that use data obtained from a get_user_input call. The rule specifies get_user_input(...) as the source of taint and dangerous(...); as the sink.
Java
Semgrep OSS matches dangerous("Select * from " + user_input), because user_input is obtained by calling get_user_input. However, it does not match the similar call using still_user_input, because its analysis does not cross function boundaries to know that still_user_input is a wrapper function for get_user_input.
Semgrep Pro Engine matches both dangerous calls because it does cross function boundaries. In fact, with Semgrep Pro Engine, the taint rule can track calls to get_user_input over multiple jumps in multiple files.
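The following is a minimal Java sketch of the kind of code such a taint rule targets. It is not the file from the testing repository: the class name TaintSketch, the function bodies, and the seen list are assumptions made so that the example runs on its own. Only the names get_user_input, still_user_input, user_input, and dangerous come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

class TaintSketch {
    // Records every query that reaches the sink, so the sketch is observable.
    static final List<String> seen = new ArrayList<>();

    // Taint source. Hypothetical body: real code would read untrusted input.
    static String get_user_input() {
        return "Users; DROP TABLE Users";
    }

    // Wrapper around the source. Knowing that its return value is tainted
    // requires looking inside another function.
    static String still_user_input() {
        return get_user_input();
    }

    // Taint sink. Hypothetical body: real code would execute the query.
    static void dangerous(String query) {
        seen.add(query);
    }

    static void run() {
        String user_input = get_user_input();

        // Source and sink are visible in the same function body.
        dangerous("Select * from " + user_input);

        // The tainted value arrives through the wrapper function.
        dangerous("Select * from " + still_user_input());
    }

    public static void main(String[] args) {
        run();
        System.out.println(seen.size() + " queries reached the sink");
    }
}
```

Both calls pass the same tainted string to the sink at run time. The difference described above is only in what a static analysis can prove without following the call into still_user_input.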
Ensure that the Pro Engine beta toggle is enabled in the linked example of the dangerous taint rule. To run Semgrep Pro Engine in the cloned Semgrep Pro Engine testing repository, go to docs/taint_tracking/java and run the following command:
semgrep --config pro.yaml . --pro
JavaScript and TypeScript
Here, Semgrep OSS matches dangerous("Select * from " + user_input), because user_input is obtained by calling get_user_input. However, Semgrep OSS does not match the similar call using still_user_input, because its analysis does not cross function boundaries to know that still_user_input is a wrapper function for get_user_input.
Semgrep Pro matches both dangerous calls because it does cross function boundaries. In fact, with Semgrep Pro, the taint rule can track calls to get_user_input over multiple jumps in multiple files.
Enable the Semgrep Pro Engine beta toggle in the linked example of the dangerous taint rule. To run Semgrep Pro Engine in the cloned Semgrep Pro Engine testing repository, go to docs/taint_tracking/javascript and run the following command:
semgrep --config pro.yaml . --pro
ES6 and CommonJS
The JavaScript and TypeScript ecosystems contain various ways of importing and exporting code. Semgrep Pro Engine can track dataflow through ES6 imports and exports and some CommonJS export paths (see Known limitations of Semgrep Pro Engine).
ES6
Semgrep Pro Engine can track data through the definition of exports for ES6:
export function readUser() {
    return get_user_input("example")
}
Semgrep Pro Engine can follow the dataflow when it is imported into another location:
import { readUser } from "./es6/es6";
readUser()
CommonJS
Semgrep Pro Engine can track data through the definition of exports for CommonJS when the function is defined inline:
module.exports = function get_user() {
    return get_user_input("example")
}
Semgrep Pro Engine can follow the dataflow when it is required in another location:
const readUser = require("./commonjs/common")
readUser()
To run Semgrep Pro in the cloned Semgrep Pro Engine testing repository, go to docs/taint_tracking/imports and run the following command:
semgrep --config pro.yaml . --pro
Type inference and class inheritance
Class inheritance
This section compares the possible findings of a scan across multiple files using Semgrep OSS and Semgrep Pro. The file app.java includes two check functions that throw exceptions. This example looks for methods that throw a particular exception, ExampleException.
When using this rule, Semgrep OSS matches code that throws ExampleException but not BadRequest. Check the other files in the docs/class_inheritance directory. In the context of all files, this match does not capture the whole picture, because BadRequest extends ExampleException:
File example_exception.java:
package example;
public class ExampleException extends Exception {
    public ExampleException(String exception) {
        super(exception);
    }
}
File bad_request.java:
package example;
class BadRequest extends ExampleException {
    public BadRequest(String exception) {
        super(exception);
    }
}
Where ExampleException is thrown, we also want to find BadRequest, because BadRequest is a child of ExampleException. Unlike Semgrep OSS, Semgrep Pro Engine can find BadRequest. Since Semgrep Pro Engine uses information from all the files in the directory it scans, it detects BadRequest and finds both thrown exceptions.
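The relationship described above can be sketched in a single runnable Java file. This is an illustration under stated assumptions, not the app.java file from the testing repository: the method names checkLength and checkFormat and their bodies are hypothetical, and the two exception classes are nested here only to keep the sketch self-contained. In the repository they live in separate files, which is why the subclass relationship is invisible to an analysis that reads one file at a time.

```java
class InheritanceSketch {
    // Mirrors example_exception.java from the text above.
    static class ExampleException extends Exception {
        ExampleException(String exception) {
            super(exception);
        }
    }

    // Mirrors bad_request.java from the text above.
    static class BadRequest extends ExampleException {
        BadRequest(String exception) {
            super(exception);
        }
    }

    // Hypothetical check that throws the parent exception directly.
    static void checkLength(String input) throws ExampleException {
        if (input.isEmpty()) {
            throw new ExampleException("input is empty");
        }
    }

    // Hypothetical check that throws the child exception. A search for
    // methods throwing ExampleException should include this one too.
    static void checkFormat(String input) throws BadRequest {
        if (!input.startsWith("/")) {
            throw new BadRequest("path must start with /");
        }
    }

    public static void main(String[] args) {
        try {
            checkFormat("no-slash");
        } catch (ExampleException e) {
            // A BadRequest is caught here because it is an ExampleException.
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```

The catch block shows the same fact at run time that Semgrep Pro Engine uses at analysis time: every BadRequest is also an ExampleException.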
If you are following along in the cloned Semgrep Pro Engine testing repository, in the docs/class_inheritance directory, try the following commands to test the difference:
- Run Semgrep OSS:
semgrep --config pro.yaml .
- Run Semgrep Pro Engine:
semgrep --config pro.yaml . --pro
Using class inheritance with typed metavariables
Semgrep Pro Engine uses cross-file (interfile) class inheritance information when matching typed metavariables. Continuing the example from the previous section, see the following example file, which has defined some exceptions and includes their logging:
The rule searches for any variable of type ExampleException being logged. Unlike Semgrep Pro Engine, Semgrep OSS is not able to find instances of BadRequest being logged. Allowing typed metavariables to access information from the entire program enables users to query any variable for its type and use that information in conjunction with the rest of the code, resulting in more accurate findings.
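As a rough illustration of the kind of file described above, the following Java sketch logs one variable declared as ExampleException and one declared as BadRequest. The class name TypedMetavariableSketch, the log helper, and the logged list are assumptions introduced for this example and are not taken from the testing repository. The exception classes are nested only so that the sketch compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

class TypedMetavariableSketch {
    static class ExampleException extends Exception {
        ExampleException(String exception) {
            super(exception);
        }
    }

    static class BadRequest extends ExampleException {
        BadRequest(String exception) {
            super(exception);
        }
    }

    // Collects log lines so the sketch has observable output.
    static final List<String> logged = new ArrayList<>();

    // Hypothetical logging helper standing in for a real logger call.
    static void log(Exception e) {
        logged.add(e.getClass().getSimpleName() + ": " + e.getMessage());
    }

    static void run() {
        ExampleException parent = new ExampleException("generic failure");
        BadRequest child = new BadRequest("malformed request");

        // Declared type is ExampleException, so the type matches directly.
        log(parent);

        // Declared type is BadRequest. Treating it as an ExampleException
        // requires knowing the class hierarchy.
        log(child);
    }

    public static void main(String[] args) {
        run();
        logged.forEach(System.out::println);
    }
}
```

A rule that asks for a variable of type ExampleException has to know that BadRequest is a subclass in order to report the second log call, which is the information Semgrep Pro Engine gathers across files.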
For a more realistic example where typed metavariables are used, see the following rule written by the Semgrep community to find code vulnerable to the log4j vulnerability.
Run Semgrep Pro Engine in the cloned Semgrep Pro Engine testing repository. Go to docs/class_inheritance_with_typed_metavariables and run the following command:
semgrep --config pro.yaml . --pro
Constant propagation
Finding dangerous calls
Constant propagation provides a way to eliminate false positives in Semgrep rules. Even if a variable is set to a constant several lines before it is used in a function call, Semgrep knows that it must have that value and matches the function call accordingly. For example, this rule looks for non-constant values passed to the dangerous function:
Java
Semgrep OSS matches the first and second calls as it cannot find a constant value for either user_input or EMPLOYEE_TABLE_NAME.
Now consider a slightly more complicated example to illustrate what Semgrep Pro Engine can do. Suppose that EMPLOYEE_TABLE_NAME is imported from a global constants file with the following content:
Global constants file:
package com.main;
public final class Constants {
    public static final double PI = 3.14159;
    public static final double PLANCK_CONSTANT = 6.62606896e-34;
    public static final String EMPLOYEE_TABLE_NAME = "Employees";
}
Because Semgrep Pro Engine resolves EMPLOYEE_TABLE_NAME to a constant across files, it matches only the first call, without any change to the rule.
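To make the two calls concrete, here is a minimal Java sketch of code that such a rule might scan. It is a guess at the shape of the example, not the file from the testing repository: the class name ConstantPropagationSketch, the function bodies, and the seen list are assumptions. The Constants class is nested here only for self-containment, whereas the text above places it in its own file.

```java
import java.util.ArrayList;
import java.util.List;

class ConstantPropagationSketch {
    // Stand-in for the global constants file shown in the text above.
    static final class Constants {
        static final String EMPLOYEE_TABLE_NAME = "Employees";
    }

    // Records every query that reaches dangerous.
    static final List<String> seen = new ArrayList<>();

    // Hypothetical body: the returned value is not known until run time.
    static String get_user_input() {
        return System.getProperty("table.name", "Users");
    }

    // Hypothetical body: real code would execute the query.
    static void dangerous(String query) {
        seen.add(query);
    }

    static void run() {
        String user_input = get_user_input();

        // First call: the argument depends on run-time input, so it is
        // not a constant under any analysis.
        dangerous("Select * from " + user_input);

        // Second call: the argument is a constant, but proving that
        // requires reading the definition of EMPLOYEE_TABLE_NAME.
        dangerous("Select * from " + Constants.EMPLOYEE_TABLE_NAME);
    }

    public static void main(String[] args) {
        run();
        seen.forEach(System.out::println);
    }
}
```

When the constant is defined in another file, only a cross-file analysis can tell that the second argument is fixed, so the second call drops out of the results while the first remains.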
Run Semgrep Pro Engine in the cloned Semgrep Pro Engine testing repository. Go to docs/constant_propagation_dangerous_calls and run the following command:
semgrep --config pro.yaml . --pro
JavaScript and TypeScript
Semgrep OSS matches the first and second calls because it cannot find a constant value for either user_input or EMPLOYEE_TABLE_NAME.
Now consider a slightly more complicated example to illustrate what Semgrep Pro Engine can do. Suppose that EMPLOYEE_TABLE_NAME is imported from a global constants file with the following content:
Global constants file:
export const PI = 3.14159;
export const PLANCK_CONSTANT = 6.62606896e-34;
export const EMPLOYEE_TABLE_NAME = "Employees";
Because Semgrep Pro Engine resolves EMPLOYEE_TABLE_NAME to a constant across files, it matches only the first call, without any change to the rule.
Run Semgrep Pro Engine in the cloned Semgrep Pro Engine testing repository. Go to docs/constant_propagation_dangerous_calls and run the following command:
semgrep --config pro.yaml . --pro
Propagating values
In the previous example, we only cared whether the string was constant or not, so we used "...", but constant propagation also propagates the constant value. To illustrate the use of Semgrep Pro Engine with constant propagation, the rule from the previous section is changed to search for the pattern dangerous("Employees"); instead.
Java
With Semgrep Pro Engine, this rule matches the last three calls to dangerous, since each of these calls selects from the Employees table, though each one obtains the table name differently.
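The sketch below shows one plausible arrangement of four calls in which the last three all receive the value "Employees" by different routes. The exact calls in the testing repository may differ. The class name PropagatingValuesSketch, the function bodies, the local variable tableName, and the seen list are assumptions made for illustration, and the Constants class is nested only so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

class PropagatingValuesSketch {
    // Stand-in for a constants class defined in another file.
    static final class Constants {
        static final String EMPLOYEE_TABLE_NAME = "Employees";
    }

    // Records every argument passed to dangerous.
    static final List<String> seen = new ArrayList<>();

    // Hypothetical source whose value is fixed here only for the demo.
    static String get_user_input() {
        return "Users";
    }

    // Hypothetical body: real code would run a query against the table.
    static void dangerous(String table) {
        seen.add(table);
    }

    static void run() {
        // The value comes from a function call, not a known constant.
        dangerous(get_user_input());

        // Route 1: the literal itself.
        dangerous("Employees");

        // Route 2: a local variable assigned a constant a few lines earlier.
        String tableName = "Employees";
        dangerous(tableName);

        // Route 3: a constant defined in another class.
        dangerous(Constants.EMPLOYEE_TABLE_NAME);
    }

    public static void main(String[] args) {
        run();
        System.out.println(seen);
    }
}
```

Routes 1 and 2 need only analysis within one function. Route 3 is the case that depends on propagating a value from a definition elsewhere in the program.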
Run Semgrep Pro Engine in the cloned Semgrep Pro Engine testing repository. Go to docs/constant_propagation_propagating_values and run the following command:
semgrep --config pro.yaml . --pro
JavaScript and TypeScript
With Semgrep Pro Engine, this rule matches the last three calls to dangerous, since each of these calls selects from the Employees table, though each one obtains the table name differently.
Run Semgrep Pro Engine in the cloned Semgrep Pro Engine testing repository. Go to docs/constant_propagation_propagating_values and run the following command:
semgrep --config pro.yaml . --pro