Pattern syntax
Getting started with rule writing? Try the Semgrep Tutorial.
This document describes Semgrep's pattern syntax. You can also see pattern examples by language. On the command line, patterns are specified with the flag --pattern (or -e). Multiple coordinating patterns may be specified in a configuration file. See rule syntax for more information.
Pattern matching
Pattern matching searches code for a given pattern. For example, the expression pattern 1 + func(42) can match a full expression or be part of a subexpression:
foo(1 + func(42)) + bar()
In the same way, the statement pattern return 42 can match a top-level statement in a function or any nested statement:
def foo(x):
    if x > 1:
        if x > 2:
            return 42
    return 42
Ellipsis operator
The ... ellipsis operator abstracts away a sequence of zero or more items such as arguments, statements, parameters, fields, or characters. The ... ellipsis can also match any single item that is not part of a sequence when the context allows it.
See the use cases in the subsections below.
Function calls
Use the ellipsis operator to search for function calls or function calls with specific arguments. For example, the pattern insecure_function(...) finds calls regardless of their arguments:
insecure_function("MALICIOUS_STRING", arg1, arg2)
Functions and classes can be referenced by their fully qualified name, e.g.,
- django.utils.safestring.mark_safe(...) or mark_safe(...)
- System.out.println(...) or println(...)
You can also search for calls with arguments after a match. The pattern func(1, ...) will match both:
func(1, "extra stuff", False)
func(1) # Matches no arguments as well
Or find calls with arguments before a match with func(..., 1):
func("extra stuff", False, 1)
func(1) # Matches no arguments as well
The pattern requests.get(..., verify=False, ...) finds calls where the verify=False argument appears at any position:
requests.get(verify=False, url=URL)
requests.get(URL, verify=False, timeout=3)
requests.get(URL, verify=False)
Match the keyword argument value with the pattern $FUNC(..., $KEY=$VALUE, ...).
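The argument-list behaviour described above can be illustrated with a toy model in Python. This is only a sketch of the idea, not Semgrep's implementation: match_args and the ELLIPSIS sentinel are hypothetical names introduced here, and arguments are compared as plain source strings.

```python
# Toy model of "..." in an argument list (not Semgrep's real matcher).
# ELLIPSIS and match_args are hypothetical names used only in this sketch.
ELLIPSIS = object()


def match_args(pattern, args):
    """Return True if the argument list `args` fits `pattern`.

    `pattern` is a list of source strings and ELLIPSIS markers.
    ELLIPSIS absorbs zero or more consecutive arguments.
    """
    if not pattern:
        return not args
    head, rest = pattern[0], pattern[1:]
    if head is ELLIPSIS:
        # Try every possible number of absorbed arguments, including zero.
        return any(match_args(rest, args[i:]) for i in range(len(args) + 1))
    return bool(args) and args[0] == head and match_args(rest, args[1:])


# func(1, ...) against func(1, "extra stuff", False) and func(1)
print(match_args(["1", ELLIPSIS], ["1", '"extra stuff"', "False"]))  # True
print(match_args(["1", ELLIPSIS], ["1"]))  # True
```

Because the ellipsis may absorb zero items, func(1, ...) and func(..., 1) both accept func(1) in this model, which mirrors the examples above.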
Method calls
The ellipsis operator can also be used to search for method calls. For example, the pattern $OBJECT.extractall(...) matches:
tarball.extractall('/path/to/directory') # Oops, potential arbitrary file overwrite
You can also use the ellipsis in chains of method calls. For example, the pattern $O.foo(). ... .bar() will match:
obj = MakeObject()
obj.foo().other_method(1,2).again(3,4).bar()
Function definitions
The ellipsis operator can be used in function parameter lists or in the function body. To find function definitions with mutable default arguments:
pattern: |
  def $FUNC(..., $ARG={}, ...):
      ...
def parse_data(parser, data={}):  # Oops, mutable default arguments
    pass
The YAML | operator allows for multiline strings.
The ellipsis operator can also stand in for the function name, so that a single pattern matches any function definition: regular functions, methods, and anonymous functions (such as lambdas). To match named or anonymous functions, use an ellipsis ... in place of the name of the function. For example, in JavaScript the pattern function ...($X) { ... } matches any function with one parameter:
function foo(a) {
  return a;
}
var bar = function (a) {
  return a;
};
Class definitions
The ellipsis operator can be used in class definitions. To find classes that inherit from a certain parent:
pattern: |
  class $CLASS(InsecureBaseClass):
      ...
class DataRetriever(InsecureBaseClass):
    def __init__(self):
        pass
The YAML | operator allows for multiline strings.
Ellipsis operator scope
The ... ellipsis operator matches everything in its current scope. The current scope of this operator is defined by the patterns that precede ... in a rule. See the following example:
Semgrep matches the first occurrence of bar and baz in the test code because these objects fall under the scope of foo and of the ... operator. The ellipsis operator does not match the second occurrence of bar and baz because they are not inside the function definition, and therefore fall outside the scope of the ellipsis operator.
Strings
The ellipsis operator can be used to search for strings containing any data. The pattern crypto.set_secret_key("...") matches:
crypto.set_secret_key("HARDCODED SECRET")
This also works with constant propagation.
In languages where regular expressions use a special syntax (for example JavaScript), the pattern /.../ will match any regular expression construct:
re1 = /foo|bar/;
re2 = /a.*b/;
Binary operations
The ellipsis operator can match any number of arguments to binary operations. The pattern $X = 1 + 2 + ... matches:
foo = 1 + 2 + 3 + 4
Containers
The ellipsis operator can match inside container data structures like lists, arrays, and key-value stores.
The pattern user_list = [..., 10] matches:
user_list = [8, 9, 10]
The pattern user_dict = {...} matches:
user_dict = {'username': 'password'}
The pattern user_dict = {..., $KEY: $VALUE, ...} matches the following and allows for further metavariable queries:
user_dict = {'username': 'password', 'address': 'zipcode'}
You can also match just a key-value pair in a container. For example, in JSON the pattern "foo": $X matches just a single line in:
{ "bar": true,
  "name": "self",
  "foo": 42
}
Conditionals and loops
The ellipsis operator can be used inside conditionals or loops. The pattern:
pattern: |
  if $CONDITION:
      ...
matches:
if can_make_request:
    check_status()
    make_request()
    return
The YAML | operator allows for multiline strings.
A metavariable can match a conditional or loop body if the body statement information is re-used later. The pattern:
pattern: |
  if $CONDITION:
      $BODY
matches:
if can_make_request:
    single_request_statement()
Half or partial statements can't be matched; both of the examples above must specify the contents of the condition's body (e.g., $BODY or ...), otherwise they are not valid patterns.
Matching single items with an ellipsis
The ellipsis ... is generally used to match sequences of similar elements. However, you can also match a single item using the ellipsis ... operator. The following pattern is valid in languages with a C-like syntax even though ... matches a single boolean value rather than a sequence:
if (...)
  return 42;
Another example where a single expression is matched by an ellipsis is the right-hand side of assignments:
foo = ...;
However, matching a sequence of items remains the default meaning of an ellipsis. For example, the pattern bar(...) matches bar(a), but also bar(a, b) and bar(). To force a match on a single item, use a metavariable as in bar($X).
Metavariables
Metavariables are an abstraction to match code when you don't know the value or contents ahead of time, similar to capture groups in regular expressions.
Metavariables can be used to track values across a specific code scope. This includes variables, functions, arguments, classes, object methods, imports, exceptions, and more.
Metavariables look like $X, $WIDGET, or $USERS_2. They begin with a $ and can only contain uppercase characters, _, or digits. Names like $x or $some_value are invalid.
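The naming rule above can be written as a regular expression. The following Python sketch encodes only the rule as stated in this document; the names METAVARIABLE_NAME and is_valid_metavariable are hypothetical, and Semgrep's own parser may apply additional restrictions that this check does not capture.

```python
import re

# Encodes the rule as stated above: "$" followed by one or more
# uppercase letters, underscores, or digits. Names are hypothetical.
METAVARIABLE_NAME = re.compile(r"\$[A-Z0-9_]+")


def is_valid_metavariable(name: str) -> bool:
    """Check a candidate metavariable name against the stated rule."""
    return METAVARIABLE_NAME.fullmatch(name) is not None


for candidate in ["$X", "$WIDGET", "$USERS_2", "$x", "$some_value"]:
    print(candidate, is_valid_metavariable(candidate))
```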
Expression metavariables
The pattern $X + $Y matches the following code examples:
foo() + bar()
current + total
Import metavariables
Metavariables can also be used to match imports. For example, import $X matches:
import random
Reoccurring metavariables
Re-using metavariables shows their true power. Detect useless assignments:
pattern: |
  $X = $Y
  $X = $Z
Useless assignment detected:
initial_value = 10  # Oops, useless assignment
initial_value = get_initial_value()
The YAML | operator allows for multiline strings.
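To show what re-using $X enforces, here is a rough Python analogue built on the standard ast module. It is a sketch under simplifying assumptions, not how Semgrep works: find_useless_assignments is a hypothetical helper that handles only plain names assigned in consecutive statements of a body, whereas the Semgrep pattern applies to any expression bound to $X.

```python
import ast


def find_useless_assignments(source):
    """Return (name, line) pairs where a name is assigned twice in a row.

    Mirrors the idea of `$X = $Y` followed by `$X = $Z`: both statements
    must bind the same target. Only simple names are considered here.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue  # e.g. a lambda, whose body is a single expression
        for first, second in zip(body, body[1:]):
            if not (isinstance(first, ast.Assign) and isinstance(second, ast.Assign)):
                continue
            if len(first.targets) != 1 or len(second.targets) != 1:
                continue
            a, b = first.targets[0], second.targets[0]
            if isinstance(a, ast.Name) and isinstance(b, ast.Name) and a.id == b.id:
                hits.append((a.id, first.lineno))
    return hits


code = "initial_value = 10\ninitial_value = get_initial_value()\n"
print(find_useless_assignments(code))  # [('initial_value', 1)]
```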
Literal metavariables
You can use "$X" to match any string literal. This is similar to using "...", but the content of the string is stored in the metavariable $X, which can then be used in a message or in a metavariable-regex.
You can also use /$X/ and :$X to respectively match any regular expressions or atoms (in languages that support those constructs, e.g., Ruby).
Typed metavariables
Syntax
Typed metavariables only match a metavariable if it's declared as a specific type.
Java:
For example, to look for calls to the log method on Logger objects, a simple pattern could use a metavariable for the Logger object:
pattern: $LOGGER.log(...)
However, this pattern also matches calls to the Math.log() method. To exclude them, use a typed metavariable to put a type constraint on the $LOGGER metavariable:
pattern: (java.util.logging.Logger $LOGGER).log(...)
Alternatively, to capture more logger types, for example custom logger types, add a constraint to the type of the argument in the method call instead:
pattern: $LOGGER.log(java.util.logging.LogRecord $RECORD)
C:
In this example in C, we want to capture all cases where something is compared to a char array. We start with a simple pattern that looks for comparison between two variables.
pattern: $X == $Y
We can then put a type constraint on one of the metavariables used in this pattern by turning it into a typed metavariable.
pattern: $X == (char *$Y)
int main() {
    char *a = "Hello";
    int b = 1;
    // Matched
    if (a == "world") {
        return 1;
    }
    // Not matched
    if (b == 2) {
        return -1;
    }
    return 0;
}
Go:
The syntax for a typed metavariable in Go looks different from the syntax for Java. In this Go example we look for calls to the Open function, but only on an object of the zip.Reader type.
pattern: ($READER : *zip.Reader).Open($INPUT)
func read_file() {
    reader, _ := zip.NewReader(readerat, 18276)
    // Matched
    reader.Open("data")
    dir := http.Dir("/")
    // Not matched
    f, err := dir.Open(c.Param("file"))
}
For Go, Semgrep currently does not recognize the type of all variables that are declared on the same line. That is, the following will not take both a and b as ints: var a, b = 1, 2
TypeScript:
In this example, we look for calls to the sanitize method on objects of type DomSanitizer.
pattern: ($X: DomSanitizer).sanitize(...)
constructor(
  private _activatedRoute: ActivatedRoute,
  private sanitizer: DomSanitizer,
) { }
ngOnInit() {
  // Not matched
  this.sanitizer.bypassSecurityTrustHtml(DOMPurify.sanitize(this._activatedRoute.snapshot.queryParams['q']))
  // Matched
  this.sanitizer.bypassSecurityTrustHtml(this.sanitizer.sanitize(this._activatedRoute.snapshot.queryParams['q']))
}
Using typed metavariables
Type inference applies to the entire file! One common way to use typed metavariables is to check for a function called on a specific type of object. For example, let's say you're looking for calls to a potentially unsafe logger in a class like this:
class Test {
    static Logger logger;
    public static void run_test(String input, int num) {
        logger.log("Running a test with " + input);
        test(input, Math.log(num));
    }
}
If you searched for $X.log(...), you can also match Math.log(num). Instead, you can search for (Logger $X).log(...) which gives you the call to logger. See the rule logger_search.
Since matching happens within a single file, this is only guaranteed to work for local variables and arguments. Additionally, Semgrep currently understands types on a shallow level. For example, if you have int[] A, it will not recognize A[0] as an integer. If you have a class with fields, you will not be able to use typechecking on field accesses, and it will not recognize the class's field as the expected type. Literal types are understood to a limited extent. Expanded type support is under active development.
Ellipsis metavariables
You can combine ellipses and metavariables to match a sequence of arguments and store the matched sequence in a metavariable. For example, the pattern foo($...ARGS, 3, $...ARGS) will match:
foo(1,2,3,1,2)
When referencing an ellipsis metavariable in a rule message or metavariable-pattern, include the ellipsis:
- message: Call to foo($...ARGS)
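The requirement that both occurrences of $...ARGS bind the same sequence can be sketched with a small backtracking matcher in Python. This is an illustrative toy, not Semgrep's algorithm: match_seq is a hypothetical helper, and arguments are represented as plain source strings.

```python
def match_seq(pattern, args, bindings=None):
    """Match `args` against `pattern`, returning the bindings or None.

    Pattern items starting with "$..." are ellipsis metavariables: the
    first occurrence captures zero or more arguments, and every later
    occurrence must match exactly the same captured sequence.
    """
    bindings = dict(bindings or {})
    if not pattern:
        return bindings if not args else None
    head, rest = pattern[0], pattern[1:]
    if head.startswith("$..."):
        if head in bindings:
            bound = bindings[head]
            if args[:len(bound)] == bound:
                return match_seq(rest, args[len(bound):], bindings)
            return None
        # First occurrence: try each possible capture length in turn.
        for i in range(len(args) + 1):
            result = match_seq(rest, args[i:], {**bindings, head: args[:i]})
            if result is not None:
                return result
        return None
    if args and args[0] == head:
        return match_seq(rest, args[1:], bindings)
    return None


# foo($...ARGS, 3, $...ARGS) against foo(1,2,3,1,2)
print(match_seq(["$...ARGS", "3", "$...ARGS"], ["1", "2", "3", "1", "2"]))
# {'$...ARGS': ['1', '2']}
```

In this model foo(1,2,3,1) does not match, because the second $...ARGS would have to repeat the captured sequence 1, 2.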
Displaying matched metavariables in rule messages
Display values of matched metavariables in rule messages. Add a metavariable to the rule message (for example Found $X) and Semgrep replaces it with the value of the detected metavariable.
To display a matched metavariable in a rule message, add the same metavariable that your rule searches for to the rule message:
- Find the metavariable used in the Semgrep rule. See the following example of a part of a Semgrep rule (formula):
  - pattern: $MODEL.set_password(...)
  This formula uses $MODEL as a metavariable.
- Insert the metavariable into the rule message:
  - message: Setting a password on $MODEL
- Use the formula displayed above against the following code:
  user.set_password(new_password)
The resulting message is:
Setting a password on user
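The substitution step can be pictured with a few lines of Python. This is a sketch of the general idea only: render_message is a hypothetical helper, and the bindings dictionary stands in for whatever Semgrep records when a pattern matches.

```python
import re


def render_message(template, bindings):
    """Replace each bound metavariable in `template` with its matched text.

    Metavariables without a binding are left unchanged.
    """
    def substitute(match):
        name = match.group(0)
        return bindings.get(name, name)

    return re.sub(r"\$[A-Z0-9_]+", substitute, template)


# $MODEL was bound to "user" when the pattern matched user.set_password(...)
print(render_message("Setting a password on $MODEL", {"$MODEL": "user"}))
# Setting a password on user
```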
If you're using Semgrep's advanced dataflow features, see the documentation for the experimental feature Displaying propagated value of metavariable.
Equivalences
Semgrep automatically searches for code that is semantically equivalent.
Imports
Equivalent imports using aliasing or submodules are matched.
The pattern subprocess.Popen(...) matches:
import subprocess.Popen as sub_popen
sub_popen('ls')
The pattern foo.bar.baz.qux(...) matches:
from foo.bar import baz
baz.qux()
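One way to picture import equivalence is to resolve each call back to a fully qualified name before comparing it with the pattern. The Python sketch below does this with the standard ast module. It is an assumption-laden illustration rather than a description of Semgrep's internals, and dotted_name and resolve_calls are hypothetical helpers.

```python
import ast


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def resolve_calls(source):
    """List the fully qualified names of the calls found in `source`."""
    tree = ast.parse(source)
    aliases = {}
    # First pass: record what each imported local name stands for.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.asname:
                    aliases[alias.asname] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module:
            for alias in node.names:
                aliases[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    # Second pass: expand the leading name of every call.
    resolved = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue
            head, _, tail = name.partition(".")
            full = aliases.get(head, head)
            resolved.append(f"{full}.{tail}" if tail else full)
    return resolved


print(resolve_calls("from foo.bar import baz\nbaz.qux()\n"))  # ['foo.bar.baz.qux']
```

With names resolved this way, baz.qux() and sub_popen('ls') line up with the patterns foo.bar.baz.qux(...) and subprocess.Popen(...) from the examples above.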