Pattern syntax (experimental)
Patterns are the expressions Semgrep uses to match code when it scans for vulnerabilities. This article describes the new syntax for Semgrep pattern operators. See Pattern syntax for information on the existing pattern syntax.
There is often a one-to-one translation from the existing syntax to the experimental syntax. However, some changes are quite different.
- These patterns are experimental and subject to change.
- You can't mix and match existing pattern syntax with the experimental syntax.
pattern
The pattern operator looks for code matching its expression in the existing syntax. However, pattern is no longer required when using the experimental syntax: wherever pattern: "..." appears, you can write "..." on its own. For example, you can omit pattern and write the following:
any:
  - "badthing1"
  - "badthing2"
  - "badthing3"
Or, for multi-line patterns:
any:
  - |
    manylines(
      badthinghere($A)
    )
  - |
    orshort()
You don't need double quotes for a single-line pattern when omitting the pattern key, but note that this can cause YAML parsing issues.
As an example, the following YAML parses:
any:
  - "def foo(): ..."
This, however, causes problems, because a colon followed by a space is also used to denote a YAML dictionary, so the unquoted line is not read as a plain string:
any:
  - def foo(): ...
any
Replaces pattern-either. Matches any of the patterns specified.
any:
  - <pat1>
  - <pat2>
  ...
  - <patn>
all
Replaces patterns. Matches all of the patterns specified.
all:
  - <pat1>
  - <pat2>
  ...
  - <patn>
inside
Replaces pattern-inside. Matches any of the sub-patterns inside of the primary pattern.
inside:
  any:
    - <pat1>
    - <pat2>
Alternatively:
any:
  - inside: <pat1>
  - inside: <pat2>
not
Replaces pattern-not. Accepts any pattern and excludes code that matches it.
not:
  any:
    - <pat1>
    - <pat2>
Alternatively:
all:
  - not: <pat1>
  - not: <pat2>
regex
Replaces pattern-regex. Matches based on the regex provided.
regex: "(.*)"
Metavariables
Metavariables are an abstraction to match code when you don't know the value or contents beforehand. They're similar to capture groups in regular expressions and can track values across a specific code scope. This includes variables, functions, arguments, classes, object methods, imports, exceptions, and more.
Metavariables begin with a $ and can only contain uppercase characters, _, or digits. Names like $x or $some_value are invalid. Examples of valid metavariables include $X, $WIDGET, or $USERS_2.
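The naming rule can also be checked mechanically. The following is a minimal sketch, assuming the rule is exactly as stated in this section (a $ followed by one or more uppercase letters, digits, or underscores). The helper name is_valid_metavariable is hypothetical and is not part of Semgrep.

```python
import re

# Assumption: a metavariable is "$" followed by one or more uppercase
# letters, digits, or underscores, as described in this section.
METAVARIABLE_RE = re.compile(r"\$[A-Z0-9_]+")


def is_valid_metavariable(name: str) -> bool:
    """Return True if name follows the naming rule described above."""
    return METAVARIABLE_RE.fullmatch(name) is not None


# Check the valid and invalid names given in this section.
for candidate in ["$X", "$WIDGET", "$USERS_2", "$x", "$some_value"]:
    print(candidate, is_valid_metavariable(candidate))
```

Using fullmatch rather than match ensures that trailing lowercase characters, as in $Xy, are rejected instead of being silently ignored.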
where
Unlike Semgrep's existing pattern syntax, the following operators no longer occur under pattern or all:
metavariable-pattern
metavariable-regex
metavariable-comparison
metavariable-analysis
focus-metavariable
These operators must occur within a where clause.
A where clause is required in any pattern that uses metavariable operators. It indicates that Semgrep should match based on the pattern only if all the conditions are true.
For example:
all:
  - inside: |
      def $FUNC(...):
        ...
  - |
    eval($X)
where:
  - <condition>
Because the where clause is on the same indentation level as all, Semgrep understands that everything under where must be paired with the entire all pattern. As such, the ranges matched by the all pattern are modified by the where clause, and the output is the resulting final set of matched ranges.
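One way to picture the filtering part of this behavior is as a predicate check over candidate matches. The sketch below is a conceptual model only, not Semgrep's implementation: the Bindings type, the apply_where function, and the sample condition are all hypothetical, and operators such as focus, which change the reported range instead of filtering, are not modeled.

```python
from typing import Callable, Dict, List

# A candidate match is modeled as a mapping from metavariable name to the
# text it is bound to. This is an illustration, not Semgrep's internal type.
Bindings = Dict[str, str]
Condition = Callable[[Bindings], bool]


def apply_where(matches: List[Bindings], conditions: List[Condition]) -> List[Bindings]:
    """Keep only the matches for which every condition is true."""
    return [m for m in matches if all(cond(m) for cond in conditions)]


# Two hypothetical matches of eval($X), with different bindings for $X.
matches = [{"$X": "user_input"}, {"$X": "'constant'"}]

# A hypothetical condition: $X must not be a quoted string literal.
conditions = [lambda b: not b["$X"].startswith("'")]

print(apply_where(matches, conditions))
```

With an empty list of conditions, every match is kept, which mirrors a pattern that has no where clause.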
metavariable
Replaces metavariable-regex, metavariable-pattern, and metavariable-analysis.
This operator looks inside the metavariable for a match.
...
where:
  - metavariable: $A
    regex: "(.*)"
  - metavariable: $B
    patterns: |
      - "foo($C)"
  - metavariable: $D
    analyzer: entropy
comparison
Replaces metavariable-comparison. Compares metavariables against a basic Python comparison expression.
...
where:
  - comparison: $A == $B
focus
Replaces focus-metavariable. Puts focus on the code region matched by a single metavariable or a list of metavariables.
...
where:
  - focus: $A
Syntax search mode
New syntax search mode rules must be nested underneath a top-level match key. For example:
rules:
  - id: find-bad-stuff
    severity: ERROR
    languages: [python]
    message: |
      Don't put bad stuff!
    match:
      any:
        - |
          eval(input())
        - all:
            - inside: |
                def $FUNC(..., $X, ...):
                  ...
            - |
              eval($X)
Taint mode
The new syntax supports taint mode, and such rules no longer require mode: taint in the rule. Instead, everything must be nested under a top-level taint key.
rules:
  - id: find-bad-stuff
    severity: ERROR
    languages: [python]
    message: |
      Don't put bad stuff!
    taint:
      sources:
        - input()
      sinks:
        - eval(...)
      propagators:
        - pattern: |
            $X = $Y
          from: $Y
          to: $X
      sanitizers:
        - magiccleanfunction(...)
Taint mode key names
The key names for the new syntax taint rules are as follows:
- pattern-sources --> sources
- pattern-sinks --> sinks
- pattern-propagators --> propagators
- pattern-sanitizers --> sanitizers
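The renames listed above, together with the move under a top-level taint key, can be sketched as a small transformation on a rule that has already been loaded into a Python dictionary. This is an illustration under stated assumptions, not a Semgrep tool: the function name nest_taint_keys is hypothetical, and the entries under each key are copied unchanged, so they may still need translating by hand.

```python
# Mapping from existing taint key names to the new ones, as listed above.
TAINT_KEY_RENAMES = {
    "pattern-sources": "sources",
    "pattern-sinks": "sinks",
    "pattern-propagators": "propagators",
    "pattern-sanitizers": "sanitizers",
}


def nest_taint_keys(rule: dict) -> dict:
    """Return a copy of rule with taint keys renamed and nested under "taint".

    Only key names and nesting change; the entries under each key are
    copied as-is.
    """
    converted = {}
    taint = {}
    for key, value in rule.items():
        if key == "mode" and value == "taint":
            continue  # mode: taint is no longer required in the new syntax
        if key in TAINT_KEY_RENAMES:
            taint[TAINT_KEY_RENAMES[key]] = value
        else:
            converted[key] = value
    if taint:
        converted["taint"] = taint
    return converted


# A hypothetical rule written with the existing taint key names.
old_rule = {
    "id": "find-bad-stuff",
    "mode": "taint",
    "pattern-sources": [{"pattern": "input()"}],
    "pattern-sinks": [{"pattern": "eval(...)"}],
}
print(nest_taint_keys(old_rule))
```

The function returns a new dictionary instead of mutating its argument, so the original rule stays available for comparison.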