Recursive joins
Join mode is an extension of Semgrep that runs multiple rules at once and only returns results if certain conditions are met. This is an experimental mode that enables you to cross file boundaries, allowing you to write rules for whole codebases instead of individual files. More information is available in Join mode overview.
Recursive join mode has a recursive operator, -->, which executes a recursive query on the given condition. This recursive operator allows you to write a Semgrep rule that effectively crawls the codebase on a condition you specify, letting you build chains such as function call chains or class inheritance chains.
Understanding recursive join mode
In the background, join rules turn captured metavariables into database table columns. For example, a rule that captures $FUNCTIONNAME, $FUNCTIONCALLED, and $PARAMETER produces a table similar to the following:
| $FUNCTIONNAME | $FUNCTIONCALLED | $PARAMETER | 
|---|---|---|
| getName | writeOutput | user | 
| getName | lookupUser | uid | 
| lookupUser | databaseQuery | uid | 
The join conditions then join various tables together and return a result if any rows match the criteria.
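As a rough mental model (not Semgrep's actual implementation), each finding can be pictured as a dictionary of metavariable bindings, and a join condition as a text-equality filter across pairs of rows. The sketch below reuses the rows from the table above. The second table and its $METHODNAME column are hypothetical, added only so there is something to join against.

```python
# Rows from the table above: one dict per finding, keyed by metavariable.
calls = [
    {"$FUNCTIONNAME": "getName", "$FUNCTIONCALLED": "writeOutput", "$PARAMETER": "user"},
    {"$FUNCTIONNAME": "getName", "$FUNCTIONCALLED": "lookupUser", "$PARAMETER": "uid"},
    {"$FUNCTIONNAME": "lookupUser", "$FUNCTIONCALLED": "databaseQuery", "$PARAMETER": "uid"},
]

# Hypothetical second table, e.g. findings from a rule that flags risky methods.
risky = [{"$METHODNAME": "databaseQuery"}]


def join(left, right, left_col, right_col):
    """Return merged rows where left[left_col] == right[right_col] (text equality)."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[left_col] == r[right_col]
    ]


matches = join(calls, risky, "$FUNCTIONCALLED", "$METHODNAME")
for row in matches:
    print(row["$FUNCTIONNAME"], "calls", row["$FUNCTIONCALLED"])
```

Only the row whose $FUNCTIONCALLED text equals a $METHODNAME value survives the join, which is the sense in which a join rule "returns a result if any rows match".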
Recursive join mode conditions use recursive joins to construct a table that recursively joins with itself. For example, you can use a Semgrep rule that gets all function calls and join them recursively to approximate a callgraph.
Consider the following Python script and rule.
def function_1():
    print("hello")
    function_2()
def function_2():
    function_4()
def function_3():
    function_5()
def function_4():
    function_5()
def function_5():
    print("goodbye")
rules:
- id: python-callgraph
  message: python callgraph
  languages: [python]
  severity: INFO
  pattern: |
    def $CALLER(...):
      ...
      $CALLEE(...)
A join condition such as python-callgraph.$CALLER --> python-callgraph.$CALLEE produces the table below. Notice that function_1 is paired with function_4 and function_5 as callees, even though function_1 does not call them directly.
| $CALLER | $CALLEE | 
|---|---|
| function_1 | function_2 | 
| function_1 | function_4 | 
| function_1 | function_5 | 
| function_1 | |
| function_2 | function_4 | 
| function_2 | function_5 | 
| function_3 | function_5 | 
| function_4 | function_5 | 
| function_5 | |
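One way to picture what the --> operator computes is a transitive closure over the direct caller/callee pairs. The sketch below is an approximation in plain Python, not Semgrep's implementation. It assumes that only calls between the five functions are recorded, so calls to built-ins such as print are left out.

```python
# Direct call edges the python-callgraph rule would capture between the five
# functions (assumption: calls to built-ins such as print are left out).
direct = {
    ("function_1", "function_2"),
    ("function_2", "function_4"),
    ("function_3", "function_5"),
    ("function_4", "function_5"),
}


def transitive_closure(edges):
    """Repeatedly join the edge set with itself until no new pairs appear."""
    closure = set(edges)
    while True:
        new_pairs = {
            (a, d)
            for (a, b) in closure
            for (c, d) in closure
            if b == c  # callee of one row is the caller of another
        }
        if new_pairs <= closure:
            return closure
        closure |= new_pairs


for caller, callee in sorted(transitive_closure(direct)):
    print(f"{caller} --> {callee}")
```

The closure contains indirect pairs such as function_1 and function_5, mirroring the populated rows of the table above.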
Example rule
It's important to think of a join mode rule as "asking questions about the whole project", rather than looking for a single pattern. For example, to find an SQL injection, you need to understand a few things about the project:
- Is there any user input?
- Do any functions manually build an SQL string using function input?
- Can the user input reach the function that manually builds the SQL string?
Now, you can write individual Semgrep rules that gather information about each of these questions. This example uses Vulnado for finding an SQL injection. Vulnado is a Spring application.
The first rule searches for user input entering the Spring application. It also captures sinks: method calls that receive a user-controlled parameter as an argument.
rules:
- id: java-spring-user-input
  message: user input
  languages: [java]
  severity: INFO
  mode: taint
  pattern-sources:
  - pattern: |
      @RequestMapping(...)
      $RETURNTYPE $USERINPUTMETHOD(..., $TYPE $PARAMETER, ...) {
        ...
      }
  pattern-sinks:
  - patterns:
    - pattern: $OBJ.$SINK(...)
    - pattern: $PARAMETER
A second rule looks for all methods in the application that build an SQL string with a method parameter.
rules:
- id: method-parameter-formatted-sql
  message: method uses parameter for sql string
  languages: [java]
  severity: INFO
  patterns:
  - pattern-inside: |
      $RETURNTYPE $METHODNAME(..., $TYPE $PARAMETER, ...) {
        ...
      }
  - patterns:
    - pattern-either:
      - pattern: |
          "$SQLSTATEMENT" + $PARAMETER
      - pattern: |
          String.format("$SQLSTATEMENT", ..., $PARAMETER, ...)
    - metavariable-regex:
        metavariable: $SQLSTATEMENT
        regex: (?i)(select|delete|insert).*
Finally, the third rule is used to construct a pseudo-callgraph:
rules:
- id: java-callgraph
  languages: [java]
  severity: INFO
  message: $CALLER calls $OBJ.$CALLEE
  patterns:
  - pattern-inside: |
      $TYPE $CALLER(...) {
        ...
      }
  - pattern: $OBJ.$CALLEE(...)
The join rule is as follows:
rules:
- id: spring-sql-injection
  message: SQLi
  severity: ERROR
  mode: join
  join:
    refs:
    - rule: rule_parts/java-spring-user-input.yaml
      as: user-input
    - rule: rule_parts/method-parameter-formatted-sql.yaml
      as: formatted-sql
    - rule: rule_parts/java-callgraph.yaml
      as: callgraph
    on:
    - 'callgraph.$CALLER --> callgraph.$CALLEE'
    - 'user-input.$SINK == callgraph.$CALLER'
    - 'callgraph.$CALLEE == formatted-sql.$METHODNAME'
The on: conditions, in order, read as follows:
- Recursively generate a pseudo callgraph on $CALLER to $CALLEE.
- Match when a method with user input has a $SINK that is the $CALLER in the pseudo-callgraph.
- Match when the $CALLEE is the $METHODNAME of a method that uses a parameter to construct an SQL string.
Running this on Vulnado produces tables that look like this:
| $RETURNTYPE | $USERINPUTMETHOD | $TYPE | $PARAMETER | $OBJ | $SINK | 
|---|---|---|---|---|---|
| ... | ... | ... | ... | ... | ... | 
| LoginResponse | login | LoginRequest | input | user | token | 
| LoginResponse | login | LoginRequest | input | User | getUser | 
| ... | ... | ... | ... | ... | ... | 
| $RETURNTYPE | $METHODNAME | $TYPE | $PARAMETER | $SQLSTATEMENT | 
|---|---|---|---|---|
| ... | ... | ... | ... | ... | 
| User | fetch | String | un | select * from users where username = ' | 
| ... | ... | ... | ... | ... | 
| $CALLER | $CALLEE | 
|---|---|
| ... | ... | 
| login | getUser | 
| login | fetch | 
| getUser | fetch | 
| ... | ... | 
The join conditions then select the rows that satisfy each condition.
- Match when a method with user input has a $SINK that is the $CALLER in the pseudo-callgraph.
| ... | user-input.$SINK | == | callgraph.$CALLER | ... | 
|---|---|---|---|---|
| ... | getUser | == | getUser | ... | 
- Match when the $CALLEE is the $METHODNAME of a method that uses a parameter to construct an SQL string.
| ... | callgraph.$CALLEE | == | formatted-sql.$METHODNAME | ... | 
|---|---|---|---|---|
| ... | fetch | == | fetch | ... | 
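The two equality conditions can be traced by hand on the sample rows shown above. The following sketch is an illustration only and assumes the callgraph table already contains the recursively joined rows. It keeps just the columns that the conditions reference.

```python
# Sample rows copied from the tables above (only the columns used by the join).
user_input = [
    {"$USERINPUTMETHOD": "login", "$SINK": "token"},
    {"$USERINPUTMETHOD": "login", "$SINK": "getUser"},
]
formatted_sql = [{"$METHODNAME": "fetch", "$PARAMETER": "un"}]
# Callgraph rows after the recursive step, as listed above.
callgraph = [
    {"$CALLER": "login", "$CALLEE": "getUser"},
    {"$CALLER": "login", "$CALLEE": "fetch"},
    {"$CALLER": "getUser", "$CALLEE": "fetch"},
]

# Keep row combinations that satisfy both equality conditions from `on:`.
findings = [
    (u, c, f)
    for u in user_input
    for c in callgraph
    for f in formatted_sql
    if u["$SINK"] == c["$CALLER"] and c["$CALLEE"] == f["$METHODNAME"]
]

for u, c, f in findings:
    print(f'{u["$USERINPUTMETHOD"]} -> {c["$CALLER"]} -> {f["$METHODNAME"]}')
```

On these sample rows a single combination survives: the login handler passes input to getUser, which reaches fetch, the method that builds the SQL string.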
(semgrep) ➜  join_mode_demo semgrep -f vulnado-sqli.yaml vulnado
Running 1 rules...
Running 3 rules...
100%|██████████████████████████|3/3
ran 3 rules on 11 files: 158 findings
vulnado/src/main/java/com/scalesec/vulnado/User.java
rule:spring-sql-injection: SQLi
55:      String query = "select * from users where username = '" + un + "' limit 1";
ran 0 rules on 0 files: 1 findings
Limitations
Join mode operates only on the contents of metavariables, so it compares text strings rather than code constructs. Expect some false positives when unrelated constructs bind to the same text, for example two different methods that share a name.
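A small sketch of how such a false positive can arise. The rows are hypothetical: refresh and cache are invented names, and the assumption is that only User.fetch builds an SQL string while the other fetch is unrelated.

```python
# Hypothetical callgraph rows: two unrelated classes each define `fetch`.
callgraph = [
    {"$CALLER": "getUser", "$OBJ": "User", "$CALLEE": "fetch"},
    {"$CALLER": "refresh", "$OBJ": "cache", "$CALLEE": "fetch"},  # unrelated method
]
formatted_sql = [{"$METHODNAME": "fetch"}]  # assumption: only User.fetch builds SQL

# The join compares text only, so both rows match.
matched = [
    c
    for c in callgraph
    for f in formatted_sql
    if c["$CALLEE"] == f["$METHODNAME"]
]
print([c["$CALLER"] for c in matched])
```

Both callers are reported because the comparison never looks at $OBJ or at types, only at the captured method name.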
Use cases
- Approximating callgraphs in a project
- Approximating class inheritance