
    Recursive joins

    Join mode is an extension of Semgrep that runs multiple rules at once and only returns results if certain conditions are met. This is an experimental mode that enables you to cross file boundaries, allowing you to write rules for whole codebases instead of individual files. More information is available in Join mode overview.

    Recursive join mode has a recursive operator, -->, which executes a recursive query on the given condition. This recursive operator allows you to write a Semgrep rule that effectively crawls the codebase on a condition you specify, letting you build chains such as function call chains or class inheritance chains.

    Understanding recursive join mode

    In the background, join rules turn captured metavariables into database table columns. For example, a rule with $FUNCTIONNAME, $FUNCTIONCALLED, and $PARAMETER is a table similar to the following:

    $FUNCTIONNAME    $FUNCTIONCALLED    $PARAMETER
    getName          writeOutput        user
    getName          lookupUser         uid
    lookupUser       databaseQuery      uid

    The join conditions then join various tables together and return a result if any rows match the criteria.
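The table-and-join idea can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table name `calls`, the lowercase column names, and the use of SQLite are assumptions made for the example, not a description of how Semgrep stores its data.

```python
import sqlite3

# Hypothetical stand-in for join mode's storage: one table per rule,
# one column per captured metavariable (the "$" is dropped for SQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (functionname TEXT, functioncalled TEXT, parameter TEXT)"
)
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("getName", "writeOutput", "user"),
        ("getName", "lookupUser", "uid"),
        ("lookupUser", "databaseQuery", "uid"),
    ],
)

# An equality condition between two metavariables behaves like an SQL join.
# Here the table is joined with itself to find two-step call chains.
chains = conn.execute(
    """
    SELECT a.functionname, a.functioncalled, b.functioncalled
    FROM calls AS a
    JOIN calls AS b ON a.functioncalled = b.functionname
    """
).fetchall()

print(chains)
```

Only rows whose values are equal as text survive the join, which is why a result appears only when some row matches the condition.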

    Recursive join mode conditions use recursive joins to construct a table that recursively joins with itself. For example, you can use a Semgrep rule that gets all function calls and join them recursively to approximate a callgraph.

    Consider the following Python script and rule.

    def function_1():
        print("hello")
        function_2()

    def function_2():
        function_4()

    def function_3():
        function_5()

    def function_4():
        function_5()

    def function_5():
        print("goodbye")

    rules:
    - id: python-callgraph
      message: python callgraph
      languages: [python]
      severity: INFO
      pattern: |
        def $CALLER(...):
          ...
          $CALLEE(...)

    A join condition such as python-callgraph.$CALLER --> python-callgraph.$CALLEE produces the table below. Notice that function_1 appears with function_4 and function_5 as callees, even though function_1 does not call them directly.

    $CALLER       $CALLEE
    function_1    function_2
    function_1    function_4
    function_1    function_5
    function_1    print
    function_2    function_4
    function_2    function_5
    function_3    function_5
    function_4    function_5
    function_5    print
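One way to picture the --> operator is as a recursive SQL query over the direct caller/callee pairs. The sketch below uses SQLite's WITH RECURSIVE through Python's sqlite3 module. The table name `calls`, the CTE name `reach`, and the choice of SQLite are assumptions for illustration; the source does not say how Semgrep implements the operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")

# Direct calls that the python-callgraph pattern would capture in the script.
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [
        ("function_1", "print"),
        ("function_1", "function_2"),
        ("function_2", "function_4"),
        ("function_3", "function_5"),
        ("function_4", "function_5"),
        ("function_5", "print"),
    ],
)

# Start from the direct pairs, then repeatedly extend each chain by one call.
# UNION (not UNION ALL) removes duplicates, so the query also stops on cycles.
closure = conn.execute(
    """
    WITH RECURSIVE reach(caller, callee) AS (
        SELECT caller, callee FROM calls
        UNION
        SELECT reach.caller, calls.callee
        FROM reach JOIN calls ON reach.callee = calls.caller
    )
    SELECT caller, callee FROM reach ORDER BY caller, callee
    """
).fetchall()

for caller, callee in closure:
    print(caller, callee)
```

The result contains the indirect pairs for function_1 (function_4 and function_5). A full closure computed this way also includes pairs reached through function_5's call to print, such as function_2 and print, so it can contain more rows than the table above lists.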

    Example rule

    It's important to think of a join mode rule as "asking questions about the whole project", rather than looking for a single pattern. For example, to find an SQL injection, you need to understand a few things about the project:

    1. Is there any user input?
    2. Do any functions manually build an SQL string using function input?
    3. Can the user input reach the function that manually builds the SQL string?

    Now, you can write individual Semgrep rules that gather information about each of these questions. This example uses Vulnado for finding an SQL injection. Vulnado is a Spring application.

    The first rule searches for user input into the Spring application. It also captures sinks: method calls that receive a user-controllable parameter as an argument.

    rules:
    - id: java-spring-user-input
      message: user input
      languages: [java]
      severity: INFO
      mode: taint
      pattern-sources:
      - pattern: |
          @RequestMapping(...)
          $RETURNTYPE $USERINPUTMETHOD(..., $TYPE $PARAMETER, ...) {
            ...
          }
      pattern-sinks:
      - patterns:
        - pattern: $OBJ.$SINK(...)
        - pattern: $PARAMETER
    A second rule looks for all methods in the application that build an SQL string with a method parameter.

    rules:
    - id: method-parameter-formatted-sql
      message: method uses parameter for sql string
      languages: [java]
      severity: INFO
      patterns:
      - pattern-inside: |
          $RETURNTYPE $METHODNAME(..., $TYPE $PARAMETER, ...) {
            ...
          }
      - patterns:
        - pattern-either:
          - pattern: |
              "$SQLSTATEMENT" + $PARAMETER
          - pattern: |
              String.format("$SQLSTATEMENT", ..., $PARAMETER, ...)
        - metavariable-regex:
            metavariable: $SQLSTATEMENT
            regex: (?i)(select|delete|insert).*
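To see which captured strings the metavariable-regex filter in this rule would keep, the same expression can be tried in Python. This sketch assumes the regex is matched from the start of the captured string, as Python's re.match does. The helper `looks_like_sql` and the sample strings are made up for illustration.

```python
import re

# The regex from the rule above, copied verbatim.
SQL_START = re.compile(r"(?i)(select|delete|insert).*")


def looks_like_sql(captured: str) -> bool:
    """Return True if the captured $SQLSTATEMENT text passes the filter.

    Assumption: matching is anchored at the start of the string (re.match).
    """
    return SQL_START.match(captured) is not None


# Hypothetical values that $SQLSTATEMENT might capture.
samples = [
    "select * from users where username = '",
    "DELETE FROM sessions WHERE id = ",
    "update users set name = '",
    "Hello, ",
]

results = {text: looks_like_sql(text) for text in samples}
for text, kept in results.items():
    print(kept, repr(text))
```

Under this assumption the filter is case-insensitive and keeps strings that begin with select, delete, or insert. A string that begins with update is not kept, because that keyword is not in the alternation.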

    Finally, the third rule is used to construct a pseudo-callgraph:

    rules:
    - id: java-callgraph
      languages: [java]
      severity: INFO
      message: $CALLER calls $OBJ.$CALLEE
      patterns:
      - pattern-inside: |
          $TYPE $CALLER(...) {
            ...
          }
      - pattern: $OBJ.$CALLEE(...)

    The join rule is as follows:

    rules:
    - id: spring-sql-injection
      message: SQLi
      severity: ERROR
      mode: join
      join:
        refs:
        - rule: rule_parts/java-spring-user-input.yaml
          as: user-input
        - rule: rule_parts/method-parameter-formatted-sql.yaml
          as: formatted-sql
        - rule: rule_parts/java-callgraph.yaml
          as: callgraph
        on:
        - 'callgraph.$CALLER --> callgraph.$CALLEE'
        - 'user-input.$SINK == callgraph.$CALLER'
        - 'callgraph.$CALLEE == formatted-sql.$METHODNAME'

    The on: conditions, in order, read as follows:

    • Recursively generate a pseudo callgraph on $CALLER to $CALLEE.
    • Match when a method with user input has a $SINK that is the $CALLER in the pseudo-callgraph.
    • Match when the $CALLEE is the $METHODNAME of a method that uses a parameter to construct an SQL string.

    Running this on Vulnado produces tables that look like this:

    $RETURNTYPE      $USERINPUTMETHOD    $TYPE           $PARAMETER    $OBJ    $SINK
    ...              ...                 ...             ...           ...     ...
    LoginResponse    login               LoginRequest    input         user    token
    LoginResponse    login               LoginRequest    input         User    getUser
    ...              ...                 ...             ...           ...     ...

    $RETURNTYPE    $METHODNAME    $TYPE     $PARAMETER    $SQLSTATEMENT
    ...            ...            ...       ...           ...
    User           fetch          String    un            select * from users where username = '
    ...            ...            ...       ...           ...

    $CALLER    $CALLEE
    ...        ...
    login      getUser
    login      fetch
    getUser    fetch
    ...        ...

    The join conditions select rows which meet the conditions.

    • Match when a method with user input has a $SINK that is the $CALLER in the pseudo-callgraph.

    ...    user-input.$SINK    ==    callgraph.$CALLER    ...
    ...    getUser             ==    getUser              ...

    • Match when the $CALLEE is the $METHODNAME of a method that uses a parameter to construct an SQL string.

    ...    callgraph.$CALLEE    ==    formatted-sql.$METHODNAME    ...
    ...    fetch                ==    fetch                        ...

    (semgrep) ➜  join_mode_demo semgrep -f vulnado-sqli.yaml vulnado
    Running 1 rules...
    Running 3 rules...
    100%|██████████████████████████|3/3
    ran 3 rules on 11 files: 158 findings
    vulnado/src/main/java/com/scalesec/vulnado/User.java
    rule:spring-sql-injection: SQLi
    55: String query = "select * from users where username = '" + un + "' limit 1";
    ran 0 rules on 0 files: 1 findings
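The row selection described in this example can be reproduced in a few lines of Python. The sketch below hard-codes only the rows shown in the tables (the callgraph rows already include the recursively derived pair login, fetch) and applies the two equality conditions as plain string comparisons. The dictionary layout is an assumption for illustration, not Semgrep's internal representation.

```python
# Rows copied from the example tables; only the columns used by the join.
user_input = [
    {"$USERINPUTMETHOD": "login", "$PARAMETER": "input", "$OBJ": "user", "$SINK": "token"},
    {"$USERINPUTMETHOD": "login", "$PARAMETER": "input", "$OBJ": "User", "$SINK": "getUser"},
]
formatted_sql = [
    {"$METHODNAME": "fetch", "$PARAMETER": "un"},
]
callgraph = [
    {"$CALLER": "login", "$CALLEE": "getUser"},
    {"$CALLER": "login", "$CALLEE": "fetch"},  # derived by the recursive condition
    {"$CALLER": "getUser", "$CALLEE": "fetch"},
]

# Apply both equality conditions from the on: list as text comparisons.
matches = [
    (ui, cg, fs)
    for ui in user_input
    for cg in callgraph
    for fs in formatted_sql
    if ui["$SINK"] == cg["$CALLER"] and cg["$CALLEE"] == fs["$METHODNAME"]
]

for ui, cg, fs in matches:
    print(
        f'{ui["$USERINPUTMETHOD"]} passes input to {ui["$SINK"]}, '
        f'which reaches {fs["$METHODNAME"]}'
    )
```

With these rows exactly one combination satisfies both conditions: the getUser sink, the getUser to fetch edge, and the fetch method that builds the SQL string.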

    Limitations

    Join mode compares only the contents of metavariables, so it fundamentally operates on text strings rather than code constructs. This can produce false positives when unrelated code elements are captured with the same text, for example two different methods that share a name.
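The following sketch shows how such a false positive can arise. All names in it (handleUpload, cache, save, and the note about where each save is defined) are hypothetical and chosen only to illustrate the text-matching behavior.

```python
# Hypothetical callgraph row: handleUpload calls cache.save(...).
callgraph = [
    {"$CALLER": "handleUpload", "$OBJ": "cache", "$CALLEE": "save"},
]

# Hypothetical formatted-sql row: a different save method, defined on an
# unrelated repository class, builds an SQL string from its parameter.
formatted_sql = [
    {"$METHODNAME": "save", "$PARAMETER": "name"},
]

# The join sees only the strings "save" == "save". It has no notion of
# which class each method belongs to, so the rows are joined anyway.
joined = [
    (cg, fs)
    for cg in callgraph
    for fs in formatted_sql
    if cg["$CALLEE"] == fs["$METHODNAME"]
]

print(len(joined), "joined row(s), although cache.save is not the SQL-building save")
```

Because the comparison is purely textual, results from a join rule are best treated as leads to review rather than confirmed findings.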

    Use cases

    • Approximating callgraphs in a project
    • Approximating class inheritance
