Contributing rules
Publish rules in the open-source Semgrep Registry and share them with the Semgrep community to help others benefit from your rule-writing efforts and contribute to the field of software security. There are two ways in which you can contribute rules to the Semgrep Registry:
- For users of Semgrep AppSec Platform
- Contribute rules to the Semgrep Registry through Semgrep AppSec Platform. This workflow is recommended. See Contributing through Semgrep AppSec Platform (recommended). This workflow creates the necessary pull request for you and streamlines the whole process.
- For contributors to the repository through GitHub
- Contribute rules to the Semgrep Registry through a pull request. See the Contributing through GitHub section for detailed information.
Contributing through Semgrep AppSec Platform (recommended)
To contribute and publish rules to the Semgrep Registry through Semgrep AppSec Platform, follow these steps:
- Go to Playground.
- Click Create New Rule.
- Choose one of the following:
- Create a new rule and test code by clicking the plus icon, selecting New rule, and then clicking Save. Note: The test file must contain at least one true positive and one true negative test case to be approved. See the Tests section of this document for more information.
- In the Library panel, select a rule from a category in Semgrep Registry. Click Fork, modify the rule or test code, and then click Save.
- Click Share.
- Click Publish to Registry.
- Fill in the required and optional fields.
- Click Continue, and then click Create PR.
This workflow automatically creates a pull request in the GitHub Semgrep Registry. Find more about the Semgrep Registry by reading the Rule writing and Tests sections.
You can also publish rules as private rules outside of Semgrep Registry. These rules are not included in the Semgrep Registry, but they are accessible to your Semgrep organization. See the Private rules documentation for more information.
Contributing through GitHub
Fork our repository and make a pull request. Sign our Contributor License Agreement (CLA) on GitHub before Semgrep, Inc. can accept your contributions. Make a pull request to the Semgrep Registry with two files:
- The Semgrep rule (as a YAML file).
- The test file (with the file extension of the language or framework). The test file must contain at least one true positive and one true negative test case to be approved. See the Tests section of this document for more information.
Pull requests require the approval of at least one maintainer and passing CI jobs.
Find more about the Semgrep Registry by reading the Rule writing and Tests sections.
Writing a rule for Semgrep Registry
The following sections document necessary fields in rule files of Semgrep Registry, provide information about rule messages, inform about test files, mention rule quality checkers, and describe additional fields required by rules in the security category.
General rule requirements
All rules in general, regardless of whether they are intended only as local rules or for Semgrep Registry, have the same initial requirements. The following table is also included in the Rule Syntax article.
All required fields must be present at the top level of a rule, immediately under the rules key.
Field | Type | Description |
---|---|---|
id | string | Unique, descriptive identifier, for example: no-unused-variable |
message | string | Message that includes why Semgrep matched this pattern and how to remediate it. See also Rule messages. |
severity | string | One of the following values: INFO (Low severity), WARNING (Medium severity), or ERROR (High severity). The severity key specifies how critical the issues that a rule potentially detects are. Note: Semgrep Supply Chain differs, as its rules use CVE assignments for severity. For more information, see the Filters section in the Semgrep Supply Chain documentation. |
languages | array | See language extensions and tags |
pattern * | string | Find code matching this expression |
patterns * | array | Logical AND of multiple patterns |
pattern-either * | array | Logical OR of multiple patterns |
pattern-regex * | string | Find code matching this PCRE2-compatible pattern in multiline mode |
* Only one of the following is required: pattern, patterns, pattern-either, pattern-regex
Every rule also requires a test file in the language that the rule is targeting. See Tests for more details.
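As a minimal sketch of the table above, the following rule sets each required top-level field and uses a single pattern key. The rule ID, message text, and pattern are illustrative assumptions and do not correspond to an existing Registry rule.

```yaml
rules:
  # Hypothetical rule: the id, message, and pattern are for illustration only.
  - id: no-eval-call
    message: >-
      eval() executes arbitrary strings as code. Avoid it, or strictly
      validate the input before evaluating it.
    severity: WARNING          # INFO, WARNING, or ERROR
    languages:
      - javascript
    # One of pattern, patterns, pattern-either, or pattern-regex is required.
    pattern: eval(...)
```

All five keys sit directly under the list item in rules, which is what "top level of a rule" means here.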
Semgrep Registry rule requirements
In addition to the fields mentioned above, rules submitted to Semgrep Registry have additional required fields:
Field | Description | Possible values | Example |
---|---|---|---|
metadata | Required by all rules. Contains the nested fields listed in this table. | Required by all Semgrep Registry rules: technology, category, references. Additionally required by rules in the security category: cwe, confidence, likelihood, impact, subcategory. | |
technology | Nested under the metadata field. Additional information about the technology. This helps to specify rulesets in Semgrep Registry. | | |
category | Nested under the metadata field. If you use category security, include additional metadata. See Including fields required by security category. | | |
references | Additional information that gives more context to the user of the rule. This helps developers understand the issue and how to fix it. | No finite value. Any additional information that gives more context. | |
- If you use category security, include additional metadata. See Including fields required by security category.
- Cross-file (interfile) analysis requires interfile: true under the options key in YAML rules. For more information, see Creating rules that analyze across files.
Understanding rule namespacing
The namespacing format for contributing rules in the Semgrep Registry is <language>/<framework>/<category>/$MORE. If the rule does not belong to a particular framework, add it to the language directory, which uses the word lang in place of <framework>: <language>/lang/<category>/$MORE.
Tests
Include a test file in the language that your rule is targeting. A test file includes the following:
- At least one test where the rule detects a finding. This is called a true positive finding.
- At least one test where the rule does not detect a finding. This is called a true negative finding.
Test file names must match the rule file name, except for the file extension. For example, if the rule is in my-rule.yaml, the test file name must be my-rule.js. Use any valid extension for the target language.
- In the test file, include examples that mark:
- What is expected to be a finding.
- What is not a finding.
See the examples of the rule and test file below:
Rule file:
rules:
  - id: my-rule
    pattern: var $X = "...";
    …
In the test file, mark an expected finding with a comment that contains ruleid: followed by the ID of your rule, placed on the line before the expected finding. Mark code that must not produce a finding with a comment that contains ok: followed by the same rule ID. See the example below:
// ruleid: my-rule
var strdata = "hello";
// ok: my-rule
var numdata = 1;
For more information, visit Testing rules.
Rule messages
Include a rule message that provides details about the matched pattern and informs about how to mitigate any related issues. Provide the following information in a rule message:
- Description of the pattern. For example: missing parameter, dangerous flag, out-of-order function calls.
- Description of why this pattern was detected. For example: logic bug, introduces a security vulnerability, bad practice.
- An alternative that resolves the issue. For example: Use another function, validate data first, and discard the dangerous flag.
Use the YAML multiline string operator >- when rule messages span multiple lines. This presents the best-looking rule message on the command line, and you do not have to worry about line wrapping, escaping quotes, or using backslashes.
For an example of a good rule message, see this rule for Django's mark_safe:
mark_safe() is used to mark a string as safe for HTML output. This disables escaping and may expose the content to XSS attacks. Instead, use django.utils.html.format_html() to build HTML for rendering.
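The mark_safe message above could be written with the >- operator roughly as follows. This is a sketch, not the Registry rule itself: the rule ID, severity, and pattern are assumptions made for illustration, and only the message text comes from this document.

```yaml
rules:
  - id: example-avoid-mark-safe        # hypothetical ID, not the Registry rule
    languages:
      - python
    severity: WARNING                  # assumed severity
    pattern: django.utils.safestring.mark_safe(...)   # illustrative pattern
    # ">-" folds the line breaks below into spaces and strips the final
    # newline, so quotes and long lines need no escaping.
    message: >-
      'mark_safe()' is used to mark a string as safe for HTML output.
      This disables escaping and may expose the content to XSS attacks.
      Instead, use 'django.utils.html.format_html()' to build HTML for
      rendering.
```

The message covers all three recommended parts: what was matched, why it matters, and the alternative to use.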
Rule quality checker
When you contribute rules to the Semgrep Registry, our quality checkers (linters) evaluate whether the rule conforms to Semgrep, Inc. standards. The semgrep-rule-lints job runs linters on a new rule to check for mistakes, performance problems, and best practices for submitting to the Semgrep Registry. To improve your rule writing, use Semgrep itself to scan semgrep-rules.
Including fields required by security category
Rules in category security in the Semgrep Registry require specific metadata fields that ensure consistency across the ecosystem in both Semgrep AppSec Platform and Semgrep CLI. Nest these metadata under the metadata field.
If your rule has category: security, the following metadata are required:
Required metadata field | Values | Example use |
---|---|---|
cwe | A Common Weakness Enumeration (CWE). | cwe: "CWE-502: Deserialization of Untrusted Data" |
confidence | HIGH, MEDIUM, LOW | confidence: MEDIUM |
likelihood | HIGH, MEDIUM, LOW | likelihood: MEDIUM |
impact | HIGH, MEDIUM, LOW | impact: HIGH |
subcategory | vuln, audit, secure default | |
These fields help you to find rules in different categories such as:
- High confidence security rules for CI pipelines.
- OWASP Top 10 or CWE Top 25 rulesets.
- Technology. For example, react, so it is easy to find React rulesets.
- Audit rules with lower confidence that are intended for code auditors.
Examples of rules with a full list of required metadata:
- High confidence JavaScript and TypeScript rule: javascript.express.security.audit.express-open-redirect.express-open-redirect
- Medium confidence Python rule: python.lang.security.dangerous-system-call.dangerous-system-call
- Low confidence C# rule: csharp.lang.security.ssrf.rest-client.ssrf
Details of each field mentioned above are provided in the subsections below with examples.
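Put together, the metadata block of a security rule might look like the following sketch. The specific values are illustrative and assume an SQL injection rule for Express; the block is nested under a single rule in the rules list.

```yaml
# Illustrative metadata for a hypothetical SQL injection rule.
metadata:
  category: security
  technology:
    - express
  cwe:
    - "CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')"
  confidence: MEDIUM      # HIGH, MEDIUM, or LOW
  likelihood: MEDIUM      # HIGH, MEDIUM, or LOW
  impact: HIGH            # HIGH, MEDIUM, or LOW
  subcategory:
    - vuln                # vuln, audit, or secure default
  references:
    - https://sequelize.org/docs/v6/core-concepts/raw-queries/#replacements
```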
CWE
Include the appropriate Common Weakness Enumeration (CWE). The CWE explains what vulnerability your rule is trying to find. Examples:
If you write an SQL Injection rule, use the following:
cwe:
- "CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')"
If you write an XSS rule, use the following:
cwe:
- "CWE-79: Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')"
Confidence
Indicate how confident you are that the rule detects true positives. See the possible options below:
- HIGH - Security concern, with a high rate of true positives. Useful in CI/CD pipelines.
- MEDIUM - Security concern, but with some false positives. Useful in CI/CD pipelines.
- LOW - Expect a fair amount of false positives, similar to audit-style rules.
HIGH
HIGH confidence rules can use Semgrep advanced features such as metavariable-comparison or taint mode to detect true positives. See examples below:
- go.lang.security.audit.crypto.use_of_weak_rsa_key.use-of-weak-rsa-key
- javascript.express.security.audit.express-open-redirect.express-open-redirect
- javascript.jose.security.jwt-hardcode.hardcoded-jwt-secret
confidence: HIGH
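A rough sketch of how metavariable-comparison can narrow a match to likely true positives is shown below. It is loosely modeled on the weak RSA key example named above, but the rule ID, message, and pattern are assumptions, not the Registry rule's actual contents. The other metadata fields required for the security category are omitted for brevity.

```yaml
rules:
  - id: example-weak-rsa-key           # hypothetical ID
    languages:
      - go
    severity: ERROR
    message: >-
      RSA keys shorter than 2048 bits are considered weak. Generate a key
      of at least 2048 bits.
    patterns:
      # Bind the key size argument to $BITS ...
      - pattern: rsa.GenerateKey(..., $BITS)
      # ... and report only when its value is below 2048.
      - metavariable-comparison:
          metavariable: $BITS
          comparison: $BITS < 2048
    metadata:
      category: security
      confidence: HIGH
```

Because the comparison only matches calls whose key size is a known value below the threshold, calls with an acceptable size are not reported.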
MEDIUM
MEDIUM confidence rules can use Semgrep advanced features such as metavariable-comparison or taint mode, but with some false positives. See examples below:
- javascript.express.security.audit.express-ssrf.express-ssrf
- javascript.express.security.express-xml2json-xxe.express-xml2json-xxe
confidence: MEDIUM
LOW
Low confidence rules generally find something which appears to be dangerous while reporting a lot of false positives. See examples below:
confidence: LOW
Likelihood
Specify how likely it is that an attacker can exploit the issue that has been found. The possible values are LOW, MEDIUM, and HIGH.
HIGH
HIGH likelihood rules specify a very high concern that the vulnerability can be exploited. Examples:
- The use of weak encryption: go.lang.security.audit.crypto.use_of_weak_rsa_key.use-of-weak-rsa-key
- Disabled security feature in a configuration: javascript.angular.security.detect-angular-sce-disabled.detect-angular-sce-disabled
- Hardcoded secrets that use a constant value "...": javascript.jose.security.jwt-hardcode.hardcoded-jwt-secret
- Rules that leverage taint mode sources that can come from an attacker, such as HTTP POST, GET, PUT, and DELETE request values. For example: javascript.express.security.audit.express-open-redirect.express-open-redirect
likelihood: HIGH
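The last case above, taint mode sources that come from HTTP request values, can be sketched as follows. The rule ID, message, and the source and sink patterns are assumptions made for illustration and are not copied from the Registry's express-open-redirect rule. The other metadata fields required for the security category are omitted for brevity.

```yaml
rules:
  - id: example-express-tainted-redirect   # hypothetical ID
    mode: taint
    languages:
      - javascript
    severity: WARNING
    message: >-
      A request value flows into a redirect. Validate the target against
      an allowlist before redirecting.
    pattern-sources:
      # Attacker-controlled HTTP request values.
      - pattern: $REQ.query
      - pattern: $REQ.body
    pattern-sinks:
      - pattern: $RES.redirect(...)
    metadata:
      category: security
      likelihood: HIGH
```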
MEDIUM
MEDIUM likelihood rules detect a vulnerability in most circumstances, although it can be hard for an attacker to exploit it. These rules can also detect part of a problem, but not the whole issue. Examples:
- taint mode sources that reach a taint mode sink, but the source is only vulnerable in certain conditions, for example OS environment variables or loading from disk: python.aws-lambda.security.dangerous-spawn-process.dangerous-spawn-process
- taint mode sources with a taint mode sink, but missing a taint mode sanitizer, which can introduce more false positives: javascript.express.security.express-puppeteer-injection.express-puppeteer-injection
likelihood: MEDIUM
LOW
LOW likelihood rules tend to find something dangerous but do not evaluate whether it is truly vulnerable. Examples:
- taint mode sources, such as function arguments that may or may not be tainted, which reach a taint mode sink: typescript.react.security.audit.react-href-var.react-href-var
- A rule that uses search mode to find the use of a dangerous function, for example trustAsHTML, bypassSecurityTrust(), eval(), or innerHTML: javascript.browser.security.dom-based-xss.dom-based-xss
likelihood: LOW
Impact
Indicate how much damage a vulnerability can cause. The possible values are LOW, MEDIUM, and HIGH.
HIGH
HIGH impact rules can detect extremely damaging vulnerabilities, such as injection vulnerabilities. Examples:
- javascript.sequelize.security.audit.sequelize-injection-express.express-sequelize-injection
- ruby.rails.security.audit.xxe.xml-external-entities-enabled.xml-external-entities-enabled
impact: HIGH
MEDIUM
MEDIUM impact rules detect issues that are less likely to lead to full system compromise but are still fairly damaging. Examples:
- python.flask.security.injection.raw-html-concat.raw-html-format
- python.flask.security.injection.ssrf-requests.ssrf-requests
impact: MEDIUM
LOW
LOW impact rules detect a security issue whose impact is not too damaging to the application if exploited. Examples:
- go.gorilla.security.audit.session-cookie-missing-secure.session-cookie-missing-secure
- javascript.browser.security.raw-html-join.raw-html-join
impact: LOW
Subcategory
Include a subcategory to explain the type of the rule. See the subsections below for more details.
vuln
A vulnerability rule is something that developers certainly want to resolve. For example, an SQL Injection rule that uses taint mode. Example:
subcategory:
- vuln
audit
An audit rule is useful for code auditors. For example, an SQL rule that finds all uses of database.exec(...) that can be problematic. Example:
subcategory:
- audit
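An audit-style rule for the database.exec(...) case could be sketched in search mode as follows. The rule ID, message, language, and severity are assumptions made for illustration. The other metadata fields required for the security category are omitted for brevity.

```yaml
rules:
  - id: example-audit-db-exec          # hypothetical ID
    languages:
      - javascript
    severity: INFO
    message: >-
      database.exec() runs an SQL statement. Review this call to confirm
      that the query is not built from untrusted input.
    # Search mode: flags every call, whether or not it is exploitable.
    pattern: database.exec(...)
    metadata:
      category: security
      confidence: LOW
      subcategory:
        - audit
```

Because the rule reports every call without tracking where the query comes from, a LOW confidence value fits it better than HIGH.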
secure default
A secure default rule is useful for companies writing custom rules. For example, finding all usages of non-standard XML parsing libraries within the company. The rule message can also tell developers to use only a company-approved library.
subcategory:
- secure default
Technology
Technology helps to define specific rulesets for languages, libraries, and frameworks that are available in Semgrep Registry. For example, a rule with the technology express is included in the p/express rulepack.
technology:
- express
References
References give a developer more context on what the issue is and how to remediate the vulnerability. See the examples below:
- A rule that finds an issue in React: typescript.react.security.audit.react-href-var.react-href-var
references:
  - https://reactjs.org/blog/2019/08/08/react-v16.9.0.html#deprecating-javascript-urls
- A rule that detects an issue in Express: javascript.sequelize.security.audit.sequelize-injection-express.express-sequelize-injection
references:
  - https://sequelize.org/docs/v6/core-concepts/raw-queries/#replacements
Updating existing open-source rules in Semgrep Registry
To update an existing open-source rule, follow these steps:
- Find a rule you want to update in the semgrep-rules repository.
- Submit a PR to the repository with your new update.
- Follow the same instructions and recommendations as in the rest of this document. For example, the security category has specific metadata requirements.
- Leave a message in the PR that explains why you are making the changes and what motivates the update.
See a PR example.
The repository's pipeline can post messages with specific details about your rule. Ensure that your rule meets all of the requirements. However, the pipeline running in the semgrep-rules repository can sometimes have issues of its own. In such a case, wait for a Semgrep reviewer's help.