Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database design #5

Closed
andresriancho opened this issue Mar 27, 2015 · 25 comments
Closed

Database design #5

andresriancho opened this issue Mar 27, 2015 · 25 comments

Comments

@andresriancho
Copy link
Contributor

JSON backend

All the data should be stored in individual JSON files, one for each vulnerability. These files should be stored in db/ and the name should be <vuln-id>-<vulnerability-name>.json , where <vuln-id> is a unique ID (integer) that we'll use to reference/find the vulnerabilities and the <vulnerability-name> is a human readable name to make it easier for us to find a vulnerability in the output of an "ls" command. The vulnerability name should be dash separated. Examples:

  • 1-cross-site-scripting.json
  • 2-sql-injection.json
  • 3-missing-hsts-header.json

JSON files should look similar to this example:

{
  "id": 12345,
  "title": "Cross-Site Scripting",
  "description": "A very long description for Cross-Site Scripting",
  "severity": "medium",
  "wasc": ["0003"],
  "tags": ["xss", "client side"],
  "cwe": ["0003", "0007"],
  "owasp_top_10": {
        "2010": [1],
        "2013": [2]
  },
  "fix": {
      "guidance": "A very long text explaining how to fix XSS vulnerabilities",
      "effort": 50
    },
  "references": [
      {"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
      {"url": "http://asp.net/xss", "title": "How to fix XSS vulns in ASP.NET"},
      {"url": "http://owasp.org/xss", "title": "OWASP desc for XSS"}
    ]
  }

Notes about the JSON file:

  • "id" holds the vulnerability unique ID. The contents of the "id" field must match filename
  • The "OWASP" fields points to the OWASP Top10 category. We have different versions of this (2010 and 2013) to match the different releases of the OWASP Top10 project
  • "fix-effort" gives the user an idea of how much time it will take to solve this vulnerability, it's stored in minutes
  • Since a long blob of unformatted text is hard to read for any user, the description and solution text MUST use markdown for formatting. There are various python libraries which translate markdown to HTML, which will allow us to show the text in the GUI and or Web UI.

Long strings in JSON

JSON has an awful limitation: "No multi-line strings". In our case this is very limiting since we don't want to have awfully long lines in these sections:

  "description": "A very long description for Cross-Site Scripting",
  "solution": "A very long text explaining how to fix XSS vulnerabilities",

So, the parser should check if the description and solution fields are strings or arrays. If they are arrays, the contents of the array should be joined using an empty space. For example:

  "description": ["A very long description for Cross-Site Scripting",
                         " which has more than one line"]

Will be parsed as A very long description for Cross-Site Scripting which has more than one line and then if the user wants to enter new lines, he needs to do so explicitly:

  "description": ["Line1\n",
                         "Line2\n"]

Will be parsed as:

Line1
Line2

Translations

One of the cool things about this architecture is that it will allow us to easily add translations. Just adding a new set of JSON files (keeping the vulnerability IDs) would work.

Implementation details:

  • The English version will be in db/
  • Unix's environment variable LANG is used for setting the language
  • If there is no translation for the current LANG then the English version is used
  • Translations will be in db/ru/ , db/es/ , etc.
  • The translation json files will override the parts of the main JSON file which is in English, for example, the main file contains:
  "title": "Cross-Site Scripting",

And then the Russian translation file (for the same vulnerability id) says:

  "title": "Межсайтовый скриптинг",

And if the LANG environment variable is set to Russian then the python/ruby/go wrapper should return Cross-Site Scripting in Russian as the title for this vulnerability. Any field from the main JSON file can be overridden (for example you can override a link to wikipedia to point to the XSS description in Russian).

Content

We'll use the content already created for the Arachni scanner, gently contributed to vulndb/data by Tasos.

Unittesting

  • Unittests must be in the tests/ directory
  • We should write a json-schema for our JSON, and then validate the JSON after each push with a unittest. See https://python-jsonschema.readthedocs.org/en/latest/ for more information about json-schemas
  • Assert that the severity field is one of high, medium, low, informational
  • Assert that these fields are present and contain at least 30 chars:
    • description
    • solution
  • Assert that these fields are present:
    • id
    • title
    • fix_effort
    • severity
  • Assert that all references have url and title fields, and that they are not empty
  • For each value in WASC/OWASP, make sure that we can generate a link like http://owasp.org/foo/<id>, the URL is valid, site is online, not 404
  • For each URL field in the schema, we need to check that:
    • It's a valid URL
    • The site is online, and no 404 is returned
  • fix_effort value is in the right format
  • The markdown in both description and solution are well formed
  • The ID in the file name is the same as the one inside the JSON file
  • The lines are not longer than 90 columns

References

andresriancho/w3af#53

@Zapotek
Copy link

Zapotek commented Mar 27, 2015

I think it'd be a good idea to add a severity in the format, Arachni currently uses high, medium, low, informational.

@Zapotek
Copy link

Zapotek commented Mar 27, 2015

Also, for the references, it may be better if desc is changed to title or if we have a mandatory title and an optional description.

In addition, caps in the keys seem weird, it'd be more consistent if we use lowercases for everything and underscores only instead of mixing - and _.

@andresriancho
Copy link
Contributor Author

Added severity to the JSON format by updating the issue

@andresriancho
Copy link
Contributor Author

Changed all to lower case with underscores

@andresriancho
Copy link
Contributor Author

Changed desc to title, I don't believe we need desc+title for references, IMHO with title it's enough, but if you guys believe we should have both, we can add "desc"

@Zapotek
Copy link

Zapotek commented Mar 27, 2015

A few more suggestions, maybe rename solution to fix so that it'll match fix_effort (or vice versa) or better yet have:

{
    "fix": {
      "guidance": "Blah blah",
      "effort": 50
    }
}

So that we can keep a coherent relationship between the fields.
That would also allow us to get platform specific in the future by extending the schema.

@captn3m0
Copy link

Why not use yaml? Yaml has multiline support, and you could probably even use yaml-frontmatter format as well.

For eg, at backdoor, we store the challenge information in yaml as follows:

---
tags:
    - binary
    - medium
creator: vampire
score: '70'
title: 'Hidden Flag - Medium'
flag: This is the flag
---
n00b became depressed when 'Pro' found the flag in his binary in a matter of seconds.
This time he hid the flag a little more securely.
See if you can still find it: [file](http://hack.bckdr.in/HIDE-MEDIUM/hide_medium)

The lower portion is in markdown, and any such files render with the metadata in github perfectly.

@andresriancho
Copy link
Contributor Author

@Zapotek Agreed on the fix stuff, added to spec in issue text

@andresriancho
Copy link
Contributor Author

@captn3m0 I like yaml, i'm 100% sure it can be used to store our database and solves the multi-line issue that json has. Anything else we need to know about yaml? Any other extra feature we might take into account to consider migrating to that format?

@Zapotek
Copy link

Zapotek commented Mar 27, 2015

Well, some YAML parsers aren't really safe (looking at Ruby).

@Zapotek
Copy link

Zapotek commented Mar 27, 2015

@andresriancho If references only need a URL and a title we can use the URL as the key and the title as the value like in Arachni. And better make the title mandatory I think.

@andresriancho
Copy link
Contributor Author

Agreed on the mandatory title (changed) but don't agree on the URL as key, this:

  "references": [
      {"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
    ]

is more readable for humans than:

  "references": [
      {"http://foo.com/xss": "First reference to XSS vulnerability"},
    ]

And we want humans editing these JSON files easily, quickly, without second-doubting what they should enter in each section/field

@andresriancho
Copy link
Contributor Author

PS: Also added the tags which were present in the arachni json files

@captn3m0
Copy link

  • Most yaml parsers run in safe mode by default these days.
  • All json is automatically valid yaml as well, which means you could just rename files and they will be parsed as yaml fine even now

Other than this (and the high readability), I cant think of any other benefits.

@andresriancho
Copy link
Contributor Author

To accommodate fix guidance for different programming languages:

  "fix": {
      "guidance": "A very long text explaining how to fix XSS vulnerabilities",
      "effort": 50
    },

Or

  "fix": {
      "guidance": {"general": "A very long text explaining how to fix XSS vulnerabilities in general",
                          "django": "Change the view / template code using django stuff"}
      "effort": 50
    },

@m0sth8
Copy link
Contributor

m0sth8 commented Mar 27, 2015

Maybe the good way is to put guidance for different languages in different files, like you already described for languages support?

@andresriancho
Copy link
Contributor Author

I believe that would just complicate things and make it difficult to find the right file to edit 👎 for me.

@m0sth8
Copy link
Contributor

m0sth8 commented Mar 27, 2015

Usually, languages have json support out of the box ( "encoding/json" in Go or "json" in Python) and don't have yaml.
Also, it's easy to write bad yaml than json, but it's not so important in our case.

@andresriancho
Copy link
Contributor Author

👎 for yaml then

@m0sth8
Copy link
Contributor

m0sth8 commented Mar 27, 2015

I want to specify:

  1. Is description required and must contain at least 30 chars?
  2. What is a minimum and maximum length for title and tags? (I think about 2 and 255)

@andresriancho
Copy link
Contributor Author

Is description required and must contain at least 30 chars?

Yes, I set it to an arbitrary 30 chars, but if you see it's too low set it to more.

What is a minimum and maximum length for title and tags? (I think about 2 and 255)

Title feel like 4-255
Tags feel like 2-255

@m0sth8
Copy link
Contributor

m0sth8 commented Mar 27, 2015

If we gonna use

  "fix": {
      "guidance": {"general": "A very long text explaining how to fix XSS vulnerabilities in general",
                          "django": "Change the view / template code using django stuff"}
      "effort": 50
    },

than effort also depends on different programming languages/cms/frameworks

@andresriancho
Copy link
Contributor Author

Well, the effort is the aprox number of minutes required to fix the issue, I doubt that fixing a vuln in django takes 1 minute and then it will take 38 in php. We could either:

  • Associate it with the general/django/etc. and most likely have the same aprox effort for all
  • Make it independent and accept the mistake we might have in some cases

I'm +0 on this, you take the decision and let me know how it goes.

@m0sth8
Copy link
Contributor

m0sth8 commented Mar 27, 2015

What do you think if we add index file for search purpose?

{
"tags": {
    "tag_name1": [1, 2], // array of vulnerabilities ids for the tag
}
// ...
}

The file might be generated automatically.

@andresriancho
Copy link
Contributor Author

Might be useful, please create a different ticket for it

andresriancho added a commit that referenced this issue Apr 1, 2015
andresriancho added a commit that referenced this issue Apr 2, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants