Database design #5

andresriancho · 2015-03-27T11:40:50Z

JSON backend

All the data should be stored in individual JSON files, one for each vulnerability. These files should be stored in db/ and the name should be <vuln-id>-<vulnerability-name>.json , where <vuln-id> is a unique ID (integer) that we'll use to reference/find the vulnerabilities and the <vulnerability-name> is a human readable name to make it easier for us to find a vulnerability in the output of an "ls" command. The vulnerability name should be dash separated. Examples:

1-cross-site-scripting.json
2-sql-injection.json
3-missing-hsts-header.json

JSON files should look similar to this example:

{
  "id": 12345,
  "title": "Cross-Site Scripting",
  "description": "A very long description for Cross-Site Scripting",
  "severity": "medium",
  "wasc": ["0003"],
  "tags": ["xss", "client side"],
  "cwe": ["0003", "0007"],
  "owasp_top_10": {
        "2010": [1],
        "2013": [2]
  },
  "fix": {
      "guidance": "A very long text explaining how to fix XSS vulnerabilities",
      "effort": 50
    },
  "references": [
      {"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
      {"url": "http://asp.net/xss", "title": "How to fix XSS vulns in ASP.NET"},
      {"url": "http://owasp.org/xss", "title": "OWASP desc for XSS"}
    ]
  }

Notes about the JSON file:

"id" holds the vulnerability unique ID. The contents of the "id" field must match filename
The "OWASP" fields points to the OWASP Top10 category. We have different versions of this (2010 and 2013) to match the different releases of the OWASP Top10 project
"fix-effort" gives the user an idea of how much time it will take to solve this vulnerability, it's stored in minutes
Since a long blob of unformatted text is hard to read for any user, the description and solution text MUST use markdown for formatting. There are various python libraries which translate markdown to HTML, which will allow us to show the text in the GUI and or Web UI.

Long strings in JSON

JSON has an awful limitation: "No multi-line strings". In our case this is very limiting since we don't want to have awfully long lines in these sections:

  "description": "A very long description for Cross-Site Scripting",
  "solution": "A very long text explaining how to fix XSS vulnerabilities",

So, the parser should check if the description and solution fields are strings or arrays. If they are arrays, the contents of the array should be joined using an empty space. For example:

  "description": ["A very long description for Cross-Site Scripting",
                         " which has more than one line"]

Will be parsed as A very long description for Cross-Site Scripting which has more than one line and then if the user wants to enter new lines, he needs to do so explicitly:

  "description": ["Line1\n",
                         "Line2\n"]

Will be parsed as:

Line1
Line2

Translations

One of the cool things about this architecture is that it will allow us to easily add translations. Just adding a new set of JSON files (keeping the vulnerability IDs) would work.

Implementation details:

The English version will be in db/
Unix's environment variable LANG is used for setting the language
If there is no translation for the current LANG then the English version is used
Translations will be in db/ru/ , db/es/ , etc.
The translation json files will override the parts of the main JSON file which is in English, for example, the main file contains:

  "title": "Cross-Site Scripting",

And then the Russian translation file (for the same vulnerability id) says:

  "title": "Межсайтовый скриптинг",

And if the LANG environment variable is set to Russian then the python/ruby/go wrapper should return Cross-Site Scripting in Russian as the title for this vulnerability. Any field from the main JSON file can be overridden (for example you can override a link to wikipedia to point to the XSS description in Russian).

Content

We'll use the content already created for the Arachni scanner, gently contributed to vulndb/data by Tasos.

Unittesting

Unittests must be in the tests/ directory
We should write a json-schema for our JSON, and then validate the JSON after each push with a unittest. See https://python-jsonschema.readthedocs.org/en/latest/ for more information about json-schemas
Assert that the severity field is one of high, medium, low, informational
Assert that these fields are present and contain at least 30 chars:
- description
- solution
Assert that these fields are present:
- id
- title
- fix_effort
- severity
Assert that all references have url and title fields, and that they are not empty
For each value in WASC/OWASP, make sure that we can generate a link like http://owasp.org/foo/<id>, the URL is valid, site is online, not 404
For each URL field in the schema, we need to check that:
- It's a valid URL
- The site is online, and no 404 is returned
fix_effort value is in the right format
The markdown in both description and solution are well formed
The ID in the file name is the same as the one inside the JSON file
The lines are not longer than 90 columns

References

andresriancho/w3af#53

The text was updated successfully, but these errors were encountered:

Zapotek · 2015-03-27T12:25:09Z

I think it'd be a good idea to add a severity in the format, Arachni currently uses high, medium, low, informational.

Zapotek · 2015-03-27T12:28:17Z

Also, for the references, it may be better if desc is changed to title or if we have a mandatory title and an optional description.

In addition, caps in the keys seem weird, it'd be more consistent if we use lowercases for everything and underscores only instead of mixing - and _.

andresriancho · 2015-03-27T12:31:02Z

Added severity to the JSON format by updating the issue

andresriancho · 2015-03-27T12:33:32Z

Changed all to lower case with underscores

andresriancho · 2015-03-27T12:34:30Z

Changed desc to title, I don't believe we need desc+title for references, IMHO with title it's enough, but if you guys believe we should have both, we can add "desc"

Zapotek · 2015-03-27T13:42:10Z

A few more suggestions, maybe rename solution to fix so that it'll match fix_effort (or vice versa) or better yet have:

{
    "fix": {
      "guidance": "Blah blah",
      "effort": 50
    }
}

So that we can keep a coherent relationship between the fields.
That would also allow us to get platform specific in the future by extending the schema.

captn3m0 · 2015-03-27T14:00:21Z

Why not use yaml? Yaml has multiline support, and you could probably even use yaml-frontmatter format as well.

For eg, at backdoor, we store the challenge information in yaml as follows:

---
tags:
    - binary
    - medium
creator: vampire
score: '70'
title: 'Hidden Flag - Medium'
flag: This is the flag
---
n00b became depressed when 'Pro' found the flag in his binary in a matter of seconds.
This time he hid the flag a little more securely.
See if you can still find it: [file](http://hack.bckdr.in/HIDE-MEDIUM/hide_medium)

The lower portion is in markdown, and any such files render with the metadata in github perfectly.

andresriancho · 2015-03-27T14:02:36Z

@Zapotek Agreed on the fix stuff, added to spec in issue text

andresriancho · 2015-03-27T14:13:51Z

@captn3m0 I like yaml, i'm 100% sure it can be used to store our database and solves the multi-line issue that json has. Anything else we need to know about yaml? Any other extra feature we might take into account to consider migrating to that format?

Zapotek · 2015-03-27T14:17:59Z

Well, some YAML parsers aren't really safe (looking at Ruby).

Zapotek · 2015-03-27T14:25:12Z

@andresriancho If references only need a URL and a title we can use the URL as the key and the title as the value like in Arachni. And better make the title mandatory I think.

andresriancho · 2015-03-27T14:34:07Z

Agreed on the mandatory title (changed) but don't agree on the URL as key, this:

  "references": [
      {"url": "http://foo.com/xss", "title": "First reference to XSS vulnerability"},
    ]

is more readable for humans than:

  "references": [
      {"http://foo.com/xss": "First reference to XSS vulnerability"},
    ]

And we want humans editing these JSON files easily, quickly, without second-doubting what they should enter in each section/field

andresriancho · 2015-03-27T14:35:33Z

PS: Also added the tags which were present in the arachni json files

captn3m0 · 2015-03-27T15:45:58Z

Most yaml parsers run in safe mode by default these days.
All json is automatically valid yaml as well, which means you could just rename files and they will be parsed as yaml fine even now

Other than this (and the high readability), I cant think of any other benefits.

andresriancho · 2015-03-27T15:57:56Z

To accommodate fix guidance for different programming languages:

  "fix": {
      "guidance": "A very long text explaining how to fix XSS vulnerabilities",
      "effort": 50
    },

Or

  "fix": {
      "guidance": {"general": "A very long text explaining how to fix XSS vulnerabilities in general",
                          "django": "Change the view / template code using django stuff"}
      "effort": 50
    },

m0sth8 · 2015-03-27T16:28:19Z

Maybe the good way is to put guidance for different languages in different files, like you already described for languages support?

andresriancho · 2015-03-27T16:31:20Z

I believe that would just complicate things and make it difficult to find the right file to edit 👎 for me.

m0sth8 · 2015-03-27T17:02:11Z

Usually, languages have json support out of the box ( "encoding/json" in Go or "json" in Python) and don't have yaml.
Also, it's easy to write bad yaml than json, but it's not so important in our case.

andresriancho · 2015-03-27T18:50:07Z

👎 for yaml then

m0sth8 · 2015-03-27T20:32:11Z

I want to specify:

Is description required and must contain at least 30 chars?
What is a minimum and maximum length for title and tags? (I think about 2 and 255)

andresriancho · 2015-03-27T20:39:12Z

Is description required and must contain at least 30 chars?

Yes, I set it to an arbitrary 30 chars, but if you see it's too low set it to more.

What is a minimum and maximum length for title and tags? (I think about 2 and 255)

Title feel like 4-255
Tags feel like 2-255

m0sth8 · 2015-03-27T21:34:37Z

If we gonna use

  "fix": {
      "guidance": {"general": "A very long text explaining how to fix XSS vulnerabilities in general",
                          "django": "Change the view / template code using django stuff"}
      "effort": 50
    },

than effort also depends on different programming languages/cms/frameworks

andresriancho · 2015-03-27T22:11:43Z

Well, the effort is the aprox number of minutes required to fix the issue, I doubt that fixing a vuln in django takes 1 minute and then it will take 38 in php. We could either:

Associate it with the general/django/etc. and most likely have the same aprox effort for all
Make it independent and accept the mistake we might have in some cases

I'm +0 on this, you take the decision and let me know how it goes.

m0sth8 · 2015-03-27T22:47:40Z

What do you think if we add index file for search purpose?

{
"tags": {
    "tag_name1": [1, 2], // array of vulnerabilities ids for the tag
}
// ...
}

The file might be generated automatically.

andresriancho · 2015-03-27T22:50:17Z

Might be useful, please create a different ticket for it

andresriancho mentioned this issue Mar 27, 2015

Import arachni scanner data #4

Closed

m0sth8 mentioned this issue Mar 27, 2015

json schema for vulnerability #8

Merged

m0sth8 mentioned this issue Mar 30, 2015

Too many new lines #14

Closed

andresriancho closed this as completed Mar 30, 2015

andresriancho mentioned this issue Apr 1, 2015

Write unittests required by specification #20

Closed

andresriancho added a commit that referenced this issue Apr 1, 2015

Adding more tests for #20 #5

3eacaf9

andresriancho added a commit that referenced this issue Apr 2, 2015

Adding more tests for #5 #20

becd962

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database design #5

Database design #5

andresriancho commented Mar 27, 2015

Zapotek commented Mar 27, 2015

Zapotek commented Mar 27, 2015

andresriancho commented Mar 27, 2015

andresriancho commented Mar 27, 2015

andresriancho commented Mar 27, 2015

Zapotek commented Mar 27, 2015

captn3m0 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

andresriancho commented Mar 27, 2015

Zapotek commented Mar 27, 2015

Zapotek commented Mar 27, 2015

andresriancho commented Mar 27, 2015

andresriancho commented Mar 27, 2015

captn3m0 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

Database design #5

Database design #5

Comments

andresriancho commented Mar 27, 2015

JSON backend

Long strings in JSON

Translations

Content

Unittesting

References

Zapotek commented Mar 27, 2015

Zapotek commented Mar 27, 2015

andresriancho commented Mar 27, 2015

andresriancho commented Mar 27, 2015

andresriancho commented Mar 27, 2015

Zapotek commented Mar 27, 2015

captn3m0 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

andresriancho commented Mar 27, 2015

Zapotek commented Mar 27, 2015

Zapotek commented Mar 27, 2015

andresriancho commented Mar 27, 2015

andresriancho commented Mar 27, 2015

captn3m0 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015

m0sth8 commented Mar 27, 2015

andresriancho commented Mar 27, 2015