Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improved regexes #52

Merged
merged 6 commits into from
Mar 21, 2018
Merged

improved regexes #52

merged 6 commits into from
Mar 21, 2018

Conversation

eur0pa
Copy link

@eur0pa eur0pa commented Mar 19, 2018

No description provided.

@vincentcox
Copy link
Owner

Thanks for your commit! I will have a look this evening.

@vincentcox
Copy link
Owner

I was comparing if there were differences in the report. Actually, your commit gives more results + all the findings in the current version.

The only "aesthetic" "issue" is shown below:

image

Actually, it's not an issue, but I think I will add a feature that outputs what keyword matches with the regex so it's easier to see in the line what matches.

Good commit, I will merge it with develop and add that feature this week. Thanks for helping out!

@vincentcox vincentcox changed the base branch from master to develop March 21, 2018 08:39
@vincentcox vincentcox merged commit 484b42d into vincentcox:develop Mar 21, 2018
@eur0pa
Copy link
Author

eur0pa commented Mar 21, 2018

Glad it was useful—those are my go-to regexes when I manually go through stuff : )

Copy link

@Ayowel Ayowel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick notes on the regexes

(https|http):\/\/.*api.*|||60||| This regex matches any URL containing 'api'
(https|http):\/\/.*test.*|||60||| This regex matches any URL containing 'test'
(https|http):\/\/.*uat.*|||60||| This regex matches any URL containing 'uat'
passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add support for pwd ? Handle camel case to detect vars that might hold interesting information at some point ? Is passw actually used in applications to justify the ? ?

(P|p)(wd|assw(d|ord))

Maybe the case is handled, didn't check in the code.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your comment. In a hour I will push a new commit to the development branch with also updated wordlists.

Maybe the case is handled, didn't check in the code. -> The code checks all regex's case insensitive.

passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score
(private|secret|api|aws)[_-]?key|||80
https?:|||7
(db|database)[_-]?(passw(d|ord)?|secret)|||80
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same note on pwd

(db|database)[_-]?(p(wd|assw(d|ord)?)|secret)

(https|http):\/\/.*api.*|||60||| This regex matches any URL containing 'api'
(https|http):\/\/.*test.*|||60||| This regex matches any URL containing 'test'
(https|http):\/\/.*uat.*|||60||| This regex matches any URL containing 'uat'
passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pwd again ?

p(wd|assw(d|ord)?)

INSERT INTO .* VALUES|||40||| Intersting SQL transaction
Copy link

@Ayowel Ayowel Mar 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution with spaces, might also want to handle newlines and tabs (and non-blocking spaces, and all the annoying stuff that can be put in there).
You might also want to ensure that case is not taken into account as sql does not consider it either.

INSERT +INTO +.* +VALUES

Maybe also try to detect cases where an sql query is performed by inserting vars directly in the string as it also provides an access point (there WILL be problems with false positives, newline/tab handling, ... there is only so much one can do wit a type 3 grammar and this was quickly thrown together and is by no mean a proper regex for this usage) :

(SELECT|UPDATE|CREATE|CALL|PREPARE) +.*['"] *\+

I don't see why their would be update or create in a client, but you never know.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!

So your suggestion is to use (SELECT|UPDATE|CREATE|CALL|PREPARE) +.*['"] *\+ ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will go for this one: https://regex101.com/r/cYSPbB/1

Many thanks for your idea, it drove me in the right direction!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit is pushed. I will discuss the "pwd" keyword at work, I'm also unsure if it would add any value.

Copy link

@Ayowel Ayowel Mar 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code checks all regex's case insensitive.

great

I will discuss the "pwd" keyword at work, I'm also unsure if it would add any value.

Not the kind of keyword that would produce a lot of noise, and pwd is not all that rare a var name to hold sensitive data.

Mind if I ask where/what is your work ?

I will go for this one: https://regex101.com/r/cYSPbB/1

Not sure which field you're talking about. Is it this one ?

(SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+)| (UPDATE\s[\w]+\sSET\s[\w\,\'\=]+)| (INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+)| (DELETE\sFROM\s[\d\w\'\=]+)

In this case, you might want to put them all on there own lines, chaining them like that makes it hard to read and doesn't make it faster (and that will avoid the random spaces after the '|' 😏 )

SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+
UPDATE\s[\w]+\sSET\s[\w\,\'\=]+
INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\sFROM\s[\d\w\'\=]+

And as I said, there might be more than one space (typo, newline and tabs for readability, ...), so you actually want to have all your \s with a + :

SELECT\s+[\w\*\)\(\,\s]+\s+FROM\s+[\w]+
UPDATE\s+[\w]+\s+SET\s[\w\,\'\=]+
INSERT\s+INTO\s+[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\s+FROM\s+[\d\w\'\=]+

Plus, it is not uncommon to name tables and columns with _, which is not supported by your current regexes. Overall, it is possible to use pretty much any character in a table name (which does not mean it is recommended, but that's an other point), so using .* or .+ instead of a complex rule would probably help (and considering that newlines aren't usually handled with a ., you might want to write (.|\n)+) :

SELECT\s+(.|\n)+\s+FROM\s+[\w]+
UPDATE\s+.+\s+SET\s[\w\,\'\=]+
INSERT\s+INTO\s+[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\s+FROM\s+[\d\w\'\=]+
  • Note: [.] matches . and not a, no need to write [\.]

Overall, there is no need to be too selective on those regexes, the simple fact that the requests are made from a client is already a proof that there is a problem ; your goal with those is only to know :

  1. Which requests can be done:
    • If I know that I can make a SELECT, I can perform extraction
    • If I know that I can make an INSERT, I can corrupt the DB
    • If I know that I can make a UPDATE/DELETE/DROP, I hope that you've saved your DB recently
    • and so on...
  2. If an SQL injection is possible in a field as a regular user

So what you want to know is just where those are and their content. I'd personally go for:

SELECT\s(.|\s)+\sFROM\s.+
UPDATE\s+(.|\s)+\s+SET\s.+
INSERT\s+INTO\s+(.|\s)*\sVALUES\s.+
DELETE\s+FROM\s.+

To me, false negatives are much worse than false positive in this situation. Depends on what you want, but knowing that requests are performed from the client's side means that I have a (maybe limited) direct access to the database at some point. It's huge.

Handling multiple lines is going to make the pattern matching very slow, so you might want to group all those in a single item, so as to lower the number of tests, but that will increase the number of false positives so check that regexes attempt to match the shortest pattern first or it's going to be a mess)

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind if I ask where/what is your work ?

I currently work for Zionsecurity. My professional profile can be found here: https://vincentcox.com/linkedin.

In this case, you might want to put them all on there own lines, chaining them like that makes it hard to read and doesn't make it faster (and that will avoid the random spaces after the '|' 😏 )

I'm affraid that the parser reads line by line, causing this to break. I do agree that readability is bad of the current searchregex.

So what you want to know is just where those are and their content. I'd personally go for:
...

Will add it to the roadmap so I don't forget:

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)?

That would be the final regex, am I correct?

To me, false negatives are much worse than false positive in this situation. Depends on what you want, but knowing that requests are performed from the client's side means that I have a (maybe limited) direct access to the database at some point. It's huge.

You are 100% right, I do agree with you!

Thanks for taking (a lot) of time for writing your response and motivating your suggestions. I will add you to the top contributors, I really appreciate it!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm affraid that the parser reads line by line, causing this to break. I do agree that readability is bad of the current searchregex.

Well, I was actually talking about turning the regex into 4 independant regexes, so that should be fine.

Soooo... I hadn't actually tested the regexes and had written on the fly as they more intended as examples than actual implementations.

This (SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)? is probably not a good idea (first of all because it doesn't parse, but also because it will produce way too many false positives on render, making the field effectively useless).

A working variant could be good if it was to run on strings extracted from the source code, but running it on the code itself is probably going to produce so many false positives each time that no one is going to pay attention to the field's results (so maybe in a later version, but right now I don't think such a feature is supported). You'd need to test it though, as it's mostly assumptions based on how I think the regex is going to behave. You could have a good surprise. Working version :

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)?

Added starting delimiters detection (to run against raw code and avoid detection of single vars whose name ends with a regex' start). This might actually provoke some false negatives, but I think most cases are handled here:

[;'\"\s()](SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)?
  • ; If SQL requests are chained in a single string
  • ' & " string start & end detection
  • \s spaces & friends
  • () might occur in case of nested SQL Query

Or, for more specifics, you could split them up in 4 regexes (but as said in my previous post, you usually want to avoid multiple-lines matches, they make the overall pattern matching much slower):

[;'\"\s()]SELECT\s(.|\s)+\sFROM\s
[;'\"\s()]UPDATE\s(.|\s)+\sSET\s
[;'\"\s()]INSERT\s+INTO\s(.|\s)+\sVALUES\s
[;'\"\s()]INSERT\s+INTO\s(.|\s)+\sSET\s
[;'\"\s()]DELETE\s+FROM\s(.|\s)+

And yes, that's pretty much the first suggestion I made. If you want to display the matches until the end of the line, you could add a .*, but that's just a cosmetic change (try to display the whole matched pattern).


Quick explanation on the + positions:

you could place them after pretty much everything ([;'\"\s()], \s and (.|\s)), but you usually want to avoid any ambiguity on what matches what (because any ambiguity might cost you extra steps to solve them depending on your implementations), so there is no reason to put a + behind a \s when the following/previous match also has a + and uses the same char ((.|\s)+).
With [;'\"\s()], it's just that we don't really care what is matched before one character, so no need to generate a result bigger tan necessary.


If the website you linked matches the same way stacoan does, you might want to look if there is a way to ask for the shortest match instead of the longest (it's different algorithms, I don't know if both are implemented) and give a way to ask to use one or the other. Here, it would really change a lot on the output and be much more interesting in many cases.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I optimized your regex here (I replaced \s with [ \t] because new-lines were included with \s making everything match. But after that replacement it works perfectly:
https://regex101.com/r/xOWvH4/1

If you give your go, I will add it to the list 👍

About the seperate queries:
I tried to do the same: https://regex101.com/r/vZSRys/1, but it doesn't seem to work.

I really appreciate all the effort on explaining how you came to these regex's, therefore I added you in the top contributor's list: 19f956f

The develop will be merged to master this weekend if all go well.

Everyday I am amazed how great the INFOSEC community is 🎉 .

vincentcox added a commit that referenced this pull request Apr 2, 2018
* Update readme file

- Table of contents
- Releases documentation

* improved regexes (#52)

* Update readme file (#51)

- Table of contents
- Releases documentation

* Update db_search_words.txt

* Update src_search_words.txt

* Update exclusion_list.txt

* Update src_search_words.txt

* Update db_search_words.txt

* Owasp integration

* Various fixes and improvements
- Update wordlists
- Converted searchwords code into object oriented code
- Comments are now placed in the report

* Change port to less common ports

Fix for #54

* Fix browser issue + cleaner logging

#54

* Added Contributor + screenshot readme

- Added Ayowel to top contributor's
- Added section "How does the tool works?"

* Download demo report immediately when clicking on the link (#57)

* Update readme file (#51)

- Table of contents
- Releases documentation

* Download demo report immediately when clicking on the link

* Fix mac (#60)

Mac issue: #54

* Server fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants