improved regexes #52

eur0pa · 2018-03-19T10:33:01Z

No description provided.

- Table of contents - Releases documentation

vincentcox · 2018-03-19T10:41:11Z

Thanks for your commit! I will have a look this evening.

vincentcox · 2018-03-21T08:38:26Z

I was comparing if there were differences in the report. Actually, your commit gives more results + all the findings in the current version.

The only "aesthetic" "issue" is shown below:

Actually, it's not an issue, but I think I will add a feature that outputs what keyword matches with the regex so it's easier to see in the line what matches.

Good commit, I will merge it with develop and add that feature this week. Thanks for helping out!

eur0pa · 2018-03-21T08:59:54Z

Glad it was useful—those are my go-to regexes when I manually go through stuff : )

Ayowel

Quick notes on the regexes

Ayowel · 2018-03-24T22:03:45Z

src/config/db_search_words.txt

-(https|http):\/\/.*api.*|||60||| This regex matches any URL containing 'api'
-(https|http):\/\/.*test.*|||60||| This regex matches any URL containing 'test'
-(https|http):\/\/.*uat.*|||60||| This regex matches any URL containing 'uat'
+passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score


Add support for pwd ? Handle camel case to detect vars that might hold interesting information at some point ? Is passw actually used in applications to justify the ? ?

(P|p)(wd|assw(d|ord))

Maybe the case is handled, didn't check in the code.

Thanks for your comment. In a hour I will push a new commit to the development branch with also updated wordlists.

Maybe the case is handled, didn't check in the code. -> The code checks all regex's case insensitive.

Ayowel · 2018-03-24T22:07:04Z

src/config/db_search_words.txt

+passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score
+(private|secret|api|aws)[_-]?key|||80
+https?:|||7
+(db|database)[_-]?(passw(d|ord)?|secret)|||80


same note on pwd

(db|database)[_-]?(p(wd|assw(d|ord)?)|secret)

Ayowel · 2018-03-24T22:08:28Z

src/config/src_search_words.txt

-(https|http):\/\/.*api.*|||60||| This regex matches any URL containing 'api'
-(https|http):\/\/.*test.*|||60||| This regex matches any URL containing 'test'
-(https|http):\/\/.*uat.*|||60||| This regex matches any URL containing 'uat'
+passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score


pwd again ?

p(wd|assw(d|ord)?)

Ayowel · 2018-03-24T22:39:03Z

src/config/src_search_words.txt

Caution with spaces, might also want to handle newlines and tabs (and non-blocking spaces, and all the annoying stuff that can be put in there).
You might also want to ensure that case is not taken into account as sql does not consider it either.

INSERT +INTO +.* +VALUES

Maybe also try to detect cases where an sql query is performed by inserting vars directly in the string as it also provides an access point (there WILL be problems with false positives, newline/tab handling, ... there is only so much one can do wit a type 3 grammar and this was quickly thrown together and is by no mean a proper regex for this usage) :

(SELECT|UPDATE|CREATE|CALL|PREPARE) +.*['"] *\+

I don't see why their would be update or create in a client, but you never know.

Good point!

So your suggestion is to use (SELECT|UPDATE|CREATE|CALL|PREPARE) +.*['"] *\+ ?

I will go for this one: https://regex101.com/r/cYSPbB/1

Many thanks for your idea, it drove me in the right direction!

Commit is pushed. I will discuss the "pwd" keyword at work, I'm also unsure if it would add any value.

The code checks all regex's case insensitive.

great

I will discuss the "pwd" keyword at work, I'm also unsure if it would add any value.

Not the kind of keyword that would produce a lot of noise, and pwd is not all that rare a var name to hold sensitive data.

Mind if I ask where/what is your work ?

I will go for this one: https://regex101.com/r/cYSPbB/1

Not sure which field you're talking about. Is it this one ?

(SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+)| (UPDATE\s[\w]+\sSET\s[\w\,\'\=]+)| (INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+)| (DELETE\sFROM\s[\d\w\'\=]+)

In this case, you might want to put them all on there own lines, chaining them like that makes it hard to read and doesn't make it faster (and that will avoid the random spaces after the '|' 😏 )

SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+ UPDATE\s[\w]+\sSET\s[\w\,\'\=]+ INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+ DELETE\sFROM\s[\d\w\'\=]+

And as I said, there might be more than one space (typo, newline and tabs for readability, ...), so you actually want to have all your \s with a + :

SELECT\s+[\w\*\)\(\,\s]+\s+FROM\s+[\w]+ UPDATE\s+[\w]+\s+SET\s[\w\,\'\=]+ INSERT\s+INTO\s+[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+ DELETE\s+FROM\s+[\d\w\'\=]+

Plus, it is not uncommon to name tables and columns with _, which is not supported by your current regexes. Overall, it is possible to use pretty much any character in a table name (which does not mean it is recommended, but that's an other point), so using .* or .+ instead of a complex rule would probably help (and considering that newlines aren't usually handled with a ., you might want to write (.|\n)+) :

SELECT\s+(.|\n)+\s+FROM\s+[\w]+ UPDATE\s+.+\s+SET\s[\w\,\'\=]+ INSERT\s+INTO\s+[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+ DELETE\s+FROM\s+[\d\w\'\=]+

Note: [.] matches . and not a, no need to write [\.]

Overall, there is no need to be too selective on those regexes, the simple fact that the requests are made from a client is already a proof that there is a problem ; your goal with those is only to know :

Which requests can be done:

If I know that I can make a SELECT, I can perform extraction

If I know that I can make an INSERT, I can corrupt the DB

If I know that I can make a UPDATE/DELETE/DROP, I hope that you've saved your DB recently

and so on...

If an SQL injection is possible in a field as a regular user

So what you want to know is just where those are and their content. I'd personally go for:

SELECT\s(.|\s)+\sFROM\s.+ UPDATE\s+(.|\s)+\s+SET\s.+ INSERT\s+INTO\s+(.|\s)*\sVALUES\s.+ DELETE\s+FROM\s.+

To me, false negatives are much worse than false positive in this situation. Depends on what you want, but knowing that requests are performed from the client's side means that I have a (maybe limited) direct access to the database at some point. It's huge.

Handling multiple lines is going to make the pattern matching very slow, so you might want to group all those in a single item, so as to lower the number of tests, but that will increase the number of false positives so check that regexes attempt to match the shortest pattern first or it's going to be a mess)

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)?

Mind if I ask where/what is your work ?

I currently work for Zionsecurity. My professional profile can be found here: https://vincentcox.com/linkedin.

In this case, you might want to put them all on there own lines, chaining them like that makes it hard to read and doesn't make it faster (and that will avoid the random spaces after the '|' 😏 )

I'm affraid that the parser reads line by line, causing this to break. I do agree that readability is bad of the current searchregex.

So what you want to know is just where those are and their content. I'd personally go for:
...

Will add it to the roadmap so I don't forget:

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)?

That would be the final regex, am I correct?

To me, false negatives are much worse than false positive in this situation. Depends on what you want, but knowing that requests are performed from the client's side means that I have a (maybe limited) direct access to the database at some point. It's huge.

You are 100% right, I do agree with you!

Thanks for taking (a lot) of time for writing your response and motivating your suggestions. I will add you to the top contributors, I really appreciate it!

I'm affraid that the parser reads line by line, causing this to break. I do agree that readability is bad of the current searchregex.

Well, I was actually talking about turning the regex into 4 independant regexes, so that should be fine.

Soooo... I hadn't actually tested the regexes and had written on the fly as they more intended as examples than actual implementations.

This (SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)? is probably not a good idea (first of all because it doesn't parse, but also because it will produce way too many false positives on render, making the field effectively useless).

A working variant could be good if it was to run on strings extracted from the source code, but running it on the code itself is probably going to produce so many false positives each time that no one is going to pay attention to the field's results (so maybe in a later version, but right now I don't think such a feature is supported). You'd need to test it though, as it's mostly assumptions based on how I think the regex is going to behave. You could have a good surprise. Working version :

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)?

Added starting delimiters detection (to run against raw code and avoid detection of single vars whose name ends with a regex' start). This might actually provoke some false negatives, but I think most cases are handled here:

[;'\"\s()](SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)?

; If SQL requests are chained in a single string

' & " string start & end detection

\s spaces & friends

() might occur in case of nested SQL Query

Or, for more specifics, you could split them up in 4 regexes (but as said in my previous post, you usually want to avoid multiple-lines matches, they make the overall pattern matching much slower):

[;'\"\s()]SELECT\s(.|\s)+\sFROM\s [;'\"\s()]UPDATE\s(.|\s)+\sSET\s [;'\"\s()]INSERT\s+INTO\s(.|\s)+\sVALUES\s [;'\"\s()]INSERT\s+INTO\s(.|\s)+\sSET\s [;'\"\s()]DELETE\s+FROM\s(.|\s)+

And yes, that's pretty much the first suggestion I made. If you want to display the matches until the end of the line, you could add a .*, but that's just a cosmetic change (try to display the whole matched pattern).

Quick explanation on the + positions:

you could place them after pretty much everything ([;'\"\s()], \s and (.|\s)), but you usually want to avoid any ambiguity on what matches what (because any ambiguity might cost you extra steps to solve them depending on your implementations), so there is no reason to put a + behind a \s when the following/previous match also has a + and uses the same char ((.|\s)+).
With [;'\"\s()], it's just that we don't really care what is matched before one character, so no need to generate a result bigger tan necessary.

If the website you linked matches the same way stacoan does, you might want to look if there is a way to ask for the shortest match instead of the longest (it's different algorithms, I don't know if both are implemented) and give a way to ask to use one or the other. Here, it would really change a lot on the output and be much more interesting in many cases.

I optimized your regex here (I replaced \s with [ \t] because new-lines were included with \s making everything match. But after that replacement it works perfectly:
https://regex101.com/r/xOWvH4/1

If you give your go, I will add it to the list 👍

About the seperate queries:
I tried to do the same: https://regex101.com/r/vZSRys/1, but it doesn't seem to work.

I really appreciate all the effort on explaining how you came to these regex's, therefore I added you in the top contributor's list: 19f956f

The develop will be merged to master this weekend if all go well.

Everyday I am amazed how great the INFOSEC community is 🎉 .

* Update readme file - Table of contents - Releases documentation * improved regexes (#52) * Update readme file (#51) - Table of contents - Releases documentation * Update db_search_words.txt * Update src_search_words.txt * Update exclusion_list.txt * Update src_search_words.txt * Update db_search_words.txt * Owasp integration * Various fixes and improvements - Update wordlists - Converted searchwords code into object oriented code - Comments are now placed in the report * Change port to less common ports Fix for #54 * Fix browser issue + cleaner logging #54 * Added Contributor + screenshot readme - Added Ayowel to top contributor's - Added section "How does the tool works?" * Download demo report immediately when clicking on the link (#57) * Update readme file (#51) - Table of contents - Releases documentation * Download demo report immediately when clicking on the link * Fix mac (#60) Mac issue: #54 * Server fix

vincentcox and others added 6 commits March 11, 2018 10:48

Update readme file (#51)

49ac37a

- Table of contents - Releases documentation

Update db_search_words.txt

c8ad3b1

Update src_search_words.txt

0896307

Update exclusion_list.txt

7a1151f

Update src_search_words.txt

a4026ae

Update db_search_words.txt

5c73e7d

vincentcox changed the base branch from master to develop March 21, 2018 08:39

vincentcox merged commit 484b42d into vincentcox:develop Mar 21, 2018

Ayowel reviewed Mar 24, 2018

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improved regexes #52

improved regexes #52

eur0pa commented Mar 19, 2018

vincentcox commented Mar 19, 2018

vincentcox commented Mar 21, 2018

eur0pa commented Mar 21, 2018

Ayowel left a comment

Ayowel Mar 24, 2018

vincentcox Mar 25, 2018

Ayowel Mar 24, 2018

Ayowel Mar 24, 2018

Ayowel Mar 24, 2018 •

edited

Loading

vincentcox Mar 25, 2018

vincentcox Mar 25, 2018

vincentcox Mar 25, 2018

Ayowel Mar 25, 2018 •

edited

Loading

vincentcox Mar 28, 2018

Ayowel Mar 31, 2018

vincentcox Apr 1, 2018

improved regexes #52

improved regexes #52

Conversation

eur0pa commented Mar 19, 2018

vincentcox commented Mar 19, 2018

vincentcox commented Mar 21, 2018

eur0pa commented Mar 21, 2018

Ayowel left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ayowel Mar 24, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ayowel Mar 25, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Ayowel Mar 24, 2018 •

edited

Loading

Ayowel Mar 25, 2018 •

edited

Loading