#25, #26: fix GFM style autolinks parsing
One problem was caused by overly aggressive plain-text lexing, which
took precedence over the autolink rule. The other was simply a poor link
regex; it has been improved to match the added examples.
valich committed Jul 19, 2018
1 parent bad1fbc commit 4b12d4e
Showing 6 changed files with 1,148 additions and 1,030 deletions.
2,049 changes: 1,030 additions & 1,019 deletions src/org/intellij/markdown/flavours/gfm/lexer/_GFMLexer.java

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions src/org/intellij/markdown/flavours/gfm/lexer/gfm.flex
@@ -188,7 +188,9 @@ TAG_END = "</" {TAG_NAME} {WHITE_SPACE}* ">"
SCHEME = [a-zA-Z]+
AUTOLINK = "<" {SCHEME} ":" [^ \t\f\n<>]+ ">"
EMAIL_AUTOLINK = "<" [a-zA-Z0-9.!#$%&'*+/=?\^_`{|}~-]+ "@"[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])? (\.[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)* ">"
-GFM_AUTOLINK = (("http" "s"? | "ftp")"://" | "www.") ({ALPHANUM}([a-zA-Z0-9-]*{ALPHANUM})? ".")+ [a-zA-Z]{2,6} ("/"[a-zA-Z0-9.,+%_&!?#=-]+)* "/"?
+
+HOST_PART={ALPHANUM}([a-zA-Z0-9_-]*{ALPHANUM})?
+GFM_AUTOLINK = (("http" "s"? | "ftp" | "file")"://" | "www.") {HOST_PART} ("." {HOST_PART})* (":" [0-9]+)? ("/"[a-zA-Z0-9.,+%_&()!?#=-]+)* [a-zA-Z0-9#/]

%state TAG_START, AFTER_LINE_START, PARSE_DELIMITED, CODE
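
The effect of the GFM_AUTOLINK change in the hunk above can be checked outside the lexer. The following is a minimal sketch that transliterates the removed and the added pattern into Python's `re` syntax and runs them on a few of the URLs from the test data in this commit. It assumes that {ALPHANUM} is [a-zA-Z0-9] (its definition is not part of this diff), and the helper `match_prefix` is a hypothetical name introduced only for illustration. JFlex picks the longest match while Python's `re` backtracks, so this approximates the lexer's behaviour rather than reproducing it.

```python
import re

# Assumption: {ALPHANUM} in gfm.flex is [a-zA-Z0-9]; its definition is outside this diff.
ALPHANUM = r"[a-zA-Z0-9]"

# Pattern removed by this commit, transliterated from JFlex to Python `re`.
OLD_GFM_AUTOLINK = re.compile(
    r"(?:(?:https?|ftp)://|www\.)"
    rf"(?:{ALPHANUM}(?:[a-zA-Z0-9-]*{ALPHANUM})?\.)+"
    r"[a-zA-Z]{2,6}"
    r"(?:/[a-zA-Z0-9.,+%_&!?#=-]+)*/?"
)

# Pattern added by this commit: file:// scheme, underscores in host parts,
# optional port, parentheses in the path, and a restricted final character.
HOST_PART = rf"{ALPHANUM}(?:[a-zA-Z0-9_-]*{ALPHANUM})?"
NEW_GFM_AUTOLINK = re.compile(
    r"(?:(?:https?|ftp|file)://|www\.)"
    rf"{HOST_PART}(?:\.{HOST_PART})*"
    r"(?::[0-9]+)?"
    r"(?:/[a-zA-Z0-9.,+%_&()!?#=-]+)*"
    r"[a-zA-Z0-9#/]"
)


def match_prefix(pattern, text):
    """Return the part of `text` matched at position 0, or None (hypothetical helper)."""
    m = pattern.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    samples = [
        "https://github.com/JetBrains/Exposed.",
        "http://www.unixmen.com/postgresql-9-4-released-install-centos-7/,",
        "http://localhost:8080",
        "http://foo.com/blah_(wikipedia)#cite-1",
        "file://aa.aa",
    ]
    for s in samples:
        print(s)
        print("  old:", match_prefix(OLD_GFM_AUTOLINK, s))
        print("  new:", match_prefix(NEW_GFM_AUTOLINK, s))
```

Under these assumptions the old pattern keeps the trailing "." and "," in the first two samples and rejects the port, parenthesis and file:// cases, while the new pattern stops before trailing punctuation because its last character must be in [a-zA-Z0-9#/].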

@@ -285,7 +287,7 @@ GFM_AUTOLINK = (("http" "s"? | "ftp")"://" | "www.") ({ALPHANUM}([a-zA-Z0-9-]*{A

{GFM_AUTOLINK} { return GFMTokenTypes.GFM_AUTOLINK; }

-{ALPHANUM}+ / {WHITE_SPACE}+ {GFM_AUTOLINK} {
+{ALPHANUM}+ (({WHITE_SPACE}+ | "_"+) {ALPHANUM}+)* / {WHITE_SPACE}+ {GFM_AUTOLINK} {
return Token.TEXT;
}
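
The rule changed in this hunk uses JFlex trailing context (`/`): it returns TEXT for a run of words only when whitespace and an autolink follow, and the lookahead itself is not consumed. The sketch below mimics the old and the new form of that rule with a Python lookahead. It assumes {ALPHANUM} is [a-zA-Z0-9] and {WHITE_SPACE} is [ \t\f], neither of which is defined in this diff, and it says nothing about the other lexer rules that compete with this one.

```python
import re

# Assumptions: these two macros are defined outside this diff in gfm.flex.
ALPHANUM = r"[a-zA-Z0-9]"
WHITE_SPACE = r"[ \t\f]"

# New GFM_AUTOLINK from this commit, transliterated to Python `re`.
HOST_PART = rf"{ALPHANUM}(?:[a-zA-Z0-9_-]*{ALPHANUM})?"
GFM_AUTOLINK = (
    r"(?:(?:https?|ftp|file)://|www\.)"
    rf"{HOST_PART}(?:\.{HOST_PART})*"
    r"(?::[0-9]+)?"
    r"(?:/[a-zA-Z0-9.,+%_&()!?#=-]+)*"
    r"[a-zA-Z0-9#/]"
)

# JFlex `a / b` (match a only if b follows) is approximated by a lookahead.
LOOKAHEAD = rf"(?={WHITE_SPACE}+{GFM_AUTOLINK})"

# Old rule: a single word directly before the autolink.
OLD_TEXT_RULE = re.compile(rf"{ALPHANUM}+{LOOKAHEAD}")

# New rule: several words joined by whitespace or underscores before the autolink.
NEW_TEXT_RULE = re.compile(
    rf"{ALPHANUM}+(?:(?:{WHITE_SPACE}+|_+){ALPHANUM}+)*{LOOKAHEAD}"
)

if __name__ == "__main__":
    line = "JetBrains Account portal https://account.jetbrains.com"
    old = OLD_TEXT_RULE.match(line)
    new = NEW_TEXT_RULE.match(line)
    print("old rule at line start:", old.group(0) if old else None)
    print("new rule at line start:", new.group(0) if new else None)
```

With these assumptions the old rule cannot match at the start of the sample line, because "JetBrains" is not directly followed by an autolink, whereas the new rule matches "JetBrains Account portal" and leaves the whitespace and the URL for the following rules. That agrees with the TEXT, WHITE_SPACE, GFM_AUTOLINK sequence expected in gfmAutolink.txt.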

3 changes: 2 additions & 1 deletion test/data/html/puppetApache.pos.txt
@@ -1873,7 +1873,8 @@ Specifies an array of <code md-src-pos="55005..55018">path =&gt; URI</code>
<code md-src-pos="56049..56057">params</code>
is an optional parameter. It allows to provide the ProxyPass key=value parameters (Connection settings).
<code md-src-pos="56163..56171">setenv</code>
-is optional and is an array to set environment variables for the proxy directive, for details see http://httpd.apache.org/docs/current/mod/mod_proxy.html#envsettings</p>
+is optional and is an array to set environment variables for the proxy directive, for details see <a href="http://httpd.apache.org/docs/current/mod/mod_proxy.html#envsettings" md-src-pos="56270..56337">http://httpd.apache.org/docs/current/mod/mod_proxy.html#envsettings</a>
+</p>
<h5 md-src-pos="56339..56362">
<code md-src-pos="56344..56362">proxy_dest_match</code>
</h5>
14 changes: 8 additions & 6 deletions test/data/html/ruby17351.txt
@@ -11,8 +11,8 @@ This is a single .war file deployment of <strong>Knowtify Log Analytics</strong>
<ol>
<li>
<p>
-Red Hat 7 family (version 7.x Red Hat Enterprise Linux, CentOS, Fedora, Scientific Linux, Oracle Linux and others) Commands for the Red Hat 6 family, where different, are noted in square brackets [RH6: ~command~ ] see: <a href="http://www.unixmen.com/postgresql-9-4-released-install-centos-7/,">http://www.unixmen.com/postgresql-9-4-released-install-centos-7/,</a>
-and <a href="http://www.postgresql.org/download/linux/redhat/">http://www.postgresql.org/download/linux/redhat/</a>
+Red Hat 7 family (version 7.x Red Hat Enterprise Linux, CentOS, Fedora, Scientific Linux, Oracle Linux and others) Commands for the Red Hat 6 family, where different, are noted in square brackets [RH6: ~command~ ] see: <a href="http://www.unixmen.com/postgresql-9-4-released-install-centos-7/">http://www.unixmen.com/postgresql-9-4-released-install-centos-7/</a>
+, and <a href="http://www.postgresql.org/download/linux/redhat/">http://www.postgresql.org/download/linux/redhat/</a>
</p>
<ol>
<li>
@@ -540,7 +540,7 @@ Download TAR.GZ from Elastic: <code>wget https://download.elastic.co/logstash/lo
</ol>
</li>
<li>
-install the solr_http output plugin (info at https://<a href="www.elastic.co/guide/en/logstash/current/plugins-outputs-solr_http.html">www.elastic.co/guide/en/logstash/current/plugins-outputs-solr_http.html</a>
+install the solr_http output plugin (info at <a href="https://www.elastic.co/guide/en/logstash/current/plugins-outputs-solr_http.html">https://www.elastic.co/guide/en/logstash/current/plugins-outputs-solr_http.html</a>
):<ol>
<li>
<code>cd /opt/logstash-1.5.4</code>
@@ -624,7 +624,7 @@ Test in a newly opened command prompt with <code>cd %JAVA_HOME%</code>
</ol>
</li>
<li>
-Install the Windows Event Log input plugin (info at https://<a href="www.elastic.co/guide/en/logstash/current/plugins-inputs-eventlog.html#plugins-inputs-eventlog-type">www.elastic.co/guide/en/logstash/current/plugins-inputs-eventlog.html#plugins-inputs-eventlog-type</a>
+Install the Windows Event Log input plugin (info at <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-eventlog.html#plugins-inputs-eventlog-type">https://www.elastic.co/guide/en/logstash/current/plugins-inputs-eventlog.html#plugins-inputs-eventlog-type</a>
):<ol>
<li>
<code>cd %LOGSTASH_HOME%</code>
@@ -635,7 +635,7 @@ install the plugin: <code>bin\plugin install logstash-input-eventlog</code>
</ol>
</li>
<li>
-Install the solr_http output plugin (info at https://<a href="www.elastic.co/guide/en/logstash/current/plugins-outputs-solr">www.elastic.co/guide/en/logstash/current/plugins-outputs-solr</a>
+Install the solr_http output plugin (info at <a href="https://www.elastic.co/guide/en/logstash/current/plugins-outputs-solr">https://www.elastic.co/guide/en/logstash/current/plugins-outputs-solr</a>
_http.html):<ol>
<li>
<code>cd %LOGSTASH_HOME%</code>
@@ -657,7 +657,9 @@ Create a run.bat file in %LOGSTASH_HOME%\bin. It should contain this command <co
</li>
<li>
To install Logstash to run as a Windows Service:<ol>
-<li>Download NSSM from http://nssm.cc/</li>
+<li>
+Download NSSM from <a href="http://nssm.cc/">http://nssm.cc/</a>
+</li>
<li>Unzip NSSM-??.zip to a temporary directory</li>
<li>Copy the win32 or win64 version (as appropriate) of nssm.exe to %LOGSTASH_HOME%\bin</li>
<li>
32 changes: 31 additions & 1 deletion test/data/parser/gfmAutolink.md
@@ -4,6 +4,7 @@ http://aa.aa
https://aa.aa
ftp://aa.aa
ssh://aa.aa
+file://aa.aa
www.aa.aa/
www.aa.aa//
www.aa.aa/script.php?a=b&x=%20y#hashtag
@@ -14,4 +15,33 @@ www.aa.aa/script.php?a=b&x=%20y#hashtag

[www.not.rendered.as.autolink.too][www.and.this.too]

-Google Group: https://groups.google.com/forum/#!forum/markdown-here
+Google Group: https://groups.google.com/forum/#!forum/markdown-here
+
+JetBrains Account portal
+https://account.jetbrains.com
+JetBrains Account portal https://account.jetbrains.com
+JetBrains Online Store https://www.jetbrains.com/store
+
+Check out library called Exposed: https://github.com/JetBrains/Exposed.
+
+Credits for the testdata to https://mathiasbynens.be/demo/url-regex:
+
+http://142.42.1.1:8080/
+
+http://142.42.1.1:8080
+
+http://localhost:8080
+
+http://localhost.com:8080
+
+http://foo.com/blah_(wikipedia)#cite-1
+
+http://foo.com/blah_(wikipedia)_blah#cite-1
+
+http://foo.com/(something)?after=parens
+
+http://code.google.com/events/#&product=browser
+
+http://a.b-c.de
+
+https://foo_bar.example.com/
74 changes: 73 additions & 1 deletion test/data/parser/gfmAutolink.txt
@@ -14,6 +14,8 @@ Markdown:MARKDOWN_FILE
Markdown::(':')
Markdown:TEXT('//aa.aa')
Markdown:EOL('\n')
+Markdown:GFM_AUTOLINK('file://aa.aa')
+Markdown:EOL('\n')
Markdown:GFM_AUTOLINK('www.aa.aa/')
Markdown:EOL('\n')
Markdown:GFM_AUTOLINK('www.aa.aa/')
@@ -56,4 +58,74 @@ Markdown:MARKDOWN_FILE
Markdown:TEXT('Google Group')
Markdown::(':')
WHITE_SPACE(' ')
-Markdown:GFM_AUTOLINK('https://groups.google.com/forum/#!forum/markdown-here')
+Markdown:GFM_AUTOLINK('https://groups.google.com/forum/#!forum/markdown-here')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:TEXT('JetBrains Account portal')
+WHITE_SPACE(' ')
+Markdown:EOL('\n')
+Markdown:GFM_AUTOLINK('https://account.jetbrains.com')
+Markdown:EOL('\n')
+Markdown:TEXT('JetBrains Account portal')
+WHITE_SPACE(' ')
+Markdown:GFM_AUTOLINK('https://account.jetbrains.com')
+Markdown:EOL('\n')
+Markdown:TEXT('JetBrains Online Store')
+WHITE_SPACE(' ')
+Markdown:GFM_AUTOLINK('https://www.jetbrains.com/store')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:TEXT('Check out library called Exposed')
+Markdown::(':')
+WHITE_SPACE(' ')
+Markdown:GFM_AUTOLINK('https://github.com/JetBrains/Exposed')
+Markdown:TEXT('.')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:TEXT('Credits for the testdata to')
+WHITE_SPACE(' ')
+Markdown:GFM_AUTOLINK('https://mathiasbynens.be/demo/url-regex')
+Markdown::(':')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://142.42.1.1:8080/')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://142.42.1.1:8080')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://localhost:8080')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://localhost.com:8080')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://foo.com/blah_(wikipedia)#cite-1')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://foo.com/blah_(wikipedia)_blah#cite-1')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://foo.com/(something)?after=parens')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://code.google.com/events/#&product=browser')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('http://a.b-c.de')
+Markdown:EOL('\n')
+Markdown:EOL('\n')
+Markdown:PARAGRAPH
+Markdown:GFM_AUTOLINK('https://foo_bar.example.com/')
