Support syntax highlighting #67

Closed
mgrabovsky opened this issue Jun 27, 2014 · 19 comments

Comments

@mgrabovsky

It would be nice to have at least basic support for highlighting code, i.e., some kind of polyglot catch-all that would highlight the most frequently used keywords, string and number literals, symbols, etc.

A more robust option would be to use a third-party highlighting library with either (a) supplying the language in the markup (e.g. [code lang=ruby]puts "hello world"[/code]), or (b) employing a probabilistic language recognition algorithm. This may not go well with the current light-weight, few-dependencies philosophy, though.

@infamousbutterly

I second this.

@Admin-Kaf

I planned to do that on my fork some day™ using the markup {{code goes here}}, since tinyboard doesn't use [tags] like that. Here are three important points:

  1. We/I have to make sure that any markup inside the code markup is escaped, including the {{ }} itself.
  2. I don't really know how to choose the language… Maybe {{(python) code}} or {{[python] code}}.
  3. For the highlighter plugin, I know SyntaxHighlighter: http://alexgorbatchev.com/SyntaxHighlighter/ but maybe someone knows a better one.

Completely unrelated, but I also plan to find some way to allow LaTeX with the markup $$latex$$.

@mgrabovsky
Author

What's wrong with BBCode-like tags? I don't understand how the current parser works, but it should be straightforward to integrate it into a reasonably well-written one. Personally, I think syntax like {{sys.exit()}} is ugly and, as you've pointed out, doesn't easily allow for specifying the language.

If the BBCode-like option is deemed not feasible, would something like Markdown's syntax (indenting) be possible? If not, then at least something that's easily discernible within the bounds of the system. Such as the following (requiring {{{ and }}} to be alone on their lines, aside from the optional language specifier):

Some comment regarding the following code here, pay good attention:
{{{ haskell
main :: IO ()
main = putStrLn $ show $ 9958431258 * 15432274
}}}
And that's it.
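
A minimal sketch of how such a block could be recognized, written in Python for brevity rather than the board's PHP. The name `find_code_blocks` and the exact pattern are assumptions made for illustration, not anything that exists in tinyboard:

```python
import re

# Hypothetical pattern: "{{{" alone on a line (optionally followed by a
# language name), then the code, then "}}}" alone on a line.
BLOCK_RE = re.compile(
    r"^\{\{\{[ \t]*(?P<lang>[\w+#-]*)[ \t]*\n"  # opening line, optional language
    r"(?P<code>.*?)"                            # code body, as short as possible
    r"^\}\}\}[ \t]*$",                          # closing line
    re.MULTILINE | re.DOTALL,
)


def find_code_blocks(text):
    """Return a list of (language or None, code) pairs found in text."""
    return [
        (m.group("lang") or None, m.group("code").rstrip("\n"))
        for m in BLOCK_RE.finditer(text)
    ]
```

Because the delimiters must sit alone on their lines, braces that appear inline in ordinary text or inside the code itself are not mistaken for block boundaries.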

As for highlighting, I found highlight.js in my bookmarks which features automatic language recognition (that would eliminate the need for a language specifier and allow for simpler syntax) and seems to be in active development. See also its test page.

One more thing to consider: Should only blocks of code be allowed, or should inline code be supported, as well?

@czaks
Member

czaks commented Jun 30, 2014

The current parser is just a set of simple, dumb regular expressions (i.e. a real parser doesn't exist).

There's an ongoing issue of how to handle, e.g., this bit of code (assuming [code] is a code tag and [b] is a bold tag): [code=php]print("I like [b]bbcode[/b]!");[/code]

I actually got some idea about it now, will reply later :)

@czaks
Member

czaks commented Jun 30, 2014

OK, let's suppose that we start processing. We do a regexp search for [code]text[/code], save the text into a variable called “0” and replace it with [postprocess]0[/postprocess]

Now we can add color codes etc. to the variables “0”, “1”, etc., and wrap them in proper markup, e.g. <code> or <pre>

Then we run all the other markup rules: making text bold, replacing -- with –, etc.

At the end, we search for every [postprocess]number[/postprocess] and replace it with the variable of the given number. This way the code doesn't get messed up.
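
The steps above can be sketched roughly as follows, in Python for brevity rather than the board's PHP. The names `render`, `stash` and `restore` are hypothetical, and the only "other markup" shown is the board's existing '''bold''' rule plus the dash replacement; treat it as an illustration of the placeholder idea, not a drop-in implementation:

```python
import html
import re

CODE_RE = re.compile(r"\[code\](.*?)\[/code\]", re.DOTALL)
PLACEHOLDER_RE = re.compile(r"\[postprocess\](\d+)\[/postprocess\]")
BOLD_RE = re.compile(r"'''(.+?)'''")


def render(text):
    """Render a post body, protecting [code] contents from other markup."""
    saved = []  # saved[n] holds the finished HTML for placeholder n

    def stash(match):
        # Escape the raw code and wrap it; no later rule will touch it.
        escaped = html.escape(match.group(1), quote=False)
        saved.append("<pre><code>" + escaped + "</code></pre>")
        return "[postprocess]%d[/postprocess]" % (len(saved) - 1)

    def restore(match):
        index = int(match.group(1))
        # Leave unknown numbers alone (e.g. a placeholder typed by a user).
        return saved[index] if index < len(saved) else match.group(0)

    text = CODE_RE.sub(stash, text)                    # 1. pull the code out
    text = html.escape(text, quote=False)              # 2. ordinary processing
    text = BOLD_RE.sub(r"<strong>\1</strong>", text)
    text = text.replace("--", "\u2013")
    return PLACEHOLDER_RE.sub(restore, text)           # 3. put the code back
```

One caveat of this sketch: a user can type a placeholder by hand, so a real implementation would probably want a marker that cannot appear in user input.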

As for a highlighting library, I know GeSHi: http://qbnz.com/highlighter/ . I've used it, but it has a licensing problem: it's released under the copyleft GNU GPL (which may or may not be a problem; we already have the GPL-licensed php-gettext library, which is loaded conditionally when a native gettext library doesn't exist). Maybe there exists some parser of Kate (KDE Advanced Text Editor) syntax definitions for PHP (I know that one exists for Haskell).

We can also offload the highlighting to JavaScript (moving all the syntax definitions to be downloaded by the client).

@ctrlcctrlv
Member

8chan.co already has this through my own patch that uses JavaScript highlighting.

$config['additional_javascript'][] = 'js/code_tags/run_prettify.js'; // https://code.google.com/p/google-code-prettify/
$config['markup'][] = array("/\[code\](.+?)\[\/code\]/ms", "<code><pre class='prettyprint' style='display:inline-block'>\$1</pre></code>");

Produces markup like https://8chan.co/b/res/2009.html#2043

@ghost

ghost commented Jun 30, 2014

GNU-licensed software should not be a problem. I'd rather not use a piece of software at all than see it become anti-GNU only because the original developer is currently a StopNerd.

On topic: I discourage using a JavaScript library if it's not vastly better than a PHP solution, and GeSHi looks pretty fine to me.

@Admin-Kaf

GeSHi looks awesome.

For the markup, I don't find it really logical to use [code][/code] when everything else uses '''this''' or this. We should follow the already existing markup (inspired by the wiki one: https://en.wikipedia.org/wiki/Help:Wiki_markup#Text_formatting ) or replace everything with BBCode or Markdown. Except that the wiki uses XML-like tags for code, which is completely absurd.

{{_}} is a block tag anyway, so I see no problem with requiring it to be alone on a line, as mgrabovsky suggested. Like this:
{{\n
{{\n
code{{with shitty {'''syntax'''}}}\n
}}\n
}}\n

It's important to match only the outermost {{ }}; we must be able to use braces inside the code.
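
One way to take only the outermost pair is to count nesting depth line by line. The following is a rough Python sketch (the board itself is PHP); the function name `extract_outer_blocks` and the chunk format are made up for illustration:

```python
def extract_outer_blocks(text):
    """Split text into ("text", str) and ("code", str) chunks.

    Only lines consisting solely of "{{" or "}}" count as delimiters.
    Delimiter lines nested inside an open block are kept as part of the code.
    """
    chunks = []
    buffer = []
    depth = 0
    for line in text.split("\n"):
        stripped = line.strip()
        if stripped == "{{":
            if depth == 0:
                # Outermost opener: flush the ordinary text collected so far.
                if buffer:
                    chunks.append(("text", "\n".join(buffer)))
                buffer = []
            else:
                buffer.append(line)  # nested opener belongs to the code
            depth += 1
        elif stripped == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                chunks.append(("code", "\n".join(buffer)))
                buffer = []
            else:
                buffer.append(line)  # nested closer belongs to the code
        else:
            buffer.append(line)
    if depth > 0:
        # Unclosed block: fall back to plain text and restore the opener.
        buffer.insert(0, "{{")
    if buffer:
        chunks.append(("text", "\n".join(buffer)))
    return chunks
```

Braces that share a line with other characters are never treated as delimiters. The remaining weak spot is code that contains an unbalanced "}}" alone on a line, which would close the block early.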

@mgrabovsky
Author

GeSHi is all right, Wikimedia sites use it, too. As for the syntax, I still think that [code] or Markdown's triple-backquote are the only viable options.

@czaks — Personally, I'm not happy about the architecture of the parser, so I'd rather leave that to the more experienced people.

@Admin-Kaf — You're trying to enforce consistency where there's none. Let's look at the current markup:

  • '''strong''', ''italic'' and ==heading== from MediaWiki (presumably),
  • **spoiler** from somewhere else.

As you pointed out, Wikipedia uses XML-like tags <source lang="..."> (or <syntaxhighlight>), which, I too believe, would not be entirely appropriate in this setting.

Also, I don't understand your statement about {{...}} being a block tag, but my point was that the two braces themselves are barely visible in text and therefore ugly. Moreover, as you've also been so kind as to show, it might become a chore to even parse the syntax.

@Admin-Kaf

{{ }} is a block (as in block vs. inline) tag because you will never have code and normal text on the same line. The code will be placed on a new line in a block (with numbered lines and so on), and then the text will continue on the line after.

My point was that all tinyboard/vichan markup uses the same symbol twice for both opening and closing, which is nice because it's very quick to type and sometimes even logical ( is commented out in the config file and can be added for underline, and I added --…-- myself for ). So I tried to find the same for code. {} are the characters that most make me think of programming. It could have been [[…]] or //…// or \…\ or anything, really. And $$…$$ for LaTeX because it's basically what's used for formulas in LaTeX.

I know it's disturbing for people from 4chan or forums but it's already the way it's made and when you know it, it's very convenient. (And it's also very funny to see newfags trying to make strong text.)

@mgrabovsky
Author

Oh, I understand. It still doesn't change the fact that it's hard to see in text and might be hard to parse.

By the way, just a correction: LaTeX encourages \(...\) and \[...\] for math mode; $$...$$ is plain TeX syntax and is discouraged in LaTeX, although inline $...$ still works fine.

@ctrlcctrlv
Member

why not

```lang
 ex
```

like github?

@mgrabovsky
Author

Yes, that's what I was suggesting by “Markdown's triple-backquote”.

@Admin-Kaf

Or a double one:

Also, since the opening tag is the same as the closing one, how can it differentiate between this:

code

code bis

and this? :
code with
shit
inside

@mgrabovsky:
Close enough: it's the same character twice, so it suits LaTeX perfectly.

@Admin-Kaf

OK, GitHub answered that itself. (It seems that a double one is enough for GitHub to treat it as a code tag.)
We should, however, require the `` to be alone on a line, because backticks can also appear inside code, in SQL for example.
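
A rough Python sketch of that rule (again not the board's PHP, and with the hypothetical name `split_fenced`): a line consisting only of the fence toggles between text and code, so identical opening and closing tags are told apart by position, and backticks inside a line of SQL are left alone. Language specifiers on the opening fence are left out to keep it short:

```python
FENCE = "`" * 3  # pass fence="`" * 2 for the double-backquote variant


def split_fenced(text, fence=FENCE):
    """Return a list of (is_code, chunk) pairs.

    A line that contains nothing but the fence flips the mode: the first
    one opens a code block, the next one closes it, and so on. An unclosed
    fence leaves the rest of the text marked as code.
    """
    pieces = []
    current = []
    in_code = False
    for line in text.split("\n"):
        if line.strip() == fence:
            if current:
                pieces.append((in_code, "\n".join(current)))
            current = []
            in_code = not in_code
        else:
            current.append(line)
    if current:
        pieces.append((in_code, "\n".join(current)))
    return pieces
```

Two consecutive blocks separated by ordinary text come out as code, text, code, because every fence line simply flips the current mode.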

@ctrlcctrlv
Member

If any of you wants to submit a PR, that would be good. I would support GeSHi and GitHub-style markup by default.

This is not high on my todo list right now.

@DirectorCleese

Easiest solution: have a page that lists what the markup is. I don't care if it is in Cantonese. If I can find your GitHub more easily than a markup list, you might be doing it wrong.

On that note, while harder to remember for new users, I agree short character sets are best since they are less verbose.

@czaks
Member

czaks commented Mar 31, 2015

The problem with that is that we still don't have a proper parser, just a bunch of regular expressions. I have a workaround in mind.

For example, let's suppose that [b]...[/b] means bold text and [code=Lisp]...[/code] means code markup. Not that I'm suggesting those; let's just assume them for the sake of argument.

How about code like this:

[code=Lisp][b]LOL DONGS[/b][/code]

What would our current “parser” do? It would probably encode it as:

<code><font color='red'>&lt;b&gt;LOL DONGS&lt;/b&gt;</font></code>

...if markup was run first, or:

<code><font color='red'><b>LOL DONGS</b></font></code>

...if the regexp things were run first.

@ctrlcctrlv how do you solve it in infinity?

@czaks
Member

czaks commented Apr 12, 2015

OK, one of the recent commits introduced the triple-backquote syntax (like GitHub's) and [code], to be enabled manually by the board admin. We still don't have a code highlighting solution, but adding one should be easy.
