better way to parse message and emotes? #11

Closed
celluj34 opened this Issue Jun 29, 2015 · 26 comments

@celluj34
Contributor

celluj34 commented Jun 29, 2015

I was just curious, does anyone have a better method of parsing the message with the emotes array? Here's how I do it, wondering how everyone else does it.

function parseMessage(message, emotes) {
    var emoteArray = _.chain(emotes)
        .map(function(emote, index) {
            var charIndex = _.map(emote, function(chars) {
                var indexes = chars.split("-");

                return {
                    url: "http://static-cdn.jtvnw.net/emoticons/v1/" + index + "/1.0",
                    startIndex: parseInt(indexes[0]),
                    endIndex: parseInt(indexes[1]) + 1
                };
            });

            return charIndex;
        })
        .flatten()
        .sortBy(function(item) {
            return -1 * item.startIndex;
        })
        .value();

    if(emoteArray.length === 0) {
        return message;
    }

    var newMessage = message;

    _.each(emoteArray, function(emote) {
        var emoteName = newMessage.substring(emote.startIndex, emote.endIndex);

        var leftPart = newMessage.substring(0, emote.startIndex);
        var middlePart = makeImage(emoteName, emote.url);
        var rightPart = newMessage.substring(emote.endIndex);

        newMessage = leftPart + middlePart + rightPart;
    });

    return newMessage;
}


function makeImage(name, url) {
    return _s.sprintf("<img alt='%1$s' title='%1$s' src='%2$s' />", name, url);
}

It works by flattening the emote positions into a single array with one entry per occurrence of each emote. Then I work backwards through that array, replacing each position in the string with an image tag (the makeImage method).
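
For readers without underscore installed, the same back-to-front idea can be sketched in plain JavaScript. This is only an illustration, not code from the thread: `replaceEmotesBackward` is a hypothetical name, and it assumes `emotes` is either null or an object mapping emote ids to arrays of "start-end" strings, as in the examples here.

```javascript
// Hypothetical helper: replaces emote ranges from the end of the message
// towards the start, so earlier indexes stay valid after each replacement.
function replaceEmotesBackward(message, emotes) {
    if (!emotes) {
        return message;
    }

    var ranges = [];
    Object.keys(emotes).forEach(function(id) {
        emotes[id].forEach(function(range) {
            var parts = range.split("-");
            ranges.push({
                id: id,
                start: parseInt(parts[0], 10),
                end: parseInt(parts[1], 10) + 1 // make the end index exclusive
            });
        });
    });

    // Highest start index first.
    ranges.sort(function(a, b) {
        return b.start - a.start;
    });

    var result = message;
    ranges.forEach(function(r) {
        var name = result.substring(r.start, r.end);
        var img = "<img alt='" + name + "' title='" + name +
            "' src='http://static-cdn.jtvnw.net/emoticons/v1/" + r.id + "/1.0' />";
        result = result.substring(0, r.start) + img + result.substring(r.end);
    });

    return result;
}

console.log(replaceEmotesBackward("OpieOP haha Kappa lel", { 356: ["0-5"], 25: ["12-16"] }));
```

Because the last emote is replaced first, the text to its left is untouched and the remaining start/end indexes still point at the right characters.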

@AlcaDesign

Member

AlcaDesign commented Jun 29, 2015

This is the ugly code that I had used in the previous version for a stream chat.

    function formatEmotes(text, emotes) {
        var splitText = text.split('');
        for(var i in emotes) {
            var e = emotes[i];
            for(var j in e) {
                var mote = e[j];
                if(typeof mote == 'string') {
                    mote = mote.split('-');
                    mote = [parseInt(mote[0]), parseInt(mote[1])];
                    var length =  mote[1] - mote[0],
                        empty = Array.apply(null, new Array(length + 1)).map(function() { return '' });
                    splitText = splitText.slice(0, mote[0]).concat(empty).concat(splitText.slice(mote[1] + 1, splitText.length));
                    splitText.splice(mote[0], 1, '<img class="emoticon" src="http://static-cdn.jtvnw.net/emoticons/v1/' + i + '/3.0">');
                }
            }
        }
        return splitText.join('');
    }

It essentially splits the text into individual characters and then sets all characters that are part of the emote into an empty character '' and changes the first index into an <img> tag. Doing it this way means going through each emote altering the indexes as given in one array, going through each emote however many times it needs to. Finally it joins it all together and returns it.

Rough example:

var message = 'OpieOP haha Kappa lel',
    emotes = {356:['0-5'],25:['12-16']},
    messageWithEmotes = formatEmotes(message, emotes);

messageWithEmotes comes out as:

<img class="emoticon" src="http://static-cdn.jtvnw.net/emoticons/v1/356/3.0"> haha <img class="emoticon" src="http://static-cdn.jtvnw.net/emoticons/v1/25/3.0"> lel
@celluj34

Contributor

celluj34 commented Jul 1, 2015

slightly condensed version:

function parseMessage2(message, emotes) {
    var newMessage = message;

    _.chain(emotes)
        .map(function(emote, index) {
            var charIndex = _.map(emote, function(chars) {
                var indexes = chars.split("-");

                return {
                    url: "http://static-cdn.jtvnw.net/emoticons/v1/" + index + "/1.0",
                    startIndex: parseInt(indexes[0]),
                    endIndex: parseInt(indexes[1]) + 1
                };
            });

            return charIndex;
        })
        .flatten()
        .sortBy(function(item) {
            return -1 * item.startIndex;
        })
        .each(function(emote) {
            var emoteName = newMessage.substring(emote.startIndex, emote.endIndex);

            var leftPart = newMessage.substring(0, emote.startIndex);
            var middlePart = makeImage(emoteName, emote.url);
            var rightPart = newMessage.substring(emote.endIndex);

            newMessage = leftPart + middlePart + rightPart;
        });

    return newMessage;
}

@AlcaDesign I did some benchmarks comparing our methods, and yours was significantly faster.

Over 1,000,000 iterations with the following message: ha ha catch scarLOVE scarMEGA scarMEGA catch me now sucker! Kappa MiniK KappaPride the totals were:

celluj34: mean time: .00017021, total time 17.021s
alcadesign: mean time: .000058041, total time: 5.8041s

EDIT: forgot to include the actual emote array in the tests... 😞
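
The harness used for these numbers isn't shown. A hand-rolled loop along these lines is one way such totals could be collected; `timeIt` is a hypothetical helper, the workload is a stand-in, and the mean is simply the total divided by the iteration count.

```javascript
// Hypothetical timing helper: runs fn `iterations` times and reports the
// total wall-clock time and the mean time per call, both in seconds.
function timeIt(fn, iterations) {
    var start = Date.now();
    for (var i = 0; i < iterations; i++) {
        fn();
    }
    var total = (Date.now() - start) / 1000;
    return { total: total, mean: total / iterations };
}

// Usage with a trivial stand-in workload.
var result = timeIt(function() {
    return "ha ha Kappa".split("").join("");
}, 100000);
console.log("mean time: " + result.mean + ", total time: " + result.total + "s");
```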

@celluj34

Contributor

celluj34 commented Jul 1, 2015

Also, with the string hi, your code was still 4-5x faster than mine.

I'm looking into trying to use sprintf, but I need to whiteboard out some code. http://www.diveintojavascript.com/projects/javascript-sprintf

@AlcaDesign

Member

AlcaDesign commented Jul 1, 2015

Woo! 😛

:feelsgood:

@celluj34

Contributor

celluj34 commented Jul 4, 2015

@AlcaDesign, I have more findings. Turns out I was reading the previous benchmark results incorrectly, so I downloaded a benchmark lib from npm just for this. This is a long one, so bear with me.

First, the results:

[10492:0703/193002:INFO:CONSOLE(0)] message is: `hi`
[10492:0703/193002:INFO:CONSOLE(0)] emote list: `null`
[10492:0703/193013:INFO:CONSOLE(0)] alca design parse  x 4,125,533 ops/sec ±0.68% (92 runs sampled)
[10492:0703/193025:INFO:CONSOLE(0)] ltr string concat  x 300,610 ops/sec ±1.22% (94 runs sampled)
[10492:0703/193036:INFO:CONSOLE(0)] unordered char rep x 4,199,769 ops/sec ±0.16% (88 runs sampled)
[10492:0703/193036:INFO:CONSOLE(0)] Fastest is unordered char rep

[10492:0703/193058:INFO:CONSOLE(0)] message is: `this is a very long sentence. this is someone talking to someone else. very normal string. no emotes here`
[10492:0703/193058:INFO:CONSOLE(0)] emote list: `null`
[10492:0703/193109:INFO:CONSOLE(0)] alca design parse  x 1,104,231 ops/sec ±0.27% (97 runs sampled)
[10492:0703/193120:INFO:CONSOLE(0)] ltr string concat  x 313,424 ops/sec ±1.51% (94 runs sampled)
[10492:0703/193131:INFO:CONSOLE(0)] unordered char rep x 1,113,697 ops/sec ±0.24% (93 runs sampled)
[10492:0703/193131:INFO:CONSOLE(0)] Fastest is unordered char rep

[10492:0703/193226:INFO:CONSOLE(0)] message is: `hi there Kappa some words scarMEGA scarLOVE scarLOVE with some emotes too :P :P :P `
[10492:0703/193226:INFO:CONSOLE(0)] emote list: `[object Object]`
[10492:0703/193237:INFO:CONSOLE(0)] alca design parse  x 16,486 ops/sec ±1.48% (97 runs sampled)
[10492:0703/193248:INFO:CONSOLE(0)] ltr string concat  x 50,155 ops/sec ±0.37% (98 runs sampled)
[10492:0703/193300:INFO:CONSOLE(0)] unordered char rep x 54,047 ops/sec ±0.27% (102 runs sampled)
[10492:0703/193300:INFO:CONSOLE(0)] Fastest is unordered char rep

[10492:0703/193320:INFO:CONSOLE(0)] message is: `scarMEGA scarRIP scarWHY scarLOVE scarLURK scarHYPE scarWUT scarW dethxCREEPER dethxLOVE dethxCLUB dethxCC dethxHSJ dethxCUBE dethxNUT dethxGASM :) :( :D >( :| O_o `
[10492:0703/193320:INFO:CONSOLE(0)] emote list: `[object Object]`
[10492:0703/193331:INFO:CONSOLE(0)] alca design parse  x 4,930 ops/sec ±0.32% (98 runs sampled)
[10492:0703/193342:INFO:CONSOLE(0)] ltr string concat  x 19,089 ops/sec ±0.16% (99 runs sampled)
[10492:0703/193354:INFO:CONSOLE(0)] unordered char rep x 16,550 ops/sec ±0.11% (102 runs sampled)
[10492:0703/193354:INFO:CONSOLE(0)] Fastest is ltr string concat

Now, I originally ran these benchmarks, and I could have sworn yours was faster. I'm not sure if I was dreaming or what, but the numbers (probably) don't lie.

Here's the code for all 3 methods:

var Benchmark = require("benchmark"); // https://www.npmjs.com/package/benchmark
var _ = require("underscore"); // used by parseMessage below

function emitMessage(channel, user, message, action) {
    console.log("message is: `" + message + "`");
    console.log("emote list: `" + user["emotes"] + "`");

    var suite = new Benchmark.Suite;

    suite.add("alca design parse ", function() {
        formatEmotes(message, user["emotes"]);
    }).add("ltr string concat ", function() {
        parseMessage(message, user["emotes"]);
    }).add("unordered char rep", function() {
        parseMessage2(message, user["emotes"]);
    }).on("cycle", function(event) {
        console.log(String(event.target));
    }).on("complete", function() {
        console.log("Fastest is " + this.filter("fastest").pluck("name") + "\n");
    }).run({
        async: true
    });
}

function formatEmotes(text, emotes) {
    var splitText = text.split("");
    for(var i in emotes) {
        var e = emotes[i];
        for(var j in e) {
            var mote = e[j];
            if(typeof mote == "string") {
                mote = mote.split("-");
                mote = [parseInt(mote[0]), parseInt(mote[1])];
                var length = mote[1] - mote[0],
                    empty = Array.apply(null, new Array(length + 1)).map(function() {
                        return "";
                    });
                splitText = splitText.slice(0, mote[0]).concat(empty).concat(splitText.slice(mote[1] + 1, splitText.length));
                splitText.splice(mote[0], 1, "<img class=\"emoticon\" src=\"http://static-cdn.jtvnw.net/emoticons/v1/" + i + "/3.0\">");
            }
        }
    }
    return splitText.join("");
}

function parseMessage(message, emotes) {
    var newMessage = "";
    var lastEndIndex = 0;

    _.chain(emotes)
        .map(function(emote, index) {
            var charIndex = _.map(emote, function(chars) {
                var indexes = chars.split("-");

                var startIndex = parseInt(indexes[0]);
                var endIndex = parseInt(indexes[1]) + 1;
                var name = message.substring(startIndex, endIndex);

                return {
                    url: makeImage(name, "http://static-cdn.jtvnw.net/emoticons/v1/" + index + "/1.0"),
                    startIndex: startIndex,
                    endIndex: endIndex
                };
            });

            return charIndex;
        })
        .flatten()
        .sortBy(function(item) {
            return item.startIndex;
        })
        .each(function(emote) {
            newMessage += (message.substring(lastEndIndex, emote.startIndex) + emote.url);

            lastEndIndex = emote.endIndex;
        });

    return newMessage + message.substring(lastEndIndex);
}

function parseMessage2(message, emotes) {
    var newMessage = message.split("");

    for(var emoteIndex in emotes) {
        var emote = emotes[emoteIndex];

        for(var charIndexes in emote) {
            var emoteIndexes = emote[charIndexes];

            if(typeof emoteIndexes == "string") {
                emoteIndexes = emoteIndexes.split("-");
                emoteIndexes = [parseInt(emoteIndexes[0]), parseInt(emoteIndexes[1])];

                for(var i = emoteIndexes[0]; i <= emoteIndexes[1]; ++i) {
                    newMessage[i] = "";
                }

                newMessage[emoteIndexes[0]] = "<img class=\"emoticon\" src=\"http://static-cdn.jtvnw.net/emoticons/v1/" + emoteIndex + "/3.0\">";
            }
        }
    }

    return newMessage.join("");
}

The first is your method, unadulterated.

The second builds the result left to right by string concatenation: I append the text from the last known emote end index to the current emote start index, then the image tag itself, and continue until all emotes are handled, finally appending whatever is left of the string.

The third is a modified version of your first, which doesn't use Array.apply, Array.splice, or Array.concat, which may account for the speedup.

One other note: most of the overhead of all of these methods could be avoided by not running them when there are no emotes at all. Doing the following check before iterating over the emotes yielded a 10x improvement over the fastest method when there were no emotes:

// emotes is null or an object keyed by emote id, so check its keys rather than .length
if(!user["emotes"] || Object.keys(user["emotes"]).length === 0) {
    return message;
}

Feel free to use any of mine if you'd like. Or if you have any questions I would be happy to answer them.

@celluj34 celluj34 closed this Jul 4, 2015

@celluj34

Contributor

celluj34 commented Jul 4, 2015

whoop didn't mean to close

@celluj34 celluj34 reopened this Jul 4, 2015

@AlcaDesign

Member

AlcaDesign commented Jul 4, 2015

Seems a little silly to be worried about the speed of it when it's not like we're loading all of the roughly 300-600 chat messages per second across all Twitch servers into one page. It just wouldn't be needed. You should take a look at my example. I'm using my emote function there. I used no jQuery/underscore with it, just tmi.js with a little bit of hard work over a few hours 😎

Edit: I did change it to use an html entity encoder thing I found online on StackOverflow somewhere, but that's it
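
None of the functions posted above escape the message text before inserting <img> tags, so here is one way an entity-encoding step might fit in. This is a sketch, not the encoder mentioned above: `escapeHtml` and `formatEmotesEscaped` are hypothetical names, and only the five common HTML-significant characters are handled.

```javascript
// Hypothetical minimal encoder for characters that matter in HTML text and attributes.
function escapeHtml(text) {
    var entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", "\"": "&quot;", "'": "&#39;" };
    return text.replace(/[&<>"']/g, function(ch) {
        return entities[ch];
    });
}

// Builds the output left to right so that only the plain-text segments
// between emotes are escaped, never the generated <img> tags.
function formatEmotesEscaped(message, emotes) {
    var ranges = [];
    Object.keys(emotes || {}).forEach(function(id) {
        emotes[id].forEach(function(range) {
            var parts = range.split("-");
            ranges.push({
                id: id,
                start: parseInt(parts[0], 10),
                end: parseInt(parts[1], 10) + 1 // exclusive end
            });
        });
    });
    ranges.sort(function(a, b) {
        return a.start - b.start;
    });

    var out = "";
    var last = 0;
    ranges.forEach(function(r) {
        out += escapeHtml(message.substring(last, r.start));
        out += "<img class=\"emoticon\" src=\"http://static-cdn.jtvnw.net/emoticons/v1/" + r.id + "/3.0\">";
        last = r.end;
    });
    return out + escapeHtml(message.substring(last));
}

console.log(formatEmotesEscaped("<b>hi</b> Kappa", { 25: ["10-14"] }));
```

Escaping has to happen per segment: escaping the whole message first would change its length and shift the emote indexes.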

@celluj34

Contributor

celluj34 commented Jul 4, 2015

Very true. Perhaps it's a case of premature optimization, but chat is the
most active part of any bot so reducing time spent on that is better than
nothing.

I use underscore for other things so I don't have a problem with the
dependency, but a plain JavaScript implementation is important too.

Just thought I'd put it out there for anyone else wanting to write against
the api.

@AlcaDesign

Member

AlcaDesign commented Jul 4, 2015

Are you intending to make an entire, useful bot in HTML like on a page? (other than something like NW.js)

@celluj34

Contributor

celluj34 commented Jul 4, 2015

Yes, I'm using electron as my app container for my mod app and (separately)
bot using tmi.js. You can check out my repo here:
https://github.com/celluj34/OpenTwitchTools/

OpenMod is definitely more complete, as I started it first and it is simpler.

@celluj34

Contributor

celluj34 commented Jul 28, 2015

Hey @AlcaDesign, one more thing I wanted to run by you. Do you do any automatic parsing of URLs? I haven't found a solution and I'd like to know how Twitch does it.

@AlcaDesign

Member

AlcaDesign commented Jul 28, 2015

@celluj34 I haven't had the need to implement URL parsing yet. Maybe try looking at how Markdown parsers detect standalone urls?
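
As a starting point only, standalone URLs can be picked out with a simple pattern. How Twitch does this isn't known from this thread; `linkifyUrls` is a hypothetical helper, and the regular expression is an assumption that only recognises links starting with http:// or https://.

```javascript
// Hypothetical helper: wraps bare http(s) URLs in anchor tags.
// The pattern is deliberately simple and misses URLs written without a scheme.
function linkifyUrls(text) {
    return text.replace(/\bhttps?:\/\/[^\s<>"']+/g, function(url) {
        return "<a href=\"" + url + "\" target=\"_blank\">" + url + "</a>";
    });
}

console.log(linkifyUrls("see https://github.com/celluj34/OpenTwitchTools/ for more"));
```

It would need to run on the plain-text segments between emotes rather than on the finished HTML, since the pattern also matches the src URLs inside generated <img> tags. Trailing punctuation directly after a link, such as a full stop, is captured as part of the URL.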

@AlcaDesign

Member

AlcaDesign commented Jul 28, 2015

@celluj34 I also (badly) added BTTV support to my emote parser.

This chat is set for testing using TwitchPlaysPokemon's chat on Splinxes.com. It has limited capability as it's just for show. I could write it up better in a gist if you'd like

@celluj34

Contributor

celluj34 commented Jul 28, 2015

That would be awesome! I have BTTV support on my to-do list so if you've
already got something I'd love to have a look.

@AlcaDesign

Member

AlcaDesign commented Jul 28, 2015

Do you use underscore or lodash?

@celluj34

Contributor

celluj34 commented Jul 28, 2015

Of course. I prefer underscore, but only because that's what I found first.

@AlcaDesign

Member

AlcaDesign commented Jul 28, 2015

But lodash Kreygasm

@celluj34

Contributor

celluj34 commented Jul 28, 2015

They're working on merging them :) https://github.com/underdash/underdash

@AlcaDesign

Member

AlcaDesign commented Jul 28, 2015

@AlcaDesign

Member

AlcaDesign commented Aug 1, 2015

@celluj34 I don't know if you've found anything, but this looks good.

@celluj34

Contributor

celluj34 commented Aug 1, 2015

@chevex

chevex commented Aug 20, 2015

I'm having a bit of a rough time with these as well. I got into a hairy mess with recursive regular expressions, but I've been down that road enough times that I knew to just get out early and use a real parsing library. I'm gonna take a look at the one you linked @AlcaDesign, though looking at it I think it might be overkill.

@Schmoopiie

Member

Schmoopiie commented Oct 12, 2015

@celluj34 and @AlcaDesign, I added your code to the website to help new developers, so I will close this issue. If you don't want your code to be listed on the website, PM me on Twitter.

Thank you! 👍

@Schmoopiie Schmoopiie closed this Oct 12, 2015

@AlcaDesign

Member

AlcaDesign commented Oct 12, 2015

👍

@celluj34

Contributor

celluj34 commented Oct 12, 2015

:lgtm:

@Kequc

Kequc commented Aug 4, 2018

I wrote a solution before finding this thread that is maybe helpful for some people.

My project relied on having an array of characters that were written to the screen like a typewriter. So it seemed like a good idea to break it up into text strings one character long, and emote ids represented as numbers. Then while typing, it was able to discern "ok this is a character, this is an emote" as it goes.

This finds all indexes of the given emote code, inside of a string. It's basically indexOf but returns all indexes.

function indexesOf (text, code) {
    const result = [];
    let index = -1;

    while (true) {
        index = text.indexOf(code, index + 1);
        if (index === -1) break;
        result.push(index);
    }

    return result;
}

The structure of my emotes is built somewhere else in my code, based on the user and the channel. It's a plain object where keys are an available emote code and values are the emote id. For example it could look like this:

const emotes = {
    PartyHat: 965738,
    EarthDay: 959018,
    TombRaid: 864205,
    PopCorn: 724216
};

I use the following function to build an array where emote codes exist at the indexes they were found in the string. I use this array while breaking the text up into characters later.

function findEmotes (text, emotes) {
    const found = [];

    for (const code of Object.keys(emotes)) {
        for (const index of indexesOf(text, code)) {
            found[index] = code;
        }
    }

    return found;
}

Then finally I break the string down into characters and emotes.

export function getCharArray (text) {
    const emotes = getEmotes();
    const found = findEmotes(text, emotes);
    const charArray = [];

    for (let i = 0; i < text.length; i++) {
        if (found[i]) {
            // this is an emote
            const id = emotes[found[i]];
            charArray.push(id);
            i += (found[i].length - 1);
        } else {
            // this is text
            charArray.push(text[i]);
        }
    }

    return charArray;
}

For me it's exactly what I needed because then my typewriter effect just needs to go through every item in the charArray, check whether it is a number, and if so insert an emote. Otherwise just insert the text.
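
The consuming side described here might look something like the sketch below. It is not the typewriter code itself, which isn't shown: `renderCharArray` is a hypothetical name, everything is rendered in one pass instead of character by character, and the image URL pattern is borrowed from the earlier comments in this thread as an assumption.

```javascript
// Hypothetical renderer for a char array as described above:
// numbers are treated as emote ids, strings as single characters.
function renderCharArray(charArray) {
    return charArray.map((item) => {
        if (typeof item === "number") {
            // Assumed URL pattern, taken from the earlier examples in this thread.
            return `<img class="emoticon" src="http://static-cdn.jtvnw.net/emoticons/v1/${item}/1.0">`;
        }
        return item;
    }).join("");
}

console.log(renderCharArray(["h", "i", " ", 965738]));
```

A real typewriter effect would emit one item per tick instead of joining everything at once, but the number-versus-string check stays the same.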
