## Simple Kafka setup for the Slack API using Python

### Prerequisites

* Install Kafka (http://kafka.apache.org/downloads.html)
* Install kafka-python
    * `pip install kafka-python`
* Install the Slack API client for Python (the code below uses the `SlackClient` class, which slackclient 2.x removed, so pin a 1.x release)
    * `pip install "slackclient<2"`
* Get a Slack API token
    * (https://api.slack.com/docs/oauth-test-tokens)
* Start the ZooKeeper server (from the Kafka directory)
    * `./bin/zookeeper-server-start.sh config/zookeeper.properties`
* Start the Kafka server (from the Kafka directory)
    * `./bin/kafka-server-start.sh config/server.properties`
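
Before running the cells below, it can help to confirm that both servers are actually listening. This is a minimal standard-library sketch that assumes the default ports from the stock config files (ZooKeeper's `clientPort` 2181 and Kafka's listener 9092) on localhost; adjust them if you edited `config/*.properties`. `is_port_open` is a hypothetical helper, not part of Kafka or kafka-python.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default ports, assuming config/zookeeper.properties and config/server.properties are unchanged
    for name, port in [("ZooKeeper", 2181), ("Kafka", 9092)]:
        status = "up" if is_port_open("localhost", port) else "not reachable"
        print("{} on localhost:{} is {}".format(name, port, status))
```

A port that answers only proves something is listening there; it does not check that the broker is healthy.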

```python
# This is how you query the Slack team for all channels
# TODO: Check whether DM channels need a different API call

from slackclient import SlackClient

token = 'your-token-here'
sc = SlackClient(token)

channels = [channel_dict['id'] for channel_dict in sc.api_call("channels.list")['channels']]
print(channels)
```
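
On the TODO about DM channels: Slack has since deprecated the `channels.*` methods in favor of `conversations.*`, and `conversations.list` takes a `types` parameter (for example `"public_channel,private_channel,mpim,im"`) and pages its results with a `cursor`, returning the next one in `response_metadata.next_cursor`. The sketch below assumes the token has the scopes those types need and that `api_call` behaves like `SlackClient.api_call` (method name plus keyword arguments, returning a dict). `iter_conversation_ids` is a hypothetical helper and has only been exercised against a fake client.

```python
def iter_conversation_ids(api_call, types="public_channel,private_channel,mpim,im", limit=200):
    """Yield conversation ids from conversations.list, following cursor pagination.

    `api_call` is assumed to behave like SlackClient.api_call: it takes the method
    name plus keyword arguments and returns the decoded JSON response as a dict.
    """
    cursor = None
    while True:
        kwargs = {"types": types, "limit": limit}
        if cursor:
            kwargs["cursor"] = cursor
        response = api_call("conversations.list", **kwargs)
        if not response.get("ok"):
            raise RuntimeError("conversations.list failed: {}".format(response.get("error")))
        for conversation in response.get("channels", []):
            yield conversation["id"]
        # An empty or missing next_cursor means there are no more pages
        cursor = response.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break


# Usage with the client from the cell above (assumes a 1.x SlackClient):
# dm_ids = list(iter_conversation_ids(sc.api_call, types="im"))
```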

```python
%%writefile example.py
#!/usr/bin/env python
# Replace the #! shebang with the interpreter from your own environment if needed.
#
# This is a simple Kafka setup using Python, adapted from
# https://github.com/dpkp/kafka-python/blob/master/example.py
# - On one thread we set up a producer that sends two messages per second
#   to a topic called 'my-topic'.
# - On another thread we set up a consumer that reads the topic.
import threading, logging, time

from kafka import KafkaConsumer, KafkaProducer


class Producer(threading.Thread):
    daemon = True

    def run(self):
        producer = KafkaProducer(bootstrap_servers='localhost:9092')

        while True:
            producer.send('my-topic', b"test")
            producer.send('my-topic', b"\xc2Hola, mundo!")
            time.sleep(1)


class Consumer(threading.Thread):
    daemon = True

    def run(self):
        consumer = KafkaConsumer(bootstrap_servers='localhost:9092',
                                 auto_offset_reset='earliest')
        consumer.subscribe(['my-topic'])

        for message in consumer:
            print(message)


def main():
    threads = [
        Producer(),
        Consumer()
    ]

    for t in threads:
        t.start()

    # Let both daemon threads run for 20 seconds, then exit
    time.sleep(20)


if __name__ == "__main__":
    logging.basicConfig(
        format='%(asctime)s.%(msecs)s:%(name)s:%(thread)d:%(levelname)s:%(process)d:%(message)s',
        level=logging.INFO
        )
    main()
```

```python
!python example.py
```
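
The consumer prints each whole `ConsumerRecord` (topic, partition, offset, key, value and more). If you only want the payload, decode `message.value`. Note that the second test message, `b"\xc2Hola, mundo!"`, is not valid UTF-8 (`\xc2` starts a two-byte sequence that never completes), so a strict decode would raise. This is a minimal sketch, assuming you would call it from `Consumer.run` in place of the `print` call; `describe_record` is a hypothetical helper, and `FakeRecord` only mimics the four `ConsumerRecord` attributes it reads.

```python
from collections import namedtuple


def describe_record(record):
    """Format a kafka-python ConsumerRecord-like object as a one-line summary.

    Invalid UTF-8 bytes are replaced with U+FFFD instead of raising UnicodeDecodeError.
    """
    text = record.value.decode("utf-8", errors="replace")
    return "{}[{}]@{}: {}".format(record.topic, record.partition, record.offset, text)


# Stand-in with the same attribute names as ConsumerRecord, for trying this without a broker
FakeRecord = namedtuple("FakeRecord", ["topic", "partition", "offset", "value"])

if __name__ == "__main__":
    print(describe_record(FakeRecord("my-topic", 0, 1, b"\xc2Hola, mundo!")))
```

Inside `Consumer.run`, the loop body would then become `print(describe_record(message))`.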

```python
# This is how you read from channel history
# In this case we write each message to a tab-separated file
import io
from itertools import islice

with io.open('slackpstone-channel-output.txt', 'w', encoding='utf-8') as output_example:
    for channel in channels:
        # channels.history returns at most 1000 messages per call
        channel_history = sc.api_call("channels.history", channel=channel, count=1000)
        for message_dict in channel_history['messages']:
            if 'user' in message_dict:
                output_example.write(u'{}\t{}\t{}\n'.format(
                    message_dict['text'].replace('\n', ''),
                    message_dict['user'], message_dict['ts']))

# Example of the stuff we wrote to file: print the first 10 rows
with io.open('slackpstone-channel-output.txt', 'r', encoding='utf-8') as f:
    for line in islice(f, 10):
        print(line.strip().split('\t'))
```
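
Building rows with a hand-written `'{}\t{}\t{}\n'` format string breaks if a message's text itself contains a tab. One alternative, sketched here with only the standard library, is `csv.writer` with `delimiter='\t'`, which quotes such fields so they read back intact. It assumes Python 3 and message dicts shaped like the `channels.history` results above (`text`, `user`, `ts` keys); `write_user_messages` is a hypothetical helper.

```python
import csv
import io


def write_user_messages(messages, out):
    """Write (text, user, ts) rows for messages that have a 'user' key; return the row count.

    `out` is any text file object opened with newline='' (as the csv module expects).
    """
    writer = csv.writer(out, delimiter="\t")
    count = 0
    for message_dict in messages:
        if "user" in message_dict:
            # Drop newlines as the original cell does; embedded tabs are handled by csv quoting
            text = message_dict["text"].replace("\n", "")
            writer.writerow([text, message_dict["user"], message_dict["ts"]])
            count += 1
    return count


if __name__ == "__main__":
    # Quick demo with an in-memory buffer instead of a real file
    demo = io.StringIO(newline="")
    write_user_messages([{"text": "a\tb", "user": "U1", "ts": "1.0"}], demo)
    print(repr(demo.getvalue()))

# Usage with the file from the cell above (assumed):
# with open('slackpstone-channel-output.txt', 'w', encoding='utf-8', newline='') as f:
#     for channel in channels:
#         history = sc.api_call("channels.history", channel=channel, count=1000)
#         write_user_messages(history['messages'], f)
```

Reading the file back with `csv.reader(f, delimiter='\t')` then returns the original text fields, tabs included.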

```python
# This is how you get the team id from the Slack API
sc.api_call('team.info')['team']['id']
```
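
The second script uses this team id directly as the Kafka topic name. Kafka only accepts topic names made of ASCII letters, digits, `.`, `_` and `-`, at most 249 characters long, and rejects `.` and `..`. Slack team ids typically look like `T` followed by uppercase letters and digits, so they should pass, but a quick check is cheap. This is a minimal sketch; `topic_for_team` and its optional `prefix` are hypothetical and not part of kafka-python.

```python
import re

# Legal Kafka topic characters: ASCII alphanumerics, '.', '_' and '-'
_LEGAL_TOPIC = re.compile(r"[a-zA-Z0-9._-]+")
_MAX_TOPIC_LENGTH = 249


def topic_for_team(team_id, prefix=""):
    """Build a Kafka topic name from a Slack team id, raising ValueError if Kafka would reject it."""
    topic = prefix + team_id
    if (topic in (".", "..")
            or len(topic) > _MAX_TOPIC_LENGTH
            or not _LEGAL_TOPIC.fullmatch(topic)):
        raise ValueError("invalid Kafka topic name: {!r}".format(topic))
    return topic


# Usage (assumed): team_id = topic_for_team(sc.api_call('team.info')['team']['id'])
```

`fullmatch` is used rather than `match` with `$`, because `$` would also accept a trailing newline.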

```python
%%writefile slack_example.py
#!/usr/bin/env python

# Integrating the Slack API and Kafka
import time

from slackclient import SlackClient
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='localhost:9092')
c = 0
token = 'your-token-here'
sc = SlackClient(token)
team_id = sc.api_call('team.info')['team']['id']

# First we go through all the history
# I'm using the team_id as the topic name
channels = [channel_dict['id'] for channel_dict in sc.api_call("channels.list")['channels']]
for channel in channels:
    channel_history = sc.api_call("channels.history", channel=channel, count=1000)
    for message_dict in channel_history['messages']:
        if 'user' in message_dict:
            message = u'{}\t{}\t{}\t{}\n'.format(
                message_dict['text'].replace('\n', ''),
                channel, message_dict['user'], message_dict['ts'])
            # Without a value_serializer, KafkaProducer expects raw bytes
            producer.send(team_id, message.encode('utf-8'))
            c += 1
# Make sure the history is delivered before entering the blocking RTM loop
producer.flush()

# Second, we set up a Real Time Messaging API connection and listen for text messages
# TODO: Look into serialization with Avro
# TODO: Look at encoding issues
# TODO: Iterate on the message structure: which other messages, if any, should we send to Kafka?
# TODO: Look at emoji, reactions, etc.
if sc.rtm_connect():
    while True:
        # rtm_read() returns a (possibly empty) list of events
        for event in sc.rtm_read():
            if 'text' in event and 'user' in event:
                message = u'{}\t{}\t{}\t{}\n'.format(
                    event['text'].replace('\n', ''),
                    event['channel'], event['user'], event['ts'])
                producer.send(team_id, message.encode('utf-8'))
                c += 1
                print('Sent {} messages'.format(c))
        time.sleep(5)
```
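
On the serialization and encoding TODOs: instead of hand-built tab-separated strings, kafka-python's `KafkaProducer` accepts a `value_serializer` callable (and `KafkaConsumer` a matching `value_deserializer`), so each event can travel as UTF-8 JSON with its fields, emoji and newlines intact. This sketch uses JSON as a simpler stand-in for the Avro idea in the TODO; which fields `to_record` keeps is an assumption (the same four the tab-separated format carried), and all three helper names are hypothetical.

```python
import json


def to_record(event):
    """Keep the fields the tab-separated format carried (text, channel, user, ts)."""
    return {key: event.get(key) for key in ("text", "channel", "user", "ts")}


def serialize(value):
    # ensure_ascii=False keeps emoji and accented text readable; the bytes are UTF-8
    return json.dumps(value, ensure_ascii=False).encode("utf-8")


def deserialize(raw):
    return json.loads(raw.decode("utf-8"))


# Wiring (assumes a running broker and the team_id topic from the script above):
# producer = KafkaProducer(bootstrap_servers='localhost:9092', value_serializer=serialize)
# producer.send(team_id, to_record(event))
# consumer = KafkaConsumer(team_id, bootstrap_servers='localhost:9092',
#                          value_deserializer=deserialize)
```

Unlike the tab-separated format, JSON does not need newlines stripped from the message text, since they are escaped inside the string.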

```python
!python slack_example.py
```
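
`rtm_read()` can return several events per call, and many of them are not user messages (presence changes, typing indicators, bot posts, edits that carry a `subtype`). A small pure filter makes the RTM loop easier to test. The sketch assumes the shape of Slack's `message` event (`type`, `text`, `channel`, `user`, `ts`, optional `subtype`); skipping every subtyped message is a judgment call you may want to relax, and `user_messages` is a hypothetical helper.

```python
def user_messages(events):
    """Yield plain user-authored message events from one rtm_read() batch.

    Skips non-message events (e.g. 'presence_change', 'user_typing'), any message
    carrying a 'subtype' (edits, joins, bot posts), and events missing the fields
    the producer needs.
    """
    for event in events:
        if event.get("type") != "message" or "subtype" in event:
            continue
        if all(key in event for key in ("text", "channel", "user", "ts")):
            yield event


# In the RTM loop of slack_example.py this could replace the inline checks:
# for event in user_messages(sc.rtm_read()):
#     ...
```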

## Looking Ahead

* Deployment (Flask app?)
* Maybe using Flask + AWS Elastic Beanstalk? (http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create-deploy-python-flask.html)
* Can we write a bot that you add to your channel to streamline this process?
* Can we use the same bot to serve our esul