
tspannhw/pulsar-mastodon-sink


pulsar-mastodon-sink

Mastodon data streaming

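Run the app

python3 stream.py

Sample output: the client creates a producer on the broker at 127.0.0.1:6650 for partition 0 of persistent://public/default/mastodon, then appears to print the ts value of each message it sends.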

2023-01-17 17:28:17.197 INFO  [0x16ec2b000] HandlerBase:72 | [persistent://public/default/mastodon-partition-0, ] Getting connection from pool
2023-01-17 17:28:17.200 INFO  [0x16ec2b000] ProducerImpl:190 | [persistent://public/default/mastodon-partition-0, ] Created producer on broker [127.0.0.1:56776 -> 127.0.0.1:6650]
20230117222817.0
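The message keys and ts values look like a UTC timestamp formatted as YYYYmmddHHMMSS, with a random UUID appended for the key (the log line at 17:28:17 local time prints 20230117222817.0). Below is a minimal sketch of building both under that assumption. make_key is a hypothetical helper, not code taken from stream.py.

```python
import uuid
from datetime import datetime, timezone


def make_key(now=None):
    """Hypothetical helper: return (key, ts) shaped like the sample messages.

    key -> "YYYYmmddHHMMSS_<uuid4>", ts -> float(YYYYmmddHHMMSS)
    """
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d%H%M%S")  # e.g. "20230117222817"
    key = f"{stamp}_{uuid.uuid4()}"       # timestamp prefix keeps keys roughly time-ordered
    return key, float(stamp)


key, ts = make_key(datetime(2023, 1, 17, 22, 28, 17, tzinfo=timezone.utc))
print(ts)  # 20230117222817.0
```

Prefixing the UUID with the timestamp keeps keys unique while still letting them sort roughly by send time.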

JSON Schema

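# Requires the Pulsar Python client (pip install pulsar-client)
from pulsar.schema import Record, String, Float, Integer

class mastodondata(Record):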
    language = String()
    created_at = String()
    ts = Float()
    uuid = String()
    uri = String()
    url = String()
    favourites_count = Integer()
    replies_count = Integer()
    reblogs_count = Integer()
    content = String()
    username = String()
    accountname = String()
    displayname = String()
    note = String()
    followers_count = Integer()
    statuses_count = Integer()
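The record fields line up with Mastodon's Status entity and its nested Account. For example, Account.acct is "hikara@mstdn.jp" for a remote user and plain "PaaSDev" for a local one, which matches accountname in the samples. The sketch below flattens a status dict into those fields. It assumes the Mastodon API field names, and status_to_record is a hypothetical helper.

```python
def status_to_record(status, key, ts):
    """Flatten a Mastodon status dict into the mastodondata field layout.

    Assumes Mastodon API names: Status (language, created_at, uri, url,
    *_count, content) and the nested Account (username, acct, display_name,
    note, followers_count, statuses_count).
    """
    account = status.get("account") or {}
    created = status.get("created_at")
    return {
        "language": status.get("language"),
        # The samples show created_at as a string, so stringify whatever arrives.
        "created_at": None if created is None else str(created),
        "ts": ts,
        "uuid": key,
        "uri": status.get("uri"),
        "url": status.get("url"),
        "favourites_count": status.get("favourites_count"),
        "replies_count": status.get("replies_count"),
        "reblogs_count": status.get("reblogs_count"),
        "content": status.get("content"),
        "username": account.get("username"),
        "accountname": account.get("acct"),
        "displayname": account.get("display_name"),
        "note": account.get("note"),
        "followers_count": account.get("followers_count"),
        "statuses_count": account.get("statuses_count"),
    }
```

With pulsar-client installed, mastodondata(**status_to_record(...)) should construct the Record, because the keys match the class attributes. Also note that Float() registers as Avro "float" (32-bit, as the registry output shows). A ts such as 20230117222421.0 has more significant digits than a 32-bit float can hold. If consumers decode to the declared type, Double() may therefore be the safer declaration.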



Schemafied Data

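Consume the topic with the subscription mreader2 (-n 0 keeps consuming until stopped). pulsar-client decodes each message with the registered schema and prints its key and JSON content:

bin/pulsar-client consume "persistent://public/default/mastodon" -s "mreader2" -n 0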

----- got message -----
key:[20230117222421_069e9afb-7b4a-481e-b5d3-40c03e421415], properties:[], content:{
 "language": "ja",
 "created_at": "2023-01-17 22:24:18+00:00",
 "ts": 20230117222421.0,
 "uuid": "20230117222421_069e9afb-7b4a-481e-b5d3-40c03e421415",
 "uri": "https://mstdn.jp/users/hikara/statuses/109706887715535436",
 "url": "https://mstdn.jp/@hikara/109706887715535436",
 "favourites_count": 0,
 "replies_count": 0,
 "reblogs_count": 0,
 "content": "<p>\u3093\u30fc\u307e\u3063\uff01</p>",
 "username": "hikara",
 "accountname": "hikara@mstdn.jp",
 "displayname": "\u3070\u3076\u30732023",
 "note": "<p>B70 \u541b\u306e\u6b66\u52c7\u4f1d\u3067\u3059</p>",
 "followers_count": 1640,
 "statuses_count": 51576
}

----- got message -----
key:[20230117223725_f68a866e-1df1-4d5d-b977-eb151535f240], properties:[], content:{
 "language": "en",
 "created_at": "2023-01-17 22:37:25.347000+00:00",
 "ts": 20230117223725.0,
 "uuid": "20230117223725_f68a866e-1df1-4d5d-b977-eb151535f240",
 "uri": "https://mastodon.social/users/PaaSDev/statuses/109706939291797415",
 "url": "https://mastodon.social/@PaaSDev/109706939291797415",
 "favourites_count": 0,
 "replies_count": 0,
 "reblogs_count": 0,
 "content": "<p>I am working on an Apache Pulsar streaming application in Python to ingest mastodon messages.   <a href=\"https://github.com/tspannhw/pulsar-mastodon-sink\" target=\"_blank\" rel=\"nofollow noopener noreferrer\"><span class=\"invisible\">https://</span><span class=\"ellipsis\">github.com/tspannhw/pulsar-mas</span><span class=\"invisible\">todon-sink</span></a></p>",
 "username": "PaaSDev",
 "accountname": "PaaSDev",
 "displayname": "",
 "note": "",
 "followers_count": 0,
 "statuses_count": 1
}
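The content and note fields carry HTML, and non-ASCII text arrives as JSON \u escapes that decode back to the original characters once parsed. Below is a stdlib sketch for turning a printed message body into plain text. summarize and plain_text are hypothetical helpers.

```python
import json
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(html_fragment):
    # Convert Mastodon's HTML "content"/"note" into plain text.
    parser = TextExtractor()
    parser.feed(html_fragment or "")
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


def summarize(message_json):
    # message_json is the JSON body printed after "content:" by pulsar-client.
    record = json.loads(message_json)  # also decodes the \uXXXX escapes
    return {
        "key": record["uuid"],
        "language": record["language"],
        "text": plain_text(record["content"]),
    }
```

All the text inside the link spans is kept, so the link in the second sample comes back as the full https://github.com/tspannhw/pulsar-mastodon-sink URL rather than the truncated display form.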

Get Schema from Pulsar Schema Registry


bin/pulsar-admin schemas get persistent://public/default/mastodon
{
  "version": 5,
  "schemaInfo": {
    "name": "mastodon",
    "schema": {
      "type": "record",
      "name": "mastodondata",
      "fields": [
        {
          "name": "language",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "created_at",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "ts",
          "type": [
            "null",
            "float"
          ]
        },
        {
          "name": "uuid",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "uri",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "url",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "favourites_count",
          "type": [
            "null",
            "int"
          ]
        },
        {
          "name": "replies_count",
          "type": [
            "null",
            "int"
          ]
        },
        {
          "name": "reblogs_count",
          "type": [
            "null",
            "int"
          ]
        },
        {
          "name": "content",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "username",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "accountname",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "displayname",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "note",
          "type": [
            "null",
            "string"
          ]
        },
        {
          "name": "followers_count",
          "type": [
            "null",
            "int"
          ]
        },
        {
          "name": "statuses_count",
          "type": [
            "null",
            "int"
          ]
        }
      ]
    },
    "type": "JSON",
    "properties": {}
  }
}
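Every field is registered as a union with "null", so all fields are optional in the stored schema. The sketch below reads the pulsar-admin output and reports each field's nullability and type. field_types is a hypothetical helper, and the branch that handles a string-encoded schema is a defensive assumption.

```python
import json


def field_types(schema_get_output):
    """Map each field to (nullable, type) from `pulsar-admin schemas get` output."""
    schema = json.loads(schema_get_output)["schemaInfo"]["schema"]
    if isinstance(schema, str):
        # Defensive assumption: the definition may be printed as a JSON string.
        schema = json.loads(schema)
    result = {}
    for field in schema["fields"]:
        types = field["type"]
        if isinstance(types, list):  # Avro union such as ["null", "string"]
            non_null = [t for t in types if t != "null"]
            kind = non_null[0] if len(non_null) == 1 else non_null
            result[field["name"]] = ("null" in types, kind)
        else:
            result[field["name"]] = (False, types)
    return result
```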

REFERENCE

Mastodon public timeline streaming API: https://hachyderm.io/api/v1/streaming/public
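Over HTTP, this endpoint delivers Server-Sent Events. Each new status arrives as an "update" event whose data is the status JSON. Below is a stdlib sketch of reading it. It is not necessarily how stream.py connects. The function names are hypothetical, and the server may require an access token, which is sent here as a Bearer header.

```python
import json
import urllib.request


def parse_sse(lines):
    """Minimal Server-Sent Events parser: yields (event, data) pairs."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line dispatches the buffered event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment / heartbeat line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional space after the colon
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


def statuses(lines):
    # Yield status dicts from "update" events; other events (e.g. delete) are skipped.
    for event, data in parse_sse(lines):
        if event == "update":
            yield json.loads(data)


def stream_public(url="https://hachyderm.io/api/v1/streaming/public", token=None):
    # Network example (not executed here): iterate the HTTP response line by line.
    req = urllib.request.Request(url)
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req) as resp:
        yield from statuses(line.decode("utf-8") for line in resp)
```

Each status yielded by stream_public could then be flattened into the record fields and sent to the Pulsar topic.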
