Configuration format

Before using Pipitor, you need to create a configuration file that describes its behavior.

You can write the configuration in TOML, JSON or Dhall format (Dhall support is optional and requires the dhall feature). The configuration file should be named Pipitor.toml, Pipitor.json or Pipitor.dhall, respectively.

Here is a cheatsheet of the configuration items along with their default values (except for mandatory ones) in TOML format:

```toml
database_url = "pipitor.sqlite3"
skip_duplicate = false

[websub] # optional
callback = "https://your-domain.example/websub/" # required if `[websub]` exists
bind = "tcp://0.0.0.0:8080" # default: uses `$LISTEN_FDS`
renewal_margin = 3600

[twitter]
user = 12345 # required
stream = true

[twitter.client]
identifier = "Your application's API key"
secret = "Your application's API secret"

[twitter.list] # optional
id = 12345 # required if `[twitter.list]` exists
interval = 1
delay = 1

[[rule]]
topics = [
    "https://example.com/feed",
    12345,
] # required
outbox = [12345] # required
filter = "foo" # default: matches all entries
exclude = "bar" # default: does not match any entry
```

The following sections explain each item in detail.

database_url

Optional. Path to the database file. If unspecified, Pipitor looks for a $DATABASE_URL value in the .env file or in the environment variables, and then falls back to ./pipitor.sqlite3.
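The lookup order above can be sketched as follows. resolve_database_url is a hypothetical helper, not Pipitor's code. It assumes the .env file has already been loaded into the environment mapping passed in, as dotenv-style loaders do:

```python
import os
from collections.abc import Mapping
from typing import Optional

DEFAULT_DATABASE_URL = "./pipitor.sqlite3"

def resolve_database_url(config: Mapping, env: Optional[Mapping] = None) -> str:
    """Hypothetical helper mirroring the documented lookup order."""
    if env is None:
        env = os.environ  # assumed to already include values loaded from .env
    # 1. An explicit database_url in the configuration file takes precedence.
    if "database_url" in config:
        return config["database_url"]
    # 2. Otherwise, use $DATABASE_URL if it is set.
    if "DATABASE_URL" in env:
        return env["DATABASE_URL"]
    # 3. Finally, fall back to the default path.
    return DEFAULT_DATABASE_URL

print(resolve_database_url({}, {}))  # ./pipitor.sqlite3
```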

skip_duplicate

Optional. Whether to skip broadcasting duplicate entries.

A duplicate entry is an entry with exactly the same content as an entry broadcast in the past. For example, if an account Tweets "hello", Pipitor Retweets that Tweet, and then the account Tweets "hello" again, the second "hello" Tweet is considered a duplicate.

Defaults to false (broadcast all entries).
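Conceptually, skip_duplicate compares each entry's content against everything broadcast before. The following is a minimal sketch with a hypothetical class name. It keeps the history in memory only, whereas a real implementation would need to persist it (for example in the database) to survive restarts:

```python
class DuplicateChecker:
    """Hypothetical sketch of skip_duplicate; history is kept in memory only."""

    def __init__(self, skip_duplicate: bool = False) -> None:
        self.skip_duplicate = skip_duplicate
        self._broadcast_contents: set[str] = set()

    def should_broadcast(self, content: str) -> bool:
        if self.skip_duplicate and content in self._broadcast_contents:
            return False  # exactly the same content was broadcast before
        self._broadcast_contents.add(content)
        return True


checker = DuplicateChecker(skip_duplicate=True)
print(checker.should_broadcast("hello"))  # True: first "hello"
print(checker.should_broadcast("hello"))  # False: duplicate
```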

websub

Optional. Configuration of the WebSub subscriber server.

websub.callback

URI to use as the prefix of the subscriber server's callback URIs. Required if websub is specified.

For example, if you set this to https://example.com/websub/, a callback URI will look like https://example.com/websub/UWfFAW_8wUQ.
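The suffix in the example above (UWfFAW_8wUQ) is 11 characters from the URL-safe Base64 alphabet. That is exactly the length of 8 bytes encoded without padding. Purely for illustration, assume the suffix is such an encoding of a 64-bit subscription ID. A callback URI could then be built like this (make_callback_uri is hypothetical):

```python
import base64
import secrets
from typing import Optional

def make_callback_uri(prefix: str, subscription_id: Optional[int] = None) -> str:
    """Hypothetical: append a URL-safe Base64 token for a 64-bit ID to the prefix."""
    if subscription_id is None:
        subscription_id = secrets.randbits(64)
    raw = subscription_id.to_bytes(8, "big")
    # 8 bytes encode to 12 Base64 characters, the last of which is "=" padding.
    token = base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    # The prefix is used verbatim, so it should end with "/".
    return prefix + token

print(make_callback_uri("https://example.com/websub/", 0))
# https://example.com/websub/AAAAAAAAAAA
```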

websub.bind

Optional. Bind address of the subscriber server.

This takes an internet socket address like tcp://127.0.0.1:8080 or a path to a Unix domain socket like unix:///path/to/socket.

If unspecified, Pipitor will attempt to use a socket inherited through $LISTEN_FDS (the systemd socket activation protocol).
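The two accepted forms can be told apart by their URI scheme. The following minimal parser uses Python's urllib.parse. parse_bind is hypothetical, and the $LISTEN_FDS fallback is not modeled:

```python
from typing import Tuple, Union
from urllib.parse import urlsplit

def parse_bind(addr: str) -> Union[Tuple[str, str, int], Tuple[str, str]]:
    """Hypothetical parser for websub.bind values."""
    url = urlsplit(addr)
    if url.scheme == "tcp":
        # e.g. tcp://127.0.0.1:8080 -> ("tcp", "127.0.0.1", 8080)
        if url.hostname is None or url.port is None:
            raise ValueError(f"tcp address needs a host and a port: {addr!r}")
        return ("tcp", url.hostname, url.port)
    if url.scheme == "unix":
        # e.g. unix:///path/to/socket -> ("unix", "/path/to/socket")
        return ("unix", url.path)
    raise ValueError(f"unsupported bind address: {addr!r}")
```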

websub.renewal_margin

Optional. How long before a subscription expires the subscriber server attempts to renew it.

For example, if a subscription is going to expire at 2020-01-01T12:00:00Z and renewal_margin is set to one hour, the subscriber server will send a renewal request for the subscription to the hub at 2020-01-01T11:00:00Z.
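The scheduling rule is a simple subtraction. As an illustration, the sketch below also derives the expiration time from the lease length that a WebSub hub reports (hub.lease_seconds). Whether Pipitor computes it exactly this way is an assumption, and renewal_time is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone

def renewal_time(verified_at: datetime, lease_seconds: int,
                 renewal_margin: timedelta = timedelta(hours=1)) -> datetime:
    """Hypothetical: when to send the renewal request for a subscription."""
    expires_at = verified_at + timedelta(seconds=lease_seconds)
    # If the margin exceeds the lease, this lies in the past, i.e. renew right away.
    return expires_at - renewal_margin

verified = datetime(2020, 1, 1, 2, 0, tzinfo=timezone.utc)
print(renewal_time(verified, lease_seconds=10 * 3600).isoformat())
# 2020-01-01T11:00:00+00:00
```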

A Duration value is either a record like { secs = 1, nanos = 500000000 } or a single number, which is shorthand for { secs = .., nanos = 0 }. secs is the number of whole seconds in the duration, and nanos is the number of additional nanoseconds.

Defaults to 1 hour.
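Both Duration forms can be normalized into a single type. The sketch below uses Python's datetime.timedelta, and parse_duration is hypothetical. Note that timedelta only has microsecond resolution, so sub-microsecond nanos are truncated here:

```python
from datetime import timedelta

def parse_duration(value) -> timedelta:
    """Hypothetical: accept `3600` or `{"secs": 1, "nanos": 500000000}`."""
    if isinstance(value, bool):  # bool is a subclass of int in Python
        raise TypeError("a Duration cannot be a boolean")
    if isinstance(value, int):
        # Shorthand: a single number is { secs = value, nanos = 0 }.
        return timedelta(seconds=value)
    if isinstance(value, dict):
        # Record form: whole seconds plus additional nanoseconds.
        return timedelta(seconds=value["secs"], microseconds=value["nanos"] // 1000)
    raise TypeError(f"not a Duration: {value!r}")

print(parse_duration(3600))  # 1:00:00
print(parse_duration({"secs": 1, "nanos": 500000000}))  # 0:00:01.500000
```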

twitter

twitter.client

OAuth API key and secret of the Twitter App.

This takes a record like { identifier = "API key", secret = "API secret" }.

twitter.user

User ID of the Twitter account used to retrieve Tweets.

This account is used to make List and streaming API requests.

twitter.stream

Optional. Whether to use the streaming API.

Defaults to true.

twitter.list

Optional.

twitter.list.id

ID of a List to retrieve Tweets from.

Running the pipitor twitter-list-sync command fills the List with the accounts listed in rule[].topics of the configuration.
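Integer topics are Twitter user IDs and string topics are WebSub URIs, so the accounts the List should contain can be collected across all rules. The sketch below works out which accounts still need to be added. accounts_to_add is hypothetical, and the sketch does not cover whether twitter-list-sync also removes accounts that are no longer listed:

```python
from collections.abc import Iterable, Mapping

def accounts_to_add(rules: Iterable[Mapping], current_members: set[int]) -> set[int]:
    """Hypothetical: user IDs in rule[].topics that are not yet in the List."""
    wanted = {
        topic
        for rule in rules
        for topic in rule["topics"]
        # Integers are Twitter user IDs; strings are WebSub topic URIs.
        if isinstance(topic, int) and not isinstance(topic, bool)
    }
    return wanted - current_members

rules = [
    {"topics": ["https://example.com/feed", 12345, 678]},
    {"topics": [678, 999]},
]
print(sorted(accounts_to_add(rules, current_members={678})))  # [999, 12345]
```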

twitter.list.interval

Optional. Interval between API requests to retrieve Tweets from the List, given as a Duration (see websub.renewal_margin).

Defaults to 1 second.

twitter.list.delay

Optional. The maximum Duration of time to subtract from the since_id parameter of API requests.

When retrieving Tweets from a List, the bot sets the since_id parameter to the largest Tweet ID it has received so far, in order to reduce bandwidth usage (without this, the bot would end up fetching 200 Tweets every second, most of which would be duplicates).

However, Tweet IDs are not strictly sorted in chronological order. They are only roughly sorted, so a new Tweet with a smaller ID than that since_id may appear after the last request. To catch such Tweets, the bot adjusts since_id as follows: since_id := min(since_id, now() - delay), where now() represents the current timestamp in the Snowflake ID space.
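Mapping now() into the ID space relies on the layout of Twitter's Snowflake IDs. The bits above the lowest 22 hold a millisecond timestamp counted from Twitter's custom epoch (1288834974657 ms after the Unix epoch), and the low 22 bits hold a machine ID and a sequence number. The following sketch applies the adjustment under that layout. The function names and the millisecond-based delay parameter are hypothetical:

```python
import time
from typing import Optional

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch, in Unix milliseconds
TIMESTAMP_SHIFT = 22  # low 22 bits: 10-bit machine ID + 12-bit sequence number

def snowflake_floor(unix_ms: int) -> int:
    """Smallest Snowflake ID whose timestamp is `unix_ms`."""
    return (unix_ms - TWITTER_EPOCH_MS) << TIMESTAMP_SHIFT

def adjusted_since_id(since_id: int, delay_ms: int = 1000,
                      now_ms: Optional[int] = None) -> int:
    """since_id := min(since_id, now() - delay), with now() in ID space."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    return min(since_id, snowflake_floor(now_ms - delay_ms))
```

A since_id that already lies before now() - delay is left unchanged. A newer one is pulled back to the first possible ID of the millisecond delay_ms ago.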

More precisely, Tweet IDs are k-sorted: Tweets posted within k seconds of each other may have IDs out of chronological order, but Tweets posted further apart are ordered correctly. Twitter stated on its blog (over 10 years before this writing) that it aimed to keep k below 1 second, so a twitter.list.delay of 1 second should be sufficient, provided that the local clock is in sync with Twitter's servers.

Defaults to 1 second.

rule

A list of rules, each describing the topics to retrieve entries from, the entries to broadcast, and the bot accounts to broadcast them from.

rule[].topics

A list of topics: user IDs of Twitter accounts to retrieve Tweets from, and URIs of WebSub topics to retrieve entries from.

rule[].outbox

A list of user IDs of accounts to broadcast the entries as.

While you can specify multiple Twitter accounts as the outbox of a single rule, you are advised to review Twitter's automation rules before doing so.

rule[].filter

Optional. Regular expression filter that selects the entries to broadcast.

This takes a record like { title = "..", text = ".." } (where text is optional) or a single string, which is shorthand for { title = ".." }. title is matched against the title of feed entries and the body of Tweets; text is matched against the content and summary of feed entries.

If unspecified, all entries from the topics are broadcast.
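The following sketch shows how a filter value might be normalized and matched. It uses Python's re module in place of whatever regex engine Pipitor uses, so syntax details may differ. Two behaviors here are assumptions not documented above: patterns match anywhere in the text (re.search), and an entry matches if either title or text matches. Filter is a hypothetical class:

```python
import re
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Filter:
    title: re.Pattern
    text: Optional[re.Pattern] = None

    @classmethod
    def from_config(cls, value: Union[str, dict]) -> "Filter":
        # A bare string is shorthand for { title = "..." }.
        if isinstance(value, str):
            value = {"title": value}
        text = value.get("text")
        return cls(re.compile(value["title"]),
                   re.compile(text) if text is not None else None)

    def matches(self, title: Optional[str] = None, content: Optional[str] = None,
                summary: Optional[str] = None) -> bool:
        # `title` is a feed entry's title or a Tweet's body.
        if title is not None and self.title.search(title):
            return True
        # `text` applies to a feed entry's content and summary (assumed OR-ed).
        if self.text is not None:
            return any(s is not None and self.text.search(s) is not None
                       for s in (content, summary))
        return False
```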

rule[].exclude

Optional. Regular expression to exclude entries that have matched filter.

The format of values is the same as filter.
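The sketch below combines filter and exclude for a single piece of text, such as a Tweet body, using the defaults from the cheatsheet: no filter matches everything, and no exclude matches nothing. should_broadcast is hypothetical, and for brevity it calls re.search on plain pattern strings:

```python
import re
from typing import Optional

def should_broadcast(text: str, filter_re: Optional[str] = None,
                     exclude_re: Optional[str] = None) -> bool:
    """Hypothetical: broadcast entries that match filter and not exclude."""
    # Without a filter, every entry matches.
    if filter_re is not None and re.search(filter_re, text) is None:
        return False
    # Without an exclude pattern, no entry is filtered out.
    if exclude_re is not None and re.search(exclude_re, text) is not None:
        return False
    return True

print(should_broadcast("foo baz", filter_re="foo", exclude_re="bar"))  # True
print(should_broadcast("foo bar", filter_re="foo", exclude_re="bar"))  # False
```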