Rewrite in Python 3 #73

Closed
Krinkle opened this issue Jun 22, 2021 · 3 comments

@Krinkle
Member

Krinkle commented Jun 22, 2021

Rewrites are often a terrible idea for large projects, but I think it might be called for here. Firstly, it's a fairly small and simple project. It's only a few hundred lines of code, and the minimal required complexity is fairly low. Basically all we do is:

  • Connect to the event source (currently irc.wikimedia.org; perhaps worth moving to EventStreams as part of the rewrite, or shortly thereafter. This is something I would know how to do in Python, but not in C#).
  • Connect to the main server (Libera) and channels (feed channel + control channel).
  • For each incoming message:
    • Apply a few simple boolean filters to the metadata.
    • Run 1 database query to determine whether the page title, edit summary, or username matches a watchlist.
    • If accepted, format a string, and send it to the feed channel.
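
To make the per-message steps above concrete, here is a minimal Python sketch. It is not taken from the current bot: the `watchlist` schema, the event field names (loosely modelled on a recent-changes event), the choice of filter, and the output format are all assumptions for illustration.

```python
import sqlite3
from typing import Optional


def make_watchlist_db() -> sqlite3.Connection:
    """Create an in-memory watchlist. The schema is hypothetical."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE watchlist (kind TEXT, value TEXT, PRIMARY KEY (kind, value))"
    )
    return db


def passes_filters(event: dict) -> bool:
    # Cheap boolean check on the metadata before touching the database.
    # Skipping bot edits is just an example filter.
    return not event.get("bot", False)


def find_match(db: sqlite3.Connection, event: dict) -> Optional[str]:
    # A single query that covers title, username and edit summary.
    # Summary entries are matched as substrings, the others exactly.
    row = db.execute(
        """
        SELECT kind FROM watchlist
        WHERE (kind = 'title' AND value = :title)
           OR (kind = 'user' AND value = :user)
           OR (kind = 'summary' AND instr(:comment, value) > 0)
        LIMIT 1
        """,
        {
            "title": event["title"],
            "user": event["user"],
            "comment": event["comment"],
        },
    ).fetchone()
    return row[0] if row else None


def handle_event(db: sqlite3.Connection, event: dict) -> Optional[str]:
    """Return the line to send to the feed channel, or None if rejected."""
    if not passes_filters(event):
        return None
    reason = find_match(db, event)
    if reason is None:
        return None
    return f"[{reason}] {event['user']} edited [[{event['title']}]]: {event['comment']}"
```

The caller would send any non-`None` result to the feed channel. Swapping SQLite for a shared MySQL database should only affect `make_watchlist_db` and the parameter style of the query.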

Apart from that, we have a few basic control commands for restarting the bot and for adding or removing watchlist entries (documentation). These perform some additional maintenance tasks, such as querying the MediaWiki API once for namespace prefixes, a list of known bots and admins, and some interface messages. The interface messages help determine whether something is an "automatic edit summary" with a special meaning (blanked page, replaced contents), and help parse parameters for log events (block, move, protection, etc.). The latter would not be needed anymore if we use EventStreams.

The database is currently SQLite, and migrating that to support a shared MySQL database (#17) has long been blocked on familiarity with C# libraries and on confidence in checking in additional DLL dependencies.

We currently have significant problems with the bot simply staying online, such as:

  • (#cvn-wikidata loses RCReader connection #72) the source connection to irc.wikimedia.org frequently ends up lost in mysterious ways, despite auto-reconnect and auto-rejoin being enabled in the IRC library that we use.
  • (Error "IRC: Closing Link" should be handled #64) the destination connection to Libera Chat often gets in a confused state after netsplits where it is not authenticated with NickServ, and it doesn't recover from this on its own, requiring a restart.

I'm hoping that the Python libraries for IRC are more mature and have this part just solved without requiring any attention. The cvn-clerkbot by comparison, which uses python-twisted, does not appear to have suffered from any connection problems. Having said that, it doesn't send many messages, and we don't pay close attention to it, so this is something we'll have to see.

More broadly, I personally will feel much more motivated to fix bugs and make improvements in a language I actually understand, and for which I have good resources (and people) to lean on for anything I don't know. I have absolutely no desire to learn more than the most basic C#, as I simply have no other outlet for applying that knowledge within my current job or the range of other open-source projects I maintain or contribute to. I also hope that by using Python, we'll have more people in our community able to contribute.

@Krinkle Krinkle added the meta label Jun 22, 2021
@Krinkle
Member Author

Krinkle commented Jul 10, 2021

# Monday, July 5th, 2021

06:06 ⇐ •cvn-clerkbot quit (~cvn-clerk@cvn/bot/cvn-clerkbot) *.net *.split
06:06 → cvn-clerkbot joined (~cvn-clerk@185.15.56.20)

06:11 cvn-clerkbot → Guest7260

19:35 ⇐ Guest7260 quit (~cvn-clerk@185.15.56.20) *.net *.split
19:39 → Guest7260 joined (~cvn-clerk@185.15.56.20)

# Saturday, July 10th, 2021

16:08 Krinkle: !quit
16:08 ⇐ Guest7260 quit (~cvn-clerk@185.15.56.20) Quit: Ordered by Krinkle
16:10 → cvn-clerkbot joined (~cvn-clerk@cvn/bot/cvn-clerkbot)

cvn-clerkbot, which uses python-twisted for IRC, lost its nickname again and did not self-correct in any way. Alternatives to consider:

Things to consider:

  • Message buffering to avoid flood kick.
  • Message splitting against maxlength.
  • Connect and authenticate with NickServ, then join channels.
  • Automatic re-authenticate and nick regaining/ghosting as-needed to deal with net splits, plus re-joining of channels to deal with restricted channels that can only be joined when authenticated.
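
As a rough sketch of the first two points (buffering and splitting), assuming we end up doing them ourselves rather than getting them from a library: the 400-byte budget, the burst size, and the refill interval below are guesses rather than known Libera limits, and `send` stands in for whatever the chosen IRC library provides for sending one line to a channel.

```python
import time
from collections import deque


def split_message(text: str, max_bytes: int = 400) -> list[str]:
    """Split text into chunks of at most max_bytes UTF-8 bytes.

    Prefers breaking at spaces; a single over-long word is cut hard.
    Assumes max_bytes >= 4 so that any one character fits.
    """
    chunks = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Start a new chunk with this word, cutting it if it is too long.
        current = ""
        for ch in word:
            if len((current + ch).encode("utf-8")) > max_bytes:
                chunks.append(current)
                current = ch
            else:
                current += ch
    if current:
        chunks.append(current)
    return chunks


class ThrottledSender:
    """Token-bucket queue: allows a short burst, then one line per interval."""

    def __init__(self, send, burst=4, interval=2.0, clock=time.monotonic):
        self._send = send          # hypothetical callable that sends one line
        self._burst = burst
        self._interval = interval
        self._clock = clock
        self._tokens = float(burst)
        self._last = clock()
        self._queue = deque()

    def enqueue(self, text: str) -> None:
        self._queue.extend(split_message(text))

    def pump(self) -> int:
        """Send as many queued lines as currently allowed; return the count."""
        now = self._clock()
        refill = (now - self._last) / self._interval
        self._tokens = min(self._burst, self._tokens + refill)
        self._last = now
        sent = 0
        while self._queue and self._tokens >= 1:
            self._send(self._queue.popleft())
            self._tokens -= 1
            sent += 1
        return sent
```

`pump()` would be called from a timer or the main loop. Note that this splitting is byte-aware but knows nothing about IRC color codes, so formatted messages may still need truncation logic in the bot itself.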

See also EventSource as used by Pywikibot, which has an example of good error handling as part of the loop.

Ref https://github.com/wikimedia/pywikibot/blob/2b8402a66e28ae4be30f74deb9e4e72ac529ef69/pywikibot/comms/eventstreams.py#L288-L303
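
The general shape I have in mind for "error handling as part of the loop" is something like the sketch below. This is not Pywikibot's code; `connect` and `handle` are hypothetical callables, and the backoff values are arbitrary.

```python
import time
from typing import Any, Callable, Iterable, Optional


def run_forever(
    connect: Callable[[], Iterable[Any]],
    handle: Callable[[Any], None],
    *,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: Optional[int] = None,
) -> None:
    """Consume events, reconnecting with exponential backoff on failure.

    connect() must return a fresh iterable of events on every call.
    max_attempts exists only so that the loop can be stopped in tests.
    """
    delay = 1.0
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            for event in connect():
                handle(event)
                delay = 1.0  # we made progress, so reset the backoff
        except OSError as exc:
            # ConnectionError and TimeoutError are subclasses of OSError.
            print(f"stream failed: {exc!r}; reconnecting in {delay:.0f}s")
        # Also reached when the stream ends without an error.
        sleep(delay)
        delay = min(delay * 2, max_delay)
```

The point is that a dropped or silently ended stream always leads back to a reconnect, instead of relying on the library's auto-reconnect flag.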

@legoktm
Member

legoktm commented Jul 11, 2021

> Message buffering to avoid flood kick.

irc3 doesn't have this, we implemented it manually in wikibugs. I briefly searched the limnoria docs and didn't see anything obvious either (their flood stuff is about users flooding with !commands).

I do wonder if this is something that can be handled on the network side, getting some sort of higher flood limit or exemption.

> Message splitting against maxlength.

Isn't this a thing that should be done by the client, so that colors and whatnot are properly split or truncated? wikibugs has manual truncation logic that selects which projects should be listed when announcing a task, cutting off less important ones.

> Connect and authenticate with NickServ, then join channels.

All 3 libraries support SASL, so this shouldn't ever be an issue.

> Automatic re-authenticate and nick regaining/ghosting as-needed to deal with net splits, plus re-joining of channels to deal with restricted channels that can only be joined when authenticated.

We never did this for wikibugs. ib3 has mixins for this, and it appears limnoria does too. SASL should ensure that you're always authenticated when trying to join channels.

@Krinkle Krinkle pinned this issue Sep 18, 2021
@Krinkle Krinkle unpinned this issue Jan 16, 2023
@Krinkle
Member Author

Krinkle commented Jan 17, 2023
