Have a way to periodically check exit/reconfigure conditions, even in the absence of tweet events #22

Closed
yar opened this Issue · 28 comments

8 participants

@yar
yar commented

It would be great to be able to periodically verify if more stream filters need to be added, or if a graceful exit is requested. Currently there seems to be no way, because the code is only executed when a tweet status arrives.

@stve
Owner

Check out 1.1.0.rc2. I've added an on_interval callback that creates a periodic timer inside the reactor. So you can do something like this:

client = TweetStream::Client.new

client.on_interval(30) do
  if condition?
    client.stop
  end
end

client.track("something") do |status|
  puts status.text
end

I haven't added a way to restart or modify the tracked attributes while it's running yet, but that's the general plan. Pull requests welcome.

@masterkain

Can someone please show how to properly change the tracking attributes in the on_interval event? When I call stop, my script exits.

@stve
Owner

I haven't gotten the functionality to the point where you can do that yet, though there isn't much left to do. The underpinnings are basically this: when you invoke track or any of the other methods on the TweetStream::Client, it starts the EventMachine reactor and initiates the HTTP connection. Stopping and starting requires a new connection/request to the stream. My general plan is to separate the connection from the reactor so that you can do something like this:

client = TweetStream::Client.new

client.on_interval(30) do
  if condition?
    client.track("something else")
    client.restart
  end
end

client.track("something") do |status|
  puts status.text
end

The interval was a first step towards that goal. I haven't worked on the restart method just yet. Pull requests welcome :octocat:

@auxbuss

Did you make any further progress with this?

@lunsher

Does anybody have an idea how to do it better?

@auxbuss

Depends what you mean by better, I guess. I made some notes in #14 about how I'm handling things operationally, but that's probably not for everybody.

Another method might be to send a tweet via cron periodically and capture it in a monitoring daemon.

spagalloco's inclusion of on_interval is probably worth exploring. I haven't tried it myself yet. The method I thought of using with it is to send a "heartbeat" message to a dedicated log via fluent, say, or simply touch a file, and monitor it for lack of updates.

Unless you self destruct the process, which has an inherent flaw if the process itself hangs, some mechanism will be required to gather the pid for destruction.

There's no simple way to tell whether an event loop is dead unless it sends out a periodic "I'm alive" message or you throw something at it and get something back within a defined timeframe. Either way, you have to monitor the response. So the on_interval self-destruct is probably as easy as it gets.
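The "touch a file and monitor it" idea can be sketched with nothing but the standard library. This is only an illustration: the Heartbeat module, the file path, and the 60-second threshold are hypothetical, and the beat call is assumed to be made from whatever periodic callback you use (on_interval, for instance).

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, not part of tweetstream.
# The stream process calls `beat`; a separate monitor calls `stale?`.
module Heartbeat
  # Record "I'm alive" by updating the file's modification time.
  def self.beat(path)
    FileUtils.touch(path)
  end

  # True when the file is missing or was last touched more than
  # max_age seconds before `now`.
  def self.stale?(path, max_age, now = Time.now)
    return true unless File.exist?(path)
    (now - File.mtime(path)) > max_age
  end
end

# Assumed location for the heartbeat file.
path = File.join(Dir.tmpdir, "tweetstream-heartbeat-#{Process.pid}")

Heartbeat.beat(path)                              # would run inside the periodic callback
puts Heartbeat.stale?(path, 60)                   # checked right after a beat
puts Heartbeat.stale?(path, 60, Time.now + 120)   # simulates two minutes of silence
```

The monitor side still has to run somewhere else (cron, upstart, another daemon), since a hung process cannot report its own silence.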

@lunsher

The goal is to control and self-update the stream based on the list of users to follow, plus some other checks. This requires self-control: the ability to stop and restart the stream from within itself (ideally from the on_interval callback).
A simple variant is a console command:
ruby stream_loader.rb restart

But how do I call it from "on_interval"?

In #14 you mentioned an interesting approach. As my main goal is to keep stream downtime as small as possible, can you share your variant of the rake task and/or upstart script?

PS: I am interested in the "follow" method and in reloading the stream with an updated list of users.

@auxbuss

I've added some notes to #14.

Once you have start and stop commands for the stream, it's easy to manage it however you like. But if you need to connect and disconnect to twitter, then I don't think tweetstream is designed to do that internally -- and monkey patching it to do so would be a lot of brittle hacking, I suspect.

You also need to be careful about connecting and disconnecting to and from twitter streams too frequently. You will be rate limited (420) for doing so -- I know, because it happens to me sometimes when testing :o The streams are designed to be kept open.
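One way to avoid hammering the stream after a 420 is to wait longer before each successive reconnect. A minimal sketch of that idea follows. ReconnectBackoff is a hypothetical helper, not part of tweetstream, and the one-minute starting delay and 16-minute cap are assumed values that should be checked against Twitter's current guidance.

```ruby
# Hypothetical helper, not part of tweetstream.
# Hands out reconnect delays that double each time, up to a cap.
class ReconnectBackoff
  # base and max are in seconds; both defaults are assumptions.
  def initialize(base: 60, max: 960)
    @base = base
    @max = max
    @attempts = 0
  end

  # Seconds to wait before the next reconnect attempt.
  def next_delay
    delay = [@base * (2**@attempts), @max].min
    @attempts += 1
    delay
  end

  # Call once the connection has been stable again.
  def reset
    @attempts = 0
  end
end

backoff = ReconnectBackoff.new
3.times { puts backoff.next_delay }
```

The caller decides what to do with the delay, for example passing it to a one-shot timer before reconnecting, and calls reset after the stream has stayed up for a while.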

@stve
Owner

In hindsight, I wish I hadn't added the on_interval method. All it really does is create an EM::PeriodicTimer. Since that time, we've added the ability to run a TweetStream::Client inside an already running EventMachine reactor. https://github.com/intridea/tweetstream/blob/master/lib/tweetstream/client.rb#L324

That opens up a lot more flexibility for you to add custom timers, as well as additional EM code like channels, queues, etc.

To give you guys a general update on things, we're in the process of readying TweetStream 2.0, which is slated to include @sferik's Twitter gem integration, a replacement for twitter-stream which I'll be releasing in the coming days (named: em-twitter), and possibly the Site Stream functionality. While I don't anticipate refactoring TweetStream's connect method to support stream updates initially, I've built that into em-twitter, and it is a step in the direction needed to support this.

@lunsher

Nice to hear. :) For me the coolest feature would be the ability to quickly reload the stream with new params without having to restart the daemon (as that adds downtime).
Especially in combination with the :monitor=>true feature.
Right now, as was mentioned, the daemonized stream often fails with a parsing error, and I have to run a separate periodic rake task to decide when to stop and start the stream daemon with new params.

@stve
Owner

@lunsher yep, that's the direction we're heading.

@lunsher

Really cool! I can be your beta tester :)

@sferik
Owner

@spagalloco I agree about on_interval. Any reason not to remove it in 2.0?

@stve
Owner

I see no reason to keep it in 2.0

@stve
Owner

For anyone who's curious, I've released em-twitter. Suggestions and feedback welcome.

@auxbuss

This is excellent. Does it also work with userstream?

@stve
Owner

@auxbuss it should work with any host/path. I haven't tested it with userstream specifically, but I've pushed the em_twitter branch, which replaces twitter-stream with em-twitter, if you want to try it. I'll be working on that integration branch over the weekend. You might want to take a look at the Connection#update method, which will update the params and reconnect.

@lunsher

Just installed this new em-twitter branch. After adding an accessor for the stream so I can reach it from the client, the update function works like a charm!

@lunsher

@spagalloco I think on_interval should be kept. It keeps everything together within a single EventMachine reactor.
I can set up some useful checks there and reload the stream with new params, which is great, especially in daemon mode.

Right now I am happy with something like:

@twitter_daemon.on_interval(20) do
  @twitter_daemon.stream.update(:params => {:follow => new_list_to_follow})
end

Can it be made any clearer without this callback?

@stve
Owner

@lunsher

You can still do timers; you just need to use EM directly to do so:

EM.run do

  client = TweetStream::Client.new

  timer = EM::PeriodicTimer.new(20) do
    client.stream.update(:params=>{:follow => new_list_to_follow})
    # we'll be adding better methods here vs. exposing em-twitter's client, 
    # probably something like this:
    # 
    # client.params[:follow] = new_list_to_follow
    # client.restart
  end

  client.follow(list_to_follow) do |status|
    puts status.inspect
  end

end

The nice thing about not relying on an internal timer is that you can declare multiple timers for various intervals. The internal timer didn't allow you to do that. Ultimately, since on_interval was just a bare-bones implementation of EventMachine's periodic timer, I'd rather not include it, but could be convinced otherwise if enough people feel it's useful.

@lunsher

Nice! Thank you!

@auxbuss

Yes, this is very nice. I wish I'd understood this earlier.

Personally, I would be inclined to leave EM exposed, perhaps with some handy helper functions. I'm well aware of good OO practice, but twitter listeners are fundamentally scripts, and so being able to reach inside without having to contort oneself is a benefit.

@lunsher

Just tried it in daemon mode, but it looks like it freezes on start, so the script never returns. The same happens on stop.

EM.run do

  timer = EM::PeriodicTimer.new(20) do

  end

  @twitter_daemon = TweetStream::Daemon.new(...)

  @twitter_daemon.on_timeline_status do |status|
  end

  @twitter_daemon.follow(...)

end

Do you know how to deal with it?

@stve
Owner

I thought that would have worked, but perhaps the EM event loop is preventing the daemon from kicking off. Another thing you could try is to create the timer in the on_inited callback:

client = TweetStream::Daemon.new
client.on_inited do
  timer = EM::PeriodicTimer.new(20) do
    client.stream.update(:params=>{:follow => new_list_to_follow})
  end
end

client.follow(list_to_follow) do |status|
  puts status.inspect
end

@lunsher

Looks like it's working. Thanks :)

@sferik sferik closed this

@denen99

To update a track, would it be client.stream.update(:params=>{:track => "#hash1,#hash2"}), for example?

I'm also curious how this is better than calling client.stop_stream and then re-running client.track. Don't both disconnect from the stream (which Twitter advises against doing too frequently)? Unless there is no way to update your stream query without disconnecting, of course.

@stve
Owner

The only way to modify the stream is to reconnect. What client.stream.update will do is modify the parameters and then perform an immediate reconnect. You can of course accomplish the same thing by other means (stop_stream, then track), but there's really no difference.
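Since every update means a reconnect, it can be worth skipping the call when nothing has actually changed. A rough sketch of that guard follows. FollowListUpdater and FakeStream are hypothetical names; the only thing assumed of the real stream is that it responds to update(:params => {...}) as client.stream does above, and that it accepts an array for :follow.

```ruby
# Hypothetical wrapper, not part of tweetstream or em-twitter.
# `stream` is anything that responds to update(:params => {...}).
class FollowListUpdater
  def initialize(stream, initial_ids)
    @stream = stream
    @current = normalize(initial_ids)
  end

  # Sends an update (and so triggers a reconnect) only when the set of ids
  # differs from the last one sent. Returns true if an update was sent.
  def refresh(new_ids)
    wanted = normalize(new_ids)
    return false if wanted == @current
    @stream.update(:params => { :follow => wanted })
    @current = wanted
    true
  end

  private

  # Order and duplicates should not count as a change.
  def normalize(ids)
    ids.map(&:to_i).uniq.sort
  end
end

# Stand-in for client.stream so the sketch runs without a connection.
class FakeStream
  attr_reader :updates

  def initialize
    @updates = []
  end

  def update(options)
    @updates << options
  end
end

stream = FakeStream.new
updater = FollowListUpdater.new(stream, [1, 2, 3])
updater.refresh([3, 2, 1])   # same set, no reconnect
updater.refresh([1, 2, 4])   # changed, one reconnect
puts stream.updates.length
```

In a timer callback you would call updater.refresh(new_list_to_follow) instead of calling client.stream.update directly.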
