Ability to pause and unpause subjects #982

Closed
mschwamb opened this issue Jun 16, 2015 · 17 comments
Comments

@mschwamb

I got told to move this issue over here.

What happens if a science team puts in far more data than the classification rate on the site can actually handle, and then decides to switch tactics and pause most of the live subjects except for a small subset? For example, Planet Four went from classifying full seasons during stargazing to targeting one region and classifying all the data for four seasons of it. Switching to the new scheme meant pausing most of the season 1 data we uploaded.

Currently you'd have to make a new subject set and merge the previous classifications already in hand during processing. Is there any way/menu to pause and unpause subjects from the workflow editor or the backend?

@camallen
Contributor

Our plan in panoptes was to hand over control of subject data management to project collaborators and thus to have all subjects active the whole time. This will fit the majority of use cases but you raise a good point.

Perhaps this is really about best practices to organise large amounts of subject data into sets and then being able to choose to activate/deactivate those sets for selection by linking to workflows. This setup would give you the control you desire. Failing that we could add an active state on the subject that would enable subjects to be served to volunteers. This state could then be modified through the front end like the subject uploader - this will add more data management complexity to the project owner though, linked to #886.

In my mind, the priority is getting the subject data set up correctly in the first instance. Note that normal users will be limited to 10K subjects and will liaise with us to go beyond this amount, so we can advise on best practice then.

@aliburchard thoughts re best practice for data management?

@mschwamb
Author

Well, I'm not sure telling people best practices will help the situation. Especially for a group that can't do data analysis, it would mean re-uploading the smaller subset of subjects to create a new subject set, losing the previous classifications (because I presume the aggregation software wouldn't be able to handle that), and then having duplicate Talk pages.

Planet Four was a Zooniverse-made project with the core team helping with data management and discussions. Arfon, Stuart, and Chris L. were directly involved in communicating with the science team (I joined after the build was nearly complete so I don't know all the details), and the Zooniverse devs were also making the subject data (parsing from large images). It was well known what was going in at launch and a few months later, but we still needed to pause subjects a year after stargazing to get a science output (the classification rate dropped so significantly; this was fully expected but not taken into account with the second data upload, probably because we could pause subjects later on if need be?). Other projects like Cyclone Center have done that before as well, switching from looking at all storm data for a year to focusing on just a single storm.

So I really think that having the future ability to pause, even if only through the API, with states like Ouroboros has to make a subject live or paused, would be a very handy capability.

@mschwamb
Author

Since @chrissnyder has done the bulk of pausing and unpausing on Planet Four for us, he might be able to comment about how frequently this is done on other custom projects and about the handiness of such a feature

@camallen
Contributor

@mschwamb, I agree slow classification rates can be problematic. I may be wrong but I expect (at least to start with) that most panoptes projects will differ from existing zooniverse projects in data volumes.

@mschwamb
Author

@camallen I disagree with you on that (but I may be very uninformed). My understanding was that all Zooniverse projects will use the panoptes back end. New custom built projects will also start using panoptes and that there is the eventual plan to migrate the old live projects.

I agree that maybe to start with a few of the panoptes project builder projects will be smaller in scale, but I don't think all will be. I agree this doesn't need to be an at-launch feature, but there is still no formula that says "if you're this type of project then you'll get X classifications per day". So I think the ability to pause will still be needed, even by custom projects, since the trend will always be to put in more data rather than less, and estimates can be off. I'm in my 4th build and I still don't really know what the classification rate for P4:Terrains will be.

@vrooje

vrooje commented Jun 17, 2015

I agree with Meg this would be a highly useful feature, and I also agree that research teams' needs regarding subject subsets may evolve over the course of a project.

What if we allowed a "subset" column in a manifest to be used to define these, and for a start just allowed owners & collaborators to specify which column that is, and activate or deactivate individual subsets? That would help a lot, but I think an important additional functionality would be the ability to retroactively add or change the "subset" column in a pre-existing manifest.

I also think it's worth adding a "so you have a LOT of subjects" section to the project building guide...
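
As a rough illustration of the "subset" column idea, here is a minimal sketch in plain Ruby. The manifest layout, the `subset` and `filename` column names, and both helper functions are assumptions made up for this example, not part of Panoptes or its manifest format.

```ruby
# Hypothetical manifest text; the column names are assumptions for illustration.
manifest = <<~CSV
  filename,subset
  img_001.png,season1
  img_002.png,season1
  img_003.png,season2
  img_004.png,region_a
CSV

# Parse a simple comma-separated manifest (no quoted fields) into row hashes.
def parse_manifest(text)
  header, *rows = text.lines.map(&:chomp).reject(&:empty?)
  keys = header.split(',')
  rows.map { |line| keys.zip(line.split(',')).to_h }
end

# Return the filenames whose subset is in the list of active subsets.
def active_filenames(rows, subset_column, active_subsets)
  rows.select { |row| active_subsets.include?(row[subset_column]) }
      .map { |row| row['filename'] }
end

rows = parse_manifest(manifest)
active_subsets = %w[season1 season2 region_a]
active_subsets.delete('season1') # "pause" every subject tagged season1

puts active_filenames(rows, 'subset', active_subsets).inspect
# => ["img_003.png", "img_004.png"]
```

Because the subset is just a column value, retroactively changing it would amount to rewriting that column for existing rows, which is the additional functionality suggested above.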

@edpaget
Contributor

edpaget commented Jun 17, 2015

This already exists in the backend. You can 'retire' a subject for a workflow and 'unretire' it later. We'd need a way to expose this on the front end, but I'm not sure how since you'd potentially be looking at a 100k row table of subjects.

@edpaget edpaget closed this as completed Jun 17, 2015
@edpaget edpaget reopened this Jun 17, 2015
@edpaget
Contributor

edpaget commented Jun 17, 2015

Ah whoops you got told to move it here.

@mschwamb
Author

Yep here's the ticket on the front end from May 5th that was closed today with a response telling me that this was a panoptes api thing - if you want to re-open the ticket on the front-end here

@edpaget
Contributor

edpaget commented Jun 17, 2015

Yeah it's something that's already supported. Just without an interface so Chris was wrong about that.

@edpaget edpaget closed this as completed Jun 17, 2015
@camallen
Contributor

@edpaget seems we're conflating retired and not active for selection this way. Subjects with fewer classifications than the retirement limit will show up as retired in classification dumps, and the aggregation engine will have to deal with this too.

@edpaget
Contributor

edpaget commented Jun 17, 2015

I disagree. I think they're the same thing. We shouldn't be relying on the retired flag in the dumps anyway. I didn't see the issue to add it, or I would have opposed it for the same reason as the 'Gold Standard dump'. If you're doing your own aggregations, it's totally trivial for you to count the number of classifications for your subject and decide if it's retired or not yourself.

The same for the aggregation engine. It should be applying its own retirement rules to subjects instead of relying on Panoptes for that.

I don't want to have to add yet another way of removing a subject from selection when this way works totally fine.
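
To make the "count it yourself" suggestion concrete, here is a minimal sketch in plain Ruby, assuming a purely count-based retirement rule. The record shape (`subject_id`) and the limit of 3 are hypothetical and are not taken from the Panoptes dump format.

```ruby
# Hypothetical classification records, e.g. rows read from a classification dump.
classifications = [
  { subject_id: 1 }, { subject_id: 1 }, { subject_id: 1 },
  { subject_id: 2 },
  { subject_id: 3 }, { subject_id: 3 }
]

# Assumed count-based retirement limit for this example.
retirement_limit = 3

# Tally the number of classifications recorded for each subject.
def classification_counts(classifications)
  classifications.each_with_object(Hash.new(0)) do |classification, counts|
    counts[classification[:subject_id]] += 1
  end
end

# Treat a subject as retired once its count reaches the limit.
def retired_subject_ids(classifications, limit)
  classification_counts(classifications)
    .select { |_subject_id, count| count >= limit }
    .keys
end

puts retired_subject_ids(classifications, retirement_limit).inspect
# => [1]
```

This only covers retirement rules based on quantity; any rule that depends on the content of the classifications would need its own logic.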

@parrish
Contributor

parrish commented Jun 17, 2015

In my experience, most of the pausing/unpausing that we get requests for are for logical groups -- e.g. season, group, year, etc.

As long as we encourage users to create separate sets for these, they could manage them in bulk or simply by not selecting from them.

> If you're doing your own aggregations, it's totally trivial for you to count the number of classifications for your subject and decide if it's retired or not yourself.

That's true only if your retirement is solely based on quantity.

> The same for the aggregation engine. It should be applying its own retirement rules to subjects instead of relying on Panoptes for that.

That's true, though Panoptes still needs to hold the canonical state of a subject. It's more of a question about whether we can get away with a boolean on/off state. Limiting pausing to a subject set as a whole by not selecting from it would allow us to not introduce more states. Otherwise, that's what we'll need to support.

@camallen
Contributor

In my mind, retired means we have reached a confident answer. Inactive means not available to be classified via our subjects routes, so the same functionality as not having a subject set linked to the workflow.

One possible side effect of retiring would be that the retired counters (completed) would not reflect the real state of a project's workflows. I.e. I needed to pause 50% of my data and now it seems I've actually half completed my project.

I agree about adding more features to maintain this active state. However, removing subjects from selection by querying on an active field (skipping rows where active=false) isn't too much of a change, e.g. here:

SetMemberSubject.select(SELECT_FIELDS)
  .joins(subject_set: {workflows: :user_seen_subjects})
  .where(user_seen_subjects: {user_id: user.id}, workflows: {id: workflow.id})
  .where('"set_member_subjects"."active" = true')
  .where.not('? = ANY("set_member_subjects"."retired_workflow_ids")', workflow.id)
  .where.not('"set_member_subjects"."subject_id" = ANY("user_seen_subjects"."subject_ids")')
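
The three conditions in that query can be sketched without a database. The following is a plain Ruby approximation over in-memory hashes, not the Panoptes implementation; the row shape mirrors the column names in the query above, while `selectable_subject_ids` and the sample data are hypothetical.

```ruby
# Hypothetical in-memory stand-ins for set_member_subjects rows.
set_members = [
  { subject_id: 1, active: true,  retired_workflow_ids: [] },
  { subject_id: 2, active: false, retired_workflow_ids: [] },   # paused
  { subject_id: 3, active: true,  retired_workflow_ids: [10] }, # retired for workflow 10
  { subject_id: 4, active: true,  retired_workflow_ids: [] }
]

# Keep subjects that are active, not retired for this workflow,
# and not already seen by the user.
def selectable_subject_ids(set_members, workflow_id, seen_subject_ids)
  set_members
    .select { |sms| sms[:active] }
    .reject { |sms| sms[:retired_workflow_ids].include?(workflow_id) }
    .reject { |sms| seen_subject_ids.include?(sms[:subject_id]) }
    .map { |sms| sms[:subject_id] }
end

# The user has already seen subject 4 on workflow 10.
puts selectable_subject_ids(set_members, 10, [4]).inspect
# => [1]
```

Here the `active` flag and the retired list are separate conditions, so a paused subject drops out of selection without being recorded as retired.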

@mschwamb
Author

I wanted to point out that I heavily rely on the counters from the current status page to determine what the status of a project is (and that shows what fraction is complete, active, and paused/inactive) and if the science team needs to push new data live/upload more.

If there is a way that can still show what's paused, completed, and active from a researcher perspective that would be extremely helpful I think.
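
A status breakdown like this could be derived wherever the counters end up being generated. Below is a minimal sketch in plain Ruby, assuming each subject carries separate `retired` and `paused` flags; both field names and the `status_counts` helper are hypothetical.

```ruby
# Hypothetical subject records with separate retired and paused flags.
subjects = [
  { id: 1, retired: true,  paused: false },
  { id: 2, retired: false, paused: true },
  { id: 3, retired: false, paused: true },
  { id: 4, retired: false, paused: false }
]

# Count subjects as completed, paused, or active.
# Retired takes precedence, so a finished subject is never reported as paused.
def status_counts(subjects)
  counts = { completed: 0, paused: 0, active: 0 }
  subjects.each do |subject|
    if subject[:retired]
      counts[:completed] += 1
    elsif subject[:paused]
      counts[:paused] += 1
    else
      counts[:active] += 1
    end
  end
  counts
end

puts status_counts(subjects).inspect
```

Keeping the two flags apart is what lets the completed count stay honest: pausing half the data raises the paused count rather than the completed one.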

@edpaget
Contributor

edpaget commented Jun 17, 2015

We can generate those counters in other ways. They don't have to come from the Panoptes Rails app. @camallen won't pausing need to be a per workflow thing? So it can't happen as just a flag on the set member subject.

@camallen
Contributor

@edpaget yep, you're right most likely unless there is a use case for project wide pausing subjects.

I still think that using retired for workflows is conflating behaviour between retired and inactive for selection. We could mimic the retired_workflow_ids behaviour for something like inactive_for_workflows on set_member_subjects but I agree it adds some complexity.

Why don't we go with using subject sets for logical grouping, to provide a targeted set of active subjects that can be controlled by the user in the meantime?
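
The subject set approach can be sketched as follows, in plain Ruby. It assumes a workflow simply holds a list of linked set names and that a subject may belong to more than one set; the set names, ids, and `available_subject_ids` helper are hypothetical.

```ruby
# Hypothetical subject sets, keyed by name, each listing its subject ids.
subject_sets = {
  'season1'              => [1, 2, 3],
  'region_a_all_seasons' => [2, 7, 8]
}

# Subjects available to a workflow are the union of its linked sets.
def available_subject_ids(subject_sets, linked_set_names)
  linked_set_names.flat_map { |name| subject_sets.fetch(name) }.uniq
end

linked = %w[season1 region_a_all_seasons]
puts available_subject_ids(subject_sets, linked).inspect
# => [1, 2, 3, 7, 8]

# "Pause" season1 for this workflow by unlinking the set.
linked.delete('season1')
puts available_subject_ids(subject_sets, linked).inspect
# => [2, 7, 8]
```

Note that subject 2 stays available because it also belongs to the set that remains linked, so no re-upload is needed to target a subset, as long as the subjects were grouped into sets beforehand.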

5 participants