Ability to pause and unpause subjects #982
Comments
Our plan in panoptes was to hand over control of subject data management to project collaborators and thus to have all subjects active the whole time. This will fit the majority of use cases but you raise a good point. Perhaps this is really about best practices for organising large amounts of subject data into sets and then being able to activate/deactivate those sets for selection by linking them to workflows. This setup would give you the control you desire. Failing that, we could add an active state on the subject that would control whether it is served to volunteers. This state could then be modified through the front end, like the subject uploader. This will add more data management complexity for the project owner though, linked to #886. In my mind, the priority is getting the subject data set up correctly in the first instance, noting that normal users will be limited to 10K subjects and will liaise with us to go beyond this amount, so we can advise on best practice then. @aliburchard thoughts re best practice for data management?
Well, I'm not sure telling people best practices will help the situation. Especially if it's a group that can't do data analysis, it would mean re-uploading the smaller subset of subjects to create a new subject set and losing the previous classifications, because I presume the aggregation software wouldn't be able to handle that, and then having duplicate Talk pages. Planet Four was a Zooniverse-made project with the core team helping with data management and discussions. Arfon, Stuart, and Chris L. were directly involved in communicating with the science team (I joined after the build was nearly complete so I don't know all the details), and the Zooniverse devs were also making the subject data (parsing from large images). It was well known what was going in at launch and a few months later, but we still needed to pause subjects a year after stargazing to get a science output (since the classification rate dropped so significantly; fully expected but not taken into account with the second data upload, probably because we could pause subjects later on if need be?). Other projects like Cyclone Center have done that before as well, switching from looking at all storm data for a year to focusing on just a single storm. So I think having the future ability to pause, even if it's only through the API with states like Ouroboros has to make a subject live or paused, will be a very handy capacity.
Since @chrissnyder has done the bulk of pausing and unpausing on Planet Four for us, he might be able to comment on how frequently this is done on other custom projects and on the handiness of such a feature.
@mschwamb, I agree slow classification rates can be problematic. I may be wrong, but I expect (at least to start with) that most panoptes projects will differ from existing zooniverse projects in data volumes.
@camallen I disagree with you on that (but I may be very uninformed). My understanding was that all Zooniverse projects will use the panoptes back end. New custom-built projects will also start using panoptes, and there is the eventual plan to migrate the old live projects. I agree that maybe to start with a few of the panoptes project builder projects will be smaller in scale, but I don't think all will be. I agree this doesn't need to be an at-launch feature, but there is still no formula that says if you're this type of project then you'll get X classifications per day. So I think the ability to pause will still be needed even by custom projects, since the trend will always be to put in more data rather than less, and estimates can be off. I'm in my 4th build and I still don't really know what the classification rate for P4:Terrains will be.
I agree with Meg this would be a highly useful feature, and I also agree that research teams' needs regarding subject subsets may evolve over the course of a project. What if we allowed a "subset" column in a manifest to be used to define these, and for a start just allowed owners & collaborators to specify which column that is, and activate or deactivate individual subsets? That would help a lot, but I think an important additional functionality would be the ability to retroactively add or change the "subset" column in a pre-existing manifest. I also think it's worth adding a "so you have a LOT of subjects" section to the project building guide...
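To make the "subset" column idea a bit more concrete, here is a rough sketch in plain Ruby. It assumes the manifest has already been parsed into one hash per row; the column name `"subset"`, the filenames, and the helper names `group_by_subset` and `active_rows` are all hypothetical and not part of Panoptes.

```ruby
# Hypothetical manifest rows, already parsed into hashes (one per subject).
# "subset" is an assumed column name chosen by the project owner.
def group_by_subset(rows, column = "subset")
  rows.group_by { |row| row[column] }
end

# Keep only the rows whose subset is currently switched on.
def active_rows(rows, active_subsets, column = "subset")
  rows.select { |row| active_subsets.include?(row[column]) }
end

manifest = [
  { "filename" => "a.png", "subset" => "season1" },
  { "filename" => "b.png", "subset" => "season1" },
  { "filename" => "c.png", "subset" => "season2" }
]

group_by_subset(manifest).keys     # the subsets an owner could toggle
active_rows(manifest, ["season2"]) # only the season2 rows remain selectable
```

Retroactively changing the column would then only mean rewriting the `"subset"` value on existing rows and regrouping.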
This already exists in the backend. You can 'retire' a subject for a workflow and 'unretire' it later. We'd need a way to expose this on the front end, but I'm not sure how, since you'd potentially be looking at a 100k row table of subjects.
Ah whoops, you got told to move it here.
Yep, here's the ticket on the front end from May 5th that was closed today with a response telling me that this was a panoptes api thing - if you want to re-open the ticket on the front-end here
Yeah, it's something that's already supported, just without an interface, so Chris was wrong about that.
@edpaget it seems we're conflating retired and not active for selection this way. Subjects with fewer classifications than the retirement limit will show up as retired in classification dumps, and the aggregation engine will have to deal with this too.
I disagree. I think they're the same thing. We shouldn't be relying on the retired flag in the dumps anyway. I didn't see the issue to add it, or I would have opposed it for the same reason as the 'Gold Standard dump'. If you're doing your own aggregations, it's totally trivial for you to count the number of classifications for your subject and decide if it's retired or not yourself. The same goes for the aggregation engine. It should be applying its own retirement rules to subjects instead of relying on Panoptes for that. I don't want to have to add yet another way of removing a subject from selection when this way works totally fine.
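For anyone doing their own aggregation, the count-it-yourself approach could look roughly like this. It is a minimal sketch in plain Ruby, assuming each classification in the dump has been reduced to a hash with a `:subject_id` key; `RETIREMENT_LIMIT` and both helper names are made up for illustration and do not reflect the actual dump format.

```ruby
# Assumed retirement rule: a subject is done after this many classifications.
RETIREMENT_LIMIT = 3

# Count classifications per subject from a list of classification hashes.
def classification_counts(classifications)
  classifications.each_with_object(Hash.new(0)) do |classification, counts|
    counts[classification[:subject_id]] += 1
  end
end

# Apply the count-based rule locally instead of trusting a retired flag.
def retired_subject_ids(classifications, limit = RETIREMENT_LIMIT)
  classification_counts(classifications)
    .select { |_subject_id, count| count >= limit }
    .keys
end

dump = [
  { subject_id: 1 }, { subject_id: 1 }, { subject_id: 1 },
  { subject_id: 2 }
]
retired_subject_ids(dump) # => [1]
```

This only covers retirement based purely on classification count; any other rule would need its own check.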
In my experience, most of the pausing/unpausing that we get requests for is for logical groups -- e.g. season, group, year, etc. As long as we encourage users to create separate sets for these, they could manage them in bulk or simply by not selecting from them.
That's true only if your retirement is solely based on quantity.
That's true, though Panoptes still needs to hold the canonical state of a subject. It's more of a question about whether we can get away with a boolean on/off state. Limiting pausing to a subject set as a whole by not selecting from it would allow us to not introduce more states. Otherwise, that's what we'll need to support.
In my mind, retired means we have reached a confident answer. Inactive means not available to be classified via our subjects routes, so the same functionality as not having a subject set linked to the workflow. One possible side effect of retiring would be that the retired (completed) counters would not reflect the real state of a project's workflows, i.e. I needed to pause 50% of my data and now it seems I've actually half completed my project. I agree about adding more features to maintain this active state. However, removing subjects from selection by querying on a field:
SetMemberSubject.select(SELECT_FIELDS)
  .joins(subject_set: {workflows: :user_seen_subjects})
  .where(user_seen_subjects: {user_id: user.id}, workflows: {id: workflow.id})
  .where('"set_member_subjects"."active" = true')
  .where.not('? = ANY("set_member_subjects"."retired_workflow_ids")', workflow.id)
  .where.not('"set_member_subjects"."subject_id" = ANY("user_seen_subjects"."subject_ids")')
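To spell out what the three filters in that query are doing, here is an in-memory sketch in plain Ruby. The `SmsRow` struct and the `selectable` helper are hypothetical stand-ins, not Panoptes code; only the field names (`active`, `retired_workflow_ids`, `subject_id`) are taken from the query above, and the joins are ignored.

```ruby
# Hypothetical stand-in for a set_member_subjects row.
SmsRow = Struct.new(:subject_id, :active, :retired_workflow_ids, keyword_init: true)

# A row is selectable for a workflow when it is active, not retired for
# that workflow, and not already seen by the user.
def selectable(rows, workflow_id:, seen_subject_ids:)
  rows.select do |row|
    row.active &&
      !row.retired_workflow_ids.include?(workflow_id) &&
      !seen_subject_ids.include?(row.subject_id)
  end
end

rows = [
  SmsRow.new(subject_id: 1, active: true,  retired_workflow_ids: []),
  SmsRow.new(subject_id: 2, active: false, retired_workflow_ids: []),  # paused
  SmsRow.new(subject_id: 3, active: true,  retired_workflow_ids: [7]), # retired for workflow 7
  SmsRow.new(subject_id: 4, active: true,  retired_workflow_ids: [])   # already seen
]

selectable(rows, workflow_id: 7, seen_subject_ids: [4]).map(&:subject_id) # => [1]
```

The point of keeping `active` separate is that a paused row (subject 2) never has to appear in `retired_workflow_ids`, so the two states stay distinguishable.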
I wanted to point out that I heavily rely on the counters from the current status page to determine what the status of a project is (it shows what fraction is complete, active, and paused/inactive) and whether the science team needs to push new data live/upload more. If there is a way that can still show what's paused, completed, and active from a researcher's perspective, that would be extremely helpful I think.
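As a rough illustration of how those three counters could be derived outside the Rails app, here is a plain Ruby sketch. It assumes each subject row is a hash with `:active` and `:retired_workflow_ids` keys and that retired takes precedence over paused; both helper names and that precedence are assumptions, not existing Panoptes behaviour.

```ruby
# Classify one row for a given workflow. Assumption: a subject retired for
# the workflow counts as completed even if it is also inactive.
def subject_status(row, workflow_id)
  return :completed if row[:retired_workflow_ids].include?(workflow_id)
  row[:active] ? :active : :paused
end

# Tally rows into the three buckets a status page would show.
def status_counts(rows, workflow_id)
  counts = { completed: 0, active: 0, paused: 0 }
  rows.each { |row| counts[subject_status(row, workflow_id)] += 1 }
  counts
end

rows = [
  { active: true,  retired_workflow_ids: [7] },
  { active: true,  retired_workflow_ids: [] },
  { active: false, retired_workflow_ids: [] },
  { active: false, retired_workflow_ids: [] }
]

status_counts(rows, 7) # => {:completed=>1, :active=>1, :paused=>2}
```

With a separate paused bucket, pausing half the data would not make the project look half completed.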
We can generate those counters in other ways. They don't have to come from the Panoptes Rails app. @camallen won't pausing need to be a per-workflow thing? So it can't happen as just a flag on the set member subject.
@edpaget yep, you're most likely right, unless there is a use case for project-wide pausing of subjects. I still think that using retired for workflows is conflating behaviour between retired and inactive for selection. We could mimic the Why don't we go with using subject sets for logical grouping to provide a targeted set of active subjects that can be controlled by the user in the meantime?
I got told to move this issue over here.
What happens if a science team puts in way more data than the classification rate on the site can actually handle and decides to switch tactics, wanting to pause most of the live subjects except a small subset? For example, Planet Four went from classifying full seasons during stargazing to targeting one region and classifying all the data for four seasons of it. Switching to the new scheme meant pausing most of the season 1 data we uploaded.
Currently you'd have to make a new subject set and merge the previous classifications already in hand in processing. Is there any way/menu to pause and unpause subjects from the workflow editor or the backend?