Add index fields to subject sets #5859
Once we're happy with this, we'll need to port it over to the Python tools.
Code looks good, though the only thing I haven't figured out yet is how/where/if the PFE Subject Set page creates Manifest files when none is provided. I'm running tests now to... (Basic tests)
Notes for self:
|
Maybe it should append to |
I think overwriting makes sense. ✅ If new Subjects are added to an existing Subject Set, I can see three scenarios...
I have no idea why I spent 10 minutes writing an essay that basically just says "actually overwrite is fine".
This is a low impact, low priority consideration. In my head right now are the following paths:
Sudden thought: following previous post, what if the project owner uploads the first batch of Subjects with the manifest that looks like
Wait, bigger question: our Subject Set Indexing tool, does it CARE if the metadata field is EDIT: |
PR Review
This PR allows the PFE project builder's Subjects upload system to identify the special "index fields" that are relevant to a Subject Set.
- Index fields are identified in the Subject manifest by an `&` ampersand at the start of their metadata field name (in the header row of the manifest.csv), e.g. `&someField`.
- When a batch of Subjects is uploaded to a Subject Set, if the batch has a Manifest file WITH `&` index fields defined, those index fields will be recorded in the Subject Set's metadata.
  - See the Testing section for a more solid example.
- When a Subject upload has NO Manifest included, or includes a Manifest file with no `&` index fields, then the upload proceeds normally and ignores all the aforementioned `&` index fields considerations.
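The header rule described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `indexFieldsFromHeaders` is hypothetical, and trimming each header before checking for `&` is an assumption.

```javascript
// Hypothetical helper (not the PR's actual code): collect the names of
// manifest headers that start with '&' and join them with commas.
function indexFieldsFromHeaders(headers) {
  return headers
    .map((header) => header.trim())             // assumption: ignore stray whitespace
    .filter((header) => header.startsWith('&')) // index fields are marked with '&'
    .map((header) => header.slice(1))           // drop the leading '&'
    .join(',');
}

// Headers from the test manifest used in the Testing section.
const headers = ['character', '!filename', '&series', '&awesomeness'];
const indexFields = indexFieldsFromHeaders(headers);

// Only record the field when at least one index field exists.
const metadata = indexFields ? { indexFields } : {};
console.log(metadata); // { indexFields: 'series,awesomeness' }
```

With no `&` headers the helper returns an empty string, so the sketch leaves the metadata object empty, which matches the "upload proceeds normally" case.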
Testing
Baseline test:
- 5 image files are dragged and dropped into a new, empty Subject Set:
  - URL: https://local.zooniverse.org:3735/lab/1898/subject-sets/4825
  - Result: no issues.
    - All 5 images are uploaded normally. Each Subject only has the default 'Filename' in its `subject.metadata`.
    - Subject Set has an empty `subject_set.metadata` configuration.
Main test:
- 5 images + 1 manifest file (with index fields) are dragged and dropped into a new, empty Subject Set:
  - URL: https://local.zooniverse.org:3735/lab/1898/subject-sets/4824
  - Manifest file looks like so:

        character,!filename,&series,&awesomeness
        Mei,overwatch-2-mei.jpg,overwatch,10
        Mercy,overwatch-2-mercy.jpg,overwatch,7
        Tracer,overwatch-2-tracer.jpg,overwatch,9
        Geed,ultraman-geed.png,ultraman,8
        Orb,ultraman-orb.png,ultraman,10

  - Result: no issues, but possible quirk (see dev notes)
    - All 5 images are uploaded, with the appropriate metadata fields
    - Subject Set has the following `subject_set.metadata` configuration: `{ indexFields: "series,awesomeness" }`
Tested on localhost+macOS10+Chrome. LGTM 👍
Dev Notes
❓ While the Subject Set metadata looks fine, there's a possible quirk in how the individual Subject metadata looks.
In the testing example above, one of the Subjects has a `subject.metadata` that looks like...
{
  "!filename": "overwatch-2-mei.jpg",
  "&awesomeness": "10",
  "&series": "overwatch",
  "character": "Mei"
}
...and I'm not sure if it's supposed to be...
{
  "!filename": "overwatch-2-mei.jpg",
  "awesomeness": "10",
  "series": "overwatch",
  "character": "Mei"
}
Basically I'm not sure if the Subject-searching functionality will go through every Subject looking for a metadata field called `series` or `&series`. ❓
Status
I'm going to give this a 👍 for now because the code works as is, and I'm not sure if the quirk mentioned in the dev notes is a quirk, or something working as expected.
The only additional check I'm going to do is to see if `&` could be a special character in any scenario. It's not an issue with standard CSVs (i.e. if we're working with plain text editors) as far as I can tell, but I need to check with Microsoft Excel or Google Sheets or whatever to see that they don't try to play smart and encode `&` as `&amp;` during an export operation. This is a VERY niche worry, so don't sweat it for this PR.
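One way to run that spreadsheet check is to scan an exported manifest's header row for encoded ampersands. A minimal sketch, assuming the encoding in question is the HTML entity `&amp;` and that the header row has already been split into an array of strings; `findEncodedAmpersands` is a hypothetical name, not part of PFE.

```javascript
// Hypothetical check: report headers where '&' was exported as the
// HTML entity '&amp;' instead of a literal ampersand.
function findEncodedAmpersands(headers) {
  return headers.filter((header) => header.includes('&amp;'));
}

// Example header row in which one index field was (hypothetically) mangled.
const exportedHeaders = ['character', '!filename', '&amp;series', '&awesomeness'];
console.log(findEncodedAmpersands(exportedHeaders)); // [ '&amp;series' ]
```

An empty result would suggest the export left the `&` markers intact.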
I think you're right about trimming each subject's metadata. The field names of the indexed metadata in the Redis search should match the actual field names, which means trimming extra characters from the subjects that are fed to Redis.
@shaunanoordin I've added a helper to clean up individual subjects too, before uploading them. If this still looks good, I'll merge it on Monday.
I'm also not sure that we'll use
|
PR Review (Update)
These changes LGTM! I'm giving this a double approval.
- These new changes ensure that any index field that looks like `&someField` will be cleaned up so we get `Subject.metadata.someField` (without the ampersand).
- Tests look good.
  - Baseline "5 images, no manifest" upload works fine.
  - Main test of "5 images, manifest with &indexFields" works fine as well 👍
  - (test manifest and image files are the exact same as with the initial PR review)
Status
LGTM. I have a very minor suggestion for the code, but this PR is overall ready to be merged on Monday. 👍
I used the wrong character. 🤦♂️ It should be |
Add a new field, subjectSet.metadata.indexFields, to new subject sets. This field contains a comma-separated list of any manifest headings that begin with `&`.
Move subject set helpers to `./helpers/subject-sets`. Add `cleanSubjectData` which trims whitespace and removes leading ampersands from subject metadata fields.
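As a rough illustration of the clean-up this commit message describes, the sketch below trims whitespace and strips a leading ampersand from each metadata key. It is not the PR's `cleanSubjectData` implementation: the name `cleanMetadataSketch`, the flat key/value input shape, and the choice to also trim string values are all assumptions.

```javascript
// Hypothetical sketch, not the actual cleanSubjectData helper.
function cleanMetadataSketch(metadata) {
  const cleaned = {};
  for (const [key, value] of Object.entries(metadata)) {
    // Trim the key, then remove a single leading '&' if present.
    const cleanKey = key.trim().replace(/^&/, '');
    // Assumption: string values are trimmed too; other values pass through.
    cleaned[cleanKey] = typeof value === 'string' ? value.trim() : value;
  }
  return cleaned;
}

// Subject metadata from the first review's dev notes.
const subjectMetadata = {
  '!filename': 'overwatch-2-mei.jpg',
  '&awesomeness': '10',
  '&series': 'overwatch',
  character: 'Mei',
};
console.log(cleanMetadataSketch(subjectMetadata));
```

In this sketch `&awesomeness` and `&series` become `awesomeness` and `series`, while `!filename` and `character` are left unchanged, so the subject's field names line up with the names listed in `indexFields`.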
663264c to 46a4131
Add a new field, `subjectSet.metadata.indexFields`, to new subject sets. This field contains a comma-separated list of any manifest headings that begin with `&`. If none of those headings are present, then this field isn't created.

For example, these manifest headings:
will set `subjectSet.metadata.indexFields` to `origin,attribution`.

Staging branch URL: https://pr-5859.pfe-preview.zooniverse.org
Required Manual Testing
Review Checklist
- `rm -rf node_modules/ && npm install` and app works as expected?
Optional
- `ChangeListener` or `PromiseRenderer` components with code that updates component state?