This repository has been archived by the owner on Mar 24, 2020. It is now read-only.

Display a Date Record Last Updated at bottom of object pages #643

Closed
hjsyoo opened this issue May 30, 2019 · 40 comments · Fixed by #681 or ucsdlib/damsrepo#98

Comments

@hjsyoo

hjsyoo commented May 30, 2019

Descriptive summary

Display a Date Record Last Updated at the bottom of all object landing pages. Frequently, data or metadata is updated in an object, but the Date:Issued and Publication year are intentionally not changed. Date Last Updated will succinctly indicate to the user when something on the page has been revised.


  • Can make it a label. @hjsyoo will come up with display spec.
  • SC&A is okay with this displaying on their content too.
@hjsyoo hjsyoo added the rdcp label May 30, 2019
@gamontoya gamontoya self-assigned this May 30, 2019
@gamontoya gamontoya changed the title from "Display a Data Record Last Updated at bottom of object pages" to "Display a Date Record Last Updated at bottom of object pages" Jun 11, 2019
@hjsyoo
Author

hjsyoo commented Jun 11, 2019

@hjsyoo Post examples here

@mcritchlow
Member

@hjsyoo @gamontoya - i think i'm going to start on this soon. I'll look at the internals for getting the date itself, however it would definitely help to have some kind of visual reference for what you're hoping to see when this is in place. Also, to clarify, is this just for object pages, not collection pages?

@mcritchlow mcritchlow self-assigned this Jul 1, 2019
@hjsyoo
Author

hjsyoo commented Jul 1, 2019

@mcritchlow Yes, this is just for object pages. On coll pages, we were concerned that a user would have a hard time distinguishing whether the date info refers to updates on the page itself, or to anywhere in the entire collection.
I'm not sure what would work best visually - others might have a better sense for this. One minor consensus seemed to be that it should appear in a different style than the object metadata, and at the bottom of the page somewhere. I looked around briefly for examples on other sites, but had trouble finding any. @gamontoya Do you know of any places I should look?

@mcritchlow
Member

Here's an update on my end. In our Solr records for objects, there are a few modified dates available, however they all seem to relate to the Solr document itself, rather than the actual object in the triplestore. Usually, this is probably a safe date to rely on, but it won't always be. A triggered reindex for a collection that results in a new solr document being created for an object would mean a new 'updated' date, even if the record metadata wasn't actually updated.

So, at the moment, there's no quick win here unless we're OK with the 'mostly correct' modified date in the Solr document.

We do also have events being stored for objects in the triplestore, but as I'm starting to look into those a bit more, and discussing the events implementation with @lsitu (many thanks), it is clear that's not an easy win either, unfortunately. It seems like historically we don't have a lot of event data, and we don't enforce tracking it consistently and safely, so at the moment it's not something we can depend on.

The Solr modified date might be a good starting point to get something out the door. A proper modified date that's provided by the triplestore appears to be a project unto itself.

@hjsyoo
Author

hjsyoo commented Jul 2, 2019

Thank you for the in-depth description of the situation. What are examples of events that would trigger a reindex for a collection, resulting in new updated dates for objects in that collection? An event I would expect to trigger the update would be if we replaced a component file or object-level metadata. Can you give examples of events that I might not expect to cascade into an updated date?

@mcritchlow
Member

What are examples of events that would trigger a reindex for a collection, resulting in new updated dates for objects in that collection? An event I would expect to trigger the update would be if we replaced a component file or object-level metadata.

That's a great point, I should try and better clarify this. All of those actions you would expect to trigger a reindex/new Solr document would be reflected in the modified date in the Solr document. Updating metadata, adding components and files, etc.

Can you give examples of events that I might not expect to cascade into an updated date?

So in the past, though I don't think this is the case anymore, an update to the metadata of the collection record that an object is associated with would result in the reindexing of that object. This makes sense if, say, the collection title changes. But it wouldn't if some random note in the collection changes. I believe @lsitu worked with @ucsdlib/domm a while back to resolve most of these unnecessary reindex scenarios.

That's why I say I think the Solr modified date should mostly be a safe thing to use, but I was trying to make the point that because it's not directly tied to a modified date from the triplestore (since one doesn't exist) there's a possibility, however slight, for a date that's not directly tied to that object.

@lsitu - can you think of any current examples that might trigger a reindex? As I noted I think you fixed most/all of these previously. But my memory is a bit fuzzy on that.

@lsitu
Member

lsitu commented Jul 3, 2019

@mcritchlow Going through our implementation for re-indexing on collection CLR updates, an update to any of the following collection fields will trigger Solr re-indexing of all objects in that collection and its child collections:

  • Title
  • Visibility
  • Finding Aid

However, a full reindex, or a reindex of a collection with no metadata changes, will also update the Solr timestamp.

@mcritchlow
Member

@lsitu - thanks so much for providing those examples.

@hjsyoo
Author

hjsyoo commented Jul 3, 2019

@lsitu @mcritchlow Thanks, the collection fields that trigger updates make sense to me (except Finding Aid, which I don't think RDCP uses). By full reindex, do you mean that when all objects are reindexed, every Solr document in the DAMS gets an updated date? And am I correct in assuming that there is one Solr document for every DAMS object?

@lsitu
Member

lsitu commented Jul 3, 2019

@hjsyoo Yes, a full reindex will update Solr for all objects in DAMS. And you are right: every object/collection CLR has its own Solr document.

@hjsyoo
Author

hjsyoo commented Jul 3, 2019

@mcritchlow So, I don't think we want to go with the last modified date in the solr document, since any reindexes would generate misleading information on the page - probably better to display no information! But down the road, what do you think would be a better solution? Would it be code development? Or creation of a new property, Date Modified, which editors would update manually?

@mcritchlow
Member

mcritchlow commented Jul 3, 2019

@hjsyoo - Yeah, I think the safest way to go would be something along the lines of a lastModified predicate that's added to object/collection records in the triplestore that's managed by damsrepo automatically whenever a record is created/modified. That would align with how a normal relational database maintains this information. We'd then need to get that indexed into Solr in a format we could use for sorting.

@lsitu - You know damsrepo and the various ways people are allowed to create/edit records better than I do. Does that seem reasonable, or is there something else you would recommend?

EDIT: And to your point, if you'd like I can revert the commit that introduced this change so sorting goes back to the way it was (by title).
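As a rough illustration of the lastModified idea described above, here is a minimal Python sketch. It is not damsrepo code: `RecordStore`, `save`, `reindex`, `to_solr_date` and the `last_modified_dt` field name are all hypothetical. The one borrowed fact is that Solr date fields use ISO 8601 UTC strings such as `2019-07-03T12:00:00Z`, which sort correctly.

```python
from datetime import datetime, timezone


def to_solr_date(dt):
    """Render a timezone-aware datetime in Solr's UTC date format."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


class RecordStore:
    """Toy stand-in for a repository layer (hypothetical, not damsrepo)."""

    def __init__(self, clock=lambda: datetime.now(timezone.utc)):
        self._records = {}
        self._clock = clock  # injectable so the behaviour can be tested

    def save(self, record_id, metadata):
        # Every create or edit stamps the record itself.
        self._records[record_id] = {
            "metadata": dict(metadata),
            "last_modified": self._clock(),
        }

    def reindex(self, record_id):
        # Building an index document reads the stamp but never changes it,
        # so a reindex alone cannot move the displayed date.
        record = self._records[record_id]
        return {
            "id": record_id,
            "last_modified_dt": to_solr_date(record["last_modified"]),
        }
```

The point of the design is that the timestamp lives with the record rather than with the index document, so repeated calls to `reindex` return the same date until `save` is called again.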

@lsitu
Member

lsitu commented Jul 3, 2019

@mcritchlow Since DAMS4 adds events whenever an object/collection CLR changes (created, updated, etc.), we may need to update damspas to use those events to support this.

@mcritchlow
Member

@lsitu My impression from our conversation is that we're not reliably tracking those events in all cases, or that in some cases those events can be removed/wiped out. If that's not the case, then yes that would be fantastic if we can leverage that instead of creating something new.

@lsitu
Member

lsitu commented Jul 3, 2019

@mcritchlow Yes, you are right that the update history could be missing. But we have at least one event with the time stamp for the last update that was made. So if we agree that the event for record creation and the update history won't be the issue, then we can do it this way. Creating something new won't fix the missing event history issue anyway.

@mcritchlow
Member

@lsitu :

So if we agree that the event for record creation and the update history won't be the issue, then we can do it this way.

To clarify, do you mean that:

  • We will always have the most recently updated event
  • But that we also may have lost history prior to the most recent update, including the creation date?

But we have at least one event with the time stamp for the last update that was made

This sounds very promising, and exactly what we need, assuming I'm understanding the situation.

@lsitu
Member

lsitu commented Jul 3, 2019

@mcritchlow Yes. That's what I mean.

@mcritchlow
Member

@lsitu - Ok thanks, I think we need to unpack this a bit more to move towards a solution, but it sounds like pursuing it via the events in the triplestore is ideally the way to go.

In the meantime, it's clear that I need to revert the existing change. So I'll put up a PR for that shortly.

@hjsyoo
Author

hjsyoo commented Jul 11, 2019

@gamontoya @mcritchlow Thank you - that display solution makes perfect sense.

@mcritchlow
Member

@hjsyoo - Just FYI, I have a branch up that includes this change as well as the RDCP collection sorting change: #681 (see screenshot for last modified)

Longshou and I are waiting for the QA data sync to finish so we can reindex everything in Solr. Once that's done, I'll ping you to help us test things out if you don't mind.

mcritchlow added a commit that referenced this issue Jul 15, 2019
The last modified date is:
- Only shown on object views (not collections)
- Formatted to YYYY-MM-DD

see: #643
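The YYYY-MM-DD formatting mentioned in the commit message can be sketched as follows. This is an illustrative Python snippet, not the code from the commit; the function name `format_last_modified` is hypothetical, and the input format is assumed to match the event timestamps quoted elsewhere in this thread (for example `2019-09-25T08:53:06+0000`).

```python
from datetime import datetime


def format_last_modified(timestamp: str) -> str:
    """Convert an event timestamp such as 2019-09-25T08:53:06+0000 to YYYY-MM-DD."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S%z")
    # The date is taken in the timestamp's own offset (UTC in the example).
    return parsed.strftime("%Y-%m-%d")


print(format_last_modified("2019-09-25T08:53:06+0000"))  # prints 2019-09-25
```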
mcritchlow added a commit that referenced this issue Jul 16, 2019
mcritchlow added a commit that referenced this issue Jul 17, 2019
mcritchlow added a commit that referenced this issue Jul 26, 2019
mcritchlow added a commit that referenced this issue Aug 2, 2019
@hweng hweng closed this as completed in #681 Aug 2, 2019
hweng pushed a commit that referenced this issue Aug 2, 2019
@gamontoya gamontoya reopened this Sep 25, 2019
@gamontoya

@mcritchlow When you have time, can you confirm if

<dams:eventDate>2019-09-25T08:53:06+0000</dams:eventDate>
<dams:detail>No client specified</dams:detail>
<dams:outcome>success</dams:outcome>

would trigger a new "Last Modified" date? Ryan replaced component 9 and the date was not updated:

@mcritchlow
Member

It's possible we may need some adjustment to how damsrepo determines the last modified date.

@lsitu - can you take a look at this example in comparison to the recent adjustments to the logic whenever you have a chance? ucsdlib/damsrepo@c7e0447

@lsitu
Member

lsitu commented Sep 27, 2019

@mcritchlow Yes, it seems there is a gap: we only discussed and handled the record created and record edited cases, while the timestamp @gamontoya noted above comes from the checksum calculated event for the file in component 9:

<dams:type>checksum calculated</dams:type>
<dams:eventDate>2019-09-25T08:53:06+0000</dams:eventDate>
<dams:detail>No client specified</dams:detail>
<dams:outcome>success</dams:outcome>

I would expect a file modified or file added event to have been added, but I can't find one in the record. It looks like there could be a bug somewhere for file events. And comparing those different event types will be a little tricky in the style sheet. Do we want another ticket to fix the event bug first?

@mcritchlow
Member

@lsitu - Yeah, based on your findings, another ticket to sort out the file event bug makes good sense. Do you have enough info to go on to write up the ticket in damsrepo?

@lsitu
Member

lsitu commented Sep 27, 2019

@mcritchlow Not yet but I'll take a look and create a ticket for it.

@gamontoya

@lsitu @mcritchlow Thank you guys ⛑

@lsitu
Member

lsitu commented Sep 27, 2019

@gamontoya @mcritchlow I've found the problem and created ticket ucsdlib/damsrepo#96.

@gamontoya

@lsitu Excellent detective work! 🕵

@lsitu
Member

lsitu commented Sep 30, 2019

@gamontoya @mcritchlow With the file events logged in PR ucsdlib/damsrepo#97, I think we can calculate the "Last Modified" date by comparing the timestamps for file added, file modified and file deleted with the timestamps for record created and record edited. But the file added event is always added after the record created event, so we may need to determine which timestamp should show up as the "Last Modified" date. What would you suggest?

@mcritchlow
Member

mcritchlow commented Sep 30, 2019

@lsitu - is it fair to say that in the example you gave (record created -> file added), the time difference is going to be on the order of seconds or minutes? Essentially, is it just that the file addition step during ingest comes after the record itself is created?

If so, I suppose my instinct is to say we should still take the most recent date available for last modified to keep things as simple as possible, which in your example sounds like it would be file added?

@lsitu
Member

lsitu commented Sep 30, 2019

@mcritchlow I think normally the record created -> file added timestamps would be very close. But for giant files or complex objects with lots of files, the files could be ingested on different days.
I think if the record created timestamp is more important, and the file is ingested on the same day, we can use the record created as the "Last Modified" date. What do you think?
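The same-day rule proposed above could look roughly like this in Python. It is only a sketch of the suggestion, not an implementation from damsrepo; the function name is hypothetical, and it assumes both timestamps are timezone-aware and that "same day" is judged in UTC.

```python
from datetime import datetime, timezone


def last_modified_same_day_rule(record_created, file_added):
    """Prefer record created when the file was added on the same UTC day."""
    created_day = record_created.astimezone(timezone.utc).date()
    added_day = file_added.astimezone(timezone.utc).date()
    if added_day == created_day:
        return record_created
    # Files ingested on a later day count as a genuine later modification.
    return max(record_created, file_added)
```

With a YYYY-MM-DD display, the first branch shows the same date whichever timestamp is chosen, so the rule only differs from "most recent wins" in which underlying timestamp is stored.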

@mcritchlow
Member

mcritchlow commented Sep 30, 2019

@lsitu - thanks, that's very helpful. I think perhaps we're at a point where some input from @gamontoya and others is probably needed.

I admittedly lean towards the simplicity of the most recent timestamp we have available, even if there's a gap. Because, in a sense, the record is modified each time those file ingests successfully complete. I can also see an argument the other way around :)

@lsitu
Member

lsitu commented Sep 30, 2019

@mcritchlow Sounds good. I think we just need to decide so that we are all on the same page. @gamontoya What would you suggest? Thanks.

@hjsyoo
Author

hjsyoo commented Sep 30, 2019

I'm not sure I follow the difference well enough, so bear with me and ignore as needed... For Last Modified, we ideally would see the date that any updates to metadata or data were completed (and presumably made available to the user), so wouldn't it make the most sense to use File Added or File Modified, if that's the last event in the sequence?

@gamontoya

@lsitu @mcritchlow I would go for the most recent update - as simple as that.

@lsitu
Member

lsitu commented Sep 30, 2019

@gamontoya @hjsyoo Thanks. Let's move forward with the most recent timestamps for file changes then.
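The agreed rule, taking the most recent timestamp across record and file events, can be sketched in Python as below. This is an illustration only, not the damsrepo change: the event type names are the ones mentioned in this thread, while the dictionary layout, the function names and the sample timestamps (other than the checksum one quoted above) are assumptions made for the example.

```python
from datetime import datetime

# Event types discussed in this thread as counting towards "Last Modified".
# "checksum calculated" is deliberately left out.
MODIFYING_EVENT_TYPES = {
    "record created",
    "record edited",
    "file added",
    "file modified",
    "file deleted",
}


def parse_event_date(value):
    """Parse timestamps of the form 2019-09-25T08:53:06+0000."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


def last_modified(events):
    """Return the latest timestamp among modifying events, or None if there are none."""
    dates = [
        parse_event_date(event["eventDate"])
        for event in events
        if event["type"] in MODIFYING_EVENT_TYPES
    ]
    return max(dates) if dates else None


# Hypothetical event list for one object.
events = [
    {"type": "record created", "eventDate": "2019-09-20T10:00:00+0000"},
    {"type": "file added", "eventDate": "2019-09-21T11:30:00+0000"},
    {"type": "checksum calculated", "eventDate": "2019-09-25T08:53:06+0000"},
]

print(last_modified(events).strftime("%Y-%m-%d"))  # prints 2019-09-21
```

Here the checksum event is the newest entry but is ignored, so the file added timestamp wins over record created.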

@lsitu
Member

lsitu commented Oct 1, 2019

@mcritchlow Based on the discussion above, I've added PR ucsdlib/damsrepo#98 to take file events into account for the last modified date. Thanks.
