Add last updated tag to docs #245

Closed
tcmorris opened this issue May 29, 2018 · 11 comments
Labels
community/pr, status/stale, type/feature

Comments

@tcmorris
Contributor

It would be useful to see at a glance when a page within the documentation was last updated.

@abjerner
Contributor

@tcmorris Good idea 👍

But I'm afraid this requires quite a few changes. The current implementation downloads the docs repository as a ZIP file:

https://github.com/umbraco/UmbracoDocs/archive/master.zip

The file is then extracted on the Our server. Unfortunately, all files share the same timestamp, probably the time the ZIP file was generated by GitHub.

If we need the correct timestamps, the current implementation should probably be updated to clone the docs repository locally on the Our server, and fetch the pages and timestamps from the local Git repository directly. An advantage of this is that we might also be able to get a list of users who helped write the page. But ... I'd imagine it would take quite some time to implement.

@nul800sebastiaan
Member

Actually fetching the Git repo is definitely something I've been wanting to experiment with, and LibGit2Sharp is really pretty good and not too bad to implement! Might also help with making overwriting files easier (only changed files, for example). Currently the unzip+overwrite operation sometimes gets stuck on a locked file and then it's all over until a manual trigger fixes it again.

@abjerner
Contributor

I couldn't help but play around with this a bit.

If we set up a local clone of the UmbracoDocs repository, we can't really use the timestamps of the files on disk, as these don't reflect when the file was changed on GitHub.

We can use git blame (e.g. through LibGit2Sharp), which lets us determine both when the last commit was made and who has contributed to the current version of the file. But even though we can get the timestamp of the last commit, the changes are rarely deployed to the Our website right away. So perhaps we really want to show the timestamp for when the file was actually deployed. @nul800sebastiaan any thoughts on how to best determine this?

A plus side of using git blame is that we can highlight the people who helped make a given file. E.g. for the cheatsheets index page:

https://github.com/umbraco/UmbracoDocs/blob/master/Cheatsheets/index.md

The contributors of this page are:

  • Rune Hem Strand (13 lines)
  • Shannon (2 lines)
  • Liam Laverty (1 line)

The GitHub version of the same blame can be found here:

https://github.com/umbraco/UmbracoDocs/blame/master/Cheatsheets/index.md

However, as we're getting this information directly from Git, it's just names and email addresses, not actual GitHub usernames. An email address might not even match the email address of the GitHub user making the commit/push. That makes it hard to match them against Our users.

For instance, Rune has used his Umbraco corporate email address for the commits, so assuming he has used the same email address for his Our account, we can couple his commits with his Our account. Shannon on the other hand has used his Gmail account, so if he has used his Umbraco email address for his Our account, we won't be able to couple his commits with his Our account. We may be able to query some of this information in the GitHub API instead, but I haven't looked into that part yet.

Anyway, by playing around a bit with LibGit2Sharp, I now have the following information. The part with memberId is just a mockup, as I obviously don't have the actual Our members database to search.

image

The JSON meta data could be stored in a separate file on disk, or just as an extra field for the file in Examine. This makes sure we don't have to query the local Git repository all the time. Over time, the JSON meta data could also contain other information, e.g. a proper breadcrumb, as the title of a page doesn't always match the filename.


So far my playground code looks like this:

@using LibGit2Sharp
@using Newtonsoft.Json.Linq
@{
    // "dir" is the path to the local clone of the UmbracoDocs repository
    var repo = new Repository(dir);
    var filePath = "Cheatsheets/index.md";

    // Blame the file once and reuse the hunks for everything below
    var hunks = repo.Blame(filePath).ToArray();

    // The newest commit across all hunks marks the last modification
    var lastModified = hunks.Max(x => x.FinalCommit.Committer.When);

    // Group the hunks by committer email, largest contribution first
    var groups = hunks
        .GroupBy(x => x.FinalCommit.Committer.Email)
        .OrderByDescending(x => x.Sum(y => y.LineCount))
        .ToArray();

    // Build the JSON meta data (TimeUtils is not defined in this snippet)
    JObject meta = new JObject();
    JArray contributors = new JArray();
    meta.Add("lastWriteTime", lastModified.ToUniversalTime().ToString(TimeUtils.Iso8601DateFormat));
    meta.Add("contributors", contributors);

    foreach (var contributor in groups) {
        JObject c = new JObject();
        contributors.Add(c);
        c.Add("name", contributor.First().FinalCommit.Committer.Name);
        c.Add("lines", contributor.Sum(x => x.LineCount));

        // Mockup: hardcoded until the Our members database can be searched
        if (contributor.First().FinalCommit.Committer.Email == "rune@umbraco.com") {
            c.Add("memberId", 115917);
        }
    }
}

<h3>Info</h3>
<pre><strong>Last modified: </strong> @lastModified.ToLocalTime().ToString("yyyy-MM-dd HH:mm")</pre>

<h3>Contributors</h3>
<ul>
@foreach (var contributor in groups) {
    <li>@contributor.First().FinalCommit.Committer.Name => @contributor.Sum(x => x.LineCount)</li>
}
</ul>

<h3>Hunks</h3>
<table border="1">
<tbody>
@foreach (BlameHunk hunk in hunks) {
    <tr>
        <td style="white-space: nowrap;">@hunk.FinalCommit.Committer.Name</td>
        <td style="white-space: nowrap;">@hunk.FinalCommit.Committer.When.ToLocalTime().ToString("yyyy-MM-dd HH:mm")</td>
        <td style="white-space: nowrap;">@hunk.FinalCommit.Sha</td>
        <td style="white-space: pre;">@hunk.FinalCommit.MessageShort</td>
        <td>@hunk.LineCount</td>
    </tr>
}
</tbody>
</table>

<h3>Jason</h3>
<pre>@meta</pre>

@nul800sebastiaan
Member

But even though we can get the timestamp of the last commit, the changes are rarely deployed to the Our website right away. So perhaps we really want to show the timestamp for when the file was actually deployed.

I'm not sure what you mean, the changes usually get deployed immediately when they are merged to master.
Of course they could've been in a PR for a while, but in my mind that doesn't really make a big difference: the changes were made at the time Git says they were made, and a deploy did not change them.

It's unfortunate that we can't use the committer's information, but if we can get the commit ID then perhaps we can still query the GitHub API for the associated user? Not sure how much overhead this would be. It should of course be cached locally so we only query each commit once. Do you think that is a possibility?

@nul800sebastiaan
Member

Does it make sense to put the metadata in the yaml header that the docs team has been working on?

@abjerner
Contributor

I'm not sure what you mean, the changes usually get deployed immediately when they are merged to master.

I thought deploying from master to the Our website was a manual process. If not, then we could just use the timestamp from Git 👍


We do have the SHA hash for each commit, so we can look up individual commits via the API:

https://api.github.com/repos/umbraco/UmbracoDocs/commits/a3d412c89017c2a0e17afe10be6e0e5d8f450d81

Looking up each commit and then querying them later locally should indeed be possible. Caching could simply mean storing a JSON file on disk for each commit. And then a Hangfire job to pull new commits at a certain interval. I'm pretty sure that could work 😄
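
As a rough idea of what the "one JSON file per commit" cache could look like, here is a minimal sketch. The class name `CommitJsonCache` and the `fetch` callback are made up for illustration; the callback stands in for whatever code ends up calling the GitHub API, so nothing here is tied to a specific HTTP client.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: keeps one JSON file per commit SHA in a cache directory.
public sealed class CommitJsonCache
{
    private readonly string _directory;

    public CommitJsonCache(string directory)
    {
        _directory = directory;
        Directory.CreateDirectory(directory);
    }

    // Returns the cached JSON for the commit and only calls "fetch" on a cache miss.
    public string GetOrFetch(string sha, Func<string, string> fetch)
    {
        // Only accept hexadecimal SHAs so the value is safe to use as a file name
        if (string.IsNullOrEmpty(sha) || !sha.All(Uri.IsHexDigit))
            throw new ArgumentException("Expected a hexadecimal commit SHA.", nameof(sha));

        string path = Path.Combine(_directory, sha + ".json");
        if (File.Exists(path)) return File.ReadAllText(path);

        // Cache miss: fetch once (e.g. from the GitHub API) and store the result
        string json = fetch(sha);
        File.WriteAllText(path, json);
        return json;
    }
}
```

A scheduled job could then walk the new commits and call `GetOrFetch` for each SHA, so each commit only costs one API request over the lifetime of the cache.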

For reference, I think the documentation repo has just over 3100 commits.


We might be able to update the YAML bit in the Markdown files. I haven't looked at the code yet in their PR though.

Although to avoid merge conflicts in the local Git repo, we can't really add the meta data automatically, so we either need to create a copy of each Markdown file - or just store the automatic meta data in a JSON file.

@nul800sebastiaan
Member

Looking up each commit and then querying them later locally should indeed be possible.

Sweet! This is great, might hit some API limits at some point but we can do the archive pretty slowly.

we can't really add the meta data automatically, so we either need to create a copy of each Markdown file

Ah yes, I didn't think about that, it would need to be merged or something, but that's much too complicated. A separate metadata file might be easiest indeed.

@abjerner
Contributor

I had another look at this, and realized that Our Umbraco now does clone/pull the documentation via Git, so we can now query the local repository as we discussed earlier.

This means that we can do something like:

BlameHunkCollection hunks = repo.Blame(filePath);

DateTimeOffset lastModified = hunks.Max(x => x.FinalCommit.Committer.When);

A downside here is that it apparently takes LibGit2Sharp about 700 ms (on my machine) to query the file I'm testing with (but as also discussed earlier, we should be able to cache our way out of this).
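
To illustrate the caching idea, here is a small in-memory sketch. `BlameCache<T>` is a hypothetical name, and keying the entries on the SHA of the current HEAD commit is my own assumption: the blame result for a file can't change until a pull moves HEAD, so old entries simply stop being hit.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: remembers an expensive per-file result for a given repository state.
public sealed class BlameCache<T>
{
    private readonly ConcurrentDictionary<(string HeadSha, string Path), T> _entries
        = new ConcurrentDictionary<(string HeadSha, string Path), T>();

    // Runs "compute" only if there is no entry yet for this HEAD commit and file path.
    public T GetOrAdd(string headSha, string filePath, Func<string, T> compute)
    {
        return _entries.GetOrAdd((headSha, filePath), key => compute(key.Path));
    }
}
```

With LibGit2Sharp this could be wired up roughly as `cache.GetOrAdd(repo.Head.Tip.Sha, filePath, p => repo.Blame(p).Max(x => x.FinalCommit.Committer.When))`. Two caveats: under concurrent requests the callback may run more than once for the same key, and entries for old HEAD commits are never evicted in this sketch.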

@tcmorris @nul800sebastiaan I may look into a PR for this. If so, any ideas for where the last edited date should be displayed?

Although they're using a horrible date format, Microsoft does something like this:
image

Including the last edited date in search results may however require some extra work. I haven't yet looked into how this would be done best.


Attributing each user who contributed to the list is now also somewhat possible, but there are still a few issues left. We do have the SHA hash, so we can look up the commits via the GitHub API.

But it turns out that not all commits are attributed to a specific GitHub user (probably because a user with a given email may not exist on GitHub, and a commit can be pushed by another user).
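
For those unattributed commits, a fallback along these lines might be enough. This is only a sketch: it assumes the commit JSON from the GitHub API has a top-level `author` object with a `login` (or `null` when no GitHub user is linked) and a `commit.author.name` holding the plain Git name. The class name `CommitAuthor` is hypothetical, and it uses `System.Text.Json` to stay dependency-free.

```csharp
using System.Text.Json;

public static class CommitAuthor
{
    // Returns "@login" when the commit is linked to a GitHub user,
    // otherwise falls back to the author name recorded in Git.
    public static string Resolve(string commitJson)
    {
        using JsonDocument doc = JsonDocument.Parse(commitJson);
        JsonElement root = doc.RootElement;

        // "author" is null (or missing) when GitHub couldn't match the email to a user
        if (root.TryGetProperty("author", out JsonElement author)
            && author.ValueKind == JsonValueKind.Object
            && author.TryGetProperty("login", out JsonElement login))
        {
            return "@" + login.GetString();
        }

        return root.GetProperty("commit").GetProperty("author").GetProperty("name").GetString();
    }
}
```

Commits that fall back to the plain name could still be listed as contributors, just without a link to a GitHub or Our profile.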

@nul800sebastiaan any thoughts on whether this is actually something we want and that we/I should go forward with?

@abjerner
Contributor

Small PoC, although not fully automatic yet. And only based on GitHub data - not Our members.

image

@nul800sebastiaan
Member

That is wonderful @abjerner ! Hopefully we'll have some time next week when we're all together to wrap this one up and ship it!

@umbrabot

Hiya @tcmorris,

Just wanted to let you know that we noticed that this issue got a bit stale and might not be relevant any more.

We will close this issue for now but we're happy to open it up again if you think it's still relevant (for example: it's a feature request that's not yet implemented, or it's a bug that's not yet been fixed).

To open this issue up again, you can write @umbrabot still relevant in a new comment as the first line. It would be super helpful for us if on the next line you could let us know why you think it's still relevant.

For example:

@umbrabot still relevant
This bug can still be reproduced in version 8.9.0

This will reopen the issue in the next few hours.

Thanks, from your friendly Umbraco GitHub bot 🤖 🙂

@umbrabot added the status/stale label on Dec 15, 2020

4 participants