This repository has been archived by the owner on Feb 27, 2024. It is now read-only.

[PROCESS] Term Data Model and Form #24

Open
vinomaster opened this issue Oct 7, 2020 · 9 comments

@vinomaster
Contributor

Issue/Feature Description

We need to simplify the current Issue Template for Terms.

Proposed Solution

  1. Use Markdown:
     Term/Phrase: <value>
     Definition: <value>
     Usage example: <external reference>
     Relevant Communities: <value>
     Tags: <value>
  2. We should eliminate the existing Scope and Concept forms.
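A minimal Python sketch of how the proposed template could be read into a structured record, assuming one `Label: value` pair per line and comma-separated values for the multi-valued fields (Relevant Communities, Tags). `TermSubmission`, `parse_term`, and the label-to-attribute mapping are hypothetical names for illustration, not an agreed CTWG data model.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical mapping from the template's labels (lower-cased) to record attributes.
FIELD_MAP = {
    "term/phrase": "term",
    "definition": "definition",
    "usage example": "usage_example",
    "relevant communities": "communities",
    "tags": "tags",
}


@dataclass
class TermSubmission:
    term: str = ""
    definition: str = ""
    usage_example: str = ""
    communities: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)


def _split_list(value: str) -> List[str]:
    # Assumption: multi-valued fields are comma-separated.
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_term(text: str) -> TermSubmission:
    """Parse 'Label: value' lines into a TermSubmission; unknown labels are ignored."""
    sub = TermSubmission()
    for line in text.splitlines():
        # Split on the first colon only, so URLs in values stay intact.
        label, sep, value = line.partition(":")
        if not sep:
            continue
        attr = FIELD_MAP.get(label.strip().lower())
        if attr is None:
            continue
        value = value.strip()
        if attr in ("communities", "tags"):
            setattr(sub, attr, _split_list(value))
        else:
            setattr(sub, attr, value)
    return sub
```

For example, `parse_term("Tags: identity, credentials").tags` gives `["identity", "credentials"]`.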
@vinomaster
Contributor Author

@dhh1128 Do we want each submission to be via GitHub? What about batch submissions? How would we handle them? I suggest a PR approach via a dedicated folder in the repo.

  1. All issue-based submissions can be accepted by CTWG and dropped into that folder for handling.
  2. All batch submissions would create discrete .md files, one per term, in that folder.
  3. We would not accept term .md files that contain multiple terms.

Thoughts?
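The one-term-per-file rule in point 3 could be checked mechanically before a batch PR is accepted. A sketch, assuming each term entry begins with the template's `Term/Phrase:` label and that submissions live in sub-folders of a single submissions folder; `count_terms` and `invalid_submissions` are hypothetical helpers.

```python
from pathlib import Path
from typing import List

# Assumption: every term entry starts with this template label (matched case-insensitively).
TERM_LABEL = "term/phrase:"


def count_terms(text: str) -> int:
    """Count lines that start a term entry."""
    return sum(1 for line in text.splitlines() if line.strip().lower().startswith(TERM_LABEL))


def invalid_submissions(folder: Path) -> List[str]:
    """Return relative paths of .md files under folder that do not contain exactly one term."""
    bad = []
    for md in sorted(folder.rglob("*.md")):
        if count_terms(md.read_text(encoding="utf-8")) != 1:
            bad.append(md.relative_to(folder).as_posix())
    return bad
```

Files with zero terms are flagged alongside files with several, since neither fits the one-file-per-term convention.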

@RieksJ
Contributor

RieksJ commented Oct 13, 2020

Are we already clear about what it is we will be doing with those inputs, what the results of our doing will be and who will be using such results (for what purposes)? Did I miss something?

@vinomaster
Contributor Author

No, we are not clear; those are process questions. Since we do not have a process in place, I suggested that we place the .md files in a submission folder.

We discussed as a team that we will take a first pass at grabbing Sovrin (automated input) and Bedrock (manual input) as an initial test case.

vinomaster added a commit to vinomaster/concepts-and-terminology that referenced this issue Oct 19, 2020
@vinomaster
Contributor Author

As per our last meeting, I have submitted sample content from the BBU Glossary. Instead of using a separate issue for each term, I prepared a PR that places candidate terms in the submissions folder. The CTWG can now process these submissions and insert the content into whatever internal data store tool is used.

I believe @dhh1128 will be submitting Sovrin content to continue this sample exercise.

vinomaster added a commit that referenced this issue Oct 19, 2020
Issue #24 Bedrock submission samples.
@vinomaster
Contributor Author

Will keep this issue open to allow @dhh1128 to submit his Sovrin changes.

@dhh1128
Contributor

dhh1128 commented Oct 19, 2020

ETA on my part = EOD today

@dhh1128
Contributor

dhh1128 commented Oct 20, 2020

Okay, I have a PR that represents a first pass extraction from the Sovrin Glossary: #26. The extraction was done with a script that I can modify and re-run; I'd like to improve the content before a merge, so please don't merge until we discuss.

Some questions I have, and things I want to discuss/fix before this sort of thing gets merged:

  1. Is "submissions" a simple triage bucket, or does it deserve to be a permanent archive? (I'm assuming that submitted terms get turned into canonical data through a combination of manual and automated transformations, and that the glossary generation process runs off canonical data, not raw submissions. Do we agree?)
  2. Many of the terms I've extracted need hyperlinks to one another. I believe @RieksJ has a way to do that with a slight tweak to the markup, to match the cross-linking feature he has from the Docusaurus approach. I'd like to discuss whether this can be added to our template.
  3. Capitalization. The Sovrin Glossary follows the convention of capitalizing all proper terms (a convention also used in many legal contracts in English). I don't like this convention because it's not natural. It shows up in the filenames I generated for the markdown.
  4. I didn't include any usage examples. However, I could probably generate examples from the Sovrin Governance Framework. Is that valuable?
  5. I didn't check, but I believe a few terms in my submission might overlap terms from Bedrock that Dan submitted. The communities are different (though perhaps somewhat overlapping). How do we avoid confusion when two labels have the same value but point to different things? (I know how to do it later, in canonical data; I'm just asking how to handle it during submission.)
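One possible way to handle points 3 and 5 at submission time, sketched in Python under stated assumptions: derive a lower-case, underscore-joined label from each term (which sidesteps the capitalization question for filenames) and flag labels submitted by more than one community without merging them. `normalize_label` and `find_overlaps` are hypothetical, and communities are assumed to be identified by plain strings such as the submission sub-folder name.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize_label(term: str) -> str:
    """Lower-case a term and join its words with underscores.

    Any character outside a-z and 0-9 (after lower-casing) is treated as a separator.
    """
    words = re.findall(r"[a-z0-9]+", term.lower())
    return "_".join(words)


def find_overlaps(entries: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each normalized label submitted by more than one community to those communities."""
    communities_by_label: Dict[str, set] = defaultdict(set)
    for community, term in entries:
        communities_by_label[normalize_label(term)].add(community)
    return {
        label: sorted(comms)
        for label, comms in sorted(communities_by_label.items())
        if len(comms) > 1
    }
```

For example, `normalize_label("Trust Anchor")` returns `"trust_anchor"`. Overlaps are only reported here; deciding whether two labels point to the same concept is left to the canonical data.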

@RieksJ
Contributor

RieksJ commented Oct 20, 2020 via email

vinomaster added a commit to vinomaster/concepts-and-terminology that referenced this issue Oct 21, 2020
vinomaster added a commit that referenced this issue Oct 21, 2020
Issue #24 moved bedrock terms to bbu folder
@vinomaster
Contributor Author

My POV:

  1. Is "submissions" a simple triage bucket, or does it deserve to be a permanent archive?

    • The glossary generation process will run off canonical data, not raw submissions.
    • The submissions folder is for capturing raw data prior to injecting it into the CTWG internal storage data model (TBD).
    • There are several submission input mechanisms:
      1. GitHub issue, which would need to be manually copied either into a submission folder entry or directly into the CTWG internal storage data model.
      2. Pull-request approach (batch), where "n" new terms are submitted as raw data for evaluation. This comes in several flavors:
      - Manual generation of terms in submissions/xxx where xxx is the name of a sub-folder containing new terms.
      - Automated extraction of terms in submissions/xxx where xxx is the name of a sub-folder containing new terms. This is accompanied by a new entry in the code/yyy folder where yyy contains the code specific to this extraction effort.
      - Automated extraction of terms in submissions/xxx where xxx is the name of a sub-folder containing new terms. This is accompanied by a new entry in the code/yyy folder where yyy contains the code specific to this extraction effort. This entry also comes with some degree of job scheduling and management capability.
  2. Many of the terms I've extracted need hyperlinks to one another. I believe @RieksJ has a way to do that with a slight tweak to the markup, to match the cross-linking feature he has from the Docusaurus approach. I'd like to discuss whether this can be added to our template.
    - I am OK with template changes for this. We will have several iterations until we get the model and process down.
    - Maybe the proper step is a script that runs against the submission folder, preps new terms, submits them into the CTWG internal storage data model, and then deletes them from the submission folder. To me, this is an internal CTWG process activity for moving terms from raw to canonical.

  3. Capitalization. The Sovrin Glossary follows the convention of capitalizing all proper terms (a convention also used in many legal contracts in English). I don't like this convention because it's not natural. It shows up in the filenames I generated for the markdown. I didn't include any usage examples. However, I could probably generate examples from the Sovrin Governance Framework. Is that valuable?
    - I do not like caps in file names, so I always used the filename convention "lowercaseword_lowercaseword".
    - Adding sample usage is a process question. Do we require it? It makes sense for issue-based submissions, but for autogenerated terms I would waive it and deal with it inside the canonical maturation process.

  4. I didn't check, but I believe a few terms in my submission might overlap terms from Bedrock that Dan submitted. The communities are different (though perhaps somewhat overlapping). How do we avoid confusion when two labels have the same value but point to different things? (I know how to do it later, in canonical data; I'm just asking how to handle it during submission.)
    - I would assume (at least initially) we allow for overlaps and deal with it via links in canonical data. These overlaps or relationships are something of a positive data point (artifact) that comes from this exercise.

  5. I was unable to determine the best way to list tags. We need a convention that makes downstream search and glossary generation easier.
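A rough sketch of the raw-to-canonical script suggested under point 2, assuming the `submissions/<community>/<term>.md` layout, the `Label: value` template, and a comma-separated tag convention (point 5 is still open, so the tag rule here is only a placeholder). Because the CTWG internal storage data model is TBD, an in-memory dict keyed by `(community, filename stem)` stands in for it; `ingest`, `parse_fields`, and `normalize_tags` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, List, Tuple

# Stand-in for the CTWG internal storage data model (TBD): an in-memory dict.
Store = Dict[Tuple[str, str], Dict[str, object]]


def parse_fields(text: str) -> Dict[str, str]:
    """Read 'Label: value' lines into a dict keyed by lower-cased label."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            fields[label.strip().lower()] = value.strip()
    return fields


def normalize_tags(raw: str) -> List[str]:
    """Placeholder tag convention: comma-separated, lower-cased, de-duplicated, sorted."""
    return sorted({t.strip().lower() for t in raw.split(",") if t.strip()})


def ingest(submissions: Path, store: Store, delete: bool = True) -> int:
    """Move every submissions/<community>/*.md term into store; return how many were ingested."""
    count = 0
    for md in sorted(submissions.glob("*/*.md")):
        community = md.parent.name
        fields = parse_fields(md.read_text(encoding="utf-8"))
        term = fields.get("term/phrase", "")
        if not term:
            continue  # leave malformed submissions in place for manual triage
        store[(community, md.stem)] = {
            "term": term,
            "definition": fields.get("definition", ""),
            "usage_example": fields.get("usage example", ""),
            "tags": normalize_tags(fields.get("tags", "")),
        }
        if delete:
            md.unlink()  # raw submission is removed once it is in the store
        count += 1
    return count
```

Keying by community keeps overlapping labels from different sources as separate entries, in line with allowing overlaps and linking them later in canonical data.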


3 participants