Scan omitted Grammar tagging in many instances #60

destatez · 2016-11-24T20:18:48Z

We should identify below all of the grammar abbreviations that occur and should have grammar tagging around them, e.g. adv. for an adverb. It should be possible to develop a script that does a global replace (adding the tagging) for each instance that is not already tagged. The list of abbreviations can be extracted from section "I. GENERAL." at the beginning of the XML file.

Most of the current instances of tagging occur after the <form...> tag-pair and the <etym...> tag-pair and before the first <sense...> tag-pair, but there are also current instances that are part of the contents of a <sense...> tag-pair. A decision will need to be made, when developing and running this script, whether the "replacements" should be made only before the <sense...> tag-pair or wherever the abbreviations occur.
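A rough sketch of what such a script might look like, in Python. Several things here are assumptions and not taken from the XML file: the tagging is assumed to be a `<pos>` element, the three abbreviations are only placeholders for the real list from "I. GENERAL.", and the names `tag_abbreviations` and `tag_entry` are made up for illustration. The `include_sense` flag stands in for the decision described above.

```python
import re

# Placeholder list; the real abbreviations would come from section "I. GENERAL."
ABBREVIATIONS = ["adv.", "adj.", "prep."]

# Assumption: the grammar tagging is a <pos>...</pos> element.
_ABBR = "|".join(re.escape(a) for a in sorted(ABBREVIATIONS, key=len, reverse=True))
# Group 1 matches an existing <pos> element so that it can be passed over;
# group 2 matches a bare abbreviation that does not continue a longer word.
_PATTERN = re.compile(r"(<pos\b[^>]*>.*?</pos>)|(?<!\w)(" + _ABBR + r")", re.DOTALL)


def tag_abbreviations(text):
    """Wrap untagged abbreviations in <pos>, leaving existing <pos> elements alone."""
    def repl(match):
        if match.group(1):
            return match.group(1)
        return "<pos>" + match.group(2) + "</pos>"
    return _PATTERN.sub(repl, text)


def tag_entry(entry, include_sense=False):
    """Tag one entry; by default only the part before the first <sense is touched."""
    if include_sense:
        return tag_abbreviations(entry)
    head, sep, tail = entry.partition("<sense")
    return tag_abbreviations(head) + sep + tail
```

This is plain text substitution, so it would also hit an abbreviation that happens to sit inside an attribute value; a real run would need a review of the changes before committing them.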

cbearden · 2016-11-25T00:27:03Z

Hi David, This sounds good. I have a pair of scripts ('scripts/find_foreign.py' and 'scripts/fix_foreign.py') that I used to find segments of Greek & Hebrew text that weren't enclosed in <foreign> and to add the <foreign> tags. Possibly they could be adapted to this purpose as well. I may not get to that immediately, so others may beat me to the punch with a different approach. I think there are a number of ways we could use scripts or XQuery to make the analysis and fixing of the markup faster. My immediate focus is on making the document valid TEI/OSIS again. All the best, Chuck


destatez · 2016-11-25T01:02:26Z

Charles, That sounds like a plan. I have been using Perl to do all sorts of global replacements for the ULB, UDB, Notes, tW, etc. Either tool can do the job. My thought on this particular topic and Issue 59 was to wait until all manual editing is complete and then use the scripts to "catch" any that were missed by the editors. Dave


cbearden · 2016-11-26T00:54:15Z

Hi Dave, Would it be good to have a channel for general communications about the project, so as not to overload the GitHub 'issues' feature with more general topics? I don't know any way to contact you other than responding to this issue. There is a Google Group ("TExT: Abbott-Smith Project"), but the last posts in it were from me, about my efforts to tag Greek & Hebrew with <foreign>, about a year ago. For instance, I don't know anything about the work of manual review that is evidently going on (which is great news!). I'd like to get the XML file into valid shape, but I don't want to make life harder for those trying to merge my work with the results of their manual review. Also, I think we'll need to discuss some markup choices. Would it make sense to use the Google Group for general coordination and discussion, or is there another, better channel? All the best, Chuck


destatez · 2016-11-26T01:51:58Z

Charles, I just got connected to that Google Group. That sounds like a good means of communication. We really need to get Chapel and possibly Todd connected to it, since they are the leads. I cc'd them on this reply. I am just an editor and tool-guy. Dave


destatez · 2016-11-27T00:46:07Z

Charles, It's taking some time to get approved for that Google Group, though I thought that I had received a message that I was. So, I can't answer you via a post against your latest topic. You can either wait until I get approved, or you can pass me your email address and I can send you a message about what the editors are doing. Your pick. Dave


dowens76 · 2016-11-27T08:49:53Z

Dave, you've already been approved for the group using your Gmail address. I approved you almost immediately. Try sending an email to
text-abbott-smith-project@googlegroups.com.

cbearden · 2016-11-27T14:35:34Z

Hi Dave, I was able to see your post to the group with the subject "Group Acceptance". Looks like you are able to post now. If you didn't get a copy of the reply in your email inbox, perhaps you just need to edit your email preference settings for the group. I'm looking forward to hearing about what's going on with the dictionary. I see you're with Wycliffe, which is very cool. All the best, Chuck


toddlprice · 2016-12-30T09:23:08Z

Re: par. 2 of the 1st post: Yes, I do think that the grammar abbreviations even in the Sense sections should be tagged. This might be a bit beyond the original scope of making a digital representation of A-S, so perhaps this should wait until Stage 2 and be considered part of the UGL. What I mean is that I see use for it where the grammar tags in UGL can be linked to UGG so that these grammatical concepts are explained in our Grammar. That is beyond the Stage 1 goal.

toddlprice · 2017-01-18T12:55:36Z

Just to clarify, as part of digitizing A-S, we do want the grammar abbreviations to have tagging around them. This is valid and needed for stage 1. But linking those tags to UGG needs to wait until stage 2.

destatez · 2017-01-19T18:41:23Z

I have run across an issue on this topic. I have done searches of the XML looking for the POS "keywords" and have found instances of these that are a part of a description, as well as what I would call viable instances. I have attached some examples of the search output and need a little clarification on what should be and what shouldn't be tagged. The keywords that I used were as follows. The search would find any word that started with the keyword. That was why I had to qualify some to preclude others from appearing in the search.
adj, adv, article, conj, interj, num, part, prep, pron, subst, art. (and NOT article), super (and NOT superscript), noun (and NOT pron), verb (and NOT adv)

Non-tagged-POS.txt
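For anyone wanting to reproduce a search like this, here is a minimal Python sketch. It is not the search that was actually run. It assumes that existing tagging is a `<pos>` element, treats "art." as the prefix "art" so that prefix matching works, and applies each qualifier as "skip the word if it contains this string"; the name `find_keyword_hits` is made up for illustration.

```python
import re

# (keyword, exclusion) pairs from the list above; None means unqualified.
# "art." is written without its period here (an assumption, see above).
RULES = [
    ("adj", None), ("adv", None), ("article", None), ("conj", None),
    ("interj", None), ("num", None), ("part", None), ("prep", None),
    ("pron", None), ("subst", None), ("art", "article"),
    ("super", "superscript"), ("noun", "pron"), ("verb", "adv"),
]

TAGGED = re.compile(r"<pos\b[^>]*>.*?</pos>", re.DOTALL)  # already tagged
MARKUP = re.compile(r"<[^>]+>")                            # any other tag
WORD = re.compile(r"[A-Za-z]+\.?")


def find_keyword_hits(text):
    """Return (word, keyword) pairs for untagged words that start with a keyword."""
    # Drop existing <pos> elements and all other markup before searching.
    plain = MARKUP.sub(" ", TAGGED.sub(" ", text))
    hits = []
    for word in WORD.findall(plain):
        lower = word.lower()
        for keyword, exclusion in RULES:
            if not lower.startswith(keyword):
                continue
            if exclusion and exclusion in lower:
                continue
            hits.append((word, keyword))
    return hits
```

As in the attached output, a search like this cannot tell an abbreviation in a lexical entry from the same word used in a description, so the hits still need a manual decision.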

toddlprice · 2017-01-20T09:50:22Z

I think the examples in your txt file (verb, part and art) should not be tagged. It looks like ptcp. should be tagged since it is used in lexical entries rather than in 'running text'.

destatez · 2017-01-20T18:54:18Z

I am concerned about the current state of the pos tags in A-S. There are currently 53 different "values" that are tagged in the XML (see A_S_XML_pos_instance_text.txt). {I combined instances that were abbreviations or variations of abbreviations for those listed.} There are a total of 357 instances where these are tagged, with 29 of these being within the sense data (see A_S_pos_sense_Instances.txt). The remainder are within the orth data or etym data, which is where I would have expected them. My questions, as relates to automating the tagging of the XML file, are:

Should I tag only instances that are within the orth or etym data, or should I also include the instances in the sense data?
What text should I search for to do this tagging? {I put the list from the Issue in file: Possible_pos_values.txt, where I moved article, part, and verb to the DO NOT include list.} Could you review this list and move to the DO NOT include list any other values that you believe I should "ignore" for this tagging?

A_S_XML_pos_instance_text.txt

A_S_pos_sense_Instances.txt

Possible_pos_values.txt
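Counts like these can be rechecked with a short Python script once the file parses as XML. This is only a sketch of one way to do it, not the method used for the attached files: it assumes the tagged values are `<pos>` elements and that sense data sits inside `<sense>` elements, and the name `summarize_pos` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip a namespace prefix such as '{uri}pos' down to 'pos'."""
    return tag.rsplit("}", 1)[-1]


def summarize_pos(xml_text):
    """Return (value counts, total <pos> instances, instances inside <sense>)."""
    root = ET.fromstring(xml_text)
    values = Counter()
    in_sense = 0

    def walk(elem, inside_sense):
        nonlocal in_sense
        name = local(elem.tag)
        if name == "pos":
            values["".join(elem.itertext()).strip()] += 1
            if inside_sense:
                in_sense += 1
        # Children of a <sense> (at any depth) count as sense data.
        for child in elem:
            walk(child, inside_sense or name == "sense")

    walk(root, False)
    return values, sum(values.values()), in_sense
```

`len(values)` gives the number of distinct values, and the difference between the total and the sense count is what sits in the rest of the entry.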

toddlprice · 2017-01-23T09:10:31Z

Only tag what is in orth and etym data.

destatez · 2017-01-27T09:02:56Z

Updated the XML. Only 13 changes were needed once the scope was reduced to orth & etym.

destatez closed this as completed Jan 27, 2017
