don't classify desc as romanization if it is in data["categories"] #230
Noticed some spurious romanizations. Consider for example Reconstruction:Latin/sufferio.
`parse_word_head()` was going through the cleaned version of `{{la-verb|4|*sufferiō|*sufferīv|*suffert}} {{lb|la|Proto-Romance}}`, which is `*sufferiō (present infinitive *sufferīre, perfect active *sufferīvī, supine *suffertum); fourth conjugation (Proto-Romance)`, and was treating `Proto-Romance` as a romanization. That can be ruled out, because `{{lb}}` generates the category `Proto-Romance`, which is already in `data["categories"]`. So this PR adds a check so that a `desc` matching an entry in `data["categories"]` is not classified as a romanization.

The spurious romanization is now gone, but `DEBUG: unrecognized head form: Proto-Romance` now gets logged instead, because the `desc` ends up not being classified at all. I left this alone, as I'm not sure what the general system for these classifications is. (It's possible this check should be moved higher up in the `for desc_i, desc in enumerate(new_desc):` loop, depending on what that system is.)

(There also happens to be an unrelated Lua execution error on this page when invoking `{{VL-conj-4th}}`, but that is outside the scope of this PR.)
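For illustration, here is a minimal sketch of the kind of check described above, assuming only what the description states: labels such as `{{lb|la|Proto-Romance}}` put their text into `data["categories"]`, and a head fragment equal to one of those entries should not be taken as a romanization. The helper names `matches_category` and `filter_romanization_candidates` are hypothetical and are not wiktextract's actual code.

```python
from typing import Any


def matches_category(desc: str, data: dict[str, Any]) -> bool:
    """Return True if ``desc`` equals a category already collected in ``data``.

    Hypothetical helper for illustration only (not wiktextract's API).
    Assumption: a label like {{lb|la|Proto-Romance}} adds "Proto-Romance"
    to data["categories"], so a matching fragment is unlikely to be a
    romanization.
    """
    # A missing "categories" key is treated as an empty category list.
    categories = data.get("categories", [])
    # Ignore surrounding whitespace left over from cleaning the head line.
    return desc.strip() in categories


def filter_romanization_candidates(
    descs: list[str], data: dict[str, Any]
) -> list[str]:
    """Drop candidate romanizations that match a collected category."""
    return [d for d in descs if not matches_category(d, data)]


if __name__ == "__main__":
    data = {"categories": ["Proto-Romance"]}
    # "Proto-Romance" is removed; the other candidate is kept for further checks.
    print(filter_romanization_candidates(["Proto-Romance", "sufferire"], data))
```

Note that this only rules a fragment out; whatever it is then classified as (which is what the remaining `DEBUG: unrecognized head form` message is about) is a separate decision.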