Output Format #69

Open
JorKidd opened this issue Mar 9, 2023 · 1 comment


JorKidd commented Mar 9, 2023

I'm trying to parse sentences from a corpus, and in the output there are numbers as shown in the attached image (e.g., '0', '0', '1,2', etc.). A file describing the output format stated:

%f[N1,N2,N3...] | Displays the N1-th, N2-th, and N3-th features, with "," as the delimiter.

%FC[N1,N2,N3...] | Displays the N1-th, N2-th, and N3-th features, with C as the delimiter. However, if an element is empty, its display is omitted. (Example) F-[0,1,2]
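For reference, here is a rough Python sketch of how the two directives quoted above could be read. It is not MeCab's implementation: `expand_f` is a hypothetical helper, the sample feature string is illustrative and not taken from this issue, and treating `*` or an empty string as an "empty" element is an assumption.

```python
def expand_f(feature_csv, indices, delimiter=",", skip_empty=False):
    """Join selected feature fields, roughly like %f[...] or %FC[...].

    Hypothetical helper for illustration only. A plain split(",") does not
    handle quoted fields that themselves contain commas.
    """
    fields = feature_csv.split(",")
    picked = [fields[i] for i in indices]
    if skip_empty:
        # Assumption: "*" and "" both count as an empty element.
        picked = [f for f in picked if f not in ("", "*")]
    return delimiter.join(picked)


# Illustrative feature string (made up for this sketch).
sample = "名詞,一般,*,*,*,*,猫,ネコ,ネコ"

# Like %f[0,1,2]: fields 0, 1 and 2 joined with ",".
print(expand_f(sample, [0, 1, 2]))

# Like %F-[0,1,2]: same fields joined with "-", empty ones dropped.
print(expand_f(sample, [0, 1, 2], delimiter="-", skip_empty=True))
```

Under these assumptions, the bracketed numbers in the directive are indices into the feature list, not values that appear in the output.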

In this case, are these what the numbers are displaying? If so, what do the numbers mean exactly?

Also, is there any dependency tagging that we can use to extract nominal phrases?

[Attached image: MeCab Output]


polm commented Mar 10, 2023

Your question looks very similar to SamuraiT/mecab-python3#91.

To answer some specific points:

  • MeCab doesn't provide dependency parses or similar sentence-structure information; it is not that kind of "parser".
  • The numbers in the last column are pitch accent data from UniDic, though whether they appear depends on your config settings.
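As a rough illustration of the second point, the sketch below pulls the accent values out of the last tab-separated column of an output line. The helper names and the sample lines are hypothetical placeholders, and the column layout is an assumption, since it depends on the dictionary and output format in use. The description strings follow the usual accent-type convention, where 0 means no pitch drop and N means the pitch drops after the N-th mora; a value like "1,2" is read here as two alternative patterns.

```python
def accent_candidates(output_line):
    """Return the integers found in the last tab-separated column.

    Assumes the accent field is the last column, as in the screenshot.
    """
    last = output_line.rstrip("\n").split("\t")[-1]
    last = last.strip('"')  # tolerate a quoted field such as "1,2"
    return [int(v) for v in last.split(",") if v.isdigit()]


def describe_accent(n):
    """Describe an accent type number in words."""
    if n == 0:
        return "no pitch drop"
    return f"pitch drops after mora {n}"


# Placeholder lines, not real MeCab output.
lines = ["SURFACE\tFEATURES\t0", "SURFACE\tFEATURES\t1,2"]
for line in lines:
    for n in accent_candidates(line):
        print(n, describe_accent(n))
```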
