TypeError #5

Open
melissasunnivahill opened this issue Jan 11, 2024 · 4 comments
Comments

@melissasunnivahill

I've been attempting to run analyses from the textcomplexity library but keep getting the following error:

TypeError: UdToken.__new__() missing 9 required positional arguments: 'form', 'lemma', 'upos', 'xpos', 'feats', 'head', 'deprel', 'deps', and 'misc'

Here's a deeper look at what's happening:
!txtcomplexity -i conllu 'output.conllu'

Traceback (most recent call last):
  File "/usr/local/bin/txtcomplexity", line 12, in <module>
    textcomplexity.cli.main()
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/cli.py", line 194, in main
    sentences, graphs = zip(*conllu.read_conllu_sentences(f, ignore_case=args.ignore_case))
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/utils/conllu.py", line 16, in read_conllu_sentences
    for sentence, sent_id in _read_conllu(f, ignore_case):
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/utils/conllu.py", line 66, in _read_conllu
    sentence.append(UdToken(*fields))
TypeError: UdToken.__new__() missing 9 required positional arguments: 'form', 'lemma', 'upos', 'xpos', 'feats', 'head', 'deprel', 'deps', and 'misc'

Is this related to an error in my conllu file or how I'm using the textcomplexity library? Any help would be much appreciated! :)

@tsproisl
Owner

Could you share the first few lines of your input file?

@melissasunnivahill
Author

Sure! Here is what the first few lines of my conllu file look like:

1 # # X XX _ 2 dep _ _
2 Mixtures mixture VERB VBZ _ 2 ROOT _ _
3

SPACE	_SP	_	2	dep	_	_

4 The the DET DT _ 6 det _ _
5 next next ADJ JJ _ 6 amod _ _
6 time time NOUN NN _ 2 npadvmod _ _
7 you you PRON PRP _ 8 nsubj _ _
8 are be AUX VBP _ 6 relcl _ _
9 at at ADP IN _ 8 prep _ _
10 the the DET DT _ 11 det _ _
11 beach beach NOUN NN _ 9 pobj _ _
12 , , PUNCT , _ 2 punct _ _
13 pick pick VERB VB _ 2 conj _ _
14 up up ADP RP _ 13 prt _ _
15 a a DET DT _ 16 det _ _
16 handful handful NOUN NN _ 13 dobj _ _
17 of of ADP IN _ 16 prep _ _
18 sand sand NOUN NN _ 17 pobj _ _
19 . . PUNCT . _ 2 punct _ _
20

@tsproisl
Owner

I don’t know if GitHub messed with the formatting, but it seems like the third token is a newline character? The txtcomplexity tool assumes that token information is on a single line, i.e. it cannot deal with tokens that contain literal newline characters and therefore span multiple lines. Out of curiosity: Do you happen to know how that file was created?
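
For anyone hitting the same error, a quick check along these lines may help locate the offending lines before running txtcomplexity. It is only a minimal sketch and not part of textcomplexity: the function name `find_malformed_lines` and the sample data are made up for illustration (the sample is modelled on the excerpt above, not copied from the real file), and it assumes the standard CoNLL-U layout of ten tab-separated fields per token line, with blank lines between sentences and `#` lines as comments. A line that contains only `3` presumably yields a single field, which would be consistent with the "missing 9 required positional arguments" message.

```python
def find_malformed_lines(lines):
    """Return (line_number, line) pairs for token lines that do not
    have exactly 10 tab-separated fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Blank lines separate sentences; lines starting with "#" are comments.
        if not line.strip() or line.startswith("#"):
            continue
        if len(line.split("\t")) != 10:
            bad.append((number, line))
    return bad


# Hypothetical sample: token 3 has a newline as its form,
# so its fields are spread over several physical lines.
sample = [
    "2\tMixtures\tmixture\tVERB\tVBZ\t_\t2\tROOT\t_\t_\n",
    "3\t\n",
    "\n",
    "\tSPACE\t_SP\t_\t2\tdep\t_\t_\n",
    "4\tThe\tthe\tDET\tDT\t_\t6\tdet\t_\t_\n",
]

for number, line in find_malformed_lines(sample):
    print(f"line {number}: {line!r}")
```

To run it on a real file, pass an open file object instead of the list, e.g. `find_malformed_lines(open("output.conllu", encoding="utf-8"))`.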

@melissasunnivahill
Author

That makes sense, thanks! As for the file: my PhD advisor wrote a Python program to convert txt files to conllu format; happy to add the code if you're interested in looking at it :)
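
One possible guard on the converter side is to make sure no field value contains a tab or newline before the token line is written. The sketch below is hypothetical and does not reflect the actual conversion program, which is not shown in this thread; the helper name `to_conllu_line` and the choice of `_` as a placeholder for whitespace-only values are assumptions for illustration.

```python
import re


def to_conllu_line(fields, placeholder="_"):
    """Join ten field values into a single CoNLL-U token line,
    making sure no field contains tabs or newlines."""
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    cleaned = []
    for value in fields:
        # Collapse any run of whitespace (tabs, newlines, spaces) into one space.
        value = re.sub(r"\s+", " ", str(value)).strip()
        # A field that was whitespace only becomes the placeholder.
        cleaned.append(value if value else placeholder)
    return "\t".join(cleaned)


# Hypothetical whitespace token, similar to token 3 in the excerpt above.
print(to_conllu_line([3, "\n\n", "\n\n", "SPACE", "_SP", "_", 2, "dep", "_", "_"]))
```

Replacing the whitespace-only form rather than dropping the token keeps the token count unchanged, so the existing head indices stay valid; dropping such tokens instead would require renumbering IDs and heads.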
