TypeError #5

melissasunnivahill · 2024-01-11T23:08:07Z

I've been attempting to run analyses from the textcomplexity library but keep getting the following error:

TypeError: UdToken.__new__() missing 9 required positional arguments: 'form', 'lemma', 'upos', 'xpos', 'feats', 'head', 'deprel', 'deps', and 'misc'

Here's a deeper look at what's happening:
!txtcomplexity -i conllu 'output.conllu'

Traceback (most recent call last):
  File "/usr/local/bin/txtcomplexity", line 12, in <module>
    textcomplexity.cli.main()
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/cli.py", line 194, in main
    sentences, graphs = zip(*conllu.read_conllu_sentences(f, ignore_case=args.ignore_case))
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/utils/conllu.py", line 16, in read_conllu_sentences
    for sentence, sent_id in _read_conllu(f, ignore_case):
  File "/usr/local/lib/python3.10/dist-packages/textcomplexity/utils/conllu.py", line 66, in _read_conllu
    sentence.append(UdToken(*fields))

TypeError: UdToken.__new__() missing 9 required positional arguments: 'form', 'lemma', 'upos', 'xpos', 'feats', 'head', 'deprel', 'deps', and 'misc'

Is this related to an error in my conllu file or how I'm using the textcomplexity library? Any help would be much appreciated! :)


tsproisl · 2024-01-22T07:44:22Z

Could you share the first few lines of your input file?

melissasunnivahill · 2024-01-22T19:15:46Z

Sure! Here is what the first few lines of my conllu file look like:

1 # # X XX _ 2 dep _ _
2 Mixtures mixture VERB VBZ _ 2 ROOT _ _
3

SPACE	_SP	_	2	dep	_	_

4 The the DET DT _ 6 det _ _
5 next next ADJ JJ _ 6 amod _ _
6 time time NOUN NN _ 2 npadvmod _ _
7 you you PRON PRP _ 8 nsubj _ _
8 are be AUX VBP _ 6 relcl _ _
9 at at ADP IN _ 8 prep _ _
10 the the DET DT _ 11 det _ _
11 beach beach NOUN NN _ 9 pobj _ _
12 , , PUNCT , _ 2 punct _ _
13 pick pick VERB VB _ 2 conj _ _
14 up up ADP RP _ 13 prt _ _
15 a a DET DT _ 16 det _ _
16 handful handful NOUN NN _ 13 dobj _ _
17 of of ADP IN _ 16 prep _ _
18 sand sand NOUN NN _ 17 pobj _ _
19 . . PUNCT . _ 2 punct _ _
20

tsproisl · 2024-01-22T20:14:11Z

I don’t know if GitHub messed with the formatting, but it seems like the third token is a newline character? The txtcomplexity tool assumes that token information is on a single line, i.e. it cannot deal with tokens that contain literal newline characters and therefore span multiple lines. Out of curiosity: Do you happen to know how that file was created?
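For anyone hitting the same error, a quick way to locate such broken tokens is to check that every token line has exactly ten tab-separated fields. The following is only a minimal sketch and is not part of textcomplexity: `find_malformed_lines` is a hypothetical helper, and the sample data is a guess at what the file above looks like on disk (a token whose form and lemma are literal newlines).

```python
EXPECTED_FIELDS = 10  # ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC


def find_malformed_lines(lines):
    """Return (line_number, field_count, text) for each non-blank,
    non-comment line that lacks exactly ten tab-separated fields."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Blank lines separate sentences; lines starting with '#' are comments.
        if not line or line.startswith("#"):
            continue
        field_count = len(line.split("\t"))
        if field_count != EXPECTED_FIELDS:
            problems.append((lineno, field_count, line))
    return problems


# Assumed reconstruction of the problematic input: token 3 has a newline
# as its form and lemma, so it is spread over three physical lines.
sample = (
    "2\tMixtures\tmixture\tVERB\tVBZ\t_\t2\tROOT\t_\t_\n"
    "3\t\n"
    "\t\n"
    "\tSPACE\t_SP\t_\t2\tdep\t_\t_\n"
    "4\tThe\tthe\tDET\tDT\t_\t6\tdet\t_\t_\n"
)

problems = find_malformed_lines(sample.splitlines(keepends=True))
for lineno, field_count, text in problems:
    print(f"line {lineno}: {field_count} field(s): {text!r}")
```

To check a real file, pass an open file object instead of the sample, e.g. `find_malformed_lines(open("output.conllu", encoding="utf-8"))`.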

melissasunnivahill · 2024-01-23T06:15:24Z

That makes sense, thanks! My PhD advisor wrote a Python program to convert txt files to conllu format; happy to add the code if you're interested in looking at it :)
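One possible fix on the converter side, sketched here without having seen the actual program, is to normalize whitespace in every field before writing a token line, so that no field can contain a tab or newline. Both function names below are hypothetical, and replacing an empty field with `_` is an assumption about what the converter should do with whitespace-only tokens (dropping such tokens entirely would be another option, but would also require renumbering IDs and heads).

```python
def sanitize_field(value, placeholder="_"):
    """Collapse any run of whitespace (tabs, newlines, spaces) into a
    single space; return the placeholder if nothing is left."""
    cleaned = " ".join(str(value).split())
    return cleaned if cleaned else placeholder


def format_token_line(fields):
    """Join the ten CoNLL-U fields of one token into a single line."""
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return "\t".join(sanitize_field(f) for f in fields)


# Token 3 from the file above, assuming its form and lemma were newlines.
broken = ["3", "\n", "\n", "SPACE", "_SP", "_", "2", "dep", "_", "_"]
fixed_line = format_token_line(broken)
print(repr(fixed_line))
```

With this in place, each token ends up on exactly one physical line, which is what txtcomplexity expects.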
