
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2 #83

Closed
tjiagoM opened this issue Jul 7, 2019 · 20 comments

Comments

@tjiagoM

tjiagoM commented Jul 7, 2019

Hello,

I have to run multiple enrichments over different groups of genes, so I just have a big for loop that goes over all these groups of genes and, for each one, runs:

enr = gp.enrichr(gene_list=list(genes_array.astype('<U3')),
                 organism='human',
                 description='test',
                 gene_sets='Reactome_2016',
                 cutoff=1)

Once in a while I have this error:

Traceback (most recent call last):                       
File "my_script.py", line 83, in <module>
  cutoff=1)
File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 391, in enrichr
  enr.run()
File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 331, in run
  shortID, res = self.get_results(genes_list)
File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 169, in get_results
  res = pd.read_csv(StringIO(response.content.decode('utf-8')),sep="\t")
File "/home_location/miniconda/envs/env-general/lib/python3.6/site-packages/pandas/io/parsers.py", line 702, in parser_f
  return _read(filepath_or_buffer, kwds)
File "/home_location/miniconda/envs/env-general/lib/python3.6/site-packages/pandas/io/parsers.py", line 435, in _read
  data = parser.read(nrows)
File "/home_location/miniconda/envs/env-general/lib/python3.6/site-packages/pandas/io/parsers.py", line 1139, in read
  ret = self._engine.read(nrows)
File "/home_location/miniconda/envs/env-general/lib/python3.6/site-packages/pandas/io/parsers.py", line 1995, in read
  data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error

pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2   

I'm having a lot of difficulty isolating the error because it doesn't always happen for the same group of genes. Could anyone give a hint about what the problem could be? I've only started using gseapy very recently.

If I cannot find the source of the error I guess it's fine, because I've been able to run all the groups by just re-running the code... which is quite annoying, as I don't know whether some enrichments might be wrong. What could I be missing here?

@zqfang
Owner

zqfang commented Jul 8, 2019

How many gene groups are you querying? You got this problem because of this line of code:

 res = pd.read_csv(StringIO(response.content.decode('utf-8')),sep="\t")

I don't know exactly what happens, but I suspect the reason is network latency: gseapy waits a long time to get results back from the Enrichr server. I'll take some time to look at this.

@tjiagoM
Author

tjiagoM commented Jul 8, 2019

Yeah, for some groups I have a few hundred genes, but I ended up not saving the failing groups because they constantly change. I will try running again and see for which groups it stops this time.

Now that you mention it, gseapy was sometimes failing because of a connection reset exception, and I solved that by just adding a few milliseconds of sleep before calling enrichr() each time. Could it be that the response read by StringIO contains some error/warning from the API request, and that's why pandas cannot parse it properly?
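For context, the loop with the sleep workaround looks roughly like this (a minimal sketch; gene_groups stands in for the real data structure in my script):

import time
import gseapy as gp

results = {}
for group_name, genes_array in gene_groups.items():  # gene_groups is a placeholder
    time.sleep(0.005)  # a few milliseconds of pause before each request, as described above
    enr = gp.enrichr(gene_list=list(genes_array.astype('<U3')),
                     organism='human',
                     description='test',
                     gene_sets='Reactome_2016',
                     cutoff=1)
    results[group_name] = enr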

@tjiagoM
Author

tjiagoM commented Jul 8, 2019

@zqfang I was going to create a new issue, but I'm now receiving another error in an inconsistent way (a bit like the error in this issue). Do you think it might be related?
Apologies for just throwing the exceptions here, but they appear randomly, so maybe you know better how to help me.

Traceback (most recent call last):
  File "07_explain_communitites.py", line 84, in <module>
    cutoff=0.05)
  File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 391, in enrichr
    enr.run()
  File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 309, in run
    gss = self.parse_genesets()
  File "/home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 68, in parse_genesets
    enrichr_library = self.get_libraries()
  File "home_location/.local/lib/python3.6/site-packages/gseapy/enrichr.py", line 183, in get_libraries
    libs = [lib['libraryName'] for lib in libs_json['statistics']]
KeyError: 'statistics'

@zqfang
Owner

zqfang commented Jul 9, 2019

I think the problems you've had have the same cause: the Enrichr server cannot handle gseapy sending many requests from the same IP address in a short time. It seems that such users get blocked to prevent API abuse, so when you try to get the data back, you get nothing. I have no better way to improve this than adding a sleep after each query. Do you have any ideas?
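To make the idea concrete, a retry with a growing pause on the caller's side would look roughly like this (just a sketch, not part of gseapy; the helper name is made up):

import time
import gseapy as gp
from pandas.errors import ParserError

# hypothetical helper, not part of gseapy: retry a query with a growing pause
def enrichr_with_retry(genes, gene_sets, retries=3, delay=5):
    for attempt in range(retries):
        try:
            return gp.enrichr(gene_list=genes, gene_sets=gene_sets,
                              organism='human', description='test', cutoff=1)
        except (ParserError, KeyError):
            # the server probably returned an error page instead of a result table,
            # so back off for a while before trying again
            time.sleep(delay * (attempt + 1))
    raise RuntimeError("Enrichr did not return valid results after %d attempts" % retries)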

@tjiagoM
Author

tjiagoM commented Jul 9, 2019

I see, thanks for the help anyway!

I'd say if you get a timeout from the Enrichr server, or some error in the response coming back from Enrichr, maybe just catch it and tell the user that the problem is with the Enrichr server (and perhaps suggest waiting a bit or reducing the number of requests). Otherwise all these errors will just cause confusion when the problem is actually simple, as you pointed out.
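Just to illustrate the kind of check I mean on the library side, something along these lines would already help (only a sketch of the idea; parse_enrichr_response is a made-up name, not gseapy's actual internals):

import pandas as pd
from io import StringIO

# hypothetical sketch, not gseapy's actual code: validate the response before parsing
def parse_enrichr_response(response, logger):
    text = response.content.decode('utf-8')
    lines = text.splitlines()
    # a valid Enrichr export is a tab-separated table; anything else is most likely
    # an error page returned because the server is busy or blocking the client
    if not response.ok or not lines or '\t' not in lines[0]:
        logger.error("Enrichr server did not return results; "
                     "wait a bit or reduce the number of requests and try again")
        return None
    return pd.read_csv(StringIO(text), sep="\t")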

@zqfang
Owner

zqfang commented Jul 10, 2019

Well, good idea. A warning should be printed to the console if nothing gets back. The Enrichr server is being upgraded right now. If you still have the same problem, you'll need to re-run.
(screenshot attached: 2019-07-10, 3:41 PM)

@tsnetterfield

I am also getting the same error that @tjiagoM posted above when executing the following on a list of about 50 genes:

en_rnk_1=gp.enrichr(gene_list=rnk1_en,description='test',gene_sets='NCI-Nature_2016',outdir='./GSEA Files/Selected Gene Sets')

I updated to the latest release and am still getting this issue. Is there still a problem with the server that is causing this?

@tsnetterfield

tsnetterfield commented Sep 26, 2019

I have waited a week and I am still getting the same error.

2019-09-26 14:28:42,305 Error fetching enrichment results: TRRUST_Transcription_Factors_2019
---------------------------------------------------------------------------
ParserError                               Traceback (most recent call last)
<ipython-input-59-902aeaec60e8> in <module>
----> 1 en_rnk_1=gp.enrichr(gene_list=rnk1_en,gene_sets='TRRUST_Transcription_Factors_2019',outdir='./GSEA Files/Selected Gene Sets')

~\Anaconda3\lib\site-packages\gseapy\enrichr.py in enrichr(gene_list, gene_sets, organism, description, outdir, background, cutoff, format, figsize, top_term, no_plot, verbose)
    415     enr = Enrichr(gene_list, gene_sets, organism, description, outdir,
    416                   cutoff, background, format, figsize, top_term, no_plot, verbose)
--> 417     enr.run()
    418 
    419     return enr

~\Anaconda3\lib\site-packages\gseapy\enrichr.py in run(self)
    354                 self._logger.debug("Start Enrichr using library: %s" % (self._gs))
    355                 self._logger.info('Analysis name: %s, Enrichr Library: %s' % (self.descriptions, self._gs))
--> 356                 shortID, res = self.get_results(genes_list)
    357                 # Remember gene set library used
    358             res.insert(0, "Gene_set", self._gs)

~\Anaconda3\lib\site-packages\gseapy\enrichr.py in get_results(self, gene_list)
    182         if not response.ok:
    183             self._logger.error('Error fetching enrichment results: %s'%self._gs)
--> 184         res = pd.read_csv(StringIO(response.content.decode('utf-8')), sep="\t")
    185         return [job_id['shortId'], res]
    186 

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    700                     skip_blank_lines=skip_blank_lines)
    701 
--> 702         return _read(filepath_or_buffer, kwds)
    703 
    704     parser_f.__name__ = name

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
    433 
    434     try:
--> 435         data = parser.read(nrows)
    436     finally:
    437         parser.close()

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1137     def read(self, nrows=None):
   1138         nrows = _validate_integer('nrows', nrows)
-> 1139         ret = self._engine.read(nrows)
   1140 
   1141         # May alter columns / col_dict

~\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
   1993     def read(self, nrows=None):
   1994         try:
-> 1995             data = self._reader.read(nrows)
   1996         except StopIteration:
   1997             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

Any insight into why this may be happening?

zqfang added a commit that referenced this issue Sep 28, 2019
@zqfang
Owner

zqfang commented Sep 28, 2019

@tsnetterfield, sorry for replying late. Could you please install the latest PR and try again? I've updated the data that pandas reads. I hope this fixes the problem you're having.

@tsnetterfield

@zqfang Thanks for getting back to me! I updated my Python to 3.7.4 and am still getting the same error I posted above.

@zqfang
Owner

zqfang commented Sep 29, 2019

@tsnetterfield, please install the latest gseapy using this line of code:

pip install git+git://github.com/zqfang/gseapy.git#egg=gseapy

Make sure that you are using v0.9.16.

@tsnetterfield

@zqfang When I do this in the Anaconda Prompt, this is the first line that comes up:

Requirement already satisfied: gseapy from git+git://github.com/zqfang/gseapy.git#egg=gseapy in c:\users\tatiana\anaconda3\lib\site-packages (0.9.15)

Anaconda seems to only see the 0.9.15 development version for some reason.

@armadillocommander

armadillocommander commented Sep 29, 2019 via email

@tsnetterfield

@armadillocommander thanks for the tip! I uninstalled and now have version 0.9.16. However, I am still getting the exact same parser error from above.
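For reference, the uninstall/reinstall sequence was roughly the following (inferred from the tip above; adjust for your own environment):

pip uninstall gseapy
pip install git+git://github.com/zqfang/gseapy.git#egg=gseapy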

@zqfang
Owner

zqfang commented Sep 30, 2019

@tsnetterfield, do you mind sharing your gene list input with me? I can't reproduce your bug.

@tsnetterfield

my_gene_list.txt

Hi @zqfang, attached is the list I was trying to run. I tried a different list just now and got the same error.

@zqfang
Owner

zqfang commented Oct 8, 2019

@tsnetterfield, sorry for replying late. I was on vacation. However, I still could not reproduce the error you got using the same code:

en_rnk_1=gp.enrichr(gene_list="my_gene_list.txt" ,description='test',gene_sets='NCI-Nature_2016',outdir='./GSEA Files/Selected Gene Sets')

Even when I ran the code 50 times, it did not break.

zqfang added a commit that referenced this issue May 2, 2020
@zqfang
Owner

zqfang commented May 2, 2020

Closing now. This issue should be gone.

@zqfang zqfang closed this as completed May 2, 2020
@Eddy265

Eddy265 commented Feb 23, 2021

Alternatively, you can save the file as CSV UTF-8 (comma delimited).

@smartup10

I had the same error; I fixed it by regularizing the data in the CSV file.
