Data Preprocessing #6

giriallada · 2019-04-20T10:39:12Z

Hey, I have a query about the data preprocessing part for Models 4 and 5. Whenever I try to preprocess the data, this is what I end up with:
  Traceback (most recent call last):
    File "process_English.py", line 290, in <module>
      reviews = pd.read_csv(reviews_csv,header = 1) #skip first row (of header)
    File "/home/giri/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
      return _read(filepath_or_buffer, kwds)
    File "/home/giri/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
      parser = TextFileReader(filepath_or_buffer, **kwds)
    File "/home/giri/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
      self._make_engine(self.engine)
    File "/home/giri/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
      self._engine = CParserWrapper(self.f, **self.options)
    File "/home/giri/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
      self._reader = parsers.TextReader(src, **kwds)
    File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
    File "pandas/_libs/parsers.pyx", line 751, in pandas._libs.parsers.TextReader._get_header
  pandas.errors.ParserError: Passed header=1 but only 1 lines in file
I have preprocessed the data using the steps that abisee gave, but I don't understand the CSV part in your method.


theamrzaki · 2019-04-20T18:31:11Z

The script expects an Excel file. What is the format of the file that you have used?

theamrzaki · 2019-04-26T03:37:25Z

At line 290 of the data preprocessing code:

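  import os
  import pandas as pd

  # cnn_stories_dir is defined earlier in the script
  reviews_csv = os.path.join(cnn_stories_dir, "ArabicBook00.csv")  # portable instead of a hard-coded "\" separator
  reviews = pd.read_csv(reviews_csv)  # header=1 has been removed; row 0 is used as the header by default
  reviews = reviews.filter(['content', 'title'])  # keep only the two columns the model needs
  reviews = reviews.dropna()  # drop rows missing content or title
  reviews = reviews.reset_index(drop=True)
  reviews.head()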
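The traceback in the first post comes from header=1: pandas treats header as the zero-based index of the row holding the column names, so header=1 asks for the second line and needs a file with at least two lines. Below is a minimal sketch that reproduces this on an in-memory CSV; the sample text and the helper names (header_1_error, read_default, ONE_LINE, TWO_LINES) are hypothetical, made up for illustration.

```python
import io

import pandas as pd

# Made-up sample data in the content/title layout used in this thread.
ONE_LINE = "content,title\n"
TWO_LINES = "content,title\ncontent of first article,title of first article\n"


def header_1_error(text):
    """Return the ParserError message raised by header=1, or None if parsing succeeds."""
    try:
        pd.read_csv(io.StringIO(text), header=1)
    except pd.errors.ParserError as err:
        return str(err)
    return None


def read_default(text):
    """Read the CSV the fixed way: pandas infers row 0 as the header."""
    return pd.read_csv(io.StringIO(text))


# header=1 asks for the second line as the header; a one-line file has none,
# which reproduces the error from the traceback.
print(header_1_error(ONE_LINE))

# With two or more lines header=1 succeeds, but the first article becomes the
# column names, so filter(['content', 'title']) would find no matching columns.
print(list(pd.read_csv(io.StringIO(TWO_LINES), header=1).columns))

# The default keeps the real header row.
print(list(read_default(TWO_LINES).columns))
```

So removing header=1 fixes both the crash on short files and the silent loss of the header row on longer ones.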

This reads an Excel file saved in CSV format, with the following layout:

content	title
content of first article	title of first article
content of second article	title of second article
content of third article	title of third article
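If you want to build a compatible file from your own data, a minimal sketch is to put the articles in a DataFrame with content and title columns and save it without the index. The file name sample_articles.csv, the helper names write_sample_csv and load_articles, and the extra author column are hypothetical; the extra column and the missing title are only there to show what the cleaning steps from the snippet above remove.

```python
import os
import tempfile

import pandas as pd


def write_sample_csv(path):
    """Write a small CSV in the content/title layout (placeholder rows)."""
    sample = pd.DataFrame(
        {
            "content": ["content of first article", "content of second article", "content of third article"],
            "title": ["title of first article", None, "title of third article"],
            "author": ["a", "b", "c"],  # unrelated column, removed by filter()
        }
    )
    # index=False stops pandas from writing its row index as an extra column.
    sample.to_csv(path, index=False)


def load_articles(path):
    """Apply the same cleaning steps as the preprocessing snippet."""
    reviews = pd.read_csv(path)  # row 0 is the header
    reviews = reviews.filter(["content", "title"])  # drop unrelated columns
    reviews = reviews.dropna()  # drop the article whose title is missing
    return reviews.reset_index(drop=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "sample_articles.csv")
    write_sample_csv(csv_path)
    articles = load_articles(csv_path)
    print(articles)
```

The second article is dropped because its title is empty, and only the content and title columns remain.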

Here is a link to Reviews.csv that you can use for this task.

If you need the ready-made output of this preprocessed data to help you get started, you can use this link. You can also follow this repo for more information about preprocessing the data.

This preprocessed data contains:

a folder for training
a folder for testing
a folder for validation
a vocab file

These folders contain the data in a binary, chunked format.
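A minimal sketch of what a binary chunked format can look like, assuming the layout used by abisee's CNN/Daily Mail scripts (mentioned in the first post): each record is an 8-byte length written with struct format 'q', followed by the payload bytes, and records are split into numbered .bin files of about 1,000 examples each. In that pipeline the payload is a serialized tf.train.Example holding the article and abstract; here plain byte strings stand in for it so the sketch runs without TensorFlow. CHUNK_SIZE, write_chunks, read_chunk and the "%s_%03d.bin" naming are assumptions for illustration, not the exact files behind the link above.

```python
import os
import struct
import tempfile

CHUNK_SIZE = 1000  # assumption: records per chunk file
LEN_FMT = "q"  # 8-byte signed integer length prefix
LEN_SIZE = struct.calcsize(LEN_FMT)


def write_chunks(payloads, out_dir, prefix, chunk_size=CHUNK_SIZE):
    """Write length-prefixed byte strings into numbered .bin chunk files.

    Returns the list of chunk file paths in order.
    """
    paths = []
    for start in range(0, len(payloads), chunk_size):
        path = os.path.join(out_dir, "%s_%03d.bin" % (prefix, start // chunk_size))
        with open(path, "wb") as writer:
            for payload in payloads[start:start + chunk_size]:
                writer.write(struct.pack(LEN_FMT, len(payload)))  # length first
                writer.write(payload)  # then the raw bytes
        paths.append(path)
    return paths


def read_chunk(path):
    """Yield the payloads stored in one chunk file, in order."""
    with open(path, "rb") as reader:
        while True:
            len_bytes = reader.read(LEN_SIZE)
            if not len_bytes:  # end of file
                break
            (length,) = struct.unpack(LEN_FMT, len_bytes)
            yield reader.read(length)


with tempfile.TemporaryDirectory() as tmp_dir:
    payloads = [("article %d" % i).encode("utf-8") for i in range(2500)]
    paths = write_chunks(payloads, tmp_dir, "train")
    print(len(paths))  # 3 chunk files: 1000 + 1000 + 500 records
    restored = [p for path in paths for p in read_chunk(path)]
    print(restored == payloads)
```

Chunking keeps each file small enough to stream, and the length prefix lets a reader pull records one at a time without loading a whole split into memory.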

theamrzaki · 2019-07-06T09:02:19Z

We are closing this issue due to inactivity; however, feel free to open a new issue if a new problem appears. I would truly like to help if one arises.

theamrzaki added the Data Processing text dataset issues label Apr 20, 2019

theamrzaki added Model 4 Model 5 labels Apr 26, 2019

theamrzaki closed this as completed Jul 6, 2019
