code breaks locally but runs fine remotely on hadoop cluster #2211

Closed
my-umd opened this issue Jun 25, 2021 · 2 comments
Comments

my-umd commented Jun 25, 2021

(I couldn't find anything related after an extensive web search.)
I am facing a very confusing issue. I believe it started with mrjob v0.6.8 and persists through the latest version. Here is the sample code:
```python
from mrjob.job import MRJob
from mrjob.step import MRStep
import sys


class MRSp(MRJob):

    def init_mr(self):
        # Raises TypeError locally with mrjob >= 0.6.8
        sys.stderr.write('Processing file.\n')

    def mapper(self, _, line):
        yield 1, 1

    def print_header(self):
        header = 'test'
        # Raises the same TypeError locally if the write above is removed
        print(header)

    def steps(self):
        return [
            MRStep(mapper_init=self.init_mr,
                   mapper=self.mapper,
                   reducer_init=self.print_header,
                   ),
        ]


if __name__ == '__main__':
    MRSp.run()
```
To run the code, create a local directory (e.g., test_input) and put a small text file in it. When run from a (CentOS) Linux shell in a Python 3.8.9 virtualenv with mrjob 0.6.7 installed, it runs fine. However, in a virtualenv with mrjob 0.6.8 or later installed, it crashes with the following exception (command line: python test.py test_input):
File "test.bytes.py", line 8, in init_mergedrs
sys.stderr.write('Processing file.\n')
TypeError: a bytes-like object is required, not 'str'

If I comment out the sys.stderr.write call (replacing it with a 'pass' statement), the code still crashes locally, but the offending line is now the 'print(header)' statement (same exception).
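The message matches what Python raises when a `str` is written to a binary stream. Below is a minimal standard-library sketch of that behavior. It assumes (this is not confirmed anywhere in the thread) that the streams the job sees locally are byte-oriented; `io.BytesIO` is only a stand-in for them, and `try_write` and `try_print` are hypothetical helper names.

```python
import io


def try_write(stream, text):
    """Call stream.write(text); return the TypeError message, or None on success."""
    try:
        stream.write(text)
    except TypeError as exc:
        return str(exc)
    return None


def try_print(stream, text):
    """Call print(text, file=stream); return the TypeError message, or None on success."""
    try:
        print(text, file=stream)
    except TypeError as exc:
        return str(exc)
    return None


# Assumption: a binary stream standing in for the stderr/stdout the job sees locally.
binary_stream = io.BytesIO()

write_error = try_write(binary_stream, 'Processing file.\n')  # str into a bytes stream
print_error = try_print(binary_stream, 'test')                # print() also writes str
bytes_error = try_write(binary_stream, b'Processing file.\n')  # bytes are accepted
```

Both the direct `write` and the `print` call fail the same way, which is consistent with the crash moving from one statement to the other.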

The code runs fine remotely on the Hadoop cluster, though (with mrjob 0.6.7 as well as 0.6.8 and beyond). The v0.6.8 changelog doesn't give any hint. Can anybody help? Thanks.
(The issue also happens with Python 3.7 and 3.9.)

robinsonkwame commented Dec 9, 2021

@my-umd did you resolve this? This may be helpful

my-umd commented Dec 9, 2021

Thanks @robinsonkwame. It turned out that I can't use print anymore. I have to use self.stdout.write.
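A minimal sketch of that workaround, using only the standard library. It assumes that the job's `self.stdout` is a binary stream under Python 3, so text has to be encoded before it is written; `write_text` is a hypothetical helper name, and `io.BytesIO` stands in for `self.stdout` so the snippet runs without mrjob.

```python
import io


def write_text(stream, text, encoding='utf-8'):
    """Encode text and write it to a binary stream (assumed to behave like self.stdout)."""
    stream.write(text.encode(encoding))


# Stand-in for the job's self.stdout; an assumption made for illustration only.
out = io.BytesIO()
write_text(out, 'test\n')
```

Inside the job, `print_header` could then call `write_text(self.stdout, header + '\n')` instead of `print(header)`; the same idea would apply to `self.stderr` in place of `sys.stderr.write`.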

my-umd closed this as completed Dec 9, 2021