Loading model from local in RNN prediction is slower than from HDFS due to page fault #16856

Closed
kdmxen opened this issue Feb 8, 2018 · 11 comments
Labels
stale This label marks the issue/pr stale - to be closed automatically if no activity stat:community support Status - Community Support stat:contribution welcome Status - Contributions welcome type:bug Bug

Comments

@kdmxen

kdmxen commented Feb 8, 2018

We have trained an RNN model and use it for prediction. We feed it data and measure QPS (queries per second) during prediction. Once CPU usage rises above 30%, QPS plateaus at 900+ and no longer increases linearly with CPU usage. If we load the model from HDFS instead, QPS reaches 2400+.
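
For anyone trying to reproduce the QPS plateau, here is a minimal load-test sketch. It assumes the prediction call can be wrapped in a Python callable; `predict_fn`, `measure_qps` and the dummy workload are hypothetical names for illustration, not part of TensorFlow. Sweeping `num_threads` raises CPU usage, so one can see whether QPS keeps scaling or flattens out.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(predict_fn, requests, num_threads=8):
    """Run predict_fn over all requests on a thread pool and return queries per second.

    predict_fn is a hypothetical stand-in for one prediction call
    (e.g. a wrapper around the model's session run); num_threads is the
    knob to sweep in order to drive CPU usage up.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() waits for every call to finish and re-raises any worker exception.
        list(pool.map(predict_fn, requests))
    elapsed = time.perf_counter() - start
    return len(requests) / elapsed


if __name__ == "__main__":
    # Dummy CPU-bound workload standing in for a real model call.
    for threads in (1, 2, 4, 8):
        qps = measure_qps(lambda x: sum(range(x)), [10_000] * 1_000, num_threads=threads)
        print(f"{threads} threads: {qps:.0f} QPS")
```

The dummy workload is pure Python, so it is serialized by the GIL and will not scale with threads; it only exercises the harness. To compare the two setups, swap in a real prediction call and run the sweep once with the local-disk model and once with the HDFS model.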

Our system information:

    OS: RedHat 7.2
    CPU: 2 × 16 cores × 2 threads (64 logical CPUs)
    Memory: 512 GB in one node

When the model is loaded from local disk, we used a profiling tool to trace function call times and found that nearly 20% of the time is spent handling page faults, which leads to spin_lock contention. When the model is loaded from HDFS, page faults account for less than 1% of the time.
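
The 20% vs. <1% comparison can also be checked per batch of requests from inside the serving process. A minimal sketch, assuming a Unix host and Python's standard `resource` module: wrap a batch of prediction calls and compare the minor-fault (no disk I/O) and major-fault (disk I/O) deltas between the local-disk and HDFS setups. The helper names are hypothetical. This counts faults rather than time spent handling them, so it complements the profiler trace rather than replacing it.

```python
import resource


def fault_counts():
    """Return (minor, major) page-fault counts for the whole process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def count_page_faults(fn, *args, **kwargs):
    """Call fn(*args, **kwargs); return (result, minor_faults, major_faults) during the call.

    RUSAGE_SELF covers every thread in the process, so faults from other
    threads running at the same time are included in the deltas.
    """
    minor_before, major_before = fault_counts()
    result = fn(*args, **kwargs)
    minor_after, major_after = fault_counts()
    return result, minor_after - minor_before, major_after - major_before


if __name__ == "__main__":
    # Allocating and zero-filling fresh memory should typically produce minor faults.
    buf, minor, major = count_page_faults(bytearray, 64 * 1024 * 1024)
    print(f"minor faults: {minor}, major faults: {major}")
```

In a real test, `fn` would be a hypothetical function that runs a fixed batch of predictions, so the deltas from the two model locations can be compared directly.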

Our profiling results are below:

Model loaded from local disk:
[profiler screenshot: local]

Model loaded from HDFS:
[profiler screenshot: HDFS]

We have checked the source code (both Eigen and TensorFlow) repeatedly and cannot find any suspicious code that would cause page faults. When we tested loading other models (wide-and-deep, CNN), no page faults occurred. For the RNN model, when we modified the code to use the HDFS file system instead of the POSIX file system, the page faults did not occur either. We added logging to every function in core/platform/posix/posix_file_system.cc; the log output appears only during model loading, never during prediction.
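
One observation that may help: if a file is memory-mapped at load time, later accesses to that memory can fault pages in without calling any function in posix_file_system.cc, so the logging experiment would not see them. Whether TensorFlow maps the model this way is not established in this thread. A minimal check, assuming a Linux host, is to look for the model file in `/proc/<pid>/maps` while prediction is running. `find_file_mappings` is a hypothetical helper name.

```python
import os


def find_file_mappings(path, maps_file="/proc/self/maps"):
    """Return the lines of a Linux /proc/<pid>/maps file that map `path` into memory."""
    target = os.path.realpath(path)
    matches = []
    with open(maps_file) as f:
        for line in f:
            # Fields: address perms offset dev inode pathname (pathname may be absent).
            parts = line.split(maxsplit=5)
            if len(parts) == 6 and parts[5].strip() == target:
                matches.append(line.rstrip("\n"))
    return matches


if __name__ == "__main__":
    # Hypothetical usage: point at the serving process and the model file it loaded.
    # pid = 12345
    # print(find_file_mappings("/path/to/model_file", maps_file=f"/proc/{pid}/maps"))
    print(find_file_mappings("/proc/self/maps"))
```

A non-empty result for the local model file, and an empty one for the HDFS setup, would be consistent with page faults coming from a file-backed mapping rather than from explicit reads.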

Can anyone help us track down this problem? Thank you!

@tensorflowbutler tensorflowbutler added the stat:awaiting response Status - Awaiting response from author label Feb 8, 2018
@tensorflowbutler
Member

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Have I written custom code
OS Platform and Distribution
TensorFlow installed from
TensorFlow version
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

@tensorflowbutler
Member

Nagging Awaiting Response: It has been 14 days with no activity and the awaiting response label was assigned. Is this still an issue?

@tensorflowbutler
Member

It has been 14 days with no activity and the awaiting response label was assigned. Is this still an issue?

@drpngx
Contributor

drpngx commented Apr 3, 2018

@jhseu any comment?

@drpngx drpngx assigned jhseu and unassigned drpngx Apr 3, 2018
@drpngx drpngx added type:bug Bug stat:community support Status - Community Support labels Apr 3, 2018
@drpngx drpngx unassigned jhseu Apr 3, 2018
@tensorflowbutler
Member

It has been 14 days with no activity and the awaiting response label was assigned. Is this still an issue?

@jhseu jhseu added stat:contribution welcome Status - Contributions welcome and removed stat:awaiting response Status - Awaiting response from author labels Apr 18, 2018
@jhseu
Contributor

jhseu commented Apr 18, 2018

Fixes are welcome.

@weberxie
Contributor

@kdmxen could you provide a demo that reproduces this problem?

@github-actions

This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Mar 28, 2023

This issue was closed because it has been inactive for 1 year.


5 participants