Loading model from local in RNN prediction is slower than from HDFS due to page fault #16856
Comments
Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks. |
Nagging Awaiting Response: It has been 14 days with no activity and the |
It has been 14 days with no activity and the |
@jhseu any comment? |
It has been 14 days with no activity and the |
Fixes are welcome. |
@kdmxen could you provide a demo to trigger this problem? |
This issue is stale because it has been open for 180 days with no activity. It will be closed if no further activity occurs. Thank you. |
This issue was closed because it has been inactive for 1 year. |
We trained an RNN model and use it for prediction. We feed it data and measure QPS during prediction. When the model is loaded from the local file system, QPS stays at 900+ once CPU usage goes above 30% and does not increase linearly with CPU usage. If we put the model in HDFS instead, QPS can reach 2400+.
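For anyone trying to reproduce the throughput numbers, a minimal sketch of a QPS measurement is shown here. It is not the harness used in this report: `measure_qps` and `fake_predict` are hypothetical names, and `fake_predict` is only a stand-in for the real RNN prediction call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(predict_fn, requests, num_threads=4):
    """Return completed requests per second for predict_fn over requests."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Consume the iterator so every request finishes before we stop the clock.
        results = list(pool.map(predict_fn, requests))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed if elapsed > 0 else float("inf")


def fake_predict(x):
    # Hypothetical stand-in for the real prediction call (assumption, not from the report).
    return sum(i * i for i in range(1000)) + x


if __name__ == "__main__":
    for threads in (1, 2, 4, 8):
        qps = measure_qps(fake_predict, list(range(2000)), num_threads=threads)
        print(f"threads={threads} qps={qps:.0f}")
```

Running the same loop once with the locally loaded model and once with the HDFS-loaded model, at increasing thread counts, would show whether throughput plateaus in one case and not the other.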
Our system information:
In the local-model case, we used a performance tool to trace function call time and found that nearly 20% of the time is spent in page faults, which lead to spin_lock. In the HDFS case, page faults account for less than 1%.
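As a cross-check on the profiler output, page-fault counts can also be read from inside the process. The sketch below uses Python's standard `resource.getrusage` (Unix only); the helper names `page_faults`, `faults_during`, and `touch_fresh_memory` are hypothetical, and `touch_fresh_memory` merely stands in for a prediction call.

```python
import resource


def page_faults():
    """Return (minor, major) page-fault counts for the current process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def faults_during(fn, *args, **kwargs):
    """Run fn and return (result, minor_delta, major_delta)."""
    minor_before, major_before = page_faults()
    result = fn(*args, **kwargs)
    minor_after, major_after = page_faults()
    return result, minor_after - minor_before, major_after - major_before


def touch_fresh_memory(num_bytes, page_size=4096):
    # Hypothetical workload: allocate a buffer and write to each page of it.
    # On Linux, touching newly mapped memory usually shows up as minor faults.
    buf = bytearray(num_bytes)
    for offset in range(0, num_bytes, page_size):
        buf[offset] = 1
    return len(buf)


if __name__ == "__main__":
    _, minor, major = faults_during(touch_fresh_memory, 64 * 1024 * 1024)
    print(f"minor faults: {minor}, major faults: {major}")
```

Wrapping the prediction call with `faults_during` in both the local and the HDFS setup would give a per-request fault count to compare. The counters are process-wide, so faults from other threads running at the same time are included.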
Our performance results are listed below:
model loading from local:
model loading from hdfs:
We checked the source code (both Eigen and TensorFlow) again and again and could not find any suspicious code that would lead to page faults. We also tested loading other models (wide and deep, CNN), and the page faults did not happen. For the RNN model, we modified the code to use the HDFS file system instead of the POSIX file system, and the page faults did not happen either. We added logging to every function in core/platform/posix/posix_file_system.cc. The log output appears only during model loading, not during prediction.
Can anyone help us track down this problem? Thank you!