This repository has been archived by the owner on Mar 8, 2024. It is now read-only.

mnist doesn't converge on low end GPU #4

Closed
davidleon opened this issue Jun 20, 2016 · 1 comment
@davidleon

On a low-end GPU the code doesn't converge; accuracy after the first 999 epochs is only around 10%.
out_d = [
[-0.0937949 -0.10633045 -0.09618157 -0 -0 0.91507906
-0.09252435 -0.09772251 -0.09196769 -0.09155396]
[0.9062051 -0.10633045 -0.09618157 -0 -0 -0.084920965
-0.09252435 -0.09772251 -0.09196769 -0.09155396]
[-0.0937949 -0.10633045 -0.09618157 -0 -0 -0.084920965
-0.09252435 -0.09772251 0.9080323 -0.09155396]
[-0.0937949 -0.10633045 -0.09618157 -0 1 -0.084920965
-0.09252435 -0.09772251 -0.09196769 -0.09155396]
[-0.0937949 -0.10633045 -0.09618157 -0 -0 -0.084920965
-0.09252435 -0.09772251 0.9080323 -0.09155396]
]

loss = [
[0.08563976 0.005653082 0.0046254476 0 0.1 0.0866216
0.0042803776 0.0047748443 0.16744195 0.0041910643]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
]

Accuracy: 11%

Epoch: 10999
out = [
[0.09369277 0.103886336 0.0954021 0 0 0.085475065
0.090690844 0.09841768 0.09427827 0.09054168]
[0.09369277 0.103886336 0.0954021 0 0 0.085475065
0.090690844 0.09841768 0.09427827 0.09054168]
[0.09369277 0.103886336 0.0954021 0 0 0.085475065
0.090690844 0.09841768 0.09427827 0.09054168]
[0.09369277 0.103886336 0.0954021 0 0 0.085475065
0.090690844 0.09841768 0.09427827 0.09054168]
[0.09369277 0.103886336 0.0954021 0 0 0.085475065
0.090690844 0.09841768 0.09427827 0.09054168]
]

out_d = [
[-0.09369277 -0.103886336 -0.0954021 -0 -0 -0.085475065
-0.090690844 0.9015823 -0.09427827 -0.09054168]
[-0.09369277 -0.103886336 -0.0954021 1 -0 -0.085475065
-0.090690844 -0.09841768 -0.09427827 -0.09054168]
[0.9063072 -0.103886336 -0.0954021 -0 -0 -0.085475065
-0.090690844 -0.09841768 -0.09427827 -0.09054168]
[-0.09369277 -0.103886336 -0.0954021 -0 1 -0.085475065
-0.090690844 -0.09841768 -0.09427827 -0.09054168]
[0.9063072 -0.103886336 -0.0954021 -0 -0 -0.085475065
-0.090690844 -0.09841768 -0.09427827 -0.09054168]
]

loss = [
[0.16691206 0.0053961854 0.0045507806 0.1 0.1 0.0036529934
0.0041124146 0.08515949 0.0044441964 0.0040988983]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
]

Accuracy: 10.679999%

Epoch: 11999
out = [
[0.09301148 0.10326048 0.09476416 0 0 0.08517593
0.09224633 0.10320083 0.09311914 0.09102862]
[0.09301148 0.10326048 0.09476416 0 0 0.08517593
0.09224633 0.10320083 0.09311914 0.09102862]
[0.09301148 0.10326048 0.09476416 0 0 0.08517593
0.09224633 0.10320083 0.09311914 0.09102862]
[0.09301148 0.10326048 0.09476416 0 0 0.08517593
0.09224633 0.10320083 0.09311914 0.09102862]
[0.09301148 0.10326048 0.09476416 0 0 0.08517593
0.09224633 0.10320083 0.09311914 0.09102862]
]

out_d = [
[-0.09301148 -0.10326048 -0.09476416 -0 -0 -0.08517593
-0.09224633 -0.10320083 0.90688086 -0.09102862]
[-0.09301148 -0.10326048 -0.09476416 1 -0 -0.08517593
-0.09224633 -0.10320083 -0.09311914 -0.09102862]
[-0.09301148 -0.10326048 -0.09476416 -0 -0 0.91482407
-0.09224633 -0.10320083 -0.09311914 -0.09102862]
[-0.09301148 -0.10326048 -0.09476416 -0 -0 -0.08517593
0.90775365 -0.10320083 -0.09311914 -0.09102862]
[-0.09301148 -0.10326048 -0.09476416 -0 -0 -0.08517593
-0.09224633 -0.10320083 0.90688086 -0.09102862]
]

loss = [
[0.004325568 0.0053313635 0.0044901227 0.1 0 0.08659229
0.08580542 0.0053252056 0.16708793 0.0041431054]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
]

Accuracy: 10%

Validating
Validation Accuracy: 12.6%
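
In the log above, every row of `out` is identical and accuracy sits near 10%, which is chance level for ten classes. A minimal sketch of how that symptom could be flagged automatically, assuming the outputs are available as plain `Vec<f32>` rows (the function names `rows_collapsed` and `argmax` are hypothetical and not part of the library):

```rust
/// Returns true when there are at least two rows and every row matches
/// the first one element-wise within `tol`.
/// Hypothetical helper, not part of the crate discussed in this issue.
fn rows_collapsed(out: &[Vec<f32>], tol: f32) -> bool {
    match out.split_first() {
        Some((first, rest)) if !rest.is_empty() => rest.iter().all(|row| {
            row.len() == first.len()
                && row.iter().zip(first).all(|(a, b)| (a - b).abs() <= tol)
        }),
        // Zero or one row: nothing to compare against.
        _ => false,
    }
}

/// Index of the largest value in a row, or None for an empty row.
fn argmax(row: &[f32]) -> Option<usize> {
    row.iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap_or(std::cmp::Ordering::Equal))
        .map(|(i, _)| i)
}
```

If `rows_collapsed` returns true, `argmax` picks the same class for every input, so accuracy can only match the share of that class in the batch.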

This is an old machine: Celeron(R) CPU G1820 @ 2.7GHz with Intel HD Graphics.

After I tweaked gpuarray's context.rs to use the CPU, it converges as normal, and accuracy after the first 999 epochs reaches over 70%.
Running target\debug\examples\mnist.exe
Reading training labels...
Label count: 60000
Reading training images...
Reading validation labels...
Label count: 1000
Reading validation images...

Using OpenCL Device: Intel(R) HD Graphics

Epoch: 999
out = [
[0 0 0 0 0 0 0 0.781054 0
0.28385308]
[0 0 0.3421486 0.6585547 0 0 0 0
0.39498368 0]
[0.01904458 0 0.5730746 0.00049253064 0 0 0
0 0 0]
[0 0.6941181 0 0 0 0 0.012161581 0
0 0]
[0 0 0.62997645 0 0 0 0.11047391 0
0.31374422 0]
]

out_d = [
[-0 -0 -0 -0 -0 -0 -0 0.21894598 -0
-0.28385308]
[-0 -0 -0.3421486 0.34144533 -0 -0 -0 -0
-0.39498368 -0]
[-0.01904458 -0 0.42692542 -0.00049253064 -0 -0 -0
-0 -0 -0]
[-0 0.30588192 -0 -0 -0 -0 -0.012161581 -0
-0 -0]
[-0 -0 0.37002355 -0 -0 -0 -0.11047391 -0
-0.31374422 -0]
]

loss = [
[0.000036269605 0.009356375 0.043624844 0.011658516 0 0
0.0012352389 0.0047937343 0.025444752 0.008057258]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
]

Accuracy: 78.14%
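
The printed numbers are consistent with `out_d` being the one-hot target minus `out`, and with each entry of the first `loss` row being the column-wise sum of squared `out_d` divided by 10. This is inferred from the logged values only, not from the library source. A sketch that reproduces the relationship, with hypothetical function names and the divisor written as 2 × rows (10 would equally match the number of classes):

```rust
/// out_d = one_hot(label) - out, computed row by row.
/// Hypothetical reconstruction from the logged values.
fn output_delta(out: &[Vec<f32>], labels: &[usize]) -> Vec<Vec<f32>> {
    out.iter()
        .zip(labels)
        .map(|(row, &label)| {
            row.iter()
                .enumerate()
                .map(|(j, &o)| {
                    let target = if j == label { 1.0 } else { 0.0 };
                    target - o
                })
                .collect()
        })
        .collect()
}

/// Per-class loss: sum over rows of out_d squared, divided by 2 * rows.
/// Assumes all rows have the same length as the first row.
fn per_class_loss(out_d: &[Vec<f32>]) -> Vec<f32> {
    let n = out_d.len() as f32;
    let classes = out_d.first().map_or(0, |r| r.len());
    (0..classes)
        .map(|j| out_d.iter().map(|r| r[j] * r[j]).sum::<f32>() / (2.0 * n))
        .collect()
}
```

Feeding the five `out` rows from the Epoch 999 log with labels 7, 3, 2, 1, 2 (read off the positive entries of `out_d`) should give values close to the logged `out_d` and `loss`.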

@davidleon
Author

I think I have reproduced the culprit. I found that an opencl.lib generated with the GNU ABI compiles against the MSVC ABI, and in that situation mnist doesn't converge at all, because the MSVC binary was compiled with the GNU-ABI-generated opencl.lib.
That's extremely weird, since the GNU ABI won't compile against a lib generated with the MSVC ABI. The issue is that the MSVC ABI shouldn't silently compile against a GNU-generated lib.
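
One way to make such a mismatch fail loudly rather than silently is to compare the compile-time target environment against the ABI the import library is known to have been generated for. The sketch below is only an illustration: `target_abi` and `check_lib_abi` are hypothetical names, and the caller has to supply the library's ABI, since nothing here inspects opencl.lib itself.

```rust
/// ABI/environment this binary was compiled for, taken from `target_env`.
fn target_abi() -> &'static str {
    if cfg!(target_env = "msvc") {
        "msvc"
    } else if cfg!(target_env = "gnu") {
        "gnu"
    } else {
        "other"
    }
}

/// Hypothetical guard: `lib_abi` is the ABI the import library was
/// generated for, as recorded by whoever produced it.
fn check_lib_abi(lib_abi: &str) -> Result<(), String> {
    let target = target_abi();
    if lib_abi == target {
        Ok(())
    } else {
        Err(format!(
            "import library built for `{}` but target ABI is `{}`",
            lib_abi, target
        ))
    }
}
```

Called from a build script or at program start, `check_lib_abi("gnu")` would return an error on an MSVC target instead of letting the build proceed quietly.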
