error during training the student model #20

Closed
elevantista opened this issue Feb 27, 2019 · 6 comments

@elevantista

@npapernot Hi, I am trying to run the pate_2017 code.
I have successfully trained the teacher model, but when I train the student model using the command "python train_student.py --nb_teachers=100 --dataset=mnist --stdnt_share=5000", I get the following error:

Traceback (most recent call last):
File "train_student.py", line 208, in <module>
tf.app.run()
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "train_student.py", line 205, in main
assert train_student(FLAGS.dataset, FLAGS.nb_teachers)
File "train_student.py", line 177, in train_student
stdnt_dataset = prepare_student_data(dataset, nb_teachers, save=True)
File "train_student.py", line 111, in prepare_student_data
test_data, test_labels = input.ld_mnist(test_only=True)
File "C:\Users\eleva\privacy\research\pate_2017\input.py", line 386, in ld_mnist
train_data = extract_mnist_data(local_urls[0], 60000, 28, 1)
File "C:\Users\eleva\privacy\research\pate_2017\input.py", line 274, in extract_mnist_data
return np.load(file_obj)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\numpy\lib\npyio.py", line 416, in load
magic = fid.read(N)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 132, in read
pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 100, in _prepare_value
return compat.as_str_any(val)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\util\compat.py", line 107, in as_str_any
return as_str(value)
File "C:\Users\eleva\AppData\Local\conda\conda\envs\PATE\lib\site-packages\tensorflow\python\util\compat.py", line 80, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
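
A note for readers hitting the same traceback: the byte 0x93 at position 0 is the first byte of the NumPy `.npy` magic string (`\x93NUMPY`), which suggests the file contents were decoded as text on their way to `np.load`. The standard-library sketch below reproduces only that decoding step. It does not use TensorFlow or the PATE code, and the helper name `try_decode` is made up for illustration.

```python
# Magic prefix of the .npy format: byte 0x93 followed by ASCII "NUMPY".
NPY_MAGIC = b"\x93NUMPY"


def try_decode(raw):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# 0x93 is a UTF-8 continuation byte, so it cannot start a character.
text, error = try_decode(NPY_MAGIC)
print(error)
```

If this is indeed the cause, the `.npy` file would need to reach `np.load` as raw bytes (a file object opened in binary mode) rather than as decoded text.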

@npapernot
Collaborator

npapernot commented Feb 27, 2019 via email

@elevantista
Author

@npapernot Thanks for the feedback. I switched to Python 2.7 running on Ubuntu.
However, I now get a different error:

(pate17) harry@harry-VirtualBox:~/belajar/privacy/research/pate_2017$ python train_student.py --nb_teachers=100 --dataset=mnist --stdnt_share=5000
2019-03-05 22:22:23.491857: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2
2019-03-05 22:22:23.514316: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Traceback (most recent call last):
File "train_student.py", line 208, in <module>
tf.app.run()
File "/home/harry/anaconda2/envs/pate17/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "train_student.py", line 205, in main
assert train_student(FLAGS.dataset, FLAGS.nb_teachers)
File "train_student.py", line 177, in train_student
stdnt_dataset = prepare_student_data(dataset, nb_teachers, save=True)
File "train_student.py", line 123, in prepare_student_data
teachers_preds = ensemble_preds(dataset, nb_teachers, stdnt_data)
File "train_student.py", line 82, in ensemble_preds
result[teacher_id] = deep_cnn.softmax_preds(stdnt_data, ckpt_path)
File "/home/harry/belajar/privacy/research/pate_2017/deep_cnn.py", line 587, in softmax_preds
saver.restore(sess, ckpt_path)
File "/home/harry/anaconda2/envs/pate17/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1538, in restore
+ compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: /tmp/train_dir/mnist_100_teachers_0.ckpt-2999
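
One way to narrow this down is to check which teacher checkpoints actually exist on disk. The sketch below is only a guess at a useful diagnostic: it assumes the checkpoint prefix follows the pattern visible in the error message (`<dataset>_<nb_teachers>_teachers_<teacher_id>.ckpt-<step>`) and that a checkpoint may be stored as one or several files sharing that prefix. The function name `missing_teacher_checkpoints` is hypothetical and not part of the PATE code.

```python
import glob
import os


def missing_teacher_checkpoints(train_dir, dataset, nb_teachers, step):
    """Return the teacher ids for which no file matches the checkpoint prefix."""
    missing = []
    for teacher_id in range(nb_teachers):
        # Prefix pattern copied from the error message (an assumption).
        prefix = os.path.join(
            train_dir,
            "{}_{}_teachers_{}.ckpt-{}".format(
                dataset, nb_teachers, teacher_id, step),
        )
        # A checkpoint may be a single file or several files with this prefix.
        if not glob.glob(prefix + "*"):
            missing.append(teacher_id)
    return missing


if __name__ == "__main__":
    # Values taken from the command and error message in this thread.
    print(missing_teacher_checkpoints("/tmp/train_dir", "mnist", 100, 2999))
```

An empty list would mean every expected prefix has at least one file; any ids printed are teachers whose checkpoint the student cannot restore.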

@npapernot
Collaborator

Have you trained the 100 MNIST models (the teachers that make up the ensemble from which labels are aggregated to supervise the student)?

@imaginedragontt

Have you trained the 100 MNIST models (the teachers that make up the ensemble from which labels are aggregated to supervise the student)?

I had the same problem while training the student model, even though I did train a teacher first. I first trained a teacher with teacher_id=0, and the student then failed with "ValueError: The passed save_path is not a valid checkpoint: /tmp/train_dir/mnist_100_teachers_1.ckpt-2999". Then I trained with teacher_id=1, and the error became "ValueError: The passed save_path is not a valid checkpoint: /tmp/train_dir/mnist_100_teachers_2.ckpt-2999". I am wondering whether the teacher model did not store all of its checkpoints, or whether the student is reading checkpoints that it does not need.

@npapernot
Collaborator

Oh I see: you need to run a training job for each teacher from teacher_id=0 to teacher_id=99, so you should execute train_teacher.py 100 times with a different teacher_id flag.
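
The loop described above could be scripted roughly as follows. This is a sketch under assumptions, not the project's own tooling: the script name is taken from the comment above, the `--teacher_id` flag is the one mentioned there, and the `--nb_teachers` and `--dataset` flags are assumed to work the same way as in the `train_student.py` command earlier in this thread. The function names are hypothetical.

```python
import subprocess
import sys


def teacher_commands(nb_teachers, dataset, script="train_teacher.py"):
    """Build one training command per teacher_id in [0, nb_teachers)."""
    return [
        [
            sys.executable,
            script,
            "--nb_teachers={}".format(nb_teachers),
            "--teacher_id={}".format(teacher_id),
            "--dataset={}".format(dataset),
        ]
        for teacher_id in range(nb_teachers)
    ]


def train_all_teachers(nb_teachers=100, dataset="mnist"):
    """Run the teacher training jobs one after another."""
    for cmd in teacher_commands(nb_teachers, dataset):
        # check_call raises CalledProcessError if a job exits with non-zero status.
        subprocess.check_call(cmd)
```

Calling `train_all_teachers()` from the pate_2017 directory would run the jobs sequentially and stop at the first one that fails, so a partial run is easy to spot.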

@npapernot
Collaborator

Feel free to reopen if that did not solve your problem
