
How to get each user's recommended item list from a model trained with multi-GPU? #9386

Closed
ucasiggcas opened this issue Oct 17, 2020 · 10 comments
Assignees
Labels
models:official models that come under official repository stale stat:awaiting response Waiting on input from the contributor type:support

Comments

@ucasiggcas

Hi,
I've looked at NCF, but I'm a little confused: after training, how do I get the recommended item list for every user?

Thanks

@ravikyram ravikyram added the models:official models that come under official repository label Oct 19, 2020
@ravikyram ravikyram assigned saberkun and rachellj218 and unassigned ravikyram Oct 19, 2020
@jaeyounkim jaeyounkim added the stat:awaiting response Waiting on input from the contributor label Nov 13, 2020
@scotthoule

Hi ucasiggcas,

Do you mean how to run the trained model for inference?
It would help if you could list your specific points of confusion and point to the code that doesn't meet your needs.

Thanks

@ucasiggcas
Author

Yes. The goal of recommendation is to give each user the items they would like, e.g.
user_1: [item_1, item_12, item_102, item_9]
user_2: [item_2, item_43, item_34]
....
I don't think this would be hard to implement, but there is no API for it. Could you please help?
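There is no dedicated API for this in the official NCF code, but one common way to produce such lists from a trained model is to score every candidate item for each user and keep the top k. A minimal sketch; the `score_fn` here is a hypothetical stand-in for batched `model.predict` calls on (user, item) pairs:

```python
import numpy as np

def recommend_top_k(score_fn, user_ids, num_items, k=10):
    """For each user, score every item and return the k highest-scoring item ids.

    score_fn(user_id, item_ids) -> array of scores; in practice this would
    wrap the trained model's predict() on batched (user, item) pairs.
    """
    recs = {}
    item_ids = np.arange(num_items)
    for u in user_ids:
        scores = np.asarray(score_fn(u, item_ids))
        top = np.argsort(-scores)[:k]       # indices of the k largest scores
        recs[u] = item_ids[top].tolist()    # best item first
    return recs

# Toy scorer for illustration: "the model" prefers items close to the user id.
toy_score = lambda u, items: -np.abs(items - u)
recs = recommend_top_k(toy_score, user_ids=[3], num_items=6, k=3)
# recs maps each user id to its k best item ids, best first
```

In practice you would also mask out items the user has already interacted with before taking the top k.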

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Waiting on input from the contributor label Nov 19, 2020
@ucasiggcas
Author

Hi,
how can I download the dataset more quickly? When I run the download below, I wait a long time and get nothing.
$ python movielens.py
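If the built-in downloader stalls, one workaround is to fetch the archive straight from GroupLens and unzip it yourself. The `./tmp/movielens-data` path below just mirrors the `--data_dir` used later in this thread, and the extracted layout may differ slightly from what `movielens.py` produces, so treat this as a sketch:

```shell
# Fetch the ml-20m archive directly from GroupLens, then extract it
# under the directory you pass as --data_dir.
URL=https://files.grouplens.org/datasets/movielens/ml-20m.zip
curl -LO "$URL"
unzip -q ml-20m.zip -d ./tmp/movielens-data
```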

@ucasiggcas
Author

And I got the error below; what's going on?

<BisectionDataConstructor(Thread-1, initial daemon)>
General:
  Num users: 138493
  Num items: 26744

Training:
  Positive count:          19861770
  Batch size:              99000 
  Batch count per epoch:   1004

Eval:
  Positive count:          138493
  Batch size:              99000 
  Batch count per epoch:   1399
Traceback (most recent call last):
  File "ncf_keras_main.py", line 549, in <module>
    app.run(main)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "ncf_keras_main.py", line 544, in main
    logging.info("Result is %s", run_ncf(FLAGS))
  File "ncf_keras_main.py", line 253, in run_ncf
    params, producer, input_meta_data, strategy))
  File "/data/logs/xulm1/NCF/ncf_input_pipeline.py", line 139, in create_ncf_input_data
    (1 + rconst.NUM_EVAL_NEGATIVES)))
ValueError: Evaluation batch size must be divisible by 2 times 1000

My command is as follows:
$ python ncf_keras_main.py --model_dir=./tmp/models --data_dir=./tmp/movielens-data --dataset=ml-20m --num_gpus=2
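Reading the error text (an assumption, not a full audit of the pipeline code): the eval input pipeline appears to require the eval batch size to be divisible by `num_devices * (1 + NUM_EVAL_NEGATIVES)`. With `NUM_EVAL_NEGATIVES = 999` each user contributes 1000 eval examples, so with 2 GPUs the divisor is 2000, which the default batch size of 99000 fails. Passing an explicit `--eval_batch_size` that satisfies the check should get past this; a small sketch of the arithmetic:

```python
# The check implied by the error: eval_batch_size must be divisible by
# num_devices * (1 + NUM_EVAL_NEGATIVES). For ml-20m, NUM_EVAL_NEGATIVES
# is 999, so each user contributes 1000 eval examples.
NUM_EVAL_NEGATIVES = 999
EXAMPLES_PER_USER = 1 + NUM_EVAL_NEGATIVES  # 1000

def valid_eval_batch_size(requested, num_devices):
    """Round the requested size down to the nearest multiple the pipeline accepts."""
    divisor = num_devices * EXAMPLES_PER_USER
    return (requested // divisor) * divisor

print(valid_eval_batch_size(99000, 2))  # 98000; pass this via --eval_batch_size
```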

@ucasiggcas
Author

When I set --num_gpus=1, I get a different error:

Train on None steps, validate on 1399 steps
Epoch 1/2
2020-11-24 15:53:21.501988: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
Traceback (most recent call last):
  File "ncf_keras_main.py", line 549, in <module>
    app.run(main)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "ncf_keras_main.py", line 544, in main
    logging.info("Result is %s", run_ncf(FLAGS))
  File "ncf_keras_main.py", line 313, in run_ncf
    verbose=2)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py", line 727, in fit
    use_multiprocessing=use_multiprocessing)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_distributed.py", line 685, in fit
    steps_name='steps_per_epoch')
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_arrays.py", line 300, in model_iteration
    batch_outs = f(actual_inputs)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/tensorflow_core/python/keras/backend.py", line 3476, in __call__
    run_metadata=self.run_metadata)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1472, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: You must feed a value for placeholder tensor 'user_id' with dtype int32 and shape [?,1]
	 [[{{node user_id}}]]
	 [[loss_layer_1/sparse_categorical_crossentropy/weighted_loss/broadcast_weights/assert_broadcastable/is_valid_shape/has_valid_nonscalar_shape/has_invalid_dims/concat/_297]]
  (1) Invalid argument: You must feed a value for placeholder tensor 'user_id' with dtype int32 and shape [?,1]
	 [[{{node user_id}}]]
0 successful operations.
0 derived errors ignored.

@ucasiggcas
Author

When I use the ml-20m dataset, I get this error:

Traceback (most recent call last):
  File "ncf_keras_main.py", line 552, in <module>
    app.run(main)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "ncf_keras_main.py", line 547, in main
    logging.info("Result is %s", run_ncf(FLAGS))
  File "ncf_keras_main.py", line 235, in run_ncf
    num_users, num_items, _, _, producer = ncf_common.get_inputs(params)
  File "/data/logs/xulm1/NCF/ncf_common.py", line 47, in get_inputs
    deterministic=FLAGS.seed is not None)
  File "/data/logs/xulm1/NCF/data_preprocessing.py", line 218, in instantiate_pipeline
    raw_data, _ = _filter_index_sort(raw_rating_path, cache_path)
  File "/data/logs/xulm1/NCF/data_preprocessing.py", line 161, in _filter_index_sort
    user_map, item_map, df = read_dataframe(raw_rating_path)
  File "/data/logs/xulm1/NCF/data_preprocessing.py", line 62, in read_dataframe
    grouped = df.groupby(movielens.USER_COLUMN)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/pandas/core/generic.py", line 7894, in groupby
    **kwargs
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 2522, in groupby
    return klass(obj, by, **kwds)
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/pandas/core/groupby/groupby.py", line 391, in __init__
    mutated=self.mutated,
  File "/data/logs/xulm1/myconda/lib/python3.7/site-packages/pandas/core/groupby/grouper.py", line 621, in _get_grouper
    raise KeyError(gpr)
KeyError: 'user_id'
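A guess at the cause, hedged because it depends on how the cached file was produced: the raw ml-20m `ratings.csv` uses camelCase headers (`userId`, `movieId`), while the preprocessing groups on `movielens.USER_COLUMN`, i.e. `user_id`. If a raw or partially processed file ended up where the pipeline expects the normalized one, renaming the columns before the groupby would sidestep the `KeyError`:

```python
import pandas as pd

# Raw GroupLens ml-20m headers are camelCase; the NCF preprocessing expects
# snake_case names (user_id, item_id). This mapping is an assumption based
# on the standard ml-20m file layout.
RENAME = {"userId": "user_id", "movieId": "item_id"}

def load_ratings(path):
    """Read ratings.csv and normalize column names for the NCF pipeline."""
    return pd.read_csv(path).rename(columns=RENAME)

# In-memory demo of the rename (no file needed):
demo = pd.DataFrame({"userId": [1, 1], "movieId": [10, 20],
                     "rating": [4.0, 5.0], "timestamp": [0, 1]})
demo = demo.rename(columns=RENAME)
# demo.groupby("user_id") now works because the column exists
```

Deleting the cached/partial download in `--data_dir` and re-running the downloader is the simpler fix if the file is just corrupt.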

@laxmareddyp laxmareddyp self-assigned this Oct 14, 2022
@laxmareddyp
Collaborator

Hi @ucasiggcas,

This question is better asked on Stack Overflow or the TensorFlow Forum, since it is not a bug report or feature request. There is also a larger community reading questions there.

Thanks

@laxmareddyp laxmareddyp added the stat:awaiting response Waiting on input from the contributor label May 2, 2023
@github-actions

This issue has been marked stale because it has had no recent activity in the last 7 days. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label May 10, 2023
@github-actions

This issue was closed due to lack of activity after being marked stale for the past 7 days.
