missing API against TH #70

Open
soumith opened this Issue Nov 6, 2014 · 52 comments

@soumith
Member

soumith commented Nov 6, 2014

The following math functions are missing in THC but present in TH:

  • numel
  • prod
  • zeros
  • ones
  • reshape
  • rand
  • randn
  • round
  • atan2
  • std (not stdall, this is per-dimension)
  • var (not varall, this is per-dimension)
  • cumsum
  • cumprod
  • maskedFill
  • maskedSelect
  • sort
  • maskedCopy (waiting for merge, #167)
  • multinomial (implemented, waiting for PR)
  • cross
  • logicalall
  • logicalany
  • tril
  • triu
  • trace
  • diag
  • nonzero
  • range
  • cat
  • linspace
  • logspace
  • eye
  • randperm
  • histc
  • conv2 (these are torch operators, not to be confused with nn Convolutions which are already implemented)
  • conv3

When these are implemented, cwrap entries can be added that will make cutorch completely API-compatible with torch.

@soumith soumith added the bug label Nov 6, 2014

This was referenced Nov 7, 2014

@clementfarabet

Member

clementfarabet commented Nov 8, 2014

Hey @soumith, running cutorch.test() currently reports errors on cumsum and cumprod (missing). Is that on purpose?

@soumith

Member

soumith commented Nov 8, 2014

Yes, they are yet to be implemented in cutorch. I'll get to that next week. They can be done with a thrust::scan.
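
For the contiguous 1-D case this is essentially a one-liner with Thrust; a minimal sketch of what it could look like (the function names here are illustrative, not the eventual THC entry points):

```cpp
#include <thrust/device_ptr.h>
#include <thrust/functional.h>
#include <thrust/scan.h>

// Cumulative sum over a contiguous float buffer on the GPU.
// dst and src may alias: Thrust's scans support in-place operation.
void cumsumContiguous(float* dst, const float* src, long n)
{
  thrust::device_ptr<const float> in(src);
  thrust::device_ptr<float> out(dst);
  thrust::inclusive_scan(in, in + n, out);   // dst[i] = src[0] + ... + src[i]
}

// cumprod is the same scan with multiplication as the binary op.
void cumprodContiguous(float* dst, const float* src, long n)
{
  thrust::device_ptr<const float> in(src);
  thrust::device_ptr<float> out(dst);
  thrust::inclusive_scan(in, in + n, out, thrust::multiplies<float>());
}
```

The per-dimension versions would either loop over slices or use thrust::inclusive_scan_by_key with one key per slice.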

@clementfarabet

Member

clementfarabet commented Nov 8, 2014

Ok cool.


@dominikgrewe

Member

dominikgrewe commented Jan 3, 2015

Does anyone have an implementation of std and var yet? Otherwise I'll have a go next week.
AFAICT we can't use THCudaTensor_reduceDim, so we'll need something custom (but with a similar structure to reduceDim).
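
For reference, a minimal sketch of the shape such a kernel could take, assuming the reduced dimension is innermost and the tensor has been made contiguous first (one block per output element, shared-memory tree reduction); the general strided case would need reduceDim-style size/stride bookkeeping, and a two-pass or Welford formulation would be numerically safer:

```cpp
#include <cuda_runtime.h>

// Per-row variance for a [rows x cols] contiguous, row-major view:
// one block per row, shared-memory tree reduction over sum and sum-of-squares.
// Unbiased estimator (divide by cols - 1), matching torch.var's default.
__global__ void rowVarianceKernel(const float* __restrict__ input,
                                  float* __restrict__ out, int cols)
{
  extern __shared__ float shmem[];            // 2 * blockDim.x floats
  float* ssum   = shmem;
  float* ssumsq = shmem + blockDim.x;

  const float* row = input + (long)blockIdx.x * cols;
  float sum = 0.f, sumsq = 0.f;
  for (int i = threadIdx.x; i < cols; i += blockDim.x) {
    float v = row[i];
    sum   += v;
    sumsq += v * v;
  }
  ssum[threadIdx.x]   = sum;
  ssumsq[threadIdx.x] = sumsq;
  __syncthreads();

  // blockDim.x must be a power of two for this reduction.
  for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
    if (threadIdx.x < stride) {
      ssum[threadIdx.x]   += ssum[threadIdx.x + stride];
      ssumsq[threadIdx.x] += ssumsq[threadIdx.x + stride];
    }
    __syncthreads();
  }

  if (threadIdx.x == 0) {
    float mean = ssum[0] / cols;
    out[blockIdx.x] = (ssumsq[0] - mean * ssum[0]) / (cols - 1);
  }
}

// Launch sketch: one block per row, 256 threads, 2 * 256 floats of shared memory.
void rowVariance(const float* d_in, float* d_out, int rows, int cols)
{
  const int threads = 256;
  rowVarianceKernel<<<rows, threads, 2 * threads * sizeof(float)>>>(d_in, d_out, cols);
}
```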

@soumith

Member

soumith commented Jan 3, 2015

No, I do not have it. This has been long overdue, but we should coordinate these; let me email you guys.

@dominikgrewe

Member

dominikgrewe commented Jan 20, 2015

Any progress on cumsum and cumprod?

@soumith

Member

soumith commented Jan 20, 2015

have not started them yet!

@dominikgrewe

Member

dominikgrewe commented Jan 20, 2015

Okay, I might have a go soon then, because we'd like to use it. Just waiting for the THCState PR to be merged.

@soumith

Member

soumith commented Jan 21, 2015

I will merge the THCState PR on Friday. All the patches have been prepared except for fbcunn, working on that as well.

@dominikgrewe

Member

dominikgrewe commented Feb 19, 2015

@soumith For reductions along a single dimension, we still have the restriction that the tensor must not have more than 4 dimensions. Did you say you're working on a fix for that? If so, what's the progress on it?
If you're not working on it, we could easily change the code to what we do for std and var, where there's no restriction on dimensionality at all.

@soumith

Member

soumith commented Feb 19, 2015

@wickedfoo already has a PR for that internally. It's all implemented. I'm on vacation till the 26th, so I will sync those changes at the end of this month.

@soumith

Member

soumith commented Feb 19, 2015

His PR is for apply (and apply2) along an arbitrary dimension, not reductions. It generalizes the copy kernels and changes all the tensor math to use these apply kernels where appropriate, instead of making the tensor contiguous and going through Thrust.
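
Roughly, each thread maps its linear index to an element offset through the tensor's sizes and strides, so no contiguous copy is needed. A hand-wavy sketch (the struct and kernel below are illustrative, not the actual API in the PR):

```cpp
// Minimal stand-in for the size/stride bookkeeping (the real PR presumably has
// something richer); small enough to pass to the kernel by value.
template <int Dims>
struct TensorDesc {
  float* data;
  long sizes[Dims];     // shared shape
  long strides[Dims];   // per-tensor strides (may describe a non-contiguous layout)
};

// out[i] = op(in[i]) over arbitrarily strided layouts: each thread turns its
// linear index into per-dimension coordinates, then into the two element offsets.
template <int Dims, typename Op>
__global__ void pointwiseApply2(TensorDesc<Dims> out, TensorDesc<Dims> in,
                                long totalElements, Op op)
{
  for (long linear = (long)blockIdx.x * blockDim.x + threadIdx.x;
       linear < totalElements;
       linear += (long)gridDim.x * blockDim.x) {
    long rest = linear, outOff = 0, inOff = 0;
    for (int d = Dims - 1; d >= 0; --d) {      // innermost dimension last
      long coord = rest % out.sizes[d];
      rest /= out.sizes[d];
      outOff += coord * out.strides[d];
      inOff  += coord * in.strides[d];
    }
    out.data[outOff] = op(in.data[inOff]);
  }
}

// Example op: out = abs(in), launched as
//   pointwiseApply2<2><<<grid, block>>>(dstDesc, srcDesc, nElements, AbsOp());
struct AbsOp {
  __device__ float operator()(float x) const { return fabsf(x); }
};
```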

@soumith

Member

soumith commented Feb 19, 2015

That's the status on that. He did not work yet on arbitrary reductions. If you want to tackle that, go for it.

@dominikgrewe

Member

dominikgrewe commented Feb 19, 2015

Thanks for the update. I'll have a go at the reductions kernel then.
Looking forward to the apply kernels!

@wickedfoo

Contributor

wickedfoo commented Feb 19, 2015

I'm on vacation too until next week but I don't think the generic apply
stuff got pushed yet, I think only the old version of the copy kernel. I
reimplemented all cutorch math (pointwise operators) in terms of it, so no
newContiguous calls are needed. I don't yet support reductions but it
shouldn't be too hard to add that in.


@dominikgrewe

Member

dominikgrewe commented Mar 2, 2015

If I remember correctly, you guys said you'd look into maskedFill etc, right? Any progress on that?

@wickedfoo

Contributor

wickedfoo commented Mar 2, 2015

For maskedFill etc., do you want the mask to be a float vector (because that's the only thing we have in cutorch at present), or do you want it to be 4 bytes packed into a float?

@dominikgrewe

Member

dominikgrewe commented Mar 2, 2015

I guess float vectors make the most sense, because that's what the logical functions (gt, ge, etc.) return.
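
For contiguous storage that also keeps maskedFill about as simple as it gets; a minimal sketch assuming a 0/1 float mask (non-contiguous inputs would presumably go through the generic apply machinery instead):

```cpp
// maskedFill sketch for contiguous storage: wherever the mask is non-zero
// (the 0/1 floats produced by gt/ge/etc.), overwrite the tensor with `value`.
__global__ void maskedFillKernel(float* data, const float* mask, float value, long n)
{
  for (long i = (long)blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += (long)gridDim.x * blockDim.x) {
    if (mask[i] != 0.f) data[i] = value;
  }
}
```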

@wickedfoo

Contributor

wickedfoo commented Mar 5, 2015

I have maskedFill/Copy/Select done. For sort() I only handle power-of-2 sizes at present (but on input with an arbitrary number of dimensions), so I'm still working on that. maskedFill, maskedCopy and sort avoid newContiguous on the input, but for maskedSelect I chickened out and just used two passes and temporary space with a Thrust prefix scan.

Re: "For reductions along a single dimension, we still have the restriction that the tensor must not have more than 4 dimensions. Did you say you're working on a fix for that? If so, what's the progress on it?" I have this fixed as well: I took the copy kernel code and made a reduction kernel out of it, so no calls to newContiguous/copies etc. are needed. It's not a global reduction kernel (like a norm that reduces down to one point), but one that reduces along a dimension; sort() exploits similar code. I still want to do the same shared-memory optimization (so I can use coalesced reads) that you did when the reduction dimension is innermost/most contiguous, though.

@dominikgrewe

Member

dominikgrewe commented Mar 5, 2015

Cool, looking forward to that. Yes, using the shared memory approach for reductions along contiguous dimensions is vital.

@dominikgrewe

Member

dominikgrewe commented Mar 5, 2015

When do you think you'll have a PR for maskedFill etc?

@soumith

Member

soumith commented Mar 5, 2015

It's in review, and Jeff is still working on revamping our code base for the state-argument-based change. Hopefully sometime next week.

And the cutorch TensorMath changes that remove most of the sync points on non-contiguous cases will also land at the same time.

@jaderberg

jaderberg commented Mar 12, 2015

Any progress on the masked* functions?


@soumith

Member

soumith commented Mar 12, 2015

They're implemented. We are working on refactoring our code and syncing with master; we will try to merge them this week.

@soumith

Member

soumith commented Mar 13, 2015

An update:
Jeff has powered through our internal refactor, getting us back to parity with OSS cutorch.
There are three PRs coming up.

  • masked* functions
  • sort
  • all of the math (where applicable) revamped to use our own pointwise and reduce kernels, so that non-contiguous tensors are no longer sync points.

It will either be EOD today or most likely Monday/Tuesday.

@soumith

Member

soumith commented Mar 13, 2015

Looks like what's left is the small fish.

  • torch.diag, torch.eye and torch.trace can be handled with the same generic diagonal-apply kernel (see the sketch below).
  • torch.randperm, linspace, logspace and range can each be done with a thrust::scan.
  • logicalall, logicalany: also an apply kernel!?
  • tril and triu might be tricky
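
A minimal sketch of the diagonal kernels from the first bullet, assuming contiguous row-major matrices (trace is then just a sum over the extracted diagonal):

```cpp
// torch.eye: one thread per element of an n x m row-major matrix.
__global__ void eyeKernel(float* out, long n, long m)
{
  for (long i = (long)blockIdx.x * blockDim.x + threadIdx.x; i < n * m;
       i += (long)gridDim.x * blockDim.x) {
    out[i] = (i / m == i % m) ? 1.f : 0.f;
  }
}

// torch.diag on a 2-D input: extract the main diagonal of an n x m matrix.
// trace is a sum over this vector, and the vector-to-matrix case is the same
// indexing in reverse.
__global__ void diagKernel(const float* mat, float* diag, long n, long m)
{
  long k = n < m ? n : m;
  for (long i = (long)blockIdx.x * blockDim.x + threadIdx.x; i < k;
       i += (long)gridDim.x * blockDim.x) {
    diag[i] = mat[i * m + i];
  }
}
```
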
@dominikgrewe

Member

dominikgrewe commented Mar 13, 2015

Thanks Soumith. Looking forward to that!

What about convolution functions like conv2, conv3 etc? There's code in THCTensorConv.cu, but it's not exposed in Lua. Any idea why? If we want full API parity between Torch and cutorch, we should add those, don't you think?

@soumith

Member

soumith commented Mar 13, 2015

Ah yes, you are right; not sure why I did not add them to the list. Writing conv2/conv3 kernels from scratch is probably not going to be worth our time. Maybe we can use the cu* API for that? Either that, or on the GPU we use a buffer to unfold and do MM. What do you think?
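
To make the unfold + MM option concrete: the usual im2col kernel for the stride-1, 'valid' case looks roughly like the sketch below (note it computes cross-correlation, i.e. xcorr2; true conv2 would flip the kernel indices). A single GEMM of the weight matrix against the column buffer then produces all output planes at once, much like SpatialConvolutionMM does in nn.

```cpp
// Unfold a [channels x height x width] image into a column buffer of shape
// [channels*kh*kw, out_h*out_w] (row-major), so that the convolution becomes a
// single matrix multiply: output[nOutPlane][out_h*out_w] = weight[nOutPlane][C*kh*kw] * col.
__global__ void im2colKernel(const float* im, int channels, int height, int width,
                             int kh, int kw, float* col)
{
  int out_h = height - kh + 1;                   // 'valid' mode, stride 1
  int out_w = width  - kw + 1;
  long total = (long)channels * kh * kw * out_h * out_w;
  for (long idx = (long)blockIdx.x * blockDim.x + threadIdx.x; idx < total;
       idx += (long)gridDim.x * blockDim.x) {
    long w_out = idx % out_w;
    long h_out = (idx / out_w) % out_h;
    long kw_i  = (idx / ((long)out_w * out_h)) % kw;
    long kh_i  = (idx / ((long)out_w * out_h * kw)) % kh;
    long c     =  idx / ((long)out_w * out_h * kw * kh);
    // Cross-correlation; use (kh - 1 - kh_i) and (kw - 1 - kw_i) for true convolution.
    long h_in = h_out + kh_i;
    long w_in = w_out + kw_i;
    col[idx] = im[(c * height + h_in) * width + w_in];
  }
}
```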

@dominikgrewe

Member

dominikgrewe commented Mar 13, 2015

If we can use cuDNN for this, then that would be easiest I guess.

@soumith

Member

soumith commented Mar 13, 2015

cuDNN is still not shipped with the CUDA toolkit, so not everyone has it. So it falls into the murky territory of: do we really want to introduce a hard dependency on cuDNN?

I am okay with a pcall to cudnn and raising an error when it's not found, but I am not sure how it will go down with the others.

@dominikgrewe

Member

dominikgrewe commented Apr 1, 2015

There are a number of linear algebra functions missing: symeig, eig, inverse, etc. In Torch they seem to be implemented by wrapping LAPACK. Could we do something similar for cutorch? There's MAGMA and CULA; does anyone have experience with these libraries?

@soumith

Member

soumith commented Apr 1, 2015

MAGMA looks best; we built MAGMA internally and it looks reasonably good.

@soumith

Member

soumith commented Apr 1, 2015

Also, on the cuDNN note, we can configure a header (like THGeneral.h.in) if we find cuDNN. Caffe already has the CMake macros needed for finding cuDNN: https://github.com/BVLC/caffe/blob/master/cmake/Cuda.cmake

kashif added a commit to kashif/cutorch that referenced this issue Apr 15, 2015

@soumith soumith referenced this issue May 4, 2015

Closed

Missing :diag() #146

@dpfau

dpfau commented May 24, 2015

Hi guys. Noticed this thread when dealing with a script that needs diag, svd and eig on CudaTensors. I implemented diag myself in Lua using storage() and set(), but svd and eig are beyond my ken. What's the plan for that?

@soumith

Member

soumith commented May 25, 2015

One of my colleagues, @SamGross, is working on it by interfacing with the MAGMA CUDA library. It'll happen over the next month or so, when he finishes it up and sends a PR.

@wickedfoo

Contributor

wickedfoo commented May 27, 2015

This is not on this list, but I'm in the process of implementing THCudaTensor_multinomial as well.
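
For sampling with replacement, the straightforward recipe is a prefix sum over the (unnormalized) probabilities followed by a vectorized binary search of uniform draws; a rough Thrust sketch for a single distribution (the uniforms are assumed to come from something like curandGenerateUniform, and the real version would do this per row and return 1-based indices):

```cpp
#include <thrust/binary_search.h>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/scan.h>
#include <thrust/transform.h>

// Sample category indices (with replacement) from one unnormalized distribution.
// `uniforms` holds one draw in [0, 1) per requested sample.
void multinomialWithReplacement(const thrust::device_vector<float>& probs,
                                const thrust::device_vector<float>& uniforms,
                                thrust::device_vector<int>& samples)
{
  // CDF via inclusive prefix sum.
  thrust::device_vector<float> cdf(probs.size());
  thrust::inclusive_scan(probs.begin(), probs.end(), cdf.begin());

  // Scale the draws by the total mass so probs need not be normalized.
  float total = cdf.back();
  thrust::device_vector<float> targets(uniforms.size());
  thrust::transform(uniforms.begin(), uniforms.end(),
                    thrust::make_constant_iterator(total),
                    targets.begin(), thrust::multiplies<float>());

  // Vectorized binary search: for each target, the first CDF bin >= it.
  samples.resize(uniforms.size());
  thrust::lower_bound(cdf.begin(), cdf.end(),
                      targets.begin(), targets.end(), samples.begin());
}
```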

@nicholas-leonard

Member

nicholas-leonard commented May 28, 2015

@wickedfoo Awesome. Would love to see multinomial in cuda.

@hughperkins

Contributor

hughperkins commented Jun 15, 2015

Just to confirm: scatter/gather aren't implemented in cutorch, right?

@dominikgrewe

Member

dominikgrewe commented Jun 15, 2015

That's right. I meant to do it, but haven't had the time yet, sorry.

@hughperkins

Contributor

hughperkins commented Jun 23, 2015

For gather, which I suddenly realize could be useful for implementing ClassNLLCriterion.forward without needing a custom kernel, I suppose a simple naive first cut could be:

  • use the isContiguous (toContiguous? asContiguous?) method to convert the tensor to contiguous format
  • simply assign one thread to each output location; I don't think we need any local memory or anything, right? So just throw everything into warp-size blocks and it's basically done?

Does that sound about right? Anything else I should bear in mind if I write a naive gather along these lines? (I'll be targeting cltorch, I confess, but it's quite easy to convert cutorch<->cltorch kernels, I think?)

(Edit: what do you think is the most similar existing class/kernel to base this off? And/or any thoughts on where to put this, i.e. filename(s)?)
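
Concretely, something like the sketch below for the 2-D, dim=1 case (one index per output position, contiguous inputs, one thread per output element):

```cpp
// Naive gather along dim=1 for contiguous 2-D tensors:
// out[r][c] = src[r][ idx[r][c] ]. One thread per output element, no local memory.
// Indices arrive as floats (cutorch only has float tensors right now) and torch
// indices are 1-based, hence the -1.
__global__ void gatherDim1Kernel(float* out, const float* src, const float* idx,
                                 long rows, long srcCols, long outCols)
{
  for (long i = (long)blockIdx.x * blockDim.x + threadIdx.x; i < rows * outCols;
       i += (long)gridDim.x * blockDim.x) {
    long r = i / outCols;
    long srcCol = (long)idx[i] - 1;
    out[i] = src[r * srcCols + srcCol];
  }
}
```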

@hughperkins

Contributor

hughperkins commented Jun 23, 2015

One of my colleagues @SamGross is working on it by interfacing the magma cuda library. It'll happen over the next month or so when he finishes it up and sends a PR.

Magma looks cool. It has an OpenCL version too, it seems :-)

@hughperkins

Contributor

hughperkins commented Jun 24, 2015

Shoe-horned the lua wrapper into TensorMath.lua: hughperkins/cltorch@0e469f4

@abyravan

abyravan commented Feb 23, 2016

Is there any update on adding these functions? I'm in need of the cross product on CudaTensors.

I can code it up if someone can point me towards the things that need to be done. All the layers I've written so far are on the Lua side, and I'm not sure how to make the connection between CUDA and Lua. Thanks!

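For what it's worth, the kernel itself looks simple enough; a rough sketch for contiguous tensors with the 3-element dimension innermost (it's the cwrap/Lua plumbing I'm less sure about):

```cpp
// Cross product sketch for contiguous tensors whose innermost dimension has
// size 3: out[i] = a[i] x b[i] for each 3-vector. One thread per 3-vector.
__global__ void cross3Kernel(float* out, const float* a, const float* b, long nVectors)
{
  for (long i = (long)blockIdx.x * blockDim.x + threadIdx.x; i < nVectors;
       i += (long)gridDim.x * blockDim.x) {
    const float* x = a + 3 * i;
    const float* y = b + 3 * i;
    float* o = out + 3 * i;
    o[0] = x[1] * y[2] - x[2] * y[1];
    o[1] = x[2] * y[0] - x[0] * y[2];
    o[2] = x[0] * y[1] - x[1] * y[0];
  }
}
```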

@soumith

Member

soumith commented Feb 28, 2016

@abyravan I could get to cross next week.

@abyravan

abyravan commented Feb 28, 2016

That would be great. Thanks a lot! Is there any sort of a tutorial or howto on adding new functionality for tensors? Would be useful to have :)


@soumith

Member

soumith commented Feb 28, 2016

You can look at existing PRs that are cross-linked in this thread.
Like:
#120
#96
#75

@soumith

Member

soumith commented Feb 28, 2016

nonzero is being implemented by FB, should be out in a few days.

archenroot pushed a commit to archenroot/gentoo-overlay that referenced this issue Dec 4, 2016

@pvtokmakov

pvtokmakov commented Jun 26, 2017

Hi guys,

Any hope of implementing conv2 (or xcorr2, for that matter)?


@ethanluoyc

Contributor

ethanluoyc commented Aug 15, 2017

Just adding a note that eye has been implemented here pytorch/pytorch#2395

@sjain-stanford

sjain-stanford commented Dec 22, 2017

Thanks for supporting many of the math functions in THC. I'm now waiting for histc!

