
WIP: Starting cuda setup #32

Draft · isamu-isozaki wants to merge 1 commit into main

Conversation

@isamu-isozaki commented Nov 14, 2023

This is a draft PR. I am porting from ColossalAI. The goal of this PR is to be able to run at least 1D attention with CUDA kernels, as well as optionally Triton and JIT-compiled kernels.
The steps needed for this are:

  • Copy the code from ColossalAI.
  • Build the code with setup.py (a rough sketch of this step follows the list). This is blocked by the setup.py PR and will probably take the most effort.
  • Test the code across JIT, Triton, and CUDA, and do an initial benchmark.
  • See if there are any limitations of this approach that would make it a poor fit for pipegoose.
  • (optional) Try making it work on Windows.
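For the setup.py step, a minimal sketch of what the extension build could look like with torch.utils.cpp_extension. The source paths, extension name, and compile flags here are placeholder assumptions, not the actual layout of the port:

```python
# Hypothetical setup.py fragment for building the ported CUDA attention kernel.
# File paths and the extension name are illustrative assumptions.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="pipegoose",
    ext_modules=[
        CUDAExtension(
            name="pipegoose._C.attention",
            sources=[
                "pipegoose/kernels/csrc/attention.cpp",       # C++ bindings
                "pipegoose/kernels/csrc/attention_kernel.cu",  # CUDA kernel
            ],
            extra_compile_args={"cxx": ["-O3"], "nvcc": ["-O3"]},
        )
    ],
    # BuildExtension drives the mixed C++/nvcc compilation.
    cmdclass={"build_ext": BuildExtension},
)
```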

@isamu-isozaki isamu-isozaki marked this pull request as draft November 14, 2023 00:07
@xrsrke (Owner) commented Nov 14, 2023

Thanks. @isamu-isozaki it would be cool if users only needed to install pipegoose through pip once. That's all: no need to install anything else in order to use the CUDA kernels.
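One way to approximate that "pip install once" behavior without shipping prebuilt binaries would be to JIT-compile the extension lazily on first import via torch.utils.cpp_extension.load. The module name and paths below are assumptions, and the user would still need a local CUDA toolchain, so prebuilt wheels remain the stricter solution:

```python
# Sketch: lazily JIT-compile the CUDA extension on first use, so a plain
# `pip install pipegoose` suffices and no separate build step is required.
# Module name and source paths are illustrative assumptions.
import os
from torch.utils.cpp_extension import load

_CSRC = os.path.join(os.path.dirname(__file__), "csrc")

# Compiled once on first call and cached (under ~/.cache/torch_extensions);
# subsequent imports reuse the cached build.
attention_cuda = load(
    name="pipegoose_attention",
    sources=[
        os.path.join(_CSRC, "attention.cpp"),
        os.path.join(_CSRC, "attention_kernel.cu"),
    ],
    verbose=False,
)
```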

Also, since Triton is superior to JIT, maybe we'll write all these activation functions in Triton.
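A minimal sketch of what one of those activation functions could look like in Triton: an elementwise tanh-approximation GELU. The block size is untuned and the wrapper is illustrative, just to show the shape of the kernel:

```python
# Illustrative Triton kernel for a tanh-approximated GELU activation.
import torch
import triton
import triton.language as tl


@triton.jit
def gelu_kernel(x_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    # GELU(x) ~= 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    inner = 0.7978845608028654 * (x + 0.044715 * x * x * x)
    tanh_inner = 2.0 * tl.sigmoid(2.0 * inner) - 1.0  # tanh via sigmoid identity
    tl.store(out_ptr + offsets, 0.5 * x * (1.0 + tanh_inner), mask=mask)


def gelu(x: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    gelu_kernel[grid](x, out, n, BLOCK_SIZE=1024)
    return out
```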

@xrsrke (Owner) commented Nov 14, 2023

@isamu-isozaki Also, what kernels will you port?

@isamu-isozaki (Author) commented

@xrsrke Ah well, it depends on how good the Triton kernels we write turn out to be. Sometimes torch.compile can give better results than Triton.
For now, I want to experiment with porting just the attention kernel from ColossalAI and its dependencies. Ideally, like you said, a single pip install should do the job.
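To put numbers behind the torch.compile vs. Triton question, a quick micro-benchmark sketch along these lines could settle it per kernel. The `gelu` import is the hypothetical Triton wrapper from the earlier sketch, and `triton.testing.do_bench` returns the measured runtime in milliseconds:

```python
# Sketch: micro-benchmark a Triton kernel against torch.compile on the same input.
import torch
import triton
import triton.testing

# Hypothetical import of the Triton GELU wrapper sketched earlier.
from pipegoose.kernels.gelu import gelu

def eager_gelu(x):
    return torch.nn.functional.gelu(x, approximate="tanh")

compiled_gelu = torch.compile(eager_gelu)

x = torch.randn(1 << 20, device="cuda")
ms_triton = triton.testing.do_bench(lambda: gelu(x))
ms_compiled = triton.testing.do_bench(lambda: compiled_gelu(x))
print(f"triton: {ms_triton:.3f} ms | torch.compile: {ms_compiled:.3f} ms")
```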
