navi14 (gfx1012): git apply can not find file patch/22.rocblas-ninja-1.patch #35
Comments
I upgraded the navi10 documents and scripts to the latest version, ROCm-5.2.3. My suggestion is to try The documents are based on rocm-build; if you want to build ROCm from scratch, you need to run all of the scripts from rocm-build one by one. |
So I should try installing the ROCm 5.2.3 package instead first? I noticed they moved the install documentation to a new site since 5.x. The requirements seem to be the same as for 4.3. Is the |
In my environment, the RX 5700 XT needs amdgpu-5.2.3 to avoid the PCIe atomics requirement issue. If dmesg shows a kfd atomics problem, you can try the latest amdgpu-dkms. So my suggestion is to use the latest amdgpu-dkms to prevent these issues. |
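One way to look for the kfd atomics problem mentioned above is to filter the kernel log for lines that mention both kfd and atomics. This is a minimal sketch, assuming the log text has already been captured (for example from `dmesg`); the helper name `find_kfd_atomic_lines` is hypothetical, and the exact wording of the kernel message may differ between driver versions, so the match is deliberately loose.

```python
def find_kfd_atomic_lines(log_text):
    """Return kernel log lines that mention both 'kfd' and 'atomic'.

    The match is case-insensitive and intentionally broad, because the
    exact message text is an assumption and may vary by driver version.
    """
    hits = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if "kfd" in lowered and "atomic" in lowered:
            hits.append(line.strip())
    return hits
```

To use it, pass in the text of the kernel log, e.g. the output of `dmesg` saved to a file and read with `open(...).read()`. An empty result only means no matching line was found, not that the driver is healthy.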
Okay I've set up the new 5.2.3 environment. The issue now is the navi14 rocblas patch seems to be failing:
I manually checked both files (Tensile/Common.py and Tensile/TensileCreateLibrary.py) and indeed the patches don't apply. I believe everything is correctly set up on my end. Is |
Yes, I forgot to update the gfx1012 patch. |
So I am trying to compile everything running every script. I could not succeed with just the scripts mentioned under I compiled everything successfully until miopen. First, the compiler complained about lacking "boost_filesystem" so I installed
I tried editing the
Here's the full log
➜ LC_ALL=C bash 35.miopen.sh /usr/local/pip install cget
Requirement already satisfied: cget in /usr/local/lib/python3.8/site-packages (0.2.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/site-packages (from cget) (8.1.3)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.8/site-packages (from cget) (1.16.0)
Downloading https://github.com/pfultz2/rocm-recipes/archive/HEAD.tar.gz |
MIOpen needs a customized boost-filesystem; we need I guess MIOpen may need a static boost-filesystem library. |
I see now. I removed my distro's related libboost*-dev packages and re-ran the script. The cget script exits successfully:
However the missing dependency error remains. There is in fact no If it's correctly building boost-filesystem, I'm not sure where it's being installed. (Edit: Full log
➜ LC_ALL=C bash 35.miopen.sh /usr/local/pip install cget
Requirement already satisfied: cget in /usr/local/lib/python3.8/site-packages (0.2.0)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/site-packages (from cget) (8.1.3)
Requirement already satisfied: six>=1.10 in /usr/local/lib/python3.8/site-packages (from cget) (1.16.0)
Downloading https://github.com/pfultz2/rocm-recipes/archive/HEAD.tar.gz
Extracting archive /usr/local/cget/build/tmp-f8d141f9a42f42f2b9bba7f78f4ba02e/HEAD.tar.gz ...
near the top of the file, but after cmake_minimum_required(). CMake is pretending there is a "project(Project)" command on the first
-- The C compiler identification is Clang 14.0.0
should be added at the top of the file. The version specified may be lower
-- Configuring done
-- Build files have been written to: /usr/local/cget/build/tmp-f8d141f9a42f42f2b9bba7f78f4ba02e/build
CMake Warning at /home/tom/.local/cmake-3.16.8-Linux-x86_64/share/cmake-3.16/Modules/FindBoost.cmake:851 (message):
CMake Error at /home/tom/.local/cmake-3.16.8-Linux-x86_64/share/cmake-3.16/Modules/FindPackageHandleStandardArgs.cmake:146 (message):
-- Configuring incomplete, errors occurred! |
There are no compile or install logs for boost. |
Well, to start: ginger.amd.com is down. That's the domain address where the builder wants to get boost from. It's the same for the other dependencies (bzip2, sqlite3, zlib). I don't know why the logs don't show a connection/download issue though. |
I searched for the boost package hash included in the file linked above. It matches the official 1.72.0 boost package file, which means installing that version of boost-system and boost-filesystem in my system could be enough to go. My Mint version is based off of Ubuntu 20.04 which only provides libboost 1.71.0, which is possibly why my previous attempt was failing. |
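The hash comparison described above can be repeated locally against a downloaded archive. This is a minimal sketch, assuming the recipe records a SHA-256 digest (the thread does not state which algorithm is used); the function names and the archive file name in the usage note are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("boost_1_72_0.tar.bz2", "<hash from the recipe>")` would return `True` only if the local file is byte-identical to the one the recipe expects. A match confirms the version of the source archive; it says nothing about whether a distro's prebuilt package of the same version is ABI-compatible.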
I managed to run I failed to run Right now I can't build rocalution. Not sure why I'm getting these GPU target related errors. What do you think?
|
In this version, rocalution is not necessary for pytorch or tensorflow, so we can skip it. It will be fixed in rocm-5.3 or later. |
So between While trying to run
The compiler/linker seems to be using libprotobuf v3.2.0 (automatically fetched and built). But the ONNX source is referencing protobuf code written for a much later version (v3.15+). Should I change the ONNX / AMDMIGraphX dependency to use a newer libprotobuf? What do you think? |
miopen and rocsolver are the required components; the components after rocsolver are not currently used by tensorflow or pytorch. amdmigraphx is an ONNX runtime, like Microsoft's onnx-runtime; actually, I haven't tested amdmigraphx for a while. |
I went ahead and began compiling pytorch. Everything went well until:
The linker then complains that |
Which version branch of pytorch are you using? I haven't run into this issue before. |
Sorry, I was once again a victim of Mint. For some reason, after a reboot the /opt/rocm symlink gets lost, which was causing several issues. I finally managed to compile and install pytorch 1.12.1. Now to test it on my GPU. 😄 Thanks so much for your help! It's been a challenge and I couldn't have pulled through on my own. |
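Since a lost /opt/rocm symlink caused several confusing failures here, a quick check before each build step may save time. This is a minimal sketch using only the Python standard library; the helper name `describe_symlink` and its four status strings are hypothetical.

```python
import os


def describe_symlink(path):
    """Classify a path as 'missing', 'not-a-symlink', 'dangling', or 'ok'."""
    # lexists() is True even for a symlink whose target no longer exists.
    if not os.path.lexists(path):
        return "missing"
    if not os.path.islink(path):
        return "not-a-symlink"
    # exists() follows the link, so it is False when the target is gone.
    if not os.path.exists(path):
        return "dangling"
    return "ok"
```

Calling `describe_symlink("/opt/rocm")` after a reboot would show whether the link disappeared entirely ("missing") or still exists but points at a removed directory ("dangling"). A "not-a-symlink" result is not necessarily an error, since some installs use a real directory at that path.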
Sadly, my GPU is locking up when I run this simple test: import torch
torch.tensor([1., 2.], device='cuda') Even when Many things could have gone wrong here. Not sure if there's much I can do now. Edit: I just realized this test fails differently if I run python from the rocm-build environment (i.e. if I execute |
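When a one-line test like the one above locks up the GPU, it can help to split it into stages and print after each one, so the last printed line shows where things stopped. This is a minimal sketch, assuming a ROCm build of PyTorch (which exposes the GPU through the `torch.cuda` API); the function name `gpu_smoke_test` is hypothetical.

```python
def gpu_smoke_test():
    """Run staged GPU checks, printing each step; return the list of steps."""
    steps = []

    def record(name, value):
        steps.append((name, value))
        # flush so the line is visible even if the next step hangs
        print(f"{name}: {value}", flush=True)

    try:
        import torch
    except ImportError as exc:
        record("import torch", f"failed: {exc}")
        return steps
    record("import torch", torch.__version__)
    # torch.version.hip is None on builds without ROCm support.
    record("hip version", getattr(torch.version, "hip", None))
    if not torch.cuda.is_available():
        record("cuda.is_available", False)
        return steps
    record("cuda.is_available", True)
    record("device name", torch.cuda.get_device_name(0))
    # This is the step from the original test: allocate and copy to the GPU.
    x = torch.tensor([1.0, 2.0], device="cuda")
    record("tensor on device", x.cpu().tolist())
    return steps
```

If the output stops after "device name", the hang is in the first allocation or host-to-device copy rather than in device discovery, which narrows the problem to memory transfer rather than the PyTorch build itself.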
The
You could run some tests before running pytorch, since pytorch is the last piece of the whole set of MI tasks. E.g. |
What's causing Without HSA override, Here are logs for the command you suggested (video hung for several seconds until I killed the process):
|
I have to say that I have run into this hang on memory sync before, and after I moved my GPU from the lower PCIe slot to a higher PCIe slot, it was OK. I don't know whether there is a CPU, motherboard, or hardware driver pass-number limit, or a south bridge issue. But there are two PCIe slots on my motherboard: one works, one hangs. So my suggestion is to switch to the other PCIe slot and try again. Good luck. That's AMD and ROCm for you, you know. |
I reinstalled
Now to see what I can actually do with navi14. Thank you very much @xuhuisheng once more. |
I think there was a Tensile related problem during rocblas build, because this happens when running stable-diffusion:
I checked and that path has no
Notice the What do you think? |
It seems I didn't sync navi14/22.rocblas.sh with gfx803. I changed Tensile_SEPARATE_ARCHITECTURES to OFF and the dat file appeared. I have pushed the updated scripts to git; please try them. |
Ok now it's attempting to generate the libraries, but is failing:
There is also a python error at the end complaining about missing msgpack symbols, which is strange because the python module is supposedly installed in Tensile's virtualenv. Attaching full log. |
Environment
What is the expected behavior
I am trying to build ROCm for my navi14 GPU. Dependencies and environment are installed and set. I am following the instructions from navi14/REAME.md.
What actually happens
When executing bash navi14/22.rocblas.sh the script exits with error because git apply can not find file patch/22.rocblas-ninja-1.patch. I looked in the repo and the file is no longer there since commit 7759bdb. I am not sure how to proceed from here.
How to reproduce
It is not clear in the README: I ran bash navi14/22.rocblas.sh before running any other script because it is the first recommendation, before the list of 10 scripts to run. Do I need to run it in order as well?