Instead of the function rot90() #10

Closed
djdll opened this issue Sep 26, 2022 · 6 comments

@djdll

djdll commented Sep 26, 2022

I would like to ask whether we can change the fill position instead of rotating the image three times to the right and rotating it back to its original orientation after interpolation. For example, the fill that was previously applied on the right side would instead be applied on the top, bottom, and left sides respectively, with the cropping adjusted to match. I tried this without rotation, but in my experiment the result seemed to be wrong. Does the rotation have any special significance?
Many thanks.

@yhjo09
Owner

yhjo09 commented Oct 1, 2022

Hi.
The rotation gives a significant PSNR improvement.
In the case of upscale=4, each rotation fills the same 4x4 output location, and you can implement our method without rotating an input image.
The following image shows the filling order for 0, 90, 180, and 270 deg rotations, and this can be implemented by simple output coordinate manipulation.
[Image: filling order for the 0, 90, 180, and 270 deg rotations]

Please refer to our code fragment in Java.

for (int j = 0; j < upscale_factor*upscale_factor; j++) {
    int oy = j / upscale_factor;
    int ox = j % upscale_factor;

    // Output coordinates for four rotations
    int O = upscale_factor - 1;
    int outind1 = outind_base + oy * width_hr + ox;
    int outind2 = outind_base + ox * width_hr + (O - oy);
    int outind3 = outind_base + (O - oy) * width_hr + (O - ox);
    int outind4 = outind_base + (O - ox) * width_hr + oy;

    /*
    Read LUT and compute output pixel values  
    */

    // Write output values
    outrgb[outind1 * C + c] += outval1;
    outrgb[outind2 * C + c] += outval2;
    outrgb[outind3 * C + c] += outval3;
    outrgb[outind4 * C + c] += outval4;
}
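
The coordinate manipulation in the fragment above can be checked in isolation. The following is a minimal sketch, not the authors' code: the class and method names (`RotationIndexDemo`, `rotatedOffsets`, `coverSameBlock`) are hypothetical, and only the four index formulas are taken from the fragment. It computes the offsets relative to `outind_base` for each rotation and verifies that all four rotations write to the same `upscale x upscale` output block, just in a different order.

```java
import java.util.Arrays;

class RotationIndexDemo {
    /**
     * For each of the four rotations, returns the output offsets
     * (relative to the block's top-left pixel) visited for
     * j = 0 .. upscale*upscale-1. Formulas follow the fragment above.
     */
    static int[][] rotatedOffsets(int upscale, int widthHr) {
        int n = upscale * upscale;
        int last = upscale - 1; // "O" in the original fragment
        int[][] offsets = new int[4][n];
        for (int j = 0; j < n; j++) {
            int oy = j / upscale;
            int ox = j % upscale;
            offsets[0][j] = oy * widthHr + ox;
            offsets[1][j] = ox * widthHr + (last - oy);
            offsets[2][j] = (last - oy) * widthHr + (last - ox);
            offsets[3][j] = (last - ox) * widthHr + oy;
        }
        return offsets;
    }

    /** True if all four rotations touch exactly the same set of offsets. */
    static boolean coverSameBlock(int upscale, int widthHr) {
        int[][] offsets = rotatedOffsets(upscale, widthHr);
        int[] reference = offsets[0].clone();
        Arrays.sort(reference);
        for (int r = 1; r < 4; r++) {
            int[] sorted = offsets[r].clone();
            Arrays.sort(sorted);
            if (!Arrays.equals(reference, sorted)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // upscale = 4 with an assumed high-resolution row width of 16 pixels
        System.out.println(coverSameBlock(4, 16)); // prints "true"
    }
}
```

Because every rotation lands in the same block, the four results can be accumulated with `+=` into one output buffer without rotating the input image.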

@djdll
Author

djdll commented Oct 11, 2022

Hi,
Your answer helped me a lot. From the paper, I understand that you also hope your research can be used in practice. However, I found that the interpolation step at the end is a triple loop (c, h, w), which does not parallelize efficiently. I tried reducing it to two loops (h, w), but that is still not ideal. How did you optimize this part in your small program? This is the last remaining obstacle to speeding up my C++ implementation.

@yhjo09
Owner

yhjo09 commented Oct 15, 2022

Hi.
Yes, so you need to find a way to implement it efficiently, for example with CUDA or multithreading.
On Java and Android, for instance, the Stream API can be used.
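
As a rough illustration of the Stream API suggestion, here is a minimal sketch that runs one parallel task per input pixel. It is not the authors' implementation: `ParallelUpscaleSketch`, `upscale`, and `lookup` are hypothetical names, the image is assumed to be a single-channel row-major `float[]`, and `lookup` is a placeholder that just repeats the input value where the real method would read the LUT. Each input pixel writes only to its own output block, so the parallel tasks do not race on the output array.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

class ParallelUpscaleSketch {
    // Hypothetical stand-in for the LUT read: repeats the input value.
    static float lookup(float[] lr, int index, int oy, int ox) {
        return lr[index];
    }

    /** Upscales a single-channel, row-major image using a parallel IntStream. */
    static float[] upscale(float[] lr, int height, int width, int upscale) {
        int widthHr = width * upscale;
        float[] hr = new float[height * upscale * widthHr];
        // One task per low-resolution pixel; blocks are disjoint, so no locking is needed.
        IntStream.range(0, height * width).parallel().forEach(i -> {
            int y = i / width;
            int x = i % width;
            int outBase = y * upscale * widthHr + x * upscale;
            for (int j = 0; j < upscale * upscale; j++) {
                int oy = j / upscale;
                int ox = j % upscale;
                hr[outBase + oy * widthHr + ox] += lookup(lr, i, oy, ox);
            }
        });
        return hr;
    }

    public static void main(String[] args) {
        float[] lr = {1f, 2f, 3f, 4f}; // 2x2 input
        float[] hr = upscale(lr, 2, 2, 2); // 4x4 output
        System.out.println(Arrays.toString(hr));
    }
}
```

Flattening the (h, w) loops into a single index range gives the stream enough independent work items to spread across cores; a channel dimension could be handled inside each task or folded into the same range.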

@djdll closed this as completed Oct 26, 2022

@mrgreen3325

mrgreen3325 commented Nov 8, 2022

Hello,
Can you speed up the interpolation process in C++?

@djdll
Author

djdll commented Nov 8, 2022

@mrgreen3325 CUDA can make this step fast, but I ran into a new problem: copying the data from the CPU to the GPU is too slow, so the overall time still does not improve. The idea for the interpolation is to convert what was originally a tensor into a float*, flattening the multi-dimensional array into a one-dimensional array, and then to compute the index mapping from the multi-dimensional coordinates to the one-dimensional array.
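
The multi-dimensional to one-dimensional mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the tensor is assumed to be in channel-major (C, H, W) order, and `FlatIndexSketch`, `flatIndex`, and `flatten` are hypothetical names, not part of the project. It is written in Java to match the code fragment earlier in the thread; the same index formula applies to a `float*` in C++ or CUDA.

```java
class FlatIndexSketch {
    /** Maps (c, y, x) in an assumed C x H x W layout to a flat, row-major index. */
    static int flatIndex(int c, int y, int x, int height, int width) {
        return (c * height + y) * width + x;
    }

    /** Copies a C x H x W array into a single flat array using flatIndex. */
    static float[] flatten(float[][][] chw) {
        int channels = chw.length;
        int height = chw[0].length;
        int width = chw[0][0].length;
        float[] flat = new float[channels * height * width];
        for (int c = 0; c < channels; c++) {
            for (int y = 0; y < height; y++) {
                for (int x = 0; x < width; x++) {
                    flat[flatIndex(c, y, x, height, width)] = chw[c][y][x];
                }
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        float[][][] chw = {
            {{0f, 1f}, {2f, 3f}},
            {{4f, 5f}, {6f, 7f}}
        };
        float[] flat = flatten(chw);
        // Element (c=1, y=0, x=1) lives at flat index (1*2 + 0)*2 + 1 = 5.
        System.out.println(flat[flatIndex(1, 0, 1, 2, 2)]); // prints "5.0"
    }
}
```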

@mrgreen3325

mrgreen3325 commented Nov 9, 2022

Thanks for your reply.
So the bottleneck is the CPU-to-GPU transfer speed rather than the inference speed itself?
Thanks.
