This repository contains the code for the course "Computação em GPU" (GPU28EE), taught by Professor Giovanni Alfredo Guarneri in the last semester of 2024.
The parallelized problem is finding the point where refraction occurs between two media, given that the interface between the media can be defined by a series of points and that the initial point (emitter) and the final point (focus) are known.
Most image reconstruction algorithms for ultrasound (US) data depend on knowing the time that the sound wave takes to propagate from its source to a point in space, usually called the Time of Flight (TOF). When the source and the point of interest, called the focus, are in a single isotropic medium, the TOF is simply computed by
where
When M media are involved, the propagation path must follow Snell's law, which depends on the propagation speed of each medium as well as on the shape of the refraction profile:
where
In real-world problems, the emitter and focus coordinates are usually known; hence, to compute the TOF, one must find the refraction points.
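As a minimal sketch of why the refraction point is the missing piece, the snippet below evaluates the TOF of a two-segment path once a candidate refraction point is given: each straight segment contributes its length divided by the speed of its medium. The function name `tof_two_media`, the coordinates, and the speeds are illustrative assumptions, not taken from the repository.

```python
import math

def tof_two_media(emitter, refraction, focus, c1, c2):
    """TOF of the path emitter -> refraction point -> focus (hypothetical helper)."""
    d1 = math.dist(emitter, refraction)  # path length in the first medium
    d2 = math.dist(refraction, focus)    # path length in the second medium
    return d1 / c1 + d2 / c2

# Illustrative values only: lengths in mm, speeds in mm/us.
emitter = (0.0, 3.0)
refraction = (4.0, 0.0)
focus = (4.0, -6.0)
t = tof_two_media(emitter, refraction, focus, c1=1.0, c2=2.0)
print(t)  # 5 mm at 1 mm/us plus 6 mm at 2 mm/us
```

With the emitter and focus fixed, the TOF depends only on where the path crosses the interface, which is the unknown the rest of the text solves for.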
Consider a two-medium environment, where the soundwave propagates in each medium with speed
where
Fermat's principle states that the ray that obeys Snell's law is the one with minimum TOF. Therefore, one way of finding a valid path is to cast the ray-tracing problem as an optimization problem, i.e.:
Parrilla et al. [1] solved the non-linear optimization problem through Newton-Raphson. Their method has a simple formulation, but becomes numerically unstable under circumstances such as complex geometries.
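To illustrate Fermat's principle numerically, the sketch below does an exhaustive search (not the Newton-Raphson method of [1]) over a discretized interface and checks that the minimum-TOF point approximately satisfies Snell's law. The flat interface at z = 0, the coordinates, and the speeds (roughly water and steel, in mm/us) are assumptions made for this example only.

```python
import numpy as np

# Assumed scenario (not from the repository): flat interface at z = 0,
# lengths in mm, speeds in mm/us.
c1, c2 = 1.48, 5.9
emitter = np.array([-5.0, 10.0])   # in the first medium (z > 0)
focus = np.array([5.0, -10.0])     # in the second medium (z < 0)

xs = np.linspace(-20.0, 20.0, 4001)
surface = np.stack([xs, np.zeros_like(xs)], axis=1)  # shape (S, 2)

# TOF through every candidate surface point.
d1 = np.linalg.norm(surface - emitter, axis=1)
d2 = np.linalg.norm(surface - focus, axis=1)
tof = d1 / c1 + d2 / c2

best = int(np.argmin(tof))  # Fermat: the valid ray minimizes the TOF

# For a flat interface the normal is vertical, so the sine of each angle
# is the horizontal offset divided by the segment length.
sin1 = (surface[best, 0] - emitter[0]) / d1[best]
sin2 = (focus[0] - surface[best, 0]) / d2[best]
snell_residual = sin1 / c1 - sin2 / c2

print(surface[best, 0], snell_residual)
```

The residual is not exactly zero because the search is limited to the grid points; refining the grid shrinks it, at a cost that grows linearly with the number of surface points.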
Instead of considering only the S points that describe the surface, let us define a continuous function
which results in
Then, the derivative of the TOF at
where the derivative of
and
Assuming the second-order derivative of the TOF is equivalent to
and combining the previous equations, it is possible to find a non-integer index
Since
The algorithm is summarized in the diagram below:
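The idea of iterating on a non-integer surface index can be sketched as follows. This is only an approximation of the approach under stated assumptions: the surface is interpolated linearly between its points, and both derivatives of the TOF are estimated with central finite differences, which is a stand-in and not necessarily the second-derivative approximation used in this work. The names `tof_at` and `newton_index`, the step `h`, and the scenario values are hypothetical.

```python
import numpy as np

def tof_at(k, xs, zs, emitter, focus, c1, c2):
    """TOF through the surface point at continuous index k (linear interpolation)."""
    idx = np.arange(len(xs))
    x = np.interp(k, idx, xs)
    z = np.interp(k, idx, zs)
    d1 = np.hypot(x - emitter[0], z - emitter[1])
    d2 = np.hypot(x - focus[0], z - focus[1])
    return d1 / c1 + d2 / c2

def newton_index(xs, zs, emitter, focus, c1, c2, k0, h=0.5, max_iter=20, tol=1e-6):
    """Newton-Raphson on d(TOF)/dk = 0 using finite-difference derivatives (assumption)."""
    k = float(k0)
    k_max = len(xs) - 1
    args = (xs, zs, emitter, focus, c1, c2)
    for _ in range(max_iter):
        f_minus = tof_at(k - h, *args)
        f_zero = tof_at(k, *args)
        f_plus = tof_at(k + h, *args)
        first = (f_plus - f_minus) / (2.0 * h)
        second = (f_plus - 2.0 * f_zero + f_minus) / h**2
        if second <= 0.0:
            break  # a Newton step towards a minimum is not meaningful here
        step = first / second
        k = min(max(k - step, 0.0), k_max)  # keep the index on the surface
        if abs(step) < tol:
            break
    return k

# Assumed scenario: flat interface at z = 0, lengths in mm, speeds in mm/us.
xs = np.linspace(-20.0, 20.0, 401)
zs = np.zeros_like(xs)
emitter, focus = (-5.0, 10.0), (5.0, -10.0)
c1, c2 = 1.48, 5.9

k = newton_index(xs, zs, emitter, focus, c1, c2, k0=200.0)
x_ref = np.interp(k, np.arange(len(xs)), xs)

# Snell check (valid for the flat interface assumed above).
sin1 = (x_ref - emitter[0]) / np.hypot(x_ref - emitter[0], emitter[1])
sin2 = (focus[0] - x_ref) / np.hypot(focus[0] - x_ref, focus[1])
snell_residual = sin1 / c1 - sin2 / c2
print(k, x_ref, snell_residual)
```

Because the index is continuous, the result is not restricted to the sampled surface points, so a coarse surface description can still yield a refraction point that satisfies Snell's law closely.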
The modelling assumes a single emitter and a single focus, but this does not represent the real scale of problems encountered in
practice. For linear phased-array transducers, the number of emitters normally ranges from 32 to 128 elements. For methods
such as the Total Focusing Method, the number of foci easily reaches scales such as
Fortunately, the ray-tracing problem as stated previously is completely independent for each emitter-focus pair, which is an ideal scenario for parallel implementation.
In this work, each processing unit of the GPU and CPU parallel implementations runs the Newton-Raphson algorithm for a different emitter-focus pair. The figure below shows a diagram of the parallel implementation.
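The independence between emitter-focus pairs can be illustrated with a batched NumPy sketch in which every pair is solved at once and no pair reads another pair's result. For brevity it uses an exhaustive search over the surface points instead of the Newton-Raphson iteration that each processing unit runs in this work. The function name `batched_refraction`, the array shapes, and the scenario values are assumptions made for the example.

```python
import numpy as np

def batched_refraction(emitters, foci, surface, c1, c2):
    """Minimum-TOF surface index for every emitter-focus pair (hypothetical helper).

    emitters: (E, 2), foci: (F, 2), surface: (S, 2).
    Returns the index and TOF arrays, both with shape (E, F).
    """
    d1 = np.linalg.norm(surface[None, :, :] - emitters[:, None, :], axis=-1)  # (E, S)
    d2 = np.linalg.norm(surface[None, :, :] - foci[:, None, :], axis=-1)      # (F, S)
    tof = d1[:, None, :] / c1 + d2[None, :, :] / c2                           # (E, F, S)
    return tof.argmin(axis=-1), tof.min(axis=-1)

# Assumed scenario: flat interface at z = 0, 32 emitters, 10 x 10 grid of foci.
xs = np.linspace(-20.0, 20.0, 401)
surface = np.stack([xs, np.zeros_like(xs)], axis=1)
emitters = np.stack([np.linspace(-8.0, 8.0, 32), np.full(32, 10.0)], axis=1)
gx, gz = np.meshgrid(np.linspace(-5.0, 5.0, 10), np.linspace(-15.0, -5.0, 10))
foci = np.stack([gx.ravel(), gz.ravel()], axis=1)

idx, tof_min = batched_refraction(emitters, foci, surface, c1=1.48, c2=5.9)
print(idx.shape)  # one refraction index per emitter-focus pair
```

Each entry of `idx` depends only on its own emitter and focus, which is the property that lets a GPU thread or CPU worker handle one pair without synchronizing with the others.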
I developed a test scenario to check the validity of the algorithm and to measure the speed-up obtained by implementing the code on a parallel architecture.
The two media were taken to be water (
All the results were obtained using a 12th Gen Intel Core i5-12450H (8 cores, 12 threads), with 16 GB of 2666 MHz DDR4 RAM, and an NVIDIA GeForce RTX 3050 Laptop GPU with 4 GB of GDDR6 VRAM (2048 CUDA cores).
The first experiment considers a single focus within the steel at
The final result is shown in the figure below, where the red rays are obtained by the CPU serial implementation,
while the dark-green rays are obtained by the GPU parallel implementation; upon close inspection, it is possible to notice that the rays
are virtually identical. The number of emitter-focus pairs is
The second experiment aims to show the speed-up obtained by parallel computing. A point source is considered at
For a small number of foci, the parallelization overhead makes the serial implementation faster. However, as the number increases, the GPU quickly overcomes the runtime difference at around 100 foci (10 points along each axis). The CPU parallel implementation, on the other hand, needs a larger number of foci before its runtime drops below that of the CPU serial implementation. Comparing the GPU and CPU parallel implementations, the GPU wins over the simulated span, but the difference appears to shrink progressively as the number of foci grows. This might be related to the fact that the GPU implementation does not take advantage of many available features, such as shared memory.
The Python-related dependencies are listed in the "requirements.txt" file:
matplotlib # for data visualization
scipy # for signal processing algorithms
numpy # for general numerical processing
In addition, the project implements parallel computing using CUDA, so some additional tools are required, namely:
- The CUDA compiler (nvcc).
- Make and CMake to compile the project, if one wishes to use the automated configuration provided by the build.sh and clean.sh files.



