-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a unique function returning only the unique values in a vector. #940 #965
base: master
Are you sure you want to change the base?
Implement a unique function returning only the unique values in a vector. #940 #965
Conversation
Linked to #940 |
src/stdlib_sorting_unique_impl.fypp
Outdated
return | ||
endif | ||
|
||
! Create a temporary copy that may be sorted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a matter of style here, but you could be more concise with:
temp_array = array ; if (want_sorted) call sort(temp_array)
! Start with first element always marked as unique | ||
mask(1) = .true. | ||
|
||
! Compare each element with previous to mark duplicates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The left-hand side assignment can done be in any order. You could use
do concurrent (i=2:n)
mask(i) = temp_array(i) /= temp_array(i-1)
enddo
I believe. If multithreading is being used, my understanding is that it'll enable the compiler to better optimize and speed-up the calculation (notably for very large arrays).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another possibility using array syntax is simply mask(:n-1) = temp_array(:n-1) /= temp_array(2:)
. Not sure what would be the fastest here.
src/stdlib_sorting_unique_impl.fypp
Outdated
allocate(unique_values(unique_count)) | ||
|
||
! Extract unique elements to result array | ||
j = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your do
loop can be replaced by a one-liner: unique_values = pack(temp_array, mask)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the lines 56-67 could be replaced as follows:
allocate(unique_values, source = pack(temp_array, mask)
or
unique_values = pack(temp_array, mask)
want_sorted = optval(sorted, .false.) | ||
|
||
n = size(array) | ||
if (n == 0) then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also treat the case n == 1
here. If n == 1
, there is only a single value to return.
Thanks @demoncoder-crypto for this PR. Before reviewing it, could you please add a short summary of the key points regarding the What(s), Why(s) and/or How(s) of the PR in your first comment? It is very helpful for the reviewing process to have summary of what is being proposed. You can edit your first comment following the three dots to the right. |
! Get sorted unique values from a real array | ||
real :: a(8) = [3.1, 2.5, 7.2, 3.1, 2.5, 8.0, 7.2, 9.5] | ||
real, allocatable :: b(:) | ||
b = unique(a, sorted=.true.) ! b will be [2.5, 3.1, 7.2, 8.0, 9.5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, having the syntax
b = unique(a, sorted=.true.)
would imply that on entry a
is already sorted such that some internal logic can be skipped. Clearly, a
is not sorted in this example and so I find the syntax a bit counter-intuitive. Maybe you need change this internally.
@jalvesz for sure going ahead i will put summary of everything to make sure to reduce the load for reviews |
I was looking at the implementation, please consider the following review, I'll state the key points after the code #:include "common.fypp"
#:set INT_TYPES_ALT_NAME = list(zip(INT_KINDS, INT_TYPES, INT_KINDS))
#:set REAL_TYPES_ALT_NAME = list(zip(REAL_KINDS, REAL_TYPES, REAL_SUFFIX))
#:set COMPLEX_TYPES_ALT_NAME = list(zip(CMPLX_KINDS, CMPLX_TYPES, CMPLX_SUFFIX))
#:set STRING_TYPES_ALT_NAME = list(zip(STRING_TYPES, STRING_TYPES, STRING_KINDS))
#:set CHAR_TYPES_ALT_NAME = list(zip(["character(len=*)"], ["character(len=len(array))"], ["char"]))
#:set IRSC_KINDS_TYPES = INT_TYPES_ALT_NAME + REAL_TYPES_ALT_NAME + COMPLEX_TYPES_ALT_NAME + STRING_TYPES_ALT_NAME + CHAR_TYPES_ALT_NAME
submodule (stdlib_sorting_unique) stdlib_sorting_unique_impl
use stdlib_kinds
use stdlib_sorting, only: sort
implicit none
integer, parameter :: ilp = int64
contains
#:for k, t, s in IRSC_KINDS_TYPES
pure module function unique_1d_${s}$(array,sorted) result(unique_values)
${t}$, intent(in) :: array(:)
logical(lk), intent(in), optional :: sorted
${t}$, allocatable :: unique_values(:)
${t}$, allocatable :: temp(:)
integer(ilp) :: i, j, n
n = size(array,kind=ilp)
if (n == 0) return
! Create a temporary copy for sorting purposes
allocate(temp(n), source = array)
if(present(sorted))then
if(.not.sorted) call sort(temp)
else
! Remove duplicates
j = 0
do i = 2, n
if( temp(i) == temp(i-1) ) then
j = j + 1
else
temp(i-j) = temp(i)
end if
end do
n = n - j
! Transfer unique values to output array
allocate(unique_values(n), source = temp(1:n))
end function unique_1d_${s}$
#:endfor
end submodule stdlib_sorting_unique_impl
Other point, in the documentation, please put all your "runnable" examples in the example folder and simply import them in the .md file for the documentation. Once you have integrated these points, let's discuss how to takle rank>1 arrays. |
Yes I am currently working on this issue. I will update you shortly on this. I am understanding every comment made here so it might take some time but I am really grateful for all the responsiveness and guidance. Thanks means a lot |
I agree on the excellent reviews, please keep going! The only objection is on this point: function results with the So when empty, we should return an empy array i.e. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @demoncoder-crypto for this PR. It will be quite useful to many users IMO.
Here are some suggestions regarding the API, code and its integration in the stdlib
structure
@@ -0,0 +1,176 @@ | |||
--- | |||
title: unique function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If unique
is included in stdlib_sorting
, the specs of unique
should be added in stdlib_sorting
.
|
||
The `unique` function is currently in **experimental** status. | ||
|
||
## Version History |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same format as the other stdlib
specs should be used.
|Version|Change| | ||
|---|---| | ||
|v0.1.0|Initial functionality in experimental status| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|Version|Change| | |
|---|---| | |
|v0.1.0|Initial functionality in experimental status| | |
Experimental |
|
||
## Requirements | ||
|
||
This function has been designed to handle arrays of different types, including intrinsic numeric types, character arrays, and `string_type` arrays. The function should be efficient while maintaining an easy-to-use interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not requirements. The content of these sentences will be included in the description of the API of unique
|
||
### `unique` - Returns unique values from an array | ||
|
||
#### Interface |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#### Interface | |
#### Syntax |
## Related Functions | ||
|
||
* `sort` - Sorts an array in ascending or descending order | ||
* `sort_index` - Creates index array that would sort an array | ||
* `ord_sort` - Performs a stable sort on an array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
## Related Functions | |
* `sort` - Sorts an array in ascending or descending order | |
* `sort_index` - Creates index array that would sort an array | |
* `ord_sort` - Performs a stable sort on an array |
@@ -0,0 +1,64 @@ | |||
program example_unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you split this program in smaller programs and include them in the specs, please?
!! TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE | ||
!! SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. | ||
module stdlib_sorting_unique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be a submodule of stdlib_sort
more appropriate, as it is suggested to access it through stdlib_sorting
in the examples?
src/stdlib_sorting_unique_impl.fypp
Outdated
unique_count = count(mask) | ||
allocate(unique_values(unique_count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two lines could be combined into a single one.
src/stdlib_sorting_unique_impl.fypp
Outdated
allocate(unique_values(unique_count)) | ||
|
||
! Extract unique elements to result array | ||
j = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the lines 56-67 could be replaced as follows:
allocate(unique_values, source = pack(temp_array, mask)
or
unique_values = pack(temp_array, mask)
@jvdp1 maybe you could check out the modified version proposed here #965 (comment) regarding the internal implementation. |
Sorry, I started my review early today and missed your comment. I am puzzled with the aim of |
I think it only makes sense for the input as in "is the input sorted or not?" Because in any case the output will be sorted. This flag allows controlling whether to apply the sorting or not to the temporary array which only makes sense to skip if one knows the input is already ordered but that contains duplicates. |
After reading all the helpful comments. I have decided to 1)- Clarify the Semantics of the sorted Flag- I will change the interpretation so that the optional sorted flag indicates whether the input is already sorted. If it’s .true., the function will skip sorting the temporary array, thereby saving computational time. Otherwise, it will sort the array before removing duplicates. This way, the output will consistently be in sorted order when duplicates are removed, but the flag serves as an efficiency hint. 2)- Edge Case Handling- Empty Array: When the input array is empty, I will ensure that the function returns an allocated array of size zero, in line with Fortran standard requirements for allocatable function results. Singleton Array: I will explicitly check if the size of the input is one and, if so, return the single element immediately without unnecessary processing. I will first focus on these two issues and then proceed to make iterative changes over the next few things. I have decided to keep the preview of next commit manageable. I sincerely thanks so much of input I will address them in coming commits when I resolve the errors mentioned. Thank you all for such support |
I have tried to implement changes above, when merging the branch I don't know something weird happened when I merged with head and I tried to fix it hopefully I did Let me know if the points I mentioned above were successfully integrated |
The build fails due to a circular dependency. As @jvdp1 suggested:
there should be a |
Yes I will fix it shortly. Thanks for guiding |
Please let me know if this is the right approach