Implement a unique function returning only the unique values in a vector. #940 #965

demoncoder-crypto · 2025-03-25T17:36:31Z

Please let me know if this is the right approach

perazz · 2025-03-26T08:54:43Z

Linked to #940

loiseaujc · 2025-03-27T20:34:20Z

src/stdlib_sorting_unique_impl.fypp

+            return
+        endif
+
+        ! Create a temporary copy that may be sorted


Just a matter of style here, but you could be more concise with:

temp_array = array ; if (want_sorted) call sort(temp_array)

loiseaujc · 2025-03-27T20:36:50Z

src/stdlib_sorting_unique_impl.fypp

+        ! Start with first element always marked as unique
+        mask(1) = .true.
+
+        ! Compare each element with previous to mark duplicates


The left-hand side assignments can be done in any order. You could use

do concurrent (i=2:n)
    mask(i) = temp_array(i) /= temp_array(i-1)
enddo

I believe. If multithreading is being used, my understanding is that it'll enable the compiler to better optimize and speed up the calculation (notably for very large arrays).

Another possibility using array syntax is simply mask(2:n) = temp_array(2:n) /= temp_array(:n-1). Not sure which would be the fastest here.
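To make the two alternatives concrete, here is a minimal sketch for integer input that is assumed to be sorted already. The module and function names (demo_unique_mask, first_of_run_loop, first_of_run_slices) are hypothetical and not part of the PR; both functions compare each element with its predecessor and keep the first element flagged.

```fortran
module demo_unique_mask
    implicit none
contains

    !> Loop form: flag the first element of each run of equal values.
    pure function first_of_run_loop(sorted_values) result(mask)
        integer, intent(in) :: sorted_values(:)
        logical :: mask(size(sorted_values))
        integer :: i

        if (size(sorted_values) == 0) return
        mask(1) = .true.
        ! Iterations are independent, so they may run in any order.
        do concurrent (i = 2:size(sorted_values))
            mask(i) = sorted_values(i) /= sorted_values(i-1)
        end do
    end function first_of_run_loop

    !> Array-syntax form: same result using two shifted sections.
    pure function first_of_run_slices(sorted_values) result(mask)
        integer, intent(in) :: sorted_values(:)
        logical :: mask(size(sorted_values))
        integer :: n

        n = size(sorted_values)
        if (n == 0) return
        mask(1) = .true.
        mask(2:n) = sorted_values(2:n) /= sorted_values(1:n-1)
    end function first_of_run_slices

end module demo_unique_mask
```

For the input [1, 1, 2, 3, 3, 3], both functions return [T, F, T, T, F, F]. Which form is faster likely depends on the compiler and array size, so it would need to be measured.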

loiseaujc · 2025-03-27T20:44:50Z

src/stdlib_sorting_unique_impl.fypp

+        allocate(unique_values(unique_count))
+
+        ! Extract unique elements to result array
+        j = 0


Your do loop can be replaced by a one-liner: unique_values = pack(temp_array, mask).

Actually, the lines 56-67 could be replaced as follows:

allocate(unique_values, source = pack(temp_array, mask))

or

unique_values = pack(temp_array, mask)
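As an illustration of the pack-based extraction, the sketch below builds the mask and packs in one small function. It assumes integer input that is already sorted, and the names demo_unique_pack and unique_sorted_pack are hypothetical, chosen only for this example.

```fortran
module demo_unique_pack
    implicit none
contains

    !> Return the distinct values of an already sorted integer array.
    pure function unique_sorted_pack(sorted_values) result(unique_values)
        integer, intent(in) :: sorted_values(:)
        integer, allocatable :: unique_values(:)
        logical :: mask(size(sorted_values))
        integer :: n

        n = size(sorted_values)
        ! The first element (if any) always starts a new run.
        if (n > 0) mask(1) = .true.
        ! Every other element is kept only if it differs from its predecessor.
        mask(2:n) = sorted_values(2:n) /= sorted_values(1:n-1)
        ! Assignment allocates the result to the number of .true. entries,
        ! which also yields an allocated zero-size array for empty input.
        unique_values = pack(sorted_values, mask)
    end function unique_sorted_pack

end module demo_unique_pack
```

With this sketch, unique_sorted_pack([1, 1, 2, 3, 3, 3]) gives [1, 2, 3]. The separate count(mask) and explicit allocate are no longer needed because the intrinsic assignment sizes the result.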

loiseaujc · 2025-03-27T20:47:15Z

src/stdlib_sorting_unique_impl.fypp

+        want_sorted = optval(sorted, .false.)
+
+        n = size(array)
+        if (n == 0) then


You can also treat the case n == 1 here. If n == 1, there is only a single value to return.

jalvesz · 2025-03-28T07:43:55Z

Thanks @demoncoder-crypto for this PR. Before reviewing it, could you please add a short summary of the key points regarding the What(s), Why(s) and/or How(s) of the PR in your first comment? It is very helpful for the reviewing process to have a summary of what is being proposed. You can edit your first comment using the three dots to the right.

loiseaujc · 2025-03-28T08:06:21Z

doc/specs/stdlib_sorting_unique.md

+! Get sorted unique values from a real array
+real :: a(8) = [3.1, 2.5, 7.2, 3.1, 2.5, 8.0, 7.2, 9.5]
+real, allocatable :: b(:)
+b = unique(a, sorted=.true.)  ! b will be [2.5, 3.1, 7.2, 8.0, 9.5]


To me, having the syntax

b = unique(a, sorted=.true.)

would imply that on entry a is already sorted, so that some internal logic can be skipped. Clearly, a is not sorted in this example, so I find the syntax a bit counter-intuitive. Maybe you need to change this internally.

demoncoder-crypto · 2025-03-28T09:17:58Z

@jalvesz For sure. Going forward, I will include a summary of everything to reduce the load on reviewers.

jalvesz · 2025-03-29T12:35:24Z

I was looking at the implementation, please consider the following review, I'll state the key points after the code

#:include "common.fypp"

#:set INT_TYPES_ALT_NAME = list(zip(INT_KINDS, INT_TYPES, INT_KINDS))
#:set REAL_TYPES_ALT_NAME = list(zip(REAL_KINDS, REAL_TYPES, REAL_SUFFIX))
#:set COMPLEX_TYPES_ALT_NAME = list(zip(CMPLX_KINDS, CMPLX_TYPES, CMPLX_SUFFIX))
#:set STRING_TYPES_ALT_NAME = list(zip(STRING_TYPES, STRING_TYPES, STRING_KINDS))
#:set CHAR_TYPES_ALT_NAME = list(zip(["character(len=*)"], ["character(len=len(array))"], ["char"]))

#:set IRSC_KINDS_TYPES = INT_TYPES_ALT_NAME + REAL_TYPES_ALT_NAME + COMPLEX_TYPES_ALT_NAME + STRING_TYPES_ALT_NAME + CHAR_TYPES_ALT_NAME

submodule (stdlib_sorting_unique) stdlib_sorting_unique_impl
    use stdlib_kinds
    use stdlib_sorting, only: sort
    implicit none

    integer, parameter :: ilp = int64

contains

#:for k, t, s in IRSC_KINDS_TYPES
    pure module function unique_1d_${s}$(array,sorted) result(unique_values)
        ${t}$, intent(in) :: array(:)
        logical(lk), intent(in), optional :: sorted
        ${t}$, allocatable :: unique_values(:)

        ${t}$, allocatable :: temp(:)
        integer(ilp) :: i, j, n

        n = size(array,kind=ilp)
        if (n == 0) return

        ! Create a temporary copy for sorting purposes
        allocate(temp(n), source = array)
        ! `sorted` describes the input: skip sorting only if the caller says so
        if(present(sorted))then
            if(.not.sorted) call sort(temp)
        else
            call sort(temp)
        end if

        ! Remove duplicates
        j = 0
        do i = 2, n
            if( temp(i) == temp(i-1) ) then
                j = j + 1
            else
                temp(i-j) = temp(i)
            end if
        end do
        n = n - j
        ! Transfer unique values to output array
        allocate(unique_values(n), source = temp(1:n))
    end function unique_1d_${s}$

#:endfor

end submodule stdlib_sorting_unique_impl

1. Prefer the following naming convention for internal implementations: <name>_<rank>d_<kind_suffix>
2. The sorted input variable should be meant as an attribute of the input array, not the output. In order to create a unique array, the input array should be sorted; the procedure can give the user the option to say "is the input sorted? I won't sort it then, to save on computational time". The default behaviour: assume it is not sorted.
3. You should not allocate an array to size 0, this is an error.
4. In the implementation proposal here, you don't need to allocate a mask array. Duplicates are thrown out in the last loop.

Other point, in the documentation, please put all your "runnable" examples in the example folder and simply import them in the .md file for the documentation.

Once you have integrated these points, let's discuss how to tackle rank>1 arrays.

demoncoder-crypto · 2025-03-29T12:39:39Z

Yes, I am currently working on this issue and will update you shortly. I am working through every comment made here, so it might take some time, but I am really grateful for all the responsiveness and guidance. Thanks, it means a lot.

perazz · 2025-03-29T14:43:49Z

3. You should not allocate an array to size 0, this is an error.

I agree on the excellent reviews, please keep going! The only objection is on this point: function results with the allocatable attribute MUST always return an allocated value, per Fortran standard 15.6.2.2 "If the function result is not a pointer, its value shall be defined by the function." It is necessary (though not sufficient) for an allocatable function result (or any object) to be allocated in order to be defined. see also where I also learned this the hard way.

So when empty, we should return an empty array, i.e. allocate(array(0))
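A minimal sketch of this edge case, assuming sorted integer input: the function always leaves its allocatable result allocated, returning a zero-size array for empty input. The names demo_unique_empty and unique_or_empty are hypothetical and only serve the illustration.

```fortran
module demo_unique_empty
    implicit none
contains

    !> Distinct values of a sorted integer array; the result is always allocated.
    pure function unique_or_empty(sorted_values) result(unique_values)
        integer, intent(in) :: sorted_values(:)
        integer, allocatable :: unique_values(:)
        integer :: i, n_unique

        if (size(sorted_values) == 0) then
            ! Empty input: return an allocated array of size zero.
            allocate(unique_values(0))
            return
        end if

        ! Working copy; its first element is always kept.
        unique_values = sorted_values
        n_unique = 1
        do i = 2, size(sorted_values)
            ! Keep a value only when it differs from the last one kept.
            if (sorted_values(i) /= unique_values(n_unique)) then
                n_unique = n_unique + 1
                unique_values(n_unique) = sorted_values(i)
            end if
        end do
        ! Shrink the result to the values actually kept.
        unique_values = unique_values(1:n_unique)
    end function unique_or_empty

end module demo_unique_empty
```

A caller can then safely write u = unique_or_empty(x) and query size(u) even when x is empty.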

jvdp1

Thank you @demoncoder-crypto for this PR. It will be quite useful to many users IMO.
Here are some suggestions regarding the API, code and its integration in the stdlib structure

jvdp1 · 2025-03-29T18:46:33Z

doc/specs/stdlib_sorting_unique.md

@@ -0,0 +1,176 @@
+---
+title: unique function


If unique is included in stdlib_sorting, the specs of unique should be added in stdlib_sorting.

jvdp1 · 2025-03-29T18:47:07Z

doc/specs/stdlib_sorting_unique.md

+
+The `unique` function is currently in **experimental** status.
+
+## Version History


The same format as the other stdlib specs should be used.

jvdp1 · 2025-03-29T18:47:24Z

doc/specs/stdlib_sorting_unique.md

+|Version|Change|
+|---|---|
+|v0.1.0|Initial functionality in experimental status|


Suggested change

-|Version|Change|
-|---|---|
-|v0.1.0|Initial functionality in experimental status|
+Experimental

jvdp1 · 2025-03-29T18:48:34Z

doc/specs/stdlib_sorting_unique.md

+
+## Requirements
+
+This function has been designed to handle arrays of different types, including intrinsic numeric types, character arrays, and `string_type` arrays. The function should be efficient while maintaining an easy-to-use interface.


These are not requirements. The content of these sentences will be included in the description of the API of unique

jvdp1 · 2025-03-29T18:50:24Z

doc/specs/stdlib_sorting_unique.md

+
+### `unique` - Returns unique values from an array
+
+#### Interface


Suggested change

-#### Interface
+#### Syntax

jvdp1 · 2025-03-29T18:58:28Z

doc/specs/stdlib_sorting_unique.md

+## Related Functions
+
+* `sort` - Sorts an array in ascending or descending order
+* `sort_index` - Creates index array that would sort an array
+* `ord_sort` - Performs a stable sort on an array 


Suggested change

## Related Functions

* `sort` - Sorts an array in ascending or descending order
* `sort_index` - Creates index array that would sort an array
* `ord_sort` - Performs a stable sort on an array

jvdp1 · 2025-03-29T18:59:46Z

example/sorting/example_unique.f90

@@ -0,0 +1,64 @@
+program example_unique


Could you split this program in smaller programs and include them in the specs, please?

jvdp1 · 2025-03-29T19:02:49Z

src/stdlib_sorting_unique.fypp

+!!   TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+!!   SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+module stdlib_sorting_unique


Would it be more appropriate for this to be a submodule of stdlib_sorting, since the examples suggest accessing it through stdlib_sorting?

jvdp1 · 2025-03-29T19:06:34Z

src/stdlib_sorting_unique_impl.fypp

+        unique_count = count(mask)
+        allocate(unique_values(unique_count))


These two lines could be combined into a single one.

jvdp1 · 2025-03-29T19:09:11Z

src/stdlib_sorting_unique_impl.fypp

+        allocate(unique_values(unique_count))
+
+        ! Extract unique elements to result array
+        j = 0


Actually, the lines 56-67 could be replaced as follows:

allocate(unique_values, source = pack(temp_array, mask))

or

unique_values = pack(temp_array, mask)

jalvesz · 2025-03-29T19:21:17Z

@jvdp1 maybe you could check out the modified version proposed here #965 (comment) regarding the internal implementation.

jvdp1 · 2025-03-29T19:41:15Z

@jvdp1 maybe you could check out the modified version proposed here #965 (comment) regarding the internal implementation.

Sorry, I started my review early today and missed your comment. I am puzzled with the aim of sorted (is it for the input (as in the code) or for the output (as in the specs)?). Otherwise, your code LGTM. It probably avoids a temporary array generated by pack. It could include the case n == 1 in addition to n==0.
Also, allocate(temp(n), source = array) could be written as allocate(temp, source = array).

jalvesz · 2025-03-29T21:14:25Z

I am puzzled with the aim of sorted (is it for the input (as in the code) or for the output

I think it only makes sense for the input, as in "is the input sorted or not?", because in any case the output will be sorted. This flag controls whether the temporary array is sorted. Skipping the sort only makes sense if one knows the input is already ordered but may contain duplicates.
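The semantics described here can be sketched as follows for integer arrays. This is only an illustration under stated assumptions: insertion_sort is a hypothetical stand-in for stdlib's sort so that the example is self-contained, and demo_unique_flag and unique_demo are made-up names, not the PR's API.

```fortran
module demo_unique_flag
    implicit none
contains

    !> Simple in-place insertion sort (stand-in for stdlib's sort).
    pure subroutine insertion_sort(a)
        integer, intent(inout) :: a(:)
        integer :: i, j, key

        do i = 2, size(a)
            key = a(i)
            j = i - 1
            do while (j >= 1)
                if (a(j) <= key) exit
                a(j+1) = a(j)
                j = j - 1
            end do
            a(j+1) = key
        end do
    end subroutine insertion_sort

    !> `sorted` describes the INPUT: .true. means "already sorted, skip the sort".
    pure function unique_demo(array, sorted) result(unique_values)
        integer, intent(in) :: array(:)
        logical, intent(in), optional :: sorted
        integer, allocatable :: unique_values(:)
        integer, allocatable :: temp(:)
        logical, allocatable :: mask(:)
        logical :: input_is_sorted
        integer :: n

        ! Default behaviour: assume the input is not sorted.
        input_is_sorted = .false.
        if (present(sorted)) input_is_sorted = sorted

        temp = array
        if (.not. input_is_sorted) call insertion_sort(temp)

        n = size(temp)
        allocate(mask(n))
        if (n > 0) mask(1) = .true.
        mask(2:n) = temp(2:n) /= temp(1:n-1)
        unique_values = pack(temp, mask)
    end function unique_demo

end module demo_unique_flag
```

Here unique_demo([3, 1, 3, 2, 1]) and unique_demo([1, 1, 2, 3, 3], sorted=.true.) both give [1, 2, 3]. Passing sorted=.true. for input that is not actually sorted would leave non-adjacent duplicates in the result, so the flag is a promise made by the caller.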

demoncoder-crypto · 2025-03-29T21:28:18Z

After reading all the helpful comments, I have decided to:

1. Clarify the semantics of the sorted flag. I will change the interpretation so that the optional sorted flag indicates whether the input is already sorted. If it is .true., the function will skip sorting the temporary array, thereby saving computational time. Otherwise, it will sort the array before removing duplicates. This way, the output will consistently be in sorted order once duplicates are removed, and the flag serves as an efficiency hint.

2. Handle the edge cases. Empty array: when the input array is empty, I will ensure that the function returns an allocated array of size zero, in line with the Fortran standard requirements for allocatable function results. Singleton array: I will explicitly check whether the size of the input is one and, if so, return the single element immediately without unnecessary processing.

I will first focus on these two issues and then make iterative changes for the remaining points, so that the next commit stays manageable to review. I sincerely appreciate all the input and will address the rest in the coming commits once I have resolved the errors mentioned. Thank you all for the support.


demoncoder-crypto · 2025-03-30T17:47:21Z

I have tried to implement the changes above. Something weird happened when I merged my branch with head, and I tried to fix it; hopefully I did. Let me know whether the points I mentioned above were successfully integrated.

perazz · 2025-03-31T10:40:06Z

The build fails due to a circular dependency. As @jvdp1 suggested:

Would it be more appropriate for this to be a submodule of stdlib_sorting, since the examples suggest accessing it through stdlib_sorting?

there should be a submodule(stdlib_sorting) stdlib_sorting_unique that only contains the implementation. To achieve that, you could remove the _impl submodule, put its contents into stdlib_sorting_unique, then put the interface directly in module stdlib_sorting.
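The layout suggested above could look roughly like the sketch below. It uses hypothetical stand-in names (demo_sorting in place of stdlib_sorting, demo_sorting_unique for the submodule) and a single integer specialization that assumes sorted input, purely to show where the interface and the implementation live.

```fortran
! Parent module: holds only the public generic interface.
module demo_sorting
    implicit none
    private
    public :: unique

    interface unique
        pure module function unique_1d_int(array) result(unique_values)
            integer, intent(in) :: array(:)
            integer, allocatable :: unique_values(:)
        end function unique_1d_int
    end interface unique
end module demo_sorting

! Submodule: holds only the implementation. It extends the parent, so it
! needs no `use` of the parent and no circular dependency can arise.
submodule (demo_sorting) demo_sorting_unique
    implicit none
contains

    pure module function unique_1d_int(array) result(unique_values)
        integer, intent(in) :: array(:)
        integer, allocatable :: unique_values(:)
        logical :: mask(size(array))
        integer :: n

        ! For brevity this sketch assumes `array` is already sorted.
        n = size(array)
        if (n > 0) mask(1) = .true.
        mask(2:n) = array(2:n) /= array(1:n-1)
        unique_values = pack(array, mask)
    end function unique_1d_int

end submodule demo_sorting_unique
```

User code then only needs `use demo_sorting, only: unique`, which mirrors how the examples are expected to access unique through the sorting module.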

demoncoder-crypto · 2025-03-31T10:41:24Z

Yes, I will fix it shortly. Thanks for the guidance.

demoncoder-crypto · 2025-04-01T16:56:49Z

I have tried to remove the circular dependency. Let me know if it works.

loiseaujc · 2025-04-01T17:46:57Z

Rather than finding out whether it works only once you commit, the easiest for you would be to actually build and run the tests locally. Assuming you have cmake installed, it is as simple as

cmake -B build # Sets up the build process
cmake --build build # Actually compiles stdlib
cmake --build build --target test # Compiles and runs the tests

Doing so, you'll be able to test everything locally and only push to github once you have a working implementation.

demoncoder-crypto · 2025-04-01T17:48:00Z

Understood. Thanks

loiseaujc · 2025-06-13T20:47:51Z

@demoncoder-crypto: I've been quite busy over the past couple of months and have had hardly any time to devote to stdlib. Any chance you've had time to make progress on this PR on your side? :)

Implement unique function that returns only the unique values in a vector (Issue fortran-lang#940)

adc27a4


Improve unique function: clarify sorted parameter semantics and optimize edge cases

02ecfc6

Refactor unique function to resolve circular dependency

53eaa12

Fix CMakeLists.txt: remove reference to deleted unique impl file

10d7bd4

