Description
Apache Hop version?
2.13.0
Java version?
17.0.2
Operating system
Linux
What happened?
This issue occurs in a specific scenario: you need to discard (and, say, log) every row that belongs to a group of duplicates, without keeping even a single record from that group.
An implementation of this scenario can be done like this:
- create a Unique rows transform, and activate both the "Activate counter output?" and "Redirect duplicate row" flags: specify a name for the Counter field (e.g. dup_count)
- link the main output to a Filter rows transform, and specify dup_count = 1 as condition
- link both the error output of Unique rows, and the False output of Filter rows to another transform to process all the duplicates
The issue is that these two flows differ by one field (dup_count is not present in the error output), yet in Hop GUI they can be connected to the same transform without any warning.
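The mismatch can be illustrated outside Hop with a small Python sketch. It is only a toy model, not Hop code: the helper names (unique_rows, layout), the sample rows, and the assumption that the counter holds the size of each group are all hypothetical, chosen to mirror the steps described above.

```python
from itertools import groupby


def unique_rows(rows, key_fields, counter_field="dup_count"):
    """Toy model of a unique-rows step with counter output and duplicate redirection.

    Assumes `rows` is a list of dicts already sorted by `key_fields`.
    Returns (main, error): `main` holds the first row of each group plus the
    counter field; `error` holds the remaining rows of each group, unchanged.
    """
    main, error = [], []
    for _, group in groupby(rows, key=lambda r: tuple(r[f] for f in key_fields)):
        group = list(group)
        main.append({**group[0], counter_field: len(group)})
        error.extend(dict(r) for r in group[1:])
    return main, error


def layout(rows):
    """Set of distinct field-name tuples seen in a stream."""
    return {tuple(r.keys()) for r in rows}


# Hypothetical sample data, sorted by the key field "roll".
rows = [
    {"person": "A", "roll": 7},
    {"person": "B", "roll": 7},
    {"person": "C", "roll": 12},
]

main, error = unique_rows(rows, ["roll"])

# Filter rows with the condition dup_count = 1.
true_out = [r for r in main if r["dup_count"] == 1]
false_out = [r for r in main if r["dup_count"] != 1]

# Both duplicate flows are sent to the same target.
merged = false_out + error

print(layout(false_out))    # layout of the False output (includes dup_count)
print(layout(error))        # layout of the error output (no dup_count)
print(len(layout(merged)))  # more than one layout reaches the same target
```

In this model the merged stream carries two different row layouts, which is the situation the GUI currently accepts silently.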
A partial workaround (which loses the number of duplicates on the error path) is to link a Select values transform to the False output of Filter rows, to strip away the dup_count field.
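As a rough sketch of what the workaround achieves, the following Python snippet drops the counter field from one flow so that both flows share a layout. The helper name drop_field and the sample rows are hypothetical and stand in for the Select values transform; this is not Hop code.

```python
def drop_field(rows, field):
    """Return copies of the rows without the given field (stand-in for Select values)."""
    return [{k: v for k, v in r.items() if k != field} for r in rows]


# Hypothetical rows: the False output of Filter rows carries dup_count,
# the error output of Unique rows does not.
false_out = [{"person": "A", "roll": 7, "dup_count": 2}]
error_out = [{"person": "B", "roll": 7}]

# Strip dup_count before merging the two flows.
aligned = drop_field(false_out, "dup_count") + error_out

# Every row now has the same field names, in the same order.
layouts = {tuple(r.keys()) for r in aligned}
print(layouts)
```

The layouts match after the field is removed, but the duplicate count is no longer available downstream, which is why the workaround is only partial.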
(Attached files: a pipeline with the scenario described above. The transforms before Unique rows are needed to generate "100 people that roll two 20-sided dice")
Attached files:
bug-unique-rows.hpl.txt
Issue Priority
Priority: 3
Issue Component
Component: Hop Gui, Component: Transforms