
[Bug]: Unique rows - error handling behavior #5230

Open
@dave-csc

Description


Apache Hop version?

2.13.0

Java version?

17.0.2

Operating system

Linux

What happened?

This issue occurs in a specific scenario: when you need to discard (and, say, log) all duplicates in a data flow, without keeping even a single record from each group of duplicates.

This scenario can be implemented as follows:

  • create a Unique rows transform and enable both the Activate counter output? and Redirect duplicate row flags, specifying a name for the Counter field (e.g. dup_count)
  • link the main output to a Filter rows transform and specify dup_count = 1 as the condition
  • link both the error output of Unique rows and the False output of Filter rows to another transform that processes all the duplicates

The problem is that these two flows have different row layouts (dup_count is not present in the error output of Unique rows), yet Hop GUI lets you connect them to the same transform without any warning.
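The mismatch can be illustrated with a small Python model of the flow above. This is a simplified sketch built on assumptions, not Hop's implementation. Rows are plain dicts, and the input is already sorted on the key fields, as Unique rows expects. The first row of each group goes to the main output carrying the group size in dup_count, and the remaining rows go to the error output without that field. The functions `unique_rows`, `filter_rows` and `layouts` are hypothetical, and so are the sample dice rolls.

```python
from itertools import groupby

def unique_rows(rows, key_fields, counter_field="dup_count"):
    # Simplified model (assumption): consecutive rows with equal keys form a group.
    main_out, error_out = [], []
    for _, group in groupby(rows, key=lambda r: tuple(r[f] for f in key_fields)):
        group = list(group)
        # The first row of the group is kept and gets the group size as its counter.
        main_out.append({**group[0], counter_field: len(group)})
        # The other rows are redirected to the error output WITHOUT the counter field.
        error_out.extend(dict(r) for r in group[1:])
    return main_out, error_out

def filter_rows(rows, counter_field="dup_count"):
    # Condition dup_count = 1: True keeps rows that had no duplicates,
    # False keeps the first row of every duplicated group.
    true_out = [r for r in rows if r[counter_field] == 1]
    false_out = [r for r in rows if r[counter_field] != 1]
    return true_out, false_out

def layouts(*streams):
    # Distinct field lists (in order) seen across the given streams.
    return {tuple(r) for stream in streams for r in stream}

# Hypothetical sample: two dice per row, already sorted on (die1, die2).
rows = [
    {"die1": 3, "die2": 7}, {"die1": 3, "die2": 7},
    {"die1": 5, "die2": 5},
    {"die1": 9, "die2": 1}, {"die1": 9, "die2": 1}, {"die1": 9, "die2": 1},
]

main_out, error_out = unique_rows(rows, ["die1", "die2"])
unique_only, dup_first = filter_rows(main_out)

print(len(unique_only), len(dup_first) + len(error_out))  # 1 5
print(sorted(layouts(dup_first, error_out)))
# [('die1', 'die2'), ('die1', 'die2', 'dup_count')]
```

Together, the two streams feeding the duplicate-handling transform hold all five rows that belong to duplicated groups. They arrive with two different field lists, and that is the mismatch Hop GUI does not flag.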

A partial workaround, which loses the number of duplicates in the error-handling flow, is to link a Select values transform to the False output of Filter rows to strip away the dup_count field.
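The workaround can be modelled the same way. In this sketch, the hypothetical helper `remove_fields` stands in for a Select values transform that removes a field. The two input streams are hypothetical sample rows shaped like the False output of Filter rows and the error output of Unique rows. Once dup_count is stripped, both streams share a single layout, but the duplicate count is gone. That is why the workaround is only partial.

```python
def remove_fields(rows, fields):
    # Hypothetical stand-in for Select values removing the given fields.
    return [{k: v for k, v in r.items() if k not in fields} for r in rows]

# Hypothetical streams: False output of Filter rows and error output of Unique rows.
false_out = [{"die1": 3, "die2": 7, "dup_count": 2},
             {"die1": 9, "die2": 1, "dup_count": 3}]
error_out = [{"die1": 3, "die2": 7}, {"die1": 9, "die2": 1}, {"die1": 9, "die2": 1}]

stripped = remove_fields(false_out, {"dup_count"})
merged = stripped + error_out

# Every merged row now has the same layout, but the duplicate count is lost.
print({tuple(r) for r in merged})             # {('die1', 'die2')}
print(any("dup_count" in r for r in merged))  # False
```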

(Attached files: a pipeline implementing the scenario described above. The transforms before Unique rows are only needed to generate "100 people that roll two 20-sided dice".)


Attached files:
bug-unique-rows.hpl.txt

Issue Priority

Priority: 3

Issue Component

Component: Hop Gui, Component: Transforms
