Add UNKNOWN_TOPIC_OR_PARTITION check for addMultipleTargetTopics #938

Merged
merged 2 commits into from Nov 4, 2020

bpinhosilva
Contributor

@bpinhosilva bpinhosilva commented Nov 3, 2020

The addMultipleTargetTopics cluster method fails to restore the 'targetTopics' property
when Kafka has KAFKA_AUTO_CREATE_TOPICS_ENABLE set to false.
This pull request fixes that by restoring targetTopics to its original state
when an UNKNOWN_TOPIC_OR_PARTITION error is thrown after trying
to push messages to Kafka using a topic
that does not exist and will not be created automatically when metadata is refreshed.

To give more context, consider a Kafka cluster where every node has topic auto-creation disabled. The driver ends up in a faulty state after the following steps:

  1. Publish a message to an existing topic. It succeeds.
  2. Try to send a message to a nonexistent topic. It fails.
  3. Try to send a message to an existing topic again. It fails because the driver kept the invalid topic from the previous request and cannot recover, even after refreshing metadata.

The only way to make it work again is to restart the connection, which is not ideal.
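
To illustrate the idea, here is a minimal, self-contained sketch of the rollback. It is not the library's actual code: SketchCluster, its in-memory existingTopics set, the synchronous-looking refreshMetadata check, and the scenario helper are all hypothetical stand-ins, and the only assumption carried over from the driver is that the error exposes a type field with the Kafka error name.

```typescript
// Hypothetical error shape: a `type` field carrying the Kafka error name.
class KafkaProtocolError extends Error {
  type: string;

  constructor(type: string) {
    super(type);
    this.type = type;
  }
}

// Simplified stand-in for the cluster, not the library's real class.
class SketchCluster {
  targetTopics: Set<string> = new Set<string>();
  // Plays the role of a broker with topic auto-creation disabled.
  private existingTopics: Set<string>;

  constructor(existingTopics: Set<string>) {
    this.existingTopics = existingTopics;
  }

  // Stand-in for a metadata request that covers every target topic.
  async refreshMetadata(): Promise<void> {
    for (const topic of this.targetTopics) {
      if (!this.existingTopics.has(topic)) {
        throw new KafkaProtocolError("UNKNOWN_TOPIC_OR_PARTITION");
      }
    }
  }

  async addMultipleTargetTopics(topics: string[]): Promise<void> {
    // Snapshot the current state so it can be restored on failure.
    const previousTopics = new Set(this.targetTopics);
    for (const topic of topics) {
      this.targetTopics.add(topic);
    }
    if (this.targetTopics.size === previousTopics.size) {
      return; // nothing new to fetch metadata for
    }
    try {
      await this.refreshMetadata();
    } catch (e) {
      // Roll back so the unknown topic does not poison later requests.
      if ((e as { type?: string }).type === "UNKNOWN_TOPIC_OR_PARTITION") {
        this.targetTopics = previousTopics;
      }
      throw e;
    }
  }
}

// Walks through the three steps described above.
async function scenario(): Promise<string[]> {
  const cluster = new SketchCluster(new Set(["existing-topic"]));

  // Step 1: an existing topic is added without error.
  await cluster.addMultipleTargetTopics(["existing-topic"]);

  // Step 2: a missing topic is rejected.
  try {
    await cluster.addMultipleTargetTopics(["missing-topic"]);
  } catch (e) {
    if ((e as { type?: string }).type !== "UNKNOWN_TOPIC_OR_PARTITION") {
      throw e;
    }
  }

  // Step 3: the refresh only succeeds if the missing topic was rolled back.
  await cluster.refreshMetadata();
  return Array.from(cluster.targetTopics);
}
```

Without the rollback in the catch block, the refresh in step 3 would keep failing, because the missing topic would still be part of targetTopics on every subsequent metadata request.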

@Nevon
Collaborator

Nevon commented Nov 4, 2020

Just to leave a note for future me, we tried to create a test for this issue, but it was harder than expected.

If you turn off topic auto-creation, you can obviously reproduce this, but plenty of our other tests depend on that setting being enabled, and changing it would be really painful. So another option that I proposed was to use ACLs to reproduce the issue. Basically:

  1. Create a new principal
  2. Give that principal DENY on CREATE TOPIC
  3. Give that principal ALLOW on WRITE TOPIC

I had expected this to let us explicitly create a topic (with another principal) and have our new principal produce to it, then have the new principal try to produce to a topic that doesn't exist and get the error, and finally verify that it can still produce to the original topic.

What ended up happening was that despite supposedly not being allowed to create topics, the principal could create the topic just by producing to it. It seems like the auto creation setting overrides the ACL.

At this point we decided to give up for now. Manual tests showed that the solution worked, although it's a shame we don't have any test to prevent regression.
