Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support nested fields in Elasticsearch Connector #1001

Merged
merged 1 commit into from Jul 24, 2019

Conversation

zhenxiao
Copy link
Contributor

No description provided.

@cla-bot cla-bot bot added the cla-signed label Jun 15, 2019
@martint martint self-requested a review June 15, 2019 21:18
Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we know the type of the column, it might be better to drive the "un-flattening" based on the fields that are actually needed. I'll think about how we might be able to do this and comment later.

if (length1 == length2) {
return key1.compareTo(key2);
}
return length2 - length1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Integer.compare(). Comparing using subtraction can result in subtle issues due to overflow (unlikely to be an issue here, but let's use the more robust way).

processFields(fieldsMap);
}

private void processFields(TreeMap<String, Object> fieldsMap)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just call the argument fields

fieldsMap.put(prefix, value);
processFields(fieldsMap);
}
else {
Copy link
Member

@martint martint Jul 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can only happen if there are no fields with . left, but it's not immediately obvious. I have the feeling we should be able to structure it in such a way that it's easier to read, but at a minimum, it deserves a comment.

@martint
Copy link
Member

martint commented Jul 8, 2019

@zhenxiao, here's my suggestion for an alternate implementation of the unflatten logic:

    private void extractFromSource(SearchHit hit)
    {
        List<Field> fields = new ArrayList<>();
        for (Map.Entry<String, Object> entry : hit.getSourceAsMap().entrySet()) {
            fields.add(new Field(entry.getKey(), entry.getValue()));
        }
        Collections.sort(fields, Comparator.comparing(Field::getName));

        for (Map.Entry<String, Object> entry : unflatten(fields).entrySet()) {
            setFieldIfExists(entry.getKey(), entry.getValue());
        }
    }

    private static Map<String, Object> unflatten(List<Field> fields)
    {
        return unflatten(fields, 0, 0, fields.size());
    }

    private static Map<String, Object> unflatten(List<Field> fields, int level, int start, int length)
    {
        checkArgument(length > 0, "length must be > 0");

        int limit = start + length;

        Map<String, Object> result = new HashMap<>();
        int anchor = start;
        int current = start;

        do {
            Field field = fields.get(anchor);
            String name = field.getPathElement(level);

            current++;
            if (current == limit || !name.equals(fields.get(current).getPathElement(level))) {
                // We assume that fields can't be both leaves and intermediate nodes
                Object value;
                if (level < field.getDepth() - 1) {
                    value = unflatten(fields, level + 1, anchor, current - anchor);
                }
                else {
                    value = field.getValue();
                }
                result.put(name, value);
                anchor = current;
            }
        }
        while (current < limit);

        return result;
    }

    private final static class Field
    {
        private final String name;
        private final List<String> path;
        private final Object value;

        public Field(String name, Object value)
        {
            this.name = name;
            this.path = Arrays.asList(name.split("\\."));
            this.value = value;
        }

        public String getName()
        {
            return name;
        }

        public int getDepth()
        {
            return path.size();
        }

        public String getPathElement(int level)
        {
            return path.get(level);
        }

        public Object getValue()
        {
            return value;
        }
    }

@zhenxiao
Copy link
Contributor Author

thank you, @martint
get it updated. could you please take a look?

Copy link
Member

@martint martint left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I'll merge it once travis passes

@martint martint self-assigned this Jul 18, 2019
@martint martint merged commit b4d50cb into trinodb:master Jul 24, 2019
@martint martint mentioned this pull request Jul 24, 2019
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants