same user had different result #8

Open
ft20082 opened this issue Dec 18, 2016 · 4 comments

ft20082 commented Dec 18, 2016

Environment:
Hive version: Hive 1.1.0-cdh5.5.2
CDH version: CDH 5.5.2

When I query one day of data like this:

select ouid, funnel(tag1, game_time, array('activity'), array('yaozujianglin'), array('huangjinyuchang')) as funnel
from sscq.odl_act_detail_info_sscq_qq
where ds = '2016-12-17'
group by ouid
order by ouid asc

the result for one ouid is:
000000000000000000000000001EC76C [0,0,0]

But when I query only that one ouid:

select ouid, funnel(tag1, game_time, array('activity'), array('yaozujianglin'), array('huangjinyuchang')) as funnel
from sscq.odl_act_detail_info_sscq_qq
where ds = '2016-12-17' and ouid = '000000000000000000000000001EC76C'
group by ouid
order by ouid asc

the result is different:
000000000000000000000000001EC76C [1,0,0]

Contributor

joshwalters commented Jan 3, 2017

  1. You don't need the order by ouid asc part of the query, just the group by ouid, assuming that ouid is your unique ID.
  2. I am not sure how it would report that first record but give an empty funnel count of [0,0,0]. Is game_time a timestamp column in a string/long format?

As I don't have access to your data, it may be best for you to show an example failure in a unit test. Here is a sample unit test to use:

@Test
public void testComplete() throws HiveException {
    Funnel udaf = new Funnel();
    ObjectInspector[] inputObjectInspectorList = new ObjectInspector[]{
        PrimitiveObjectInspectorFactory.javaStringObjectInspector, // action_column
        PrimitiveObjectInspectorFactory.javaLongObjectInspector, // timestamp_column
        ObjectInspectorFactory.getStandardListObjectInspector(PrimitiveObjectInspectorFactory.javaStringObjectInspector), // funnel_step_1
        ObjectInspectorFactory.getStandardListObjectInspector(PrimitiveObjectInspectorFactory.javaStringObjectInspector) // funnel_step_2
    };
    GenericUDAFParameterInfo paramInfo = new SimpleGenericUDAFParameterInfo(inputObjectInspectorList, false, false);
    GenericUDAFEvaluator udafEvaluator = udaf.getEvaluator(paramInfo);
    ObjectInspector outputObjectInspector = udafEvaluator.init(Mode.COMPLETE, inputObjectInspectorList);

    // Order will be "alpha, beta, gamma, delta" when ordered on timestamp_column
    // Funnel is "beta" -> "gamma" -> "epsilon"
    // Should return [1, 1, 0] as we don't have an epsilon
    Object[] parameters1 = new Object[]{ "beta", 200L, new ArrayList<Object>(), Arrays.asList("beta", "BAD"), null, "gamma", Arrays.asList("epsilon")}; // Test empty list funnel step, and null in funnel step
    Object[] parameters2 = new Object[]{"alpha", 100L, Arrays.asList("beta", "BAD"), "gamma", Arrays.asList("epsilon")};
    Object[] parameters3 = new Object[]{"delta", 400L, Arrays.asList("beta", "BAD"), "gamma", Arrays.asList("epsilon")};
    Object[] parameters4 = new Object[]{"gamma", 200L, Arrays.asList("beta", "BAD"), "gamma", Arrays.asList("epsilon")}; // gamma and beta happen at the same time, beta should come first (sorted on action after timestamp)
    Object[] parameters5 = new Object[]{ null, 800L, Arrays.asList("beta", "BAD"), "gamma", Arrays.asList("epsilon")}; // Check null action_column
    Object[] parameters6 = new Object[]{"omega", null, Arrays.asList("beta", "BAD"), "gamma", Arrays.asList("epsilon")}; // Check null timestamp

    // Process the data
    AggregationBuffer agg = udafEvaluator.getNewAggregationBuffer();
    udafEvaluator.reset(agg);
    udafEvaluator.iterate(agg, parameters1);
    udafEvaluator.iterate(agg, parameters2);
    udafEvaluator.iterate(agg, parameters3);
    udafEvaluator.iterate(agg, parameters4);
    udafEvaluator.iterate(agg, parameters5);
    udafEvaluator.iterate(agg, parameters6);
    Object result = udafEvaluator.terminate(agg);

    // Expected
    List<Long> expected = new ArrayList<>();
    expected.add(1L);
    expected.add(1L);
    expected.add(0L);
    Assert.assertEquals(expected, result);
}

If you can construct a sample unit test that exhibits this behavior, I can then develop a fix for the issue.

As an alternative, if you could provide a Hive script that creates a sample database/table, and inserts some sample artificial records, I could then debug the problem.

Author

ft20082 commented Jan 5, 2017

The game_time column type is bigint. My guess is that this may be a serialization/deserialization error, which could produce different intermediate result data.
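
One way to probe this hypothesis is to compare a single aggregation buffer against two partial buffers that are merged before the final result is computed, since the sample test earlier in this thread only exercises Mode.COMPLETE. The sketch below is a toy model, not the library's Funnel evaluator: ToyBuffer, splitsAgree, and the sort-on-terminate behavior are assumptions made for illustration, and only the method names iterate, merge, and terminate are borrowed from Hive's evaluator API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: NOT the library's Funnel evaluator. */
final class PartialMergeCheck {

    /** Hypothetical aggregation buffer that just collects rows. */
    static final class ToyBuffer {
        final List<String> actions = new ArrayList<>();
        final List<Long> timestamps = new ArrayList<>();

        void iterate(String action, long timestamp) {
            actions.add(action);
            timestamps.add(timestamp);
        }

        /** Assumed merge behavior: append the rows of another partial buffer. */
        void merge(ToyBuffer partial) {
            actions.addAll(partial.actions);
            timestamps.addAll(partial.timestamps);
        }

        /** Sorts rows by timestamp, then action, and returns one 0/1 flag per funnel step. */
        long[] terminate(List<List<String>> steps) {
            Integer[] order = new Integer[actions.size()];
            for (int i = 0; i < order.length; i++) {
                order[i] = i;
            }
            Arrays.sort(order, (a, b) -> {
                int byTime = Long.compare(timestamps.get(a), timestamps.get(b));
                return byTime != 0 ? byTime : actions.get(a).compareTo(actions.get(b));
            });
            long[] result = new long[steps.size()];
            int next = 0;
            for (int idx : order) {
                if (next < steps.size() && steps.get(next).contains(actions.get(idx))) {
                    result[next++] = 1L;
                }
            }
            return result;
        }
    }

    /** True if every two-way split of the rows merges to the same result as one buffer. */
    static boolean splitsAgree(String[] actions, long[] times, List<List<String>> steps) {
        ToyBuffer complete = new ToyBuffer();
        for (int i = 0; i < actions.length; i++) {
            complete.iterate(actions[i], times[i]);
        }
        long[] expected = complete.terminate(steps);
        for (int split = 0; split <= actions.length; split++) {
            ToyBuffer left = new ToyBuffer();
            ToyBuffer right = new ToyBuffer();
            for (int i = 0; i < actions.length; i++) {
                (i < split ? left : right).iterate(actions[i], times[i]);
            }
            left.merge(right);
            if (!Arrays.equals(expected, left.terminate(steps))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> steps = Arrays.asList(
                Arrays.asList("beta", "BAD"), Arrays.asList("gamma"), Arrays.asList("epsilon"));
        String[] actions = {"beta", "alpha", "delta", "gamma"};
        long[] times = {200L, 100L, 400L, 200L};
        System.out.println(splitsAgree(actions, times, steps)); // prints "true"
    }
}
```

The toy always agrees because it re-sorts after merging. If an equivalent check against the real evaluator (for example PARTIAL1 followed by FINAL) disagreed for some split, that would be a replicable failure worth posting here.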

@joshwalters
Contributor

I have used bigint timestamp columns with these UDFs before and it works. I don't think that is the issue.

Would it be possible to create a simple Hive script to generate a table, add a few records, and verify that this issue persists? Once there is a replicable failure, we can fix the issue.

@RameshByndoor

I have faced the same issue. Sorting the dataset in an inner query before calling funnel gave the right result, so it seems that sort order makes a difference somewhere. I am able to reproduce this bug but have not been able to extract a smaller dataset that shows it. The cause is not obvious.
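
The observation about sorting can be illustrated with a toy funnel that walks events in whatever order it receives them. This is a minimal sketch under assumptions, not the library's implementation: FunnelOrderDemo, Event, countFunnel, and sorted are hypothetical names, and the event data reuses the alpha/beta/gamma/delta example from the sample test earlier in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model only: NOT the library's Funnel implementation. */
final class FunnelOrderDemo {

    /** Hypothetical event type: one action at one timestamp. */
    static final class Event {
        final String action;
        final long timestamp;

        Event(String action, long timestamp) {
            this.action = action;
            this.timestamp = timestamp;
        }
    }

    /**
     * Walks the events in list order and advances one funnel step each time
     * the current action is accepted by the next unmatched step.
     * Returns one 0/1 flag per step.
     */
    static long[] countFunnel(List<Event> events, List<List<String>> steps) {
        long[] result = new long[steps.size()];
        int next = 0;
        for (Event e : events) {
            if (next < steps.size() && steps.get(next).contains(e.action)) {
                result[next] = 1L;
                next++;
            }
        }
        return result;
    }

    /** Returns a copy sorted by timestamp, then by action. */
    static List<Event> sorted(List<Event> events) {
        List<Event> copy = new ArrayList<>(events);
        copy.sort(Comparator.comparingLong((Event e) -> e.timestamp)
                .thenComparing((Event e) -> e.action));
        return copy;
    }

    public static void main(String[] args) {
        // Funnel is "beta" -> "gamma" -> "epsilon"
        List<List<String>> steps = Arrays.asList(
                Arrays.asList("beta", "BAD"), Arrays.asList("gamma"), Arrays.asList("epsilon"));
        // Assumed arrival order, deliberately not sorted by timestamp
        List<Event> arrival = Arrays.asList(
                new Event("gamma", 200L), new Event("alpha", 100L),
                new Event("delta", 400L), new Event("beta", 200L));

        System.out.println(Arrays.toString(countFunnel(arrival, steps)));         // prints "[1, 0, 0]"
        System.out.println(Arrays.toString(countFunnel(sorted(arrival), steps))); // prints "[1, 1, 0]"
    }
}
```

In this toy the same rows give different funnels depending only on the order in which they are walked. Whether the real UDAF has a comparable order dependence is not established in this thread.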

4 participants
@ft20082 @joshwalters @RameshByndoor and others