
Aggregation Prediction uses select column from the ground truth #12

Open
ghost opened this issue Jan 17, 2018 · 4 comments

Comments

@ghost

ghost commented Jan 17, 2018

The code snippets below appear to use the ground-truth column position for the select clause in evaluation as well as in training.

gt_sel_seq = [x[1] for x in ans_seq]

col_name_len, col_len, col_num, gt_sel=gt_sel)

chosen_sel_idx = torch.LongTensor(gt_sel)

Shouldn't the model predict the select column itself, rather than use the ground-truth column, before predicting the aggregation? At prediction time in a real-world setting, the select column will not be given.

Since the model already makes a selection prediction, that output could be fed into the aggregation predictor instead. As it stands, the reported evaluation numbers may not be comparable with the original Seq2SQL paper.
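
For illustration, a minimal sketch of the pattern described above; the model.forward call shape and the surrounding variable names (q_seq, col_seq, pred_entry) are assumptions pieced together from the quoted fragments, not the exact SQLNet code:

    # Sketch only (assumed call shape): during evaluation the select column
    # is read from the labels and handed to the network, so the aggregator
    # is conditioned on ground truth rather than on the model's own selection.
    gt_sel_seq = [x[1] for x in ans_seq]          # select column taken from the labels
    score = model.forward(q_seq, col_seq, col_num, pred_entry,
                          gt_sel=gt_sel_seq)      # reaches the aggregator via gt_sel
    # At real prediction time there is no ans_seq, so gt_sel cannot be supplied.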

@guotong1988

#3

@ghost
Author

ghost commented Jan 26, 2018

@guotong1988

Here is a code snippet that replaces the ground truth with the model's own prediction, so the code works at evaluation time as well as during training.

        # When no ground-truth select column is supplied (evaluation time),
        # run the selection predictor first and use its argmax as gt_sel.
        if gt_sel is None:
            if self.trainable_emb:
                # Trainable embeddings keep a separate embedding layer per
                # sub-task, so re-embed the question and columns here.
                x_emb_var, x_len = self.sel_embed_layer.gen_x_batch(q, col)
                col_inp_var, col_name_len, col_len = \
                        self.sel_embed_layer.gen_col_batch(col)
                max_x_len = max(x_len)
                sel_score = self.sel_pred(x_emb_var, x_len, col_inp_var,
                        col_name_len, col_len, col_num)
            else:
                # With shared embeddings, reuse the batch tensors computed
                # earlier in forward().
                sel_score = self.sel_pred(x_emb_var, x_len, col_inp_var,
                        col_name_len, col_len, col_num)

            # Take the highest-scoring column for each example in the batch.
            pred_sel = []
            for b in range(B):
                pred_sel.append(np.asscalar(np.argmax(sel_score[b].data.cpu().numpy())))

            # Use the predictions in place of the ground truth for the aggregator.
            gt_sel = pred_sel
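
With this change, gt_sel carries the model's own predictions whenever no ground truth is supplied, and the downstream call quoted in the original post (passing gt_sel=gt_sel) picks them up unchanged. A minimal sketch of that subsequent call, with the agg_pred signature assumed to mirror the sel_pred call above:

        # Sketch of the next call in forward() (assumed signature, mirroring
        # the sel_pred call above): gt_sel now holds predicted columns at
        # evaluation time and ground-truth columns during training.
        agg_score = self.agg_pred(x_emb_var, x_len, col_inp_var,
                col_name_len, col_len, col_num, gt_sel=gt_sel)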

@ThomasLecat

ThomasLecat commented Feb 5, 2018

@deepcoord

Thanks for the code snippet.
I have a question though:

If I'm not mistaken, with your fix sel_score is computed twice:

  • a first time by your snippet, before predicting the aggregator
  • a second time by the original code, after predicting the aggregator

Wouldn't it be simpler to reverse the order in which sel_score and agg_score are computed in the forward method of sqlnet.py? That way sel_score is computed only once, and a shorter snippet can be added just before predicting the aggregator:

    if gt_sel is None:
        sel_score_np = sel_score.data.cpu().numpy()
        gt_sel = [np.asscalar(np.argmax(sel_score_np[b])) for b in range(B)]

Am I missing something?
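
For illustration, a minimal sketch of that reordering inside forward(); the agg_pred signature is assumed to mirror the sel_pred call, as the fragments quoted earlier suggest:

    # Sketch of the reordered forward() (assumed structure): compute the
    # select scores first, fall back to their argmax when no ground truth
    # is given, then score the aggregator conditioned on that column.
    sel_score = self.sel_pred(x_emb_var, x_len, col_inp_var,
            col_name_len, col_len, col_num)

    if gt_sel is None:
        sel_score_np = sel_score.data.cpu().numpy()
        gt_sel = [np.asscalar(np.argmax(sel_score_np[b])) for b in range(B)]

    agg_score = self.agg_pred(x_emb_var, x_len, col_inp_var,
            col_name_len, col_len, col_num, gt_sel=gt_sel)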

@xiaojunxu
Owner

Thanks for catching the issue! Yes, this leads to an incorrect break-down result for the aggregator, but it does not affect the overall acc_qm or acc_ex. I will fix the bug soon.
