Larger unlabeled dataset #3

don-tpanic · 2018-07-04T22:28:19Z

Hi it's me again...

I noticed a small inconsistency between your implementation and the paper. The unlabeled data you used (ranging from 6001 for book to 34742 for dvd) is far more than Blitzer used (from 3685 to 5945). I am not sure whether you used all of it in your actual experiments, so I wanted to confirm this with you.

Also, I wonder if you can recall exactly how you collected the data. According to this paper, there seem to be two Amazon datasets (a big one and a small one):
http://www.icml-2011.org/papers/342_icmlpaper.pdf
Blitzer clearly used the small one in the SCL paper. In your implementation, the labeled data has the same size as Blitzer's (2000 positive, 2000 negative for each domain), so I just wanted to know where you got the unlabeled data from, since the small Amazon dataset doesn't seem to have that much data.

Thanks as always

yftah89 · 2018-07-04T23:53:39Z

We wrote about it in the appendix (B Experimental Choices) as follows: "Variants of the Product Review Data. There are two releases of the datasets of the Blitzer et al. (2007) cross-domain product review task. We use the one from http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html where the data is imbalanced, consisting of more positive than negative reviews. We believe that our setup is more realistic as when collecting unlabeled data, it is hard to get a balanced set. Note that Blitzer et al. (2007) used the other release where the unlabeled data consists of the same number of positive and negative reviews."

yftah89 · 2018-07-04T23:56:41Z

I know that Bollegala et al. also used this variant: "Danushka Bollegala, Takanori Maehara, and Ken-ichi Kawarabayashi. 2015. Unsupervised cross-domain word representation learning. In Proc. of ACL."

don-tpanic closed this as completed May 24, 2019
