I noticed a small inconsistency between your implementation and the paper. The unlabeled data you used (from 6,001 for books up to 34,742 for dvd) is far more than Blitzer used (from 3,685 to 5,945). I am not sure whether you used all of it in your actual experiments, so I wanted to confirm this with you.
I also wonder whether you can recall exactly how you collected the data. According to this paper, there are two Amazon datasets (a big one and a small one): http://www.icml-2011.org/papers/342_icmlpaper.pdf
Blitzer clearly used the small one in the SCL paper. In your implementation, the labeled data has the same size as Blitzer's (2,000 positive and 2,000 negative reviews per domain), but I wanted to know where you got the unlabeled data from, since the small Amazon dataset does not seem to contain that much data.
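For what it's worth, the per-domain counts can be checked directly against a local copy of the release, assuming the usual `processed_acl` layout (one review per line in `positive.review`, `negative.review`, and `unlabeled.review` per domain; the `DATA_DIR` path and file names here are assumptions based on the standard download, so adjust them to your copy):

```python
import os

# Hypothetical path to a local copy of the processed_acl release of the
# Multi-Domain Sentiment Dataset; adjust to wherever you unpacked it.
DATA_DIR = "processed_acl"

def count_reviews(path):
    """Count reviews in a .review file, where each non-empty line is one review."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return sum(1 for line in f if line.strip())

for domain in ("books", "dvd", "electronics", "kitchen"):
    for split in ("positive", "negative", "unlabeled"):
        path = os.path.join(DATA_DIR, domain, f"{split}.review")
        if os.path.exists(path):
            print(domain, split, count_reviews(path))
```

Comparing the `unlabeled` counts this prints against the 3,685 to 5,945 range from the paper would show immediately which release a given copy corresponds to.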
Thanks as always
We wrote about it in the appendix (B Experimental Choices) as follows: "Variants of the Product Review Data: There are two releases of the datasets of the Blitzer et al. (2007) cross-domain product review task. We use the one from http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html where the data is imbalanced, consisting of more positive than negative reviews. We believe that our setup is more realistic, as when collecting unlabeled data it is hard to get a balanced set. Note that Blitzer et al. (2007) used the other release, where the unlabeled data consists of the same number of positive and negative reviews."
I know that "Danushka Bollegala, Takanori Maehara, and Ken-ichi Kawarabayashi. 2015. Unsupervised cross-domain word representation learning. In Proc. of ACL." also used this variant.