finish basic_text_classification translation #98
Conversation
@TobiasLee That works.
@leviding Claiming the proofreading.
The translation is great! There are a few formatting issues, and in some places the English source text is still left in. Please take a look.
"在这个任务中,我们将把电影评论分为**积极**和**消极**两种,即是一个**二分类**任务,这是一个非常重要并且已经被广泛应用的机器学习问题。\n",
"我们将使用 [IMDB 数据集](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb),其中包括了 50000 条来自 [Internet Movie Database](https://www.imdb.com/) 的电影评论。这些评论被等分成两份分别用于训练和测试,并且,训练集和测试集的样本是**平衡**的,也就是说,积极和消极的评论数目相同。\n",
「这些评论被等分成两份分别用于训练和测试」=>「这些评论被等分成两份,分别用于训练和测试」(insert a comma after 两份 to separate the clauses)
"我们将使用 [IMDB 数据集](https://www.tensorflow.org/api_docs/python/tf/keras/datasets/imdb),其中包括了 50000 条来自 [Internet Movie Database](https://www.imdb.com/) 的电影评论。这些评论被等分成两份分别用于训练和测试,并且,训练集和测试集的样本是**平衡**的,也就是说,积极和消极的评论数目相同。\n",
"接下来的代码中,我们会使用一个用于创建和训练 TensorFlow 模型的高级 API —— [tf.keras](https://www.tensorflow.org/guide/keras)。如果你希望查看进阶版的文本分类教程,请查看 [MLCC Text Classification Guide](https://developers.google.com/machine-learning/guides/text-classification/)。"
「如果你希望查看进阶版的文本分类教程」=>「如果你希望查看 tf.keras 进阶版的文本分类教程」(insert "tf.keras" before 进阶版)
"## 下载 IMDB 数据集\n",
"IMDB 数据集随 TensorFlow 附带,并且已经被预处理过:单词序列已经被转换成证书序列,并且每个整数对应字典中特定的一个单词。\n",
「单词序列已经被转换成证书序列」=>「单词序列已经被转换成整数序列」(「证书」, "certificate", is a typo for 「整数」, "integer")
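The fix above matters to readers: in the preprocessed IMDB data each review is a sequence of integers, with each integer standing for one word. A plain-Python sketch of the idea, using a made-up word index rather than the real IMDB vocabulary (which comes from `tf.keras.datasets.imdb.get_word_index()`):

```python
# Toy word index; the real one is returned by
# tf.keras.datasets.imdb.get_word_index().
word_index = {"the": 1, "movie": 2, "was": 3, "great": 4, "terrible": 5}

def encode(review_words, index):
    """Map each word to its integer id (0 for words not in the index)."""
    return [index.get(w, 0) for w in review_words]

encoded = encode(["the", "movie", "was", "great"], word_index)
print(encoded)  # [1, 2, 3, 4]
```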
"## 探索数据\n",
"让我们先来看看数据的格式。数据集已经被预处理过了,其中:每个电影评论样本(一连串的单词)由一个整数数组代表,每个评论的标签是一个 0 或者 1 的整数,其中 0 代表消极的评论,1 代表积极的评论。"
「其中:每个电影评论样本(一连串的单词)由一个整数数组代表,」=>「其中:每个电影评论样本(一连串的单词)由一个整数数组代表,其中每个整数表示一个单词。」(append "where each integer represents a word")
"## 准备数据\n",
"The reviews—the arrays of integers—must be converted to tensors before fed into the neural network. This conversion can be done a couple of ways:\n",
The English here has already been translated, but the original English text wasn't deleted. Was that intentional?
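For context, the sentence under review says the integer arrays must be tensorized in one of a couple of ways. The tutorial itself pads sequences to equal length; a plain-Python sketch of the other option it names, multi-hot encoding, looks like this (the dimension here is tiny and illustrative, not the tutorial's real vocabulary size):

```python
def multi_hot(sequences, dimension=10):
    """Turn each integer sequence into a fixed-size vector:
    position i is 1.0 if word id i appears in the review."""
    results = [[0.0] * dimension for _ in sequences]
    for row, seq in zip(results, sequences):
        for idx in seq:
            row[idx] = 1.0
    return results

vectors = multi_hot([[1, 3], [2, 2, 5]], dimension=6)
print(vectors[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 0.0]
```

The resulting equal-length float vectors can be fed to a dense network directly, which is exactly why the conversion step the sentence describes is needed.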
"### 隐藏单元\n",
"The above model has two intermediate or \"hidden\" layers, between the input and output. The number of outputs (units, nodes, or neurons) is the dimension of the representational space for the layer. In other words, the amount of freedom the network is allowed when learning an internal representation.\n",
Same as a passage above: the English original wasn't removed after it was translated.
"The above model has two intermediate or \"hidden\" layers, between the input and output. The number of outputs (units, nodes, or neurons) is the dimension of the representational space for the layer. In other words, the amount of freedom the network is allowed when learning an internal representation.\n",
"上面的模型在输入和输出之间有两层隐藏层。输出向量的维度(单位,节点或神经元)是网络层的表示空间的维度。 换句话说,是网络在学习内部表示时所具有的自由度。\n",
「上面的模型在输入和输出之间有两层隐藏层」=>「上面的模型在输入和输出之间有两个中间层,或者叫“隐藏”层」(i.e. "two intermediate, or 'hidden', layers", matching the English "intermediate or hidden layers")
"上面的模型在输入和输出之间有两层隐藏层。输出向量的维度(单位,节点或神经元)是网络层的表示空间的维度。 换句话说,是网络在学习内部表示时所具有的自由度。\n",
"If a model has more hidden units (a higher-dimensional representation space), and/or more layers, then the network can learn more complex representations. However, it makes the network more computationally expensive and may lead to learning unwanted patterns—patterns that improve performance on training data but not on the test data. This is called *overfitting*, and we'll explore it later.\n",
The English original wasn't deleted here either.
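The passage being quoted ties hidden-unit count to model capacity. A quick back-of-envelope check makes that concrete: the parameter count of a fully connected layer grows linearly with its unit count. This is a plain-Python sketch with illustrative sizes, not the tutorial's actual model:

```python
def dense_params(in_dim, units):
    """Parameters of one fully connected layer: weights plus biases."""
    return in_dim * units + units

# Capacity grows with hidden units: compare a 16-unit and a 64-unit
# hidden layer on a 16-dimensional input, each followed by a 1-unit
# output layer (sizes are illustrative).
small = dense_params(16, 16) + dense_params(16, 1)
large = dense_params(16, 64) + dense_params(64, 1)
print(small, large)  # 289 1153
```

More parameters mean more representational freedom, which is exactly the overfitting trade-off the quoted paragraph goes on to describe.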
"## 评估模型\n",
"让我们看看模型最终表现的怎么样,我们将得到两个指标:loss(代表模型的错误,越低越好)以及准确率。"
「loss(代表模型的错误,越低越好)以及准确率。」=>「Loss(代表模型的错误,值越低越好)以及准确率。」(capitalize "Loss" and write 值越低越好, "the lower the value, the better")
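The sentence being corrected describes the two numbers `model.evaluate` returns for this tutorial: binary cross-entropy loss and accuracy. A plain-Python sketch of what each metric actually computes (the labels and predicted probabilities below are made up):

```python
import math

def binary_crossentropy(y_true, y_pred):
    """Mean log loss over examples; lower is better, as the text says."""
    eps = 1e-7  # avoid log(0)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == t for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

labels = [1, 0, 1, 1]
probs = [0.9, 0.2, 0.6, 0.4]  # hypothetical model outputs
print(accuracy(labels, probs))  # 0.75
```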
@leviding @TobiasLee Proofreading complete.
@TobiasLee You can make the changes now.
@leviding Changes done.
@TobiasLee Please take a look: the .ipynb file renders differently from the English original preview https://github.com/xitu/tensorflow-docs/blob/v1.10/tutorials/keras/basic_text_classification.ipynb See what the problem is.
@leviding Could you take another look? It seems one cell had the wrong type before.
@TobiasLee Still different. Compare the two side by side; it should be obvious enough that no screenshot is needed, right? The numbers in the leading In [ ]: prompts don't show, and there is an extra code block at the end of the file.
@leviding The leading execution numbers don't show because I cleared the run history, so the notebook is in an unrun state. Jupyter Notebooks published online are generally like this, so I suggest keeping it as is. The extra blank line at the end has been removed.
@TobiasLee OK, thanks for your work~
Thanks, everyone 👍
resolve: #96
I don't know how to update the Colab Notebook link in the .md file... please check it, @leviding ~ Also, I edited the .ipynb directly in Jupyter Notebook, so I'm not sure whether that introduced any problems. Let me know any time if something is wrong. Thanks!