BERT Model Source Code Walkthrough (Part 4)


hidden_dropout_prob=config.hidden_dropout_prob,
attention_probs_dropout_prob=config.attention_probs_dropout_prob,
initializer_range=config.initializer_range,
do_return_all_layers=True)
# [-1] selects the last item of the list, i.e. the output of the final encoder layer
self.sequence_output = self.all_encoder_layers[-1]
# The "pooler" converts the encoded sequence tensor of shape
# [batch_size, seq_length, hidden_size] to a tensor of shape
# [batch_size, hidden_size].
The pooler changes the shape of the encoded tensor from 3 dimensions to 2 dimensions.
# This is necessary for segment-level
# (or segment-pair-level) classification tasks where we need a fixed
# dimensional representation of the segment.
For sentence-level (or sentence-pair-level) classification tasks this conversion is necessary, because we need a fixed-dimensional representation of the segment.
with tf.variable_scope("pooler"):
# We "pool" the model by simply taking the hidden state corresponding to the first token.
We pool the model by simply taking the hidden state corresponding to the first token (the [CLS] token).
# We assume that this has been pre-trained
That is, this pooling behaviour is assumed to have already been learned during pre-training.
tf.squeeze removes dimensions of size 1 from a tensor's shape; here it turns the [batch_size, 1, hidden_size] slice into [batch_size, hidden_size].
first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)
self.pooled_output = tf.layers.dense(
first_token_tensor,  # the first-token tensor is the input to the dense layer
config.hidden_size,  # output dimension: the hidden size
activation=tf.tanh,  # activation function: tanh (hyperbolic tangent)
kernel_initializer=create_initializer(config.initializer_range))
# end of the constructor
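Below is a minimal numpy sketch (illustrative only, not code from the BERT repo; the toy sizes and the kernel/bias variables are made up) of what the pooler does to the tensor shapes: slice out the first token, squeeze away the size-1 axis, then apply a dense layer with a tanh activation.

import numpy as np

batch_size, seq_length, hidden_size = 2, 4, 8             # toy sizes chosen arbitrarily
sequence_output = np.random.randn(batch_size, seq_length, hidden_size)

# slicing keeps the first token but leaves a size-1 axis: [batch_size, 1, hidden_size]
first_token = sequence_output[:, 0:1, :]
# squeeze removes the size-1 axis, giving [batch_size, hidden_size]
first_token = np.squeeze(first_token, axis=1)

# a dense layer with tanh activation, like tf.layers.dense(..., activation=tf.tanh)
kernel = np.random.randn(hidden_size, hidden_size) * 0.02  # stand-in for create_initializer
bias = np.zeros(hidden_size)
pooled_output = np.tanh(first_token @ kernel + bias)

print(pooled_output.shape)  # (2, 8), i.e. [batch_size, hidden_size]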
def get_pooled_output(self):  # returns the pooled output
return self.pooled_output
def get_sequence_output(self):  # returns the sequence output
"""Gets final hidden layer of encoder.
Returns:
float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
to the final hidden of the transformer encoder.
"""
return self.sequence_output
def get_all_encoder_layers(self):  # returns the outputs of all encoder layers
return self.all_encoder_layers
def get_embedding_output(self):  # returns the embedding output
"""Gets output of the embedding lookup (i.e., input to the transformer).
Returns:
float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
to the output of the embedding layer, after summing the word
embeddings with the positional embeddings and the token type embeddings,
then performing layer normalization. This is the input to the transformer.
"""
return self.embedding_output
def get_embedding_table(self):  # returns the embedding lookup table
return self.embedding_table
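For context, here is a short usage sketch in the spirit of the example in the BERT repository's README (the config values and input ids are toy numbers; TensorFlow 1.x and modeling.py on the import path are assumed). It shows where the getters above fit in.

import tensorflow as tf
import modeling

config = modeling.BertConfig(vocab_size=32000, hidden_size=768,
                             num_hidden_layers=12, num_attention_heads=12,
                             intermediate_size=3072)
input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])    # [batch_size=2, seq_length=3]
input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])
token_type_ids = tf.constant([[0, 0, 1], [0, 1, 1]])

model = modeling.BertModel(config=config, is_training=False,
                           input_ids=input_ids, input_mask=input_mask,
                           token_type_ids=token_type_ids)

pooled = model.get_pooled_output()          # [2, 768], one vector per example
sequence = model.get_sequence_output()      # [2, 3, 768], one vector per token
embeddings = model.get_embedding_output()   # the transformer's input, [2, 3, 768]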
■ The GELU activation function
def gelu(x):
"""Gaussian Error Linear Unit.  高斯误差线性单元
This is a smoother version of the RELU.   gelu是relu的平滑版
Original paper: https://arxiv.org/abs/1606.08415
Args:  x是将被激活的张量
x: float Tensor to perform activation.
Returns: 返回值是激活后的张量
`x` with the GELU activation applied.
"""    tf.tanh 反正切函数
cdf = 0.5 * (1.0 + tf.tanh(
(np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))
return x * cdf
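As a sanity check, this standalone numpy sketch (not from the repo) compares the exact GELU, x * Phi(x) with Phi the standard normal CDF, against the tanh approximation implemented above; over a small grid the two differ by much less than 1e-2.

import math
import numpy as np

def gelu_exact(x):
    # x * Phi(x), where Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF
    return np.array([v * 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])

def gelu_tanh(x):
    # the same tanh-based formula as the TensorFlow gelu() above, written with numpy
    cdf = 0.5 * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
    return x * cdf

x = np.linspace(-4.0, 4.0, 81)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # small (well below 1e-2)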
■ Looking up an activation function by its string name
def get_activation(activation_string):
"""Maps a string to a Python function, e.g., "relu" => `tf.nn.relu`.
That is, it builds a mapping from a string name to the corresponding activation function.
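The body of get_activation continues in the next part of this walkthrough. As a rough reference, such a string-to-function mapping might look like the sketch below (the function name get_activation_sketch and the exact branches are illustrative assumptions; tf and the gelu defined above are assumed to be in scope).

def get_activation_sketch(activation_string):
    # empty string or None: no activation (linear)
    if not activation_string:
        return None
    act = activation_string.lower()
    if act == "linear":
        return None
    elif act == "relu":
        return tf.nn.relu
    elif act == "gelu":
        return gelu
    elif act == "tanh":
        return tf.tanh
    else:
        raise ValueError("Unsupported activation: %s" % act)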
