BERT Model Source Code Walkthrough (Part 4)


hidden_dropout_prob=config.hidden_dropout_prob,
attention_probs_dropout_prob=config.attention_probs_dropout_prob,
initializer_range=config.initializer_range,
do_return_all_layers=True)
# [-1] selects the last item of the list, i.e. the output of the final encoder layer
self.sequence_output = self.all_encoder_layers[-1]
# The "pooler" converts the encoded sequence tensor of shape
# [batch_size, seq_length, hidden_size] to a tensor of shape
# [batch_size, hidden_size].
The pooler changes the shape of the encoded tensor from 3 dimensions to 2 dimensions.
# This is necessary for segment-level
# (or segment-pair-level) classification tasks where we need a fixed
# dimensional representation of the segment.
For sentence-level (or sentence-pair-level) classification tasks this conversion is necessary, because we need a fixed-dimensional representation of the segment.
with tf.variable_scope("pooler"):
# We "pool" the model by simply taking the hidden state corresponding to the first token.
We pool the model by simply taking the hidden state corresponding to the first token (the [CLS] token).
# We assume that this has been pre-trained
That is, this pooling behaviour is assumed to have already been learned during pre-training.
tf.squeeze removes dimensions of size 1 from a tensor's shape; here it turns the [batch_size, 1, hidden_size] slice into [batch_size, hidden_size].
first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)
self.pooled_output = tf.layers.dense(
first_token_tensor,  # the first-token tensor is the input to the dense layer
config.hidden_size,  # output dimension: the hidden size
activation=tf.tanh,  # activation function: tanh (hyperbolic tangent)
kernel_initializer=create_initializer(config.initializer_range))
# end of the constructor
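Below is a minimal numpy sketch (illustrative only, not code from the BERT repo; the toy sizes and the kernel/bias variables are made up) of what the pooler does to the tensor shapes: slice out the first token, squeeze away the size-1 axis, then apply a dense layer with a tanh activation.

import numpy as np

batch_size, seq_length, hidden_size = 2, 4, 8             # toy sizes chosen arbitrarily
sequence_output = np.random.randn(batch_size, seq_length, hidden_size)

# slicing keeps the first token but leaves a size-1 axis: [batch_size, 1, hidden_size]
first_token = sequence_output[:, 0:1, :]
# squeeze removes the size-1 axis, giving [batch_size, hidden_size]
first_token = np.squeeze(first_token, axis=1)

# a dense layer with tanh activation, like tf.layers.dense(..., activation=tf.tanh)
kernel = np.random.randn(hidden_size, hidden_size) * 0.02  # stand-in for create_initializer
bias = np.zeros(hidden_size)
pooled_output = np.tanh(first_token @ kernel + bias)

print(pooled_output.shape)  # (2, 8), i.e. [batch_size, hidden_size]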
def get_pooled_output(self):  # returns the pooled output
return self.pooled_output
def get_sequence_output(self):  # returns the sequence output
"""Gets final hidden layer of encoder.
Returns:
float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
to the final hidden of the transformer encoder.
"""
return self.sequence_output
def get_all_encoder_layers(self):  # returns the outputs of all encoder layers
return self.all_encoder_layers
def get_embedding_output(self):  # returns the embedding output
"""Gets output of the embedding lookup (i.e., input to the transformer).
Returns:
float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
to the output of the embedding layer, after summing the word
embeddings with the positional embeddings and the token type embeddings,
then performing layer normalization. This is the input to the transformer.
"""
return self.embedding_output
def get_embedding_table(self):  # returns the embedding lookup table
return self.embedding_table
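For context, here is a short usage sketch in the spirit of the example in the BERT repository's README (the config values and input ids are toy numbers; TensorFlow 1.x and modeling.py on the import path are assumed). It shows where the getters above fit in.

import tensorflow as tf
import modeling

config = modeling.BertConfig(vocab_size=32000, hidden_size=768,
                             num_hidden_layers=12, num_attention_heads=12,
                             intermediate_size=3072)
input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])    # [batch_size=2, seq_length=3]
input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])
token_type_ids = tf.constant([[0, 0, 1], [0, 1, 1]])

model = modeling.BertModel(config=config, is_training=False,
                           input_ids=input_ids, input_mask=input_mask,
                           token_type_ids=token_type_ids)

pooled = model.get_pooled_output()          # [2, 768], one vector per example
sequence = model.get_sequence_output()      # [2, 3, 768], one vector per token
embeddings = model.get_embedding_output()   # the transformer's input, [2, 3, 768]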
■ The GELU activation function
def gelu(x):
"""Gaussian Error Linear Unit.  高斯误差线性单元
This is a smoother version of the RELU.   gelu是relu的平滑版
Original paper: https://arxiv.org/abs/1606.08415
Args:  x是将被激活的张量
x: float Tensor to perform activation.
Returns: 返回值是激活后的张量
`x` with the GELU activation applied.
"""    tf.tanh 反正切函数
cdf = 0.5 * (1.0 + tf.tanh(
(np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))
return x * cdf
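As a sanity check, this standalone numpy sketch (not from the repo) compares the exact GELU, x * Phi(x) with Phi the standard normal CDF, against the tanh approximation implemented above; over a small grid the two differ by much less than 1e-2.

import math
import numpy as np

def gelu_exact(x):
    # x * Phi(x), where Phi(x) = 0.5 * (1 + erf(x / sqrt(2))) is the standard normal CDF
    return np.array([v * 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x])

def gelu_tanh(x):
    # the same tanh-based formula as the TensorFlow gelu() above, written with numpy
    cdf = 0.5 * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * np.power(x, 3))))
    return x * cdf

x = np.linspace(-4.0, 4.0, 81)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # small (well below 1e-2)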
■ Looking up an activation function by its string name
def get_activation(activation_string):
"""Maps a string to a Python function, e.g., "relu" => `tf.nn.relu`.
That is, it builds a mapping from a string name to the corresponding activation function.
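The body of get_activation continues in the next part of this walkthrough. As a rough reference, such a string-to-function mapping might look like the sketch below (the function name get_activation_sketch and the exact branches are illustrative assumptions; tf and the gelu defined above are assumed to be in scope).

def get_activation_sketch(activation_string):
    # empty string or None: no activation (linear)
    if not activation_string:
        return None
    act = activation_string.lower()
    if act == "linear":
        return None
    elif act == "relu":
        return tf.nn.relu
    elif act == "gelu":
        return gelu
    elif act == "tanh":
        return tf.tanh
    else:
        raise ValueError("Unsupported activation: %s" % act)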
