full_position_embeddings = tf.get_variable(
    name=position_embedding_name,
    shape=[max_position_embeddings, width],
    initializer=create_initializer(initializer_range))
Because the position embedding table is a learned variable, it is created with the maximum length `max_position_embeddings`; the actual sequence length is usually shorter, and slicing it keeps training fast for tasks that do not have long sequences:
# Since the position embedding table is a learned variable, we create it
# using a (long) sequence length `max_position_embeddings`. The actual
# sequence length might be shorter than this, for faster training of
# tasks that do not have long sequences.
So `full_position_embeddings` is effectively an embedding table for every position up to the maximum, while the current sequence only needs positions up to `seq_length`, so we just take a slice:
# So `full_position_embeddings` is effectively an embedding table
# for position [0, 1, 2, ..., max_position_embeddings-1], and the current
# sequence has positions [0, 1, 2, ... seq_length-1], so we can just
# perform a slice.
Function: tf.slice(input_, begin, size, name=None)
Purpose: extracts a contiguous slice from a tensor, starting at `begin` and taking `size` elements along each dimension; a size of -1 means "everything remaining" in that dimension.
position_embeddings = tf.slice(full_position_embeddings, [0, 0],
                               [seq_length, -1])
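As a quick illustration of what this slice returns, here is a minimal standalone sketch (not part of modeling.py; the toy table values and the TF 1.x session are assumptions made only for the demo):

import tensorflow as tf

# Toy "position table": 5 positions, width 3 (values chosen only for illustration).
toy_table = tf.reshape(tf.range(15, dtype=tf.float32), [5, 3])
seq_length = 2
# Start at [0, 0], take `seq_length` rows and all columns (-1 = the rest of that
# dimension), mirroring the slice of `full_position_embeddings` above.
toy_slice = tf.slice(toy_table, [0, 0], [seq_length, -1])

with tf.Session() as sess:
  print(sess.run(toy_slice))
  # [[0. 1. 2.]
  #  [3. 4. 5.]]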
num_dims = len(output.shape.as_list())  # number of dimensions
Only the last two dimensions are meaningful here, so we broadcast over the leading dimensions, which is typically just the batch size:
# Only the last two dimensions are relevant (`seq_length` and `width`), so
# we broadcast among the first dimensions, which is typically just
# the batch size.
position_broadcast_shape = []  # the shape used for broadcasting
for _ in range(num_dims - 2):
  position_broadcast_shape.append(1)
position_broadcast_shape.extend([seq_length, width])  # append the two meaningful dims
position_embeddings = tf.reshape(position_embeddings,
                                 position_broadcast_shape)  # reshape so it broadcasts
output += position_embeddings  # add the position information to the output
output = layer_norm_and_dropout(output, dropout_prob)  # layer normalization and dropout
return output
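A small numeric check of the broadcasting step above (a standalone sketch with made-up shapes; NumPy is used only to show the shape arithmetic):

import numpy as np

batch_size, seq_length, width = 2, 4, 3
output = np.zeros([batch_size, seq_length, width], dtype=np.float32)
position_embeddings = np.ones([seq_length, width], dtype=np.float32)

# num_dims == 3, so position_broadcast_shape becomes [1, seq_length, width]:
# one leading 1 per extra dimension, then the two meaningful dimensions.
output += position_embeddings.reshape([1, seq_length, width])

print(output.shape)  # (2, 4, 3): every example in the batch gets the same position rows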
Creating the mask
■ Create the attention mask from the input mask
def create_attention_mask_from_input_mask(from_tensor, to_mask):
"""Create 3D attention mask from a 2D tensor mask.
从 2D掩码创建3D掩码
Args: 入参:输入张量,转换成掩码的张量
from_tensor: 2D or 3D Tensor of shape [batch_size, from_seq_length, ...].
to_mask: int32 Tensor of shape [batch_size, to_seq_length].
Returns: 返回值 浮点值的张量
float Tensor of shape [batch_size, from_seq_length, to_seq_length].
""" 获取入参形状参数
from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
batch_size = from_shape[0]
from_seq_length = from_shape[1]
Get the shape of `to_mask`:
to_shape = get_shape_list(to_mask, expected_rank=2)
to_seq_length = to_shape[1]
Reshape to [batch_size, 1, to_seq_length] first, then cast to float32:
to_mask = tf.cast(
    tf.reshape(to_mask, [batch_size, 1, to_seq_length]), tf.float32)
`from_tensor` is not necessarily a mask (although it could be); we do not really care about padding on the *from* side, only on the *to* side, so a tensor of all ones is used:
# We don't assume that `from_tensor` is a mask (although it could be). We
# don't actually care if we attend *from* padding tokens (only *to* padding)
# tokens so we create a tensor of all ones.
#
# `broadcast_ones` = [batch_size, from_seq_length, 1]
Create the all-ones tensor:
broadcast_ones = tf.ones(
    shape=[batch_size, from_seq_length, 1], dtype=tf.float32)
Broadcast along two dimensions to build the mask:
# Here we broadcast along two dimensions to create the mask.
mask = broadcast_ones * to_mask
return mask
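To see what the returned mask looks like, here is a toy run of the same arithmetic (a standalone NumPy sketch with invented sizes; 1 marks a real token and 0 marks padding):

import numpy as np

batch_size, from_seq_length, to_seq_length = 1, 2, 3
# One sequence whose last "to" position is padding.
to_mask = np.array([[1, 1, 0]], dtype=np.float32)          # [batch_size, to_seq_length]
to_mask = to_mask.reshape([batch_size, 1, to_seq_length])  # [batch_size, 1, to_seq_length]

broadcast_ones = np.ones([batch_size, from_seq_length, 1], dtype=np.float32)
mask = broadcast_ones * to_mask  # [batch_size, from_seq_length, to_seq_length]

print(mask)
# [[[1. 1. 0.]
#   [1. 1. 0.]]]
# Every "from" position may attend to the two real tokens but not to the padding slot.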
The attention layer
■ The attention_layer function
def attention_layer(from_tensor,
                    to_tensor,
                    attention_mask=None,
                    num_attention_heads=1,