full_position_embeddings = tf.get_variable(
    name=position_embedding_name,
    shape=[max_position_embeddings, width],
    initializer=create_initializer(initializer_range))
Because the position embedding table is a learned variable, it is created with the maximum length `max_position_embeddings`; the actual sequence length is usually shorter, and slicing it keeps training fast for tasks that do not have long sequences:
# Since the position embedding table is a learned variable, we create it
# using a (long) sequence length `max_position_embeddings`. The actual
# sequence length might be shorter than this, for faster training of
# tasks that do not have long sequences.
So `full_position_embeddings` is effectively an embedding table for every position up to the maximum, while the current sequence only needs positions up to `seq_length`, so we just take a slice:
# So `full_position_embeddings` is effectively an embedding table
# for position [0, 1, 2, ..., max_position_embeddings-1], and the current
# sequence has positions [0, 1, 2, ... seq_length-1], so we can just
# perform a slice.
Function: tf.slice(input_, begin, size, name=None)
Purpose: extracts a contiguous slice from a tensor, starting at `begin` and taking `size` elements along each dimension; a size of -1 means "everything remaining" in that dimension.
position_embeddings = tf.slice(full_position_embeddings, [0, 0],
                               [seq_length, -1])
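As a quick illustration of what this slice returns, here is a minimal standalone sketch (not part of modeling.py; the toy table values and the TF 1.x session are assumptions made only for the demo):

import tensorflow as tf

# Toy "position table": 5 positions, width 3 (values chosen only for illustration).
toy_table = tf.reshape(tf.range(15, dtype=tf.float32), [5, 3])
seq_length = 2
# Start at [0, 0], take `seq_length` rows and all columns (-1 = the rest of that
# dimension), mirroring the slice of `full_position_embeddings` above.
toy_slice = tf.slice(toy_table, [0, 0], [seq_length, -1])

with tf.Session() as sess:
  print(sess.run(toy_slice))
  # [[0. 1. 2.]
  #  [3. 4. 5.]]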
num_dims = len(output.shape.as_list())  # number of dimensions
Only the last two dimensions are meaningful here, so we broadcast over the leading dimensions, which is typically just the batch size:
# Only the last two dimensions are relevant (`seq_length` and `width`), so
# we broadcast among the first dimensions, which is typically just
# the batch size.
position_broadcast_shape = []  # the shape used for broadcasting
for _ in range(num_dims - 2):
  position_broadcast_shape.append(1)
position_broadcast_shape.extend([seq_length, width])  # append the two meaningful dims
position_embeddings = tf.reshape(position_embeddings,
                                 position_broadcast_shape)  # reshape so it broadcasts
output += position_embeddings  # add the position information to the output
output = layer_norm_and_dropout(output, dropout_prob)  # layer normalization and dropout
return output
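A small numeric check of the broadcasting step above (a standalone sketch with made-up shapes; NumPy is used only to show the shape arithmetic):

import numpy as np

batch_size, seq_length, width = 2, 4, 3
output = np.zeros([batch_size, seq_length, width], dtype=np.float32)
position_embeddings = np.ones([seq_length, width], dtype=np.float32)

# num_dims == 3, so position_broadcast_shape becomes [1, seq_length, width]:
# one leading 1 per extra dimension, then the two meaningful dimensions.
output += position_embeddings.reshape([1, seq_length, width])

print(output.shape)  # (2, 4, 3): every example in the batch gets the same position rows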
Creating the mask
■ Create the attention mask from the input mask
def create_attention_mask_from_input_mask(from_tensor, to_mask):
"""Create 3D attention mask from a 2D tensor mask.
从 2D掩码创建3D掩码
Args: 入参:输入张量,转换成掩码的张量
from_tensor: 2D or 3D Tensor of shape [batch_size, from_seq_length, ...].
to_mask: int32 Tensor of shape [batch_size, to_seq_length].
Returns: 返回值 浮点值的张量
float Tensor of shape [batch_size, from_seq_length, to_seq_length].
""" 获取入参形状参数
from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
batch_size = from_shape[0]
from_seq_length = from_shape[1]
Get the shape of `to_mask`:
to_shape = get_shape_list(to_mask, expected_rank=2)
to_seq_length = to_shape[1]
Reshape to [batch_size, 1, to_seq_length] first, then cast to float32:
to_mask = tf.cast(
    tf.reshape(to_mask, [batch_size, 1, to_seq_length]), tf.float32)
`from_tensor` is not necessarily a mask (although it could be); we do not really care about padding on the *from* side, only on the *to* side, so a tensor of all ones is used:
# We don't assume that `from_tensor` is a mask (although it could be). We
# don't actually care if we attend *from* padding tokens (only *to* padding)
# tokens so we create a tensor of all ones.
#
# `broadcast_ones` = [batch_size, from_seq_length, 1]
Create the all-ones tensor:
broadcast_ones = tf.ones(
    shape=[batch_size, from_seq_length, 1], dtype=tf.float32)
Broadcast along two dimensions to build the mask:
# Here we broadcast along two dimensions to create the mask.
mask = broadcast_ones * to_mask
return mask
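To see what the returned mask looks like, here is a toy run of the same arithmetic (a standalone NumPy sketch with invented sizes; 1 marks a real token and 0 marks padding):

import numpy as np

batch_size, from_seq_length, to_seq_length = 1, 2, 3
# One sequence whose last "to" position is padding.
to_mask = np.array([[1, 1, 0]], dtype=np.float32)          # [batch_size, to_seq_length]
to_mask = to_mask.reshape([batch_size, 1, to_seq_length])  # [batch_size, 1, to_seq_length]

broadcast_ones = np.ones([batch_size, from_seq_length, 1], dtype=np.float32)
mask = broadcast_ones * to_mask  # [batch_size, from_seq_length, to_seq_length]

print(mask)
# [[[1. 1. 0.]
#   [1. 1. 0.]]]
# Every "from" position may attend to the two real tokens but not to the padding slot.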
The attention layer
■ The attention_layer function
def attention_layer(from_tensor,
                    to_tensor,
                    attention_mask=None,
                    num_attention_heads=1,