BERT Model Source Code Analysis (Part 7)

    use_position_embeddings=True,
    position_embedding_name="position_embeddings",
    initializer_range=0.02,
    max_position_embeddings=512,
    dropout_prob=0.1):
  """Performs various post-processing on a word embedding tensor.

  Args:
    input_tensor: float Tensor of shape [batch_size, seq_length,
      embedding_size].
    use_token_type: bool. Whether to add embeddings for `token_type_ids`.
    token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].
      Must be specified if `use_token_type` is True.
    token_type_vocab_size: int. The vocabulary size of `token_type_ids`.
    token_type_embedding_name: string. The name of the embedding table variable
      for token type ids.
    use_position_embeddings: bool. Whether to add position embeddings for the
      position of each token in the sequence.
    position_embedding_name: string. The name of the embedding table variable
      for positional embeddings.
    initializer_range: float. Range of the weight initialization. (Annotation:
      this is the standard deviation used when initializing the weights.)
    max_position_embeddings: int. Maximum sequence length that might ever be
      used with this model. This can be longer than the sequence length of
      input_tensor, but cannot be shorter.
    dropout_prob: float. Dropout probability applied to the final output tensor.
      (Annotation: dropout probability = 1 - keep probability.)

  Returns:
    float tensor with same shape as `input_tensor`.

  Raises:
    ValueError: One of the tensor shapes or input values is invalid.
  """
  # Get the input shape as a list; the tensor must have rank 3.
  input_shape = get_shape_list(input_tensor, expected_rank=3)
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  width = input_shape[2]

  output = input_tensor

  # Token type embeddings
  if use_token_type:
    # Raise an error if `token_type_ids` was not supplied.
    if token_type_ids is None:
      raise ValueError("`token_type_ids` must be specified if"
                       "`use_token_type` is True.")
    # The token type embedding table.
    token_type_table = tf.get_variable(
        name=token_type_embedding_name,
        shape=[token_type_vocab_size, width],
        initializer=create_initializer(initializer_range))
    # This vocab will be small so we always do one-hot here, since it is always
    # faster for a small vocabulary.
    # Flatten the ids into a 1-D tensor.
    flat_token_type_ids = tf.reshape(token_type_ids, [-1])
    # Convert the ids to one-hot vectors.
    one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size)
    # Multiplying the one-hot matrix by the table selects one embedding
    # vector per id.
    token_type_embeddings = tf.matmul(one_hot_ids, token_type_table)
    token_type_embeddings = tf.reshape(token_type_embeddings,
                                       [batch_size, seq_length, width])
    # Add the token type embeddings to the output.
    output += token_type_embeddings
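
  # Position embeddings
  if use_position_embeddings:
    # Assert that seq_length <= max_position_embeddings holds element-wise.
    assert_op = tf.assert_less_equal(seq_length, max_position_embeddings)
    # tf.control_dependencies is TensorFlow's mechanism for controlling
    # execution order. It does two things: it adds dependencies (ops or
    # tensors) to everything created inside the block, and it clears them
    # when passed None.
    with tf.control_dependencies([assert_op]):
      # tf.get_variable creates a new TensorFlow variable, or returns the
      # existing one of that name when variable reuse is enabled. Common
      # initializers include tf.constant_initializer (constant),
      # tf.random_normal_initializer (normal distribution),
      # tf.truncated_normal_initializer (truncated normal distribution) and
      # tf.random_uniform_initializer (uniform distribution).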
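
The token type branch above multiplies a one-hot matrix by the embedding table instead of indexing into it. The following is a minimal sketch, in plain Python with no TensorFlow dependency, of why the two routes give the same result: each one-hot row picks out exactly one row of the table. The helpers `one_hot` and `matmul` and the table values are illustrative assumptions, not part of the BERT source.

```python
# Illustrative stand-ins for tf.one_hot and tf.matmul (hypothetical helpers,
# not the TensorFlow implementations).
def one_hot(ids, depth):
    """Return one row per id, with 1.0 at the id's index and 0.0 elsewhere."""
    return [[1.0 if col == i else 0.0 for col in range(depth)] for i in ids]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Made-up table: token_type_vocab_size = 2, width = 3.
token_type_table = [[0.1, 0.2, 0.3],
                    [0.4, 0.5, 0.6]]

# token_type_ids of shape [batch_size=2, seq_length=3], flattened to 1-D.
token_type_ids = [[0, 0, 1],
                  [1, 0, 1]]
flat_token_type_ids = [i for row in token_type_ids for i in row]

# Route 1: one-hot encode, then multiply by the table (as in the listing).
via_matmul = matmul(one_hot(flat_token_type_ids, depth=2), token_type_table)
# Route 2: index the table directly with each id.
via_lookup = [token_type_table[i] for i in flat_token_type_ids]

# Both routes yield the same [batch_size * seq_length, width] matrix.
assert via_matmul == via_lookup
```

Both routes produce the same matrix here; the comment in the listing gives speed on a small vocabulary as the reason for choosing the one-hot route.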

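The listing breaks off inside the `with` block, so the rest of the position branch is not shown in this part. As a rough sketch of what the docstring's constraint on `max_position_embeddings` implies, the example below assumes that the position table is cut down to its first `seq_length` rows and that those rows are added to every sequence in the batch. The function `add_position_embeddings`, the argument name `full_position_embeddings` and the numbers are hypothetical, chosen only for illustration; plain Python lists stand in for tensors.

```python
def add_position_embeddings(output, full_position_embeddings):
    """Add one position vector per token, shared by every sequence in the batch.

    output: nested lists of shape [batch_size, seq_length, width].
    full_position_embeddings: nested lists of shape
      [max_position_embeddings, width].
    """
    seq_length = len(output[0])
    max_position_embeddings = len(full_position_embeddings)
    # Plays the role of tf.assert_less_equal in the listing.
    if seq_length > max_position_embeddings:
        raise ValueError("seq_length exceeds max_position_embeddings")
    # Assumption: only the first seq_length rows of the table are used.
    position_embeddings = full_position_embeddings[:seq_length]
    # The same position rows are added to each sequence in the batch.
    return [[[x + p for x, p in zip(token, position)]
             for token, position in zip(sequence, position_embeddings)]
            for sequence in output]


# batch_size = 2, seq_length = 2, width = 2.
batch = [[[1.0, 2.0], [3.0, 4.0]],
         [[5.0, 6.0], [7.0, 8.0]]]
# max_position_embeddings = 4, which is longer than seq_length.
table = [[10.0, 20.0], [30.0, 40.0], [50.0, 60.0], [70.0, 80.0]]

result = add_position_embeddings(batch, table)
```

With a table shorter than the sequence, the function raises `ValueError`, which mirrors the "cannot be shorter" rule in the docstring.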