BERT Model Source Code Walkthrough (Part 6)

■ (Two-in-one function) Apply layer normalization, then dropout, and return the result
def layer_norm_and_dropout(input_tensor, dropout_prob, name=None):
  """Runs layer normalization followed by dropout."""
  output_tensor = layer_norm(input_tensor, name)
  output_tensor = dropout(output_tensor, dropout_prob)
  return output_tensor
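To make the composition concrete, here is a minimal pure-Python sketch of the same two steps on a single feature vector. It is not the TensorFlow implementation: the learned scale and offset of the real layer-norm are left out, the `epsilon` value and the `rng` parameter are assumptions made for illustration, and dropout is written as the usual "inverted" variant in which surviving values are scaled by 1 / (1 - dropout_prob).

```python
import math
import random


def layer_norm(vector, epsilon=1e-12):
    """Normalize one feature vector to zero mean and (almost) unit variance.

    Assumption: the learned scale and offset of a real layer-norm are omitted.
    """
    mean = sum(vector) / len(vector)
    variance = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(variance + epsilon) for x in vector]


def dropout(vector, dropout_prob, rng=random):
    """Zero each element with probability dropout_prob and rescale the rest."""
    if dropout_prob is None or dropout_prob == 0.0:
        return list(vector)
    keep_prob = 1.0 - dropout_prob
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in vector]


def layer_norm_and_dropout(vector, dropout_prob, rng=random):
    """Run layer normalization first, then dropout, as in the function above."""
    return dropout(layer_norm(vector), dropout_prob, rng)


# With dropout disabled, only the normalization is visible.
normalized = layer_norm_and_dropout([1.0, 2.0, 3.0, 4.0], dropout_prob=0.0)
```

With `dropout_prob=0.0` the result has zero mean and unit variance; with a non-zero probability each element is either zeroed or scaled up, so the expected value of every element is unchanged.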
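■ Weight initialization
def create_initializer(initializer_range=0.02):  # initializer_range is used as the standard deviation (stddev)
  """Creates a `truncated_normal_initializer` with the given range."""
  # Truncated-normal initialization; the TensorFlow documentation recommends it for neural-network weights and filters.
  return tf.truncated_normal_initializer(stddev=initializer_range)
tf.truncated_normal_initializer draws random values from a truncated normal distribution.
The values follow a normal distribution with the specified mean and standard deviation,
except that any value more than two standard deviations away from the mean, in either direction, is discarded and re-drawn.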
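The discard-and-re-draw rule can be sketched as rejection sampling in plain Python. This illustrates the documented behaviour only; TensorFlow's internal sampling algorithm may differ, and the function name `truncated_normal` and its `rng` parameter are hypothetical.

```python
import random


def truncated_normal(mean=0.0, stddev=0.02, rng=random):
    """Draw one value, re-drawing until it lies within 2 stddev of the mean."""
    while True:
        value = rng.gauss(mean, stddev)
        if abs(value - mean) <= 2.0 * stddev:
            return value


# Seeded generator so that the run is repeatable.
rng = random.Random(42)
samples = [truncated_normal(stddev=0.02, rng=rng) for _ in range(1000)]
```

With `stddev=0.02`, every sample falls inside [-0.04, 0.04], so no weight starts out as an extreme outlier.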
Embedding lookup
■ Look up the embedding tensor for each word id
def embedding_lookup(input_ids,
                     vocab_size,
                     embedding_size=128,
                     initializer_range=0.02,
                     word_embedding_name="word_embeddings",
                     use_one_hot_embeddings=False):
  """Looks up words embeddings for id tensor.

  Args:
    input_ids: int32 Tensor of shape [batch_size, seq_length] containing word
      ids.
    vocab_size: int. Size of the embedding vocabulary.
    embedding_size: int. Width of the word embeddings.
    initializer_range: float. Embedding initialization range (used as the
      standard deviation of the weight initializer).
    word_embedding_name: string. Name of the embedding table.
    use_one_hot_embeddings: bool. If True, use one-hot method for word
      embeddings. If False, use `tf.gather()`.

  Returns:
    float Tensor of shape [batch_size, seq_length, embedding_size].
  """
  # This function assumes that the input is of shape [batch_size, seq_length,
  # num_inputs].
  #
  # If the input is a 2D tensor of shape [batch_size, seq_length], we
  # reshape to [batch_size, seq_length, 1].
  if input_ids.shape.ndims == 2:  # 2-D input: add a trailing axis of size 1 with tf.expand_dims
    input_ids = tf.expand_dims(input_ids, axis=[-1])
  # The embedding table: one row of width embedding_size per vocabulary entry.
  embedding_table = tf.get_variable(
      name=word_embedding_name,
      shape=[vocab_size, embedding_size],
      initializer=create_initializer(initializer_range))
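  # Flatten the ids into a 1-D tensor.
  # A dimension given as -1 is not fixed in advance; it is inferred from the data:
  # the total number of elements divided by the product of the other dimensions.
  # Note that only one -1 may appear in the shape list.
  flat_input_ids = tf.reshape(input_ids, [-1])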
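  if use_one_hot_embeddings:
    # tf.one_hot() converts each id into a one-hot vector of width vocab_size.
    one_hot_input_ids = tf.one_hot(flat_input_ids, depth=vocab_size)
    # Multiplying the one-hot vectors by the embedding table selects one row per id,
    # mapping each vocab_size-wide vector to a dense embedding_size-wide vector.
    # The result is the same as the tf.gather branch below.
    output = tf.matmul(one_hot_input_ids, embedding_table)
  else:
    output = tf.gather(embedding_table, flat_input_ids)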
  input_shape = get_shape_list(input_ids)
  # Reshape the flat result to [batch_size, seq_length, num_inputs * embedding_size].
  output = tf.reshape(output,
                      input_shape[0:-1] + [input_shape[-1] * embedding_size])
  return (output, embedding_table)
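The two lookup paths can be compared in a small pure-Python sketch that uses nested lists in place of tensors. The helpers `gather`, `one_hot` and `matmul` are illustrative stand-ins written for this example, not the TensorFlow ops, and the embedding table is a hand-written toy rather than a trained variable.

```python
def gather(table, ids):
    """Pick one table row per id (stand-in for tf.gather)."""
    return [table[i] for i in ids]


def one_hot(ids, depth):
    """Turn each id into a one-hot row of width depth (stand-in for tf.one_hot)."""
    return [[1.0 if j == i else 0.0 for j in range(depth)] for i in ids]


def matmul(a, b):
    """Plain matrix product of two nested lists (stand-in for tf.matmul)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def embedding_lookup(input_ids, table, use_one_hot=False):
    """Look up embeddings for ids of shape [batch_size, seq_length]."""
    batch_size, seq_length = len(input_ids), len(input_ids[0])
    # Flatten to 1-D, the equivalent of tf.reshape(input_ids, [-1]).
    flat_ids = [i for row in input_ids for i in row]
    if use_one_hot:
        flat_out = matmul(one_hot(flat_ids, len(table)), table)
    else:
        flat_out = gather(table, flat_ids)
    # Restore the leading dimensions: [batch_size, seq_length, embedding_size].
    return [flat_out[b * seq_length:(b + 1) * seq_length] for b in range(batch_size)]


# Toy table: vocab_size = 4, embedding_size = 2.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
input_ids = [[3, 0, 1], [2, 2, 0]]

by_gather = embedding_lookup(input_ids, table)
by_one_hot = embedding_lookup(input_ids, table, use_one_hot=True)
```

Both calls return the same nested list of shape [2, 3, 2]. The one-hot path does far more arithmetic for the same answer, so the choice between the two is about which operation the hardware runs efficiently, not about the result.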
Embedding post-processing
■ Post-process the embeddings
def embedding_postprocessor(input_tensor,
                            use_token_type=False,
                            token_type_ids=None,
                            token_type_vocab_size=16,
                            token_type_embedding_name="token_type_embeddings",
