A detailed look at TensorFlow's embedding lookup

2-D case: just words and their embeddings

import tensorflow as tf
import numpy as np
# number of classes (vocabulary size)
num_classes = 5
matrix = np.random.random([5, 3])  # embedding matrix: 5 words, each with a 3-dim embedding
print(matrix)

# the integer id to look up
onehotidx = [3]
c = tf.nn.embedding_lookup(matrix, onehotidx)

# convert the integer id into a num_classes-wide one-hot encoding
onehot = np.eye(num_classes)[onehotidx]
print(onehot)

with tf.Session() as sess:
    print('embedding fetched via embedding_lookup with ids\n')
    print(c.eval())
    print('embedding fetched via one-hot then matmul\n')
    print(tf.matmul(onehot, matrix).eval())

The ids act as the left operand and matrix as the right-hand table of word vectors to select from: the lookup is equivalent to one_hot(ids) * matrix.
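The equivalence can be checked without a TensorFlow session at all, since `embedding_lookup` on a 2-D matrix is just row gathering. A minimal NumPy sketch (the tiny 5x3 matrix here is a made-up example, not taken from the article's random data):

```python
import numpy as np

# hypothetical example: 5-word vocabulary, 3-dim embeddings
matrix = np.arange(15, dtype=float).reshape(5, 3)
ids = [3]

# direct row gather -- what tf.nn.embedding_lookup computes
gathered = matrix[ids]

# one-hot route: one_hot(ids) @ matrix selects the same row
onehot = np.eye(5)[ids]
via_matmul = onehot @ matrix

assert np.allclose(gathered, via_matmul)
print(gathered)  # row 3 of the embedding matrix
```

The one-hot matmul does O(vocab * dim) work per id, while the gather is a plain index, which is why `embedding_lookup` is the form used in practice.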

3-D case: adding a batch dimension

import tensorflow as tf
import numpy as np
# width of the one-hot encoding
num_classes = 16
# the integer ids to convert
arr = [1, 3, 5, 9]
arr2 = [0, 4, 6, 12]

# turn each integer into a num_classes-wide one-hot encoding
a = np.eye(num_classes)[arr]
b = np.eye(num_classes)[arr2]

matrix = np.array([a, b, b])  # shape [3, 4, 16]: 3 batches, 4 words, 16 dims
ids = np.array([1, 1, 2])
# Conclusion: with a batch dimension, embedding_lookup still indexes the first
# axis -- the [3, 4, 16] tensor is split into three [4, 16] slices, and the ids
# pick which batch slices to gather and stack.
# In the 2-D case above the first axis is the words, each with its embedding
# vector, and the ids pick which words' embeddings to fetch; a "word" here can
# also be any one-dimensional feature.
print(matrix)
c = tf.nn.embedding_lookup(matrix, ids)
with tf.Session() as sess:
    print(c.eval())