Registering a self-implemented Op with TensorFlow Serving

The goal is to speed up the LSTM portion of a cloud-side algorithm: compile a .cu file into a .so, then use the LSTM class/functions in that .so in place of tf.LSTMCell for the computation.
The full project is on GitHub and the workflow is covered in this blog post. I have only just started with CUDA myself, so comments and discussion are welcome~

Serving a TensorFlow model with a custom op

TensorFlow comes with an extensive library of ops and op kernels (implementations) fine-tuned for different hardware types (CPU, GPU, and so on). These ops are linked into the TensorFlow Serving ModelServer binary automatically, with no extra work from the user. However, there are two use cases that require the user to explicitly link ops into ModelServer:

  • You have written your own custom op (for example, using this guide)
  • You are using an already-implemented op that does not ship with TensorFlow

Note: since version 2.0, TensorFlow no longer distributes the contrib module; if you are serving a TensorFlow program that uses contrib ops, use this guide to link those ops into ModelServer explicitly.

Whether or not you implemented the op yourself, in order to serve a model with custom ops you need access to the op's source code. This guide walks you through the steps of using that source to make custom ops available for serving. For guidance on implementing custom ops, see the tensorflow/custom-op repository.

Prerequisite: the custom op has already been written and registered as a TensorFlow op.
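As a quick sanity check, a registered op can be loaded and called from Python through tf.load_op_library. Below is a minimal sketch; the library path, op name, and tensor shape are illustrative assumptions, not the actual interface of this project's op:

```python
# Sanity check: load the compiled op library and run the op once.
# Library path, op name, and shapes are assumptions; substitute the
# real ones from your build.
import numpy as np
import tensorflow as tf

lstm_module = tf.load_op_library('./cuda_lstm_forward.so')

# TensorFlow exposes a registered op "CudaLstmForward" (hypothetical)
# as the snake_case Python function cuda_lstm_forward.
x = tf.placeholder(tf.float32, shape=[None, None, 128])
y = lstm_module.cuda_lstm_forward(x)

with tf.Session() as sess:
    out = sess.run(y, feed_dict={x: np.random.rand(10, 2, 128).astype(np.float32)})
    print(out.shape)
```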

Copy the op source into the Serving project

Create a folder named cuda_lstm_forward under /home/public/serving_gpu_15addopjiami/tensorflow_serving.

Then put "00_lstm.cu", "00_lstm.so", "cuda_lstm_forward.h", "cuda_lstm_forward.cc", and "cuda_lstm_forward.so", i.e. all the dependencies, into that folder.


Build a static library for the op

In the cuda_lstm_forward folder you will see a target that produces a shared object file (.so), which you could load into Python to create and train models. TensorFlow Serving, however, links ops statically at build time and requires a .a file. You therefore need to create a build rule that produces this file in tensorflow_serving/cuda_lstm_forward/BUILD (despite its .so-style name, the cc_library target below is linked statically into the ModelServer binary):

```python
package(
    default_visibility = [
        "//tensorflow_serving:internal",
    ],
    features = ["-layering_check"],
)

cc_library(
    name = "cuda_lstm_forward.so",
    visibility = ["//visibility:public"],
    srcs = [
        "cuda_lstm_forward.cc",
        #"cuda_lstm_forward.h",
        "lib00_lstm.so",
        #"00_lstm.cu.cc"
    ],
    copts = ["-std=c++11"],
    deps = [
        "@org_tensorflow//tensorflow/core:framework_headers_lib",
        "@org_tensorflow//tensorflow/core/util/ctc",
        "@org_tensorflow//third_party/eigen3",
    ],
    # alwayslink keeps the op's static registration from being stripped
    # when this library is linked into the ModelServer binary.
    alwayslink = 1,
)
```

Bazel BUILD rule reference: Bazel C/C++ Rules

Build ModelServer with the op linked in

To serve a model that uses a custom op, you must build the ModelServer binary with that op linked in. Specifically, you add the cuda_lstm_forward build target created above to ModelServer's BUILD file.

Edit tensorflow_serving/model_servers/BUILD to add your custom op build target to SUPPORTED_TENSORFLOW_OPS, which is included by the server_lib target:

Find SUPPORTED_TENSORFLOW_OPS and change it as follows:

```python
SUPPORTED_TENSORFLOW_OPS = [
    "@org_tensorflow//tensorflow/contrib:contrib_kernels",
    "@org_tensorflow//tensorflow/contrib:contrib_ops_op_lib",
    # Added this line:
    #"//tensorflow_serving/fsmn_forward:fsmn_forward.so",
    "//tensorflow_serving/cuda_lstm_forward:cuda_lstm_forward.so",
]
```

Then build with the following script:


```sh
#FROM nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04
export TF_CUDA_VERSION=8.0
export TF_CUDNN_VERSION=6
TF_SERVING_COMMIT=tags/1.5.0
#TF_COMMIT=tags/ais_v0.0.1
BAZEL_VERSION=0.15.0

export TF_NEED_CUDA=1
export TF_NEED_S3=1
export TF_CUDA_COMPUTE_CAPABILITIES="3.5,5.2,6.1"
export TF_NEED_GCP=1
export TF_NEED_JEMALLOC=0
export TF_NEED_HDFS=0
export TF_NEED_OPENCL=0
#export TF_NEED_MKL=1
export TF_NEED_VERBS=0
export TF_NEED_MPI=0
export TF_NEED_GDR=0
export TF_ENABLE_XLA=0
export TF_CUDA_CLANG=0
export TF_NEED_OPENCL_SYCL=0
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export CUDNN_INSTALL_PATH=/usr/local/cuda
#export MKL_INSTALL_PATH=/opt/intel/mkl
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export PYTHON_BIN_PATH=/usr/bin/python
export PYTHON_LIB_PATH=/usr/lib/python2.7/site-packages/
export CC_OPT_FLAGS="-march=native"

if [ ! -d "./tensorflow" ]; then
git clone https://gitlab.spetechcular.com/core/tensorflow.git
fi

if [ ! -d "./build_out" ]; then
mkdir ./build_out
fi

#git checkout $TF_SERVING_COMMIT
cd ./tensorflow
#git checkout $TF_COMMIT
TF_SET_ANDROID_WORKSPACE= ./configure

cd ..
#bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k --verbose_failures --crosstool_top=@local_config_cuda//crosstool:toolchain tensorflow_serving/model_servers:tensorflow_model_server

#bazel build -c opt --copt=-mavx --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k --verbose_failures --crosstool_top=@local_config_cuda//crosstool:toolchain --spawn_strategy=standalone tensorflow_serving/model_servers:tensorflow_model_server
bazel build -c opt --config=cuda -k --verbose_failures --crosstool_top=@local_config_cuda//crosstool:toolchain tensorflow_serving/model_servers:tensorflow_model_server
```

Serve a model containing your custom op

  1. Run 00_build_lstm_owncompli.py under /home/resources/lizhipeng/lstmkernel_test to generate the savedmodel folder, copy it to /home/public/tfs_sever_gpu/tfs_models/cudalstm/, then rename it to 1 (TensorFlow Serving treats the numeric directory name as the model version; a sketch of such an export script follows this list):

    • cp -r /home/resources/lizhipeng/lstmkernel_test/savedmodel/ /home/public/tfs_sever_gpu/tfs_models/cudalstm/

    • cd /home/public/tfs_sever_gpu/tfs_models/cudalstm/

    • mv savedmodel/ 1
  2. Copy the dependency lib00_lstm.so to /home/public/tfs_sever_gpu/cuda_so/.

  3. Copy the freshly built /home/public/serving_gpu_15addopjiami/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server to /home/public/tfs_sever_gpu/bin/. If the copy fails for lack of permissions, use chmod 777; it is a good idea to check the modification time with ll to confirm the copy actually succeeded.

  4. Check the port in use with ps -ef | grep tensorflow, and configure the matching port when starting the server.

  5. Start the server with sh run13.sh under /home/public/tfs_sever_gpu/; no environment activation is needed, and this is independent of the local machine.

  6. Under /home/public/tfs_sever_gpu/model_transfer/, run python lstmctc.py and python cudalstm.py (a minimal client sketch follows this list). The virtual environment here can be activated with source activate /home/pz853/anaconda3/envs/py2. If the environment is missing many packages (grpc, tensorflow-serving-api): pip-installing the second one left abs missing, and that could not be undone, so Python 2.7 with the TF 1.4 GPU build is recommended.

  7. For concurrency testing, write a shell script in the same directory that launches the client several times with &; tested with 1, 2, 4, 8, 16, 32, 64, and 128 concurrent clients (a simple Python driver sketch also follows this list).
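Sketch for step 1: the export script presumably builds a graph that calls the custom op and writes it out as a SavedModel. This is a minimal TF 1.x sketch, not the actual contents of 00_build_lstm_owncompli.py; the op name, shapes, and signature keys are assumptions:

```python
# Hypothetical sketch: export a graph that uses the custom op as a
# SavedModel that TensorFlow Serving can load (TF 1.x API).
import tensorflow as tf

lstm_module = tf.load_op_library('./cuda_lstm_forward.so')  # assumed path

x = tf.placeholder(tf.float32, shape=[None, None, 128], name='input')
y = lstm_module.cuda_lstm_forward(x)  # hypothetical op name

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    builder = tf.saved_model.builder.SavedModelBuilder('./savedmodel')
    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'input': x}, outputs={'output': y})
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                signature,
        })
    builder.save()
```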
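Sketch for step 6: a minimal gRPC client for the served model, using the tensorflow-serving-api 1.x beta interface that matches the Python 2.7 / TF 1.4 environment above. The model name cudalstm comes from the directory name in step 1; the input/output keys and shapes are assumptions:

```python
# Hypothetical minimal gRPC client for the model served on port 9001.
import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2

channel = implementations.insecure_channel('10.12.8.26', 9001)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'cudalstm'  # directory name under tfs_models
request.model_spec.signature_name = 'serving_default'

data = np.random.rand(10, 2, 128).astype(np.float32)
request.inputs['input'].CopyFrom(
    tf.contrib.util.make_tensor_proto(data, shape=list(data.shape)))

result = stub.Predict(request, 10.0)  # 10-second timeout
print(result.outputs['output'])
```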
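Sketch for step 7: instead of a shell script backgrounding clients with &, the same fan-out can be done with a small Python driver; cudalstm.py is the client from step 6, the rest is illustrative:

```python
# Hypothetical concurrency driver: launch N clients in parallel and wait,
# equivalent to a shell script that backgrounds "python cudalstm.py" N times.
import subprocess
import sys

n = int(sys.argv[1]) if len(sys.argv) > 1 else 8  # e.g. 1, 2, 4, ..., 128
procs = [subprocess.Popen(['python', 'cudalstm.py']) for _ in range(n)]
for p in procs:
    p.wait()
```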

The startup script looks like this:

```sh
#!/bin/bash
# /home/public/tfs_sever_gpu/run.sh

basepath=$(dirname $(readlink -f $0))

# ipcpath: modify as needed
#ipcpath=unix:${basepath}/f
ipcpath=10.12.8.26:9001

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${basepath}/cuda_so

# delete model.cfg if it exists
if [ -f "${basepath}/cfg/model.cfg" ]; then
    rm "${basepath}/cfg/model.cfg"
fi

# generate the model.cfg file, one config entry per model directory
model_cfg=${basepath}/cfg/model.cfg
touch "$model_cfg"
echo "model_config_list: {" >> $model_cfg
for dir in `ls ${basepath}/tfs_models`
do
    if [ -d "${basepath}/tfs_models/${dir}" ]; then
        echo "  config: {" >> $model_cfg
        echo "    name: \"${dir}\"," >> $model_cfg
        echo "    ## must be absolute path" >> $model_cfg
        echo "    base_path: \"${basepath}/tfs_models/${dir}\"," >> $model_cfg
        echo "    model_platform: \"tensorflow\"" >> $model_cfg
        echo "    model_version_policy: {" >> $model_cfg
        echo "      all: {}" >> $model_cfg
        echo "    }" >> $model_cfg
        echo "  }" >> $model_cfg
    fi
done
echo "}" >> $model_cfg

if [ ! -f "$model_cfg" ]; then
    echo "error: $model_cfg not exist"
    exit
fi

platform_cfg=${basepath}/cfg/platform.cfg
if [ ! -f "$platform_cfg" ]; then
    echo "error: $platform_cfg not exist"
    exit
fi

tfs_bin=${basepath}/bin/tensorflow_model_server_addop

#cpuinfo=`cat /proc/cpuinfo |grep flags | sed -n '1p' |grep avx -c`
#if [ $cpuinfo -gt 0 ]; then
#    tfs_bin=${basepath}/bin/tensorflow_model_server_xla
#fi
#cpuinfo=`cat /proc/cpuinfo |grep flags | sed -n '1p' |grep avx2 -c`
#if [ $cpuinfo -gt 0 ]; then
#    tfs_bin=${basepath}/bin/tensorflow_model_server_sse
#fi
#echo $tfs_bin

#tfs_bin=${basepath}/bin/tensorflow_model_server

# exec replaces the shell, so the server runs in this process;
# the while loop only comes into play if exec itself fails
while true;
do
    #CUDA_VISIBLE_DEVICES=0 exec ${tfs_bin} --ipcpath=10.12.8.26:9002 --model_config_file=${model_cfg} --platform_config_file=${platform_cfg}
    CUDA_VISIBLE_DEVICES=0 exec ${tfs_bin} --port=9001 --model_config_file=${model_cfg} --platform_config_file=${platform_cfg}
done
```
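For reference, with a single model directory cudalstm under tfs_models, the loop above writes a model.cfg equivalent to:

```
model_config_list: {
  config: {
    name: "cudalstm",
    ## must be absolute path
    base_path: "/home/public/tfs_sever_gpu/tfs_models/cudalstm",
    model_platform: "tensorflow"
    model_version_policy: {
      all: {}
    }
  }
}
```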