分享某大学实验室内部项目:Docker+Kafka实战流程

前言

之前老有朋友跟我说有没有Docker+其他技术完整得执行流程啊,体验一下做项目得感觉是什么样,但是说实话,我实在是无能为力,虽然公司内部确实用到了Docker得相关内容,但是你要说让我摘出来,那真的是有点为难我了,实在是没那个精力,我也就没有去做这件事

但是,最近刚好家里在读大学的小侄子(纯辈分大),他有一个实验项目,需要用到Docker,这一天,这不是现成得实例 啊,这里就给大家完整得展示一下,有需要的朋友也可以对比着实际操作一下
文章首发公众号:Java架构师联盟,每日更新技术好闻

实验目的

Understanding the concept of message passing

Trying to follow up the procedure of a message broker that handles message from many tenants Repeating what others have done in the past sheds the light on your future

实验步骤

1. 安装docker-compose 和 docker-machine

已安装docker,系统是Ubuntu

安装docker-compose

\1. 下载最新版本 (v1.24.0) docker-compose

$ sudo curl -L "http://github.com/docker/compose/releases/download/1.24.0/docker-compose-$(uname
-s)-$(uname -m)" -o /usr/local/bin/docker-compose

ps:网速真心慢,建议直接从github(链接:https://github.com/docker/machine/releases/download/v0.16.1/docker- machine-Linux-x86_64)下载,把文件名改成docker-compose,再移动到/usr/local/bin/下。为什么这样做,你可以百度下curl -o 的作用

注意 : 使用curl,url应该以http,不是https。使用https可能会超出时间或拒绝连接

\2. 给已下载的docker-compose文件执行权限

$ sudo chmod +x /usr/local/bin/docker-compose

\3. Install command completion for the bash and zsh shell.

$ sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

\4. 检查docker-compose版本

$ docker-compose --version
docker-compose version 1.24.0, build 0aa59064

安装docker-maceine

提示: 你可以按照官网的步骤使用curl下载docker文件,但网速太慢。建议和我一样(和下载docker-compose 建议一样)从github下载,并改名成docker-machine。移动到/tmp下。

官网shell:

$ base=https://github.com/docker/machine/releases/download/v0.16.0 &&
curl -L $base/docker-machine-$(uname -s)-$(uname -m) >/tmp/docker-machine && sudo install /tmp/docker-machine /usr/local/bin/docker-machine

但是年轻人没有性格怎么可以,我的步骤如下:

\1. 从github下载docker-machine所需文件

文件链接:https://github.com/docker/machine/releases/download/v0.16.1/docker-machine-Linux- x86_64

\2. 文件改名成docker-machine,并移动到/tmp文件夹下

\3. 安装docker-machine

$ sudo install /tmp/docker-machine /usr/local/bin/docker-machine

\4. 查看docker-machine版本

$ docker-machine version

2. 安装VirtualBox

1. 虽然也可以直接安装deb包,但毕竟懒,添加源可以保持更新

$ sed -i '$adeb http://download.virtualbox.org/virtualbox/debian xenial contrib'
/etc/apt/sources.list

2. 为apt-secure导入公钥

$ wget -q https://www.virtualbox.org/download/oracle_vbox_2016.asc -O- | sudo apt-key add -
$ wget -q https://www.virtualbox.org/download/oracle_vbox.asc -O- | sudo apt-key add -

\3. 通过apt安装VirtualBox

$ sudo apt-get update
$ sudo apt-get install virtualbox-6.0

这里可能会有安装包冲突问题

lzd@ubuntu:~$ sudo apt-get install virtualbox-6.0
正在读取软件包列表... 完成正在分析软件包的依赖关系树正在读取状态信息... 完成
有一些软件包无法被安装。如果您用的是 unstable 发行版,这也许是
因为系统无法达到您要求的状态造成的。该版本中可能会有一些您需要的软件包尚未被创建或是它们已被从新到(Incoming)目录移出。
下列信息可能会对解决问题有所帮助:

下列软件包有未满足的依赖关系:
virtualbox-6.0 : 依赖: libcurl3 (>= 7.16.2) 但是它将不会被安装依赖: libpng12-0 (>= 1.2.13-4) 但无法安装它依赖: libvpx3 (>= 1.5.0) 但无法安装它
推荐: libsdl-ttf2.0-0 但是它将不会被安装
E: 无法修正错误,因为您要求某些软件包保持现状,就是它们破坏了软件包间的依赖关系。

解决方案

如遇到此问题,使用 aptitude 代替 apt-get 安装 virtualbox-6.0

$ sudo aptitude install virtualbox-6.

3. 在Docker上创建VM

$ docker-machine create --driver virtualbox --virtualbox-memory 2048 dev

发现执行这步不行,会出现下面这个问题

lzd@ubuntu:~$ docker-machine create --driver virtualbox --virtualbox-memory 2048 dev
Running pre-create checks...
Error with pre-create check: "VBoxManage not found. Make sure VirtualBox is installed and VBoxManage is in the path"

解决方案

@Dreampie, @Aaqib041 can you install the virtualbox : sudo apt-get install virtualbox
then run this command:
docker-machine create --driver virtualbox default

按照这个方法执行完后,成功创建 dev VM

3. 编写代码

\1. 安装sbt

安装sbt前,先安装jdk将下载的压缩包解压到/usr/local/目录下

tar -zxvf sbt-1.2.8.tgz -C /usr/local/sbt

/usr/local/sbt/目录下创建sbt文件

$ cd /usr/local/sbt
$ vim sbt

在sbt文件中写入以下内容

#!/bin/bash
BT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled - XX:MaxPermSize=256M"
java $SBT_OPTS -jar /usr/local/sbt/bin/sbt-launch.jar "$@"

注意sbt-launch.jar的目录是否正确修改sbt文件的权限

$ chmod u+x sbt

配置sbt环境变量

$ vim /etc/profile

在最后一行添加一些内容

export PATH=/usr/local/sbt/bin:$PATH

然后运行使文件生效

$ source /etc/profile

修改sbt路径下的sbtconfig.txt文件

$ vim /usr/local/sbt/conf/sbtconfig.txt

添加以下内容

-Dsbt.global.base=/home/rose/.sbt
-Dsbt.boot.directory=/home/rose/.sbt/boot/
-Dsbt.ivy.home=/home/rose/.ivy2

检查sbt是否安装成功(这里需要联网,会下载东西)

$ sbt sbt-verison

\2. 使用sbt构建项目

首先需要为项目设置sbt目录结构(遗憾的是,sbt它不提供引导项目的命令),这里使用脚本设置目录结构。

shell代码如下:

#!/bin/bash

if [ -z "$1" ] ; then
echo 'Project name is empty' exit 1
fi

PROJECT_NAME="$1" SCALA_VERSION="${2-2.11.8}" SCALATEST_VERSION="${3-2.2.6}"

mkdir $PROJECT_NAME cd $PROJECT_NAME

cat > build.sbt << EOF name := "$PROJECT_NAME"
scalaVersion := "$SCALA_VERSION"
libraryDependencies += "org.scalatest" %% "scalatest" % "$SCALATEST_VERSION" % "test"
EOF
mkdir -p src/{main/{scala,resources},test/{scala,resources}} cat > .gitignore << EOF
target/ EOF
~

脚本文件命名为 sbt-init.sh ,使用脚本创建sbt项目

sudo ./sbt-init.sh example

\3. 新建文件assembly.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")

\4. 新建配置文件build.sbt

name := "direct_kafka_word_count" scalaVersion := "2.10.5"
val sparkVersion = "1.5.1"

libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion % "provided", "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided", ("org.apache.spark" %% "spark-streaming-kafka" % sparkVersion) exclude
("org.spark-project.spark", "unused")
)

assemblyJarName in assembly := name.value + ".jar"

\6. 在项目src/main/scala/com/example/spark目录下新建DirectKafkaWordCount.scala文件写入

package com.example.spark

import kafka.serializer.StringDecoder
import org.apache.spark.{TaskContext, SparkConf}
import org.apache.spark.streaming.kafka.{OffsetRange, HasOffsetRanges, KafkaUtils} import org.apache.spark.streaming.{Seconds, StreamingContext}

object DirectKafkaWordCount {
def main(args: Array[String]): Unit = { if (args.length < 2) {
System.err.println(s"""
|Usage: DirectKafkaWordCount <brokers> <topics>
| <brokers> is a list of one or more Kafka brokers
| <topics> is a list of one or more kafka topics to consume from
| """.stripMargin)
System.exit(1)
}

val Array(brokers, topics) = args

// Create context with 10 second batch interval
val sparkConf = new SparkConf().setAppName("DirectKafkaWordCount") val ssc = new StreamingContext(sparkConf, Seconds(10))

// Create direct kafka stream with brokers and topics val topicsSet = topics.split(",").toSet
val kafkaParams = Map[String, String]("metadata.broker.list" -> brokers) val messages = KafkaUtils.createDirectStream[String, String, StringDecoder,
StringDecoder](
ssc, kafkaParams, topicsSet)

// Get the lines, split them into words, count the words and print val lines = messages.map(_._2)
val words = lines.flatMap(_.split(" "))
val wordCounts = words.map(x => (x, 1L)).reduceByKey(_ + _) wordCounts.print()

// Start the computation ssc.start() ssc.awaitTermination()
}
}

\7. 运行sbt语句,得到文件

sbt assembly

4. 配置容器

\1. 使用 dev Docker客户端

$ eval "$(docker-machine env dev)"

\2. 使用docker-compose.yml文件配置docker-compose

$ mkdir -p /home/lzd/myDocker-6/
$ cd /home/lzd/myDocker-6/
$ touch docker-compose.yml
$ chmod +rwx docker-compose.yml

写入如下配置信息

kafka:
image: antlypls/kafka-legacy environment:
· KAFKA=localhost:9092
· ZOOKEEPER=localhost:2181 expose:
- "2181"
- "9092"

spark:
image: antlypls/spark:1.5.1 command: bash
volumes:
· ./target/scala-2.10:/app links:
- kafka

执行docker-compose.yml文件启动所有容器

docker-compose run --rm spark

这将启动kafka然后spark并将我们记录到spark容器shell中。--rm标志使得在运行后docker-compose删除相应的spark容器

$ docker-compose run --rm spark

\3. 配置 kafka 容器

Create a topic in a kafka broker

进入 kafka 容器

$ docker exec -it <kafka container ID or kafka container name> bash

在kafka里添加topic word-count

$ kafka-topics.sh --create --zookeeper $ZOOKEEPER --replication-factor 1 -- partitions 2 --topic word-count

检查此topic是否加入

$ kafka-topics.sh --list --zookeeper $ZOOKEEPER
$ kafka-topics.sh --describe --zookeeper $ZOOKEEPER --topic word-count

\4. 配置 spark 容器新开一个终端

进入 spark 容器

docker exec -it <spark container ID or spark container name> bash

执行以下语句

spark-submit --master yarn-client --driver-memory 1G --class com.example.spark.DirectKafkaWordCount \ app/direct_kafka_word_count.jar kafka:9092 word-count

ps:这里要指定 driver-memory 大小,要不然使用默认值 4G。

试验中就会报错。错误如下:

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000d5550000, 715849728, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 715849728 bytes for committing reserved memory.

# An error report file with more information is saved as: # /hs_err_pid16414.log

2. Kafka - Stream Word Count demo

1. 启动 zookeeper 和 broker

启动 zookeeper

$ docker run -d --net=host --name=zookeeper -e ZOOKEEPER_CLIENT_PORT=32181 confluentinc/cp-zookeeper:latest

查看 zookeeper

docker logs zookeeper | grep -i binding

- cd /kafka/docker/stream
export COMPOSE_PROJECT_NAME="stream-demo" docker-compose up -d

运行结果

Creating network "streamdemo_default" with the default driver Creating streamdemo_zookeeper_1 ...
Creating streamdemo_zookeeper_1 ... done Creating streamdemo_broker_1 ...
Creating streamdemo_broker_1 ... done

Service status

docker-compose ps

2. Create a topic

Create the input topic

docker-compose exec broker bash kafka-topics --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic streams-plaintext-input

Create the output topic

kafka-topics --create --zookeeper zookeeper:2181 --replication-factor 1 --partitions 1 --topic streams-wordcount-output --config cleanup.policy=compact

Describe them

kafka-topics --zookeeper zookeeper:2181 --describe

结果如下

Topic:streams-plaintext-input
Configs:
PartitionCount:1
ReplicationFactor:1
Topic: streams-plaintext-input Partition: 0
Isr: 1
Leader: 1
Replicas: 1
Topic:streams-wordcount-output PartitionCount:1
Configs:cleanup.policy=compact
Topic: streams-wordcount-output Partition: 0
ReplicationFactor:1
Leader: 1
Replicas:1
Isr: 1
(0)

相关推荐