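ClickHouse Sharded and Replicated Cluster Deployment

Environment

The highly available ClickHouse cluster uses the ReplicatedMergeTree + Distributed approach: 2 shards with 2 replicas each, 4 nodes in total.

Note: this document does not cover installing the ZooKeeper cluster; refer to a separate ZooKeeper installation guide. In production, deploy ZooKeeper on dedicated servers, because the ClickHouse cluster depends heavily on it.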

Name	Version / Address
Operating system	CentOS 7.9
ClickHouse (RPM)	21.9.7.2
ZooKeeper	3.4.5
Node01	192.168.99.28
Node02	192.168.99.29
Node03	192.168.99.30
Node04	192.168.99.31

Preparation

Perform the preparation steps on all four hosts.

pvcreate /dev/sdb
vgcreate data /dev/sdb
lvcreate --name data_01 -l 100%FREE data
mkfs.ext4 /dev/mapper/data-data_01
mkdir -p /data   # mount point only; /data/clickhouse is created after mounting

cat >> /etc/fstab<<EOF
/dev/mapper/data-data_01 /data ext4 defaults,noatime 0
EOF
mount -a

Raise the system resource limits and install dependencies

$ vim /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072

$ yum install -y libtool '*unixODBC*'

Disable the firewall and SELinux

# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# Disable SELinux
sed -i 's/=enforcing/=disabled/g'  /etc/selinux/config
setenforce 0

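Offline installation

Download and install the packages

Download the offline RPM packages from https://repo.yandex.ru/clickhouse/rpm/stable/x86_64. The four packages clickhouse-client, clickhouse-common-static, clickhouse-common-static-dbg, and clickhouse-server must all be the same version.

This guide installs version 21.9.7.2; the download URLs are below. You can download the packages to one host first and then copy them to the other three hosts with scp.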

$ cd /usr/local/src
wget https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/clickhouse-client-21.9.7.2-2.noarch.rpm
wget https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/clickhouse-common-static-21.9.7.2-2.x86_64.rpm
wget https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/clickhouse-common-static-dbg-21.9.7.2-2.x86_64.rpm
wget https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/clickhouse-server-21.9.7.2-2.noarch.rpm
wget https://repo.yandex.ru/clickhouse/rpm/stable/x86_64/clickhouse-test-21.9.7.2-2.noarch.rpm

# Install
$ yum install -y ./*.rpm

# Change the owner and group of the mounted data directory
mkdir -p /data/clickhouse
chown -R clickhouse:clickhouse /data/clickhouse/ && chmod 700 /data/clickhouse/

Modify the configuration files
Edit the main ClickHouse configuration file config.xml on all four hosts.

$ vim /etc/clickhouse-server/config.xml
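<!-- Only the modified settings are shown here; configure the rest as needed -->
    <!-- Port 9000 was already in use on my hosts, so it is changed to 29000 -->
    <tcp_port>29000</tcp_port>
    <!-- IP of the current node, used by replicas when synchronizing data (set it to each node's own IP). It defaults to the hostname, so unless hostname resolution is configured, replicas cannot synchronize data -->
    <interserver_http_host>192.168.99.28</interserver_http_host>
    <!-- Listen on all addresses of the host -->
    <listen_host>0.0.0.0</listen_host>
    <!-- Maximum number of connections -->
    <max_connections>4096</max_connections>
    <!-- Move the default data directories to the custom /data directory -->
    <path>/data/clickhouse/</path>
    <tmp_path>/data/clickhouse/tmp/</tmp_path>
    <format_schema_path>/data/clickhouse/format_schemas/</format_schema_path>
    <user_files_path>/data/clickhouse/user_files/</user_files_path>
    <!-- The next <path> is the one nested inside <user_directories><local_directory> -->
           <path>/data/clickhouse/access/</path>
    <!--...-->
    <!-- Cluster configuration file to include; all cluster settings below go into this file -->
    <include_from>/etc/clickhouse-server/config.d/metrika.xml</include_from>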
    <!--...-->

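Create the configuration file /etc/clickhouse-server/config.d/metrika.xml on all four nodes. The one section that differs per node is listed afterwards; the shared configuration is as follows: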

$ vim /etc/clickhouse-server/config.d/metrika.xml
<yandex>
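<!-- ClickHouse cluster nodes -->
<remote_servers>
    <!-- Cluster name; you can choose your own -->
    <test_cdh_ck_cluster>
        <!-- Shard 1 -->
        <shard>
            <!-- Shard weight -->
            <weight>1</weight>

            <!-- Controls whether a write to a Distributed table is sent by the Distributed table itself to every replica. This is separate from the synchronization done by replicated tables. With ReplicatedMergeTree it must be true: the Distributed table then writes to one replica only and replication copies the data to the others, which avoids the two mechanisms conflicting and producing duplicate or inconsistent data. -->
            <internal_replication>true</internal_replication>
            <!-- First replica of shard 1 -->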
            <replica>
                <host>192.168.99.28</host>
                <port>29000</port>
                <user>default</user>
                <password></password>
               <compression>true</compression>
            </replica>
            <!-- Second replica of shard 1 -->
            <replica>
                <host>192.168.99.30</host>
                <port>29000</port>
                <user>default</user>
                <password></password>
               <compression>true</compression>
            </replica>
        </shard>
        <!-- Shard 2 -->
        <shard>
            <weight>1</weight>
            <internal_replication>true</internal_replication>
            <!-- First replica of shard 2 -->
            <replica>
                <host>192.168.99.29</host>
                <port>29000</port>
                <user>default</user>
                <password></password>
               <compression>true</compression>
            </replica>
            <!-- Second replica of shard 2 -->
            <replica>
                <host>192.168.99.31</host>
                <port>29000</port>
                <user>default</user>
                <password></password>
               <compression>true</compression>
            </replica>
        </shard>
    </test_cdh_ck_cluster>
</remote_servers>

<!-- Settings for your ZooKeeper cluster -->
<zookeeper>
    <node index="1">
        <host>192.168.99.28</host>
        <port>2181</port>
    </node>
    <node index="2">
        <host>192.168.99.29</host>
        <port>2181</port>
    </node>
    <node index="3">
        <host>192.168.99.30</host>
        <port>2181</port>
    </node>
</zookeeper>

<!-- Network settings: allow connections from any address -->
<networks>
    <ip>::/0</ip>
</networks>

<!-- Compression settings -->
<clickhouse_compression>
    <case>
        <min_part_size>10000000000</min_part_size>
        <min_part_size_ratio>0.01</min_part_size_ratio>
        <method>lz4</method> <!-- lz4 compresses faster than zstd but uses more disk space -->
    </case>
</clickhouse_compression>

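<!-- This section differs on each node; remember to adjust it -->
<macros>
    <!-- Shard number; all replicas of the same shard must use the same number -->
    <shard>01</shard>
    <!-- Replica name of the current node; here it is composed of host name, shard number, and replica number -->
    <replica>ch28-01-01</replica>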
</macros>

</yandex>

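Node-specific configuration section for Node01 (192.168.99.28):

<macros>
    <!-- Shard number; all replicas of the same shard must use the same number -->
    <shard>01</shard>
    <!-- Replica name of the current node; here it is composed of host name, shard number, and replica number -->
    <replica>ch28-01-01</replica>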
</macros>

Node-specific configuration section for Node02 (192.168.99.29):

<macros>
    <shard>02</shard>
    <replica>ch29-02-01</replica> <!-- Replica name of the current node: host name, shard number, replica number -->
</macros>

Node-specific configuration section for Node03 (192.168.99.30):

<macros>
    <shard>01</shard>
    <replica>ch30-01-02</replica> <!-- Replica name of the current node: host name, shard number, replica number -->
</macros>

Node-specific configuration section for Node04 (192.168.99.31):

<macros>
    <shard>02</shard>
    <replica>ch31-02-02</replica> <!-- Replica name of the current node: host name, shard number, replica number -->
</macros>
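
The four macros sections above follow one naming pattern (ch + last octet of the node IP + shard number + replica number). As a minimal sketch, the hypothetical helper below (render_macros is not part of ClickHouse) prints the section for a given node so it can be pasted into that node's metrika.xml; it only assumes the naming convention used in this article.

```shell
# Print the <macros> section for one node.
# Arguments: node IP, shard number, replica number within the shard.
# Naming convention assumed from this article: ch<last IP octet>-<shard>-<replica>.
render_macros() (
    ip="$1"; shard="$2"; replica="$3"
    octet="${ip##*.}"                      # last octet of the IP, e.g. 28
    shard_id=$(printf '%02d' "$shard")     # zero-padded, e.g. 01
    replica_id=$(printf '%02d' "$replica")
    cat <<EOF
<macros>
    <shard>${shard_id}</shard>
    <replica>ch${octet}-${shard_id}-${replica_id}</replica>
</macros>
EOF
)

# Example: Node03 is the second replica of shard 1.
render_macros 192.168.99.30 1 2
```

Generating the section instead of typing it by hand mainly guards against two replicas of the same shard ending up with different shard numbers.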

Fix the ownership of the directories on all nodes: the files just created may be owned by root, and both owner and group must be clickhouse.
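
The article does not show the exact command for this step. A minimal sketch, assuming the directories involved are /etc/clickhouse-server and /data/clickhouse (the paths used elsewhere in this guide): the chown line is given as a comment to run as root on each node, and the hypothetical helper not_owned_by lists anything that still has the wrong owner.

```shell
# List everything under the given directories that is NOT owned by the expected user.
# Empty output means the ownership is already correct.
not_owned_by() (
    owner="$1"; shift
    find "$@" ! -user "$owner" 2>/dev/null
)

# On each node, as root (paths are assumptions based on this guide):
#   chown -R clickhouse:clickhouse /etc/clickhouse-server /data/clickhouse
#   not_owned_by clickhouse /etc/clickhouse-server /data/clickhouse
```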


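Start the service
Once that is done, the ClickHouse service can be started on each node.

# Run the start command on every node; avoid starting it with systemctl, which may cause problems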
$ /etc/init.d/clickhouse-server start

# Check that the ports are listening
$  ss -tnl |egrep '29000|8123'
LISTEN     0      64           *:29000                    *:*
LISTEN     0      64           *:8123                     *:*
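
When the check has to be repeated on four nodes, reading the ss output by eye gets tedious. A small sketch of a hypothetical helper (ports_listening is not a standard tool) that reads the output of ss -tnl on standard input and succeeds only if every expected port appears as a listener:

```shell
# Read `ss -tnl` output on stdin; succeed only if every given port is listening.
ports_listening() (
    listing=$(cat)
    for port in "$@"; do
        # Look for ":<port>" followed by whitespace on a LISTEN line.
        printf '%s\n' "$listing" | grep -Eq "LISTEN.*:${port}[[:space:]]" || {
            echo "port ${port} is not listening" >&2
            exit 1
        }
    done
)

# Usage on a node:
#   ss -tnl | ports_listening 29000 8123 && echo "ClickHouse ports are up"
```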

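If the service fails to start, check the server logs under /var/log/clickhouse-server/.

Connect to the cluster and create a database to verify
Connect to the service from any machine with the clickhouse-client command-line tool.

# The default port is 9000; since we changed it, the port must be specified here. The default user is default with an empty password, which we did not change, so the connection succeeds directly.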
$ clickhouse-client  --port 29000
ClickHouse client version 21.9.7.2 (official build).
Connecting to localhost:29000 as user default.
Connected to ClickHouse server version 21.9.7 revision 54449.

k8s-master31 :)

View the cluster information

:) select cluster,shard_num,replica_num,shard_weight,host_name,port,user from system.clusters;

┌─cluster─────────────┬─shard_num─┬─replica_num─┬─shard_weight─┬─host_name─────┬──port─┬─user────┐
│ test_cdh_ck_cluster │         1 │           1 │            1 │ 192.168.99.28 │ 29000 │ default │
│ test_cdh_ck_cluster │         1 │           2 │            1 │ 192.168.99.30 │ 29000 │ default │
│ test_cdh_ck_cluster │         2 │           1 │            1 │ 192.168.99.29 │ 29000 │ default │
│ test_cdh_ck_cluster │         2 │           2 │            1 │ 192.168.99.31 │ 29000 │ default │
└─────────────────────┴───────────┴─────────────┴──────────────┴───────────────┴───────┴─────────┘
# The output shows the following:
# test_cdh_ck_cluster is the name of our cluster
# Replica 1 of shard 1 is 99.28, replica 2 is 99.30
# Replica 1 of shard 2 is 99.29, replica 2 is 99.31

Create the database. Because the statement includes the ON CLUSTER test_cdh_ck_cluster clause, it is executed on all nodes automatically.

create database testdb on cluster test_cdh_ck_cluster;

Create the local table. Without the ON CLUSTER test_cdh_ck_cluster clause, the table would have to be created on every node separately.

create table testdb.table_test ON CLUSTER test_cdh_ck_cluster ( label_id UInt32, label_name String, insert_time Date) ENGINE = ReplicatedMergeTree('/clickhouse/tables/test_cdh_ck_cluster/{shard}/table_test','{replica}',insert_time, (label_id, insert_time), 8192);

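ReplicatedMergeTree engine usage: ENGINE = ReplicatedMergeTree('zk_path', 'replica_name')

zk_path specifies the path in ZooKeeper under which the table is created. The recommended form for zk_path and replica_name is:

/clickhouse/tables/{cluster}/{shard}/{table_name},{replica}
{cluster} is the cluster name; replace it with the actual cluster name
{shard} is the shard number; it is a macro already defined in ClickHouse, and each node reads its own value automatically
{table_name} is the table name; replace it with the actual table name
{replica} is the replica number; it is a macro already defined in ClickHouse, and each node reads its own value automatically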
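
To make the substitution concrete, here is a small sketch with two hypothetical helpers (zk_path and resolve_zk_path are illustrations, not ClickHouse commands). The first builds the path with the {shard} placeholder left in place, as it would be written in the CREATE TABLE statement; the second shows what a node with a given shard macro would resolve it to.

```shell
# Build the ZooKeeper path for a replicated table, following the layout above.
# {shard} stays as a placeholder on purpose: ClickHouse fills it in from <macros>.
zk_path() (
    cluster="$1"; table="$2"
    printf '/clickhouse/tables/%s/{shard}/%s\n' "$cluster" "$table"
)

# Show what a node would resolve the path to, given that node's shard macro value.
resolve_zk_path() (
    path="$1"; shard="$2"
    printf '%s\n' "$path" | sed "s/{shard}/${shard}/"
)

path=$(zk_path test_cdh_ck_cluster table_test)
resolve_zk_path "$path" 01   # nodes 99.28 and 99.30 (shard 01)
resolve_zk_path "$path" 02   # nodes 99.29 and 99.31 (shard 02)
```

Replicas of the same shard resolve to the same path and therefore replicate each other, while the two shards end up under different paths.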

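Create the Distributed table manually on each of the 4 nodes. The unqualified table names here and in the queries that follow assume the session has already switched to the testdb database (for example with use testdb).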

CREATE TABLE table_test_all AS table_test ENGINE = Distributed(test_cdh_ck_cluster, testdb, table_test, rand());

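Alternatively, create it on the whole cluster with a single statement:

# Note: in the author's setup the AS clause could not be used here, so the columns are listed explicitly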
CREATE TABLE table_test_all  ON CLUSTER test_cdh_ck_cluster ( label_id UInt32, label_name String, insert_time Date) ENGINE = Distributed(test_cdh_ck_cluster, testdb, table_test, rand());

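Explanation:

test_cdh_ck_cluster: cluster name
table_test_all: Distributed table name
testdb: database name
table_test: local table name
rand(): sharding key; rows are assigned to shards randomly

Insert 8 rows into the Distributed table. Note that rows 3 to 8 use dates that do not exist (days 33 to 88); in the query results further below these rows show insert_time as 1970-01-01.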
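
As a rough model of how the sharding key and the shard weights interact: the key value is taken modulo the total weight, and the row goes to the shard whose weight range contains the remainder. The sketch below is a simplified illustration with a hypothetical helper (pick_shard), not ClickHouse code; with rand() as the key and both weights set to 1, each shard receives about half of the rows.

```shell
# Given a sharding-key value and the shard weights, print the 1-based shard number.
# The remainder (key mod total weight) is matched against cumulative weight ranges.
pick_shard() (
    key="$1"; shift
    total=0
    for w in "$@"; do total=$((total + w)); done
    rem=$((key % total))
    upper=0; n=0
    for w in "$@"; do
        n=$((n + 1)); upper=$((upper + w))
        if [ "$rem" -lt "$upper" ]; then echo "$n"; exit 0; fi
    done
)

# Two shards with weight 1 each, as in this cluster:
pick_shard 6 1 1   # remainder 0, goes to shard 1
pick_shard 7 1 1   # remainder 1, goes to shard 2
```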

insert into table_test_all values (1,'111','2021-09-11');
insert into table_test_all values (2,'222','2021-10-22');
insert into table_test_all values (3,'333','2021-10-33');
insert into table_test_all values (4,'444','2021-10-44');
insert into table_test_all values (5,'555','2021-10-55');
insert into table_test_all values (6,'666','2021-10-66');
insert into table_test_all values (7,'777','2021-10-77');
insert into table_test_all values (8,'888','2021-10-88');

Log in to each node and check the Distributed table; every node returns 8 rows.

Node01 (192.168.99.28)

:) select count(*) from table_test_all;
┌─count()─┐
│       8 │
└─────────┘

Node02 (192.168.99.29)

:) select count(*) from table_test_all;
┌─count()─┐
│       8 │
└─────────┘

Node03 (192.168.99.30)

:) select count(*) from table_test_all;
┌─count()─┐
│       8 │
└─────────┘

Node04 (192.168.99.31)

:) select count(*) from table_test_all;
┌─count()─┐
│       8 │
└─────────┘

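Check the local table on each node to see the data held by each shard and replica.

node01 and node03 are the two replicas of shard 1, so their local tables hold the same data; only the order of the returned rows differs.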

# node01-99.28
:) select * from table_test;
┌─label_id─┬─label_name─┬─insert_time─┐
│        1 │ 111        │  2021-09-11 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        8 │ 888        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        6 │ 666        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        3 │ 333        │  1970-01-01 │
└──────────┴────────────┴─────────────┘

# node03-99.30
:) select * from table_test;
┌─label_id─┬─label_name─┬─insert_time─┐
│        6 │ 666        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        3 │ 333        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        8 │ 888        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        1 │ 111        │  2021-09-11 │
└──────────┴────────────┴─────────────┘

node02 and node04 are the two replicas of shard 2, so their local tables hold the same data; only the order of the returned rows differs.

# node02-99.29
:) select * from table_test;
┌─label_id─┬─label_name─┬─insert_time─┐
│        2 │ 222        │  2021-10-22 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        7 │ 777        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        5 │ 555        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        4 │ 444        │  1970-01-01 │
└──────────┴────────────┴─────────────┘

# node04-99.31
:) select * from table_test;
┌─label_id─┬─label_name─┬─insert_time─┐
│        7 │ 777        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        2 │ 222        │  2021-10-22 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        5 │ 555        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
┌─label_id─┬─label_name─┬─insert_time─┐
│        4 │ 444        │  1970-01-01 │
└──────────┴────────────┴─────────────┘
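
Comparing replicas by eye works for four rows but not for real tables. A minimal sketch of an order-insensitive comparison, assuming each replica's rows have been exported to a text file first; same_rows and the file names are hypothetical.

```shell
# Succeed if two files contain the same rows, ignoring row order.
same_rows() (
    a=$(sort "$1")
    b=$(sort "$2")
    [ "$a" = "$b" ]
)

# Usage (run the export on each replica of a shard, then compare the files):
#   clickhouse-client --port 29000 --query "select * from testdb.table_test format TSV" > node01.tsv
#   same_rows node01.tsv node03.tsv && echo "replicas match"
```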

At this point the cluster has been installed successfully.

Operations and maintenance

Service start/stop commands

$ /etc/init.d/clickhouse-server start
$ /etc/init.d/clickhouse-server restart
$ /etc/init.d/clickhouse-server stop
$ /etc/init.d/clickhouse-server status

Connecting from a terminal with clickhouse-client

$ clickhouse-client --port 29000

Connecting with a third-party GUI tool

Address: 192.168.99.28:8123
Credentials: default / empty password

Data directory

/data/clickhouse

Log directory

/var/log/clickhouse-server/

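Performance testing

Test script

clickhouse-benchmark -h 192.168.99.28 --user=xxxxx --password=xxxxx -c 100 -i 1000 -r < log.txt

Query performance

With query concurrency limited to 100, the 90th-percentile latency of queries that use the primary key is around 1 s.
Analysis time on large data volumes depends heavily on indexes; a full table scan over 3.1 billion rows takes roughly 10-20 s.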

SELECT count(*) from mapcoding.github_events_distributed where event_type='IssuesEvent';
--group 
SELECT actor_login,count(),uniq(repo_name) AS repos,uniq(repo_name, number) AS prs, replaceRegexpAll(substringUTF8(anyHeavy(body), 1, 100), '[\r\n]', ' ') AS comment FROM mapcoding.github_events_distributed WHERE (event_type = 'PullRequestReviewCommentEvent') AND (action = 'created') GROUP BY actor_login ORDER BY count() DESC LIMIT 50
--join
select count(*) from (SELECT repo_name from mapcoding.github_events_distributed where  event_type='IssuesEvent' limit 10000000)  A  left join (SELECT repo_name from mapcoding.github_events_distributed where  event_type='IssueCommentEvent' limit 10000000)  B ON A.repo_name=B.repo_name   ;
--compound query
SELECT actor_login, COUNT(*) FROM mapcoding.github_events_distributed WHERE event_type='IssuesEvent' GROUP BY actor_login HAVING COUNT(*) > 10 ORDER BY count(*) DESC LIMIT 100
--primary key query
select actor_login from mapcoding.github_events_distributed where repo_name like 'elastic%' limit 100

--secondary index: the choice of primary key and index granularity has a critical impact on query performance
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test ADD INDEX actor_login_index actor_login TYPE set(0) GRANULARITY 2;
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test MATERIALIZE INDEX actor_login_index;
-- ALTER TABLE mapcoding.github_events ON cluster unimap_test DROP INDEX actor_login_index;
SELECT count(*) FROM mapcoding.github_events_distributed WHERE actor_login='frank';

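Write performance

A bulk-insert test of 3.1 billion rows took 3 hours, at 200,000 to 300,000 rows inserted per second.
Single-row insert performance was not tested and is expected to be poor.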
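
As a quick sanity check, the two figures quoted for the bulk insert agree with each other: 3.1 billion rows over 3 hours averages out inside the stated range. A small sketch of the arithmetic (rows_per_second is a hypothetical helper using integer division):

```shell
# Average insert rate in rows per second, given total rows and elapsed hours.
rows_per_second() (
    rows="$1"; hours="$2"
    echo $((rows / (hours * 3600)))
)

rows_per_second 3100000000 3   # prints 287037
```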

References

https://clickhouse.com/docs/zh/getting-started/tutorial

https://learn-bigdata.incubator.edurt.io/docs/ClickHouse/Action/get-started-config/
