
Setting Up a ClickHouse Cluster

Posted on: June 21, 2021

Business Pain Points of Audience Segmentation

ClickHouse Features

Download and Install

sudo yum install yum-utils
sudo rpm --import https://repo.clickhouse.tech/CLICKHOUSE-KEY.GPG
sudo yum-config-manager --add-repo https://repo.clickhouse.tech/rpm/clickhouse.repo
sudo yum install clickhouse-server clickhouse-client

Directory Layout

# Default data storage path
/var/lib/clickhouse

# Default log path
/var/log/clickhouse-server

# Server config file path
/etc/clickhouse-server

# Detailed walkthrough of the core config file: https://www.e-learn.cn/topic/4050968
/etc/clickhouse-server/config.xml

# Client config file path
/etc/clickhouse-client

Commands

# Server control commands
service clickhouse-server start | status | restart | stop

# Debug mode: run in the foreground with logs printed to the console
sudo -u clickhouse /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml

# Start the client
clickhouse-client --port 9003
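
Once the client connects, a trivial query confirms the server is alive (a sketch; any lightweight query works):

```sql
-- Confirm the server responds and report its version and current time
SELECT version();
SELECT now();
```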

Cluster Deployment

  1. Install the ClickHouse server on every machine in the cluster

  2. Add the cluster configuration to the config files

    <!-- config.xml -->
    <tcp_port>9003</tcp_port>
    <listen_host>::</listen_host>
    <!-- The incl attribute pulls the element's content from an external file -->
    <remote_servers incl="clickhouse_remote_servers" />
    <zookeeper incl="zookeeper-servers" optional="true" />
    <macros incl="macros" optional="true" />
    <!-- Path to the external extension config file -->
    <include_from>/etc/clickhouse-server/metrika.xml</include_from>
    
    <!-- metrika.xml -->
    <yandex>
      <!-- Cluster definition -->
      <clickhouse_remote_servers>
        <ck_cluster>
          <shard>
            <replica>
              <host>worker07</host>
              <port>9003</port>
            </replica>
            <replica>
              <host>worker08</host>
              <port>9003</port>
            </replica>
            <replica>
              <host>worker09</host>
              <port>9003</port>
            </replica>
          </shard>
        </ck_cluster>
      </clickhouse_remote_servers>
      <!-- ZooKeeper servers -->
      <zookeeper-servers>
        <node index="3">
          <host>master01</host>
          <port>2181</port>
        </node>
        <node index="2">
          <host>worker01</host>
          <port>2181</port>
        </node>
        <node index="1">
          <host>worker02</host>
          <port>2181</port>
        </node>
        <node index="5">
          <host>worker03</host>
          <port>2181</port>
        </node>
        <node index="4">
          <host>worker04</host>
          <port>2181</port>
        </node>
      </zookeeper-servers>
      <!-- Macros: substitution variables specific to this node -->
      <macros>
        <shard>0</shard>
        <replica>worker07</replica>
      </macros>
    </yandex>
  3. Distribute the config files to all nodes (they take effect automatically), then inspect the cluster

    select * from system.clusters;
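
system.clusters has one row per replica; selecting a few columns makes the layout easier to read (a sketch, assuming the ck_cluster name defined in metrika.xml above):

```sql
-- One row per replica; shard_num/replica_num mirror the metrika.xml layout
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'ck_cluster';
```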

Consistency

internal_replication: controls whether the Distributed engine writes each block to only one replica of a shard (leaving replication to the table engine) instead of writing to every replica itself; defaults to false.
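
With replicated tables, internal_replication is normally set to true per shard, so the Distributed table writes each block once and ReplicatedMergeTree makes the copies. A sketch extending the metrika.xml layout above (hosts and ports are the ones used earlier):

```xml
<!-- Inside <clickhouse_remote_servers><ck_cluster> -->
<shard>
  <internal_replication>true</internal_replication>
  <replica>
    <host>worker07</host>
    <port>9003</port>
  </replica>
  <replica>
    <host>worker08</host>
    <port>9003</port>
  </replica>
</shard>
```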

Four replication modes (combinations of internal_replication and the local table engine):

  1. Non-replicated table, internal_replication=false: the Distributed table writes each block to every replica itself; replicas can drift if a write partially fails.
  2. Non-replicated table, internal_replication=true: data lands on only one replica and is never copied, so replicas diverge; not recommended.
  3. Replicated table, internal_replication=true: the Distributed table writes to one replica and ReplicatedMergeTree propagates the data via ZooKeeper; the recommended setup.
  4. Replicated table, internal_replication=false: every replica is written to and then replication runs again on top, relying on block deduplication to avoid duplicates; wasteful and not recommended.

Optimization

  1. Read through the Distributed table; writes go to the local tables in round-robin fashion.
  2. Write in large batches with few insert operations ("large batches, few batches").
  3. Prefer a wide, denormalized table over joins; ClickHouse handles tables with 10,000+ columns.
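
The "large batches, few batches" advice means collecting many rows into a single INSERT instead of issuing row-by-row statements, since each INSERT creates a data part. A sketch reusing the test table created in the next section (values are illustrative):

```sql
-- One multi-row INSERT creates one data part; N single-row INSERTs create N parts
INSERT INTO default.cluster3s1r_all (id, website, wechat, FlightDate, Year) VALUES
  (10, 'https://example-a.com/', 'a', '2020-11-28', 2020),
  (11, 'https://example-b.com/', 'b', '2020-11-28', 2020),
  (12, 'https://example-c.com/', 'c', '2020-11-28', 2020);
```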

Cluster Test

-- Create the local tables first, then a Distributed table on top; rows inserted
-- into the Distributed table are spread randomly across the local tables.
-- Create the local table on every instance (legacy MergeTree syntax)
CREATE TABLE default.cluster3s1r_local
(
    `id` Int32,
    `website` String,
    `wechat` String,
    `FlightDate` Date,
    `Year` UInt16
)
ENGINE = MergeTree(FlightDate, (Year, FlightDate), 8192);

-- Create a Distributed table (specify cluster name, database, table, and sharding key)
CREATE TABLE default.cluster3s1r_all AS cluster3s1r_local ENGINE = Distributed(ck_cluster, default, cluster3s1r_local, rand());

-- Insert test data
INSERT INTO default.cluster3s1r_all
(id,website,wechat,FlightDate,Year)values(1,'https://niocoder.com/','java干货','2020-11-28',2020);
INSERT INTO default.cluster3s1r_all (id,website,wechat,FlightDate,Year)values(2,'http://www.merryyou.cn/','javaganhuo','2020-11-28',2020);
INSERT INTO default.cluster3s1r_all (id,website,wechat,FlightDate,Year)values(3,'http://www.xxxxx.cn/','xxxxx','2020-11-28',2020);

-- Query the Distributed table
select * from cluster3s1r_all;
-- Query the local table
select * from cluster3s1r_local;

-- Create a replicated table: it connects to ZooKeeper and the other replicas pull
-- data automatically, so no Distributed table is needed for replication itself.
-- (Note: the ON CLUSTER clause below still relies on the ck_cluster definition
-- in clickhouse_remote_servers to run the DDL on every node.)
-- Create the replicated table
CREATE TABLE default.cluster1s3r_local ON CLUSTER ck_cluster
(
    `id` Int32,
    `website` String,
    `wechat` String,
    `FlightDate` Date,
    `Year` UInt16
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/cluster1s3r_local', '{replica}', FlightDate, (Year, FlightDate), 8192);

INSERT INTO default.cluster1s3r_local
(id,website,wechat,FlightDate,Year)values(1,'https://niocoder.com/','java干货','2020-11-28',2020);

select * from cluster1s3r_local;
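
To verify that replication is healthy, each node's replica state can be inspected through system.replicas (a sketch; the available columns vary somewhat across ClickHouse versions):

```sql
-- Each node reports the state of its own replica of the replicated table
SELECT table, is_leader, total_replicas, active_replicas
FROM system.replicas
WHERE table = 'cluster1s3r_local';
```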

Troubleshooting

  1. DB::Exception: Effective user of the process (root) does not match the owner of the data (clickhouse). Run under ‘sudo -u clickhouse’.

    sudo -u clickhouse clickhouse-server --config-file=/etc/clickhouse-server/config.xml

  2. Access to file denied: /var/log/clickhouse-server/clickhouse-server.log

    chown -R clickhouse /var/log/clickhouse-server/

  3. Address already in use

    Change tcp_port in config.xml from the default 9000 to 9003.

  4. There are two exactly the same ClickHouse instances

    A single node can host only one replica of one shard, so a 3-shard, 2-replica cluster needs 6 servers.

