
TiDB Data Migration: Migrating Data into TiDB (TiUP Deployment)

Created: 2021-09-06

Official documentation: https://docs.pingcap.com/zh/tidb-data-migration/stable/overview

TiDB Data Migration (DM) is a tool for migrating data from MySQL/MariaDB to TiDB. It supports both full data migration and incremental data replication.

If the data volume is below the TB level, it is recommended to use DM directly to migrate MySQL/MariaDB data to TiDB; the migration consists of a full export and import followed by incremental replication.

Deploying the Data Migration cluster with TiUP

Make sure the TiUP tool is already installed.

For installing the base TiDB cluster, see the TiDB cluster installation reference.
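
If TiUP is not installed yet, below is a minimal sketch of installing it from the official PingCAP mirror and verifying the version; the shell profile path may differ on your system.

# Install TiUP from the official mirror (run as the tidb user), then reload the shell profile
[tidb@localhost ~]$ curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh
[tidb@localhost ~]$ source ~/.bash_profile
# Verify that the tiup binary is available
[tidb@localhost ~]$ tiup --version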

Install the TiUP DM component

[tidb@localhost ~]$ tiup install dm
download https://tiup-mirrors.pingcap.com/dm-v1.5.6-linux-amd64.tar.gz 8.08 MiB / 8.08 MiB 100.00% 3.52 MiB/s  
[tidb@localhost ~]$ tiup update --self && tiup update dm
download https://tiup-mirrors.pingcap.com/tiup-v1.5.6-linux-amd64.tar.gz 6.89 MiB / 6.89 MiB 100.00% 3.67 MiB/s
Updated successfully!
component dm version v1.5.6 is already installed
Updated successfully!

Configure the cluster

# Generate the configuration template
[tidb@localhost ~]$ tiup dm template > dm-topology.yaml
# Edit the configuration file as follows; I only use 2 master servers here
[tidb@localhost ~]$ cat dm-topology.yaml 
---
global:
  user: "tidb"
  ssh_port: 22
  deploy_dir: "/data/dm/deploy"
  data_dir: "/data/dm/data"
  # arch: "amd64"

master_servers:
  - host: 192.168.0.207
  - host: 192.168.0.216

worker_servers:
  - host: 192.168.0.207
  - host: 192.168.0.216
  - host: 192.168.0.217

monitoring_servers:
  - host: 192.168.0.217

grafana_servers:
  - host: 192.168.0.217

alertmanager_servers:
  - host: 192.168.0.217

Check the available DM versions and choose v2.0.6

[tidb@localhost ~]$ tiup list dm-master
Available versions for dm-master:
Version                             Installed  Release                    Platforms
-------                             ---------  -------                    ---------
nightly -> v5.0.0-nightly-20210904             2021-09-04T21:43:50+08:00  linux/amd64
v2.0.0-rc                                      2020-08-21T17:49:08+08:00  linux/amd64,linux/arm64
v2.0.0-rc.2                                    2020-09-01T20:51:29+08:00  linux/amd64,linux/arm64
v2.0.0                                         2020-10-30T16:10:58+08:00  linux/amd64,linux/arm64
v2.0.1                                         2020-12-25T13:22:29+08:00  linux/amd64,linux/arm64
v2.0.3                                         2021-05-11T22:14:31+08:00  linux/amd64,linux/arm64
v2.0.4                                         2021-06-18T16:34:30+08:00  linux/amd64,linux/arm64
v2.0.5                                         2021-07-30T18:46:27+08:00  linux/amd64,linux/arm64
v2.0.6                                         2021-08-13T17:36:06+08:00  linux/amd64,linux/arm64
v5.0.0-nightly-20210904                        2021-09-04T21:43:50+08:00  linux/amd64

Run the deployment

[tidb@localhost ~]$ tiup dm deploy dm-zabbix v2.0.6 ./dm-topology.yaml --user root -p

Check and start the cluster

# List the clusters managed by TiUP (multiple DM clusters can be deployed)
[tidb@localhost ~]$ tiup dm list
Starting component `dm`: /home/tidb/.tiup/components/dm/v1.5.6/tiup-dm list
Name       User  Version  Path                                            PrivateKey
----       ----  -------  ----                                            ----------
dm-zabbix  tidb  v2.0.6   /home/tidb/.tiup/storage/dm/clusters/dm-zabbix  /home/tidb/.tiup/storage/dm/clusters/dm-zabbix/ssh/id_rsa
# Check the status of the deployed DM cluster
[tidb@localhost ~]$ tiup dm display dm-zabbix
Starting component `dm`: /home/tidb/.tiup/components/dm/v1.5.6/tiup-dm display dm-zabbix
Cluster type:       dm
Cluster name:       dm-zabbix
Cluster version:    v2.0.6
Deploy user:        tidb
SSH type:           builtin
ID                  Role          Host           Ports      OS/Arch       Status  Data Dir                         Deploy Dir
--                  ----          ----           -----      -------       ------  --------                         ----------
192.168.0.217:9093  alertmanager  192.168.0.217  9093/9094  linux/x86_64  Down    /data/dm/data/alertmanager-9093  /data/dm/deploy/alertmanager-9093
192.168.0.207:8261  dm-master     192.168.0.207  8261/8291  linux/x86_64  Down    /data/dm/data/dm-master-8261     /data/dm/deploy/dm-master-8261
192.168.0.216:8261  dm-master     192.168.0.216  8261/8291  linux/x86_64  Down    /data/dm/data/dm-master-8261     /data/dm/deploy/dm-master-8261
192.168.0.207:8262  dm-worker     192.168.0.207  8262       linux/x86_64  N/A     /data/dm/data/dm-worker-8262     /data/dm/deploy/dm-worker-8262
192.168.0.216:8262  dm-worker     192.168.0.216  8262       linux/x86_64  N/A     /data/dm/data/dm-worker-8262     /data/dm/deploy/dm-worker-8262
192.168.0.217:8262  dm-worker     192.168.0.217  8262       linux/x86_64  N/A     /data/dm/data/dm-worker-8262     /data/dm/deploy/dm-worker-8262
192.168.0.217:3000  grafana       192.168.0.217  3000       linux/x86_64  Down    -                                /data/dm/deploy/grafana-3000
192.168.0.217:9090  prometheus    192.168.0.217  9090       linux/x86_64  Down    /data/dm/data/prometheus-9090    /data/dm/deploy/prometheus-9090
Total nodes: 8
# Start the cluster
[tidb@localhost ~]$ tiup dm start dm-zabbix
Starting component `dm`: /home/tidb/.tiup/components/dm/v1.5.6/tiup-dm start dm-zabbix
Starting cluster dm-zabbix...
+ [ Serial ] - SSHKeySet: privateKey=/home/tidb/.tiup/storage/dm/clusters/dm-zabbix/ssh/id_rsa, publicKey=/home/tidb/.tiup/storage/dm/clusters/dm-zabbix/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb, host=192.168.0.217
+ [Parallel] - UserSSH: user=tidb, host=192.168.0.217
+ [Parallel] - UserSSH: user=tidb, host=192.168.0.216
+ [Parallel] - UserSSH: user=tidb, host=192.168.0.207
+ [Parallel] - UserSSH: user=tidb, host=192.168.0.216
+ [Parallel] - UserSSH: user=tidb, host=192.168.0.217
+ [Parallel] - UserSSH: user=tidb, host=192.168.0.217
+ [Parallel] - UserSSH: user=tidb, host=192.168.0.207
+ [ Serial ] - StartCluster
Starting component dm-master
        Starting instance 192.168.0.216:8261
        Starting instance 192.168.0.207:8261
        Start instance 192.168.0.216:8261 success
        Start instance 192.168.0.207:8261 success
Starting component dm-worker
        Starting instance 192.168.0.217:8262
        Starting instance 192.168.0.207:8262
        Starting instance 192.168.0.216:8262
        Start instance 192.168.0.207:8262 success
        Start instance 192.168.0.216:8262 success
        Start instance 192.168.0.217:8262 success
Starting component prometheus
        Starting instance 192.168.0.217:9090
        Start instance 192.168.0.217:9090 success
Starting component grafana
        Starting instance 192.168.0.217:3000
        Start instance 192.168.0.217:3000 success
Starting component alertmanager
        Starting instance 192.168.0.217:9093
        Start instance 192.168.0.217:9093 success
Started cluster `dm-zabbix` successfully
# Verify the cluster running status
[tidb@localhost ~]$ tiup dm display dm-zabbix

dmctl is the cluster control tool; official documentation: https://docs.pingcap.com/zh/tidb-data-migration/stable/maintain-dm-using-tiup#%E9%9B%86%E7%BE%A4%E6%8E%A7%E5%88%B6%E5%B7%A5%E5%85%B7-dmctl

Create the data source

Encrypt the data source password

[tidb@localhost ~]$ tiup dmctl encrypt 'zabbixpassword'
rUv9S55dJHuInCHOzmgcWkBsDuaPoKZbJ+I=
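
dmctl also provides a decrypt subcommand that turns the ciphertext back into the plaintext password, which is handy for verifying an encrypted value. A minimal sketch using the ciphertext produced above:

# Decrypt the ciphertext back to the original password for verification
[tidb@localhost ~]$ tiup dmctl decrypt 'rUv9S55dJHuInCHOzmgcWkBsDuaPoKZbJ+I='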

Create the data source configuration and apply it

[tidb@localhost ~]$ cat source-mysql-01.yaml
source-id: "mysql-01"    # 数据源 ID,在数据迁移任务配置和 dmctl 命令行中引用该 source-id 可以关联到对应的数据源
enable-gtid: false # DM-worker 是否使用全局事务标识符 (GTID) 拉取 binlog。使用前提是在上游 MySQL 已开启 GTID 模式。
from:
  host: "10.0.0.181"
  port: 3306
  user: "zabbix"
  password: "rUv9S55dJHuInCHOzmgcWkBsDuaPoKZbJ+I=" # 推荐使用 dmctl 对上游数据源的用户密码加密之后的密码

### Create the data source
[tidb@localhost ~]$ tiup dmctl --master-addr 192.168.0.207:8261 operate-source create source-mysql-01.yaml
### View the data source configuration
[tidb@localhost ~]$ tiup dmctl --master-addr 192.168.0.207:8261 get-config source mysql-01
### List all data sources
[tidb@localhost ~]$ tiup dmctl --master-addr 192.168.0.207:8261  operate-source show

The full export and import were already done with Dumpling and TiDB Lightning (that process is not covered here).

Incremental replication

Check the primary database binlog name and position recorded by the Dumpling export

Since the full import was already completed with TiDB Lightning earlier, only the subsequent incremental replication is done here.

### Check the binlog position information of the data source
[tidb@esbprddb ~]$ cat /data/mysqlbak217/metadata 
Started dump at: 2021-09-09 14:11:54
SHOW MASTER STATUS:
        Log: mysql-bin.000007
        Pos: 200083775
        GTID:0-2-117716

SHOW SLAVE STATUS:
        Connection name: 
        Host:  
        Log: 
        Pos: 0
        GTID:

Finished dump at: 2021-09-09 14:22:10
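
If the Dumpling metadata file is not at hand, the binlog coordinates can also be read from the upstream database itself. Note that this returns the current position, whereas the metadata file records the position at dump time, and the dump-time value is what the incremental task should start from. A minimal sketch, assuming the upstream host and user from source-mysql-01.yaml:

# Query the current binlog file and position on the upstream MySQL/MariaDB
[tidb@esbprddb ~]$ mysql -h 10.0.0.181 -P 3306 -u zabbix -p -e "SHOW MASTER STATUS;"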

Configure the incremental replication task file

[tidb@esbprddb ~]$ cat task1.yaml 
name: task-zabbix1             # Task name; must be globally unique
task-mode: incremental         # Task mode; "incremental" performs only incremental data migration
##### With ignore-checking-items set to table_schema, replication was extremely slow in my tests and could not catch up with the MariaDB primary, because zabbix's history tables have no primary key or unique key; to replicate them with DM, the table_schema check has to be ignored
ignore-checking-items: ["table_schema"]
## Connection information of the downstream TiDB instance
target-database:       # Downstream database instance configuration
  host: "127.0.0.1"
  port: 4643
  user: "root"
  password: "tidb@pwd"         # If the password is not empty, using the ciphertext encrypted with dmctl is recommended

## Use block/allow lists to select the tables to replicate
block-allow-list:   # block-allow-list filter rules for tables matched in the data source instance; use black-white-list if the DM version is earlier than v2.0.0-beta.2
  bw-rule-1:        # Block/allow list rule ID
    do-dbs: ["zabbix"] # Schemas to migrate
    ignore-tables:                   # Tables to ignore
    - db-name: "zabbix"
      tbl-name: "auditlog.*"

## [Optional] If incremental migration needs to re-apply data that was already migrated during the full migration, enable safe mode to avoid errors during incremental migration
## This is typically the case when the fully migrated data is not a consistent snapshot of the data source and incremental replication then starts from a position earlier than the full migration
syncers:            # Runtime parameters of the sync processing unit
  global:           # Configuration name
    safe-mode: true # If true, INSERT statements from the data source are rewritten as REPLACE and UPDATE statements as DELETE plus REPLACE, so DML can be applied repeatedly as long as the table has a primary key or unique index; DM also enables safe mode automatically for the first minute after a task is started or resumed
    worker-count: 32 # I did not set this parameter at first (default 16), but replication fell further and further behind (secondsBehindMaster kept growing); after raising it to 32, replication sped up

## Configure the data source
mysql-instances:
  - source-id: "mysql-02"         # Data source ID, taken from the data source configuration
    block-allow-list: "bw-rule-1" # Reference the block/allow list rule defined above
    syncer-config-name: "global"  # Reference the syncers configuration defined above
    meta:                         # Binlog position where migration starts when task-mode is incremental and no checkpoint exists in the downstream database; if a checkpoint exists, the checkpoint takes precedence
      binlog-name: "mysql-bin.000007"
      binlog-pos: 200083775
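
Before starting the task, the configuration can be pre-checked with dmctl's check-task subcommand. A minimal sketch using the same master address and task file as above:

### Pre-check the task configuration against the upstream and downstream databases
[tidb@esbprddb ~]$ tiup dmctl --master-addr 192.168.0.207:8261 check-task task1.yaml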

Start the task

[tidb@esbprddb ~]$ tiup dmctl --master-addr 192.168.0.207:8261 start-task task1.yaml 
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.6/dmctl/dmctl --master-addr 192.168.0.207:8261 start-task task1.yaml
{
    "result": true,
    "msg": "",
    "sources": [
        {
            "result": true,
            "msg": "",
            "source": "mysql-01",
            "worker": "dm-192.168.0.216-8262"
        }
    ]
}

Query the task status

[tidb@esbprddb ~]$ tiup dmctl --master-addr 192.168.0.207:8261 query-status
Starting component `dmctl`: /home/tidb/.tiup/components/dmctl/v2.0.6/dmctl/dmctl --master-addr 192.168.0.207:8261 query-status
{
    "result": true,
    "msg": "",
    "tasks": [
        {
            "taskName": "task-zabbix",
            "taskStatus": "Error - Some error occurred in subtask. Please run `query-status task-zabbix` to get more details.",
            "sources": [
                "mysql-01"
            ]
        }
    ]
}
[tidb@esbprddb ~]$ tiup dmctl --master-addr 192.168.0.207:8261 query-status task-zabbix

Stop the task (this is effectively deleting the task; once stopped, the task status can no longer be queried)

[tidb@esbprddb ~]$ tiup dmctl --master-addr 192.168.0.207:8261 stop-task  task-zabbix

Pause the task (a paused task can still be queried and can be restarted with resume-task, as shown below)

[tidb@esbprddb ~]$ tiup dmctl --master-addr 192.168.0.207:8261 pause-task  task-zabbix1
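
To restart a paused task, use resume-task; a minimal sketch with the same master address and task name as above:

[tidb@esbprddb ~]$ tiup dmctl --master-addr 192.168.0.207:8261 resume-task task-zabbix1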

During incremental replication I stopped (stop-task) and restarted (start-task) the task several times. Each time I first paused the task with pause-task and then stopped it, noted the current syncerBinlog value (shown by query-status), and updated the binlog-name and binlog-pos parameters in the task file before the next run. However, I later found duplicate rows inserted into the zabbix history tables, which have no primary key.
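
To verify whether duplicate rows were inserted into a history table after restarting the task, one option is to group by all columns and look for counts above one. A minimal sketch against the downstream TiDB, assuming the connection details from the task file and the standard zabbix history columns (itemid, clock, ns, value); adjust the table and columns for your schema:

# Count rows that appear more than once in zabbix.history on the downstream TiDB
[tidb@esbprddb ~]$ mysql -h 127.0.0.1 -P 4643 -u root -p -e \
  "SELECT itemid, clock, ns, value, COUNT(*) AS cnt FROM zabbix.history GROUP BY itemid, clock, ns, value HAVING cnt > 1 LIMIT 10;"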

If you want to skip or replay the SQL at a particular binlog position during replication: the downstream database contains a dm_meta schema that stores the checkpoint information, and when you pause and then resume a task, replication resumes from the checkpoint recorded in dm_meta.
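
The checkpoint can be inspected directly in the downstream dm_meta schema. A minimal sketch, assuming the default checkpoint table name derived from the task name (<task-name>_syncer_checkpoint) and the downstream connection from the task file:

# Inspect the syncer checkpoint that DM records for the task (binlog name/position per table)
[tidb@esbprddb ~]$ mysql -h 127.0.0.1 -P 4643 -u root -p -e "SELECT * FROM dm_meta.\`task-zabbix1_syncer_checkpoint\`\G"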

For the complete official task configuration file reference, see the official documentation.

Stop and remove the DM components

## Stop the replication task
[tidb@esbprddb ~]$ tiup dmctl --master-addr 192.168.0.207:8261 stop-task  task-zabbix1
### Stop the cluster services
[tidb@esbprddb ~]$ tiup dm stop dm-zabbix
### Destroy the cluster
[tidb@esbprddb ~]$ tiup dm destroy  dm-zabbix