From 430638d48ae5233cc0976906c04423f9fead3926 Mon Sep 17 00:00:00 2001
From: Eric Yang
Date: Mon, 11 Jan 2016 16:04:45 +0800
Subject: [PATCH] update cn readme

---
 LICENSE         |  2 +-
 README.md       |  8 +++---
 README_CN.md    | 67 +++++++++++++++++++++++++++++++++++++++++++++++++
 src/__init__.py |  1 +
 4 files changed, 74 insertions(+), 4 deletions(-)
 create mode 100644 README_CN.md

diff --git a/LICENSE b/LICENSE
index 20efd1b..dbd643c 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
 The MIT License (MIT)
 
-Copyright (c) 2015
+Copyright (c) by Windfarer
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal
diff --git a/README.md b/README.md
index 1672c3d..2e23350 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@
 # py-mysql-elasticsearch-sync
 Simple and fast MySQL to Elasticsearch sync tool, written in Python.
 
+[中文文档](https://github.com/zhongbiaodev/py-mysql-elasticsearch-sync/blob/master/README_CN.md)
+
 ## Introduction
 This tool helps you to initialize MySQL dump table to Elasticsearch by parsing mysqldump, then incremental sync MySQL table to Elasticsearch by processing MySQL Binlog.
 Also, during the binlog syncing, this tool will save the binlog sync position, so that it is easy to recover after this tool being shutdown for any reason.
@@ -9,7 +11,7 @@ Also, during the binlog syncing, this tool will save the binlog sync position, s
 By following these steps.
 
 ##### 1. ibxml2 and libxslt
-Also, this tool depends on python lxml package, so that you should install the lxml's dependecies correctly, the libxml2 and libxslt are required.
+This tool depends on python lxml package, so that you should install the lxml's dependecies correctly, the libxml2 and libxslt are required.
 
 For example, in CentOS:
 
@@ -25,7 +27,7 @@ sudo apt-get install libxml2-dev libxslt-dev python-dev
 See [lxml Installation](http://lxml.de/installation.html) for more infomation.
 
 ##### 2. mysqldump
-And then, mysqldump is required.(and enable binlog)
+And then, mysqldump is required in the machine where this tool will be run on it.(and the mysql server must enable binlog)
 
 ##### 3. this tool
 
@@ -58,7 +60,7 @@ es-sync path/to/your/config.yaml --fromfile
 to start sync, when xml sync is over, it will also start binlog sync.
 
 ## Deployment
-We provide an upstart script to help you deploy this tool,since we use virtualenv for requirements isolation, you must edit it for your own condition, besides, you can deploy it in your own way.
+We provide an upstart script to help you deploy this tool, you can edit it for your own condition, besides, you can deploy it in your own way.
 
 ## TODO
 - [ ] MultiIndex Supporting
diff --git a/README_CN.md b/README_CN.md
new file mode 100644
index 0000000..93c1d11
--- /dev/null
+++ b/README_CN.md
@@ -0,0 +1,67 @@
+# py-mysql-elasticsearch-sync
+一个从MySQL向Elasticsearch同步数据的工具,使用Python实现。
+
+## 简介
+在第一次初始化数据时,本工具解析mysqldump导出的数据,并导入ES中,在后续增量更新中,解析binlog的数据,对ES中的数据进行同步。在binlog同步阶段,支持断点恢复,因此无需担心意外中断的问题。
+
+## 安装
+
+##### 1. ibxml2 和 libxslt
+本工具基于lxml库,因此需要安装它的依赖的libxml2和libxslt
+
+在CentOS中:
+
+```
+sudo yum install libxml2 libxml2-devel libxslt libxslt-devel
+```
+
+在Debian/Ubuntu中:
+
+```
+sudo apt-get install libxml2-dev libxslt-dev python-dev
+```
+
+查看[lxml Installation](http://lxml.de/installation.html)来获取更多相关信息
+
+##### 2. mysqldump
+在运行本工具的机器上需要有mysqldump,并且mysql服务器需要开启binlog功能。
+
+
+##### 3. 本工具
+安装本工具
+
+```
+pip install py-mysql-elasticsearch-sync
+```
+
+## 配置
+你可以通过修改[配置文件示例](https://github.com/zhongbiaodev/py-mysql-elasticsearch-sync/blob/master/src/sample.yaml)来编写自己的配置文件
+
+## 运行
+运行命令
+
+```
+es-sync path/to/your/config.yaml
+```
+工具将开始执行mysqldump并解析流进行同步,当dump结束后,将启动binlog同步
+
+最近一次binlog同步位置记录在一个文件中,这个文件的路径在config文件中配置过。
+
+你可以删除记录文件来从头进行binlog同步,或者修改文件里的内容,来从特定位置开始同步。
+
+
+你也可以把自己从mysql导出的xml文件同步进ES中(在mysqldump的命令中加上参数```-X```即可导出xml)
+
+然后执行
+
+```
+es-sync path/to/your/config.yaml --fromfile
+```
+启动从xml导入,当从xml导入完毕后,它会开始同步binlog
+
+## 服务管理
+我们写了一个[upstart脚本]来管理本工具的运行,你也可以用你自己的方式进行部署运行
+
+## TODO
+- [ ] 多索引支持
+- [ ] 多表支持
diff --git a/src/__init__.py b/src/__init__.py
index 3dbb6e8..a719125 100644
--- a/src/__init__.py
+++ b/src/__init__.py
@@ -36,6 +36,7 @@ def encode_in_py2(s):
 DEFAULT_BULKSIZE = 100
 DEFAULT_BINLOG_BULKSIZE = 1
 
+
 class ElasticSync(object):
     table_structure = {}
     log_file = None