TODO list #1

Open
zzt93 opened this issue Apr 7, 2018 · 3 comments

zzt93 (Owner) commented Apr 7, 2018

TODO

  • Cold-start (ETL) optimization

    • handle the buffer-full case
    • hold changes made after now
  • Extend the test framework

    • join
    • performance test
  • Add /stat and /input endpoints to syncer

  • Timezone config

  • Convert MySQL integers to byte arrays

    • add automatic conversion using fetched column meta info: unsigned int to long, unsigned long to byte array
  • Dependency modules: do not package or load modules that are not used (e.g. when not syncing Mongo), etc.

    • SLF4J
    • Nginx
  • IncludeBefore & IncludeUpdated config?

  • rpm and dpkg

  • Sync check: query input & output and compare them

    • Implement with a special SyncData?
    • Should have an HTTP endpoint to invoke it.
  • Warn if multiple schema.table entries have different rows

  • A better serializer than JSON, which loses type info: Protobuf? Avro?

  • Test Redis output & nested SQL

  • Configurable threads for each consumer

  • Support setting the parent of an ES document

  • Row image format support?

    • Add a must-appear field restriction -- currently only the primary key
    • Optimization: keep only changed fields in update events & the primary key in delete events -- plus must-appear fields
  • Batch module & failure module are coupled with the channel module

    • Filter chain?
    • Failure module as last channel?
  • Is MDC.put of eventId necessary?
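A minimal sketch of the unsigned-integer conversion item above, assuming the binlog reader hands unsigned MySQL columns over as the signed Java value with the same bits. `UnsignedColumns` and its methods are hypothetical names for illustration, not syncer code. `INT UNSIGNED` always fits in a `long`, while `BIGINT UNSIGNED` can exceed `Long.MAX_VALUE`, so it is widened to `BigInteger` or exposed as 8 big-endian bytes.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

/**
 * Hypothetical helper: widens unsigned MySQL integer columns that a binlog
 * reader delivered as signed Java values. Not part of syncer's actual API.
 */
final class UnsignedColumns {

  private UnsignedColumns() {}

  /** INT UNSIGNED (32 bits) always fits in a long once sign extension is avoided. */
  static long unsignedIntToLong(int raw) {
    return Integer.toUnsignedLong(raw);
  }

  /** BIGINT UNSIGNED (64 bits) may exceed Long.MAX_VALUE, so widen to BigInteger. */
  static BigInteger unsignedLongToBigInteger(long raw) {
    // signum 1 makes BigInteger read the bytes as a non-negative magnitude
    return new BigInteger(1, unsignedLongToBytes(raw));
  }

  /** The same 64 bits as a big-endian byte array (ByteBuffer's default order). */
  static byte[] unsignedLongToBytes(long raw) {
    return ByteBuffer.allocate(Long.BYTES).putLong(raw).array();
  }

  public static void main(String[] args) {
    System.out.println(unsignedIntToLong(-1));          // 4294967295
    System.out.println(unsignedLongToBigInteger(-1L));  // 18446744073709551615
    System.out.println(Long.toUnsignedString(-1L));     // same text via the JDK helper
  }
}
```

Picking the conversion from the column's UNSIGNED flag in the fetched meta info, rather than from each value, keeps the output type stable for a given column.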

zzt93 self-assigned this Apr 7, 2018

zzt93 (Owner, Author) commented Apr 11, 2018

Done:

  • Output needed customization of Spring EL -- removed Spring EL
  • MySQL input field check
  • Add new cold start: batch select (order by id) & batch insert
  • Shorten id:
    • change serverId to port/clientId?
      • serverId: not used for uniqueness, only for debugging -- removed to save memory
    • variable-length integer encoding for position: xxx/123456/gap/xxx
    • shorten offset
  • Support setting the starting binlog file name & position in the config file (makes rebuilding easier)
  • Refactor clone & dup semantics -- changed to create
  • Reduce memory footprint of StandardEvaluationContext (20% memory reduction)
  • Add file as data source: read binlog files
  • Update failure log format: do not escape JSON strings
  • Order problem: route the same id to the same thread; strict mode: retry the failed item and all remaining items; or retry only the failed item
  • Output channel reconnection logic: MySQL & ES
  • Adjust logging level dynamically
  • Add health check endpoint
  • Upsert in ES output channel on 404
  • Add shutdown hook for cleanup: stop sending data to output targets to avoid duplicate-key exceptions
  • Upgrade to Spring Boot 2.0 for better YAML hints when configuring
  • Skip already-synced items on startup
  • Add Kafka output channel
    • Kafka message consumers have to handle events idempotently
    • send events using the primary key as the message key
    • deploy SyncData & SyncUtil as a separate jar to Maven Central
  • Refactor config naming:

      input:
        masters:
          - connection:
              address: ${HOST_ADDRESS}
              port: 27018
            type: Mongo
            repos:
              - name: "chat"
                entities:
                - name: messages
                  fields: [time, content]

  • Package refactor:
    • For syncer-data deploy
    • Refactor config package
  • Add Kafka version compatibility to README
  • Reduce unnecessary dependencies: remove Spring Boot
  • Fix filter module design flaw; add nested if & and/or; enhance switcher
  • Use Javassist/cglib/Byte Buddy/JavaCompiler to generate code dynamically rather than Spring EL
  • Support lower-hyphen config keys
  • Binlog checksum type auto detection
  • Handle Kafka MESSAGE TOO LARGE error
  • Share the same table definition across multiple remotes
  • Test framework
  • Refactor SyncData: update event should have before & fields data:
    • add updated() & updated(String name) methods
    • add before to get the before-update data
  • Test framework: add update/delete test
  • Update README config example: remove and link to test config dir.
  • Test framework: mongo
  • Check whether the registered MongoDB db/collection exists
  • Batch buffer bug
  • Opt logging: Ack log, MasterConnector
  • Connect to latest binlog flag (cold start usage)
    • de-register cold-start consumer?
    • or use same consumer, different filter?
  • Add consumerId in log
    • or report thread-consumer relation in http port
    • or change thread name to syncer-consumerId-filter-1
  • ConsumerId syntax check: "-" is not supported
  • FileBasedMap record last removed position if map is empty
  • Switch from tailing the oplog to the change stream API: check Mongo version at startup
  • ES output channel: support nested objects
  • ALTER TABLE: automatically re-sync MySQL column indexes so no restart is needed
  • ES client upgrade (5.x, 7.x, not all features, 6.x all features) -- rest client & basic auth to replace xpack & low level rest client
  • Test framework:
    • Mongo update/delete
  • Change filter module to single thread; add partition key support in SyncData, used by the output module (multiple threads)
  • Order problem when id is changed: add scheduler key
    • Joining like this will inevitably cause data inconsistency because of at-least-once semantics -- not done
    • ES can do it with nested objects
    • Kafka needs this
  • Filter module: do not shut down, use the failure log instead
  • Continue pressure testing
    • Degradation & bounded queue size: change to fixed-size queue
  • Column filter: _all
  • Cold start
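The "same id to the same thread" and partition-key items above can be sketched as a dispatcher with one single-threaded executor per partition: events with equal keys always land on the same worker, so their relative order survives while different rows proceed in parallel. This is a hypothetical sketch (`KeyPartitionedDispatcher` is not a syncer class), assuming the partition key is the primary key with a stable `hashCode()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: every event with an equal partition key (e.g. primary
 * key) goes to the same single-threaded worker, so per-row order is kept
 * while different rows are still handled in parallel.
 */
final class KeyPartitionedDispatcher implements AutoCloseable {

  private final List<ExecutorService> workers = new ArrayList<>();

  KeyPartitionedDispatcher(int threads) {
    if (threads <= 0) throw new IllegalArgumentException("threads must be positive");
    for (int i = 0; i < threads; i++) {
      workers.add(Executors.newSingleThreadExecutor());
    }
  }

  /** floorMod keeps the index non-negative even when hashCode() is negative. */
  static int partitionOf(Object key, int partitions) {
    return Math.floorMod(key.hashCode(), partitions);
  }

  /** Tasks submitted with equal keys run one after another, in submission order. */
  void dispatch(Object partitionKey, Runnable task) {
    workers.get(partitionOf(partitionKey, workers.size())).execute(task);
  }

  /** Stops accepting tasks, then waits (bounded) for queued tasks to finish. */
  @Override
  public void close() throws InterruptedException {
    for (ExecutorService worker : workers) {
      worker.shutdown();
    }
    for (ExecutorService worker : workers) {
      worker.awaitTermination(10, TimeUnit.SECONDS);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    try (KeyPartitionedDispatcher dispatcher = new KeyPartitionedDispatcher(4)) {
      for (int version = 1; version <= 3; version++) {
        final int v = version;
        // All updates of row 42 hit the same worker, so they print in order
        dispatcher.dispatch(42L, () -> System.out.println("row 42, update " + v));
      }
    }
  }
}
```

This only keeps order for a stable key: if the id itself changes, the old and new ids may hash to different workers, which is the "order problem when id is changed" item.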

zzt93 added this to To do in syncer Apr 16, 2018
zzt93 moved this from To do to In progress in syncer Apr 16, 2018

zzt93 (Owner, Author) commented May 3, 2018

Testing & Implementing

  • Update position even for events the consumer is not interested in

  • Share storage in k8sMode

    • Sync meta info to ZK or similar
    • k8sMode needs an instanceId to differentiate instances
      • storage path /instanceId/syncer/xx
    • config file?
  • Kafka output: timestamp to long

  • MySQL output: auto-add id

  • Add basic MySQL upsert implementation (#8) [Test Pending] -- MySQL upsert support for the join-table order problem -- ref

  • [Impl Pending] Update sync meta position when the consumer is not interested in an event?

    • Implement with a simple position-flusher event type?
    • emit when trying to shut down?
    • emit after a number of not-interested events have happened
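A minimal sketch of the position-flusher idea above, assuming positions are opaque strings and that every event before a skipped one has already been acked (otherwise persisting the skipped position could move past un-acked events). `SkippedPositionFlusher` and the `mysql-bin.000001/300`-style positions are hypothetical, made up for illustration; the sketch covers the "emit after N not-interested events" and "emit on shutdown" options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: remember the latest position of an event the consumer
 * skipped, and hand it to the position store after every N skipped events
 * and once more on shutdown. Assumes all earlier events are already acked.
 */
final class SkippedPositionFlusher {

  private final int flushEvery;
  private final Consumer<String> positionStore; // e.g. writes the local meta file
  private int skippedSinceFlush;
  private String lastSkipped; // null when nothing is pending

  SkippedPositionFlusher(int flushEvery, Consumer<String> positionStore) {
    if (flushEvery <= 0) throw new IllegalArgumentException("flushEvery must be positive");
    this.flushEvery = flushEvery;
    this.positionStore = positionStore;
  }

  /** Called for every event the consumer's filter is not interested in. */
  synchronized void onNotInterested(String position) {
    lastSkipped = position;
    if (++skippedSinceFlush >= flushEvery) {
      flush();
    }
  }

  /** Also called from a shutdown hook so the last skipped position is not lost. */
  synchronized void flush() {
    if (lastSkipped != null) {
      positionStore.accept(lastSkipped);
      lastSkipped = null;
    }
    skippedSinceFlush = 0;
  }

  public static void main(String[] args) {
    List<String> stored = new ArrayList<>();
    SkippedPositionFlusher flusher = new SkippedPositionFlusher(3, stored::add);
    for (int i = 1; i <= 7; i++) {
      flusher.onNotInterested("mysql-bin.000001/" + (i * 100));
    }
    flusher.flush(); // simulate shutdown
    System.out.println(stored); // [mysql-bin.000001/300, mysql-bin.000001/600, mysql-bin.000001/700]
  }
}
```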

zzt93 (Owner, Author) commented Sep 8, 2018

Not Do

  • Schema mismatch problem -- fixed by the new cold-start method (ETL)
    • Write schema of all tables to a local file, then parse all DDL to update it.
    • Load schema from files at startup
    • Cold start
      • connect to latest binlog (can't resolve the mismatch in this situation)
  • Netty as HTTP client (idempotence is hard to achieve)
  • Support RPC output channel (idempotence is hard to achieve)
  • Support WebSocket for long-lived connections (idempotence is hard to achieve)
  • Join by querying an extra data source in output?
  • Make output module non-blocking with callbacks to reduce filter-output threads?
    • May cause events to be out of order -- make it a config option: non-block-mode

Repository owner locked and limited conversation to collaborators Sep 11, 2018
zzt93 pinned this issue Dec 16, 2018