Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IndexOutOfBoundsException when used RDS, but single mysql server OK #469

Closed
iadmirezhe opened this issue Nov 4, 2016 · 15 comments
Closed

Comments

@iadmirezhe
Copy link

iadmirezhe commented Nov 4, 2016

When I used RDS as mysql server,maxwell v1.4.2, I encountered the error: Couldn't find table ha_health_check. Then I took some steps as follows:
1. Added mysql to blacklist_dbs, previously, blacklist_dbs had nothing.
2. Initialized the maxwell schema tables by deleting all data in those tables.
3. Started maxwell, log was like:
10:29:28,702 INFO Maxwell - Maxwell v1.4.2 is booting (StdoutProducer), starting at BinlogPosition[mysql-bin.000102:28008]
10:29:28,825 INFO OpenReplicator - starting replication at mysql-bin.000102:28008
10:29:29,070 INFO AbstractSchemaStore - Maxwell is capturing initial schema
4.Executed an insert clause in another test table with just two columns. Then maxwell shutdown. Error was:
java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at com.zendesk.maxwell.schema.ColumnWithDefinitionList$ColumnWithDefinitionIterator.next(ColumnWithDefinitionList.java:67)
at com.zendesk.maxwell.schema.ColumnWithDefinitionList$ColumnWithDefinitionIterator.next(ColumnWithDefinitionList.java:1)
at com.zendesk.maxwell.replication.AbstractRowsEvent.jsonMaps(AbstractRowsEvent.java:206)
at com.zendesk.maxwell.replication.MaxwellReplicator.getTransactionRows(MaxwellReplicator.java:281)
at com.zendesk.maxwell.replication.MaxwellReplicator.getRow(MaxwellReplicator.java:428)
at com.zendesk.maxwell.replication.MaxwellReplicator.work(MaxwellReplicator.java:159)
at com.zendesk.maxwell.util.RunLoopProcess.runLoop(RunLoopProcess.java:35)
at com.zendesk.maxwell.Maxwell.start(Maxwell.java:160)
at com.zendesk.maxwell.Maxwell.main(Maxwell.java:181)

5. I added some logs in the class ColumnWithDefinitionList, I found that the columnDefList length was not equivalent to the usedColumns length, it seemed that the columnDefList had two elements, that was right, but the usedColumns had more, so the IndexOutOfBoundsException happened.

All the issues above have not happened when I used single mysql server.

Any reply would be greatly appreciated.

@iadmirezhe iadmirezhe changed the title RDS error, but single mysql server OK IndexOutOfBoundsException when used RDS, but single mysql server OK Nov 4, 2016
@osheroff
Copy link
Collaborator

osheroff commented Nov 5, 2016

  1. it's probably best that you don't screw around with blacklist_db.
  2. do you have the original error message? (around ha_health_check)
  3. if you want to re-initialize maxwell, do it by performing DROP DATABASE maxwell -- don't just delete all the data in the tables. I think what happened to you was you had a binlog position that was out of sync from your schema; so you were replicating old data with a newer schema, which won't work.

@iadmirezhe
Copy link
Author

Thank you very much for your advice.

Here is the original error message (around ha_health_check):
09:46:55,804 INFO Maxwell - Maxwell v1.4.2 is booting (StdoutProducer), starting at BinlogPosition[mysql-bin.000113:480400]
09:46:55,941 INFO OpenReplicator - starting replication at mysql-bin.000113:480400
09:46:56,141 INFO AbstractSchemaStore - Maxwell is capturing initial schema
java.lang.RuntimeException: Couldn't find table ha_health_check
at com.zendesk.maxwell.replication.TableCache.processEvent(TableCache.java:30)
at com.zendesk.maxwell.replication.MaxwellReplicator.getTransactionRows(MaxwellReplicator.java:287)
at com.zendesk.maxwell.replication.MaxwellReplicator.getRow(MaxwellReplicator.java:428)
at com.zendesk.maxwell.replication.MaxwellReplicator.work(MaxwellReplicator.java:159)
at com.zendesk.maxwell.util.RunLoopProcess.runLoop(RunLoopProcess.java:35)
at com.zendesk.maxwell.Maxwell.start(Maxwell.java:160)
at com.zendesk.maxwell.Maxwell.main(Maxwell.java:181)
09:47:03,934 INFO Maxwell - starting shutdown
09:47:04,036 INFO PositionStoreThread - Storing final position: BinlogPosition[mysql-bin.000113:480400]

When I tried other mysql replicator tool such as canal, the same error 'Couldn't find mysql.ha_health_check table', and then the same IndexOutOfBoundsException happened. So I thought maybe the RDS binlog had something different and special. The IndexOutOfBoundsException showed the test table had two columns but the usedColumns had three.

Finally, I altered the test table by adding primary key,and at the same time,the mysql table had already been added to the blacklist_db as mentioned previously.Then everything was OK!
1)Added mysql to the blacklist_db, 'Couldn't find table ha_health_check' disappeared.Maybe RDS has a table called ha_health_check,but ordinary user has no privilege to access it.
2)Added primary key to the test table, IndexOutOfBoundsException disappeared.Reason is that if a table has no primary key,RDS will add a hidden primary key.But the hidden key is only accessible to super user,so ordinary user will find table columns inconsistent with that in table meta data.So the exception is caused by RDS,not maxwell.

@osheroff
Copy link
Collaborator

osheroff commented Nov 7, 2016

i think you might be right in (1) -- it really seems like RDS is putting rows in its binlogs around the ha_health_check table, but maybe it's invisible to maxwell's schema capture code? Either that or there's a race condition between capturing schema and replication, but the fact that you ran into it with another tool makes it seem like there's something funky going on here...

what is the output of this command?

show binlog events in 'mysql-bin.000113' from 480400 limit 10;

@iadmirezhe
Copy link
Author

iadmirezhe commented Nov 8, 2016

mysql-bin.000113 has been expired. I re-initialized maxwell and started it. Then error:
10:35:34,875 INFO Maxwell - Maxwell v1.4.2 is booting (StdoutProducer), starting at BinlogPosition[mysql-bin.000118:36379]
10:35:35,038 INFO OpenReplicator - starting replication at mysql-bin.000118:36379
10:35:35,431 INFO AbstractSchemaStore - Maxwell is capturing initial schema
java.lang.RuntimeException: Couldn't find table ha_health_check
at com.zendesk.maxwell.replication.TableCache.processEvent(TableCache.java:30)
at com.zendesk.maxwell.replication.MaxwellReplicator.getTransactionRows(MaxwellReplicator.java:287)
at com.zendesk.maxwell.replication.MaxwellReplicator.getRow(MaxwellReplicator.java:428)
at com.zendesk.maxwell.replication.MaxwellReplicator.work(MaxwellReplicator.java:159)
at com.zendesk.maxwell.util.RunLoopProcess.runLoop(RunLoopProcess.java:35)
at com.zendesk.maxwell.Maxwell.start(Maxwell.java:160)
at com.zendesk.maxwell.Maxwell.main(Maxwell.java:181)
10:35:41,346 INFO Maxwell - starting shutdown
10:35:41,448 INFO PositionStoreThread - Storing final position: BinlogPosition[mysql-bin.000118:36379]

show binlog events in 'mysql-bin.000118' from 36379 limit 10; Output:
+------------------+-------+----------------+------------+-------------+-------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-------+----------------+------------+-------------+-------------------------------------------------------------------------+
| mysql-bin.000118 | 36379 | Gtid | 2428115145 | 36427 | SET @@SESSION.GTID_NEXT= '8fe2bd6b-8ee7-11e6-b6b6-6c92bf204623:2534142' |
| mysql-bin.000118 | 36427 | Query | 2428115145 | 36502 | BEGIN |
| mysql-bin.000118 | 36502 | Table_map | 2428115145 | 36566 | table_id: 587791 (maxwell.positions) |
| mysql-bin.000118 | 36566 | Write_rows_v1 | 2428115145 | 36634 | table_id: 587791 flags: STMT_END_F |
| mysql-bin.000118 | 36634 | Xid | 2428115145 | 36665 | COMMIT /* xid=31672098 / |
| mysql-bin.000118 | 36665 | Gtid | 2428115145 | 36713 | SET @@SESSION.GTID_NEXT= '8fe2bd6b-8ee7-11e6-b6b6-6c92bf204623:2534143' |
| mysql-bin.000118 | 36713 | Query | 2428115145 | 36788 | BEGIN |
| mysql-bin.000118 | 36788 | Table_map | 2428115145 | 36852 | table_id: 587791 (maxwell.positions) |
| mysql-bin.000118 | 36852 | Update_rows_v1 | 2428115145 | 36964 | table_id: 587791 flags: STMT_END_F |
| mysql-bin.000118 | 36964 | Xid | 2428115145 | 36995 | COMMIT /
xid=31672262 */ |
+------------------+-------+----------------+------------+-------------+-------------------------------------------------------------------------+

@osheroff
Copy link
Collaborator

osheroff commented Nov 8, 2016

hmm, that doesn't have the ticket update. maybe increase the LIMIT until you see something referencing ha_health_check?

@iadmirezhe
Copy link
Author

iadmirezhe commented Nov 9, 2016

Yes! I increased the LIMIT. ha_health_check finally shows up, and many events around ha_health_check. Here it is a part of binlog:

mysql-bin.000121 | 1585489 | Query | 2428115145 | 1585557 | BEGIN |
| mysql-bin.000121 | 1585557 | Table_map | 2428115145 | 1585619 | table_id: 623477 (mysql.ha_health_check) |
| mysql-bin.000121 | 1585619 | Update_rows_v1 | 2428115145 | 1585675 | table_id: 623477 flags: STMT_END_F |
| mysql-bin.000121 | 1585675 | Xid | 2428115145 | 1585706 | COMMIT /* xid=32287265 */

@osheroff
Copy link
Collaborator

just looks like a normal table as far as I can tell.

What about the output of:

SELECT TABLES.TABLE_NAME, CCSA.CHARACTER_SET_NAME 
FROM INFORMATION_SCHEMA.TABLES 
JOIN  information_schema.COLLATION_CHARACTER_SET_APPLICABILITY AS CCSA
 ON TABLES.TABLE_COLLATION = CCSA.COLLATION_NAME WHERE TABLES.TABLE_SCHEMA = 'mysql'

Maybe there's something weird with the join, or something.....

@iadmirezhe
Copy link
Author

Here it is:
mysql> SELECT TABLES.TABLE_NAME, CCSA.CHARACTER_SET_NAME
-> FROM INFORMATION_SCHEMA.TABLES
-> JOIN information_schema.COLLATION_CHARACTER_SET_APPLICABILITY AS CCSA
-> ON TABLES.TABLE_COLLATION = CCSA.COLLATION_NAME WHERE TABLES.TABLE_SCHEMA = 'mysql';
+---------------------------+--------------------+
| TABLE_NAME | CHARACTER_SET_NAME |
+---------------------------+--------------------+
| event | utf8 |
| general_log | utf8 |
| help_category | utf8 |
| help_keyword | utf8 |
| help_relation | utf8 |
| help_topic | utf8 |
| proc | utf8 |
| slow_log | utf8 |
| time_zone | utf8 |
| time_zone_leap_second | utf8 |
| time_zone_name | utf8 |
| time_zone_transition | utf8 |
| time_zone_transition_type | utf8 |
| func | utf8 |
+---------------------------+--------------------+
14 rows in set (0.00 sec)

@osheroff
Copy link
Collaborator

osheroff commented Nov 11, 2016

now we're getting somewhere; the table isn't there. What about:

SELECT TABLES.TABLE_NAME, TABLES.TABLE_COLLATION  FROM INFORMATION_SCHEMA.TABLES WHERE TABLES.TABLE_SCHEMA = 'mysql'

@iadmirezhe
Copy link
Author

Sorry for being late to reply.
Here is the output:
mysql> SELECT TABLES.TABLE_NAME, TABLES.TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES WHERE TABLES.TABLE_SCHEMA = 'mysql';
+---------------------------+-----------------+
| TABLE_NAME | TABLE_COLLATION |
+---------------------------+-----------------+
| event | utf8_general_ci |
| func | utf8_bin |
| general_log | utf8_general_ci |
| help_category | utf8_general_ci |
| help_keyword | utf8_general_ci |
| help_relation | utf8_general_ci |
| help_topic | utf8_general_ci |
| proc | utf8_general_ci |
| slow_log | utf8_general_ci |
| time_zone | utf8_general_ci |
| time_zone_leap_second | utf8_general_ci |
| time_zone_name | utf8_general_ci |
| time_zone_transition | utf8_general_ci |
| time_zone_transition_type | utf8_general_ci |
+---------------------------+-----------------+

I think RDS has the table to keep master\slave communication,but why the table isn't there? I'll send the issue to RDS provider.

osheroff pushed a commit that referenced this issue Nov 14, 2016
I guess it doesn't appear in INFORMATION_SCHEMA tables.

addresses #469.
@osheroff
Copy link
Collaborator

sounds good, does sound like an RDS problem to me. For now, have this pr: #476

@iadmirezhe
Copy link
Author

RDS provider's answer: ha_health_check table is RDS system table which is not visiable to other users. So it's RDS privilege problem. Thank you again.

osheroff pushed a commit that referenced this issue Nov 18, 2016
I guess it doesn't appear in INFORMATION_SCHEMA tables.

addresses #469.
@osheroff
Copy link
Collaborator

1.5.1 includes the fix

@iadmirezhe
Copy link
Author

thank you so much!

@osheroff
Copy link
Collaborator

you're welcome, thanks for the assist in finding out what was going on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants