-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Closed
Labels
Description
In the line https://github.com/github/gh-ost/blob/master/go/sql/builder.go#L270 there's a sql statement to get an id value, which I guess based on the function name is for the range of ids to copy. This produces a query like:
select /* gh-ost `schemaname`.`tablename` iteration:12345 */ `id`
from (
select
`id`
from
`schemaname`.`tablename`
where ((`id` > 123456789)) and ((`id` < "987654321") or ((`id` = "987654321")))
order by
`id` asc
limit 100
) select_osc_chunk
order by
`id` desc
limit 1I was wondering if this could be rewritten to:
select id
from schemaname.tablename
where ((`id` > 123456789)) and ((`id` < "987654321") or ((`id` = "987654321")))
order by id asc
limit 1 offset 99The reason is that the query as-is creates a temporary table and does a filesort on it:
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 100 | Using filesort |
| 2 | DERIVED | tablename | range | PRIMARY | PRIMARY | 4 | NULL | 8425424 | Using where; Using index |
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
mysql> show session status like 'handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 2 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 99 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 1 |
| Handler_read_rnd_next | 101 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 100 |
+----------------------------+-------+
18 rows in set (0.00 sec)
While my suggestion just does an index range scan:
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | SIMPLE | tablename | range | PRIMARY | PRIMARY | 4 | NULL | 8425428 | Using where; Using index |
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
mysql> show session status like 'handler%';
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_commit | 1 |
| Handler_delete | 0 |
| Handler_discover | 0 |
| Handler_external_lock | 2 |
| Handler_mrr_init | 0 |
| Handler_prepare | 0 |
| Handler_read_first | 0 |
| Handler_read_key | 1 |
| Handler_read_last | 0 |
| Handler_read_next | 99 |
| Handler_read_prev | 0 |
| Handler_read_rnd | 0 |
| Handler_read_rnd_next | 0 |
| Handler_rollback | 0 |
| Handler_savepoint | 0 |
| Handler_savepoint_rollback | 0 |
| Handler_update | 0 |
| Handler_write | 0 |
+----------------------------+-------+
18 rows in set (0.00 sec)
My suggestion was also faster in my testing for chunks larger than 100 but I would appreciate other testing. I don't like the use of OFFSET normally, but this isn't pagination, and the query gets run millions of times for a large table so that's a lot of created temporary tables.
Reactions are currently unavailable