Skip to content

Subquery improvement for select_osc_chunk #422

@ggunson

Description

@ggunson

In the line https://github.com/github/gh-ost/blob/master/go/sql/builder.go#L270 there's a sql statement to get an id value, which I guess based on the function name is for the range of ids to copy. This produces a query like:

     select /* gh-ost `schemaname`.`tablename` iteration:12345 */ `id`
				from (
					select
							`id`
						from
							`schemaname`.`tablename`
						where ((`id` > 123456789)) and ((`id` < "987654321") or ((`id` = "987654321")))
						order by
							`id` asc
						limit 100
				) select_osc_chunk
			order by
				`id` desc
			limit 1

I was wondering if this could be rewritten to:

select id 
from schemaname.tablename
where ((`id` > 123456789)) and ((`id` < "987654321") or ((`id` = "987654321")))
order by id asc
limit 1 offset 99

The reason is that the query as-is creates a temporary table and does a filesort on it:

+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref  | rows    | Extra                    |
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
|  1 | PRIMARY     | <derived2>  | ALL   | NULL          | NULL    | NULL    | NULL |     100 | Using filesort           |
|  2 | DERIVED     | tablename   | range | PRIMARY       | PRIMARY | 4       | NULL | 8425424 | Using where; Using index |
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+

mysql> show session status like 'handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 99    |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 1     |
| Handler_read_rnd_next      | 101   |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 100   |
+----------------------------+-------+
18 rows in set (0.00 sec)

While my suggestion just does an index range scan:

+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref  | rows    | Extra                    |
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+
|  1 | SIMPLE      | tablename   | range | PRIMARY       | PRIMARY | 4       | NULL | 8425428 | Using where; Using index |
+----+-------------+-------------+-------+---------------+---------+---------+------+---------+--------------------------+

mysql> show session status like 'handler%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 2     |
| Handler_mrr_init           | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1     |
| Handler_read_last          | 0     |
| Handler_read_next          | 99    |
| Handler_read_prev          | 0     |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
18 rows in set (0.00 sec)

My suggestion was also faster in my testing for chunks larger than 100 but I would appreciate other testing. I don't like the use of OFFSET normally, but this isn't pagination, and the query gets run millions of times for a large table so that's a lot of created temporary tables.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions