Saturday, June 18, 2016

Query rewrite with ProxySQL : use case scenario

One of the interesting features of ProxySQL is the support of query rewrite.
In fact, this functionality was one of the main motivations for writing a proxy that sits between the application and the database server to rewrite queries on the fly, without the need to have a developer involved in rewriting the query ASAP (if ever possible!).


In this short post I will describe a use case scenario on how to use ProxySQL to troubleshoot performance and quickly rewrite queries if needed.
The setup in this specific blog post is Nylas platform, a sharded environment where there are over 80 ProxySQL instances running.

After noticing some odd workload originated from Nylas' open source sync engine , as per the attached graph on InnoDB rows read:


We could ask ProxySQL to get the list of the most time consuming queries:


Or even ask which is the query with the highest average execution time:



Note that we could have get the same information from information_schema.events_statements_summary_by_digest , but we already knew that the load was coming from a specific node so we queried stats.stats_mysql_query_digest inside ProxySQL itself.

From the output above, it was obvious that the problem was with a specific query type running at that time.

Without copying again the whole query, the query could be simplified as:

SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 IN (? , ?)
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ? 


col2 and col3 are not indexes because do not provide enough cardinality, while idx_col1 does, so it is indexed.
Because of the WHERE clauses and the ORDER BY , mysqld processes it performing a range scan on the primary key.
The optimizer could be smart enough to optimize this query, but it is not.

The query can be rewritten as:

SELECT * FROM (
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
UNION ALL
(SELECT list_of_columns
    FROM tablename
WHERE
    PK >= ? 
    AND idx_col1 = v1
    AND col2 = ?
    AND col3 = ?
    ORDER BY PK ASC
LIMIT ?)
) t ORDER BY PK LIMIT ?

How to do this with ProxySQL? Extremely easy!
The snippet below shows how to create a rule to rewrite the query, load the new rule into runtime, and persist the new rule on disk:



From the following graph of InnoDB rows read it is immediately clear the effect of the new query rewrite:



In conclusion: query rewrite is a feature that, depending from the queries running against the database server, quickly allows to isolate and correct problematic queries and improve performance.