Tuesday, August 28, 2012



The problem

Sometime back a member of the MySQL support team informed us that the slaves are very sluggish, when he tried to run a delete query on the master. "What may be the problem?" we wondered. This is in RBR and its pretty usual that we have slower slaves when using RBR as compared to SBR. We requested for some more details and found that it is something else. It's a problem when a lot of rows are getting modified in a table without PRIMARY KEY.

"UPDATE t1 set a = a+1;" and other queries such as this are generally found to make slaves sluggish if the table does not have proper index. Is it a problem with the fact that a lot of data is being transferred over the network? After all the total amount of data that will be transferred over the network will be more than twice (before and after image + some header) the sizeof data that is modified. Could this be the problem? We decided to confirm this and found that the major chunk of the total time is taken while applying the event present in the relay-log.

Before going forward lets try to understand how the row events are applied on the slave.

A standard row event looks as follows.

+----------------------------+
|     Header: table id       |
|   column information etc.  |
+--------------+-------------+
| BEFORE IMAGE | AFTER IMAGE |
+--------------+-------------+
| BEFORE IMAGE | AFTER IMAGE |
+--------------+-------------+

As you can see here each row event has a set of rows which are modified. For each row we store the Before Image(BI) and the After Image(AI)[1]. when these events are read from the relay log  we do the following in case no primary key is present on the slave.

for each row in the event do
{
  search for the correct row to be modified using BI
  replace the  row in the table with the corresponding AI
}

In this finding the appropriate row is done by using a table scan which again has a linear complexity. There for this algorithm has now a complexity of O(n^2).

Pretty costly right? Yes it is and it is for this reason we decided to optimize this approach.

The Fix:

The fix we thought is pretty simple and yet elegant. We decided to the use in-memory Hash. We hash the rows in the event and generate key using the BI.  we store the BI as well in the hash to bail us out in case of collision, which is quite possible since the BIT and BLOB fields are not considered while creating the HASH KEY.

If we assume HASH() as the hashing function then the HASH contains:

|---KEY----||---structure containing ----|
            |-----the BI and the AI------| 
+-----------+--------------+-------------+
| HASH (BI) | BEFORE IMAGE | AFTER IMAGE |
+-----------+--------------+-------------+

We then iterate over each row in the table, hash it and check if the key exists in the hash. If so we apply the AI corresponding to the key in the HASH to the row in the table.

The above can be explained using this pseudo-code.

for each row in the event do
{
  hash the row. 
}

for each row in the table do
{
  key= hash the row;
  if (key is present in the hash)
  {
    apply the AI to the row.
  }
}

This can run in linear time since the search in the hash takes constant time.

Along with this another optimization has been done when the table has a non-unique index. We hash the rows in the events and store the distinct key set in a list. We then iterate over this key set and fetch the rows corresponding to this index value and iterate over them. This prevents us from iterating over all the rows in the table but just the rows with a given set of index values. For the further usage in this post we will use HASH OVER TABLE(Ht) for the previous scenario and HASH OVER INDEX(Hi) for this one.

This can be explained using this pseudo-code

for each row in the event do
{
  hash the row.
  store the key in a list of distinct key.
}

for each row corresponding key values in the key list do
{
  key= hash the row;
  if (key is present in the hash)
  {
    apply the AI to the row.
  }
}

How to use:

To use this start the slave  with

--slave-rows-search-algorithms= 
HASH_SCAN | TABLE_SCAN | INDEX_SCAN

or a combination of the three. You may also change the value of the variable @@slave_rows_search_algorithms on the running server as well.

For further information regarding the usage check the mysql developers' page

The following matrix shows how the algorithm is chosen for a given table.

KEYS AVAILABLE ON THE TABLE
    - PK  --> Primary Key
    - NNUK--> Not Null Unique Key
    - K   --> Key
SCAN ALGORITHMS 
    - I   --> Index scan / search
    - T   --> Table scan
    - Hi  --> Hash over index
    - Ht  --> Hash over the entire table

    +--------------+-----------+------+------+------+
    | Index\Option | I , T , H | I, T | I, H | T, H |
    +--------------+-----------+------+------+------+
    | PK / NNUK    | I         | I    | I    | Hi   |
    | K            | Hi        | I    | Hi   | Hi   |
    | No Index     | Ht        | T    | Ht   | Ht   |
    +--------------+-----------+------+------+------+

Pros and Cons:
As you must have figured out the fact the implementation of HASH_SCAN uses in memory hash and may cause the performance to degrade if the amount of memory available is low. Also if the average number of rows modified by transactions are less, the performance improvement may not be very high.

Performance:

We ran some two slaves with HASH_SCAN and TABLE_SCAN and found the results to be as expected.

The following graphs show the improvements in case of  Ht and Hi

1. Hash over index.
2. Hash over table.
Note: These runs were made as a comparative analysis with two slaves to a master on one machine and should not be used for bench-marking etc.

This feature is available for trail in the latest MySQL-5.6  community edition.


[1] We are assuming that --binlog-row-image=FULL. Even though if it is set to MINIMAL|NOBLOB it won't make a difference to the point i am trying to make.  See developers' page

Reactions:

About the author

Rohit Kalhans is a Software Development Engineer working with Oracle MySQL replication team and his area of focus revolves around Row-based replication and Multi-threaded parallel slave. His interests include system programming, highly-available and scalable systems. In his free time, he plays guitar and piano and loves to write short stories and poems. More Information can be found on his homepage.

0 comments :

Post a Comment