Tuesday, October 29, 2013

Two weeks back I decided to switch to Ubuntu 13.04. I had been using the 12.04 LTS version and everything was fine. However, I was seeing a crash in bzr-gtk upon running bzr gcommit, and hence I decided to upgrade. All was well until I saw the systray icon of Network Manager crash upon switching connections, due to bug#1164631 and bug#1159063. I have to switch between connections often, and this was frustrating.

The worst part of all this is that you cannot go online and find a solution to the issue, since you are no longer connected to the internet. So much for our dependency on the Internet! And you cannot restart your system if a huge test suite is running.

Skimming through the man pages, I came across an old friend: nmcli.

$  nmcli dev
DEVICE     TYPE              STATE        
ttyUSB0    cdma              disconnected    
wlan0      802-11-wireless   disconnected 
eth0       802-3-ethernet    unavailable 


Okay, so this was the epiphany I was looking for. Cool! So what's next?


$ nmcli con
NAME                      UUID                                   TYPE              TIMESTAMP-REAL                    
VPN connection 1          de9fd222-7a27-4e29-ad9e-9095f953e837   vpn               never                             
LAN                       c26d0089-0f41-422a-9997-8a9b832d7159   802-3-ethernet    Saturday 26 October 2013 07:23:48 PM IST
Tata Indicom (Photon+) connection b3c0f820-7521-4eb4-8507-8c931f44d691   cdma              Tuesday 29 October 2013 04:25:28 PM IST

To connect, copy the UUID or the name of the connection you want to bring up. For example, I wanted to connect to my CDMA dongle, so I used:

$ nmcli con up uuid b3c0f820-7521-4eb4-8507-8c931f44d691

This is cool, but I wanted to switch connections from the systray. I used the Python AppIndicator3 bindings to create an appindicator, and added a menu with the list of all the available connections.

from gi.repository import AppIndicator3 as appindicator
import NSSignalHandler  # the module defining registerSignalHandler (shown below)

def getIndicator():
    indicator = appindicator.Indicator.new(
        "Network-switcher",
        "indicator-messages",
        appindicator.IndicatorCategory.APPLICATION_STATUS)
    indicator.set_icon('/usr/share/icons/ubuntu-mono-dark/status/24/gnome-netstatus-0-24.svg')
    indicator.set_status(appindicator.IndicatorStatus.ACTIVE)
    indicator.set_attention_icon('/usr/share/icons/ubuntu-mono-dark/status/24/gnome-netstatus-75-100.svg')
    NSSignalHandler.registerSignalHandler(indicator)
    return indicator
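The menu itself can be built from the output of `nmcli con`. Here is a sketch of how that could look; `parse_connections` and the subprocess call are my illustration of the idea, not the actual code of the script on Launchpad:

```python
import subprocess

def parse_connections(nmcli_output):
    """Return (name, uuid) pairs parsed from `nmcli con` output."""
    connections = []
    for line in nmcli_output.splitlines()[1:]:   # skip the header row
        if not line.strip():
            continue
        # The UUID is the only fixed-format field: 36 chars with 4 dashes.
        parts = line.split()
        for i, tok in enumerate(parts):
            if len(tok) == 36 and tok.count("-") == 4:
                connections.append((" ".join(parts[:i]), tok))
                break
    return connections

def list_connections():
    """Run nmcli and return the configured connections."""
    out = subprocess.check_output(["nmcli", "con"]).decode()
    return parse_connections(out)
```

Each (name, uuid) pair then becomes a menu item whose activate handler runs `nmcli con up uuid <uuid>`.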

Next was to register the appindicator with the D-Bus notification interface. This ensures that we get notified of network connects and disconnects, whether initiated from this appindicator or from outside.


from gi.repository import Gdk
import dbus
from dbus.mainloop.glib import DBusGMainLoop

def registerSignalHandler(ind):
    global indicator
    indicator = ind
    dbus_loop = DBusGMainLoop()
    system_bus = dbus.SystemBus(mainloop=dbus_loop)
    system_bus.add_signal_receiver(connection_handler,
        dbus_interface="org.freedesktop.NetworkManager.Connection.Active",
        signal_name="PropertiesChanged")
    Gdk.threads_init()


From here everything was downhill :)  I added a callback to check whether the notification is a connect or a disconnect. You may get the active connection from


$ nmcli con status
NAME                              UUID                                   DEVICES    DEFAULT  VPN   MASTER-PATH
Tata Indicom (Photon+) connection b3c0f820-7521-4eb4-8507-8c931f44d691   ttyUSB0    yes      no    --
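The connection_handler passed to add_signal_receiver could look something like the sketch below. The NetworkManager state constants come from its D-Bus API; the exact icon-flipping behaviour of the real script is my guess, not a copy of it:

```python
# ActiveConnection states from the NetworkManager D-Bus API.
NM_ACTIVE_CONNECTION_STATE_ACTIVATED = 2
NM_ACTIVE_CONNECTION_STATE_DEACTIVATED = 4

def classify(properties):
    """Map a PropertiesChanged payload to a human-readable status."""
    state = properties.get("State")
    if state == NM_ACTIVE_CONNECTION_STATE_ACTIVATED:
        return "connected"
    if state == NM_ACTIVE_CONNECTION_STATE_DEACTIVATED:
        return "disconnected"
    return "unknown"

def connection_handler(properties):
    status = classify(properties)
    # The real script would switch the indicator icon here, e.g.
    # indicator.set_status(appindicator.IndicatorStatus.ATTENTION)
    print("network is now", status)
```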

A screenshot of the systray icon.



It is a simple Python script and is available on Launchpad.



Saturday, September 21, 2013

MySQL 5.7.2 features enhanced Multi-threaded slave which can be used to apply transactions in parallel even within a single database. Internal details of its working can be found in an earlier post. In this  post we will see how we can configure our replication slave to use this enhancement.

MySQL 5.7.2 has a new system variable  --slave-parallel-type which is dynamic. It can be set to the following values:

1. DATABASE: (default) use the database-partitioned MTS (one worker per database).
2. LOGICAL_CLOCK: use the logical-clock-based parallelization mode.

Apart from this, the original option --slave-parallel-workers=N is still valid; it sets the number of worker threads to spawn. Also, since the slave leverages the groups of transactions that have committed in parallel on the master, it makes sense to leave --binlog-max-flush-queue-time=0 (the default value) intact on the master. This ensures that the leader thread on the master flushes all the transactions queued in the flush queue of binlog group commit without getting timed out, thereby delivering maximum parallelization on the slave.

Finally to summarize the steps to set up the enhanced MTS

ON MASTER:
1. start master with --binlog-max-flush-queue-time=0

ON SLAVE:
1.a. Start slave server with --slave-parallel-type=LOGICAL_CLOCK --slave-parallel-workers=N

Or alternatively,

1.b Start the slave server normally. Change the MTS options dynamically using
the following

mysql> STOP SLAVE;   -- if the slave is running
mysql> SET GLOBAL slave_parallel_type='LOGICAL_CLOCK';
mysql> SET GLOBAL slave_parallel_workers=N;
mysql> START SLAVE;


A small Demo:

1. We created 5 tables in a single test database on the master and used 5 clients to do inserts on them in parallel.
2. The slave was configured with --slave-parallel-type="logical_clock" and --slave-parallel-workers=5.
3. We let the slave replicate from the master and checked the status of the workers using the performance schema replication tables and the SHOW PROCESSLIST command.

Here is the sample output on the slave.

When to use enhanced MTS

Since the slave uses the parallelization information from the master, it performs best when there are multiple clients on the master and multiple transactions committing at the same time. If the master is underloaded, spawning multiple threads may have no effect on slave performance, and may even lead to performance degradation.

Conclusion

This enhancement is available in MySQL 5.7.2 which can be downloaded from the MySQL download page. So try it out and let us know your valuable feedback.

Introduction

Re-applying binary logs generated by a highly concurrent master on the slave has always been an area of focus. It is important for various reasons. First, in real-time systems, it becomes extremely important for the slave to keep up with the master. This can only be guaranteed if the slave's performance in reapplying transactions from the binary log is similar (or at least comparable) to that of the master, which is accepting queries directly from multiple clients. Second, in synchronous replication scenarios, having fast slaves helps reduce the response times seen by the clients of the master. This can be made possible by applying transactions from the binary log in parallel. However, if left uncontrolled, a simple round-robin multi-threaded applier will lead to inconsistency, and the slave will no longer be an exact replica of the master.

The infamous out of order commit problem

Out-of-order execution of transactions on the slave, if left uncontrolled, will lead to the slave diverging from the master. Here is an example: consider two transactions T1 and T2 being applied on an initial state.

On Master we apply T1 and T2 in that order.
State0: x= 1, y= 1
T1: { x:= Read(y);
          x:= x+1;
          Write(x);
          Commit; }
State1: x= 2, y= 1

T2: { y:= Read(x);
          y:=y+1;
          Write(y);
          Commit; }
State2: x= 2, y= 3

On the slave however these two transactions commit out of order (Say T2 and then T1).
State0: x= 1, y= 1
T2: { y:= Read(x);
          y:= y+1;
          Write(y);
          Commit; }
State1: x= 1, y= 2

T1: { x:= Read(y);
          x:=x+1;
          Write(x);
          Commit; }
State2: x= 3, y= 2


As we see above, the final state (State2) is different in the two cases. Needless to say, we need to control which transactions can execute in parallel.

Controlled parallelization

The above problem can be solved by controlling which transactions can be executed in parallel with the ones being executed by the slave. This means we need to have some kind of information in the transactions themselves. It is interesting to note that we can reuse the parallelization information from the master on the slave. Since we have multiple transactions committing at the same time on the master, we can store, for each transaction, the information about the transactions that were in the "process of committing" when it committed. Now let's define the phrase "process of committing".

The process of committing: On the slave we need to make sure that the transactions we schedule for parallel execution are the ones that do not have conflicting read and write sets. This is the one necessary requirement for the slave workers to work without conflicts. It also implies that if the transactions being executed in parallel do not have intersecting read and write sets, we don't care if they commit out of order. Since MySQL uses lock-based scheduling, all the transactions that have entered the prepare stage but have not yet committed have disjoint read and write sets, and hence can be executed in parallel.


Logical clock and commit parent

We have introduced a logical clock. Now, before I am tackled by a mathematician from one side and a computer engineer from the other, let me explain: it is a simple counter which is stepped when a binlog group of transactions commits on the master. Essentially, this clock is stepped every time the leader executes the flush stage of binlog group commit. The value of this clock is recorded in each transaction when it enters the prepare stage. This recorded value is the "commit parent".

The pseudo code is as follows.

During prepare:
  trx.commit_parent = commit_clock.get_timestamp();

During commit, for every binlog group:
  commit_clock.step();

As is evident by now, transactions with the same commit parent follow our guiding principle of slave-side parallelization, i.e. they entered the prepare stage before the group committed but had not yet committed themselves, and hence can be executed in parallel.
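The clock and commit-parent bookkeeping can be simulated in a few lines of Python. This is a toy model with names of my own choosing, not server code:

```python
class CommitClock:
    """Toy logical clock, stepped once per binlog group commit."""
    def __init__(self):
        self.ts = 0
    def get_timestamp(self):
        return self.ts
    def step(self):
        self.ts += 1

clock = CommitClock()

def prepare(trx):
    # Each transaction records the clock value on entering prepare.
    trx["commit_parent"] = clock.get_timestamp()

# T1 and T2 prepare before any group has committed; T3 prepares after
# the first binlog group commit steps the clock.
t1, t2, t3 = {}, {}, {}
prepare(t1)
prepare(t2)
clock.step()          # leader commits the first binlog group
prepare(t3)

def can_run_in_parallel(a, b):
    # Transactions sharing a commit parent were prepared concurrently
    # on the master, so the slave may apply them in parallel.
    return a["commit_parent"] == b["commit_parent"]
```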

Schematics of the binlog prepare stage and commit parent
In the example we take three transactions (T1, T2, T3), two of which are committed as part of the same binlog group. T1 enters the prepare stage and gets the commit parent 0, since no group has committed yet. T1 assigns itself as the leader and goes on to flush its transaction/statement cache. Meanwhile, transaction T2 enters the prepare stage. It is assigned the same commit parent "0" (CP in the figure), since the commit clock has not yet been stepped. T2 then waits for the leader to flush its cache into the binlog. After the flush has completed, the leader signals T2 to continue, and both of them enter the sync stage, where the leader thread calls fsync(), thereby finishing the binlog commit process. Transaction T3, however, enters the prepare stage after the previous group has been synced, and thereby ends up getting the next CP.

Another thing to note here is that the "group" of transactions executed in parallel is not bounded by the binlog commit group. It is possible that a transaction has entered the binlog prepare stage but could not make it into the current binlog group. Our approach takes care of such cases and relaxes the boundary of the group being executed in parallel on the slave.

On the slave we reuse the existing infrastructure of DB-partitioned MTS to execute the transactions in parallel, simply by modifying the scheduling logic.

Conclusion

This feature provides a great enhancement to existing MySQL replication. To know more about the configuration options of this enhancement, refer to this post.
The feature is available in the MySQL 5.7.2 release. Try it out and let us know your feedback.

Monday, May 6, 2013


Introduction

MySQL replication slaves feature a powerful capability of ignoring conflicts, such as duplicate-key and key-not-found errors, while applying row events. This is exceptionally useful when doing row-based replication (RBR) from a master when the slave already contains data which may conflict with the data coming from the master. In MySQL 5.7 we extend this capability to applying row events from binlog files using the mysqlbinlog tool. This enhancement prevents conflicts like the ones mentioned above from aborting the mysql client.

Rationale

Prior to MySQL 5.7 we have been using the mysqlbinlog tool as follows.

shell$> mysqlbinlog master.000001|mysql -uroot -hexample.com -ps3cret  

This allows us to pipe the output of mysqlbinlog to mysql, and works as long as the row events from mysqlbinlog do not conflict with the data already present on the mysql server on example.com. However, in case of conflicts, the mysql client aborts and we have to restart the whole process, skipping the offending event and starting from the next one using the start-position option. This is a problem when there are multiple conflicts.

How to use

The enhancement is pretty simple to use. mysqlbinlog in MySQL 5.7.0 provides command line options to enable this feature.

--idempotent, -i 

To use this from the command line  simply use the short option

shell$> mysqlbinlog master.000001 -i | mysql -uroot -hexample.com -ps3cret  

or use full option
shell$> mysqlbinlog master.000001 --idempotent | mysql -uroot -hexample.com -ps3cret  

How does it work

In MySQL server version 5.7.1 we introduced a new session system variable, rbr_exec_mode, which can be set to STRICT or IDEMPOTENT. When set to IDEMPOTENT, the server does not raise conflict errors for that particular session.

When executed with the -i or --idempotent option, mysqlbinlog writes
SET SESSION RBR_EXEC_MODE=IDEMPOTENT;
to the output file or to stdout. When piped into the mysql client, this directs the server to use the idempotent mode while applying row events.

This enhancement is available in MySQL 5.7.1 which can be downloaded from the MySQL download page. So go try it out and let us know your valuable feedback using the comment section.

Monday, October 1, 2012

With MySQL 5.6 coming closer to its release, I got a chance to sneak a look into Row Based Replication (RBR). The 5.6 release of MySQL will be packed with replication features, and it is for this reason that I write another post on the new enhancements in RBR. RBR provides a safe way of replicating between master and slave, and that's why RBR enhancements are all the more important. RBR in 5.6 is far more optimized than before, and we hope it will be fun to deploy, maintain and use.

Following are some interesting enhancements to RBR in MySQL 5.6 replication.

1. Optimized row image:

Large data transfers between master and slave while using RBR are a thing of the past. Why log complete rows on the master? Why not just the changed columns? The MySQL 5.6 release enables a DBA to configure the master to log only the changed columns of a row. The Before Image (BI) will contain the columns needed to identify the row, and the After Image (AI) will contain just the columns that have changed.
  

Before MySQL 5.6: 

As one can observe in the illustration, the row images were full, i.e. all the columns in the row were logged for both the Before Image (BI) and the After Image (AI).

In MySQL 5.6:

In MySQL 5.6 the DBA can configure whether FULL, MINIMAL or NOBLOB images of rows should be logged in the binary log, in the following ways:

1. start the server with --binlog-row-image=MINIMAL|FULL|NOBLOB
2. SET SESSION|GLOBAL binlog_row_image= MINIMAL|FULL|NOBLOB

In a minimal image, if there is a Primary Key (PK) or any kind of unique index/key (UK) that can identify a row uniquely on the slave, then the BI contains just those columns. In the After Image (AI) only the changed columns are logged (see illustration below).


In case NOBLOB is used, BLOB fields are skipped from logging unless changed. In case FULL is used, logging continues as before.
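The MINIMAL column selection can be pictured with a small Python model. This is purely illustrative; the function and the pk_columns parameter are my own naming, not server code:

```python
# Toy model of binlog_row_image=MINIMAL column selection.
def minimal_images(row_before, row_after, pk_columns):
    # BI: just enough columns to identify the row on the slave
    # (the PK/UK columns if any exist, otherwise the full row).
    if pk_columns:
        bi = {c: row_before[c] for c in pk_columns}
    else:
        bi = dict(row_before)
    # AI: only the columns whose value actually changed.
    ai = {c: v for c, v in row_after.items() if row_before.get(c) != v}
    return bi, ai

before = {"id": 7, "name": "alice", "email": "a@example.com"}
after  = {"id": 7, "name": "alice", "email": "alice@example.com"}
bi, ai = minimal_images(before, after, pk_columns=["id"])
# bi == {"id": 7}; ai == {"email": "alice@example.com"}
```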

As you can see, the amount of data transferred will be greatly reduced, especially if you have proper indexes/keys on the tables on the master and the slave. A detailed post on this enhancement can be found here.

2. Batch operations on table without PK or UK:

I have already mentioned the details of this enhancement in a previous post. This is one of the long-awaited optimizations for RBR. It enables the DBA to configure which algorithm is used on the slave to find the appropriate row for updating/deleting. We have introduced the concept of HASH_SCAN, in which an in-memory hash is created on the slave, providing a way to find and apply the rows in a row event in O(n) time. This enhancement provides great scope for performance improvement on the slave for events logged in row format.

3. Informational Log events: 

One of the difficulties DBAs have faced with RBR until now is that it is difficult to maintain. The row images are binary in nature and not suitable for human "consumption". This is where this enhancement comes in handy. There are two ways of using it:

    1. Enabling switch --binlog-rows-query-log-events during the server startup
    2. Changing the corresponding session variable.

Once enabled, this causes the original query for the row event to be logged in the binary log as a special event, which the DBA can inspect using one of two methods.

1. One can check these special events in the binary log using SHOW BINLOG EVENTS. The query corresponding to the row event is logged as an event of type Rows_query. Below is one such example.

mysql> SHOW BINLOG EVENTS;
+---------------+-----+-------------+-------+--------+-----------------------------------------------+
| Log_name      | Pos | Event_type  | S..id | End... | Info                                          |
+---------------+-----+-------------+-------+--------+-----------------------------------------------+
| master.000001 |   4 | Format_desc |     1 |    114 | Server ver: 5.6.7-m5-debug-log, Binlog ver: 4 |
| master.000001 | 114 | Query       |     1 |    200 | use `test`; CREATE TABLE NUM_INTS (a INT)     |
| master.000001 | 200 | Query       |     1 |    268 | BEGIN                                         |
| master.000001 | 268 | Rows_query  |     1 |    323 | # INSERT INTO NUM_INTS VALUES (1), (2), (3)   |
| master.000001 | 323 | Table_map   |     1 |    364 | table_id: 54 (test.NUM_INTS)                  |
| master.000001 | 364 | Write_rows  |     1 |    408 | table_id: 54 flags: STMT_END_F                |
| master.000001 | 408 | Query       |     1 |    477 | COMMIT                                        |
+---------------+-----+-------------+-------+--------+-----------------------------------------------+

In the above (snipped) terminal output, one can see that the event at pos=268 is an informational event for the following Table_map and Write_rows events.


2. Another way is to use the mysqlbinlog utility with the -vv switch (verbosity level 2 or more).

shell> mysqlbinlog -vv /master.000001
# at 268
#110401 14:24:29 server id 1  end_log_pos 323   Rows_query
# INSERT INTO t1 VALUES (1), (2), (3)
# at 323
#110401 14:24:29 server id 1  end_log_pos 364   Table_map: `test`.`NUM_INTS` mapped to number 54
# at 364
#110401 14:24:29 server id 1  end_log_pos 408   Write_rows: table id 54 flags:STMT_END_F

Conclusion: 

So we see how MySQL-5.6 Replication will provide a lot of features that enables a better, faster, reliable slave that is easy to maintain.  So why not just go try it out?

Tuesday, August 28, 2012



The problem

Some time back, a member of the MySQL support team informed us that slaves became very sluggish when he ran a delete query on the master. "What may be the problem?" we wondered. This was RBR, and it's pretty usual to have slower slaves with RBR compared to SBR. We requested some more details and found that it was something else: a problem when a lot of rows are modified in a table without a PRIMARY KEY.

Queries such as "UPDATE t1 SET a = a+1;" are generally found to make slaves sluggish if the table does not have a proper index. Is the problem that a lot of data is being transferred over the network? After all, the total amount of data transferred will be more than twice the size of the data modified (before and after image + some header). Could this be the problem? We decided to check, and found that the major chunk of the total time is spent applying the event present in the relay log.

Before going forward, let's try to understand how row events are applied on the slave.

A standard row event looks as follows.

+----------------------------+
|     Header: table id       |
|   column information etc.  |
+--------------+-------------+
| BEFORE IMAGE | AFTER IMAGE |
+--------------+-------------+
| BEFORE IMAGE | AFTER IMAGE |
+--------------+-------------+

As you can see here, each row event has a set of rows which were modified. For each row we store the Before Image (BI) and the After Image (AI)[1]. When these events are read from the relay log, we do the following in case no primary key is present on the slave:

for each row in the event do
{
  search for the correct row to be modified using BI
  replace the  row in the table with the corresponding AI
}

Here, finding the appropriate row is done using a table scan, which itself has linear complexity. Therefore this algorithm has an overall complexity of O(n^2).

Pretty costly right? Yes it is and it is for this reason we decided to optimize this approach.

The Fix:

The fix we came up with is pretty simple and yet elegant. We decided to use an in-memory hash. We hash the rows in the event, generating a key from the BI. We also store the BI itself in the hash, to bail us out in case of collisions, which are quite possible since BIT and BLOB fields are not considered while creating the hash key.

If we assume HASH() as the hashing function then the HASH contains:

|---KEY----||---structure containing ----|
            |-----the BI and the AI------| 
+-----------+--------------+-------------+
| HASH (BI) | BEFORE IMAGE | AFTER IMAGE |
+-----------+--------------+-------------+

We then iterate over each row in the table, hash it and check if the key exists in the hash. If so we apply the AI corresponding to the key in the HASH to the row in the table.

The above can be explained using this pseudo-code.

for each row in the event do
{
  hash the row. 
}

for each row in the table do
{
  key= hash the row;
  if (key is present in the hash)
  {
    apply the AI to the row.
  }
}

This can run in linear time since the search in the hash takes constant time.
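The Ht (hash over table) idea can be sketched in a few lines of Python. The names and the in-memory representation here are my own illustration, not server code:

```python
# Toy model of HASH_SCAN over the whole table: hash the (BI, AI) pairs
# from the event, then make a single pass over the table.
def apply_event_hash_scan(table, event_rows):
    # event_rows: list of (before_image, after_image) tuples.
    pending = {}
    for bi, ai in event_rows:
        # Store the BI alongside the AI so collisions can be detected.
        pending[hash(bi)] = (bi, ai)
    # One pass over the table: O(rows in table + rows in event).
    for i, row in enumerate(table):
        hit = pending.get(hash(row))
        if hit and hit[0] == row:    # confirm it is not a hash collision
            table[i] = hit[1]        # replace the row with its AI
    return table

table = [(1, "a"), (2, "b"), (3, "c")]
event = [((2, "b"), (2, "B")), ((3, "c"), (3, "C"))]
apply_event_hash_scan(table, event)
# table is now [(1, "a"), (2, "B"), (3, "C")]
```

Compare this with the table-scan approach above, which searches the whole table once per row in the event.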

Along with this, another optimization was done for the case when the table has a non-unique index. We hash the rows in the event and store the distinct key set in a list. We then iterate over this key set, fetch the rows corresponding to each index value, and iterate over them. This prevents us from iterating over all the rows in the table, touching only the rows with the given set of index values. For the rest of this post we will call the previous scenario HASH OVER TABLE (Ht) and this one HASH OVER INDEX (Hi).

This can be explained using this pseudo-code

for each row in the event do
{
  hash the row.
  store the key in a list of distinct key.
}

for each row corresponding key values in the key list do
{
  key= hash the row;
  if (key is present in the hash)
  {
    apply the AI to the row.
  }
}

How to use:

To use this start the slave  with

--slave-rows-search-algorithms= 
HASH_SCAN | TABLE_SCAN | INDEX_SCAN

or a combination of the three. You may also change the value of the variable @@slave_rows_search_algorithms on a running server.

For further information regarding the usage check the mysql developers' page

The following matrix shows how the algorithm is chosen for a given table.

KEYS AVAILABLE ON THE TABLE
    - PK  --> Primary Key
    - NNUK--> Not Null Unique Key
    - K   --> Key
SCAN ALGORITHMS 
    - I   --> Index scan / search
    - T   --> Table scan
    - Hi  --> Hash over index
    - Ht  --> Hash over the entire table

    +--------------+-----------+------+------+------+
    | Index\Option | I , T , H | I, T | I, H | T, H |
    +--------------+-----------+------+------+------+
    | PK / NNUK    | I         | I    | I    | Hi   |
    | K            | Hi        | I    | Hi   | Hi   |
    | No Index     | Ht        | T    | Ht   | Ht   |
    +--------------+-----------+------+------+------+

Pros and Cons:
As you may have figured out, the implementation of HASH_SCAN uses an in-memory hash and may cause performance to degrade if the amount of available memory is low. Also, if the average number of rows modified per transaction is small, the performance improvement may not be very high.

Performance:

We ran two slaves, one with HASH_SCAN and one with TABLE_SCAN, and found the results to be as expected.

The following graphs show the improvements in case of  Ht and Hi

1. Hash over index.
2. Hash over table.
Note: These runs were made as a comparative analysis, with two slaves attached to one master on a single machine, and should not be used for benchmarking.

This feature is available for trial in the latest MySQL 5.6 community edition.


[1] We are assuming that --binlog-row-image=FULL. Even if it is set to MINIMAL|NOBLOB, it won't make a difference to the point I am trying to make. See the developers' page.