Saturday, April 19, 2014

Ever wonder how we can write simulations simulating real world situations? Consider a simulation of molecules of hydrogen gas in a cylinder. To start the simulation there has to be some initial random movement of the molecules, otherwise there will be no movement at all. Also to determine realistic trajectories we need to introduce some random noise to simulate the effects of external forces to the cylinder. The question is simple. How do we do something randomly, on a computer? We need random numbers. As easy as it may seem it is not at all easy to generate random numbers. Computers are not human beings, they don’t do random. Also modern computers are not analog, they cannot simply produce random digits out of nowhere. Initial computers use random noise in their resistor circuits to generate random numbers. But as we moved to the digital computer systems this problem became bigger and bigger. There are multiple algorithms to find random number we need one which is fast and requires least resources. 
A wonderful comic strip by xkcd shows an elegant solution.

Sadly this solution is flawed for obvious reasons.  In this post I will try to explore some authentic ways of generating random numbers.

What are Random Numbers?

As the name suggests random numbers are random in nature. It mean that given a sufficiently huge list of such numbers there is no real pattern of their occurrence. Also given the nth random number one should not be able to predict the mth random number where m > n.  Furthermore the list should be consistent with some kind of distribution. In general unless specified the distribution is linear which means that if the upper and the lower bound of the numbers are known and there are only N possible random numbers possible between the bounds(this can be  attributed to the limitation of the precision of a machine), then probability of each number occurring in the list is

Do we really need Random Numbers?

The answer is No. For most practical purposes it is sufficient to find a pseudo-random numbers or quasi-random numbers. These number sequences are sometime also known as “apparently random sequence” since they seem random. In this post we see how we can generate seemingly random or quasi-random numbers.

Von Nuemann's Method

This method is inherently flawed but was developed for early machines, with limited computational capabilities. Suppose we have a 3 digit number and we need to find the next 3 digit number which is random.
Consider a starting number as 112. We then square this number and take the middle 3 digits.

This means the next number is 254.  If we keep doing it over and over again, we will build ourselves a list of pseudo-random/quasi-random numbers.

The problem with this approach is that this sequence is often affected by the kind of digits that are generated and digits like zeros and ones often stop the progression. This method is therefore not practical.

The Linear Congruential Method

This method is also known as Lehmer’s method after D.H. Lehmer. In this method we choose 4 magic numbers. m, a, c, X. Here 0 and m are the lower and upper bounds respectively for a, c, X and m > 0. Once we have these four numbers we define the sequence as 

If you choose a = c = X = 3 and m = 10 we have the following progression.

3, 2, 9, 0, 3, 2, 9, 0 …

This means that we have a sequence but the period of reoccurrence of repeated pattern is very small.  For increasing the length of we can simply choose a really big m. We will discuss this a little later. Next we will try to look at some trivial values of a and c. c = 0 is an interesting sequence but it will give you a seemingly random sequence for a big value of m. a= 0 and a= 1 however cannot be used since the progression will no longer be random.  This means for all practical purposes a >= 2.

Next we will try to find the (n+k)th term of the sequence once the nth term is given.  Considering
 b= a-1 ( b >= 1 since a >= 2) we have


Then according to above equation 

How to choose the modulo?

In general if we choose the modulus as 2w where w= word size of the machine we are good.  Since this means two things, the range of the number is sufficiently big to suite our practical needs and it is easy to calculate the modulus since the result of the modulo is usually the lower half of the result during multiplication.

How to implement rand() function?

The random number generator in C/C++ (or any other language for that matter) takes a seed as an input and gives you the a random number between 0 and 1.  This can be achieved by using the random seed to get the value of a, c, X0 and then using the above formula to generate X1. Since ‘m’ is fixed for us the output will always be less than m, we can always return

which will lie between 0 and 1.

Analysis of distribution

To analyze how the random numbers generated by this method are distributed I made a small implementation and generated 1000 numbers between 0-99. The distribution came out to be "essentially linear".


To conclude this method is easy to implement and generating a random number will be a constant time operation, and essentially amazing for all practical purposes.

Tuesday, October 29, 2013

Two weeks back I decided to switch to Ubuntu 13.04. I was using 12.04 LTS version and everything was fine.  However I was seeing crash in bzr-gtk  upon bzr gcommit and hence I decided to upgrade. All was well until I saw the sys-tray icon of Network Manager crash upon switching connections due to  bug#1164631 and bug#1159063. I have to switch between connections often and this was frustrating.

The worst part of all this is that you cannot go online and find a solution to the issue since you are no longer connected to the internet.  So much for our dependency on the Internet! And you cannot restart your system if you have a huge test suite being run.

Skimming through the man pages I came across the old friend,   NMCLI

$  nmcli dev
DEVICE     TYPE              STATE        
ttyUSB0    cdma              disconnected    
wlan0      802-11-wireless   disconnected 
eth0       802-3-ethernet    unavailable 

Okay, So this was the epiphany I was looking for, Cool! So what's next.

$ nmcli con
NAME                      UUID                                   TYPE              TIMESTAMP-REAL                    
VPN connection 1          de9fd222-7a27-4e29-ad9e-9095f953e837   vpn               never                             
LAN                       c26d0089-0f41-422a-9997-8a9b832d7159   802-3-ethernet    Saturday 26 October 2013 07:23:48 PM IST
Tata Indicom (Photon+) connection b3c0f820-7521-4eb4-8507-8c931f44d691   cdma              Tuesday 29 October 2013 04:25:28 PM IST

To connect copy the uuid or the name of the connection you want to connect to  e.g. I wanted to connect to my cdma dongle and hence I used

$ nmcli con up uuid b3c0f820-7521-4eb4-8507-8c931f44d691

This is cool but I wanted to switch the connection from the systray. I used Python appIndicator3  to create an appindicator and added a menu with the list of all the available connections.

from gi.repository import AppIndicator3 as appindicator

def getIndicator():                                                             
   indicator = (                                     
   indicator.set_status (appindicator.IndicatorStatus.ACTIVE)                   
   return indicator

Next was to register the appindcator to dbus notification interface.  This will ensure that we get notifications of network connect or disconnect,  whether from this appindicator or from outside.

from gi.repository import Gdk                                                   
import dbus                                                                     
from dbus.mainloop.glib import DBusGMainLoop 

def registerSignalHandler(ind):                                                 
        global indicator                                                        
        indicator= ind                                                          
        dbus_loop = DBusGMainLoop()                                                 
        system_bus = dbus.SystemBus(mainloop=dbus_loop)                             

 From here everything was downhill :)  I added a call back to check if the notification is a  connect or a disconnect.  You may get the active connection from

$ nmcli con status
NAME                      UUID                         launchpad          DEVICES    DEFAULT  VPN   MASTER-PATH                                 
Tata Indicom (Photon+) connection b3c0f820-7521-4eb4-8507-8c931f44d691   ttyUSB0    yes      no    --  

A screenshot of the systray icon.

It is a simple python script and is available  on launchpad.

Saturday, September 21, 2013

MySQL 5.7.2 features enhanced Multi-threaded slave which can be used to apply transactions in parallel even within a single database. Internal details of its working can be found in an earlier post. In this  post we will see how we can configure our replication slave to use this enhancement.

MySQL 5.7.2 has a new system variable  --slave-parallel-type which is dynamic. It can be set to the following values:

1. DATABASE  : (Default) Use the db partitioned MTS (1 worker per database)
2. LOGICAL_CLOCK:  Use logical clock based parallelization mode.

Apart from this the original option of --slave-parallel-workers=N is still valid and it sets that number of workers that we need to spawn. Also since the slave leverages the group of transactions that have committed in parallel on the slave, it makes sense to leave --binlog-max-flush-queue-time=0 which is the default value intact, on the master. This will ensure that the leader thread on the master flushes all the transactions queued in the FLUSH QUEUE of binlog group commit without getting timed out, thereby delivering maximum  parallelization on the slave.

Finally to summarize the steps to set up the enhanced MTS

1. start master with --binlog-max-flush-queue-time=0

1.a. Start slave server with --slave-parallel-type=LOGICAL_CLOCK --slave-parallel-workers=N

Or alternatively,

1.b Start the slave server normally. Change the MTS options dynamically using
the following

mysql: STOP SLAVE: --if the slave is running

A small Demo:

1. We created 5 tables in a single test database on master and used 5 clients to do inserts on them, in parallel.
2. The slave was configured as --slave-parallel-type="logical_clock"  and --slave-parallel-workers=5.
3. We let the slave replicate from the master and we checked the status of the workers by using
    performance schema tables for replication and show processlist command

Here is the sample output on the slave (click on the image to zoom)

When to use enhanced MTS

Since the slave uses the parallelization information from the master, it performs best when there are multiple clients on the master and there are multiple transactions committing at the same time. In case the master is underloaded, spawning multiple threads may not have effect on the slave performance, and may even lead to performance degradation.


This enhancement is available in MySQL 5.7.2 which can be downloaded from the MySQL download page. So try it out and let us know your valuable feedback.


Re-applying binary logs generated from highly concurrent master on the slave has always been an area of focus. It is important for various reasons. First, in real-time systems, it becomes extremely important for the slave to keep up with the master. This can only be guaranteed if the slaves’ performance in reapplying the transactions from the binary log is similar (or at-least comparable) to that of master, which is accepting queries directly from multiple clients. Second, in synchronous replication scenarios, having a fast slaves, aids in reducing the response times as seen by the clients to the master. This can be made possible by applying transactions from the binary log in parallel. However if left uncontrolled, a simple round-robin multi-threaded applying will lead to inconsistency and the slave will no longer be the exact replica of the leader.

The infamous out of order commit problem

The Out of order execution of transaction on the slave if left uncontrolled will lead to the slave diverging from the master. Here is an example: consider two transactions T1 and T2 being applied on an initial state.

On Master we apply T1 and T2 in that order.
State0: x= 1, y= 1
T1: { x:= Read(y);
          x:= x+1;
          Commit; }
State1: x= 2, y= 1

T2: { y:= Read(x);
          Commit; }
State2: x= 2, y= 3

On the slave however these two transactions commit out of order (Say T2 and then T1).
State0: x= 1, y= 1
T2: { y:= Read(x);
          y:= y+1;
          Commit; }
State1: x= 1, y= 2

T1: { x:= Read(y);
          Commit; }
State2: x= 3, y= 2

As we see above the final state state 2 is different in the two cases. Needless to say that we need to control the transactions that can execute in parallel.

Controlled parallelization

The above problem can be solved by controlling what transactions can be executed in parallel with the ones being executed by the slave. This means we need to have some kind of information in the transactions themselves. Interesting to note that we can use the information of parallelization from the master on the slave. Since we have multiple transactions committing at the same time on the master, we can store the information of the transactions that were in the "process of committing" when this transaction committed. Now let's define the phrase "process of committing".

The process of committing: On the slave we need to make sure that the transactions that we schedule for parallel execution will be the one which do not have conflicting read and write set. This is the only and the necessary requirement for the slave  workers to work without conflicts. This also implies that if the transactions being executed in parallel do not have intersecting read and write sets, we don't care if they are committed out of order. Since MySQL uses lock based scheduling, all the transactions that have entered the prepared stage but not as yet committed will have disjoint read and write sets and hence can be executed in parallel.

Logical clock and commit parent

We have introduced a logical clock. Now before I am tackled by a mathematician from one side and a computer engineer from the other, let me explain. It is a simple counter which is stepped when a binlog group of transaction commits on the master. Essentially this clock is stepped every time the leader execute the flush stage of binlog group commit. The value of this clock is recorded on each transaction when it enters the prepare stage. This recorded value is the "commit parent"

The pseudo code is as follows.

During Prepare
trx.commit_parent= commit_clock.get_timestamp();

During Commit
for every binlog group

As it is evident by now the transactions with the same commit parent follow our guiding principle of slave side parallelization i.e. transactions that have entered the prepared stage but has not as yet committed, and hence can be executed in parallel.

Schematics of inlog prepare stage and commit parent
In the example we will take up three transactions (T1 T2 T3), two of which have been committed as a part of the same binlog group. T1 enters the prepare stage and get the commit parent as 0 since none of the group have been committed as yet. T1 assigns itself as the leader and then goes on to flush its transaction/statement cache. In the meanwhile transaction T2 enters the prepare stage. It is also assigned the same commit parent "0"(CP as used in the figure) since the commit clock has not as yet been stepped. T2 then goes on a wait for the leader to flush its cache in to the binlog. After the flush has been completed by the leader, it signals T2 to continue and both of them enter the Sync stage, where the leader thread  calls fsync() there by finishing the binlog commit process. The  transaction T3 however enters the prepare stage after the previous group has been synced and there-by ends up getting the next CP.

Another thing to note here is that the "group" of transactions that are being executed in parallel are not bounded by binlog commit group. There is a possibility that a transaction have entered the binlog prepare stage but could not make it to the current binlog group. Our approach takes care of such cases and makes sure that we relax the boundary of the group being executed in parallel on the slave.

On the slave we use the existing infrastructure of DB partitioned MTS to execute the tranactions in parallel, simply by modifying the scheduling logic.


This feature provides the great enhancement to the existing MySQL replication. To know more about the configuration options of this enhancement refer to this post.
This feature is available in MySQL 5.7.2 release. You can try it out and let us know the feedback.

Monday, May 6, 2013


MySQL replication slave features  a powerful capability of ignoring conflicts like duplicate key error, key not found errors etc. while applying row events.  This is exceptionally useful while doing row based replication(RBR) from the master when the slave already contains some data which may conflict with the data coming from the master. In MySQL 5.7 we extend this capability while applying row events from mysql-binlog files using mysqlbinlog tool.  This enhancement will prevent such errors from aborting mysql client in case of conflicts like the ones mentioned above.


Prior to MySQL 5.7 we have been using the mysqlbinlog tool as follows.

shell$> mysqlbinlog master.000001|mysql -uroot -ps3cret  

This allows us to pipe the output of mysqlbinlog to mysql, and works as long as the row events from the mysqlbinlog do not conflict with the data already present on the mysql server on However in case of conflicts, the mysql client aborts and the we have to restart the whole process again skipping the event and starting from the next event using the start-position option.  This is a problem in case of multiple conflicts.

How to use

The enhancement is pretty simple to use. mysqlbinlog in MySQL 5.7.0 provides command line options to enable this feature.

--idempotent, -i 

To use this from the command line  simply use the short option

shell$> mysqlbinlog master.000001 -i | mysql -uroot -ps3cret  

or use full option
shell$> mysqlbinlog master.000001 --idempotent | mysql -uroot -ps3cret  

How does it work

In MySQL server version  5.7.1 we have introduced a new session system variable as rbr_exec_mode which can be set to STRICT or IDEMPOTENT. When set to IDEMPOTENT, the server does not throw out any conflict errors for that particular session.

When executed with -i or --idempotent  option the mysql binlog writes
to the output file or to the stdout. When piped into MySQL client this directs the server to use the idempotent mode while applying row events.

This enhancement is available in MySQL 5.7.1 which can be downloaded from the MySQL download page. So go try it out and let us know your valuable feedback using the comment section.

Monday, October 1, 2012

With MySQL 5.6 coming closer to its release, I got a chance to sneak a look into the Row Based Replication (RBR). 5.6 release of MySQL will be a replication features packed one, and it is for this reason I write  another post on the new enhancements in Row Based Replication(RBR). RBR provides a safe way of replicating between master and slave and thats why RBR enhancements become even more important. RBR in 5.6 is far more optimized than what it was before and we hope it will be fun to deploy, maintain and use.

Folowing are some interesting enhancements in RBR in MySQL 5.6 Replication.

1. Optimized row image:

Large data transfer between master and slave while using RBR is a thing of past. Why log complete rows on the master? Why not just the changed colums?  The MySQL 5.6 release enables a DBA to configure the master to log only the changed columns in the row.  The Before Image (BI) will contain the columns for identifying the rows and the After Image (AI) will just contain the column that has changed.

Before MySQL 5.6: 

As one can observe in the illustration that the row images were full, i.e. all the columns in the row were logged for both the Before Image(BI) and the After Image AI

In MySQL 5.6:

In MySQL 5.6 the DBA can configure if full or minimal or NOBLOB image of rows should be logged in the binary log in the following ways.

1. start the server with --binlog-row-image=MINIMAL|FULL|NOBLOB

In minimal image if there is a Primary Key (PK) or any kind of Unique Index/Key (UK) that can identify a row uniquely on the slave then the BI contains just that column. In the After Image (AI) only the changed columns are logged (See illustration below). 

In case of NOBLOB is used, the BLOB fields will be skipped from logging unless changed. In case  FULL is used the logging will continue as before.

As you can see the amount of data transfer will be reduced by a lot especially if you have used proper indexes/keys for the tables on the master and the slave. A detailed post of this enhancement be found here.

2. Batch operations on table without PK or UK:

I have already mentioned the details of this enhancement in a previous post. This is one of the long awaited optimization when using RBR. This enables the DBA to configure what algorithm will be used  on the slave to find the appropriate row for updating/deleting. We have introduced the concept of HASH_SCAN, in which an in-memory hash is being created on the slave, and provides a way to find and apply rows in a row event in O(n) time.  This enhancement provides a great scope for performance improvement on the slave for the events logged in row format.

3. Informational Log events: 

One of the difficulties DBA have faced till now, while using RBR is that, it is difficult to maintain. The Row images are binary in nature and is not suitable for human "consumption". This is where this enhancement comes handy. There are two ways of using this.

    1. Enabling switch --binlog-rows-query-log-events during the server startup
    2. Changing the corresponding session variable.

Once enabled this will cause the original query for the row event to be logged in the binary log as a special event which the DBA can check using one of the two  methods.

1. One can check these special events in the binary-log by using SHOW BINLOG EVENTS. The query corresponding the the row event will be logged as an event type: Row_query. below is one such example

| Log_name      | Pos | Event_type  | | End... | Info                                          |
| master.000001 |   4 | Format_desc |     1 |    114 | Server ver: 5.6.7-m5-debug-log, Binlog ver: 4 |
| master.000001 | 114 | Query       |     1 |    200 | use `test`; CREATE TABLE NUM_INTS (a INT)     |
| master.000001 | 200 | Query       |     1 |    268 | BEGIN                                         |
| master.000001 | 268 | Rows_query  |     1 |    323 | # INSERT INTO NUM_INTS VALUES (1), (2), (3)   |
| master.000001 | 323 | Table_map   |     1 |    364 | table_id: 54 (test.NUM_INTS)                  |
| master.000001 | 364 | Write_rows  |     1 |    408 | table_id: 54 flags: STMT_END_F                |
| master.000001 | 408 | Query       |     1 |    477 | COMMIT                                        |

In the above terminal output (snip-ed), one can see the event at pos= 268 is an informational event for the next Table_map and Write_rows events. 

2. Another way is to use the mysqlbinlog utility with  --vv switch (verbosity level 2 or more).

shell> mysqlbinlog -vv /master.000001
# at 268
#110401 14:24:29 server id 1  end_log_pos 323   Rows_query
# INSERT INTO t1 VALUES (1), (2), (3)
# at 323
#110401 14:24:29 server id 1  end_log_pos 364   Table_map: `test`.`NUM_INTS` mapped to number 54
# at 364
#110401 14:24:29 server id 1  end_log_pos 408   Write_rows: table id 54 flags:STMT_END_F


So we see how MySQL-5.6 Replication will provide a lot of features that enables a better, faster, reliable slave that is easy to maintain.  So why not just go try it out?