Tuesday, March 10, 2015

RUNNABLE thread - It just looks normal

It's been a very interesting day. I just solved a bug that we had been chasing for quite some time. Generally, we collect failed HTTP requests to certain web services into a dedicated Message Queue. We process the messages in parallel by submitting them to a fixed Thread Pool.
It worked successfully 99% of the time, but occasionally several threads started and unfortunately never accomplished their job.

We couldn't reproduce the issue and it could happen once a week at 5am :)

I couldn't find any exceptions in the server logs and the servers' state was just fine. Performance was not affected, and RAM & CPU usage were normal.
By reading the logs, I knew that the thread hung during an HTTP request, but I assumed that whenever there is no response the request returns with an appropriate HTTP status code (e.g. 500).

In order to drill down a bit, I wanted to take a look at the JVM of the specific process, and for that purpose I used jvisualvm. I couldn't identify anything special - memory state was fine, no deadlocks, no parked threads, etc. - so I decided to generate a Thread Dump.

The dump revealed a thread in a RUNNABLE state that was infinitely waiting for the socket reader to finish its job.
It turns out that even though you set your timeouts properly (connection and socket timeouts), a request can still hang indefinitely. Since we are using the Google HTTP client API to access several Google services, I couldn't get into the underlying connection manager to cancel the request, so I had to wrap the request in a Future<V> with a reasonable timeout that can force an abort of the hanging request and add the failed message back into the Message Queue.
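The idea looks roughly like this (a minimal sketch - the class name and the re-queueing hook are illustrative, not our actual code): submit the blocking call to an executor and bound it with Future.get(timeout):

```java
import java.util.concurrent.*;

public class TimeoutGuard {
    private static final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Run the request with a hard timeout; on timeout, interrupt the worker
    // so the caller can re-queue the failed message.
    static <V> V callWithTimeout(Callable<V> request, long timeoutMs) throws Exception {
        Future<V> future = pool.submit(request);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // best effort: interrupts the hanging thread
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            // simulate a request that hangs forever, like our stuck socket read
            callWithTimeout(() -> { Thread.sleep(60_000); return "response"; }, 200);
        } catch (TimeoutException e) {
            System.out.println("request aborted, re-queueing message");
        }
        pool.shutdownNow();
    }
}
```

Note that Future.cancel(true) only interrupts the worker thread; a blocked socket read does not always respond to interruption, so the worker may linger until a socket-level timeout fires - but the caller is released immediately and the message can safely go back into the Message Queue.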

Finally it's all stable again... now let's deal with the next challenge :)

Thursday, December 25, 2014

MySQL server - Lost connection to server during query

There are many reasons for such an error - that's why the MySQL reference manual has a dedicated subsection about it.
It happened to me just yesterday while trying to query a large table. 
The way I debugged it was by setting the following: 
1. Log level to 2:  --log-warnings=2 
2. Turn on the slow queries log: slow-query-log=1
Running the query again revealed the following error in the MySQL log file:
"...Got timeout writing communication packets"

Make sure that wait_timeout and interactive_timeout were not lowered in your MySQL configuration file - their default is 28800 seconds (8 hours), which is enough.
Then set the net_read_timeout and net_write_timeout parameters to 28800 seconds as well - their defaults (30 and 60 seconds respectively) are very low for such a query:
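These can be raised at runtime without a restart; a sketch, assuming (based on the "writing communication packets" message) that the net_* timeouts are the ones biting:

```sql
-- check the current values first
SHOW VARIABLES LIKE 'net_%timeout';

-- raise them for the long-running query (applies to new connections)
SET GLOBAL net_read_timeout = 28800;
SET GLOBAL net_write_timeout = 28800;
```

Add the same settings under the [mysqld] section of your configuration file so they survive a restart.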

Sunday, October 12, 2014

MySQL Cluster - Migrating a large InnoDB table to NDB Cluster

If this is the first time you are migrating a large InnoDB table into an NDB cluster, you will most probably run into some issues.
By nature, the two storage engines (InnoDB and NDBCluster) have many differences - it's highly recommended to read up on them first.
A lot depends on your MySQL cluster architecture, setup and the amount of data you are trying to migrate, but in general you will probably run into these common issues:
1. The table 'table_1' is full
2. Lock wait timeout exceeded; try restarting transaction

These errors can be related to many different configuration issues, but usually you need to consider setting the following important config.ini parameters:
1. DataMemory and IndexMemory -
The DataMemory value is in bytes and defines the space available to store database records. Note that the entire amount specified in your config file is allocated up front, so make sure you actually have the required space.
The IndexMemory value is in bytes as well; if your tables have indexes, you will need to define this parameter. It controls the amount of storage used for MySQL Cluster hash indexes, and the right value depends on your cluster configuration. Say you have 4 fragments (nodes), 2 replicas and 1M rows; you can estimate the size using the following formula (see the MySQL documentation):

size  = ( (fragments * 32K) + (rows * 18) ) * replicas
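Plugging the numbers above into that formula as a quick sanity check (this is only the documented rule-of-thumb estimate, not an exact sizing):

```java
public class IndexMemoryEstimate {
    public static void main(String[] args) {
        long fragments = 4;         // number of fragments
        long replicas  = 2;         // NoOfReplicas
        long rows      = 1_000_000; // row count

        // size = ((fragments * 32K) + (rows * 18)) * replicas
        long bytes = ((fragments * 32 * 1024) + (rows * 18)) * replicas;

        System.out.println(bytes + " bytes"); // 36262144 bytes, i.e. roughly 35 MB
    }
}
```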

2. The lock wait timeout error can be caused by many different configuration issues, but usually it's related to the TransactionDeadlockDetectionTimeout parameter.
Setting that timeout will require you to set the MaxBufferedEpochs parameter as well.
NOTE: while setting TransactionDeadlockDetectionTimeout, it's highly recommended to make sure that your mysqld innodb_lock_wait_timeout is set to a matching value.
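For reference, a minimal [ndbd default] sketch with the parameters discussed in this post (the values are placeholders to adapt to your data size, not recommendations):

```ini
[ndbd default]
NoOfReplicas=2
# Pre-allocated space for records; the whole amount is reserved up front
DataMemory=4G
# Space for hash indexes (primary/unique keys)
IndexMemory=512M
# Milliseconds before a transaction is considered deadlocked (default: 1200)
TransactionDeadlockDetectionTimeout=10000
MaxBufferedEpochs=200
```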

I suggest reading the following blog post as well:

Finally, don't forget to delete the management server's cached configuration binary (the ndb_*_config.bin.* file in its configuration cache directory) each time you make changes to your config.ini, or start ndb_mgmd with --reload so it re-reads the file.

Good luck

Friday, May 10, 2013

Moving your development team to GIT from SVN

About a year ago I had to move our development team completely off of SVN. This isn't an easy task, since GIT is conceptually different from SVN. GIT outclasses other SCM tools (SVN, CVS and friends), which are all built around the same centralized version control concept - I strongly suggest you watch Linus Torvalds' talk on YouTube: "Tech Talk: Linus Torvalds on git".

Why did we move completely off of SVN?

GIT is a distributed SCM, and this allowed us to manage all of our code integration efficiently - each developer could commit his changes to his own repository and we could integrate their code into one centralized repository. All the advanced workflows of feature branching and merging become possible, while in SVN they are very difficult to perform. Usually you keep two main branches: a production branch where you maintain your production code, and a development branch which is your day-to-day place to write new features, fix bugs, etc. Merging your maintenance code from your production branch into your development branch isn't an easy task in SVN; GIT makes it a lot easier.

One major difference is that in GIT nearly all operations are done locally - for example, you can still commit your changes while you are not connected to the VPN, or check your project history while you are offline.
There are many other benefits to using GIT: it only adds data, it saves storage when you are working with many branches, it's fast, folders and files are checksummed, and a lot more. I'm not going to go through them here; there are plenty of great docs in the official GIT documentation.

What do you need to remember before you migrate to GIT?
There are three main states that your working copy files can reside in:
1. Committed - files are stored in the GIT repository.
2. Modified - working copy files have been modified.
3. Staged - the modified files have been marked and will go into your next commit.
GIT keeps track of your file system by taking a snapshot of it each time you commit and storing a reference to that snapshot. GIT is efficient and only adds data: if a file hasn't changed, it just stores a link to the previous identical file version already stored.
The general workflow is that you first pull a working copy, modify files, add a snapshot of them to your staging area, and then commit your staging area.
It's very important to understand the GIT basics before you move; I strongly suggest reading the official getting-started documentation first.

Migration steps:

We had one centralized SVN repository server installed and stored on a CentOS Linux server. Developers could access the SVN repository through HTTP (Apache).  We didn't have a complex SVN repository hierarchy - just trunk and tags. So the first step was to install a GIT bare repository and configure our Apache server to serve GIT requests:
1. Download and install GIT
2. Create a dedicated GIT repositories folder, i.e. /var/www/repositories:
cd /var/www/repositories
git init --bare TestRepository.git (this will create a new bare GIT repository called: TestRepository)
3. There are many ways to serve GIT (SSH, git daemon, HTTP, and more). We used git-http-backend (see its documentation for details).
Add a new VirtualHost node into your Apache vhosts configuration file, for example:
<VirtualHost localhost:80>
       ServerAdmin webmaster@localhost
       DocumentRoot /var/www/repositories/TestRepository.git
       ServerName localhost
       ErrorLog "logs/localhost-error.log"
       CustomLog "logs/localhost-access.log" common
       SetEnv GIT_PROJECT_ROOT /var/www/repositories/TestRepository.git
       ScriptAlias /TestRepository.git $PATH-TO-YOUR-GIT-INSTALLED-DIR$/git-http-backend
       <Directory /var/www/repositories/TestRepository.git>
               Options Includes Indexes FollowSymLinks ExecCGI
               AllowOverride All
               Order allow,deny
               Allow from all
       </Directory>
</VirtualHost>

4. Add a Location node to your httpd.conf file:
<Location /TestRepository.git>
     AuthType Basic
     AuthName "Git"
     AuthUserFile auth-users
     Require valid-user
</Location>

NOTE: use the same AuthUserFile you used to authenticate the SVN repository.
5. Restart Apache
6. Validate that it's possible to clone your TestRepository project:
cd /home/eran
git clone http://localhost/TestRepository.git
cd /home/eran/TestRepository

git config --global user.name "Eran Levy"
git config --global user.email eran@localhost
git config remote.upload.url http://eran@localhost/TestRepository.git
echo test > testfile.txt
git add testfile.txt
git commit -m "test commit"
git push origin master

If you didn't get any errors, you are ready to start the migration.
It's a bit tricky now; there are a few things you have to consider. Since SVN tags and branches are different from GIT's, there isn't a simple way to migrate them into your GIT repository. There are several ways you can go:

7. As written above, we moved the latest production and development trunk into GIT. In order to do so, we used git svn clone - you can read more in the git-svn documentation.

8. The developers were using the Eclipse IDE installed on a Windows client. Each one of them had to perform the following:

Please read the getting started documentation in order to tune the GIT configuration for your needs.

There are many other things to consider: how do you handle your branches, are you going to merge or rebase, what's the main workflow that your development team is going to work in, etc - it's strongly suggested to read the GIT documentation and make decisions. Try not to stick to your "SVN Workflows" and be open minded to use the "power of GIT".

Good luck,

Thursday, May 9, 2013

Windows Virtual Memory

A great series of articles written by Mark Russinovich - Pushing the limits of Windows. 
One of his articles - "Pushing the Limits of Windows: Virtual Memory" - is a really interesting article describing process address spaces, commit limits and page file settings. It's much appreciated and I would like to mention his article here, so please check it out:

Thursday, August 16, 2012

Configuration management with Mercurial SCM

The increase of globalization has led more organizations to provide support everywhere and deploy anywhere in the world. From my experience providing deployment services, I've found that companies can waste a lot of time and money trying to maintain the software installed at a customer site.
If you have done that before, you must remember the minute you sat at your desk trying to remember which configuration files had been changed or which files had been updated the last time you were there.
It doesn't matter what your role in the organization is: as soon as you walk in the doors at the customer's office, you are the face of the organization and the customer trusts you. Trust is crucial in relationships - but that is for another discussion.
It's not your fault - just two hours before your flight back home the development team fixed a high-severity bug, and you couldn't leave your customer with a faulty system, so you just did what had to be done: replaced one file or modified another, tested it and ran to the wrap-up meeting.

An efficient management tool and a simple workflow can help organizations complete deployment activities faster, no matter what their size is.
Let's get to the point: I suggest using Mercurial for that purpose (of course you can use others...).

Mercurial, as stated on its website, is a free, distributed source control management tool. It handles projects of any size, every clone contains the whole project history, and almost all actions are local.
You can download the Mercurial installer from their website; binary packages are available for almost all platforms (Windows, Linux and others).
Just double-click the exe file to set up Mercurial and add the main Mercurial folder path to the "PATH" environment variable.
I suggest downloading TortoiseHg (available for non-Windows platforms as well), which lets you interact with your Mercurial project through a friendly GUI rather than the command line.
Right after everything is installed successfully, perform the following:
1.     "hg init" in the software folder to initiate the repository.
2.     "hg add" to add all files to the local repository.
Note that Mercurial saves a local copy of tracked changes in a hidden folder (.hg) inside the project folder. If there are any folders, large files or log files whose changes you don't want to track, use the .hgignore file (see the Mercurial documentation).
3.     "hg commit -m <message>" to commit all added files (except the files/folders stated in the .hgignore file). Note that the latest revision is tagged as "tip".
4.     Tags add a name to a revision and are part of the history.
Mark release changes by using a tag ("hg tag -r 1 version1.4").
5.     For further information, read the Mercurial guide.

That's it! From now on, you can track what has been changed and which files have been replaced. The next time you need to make a change, just read the repository logs to see what changed.

Good Luck