Scala – Functional vs Procedural quick sort performance comparison

Scala is becoming dominant in many scalable, distributed frameworks \ toolsets. Kafka, Spark and Akka are developed in Scala. I recently had to use both Kafka and Spark, and understanding Scala makes it easier to debug the framework code, so I started learning Scala. I am reading Scala By Example.

One of the first examples is quick sort. There the author, Martin Odersky, describes the imperative (I call it procedural) way vs the functional way. Martin explains: “But where the imperative implementation operates in place by modifying the argument array, the functional implementation returns a new sorted array and leaves the argument array unchanged. The functional implementation thus requires more transient memory than the imperative one.”
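To make the difference concrete, here is a minimal sketch of the two styles. It is written in Java rather than Scala and is not the code from the book or from my repository; the class and method names are made up for illustration. The in-place version only swaps elements inside the argument array, while the functional version builds three new arrays at every level of recursion.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

final class QuickSortSketch {

    // Imperative style: sorts the argument array in place by swapping elements.
    static void sortInPlace(int[] xs) {
        sortInPlace(xs, 0, xs.length - 1);
    }

    private static void sortInPlace(int[] xs, int lo, int hi) {
        if (lo >= hi) return;
        int pivot = xs[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j) {
            while (xs[i] < pivot) i++;
            while (xs[j] > pivot) j--;
            if (i <= j) {
                int tmp = xs[i];
                xs[i] = xs[j];
                xs[j] = tmp;
                i++;
                j--;
            }
        }
        sortInPlace(xs, lo, j);
        sortInPlace(xs, i, hi);
    }

    // Functional style: the argument is left unchanged and a new array is returned.
    // Every call allocates three filtered copies, which is the transient memory cost.
    static int[] sortFunctional(int[] xs) {
        if (xs.length <= 1) return xs.clone();
        int pivot = xs[xs.length / 2];
        int[] smaller = Arrays.stream(xs).filter(x -> x < pivot).toArray();
        int[] equal = Arrays.stream(xs).filter(x -> x == pivot).toArray();
        int[] larger = Arrays.stream(xs).filter(x -> x > pivot).toArray();
        return IntStream.concat(
                IntStream.concat(Arrays.stream(sortFunctional(smaller)), Arrays.stream(equal)),
                Arrays.stream(sortFunctional(larger))).toArray();
    }
}
```

Calling sortFunctional(input) leaves input untouched, whereas sortInPlace(input) overwrites it.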

So I wanted to find out how performance and memory usage compare between the 2 approaches. The code is checked in at https://github.com/shrikantpatel/scala_learning/blob/master/src/QuickSortComparision.scala

The output for arraySize (10, 100, 1000, 10000, 100000, 1000000, 50000000) is

**********************************
array size – 10
Functional Sort Time : 64
Procedural Sort Time : 0
**********************************
array size – 100
Functional Sort Time : 13
Procedural Sort Time : 0
**********************************
array size – 1000
Functional Sort Time : 59
Procedural Sort Time : 1
**********************************
array size – 10000
Functional Sort Time : 75
Procedural Sort Time : 2
**********************************
array size – 100000
Functional Sort Time : 344
Procedural Sort Time : 12
**********************************
array size – 1000000
Functional Sort Time : 2159
Procedural Sort Time : 104
**********************************
array size – 50000000
Functional Sort Time : 112694
Procedural Sort Time : 6390

Below is a jconsole snapshot for the final run with an array size of 50000000. The red highlighted point indicates where the functional sort completes and the procedural sort starts. (I put a readLine between the 2 executions, so I wait a few seconds before the 2nd execution.) As is obvious, memory usage is about 1.5 GB in the functional style vs 0.3 GB in the procedural style. CPU also hovers around 25% for the entire duration of the functional sort.

I am still learning Scala. I believe the author’s reason for putting this at the beginning is to caution developers to be mindful of where to use the procedural \ object oriented way and where to use the functional way.

QuickSortScala_Comparision

SSL Part 3 – HTTPS – Communication Type

In Part 1 we covered the basics of SSL, PKI and CA. In Part 2 we covered the truststore and the identity store. I classify HTTPS \ SSL communication into 2 sub types.

1. 1 way SSL

2. 2 way SSL

These are not 2 official types, but I call them as such. Protocol wise there are only minor variations, but from a setup perspective there is a lot of difference between the 2.

 


1 Way SSL

This is most commonly used for consumer to business communication. When I open my bank’s website to log in to my account, it is 1 way SSL.

In this case we (the client) want to validate the server’s certificate, to make sure we are talking to the right server \ entity. The server verifies the client’s identity using a username and password.

In this case, the client has to add the server’s certificate or the signing CA (the CA that signed the server’s certificate) to its truststore. In the banking example, most banks have a cert signed by a well known CA, and the certificates for these CAs are already in the browser.


2 way SSL 

This is mostly used for server to server communication.

Let’s say entity 1 is the server and entity 2 is the client. When client entity 2 connects to server entity 1, the server provides its certificate to the client, but also asks \ challenges the client to provide its certificate. So the client certificate, instead of a username \ password, is used to authenticate the client’s identity. In this case the client also has to have a certificate for itself.

So the client has to include the server’s certificate or signing CA in its truststore. Similarly, the server needs to have the client’s certificate or signing CA in its truststore.
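On the Java server side, the switch between the two sub types comes down to whether the server demands a client certificate. The sketch below is only an illustration under some assumptions: it uses the JVM default SSLContext, while a real server would initialise its own SSLContext with key managers (from its identity store) and trust managers (from its truststore). The class and method names are made up.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

final class ClientAuthSketch {

    // Builds a server-side SSLEngine. With twoWay = true the handshake fails
    // unless the client presents a certificate the server trusts (2 way SSL).
    // With twoWay = false only the server presents a certificate (1 way SSL).
    static SSLEngine serverEngine(boolean twoWay) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        engine.setUseClientMode(false);   // act as the server side of the handshake
        engine.setNeedClientAuth(twoWay); // demand a client certificate or not
        return engine;
    }
}
```

The same setNeedClientAuth switch exists on SSLServerSocket and SSLParameters.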


So when debugging an SSL issue you first need to understand whether your communication is 1 way SSL or 2 way SSL. Based on the sub type, check that the right certificates are in the right keystore files.

If I have a customer facing website I cannot expect everyone to have a certificate generated by Verisign to validate their identity, so I would use 1 way SSL there. If it is a back end service I can use either username \ password or 2 way SSL for authenticating the client’s identity.

SSL Part 2 – HTTPS – What is java truststore and identity store

In Part 1 we covered the basics of SSL, PKI and CA. Before we move to Part 3, it is important to understand the difference between the trust store and the identity store. This discussion is specific to how SSL works in Java. From my little background in .NET the concepts are the same there, but the terms could be a little different.

In Java, certificates are stored in a file called a Java Key Store (JKS) file. This has the extension .jks. I am using the Keystore Explorer utility to show you the internals of a jks file. You can also use keytool, which comes with Java, but it outputs text and is hard to visualize.


Trust Store 

This jks stores the public certificates of all the entities that we trust. In Java, under the JDK or JRE install you will find the trust store located at \lib\security\cacerts (this is a jks file even though it does not have the extension). The default password for it is “changeit”. It will ask for this when you try to open it.

 part 1 pic 1

 


Identity Store 

Some call this the keystore. This is the jks file that stores my entity’s public certificate along with its private key. This file does not come with Java out of the box. (You can create one using either keytool or Keystore Explorer.) Typically you will have 1, or in some cases a handful, of entries in this file.

part 1 pic 2

As you will see in this snapshot, this entry looks different from the entries in the truststore. The trust store has all entries with a certificate symbol beside them, while in the identity store you will see a pair of keys and a lock beside the entry. The key pair indicates that the entry has both a private and a public key, and the lock symbol indicates that the private key is locked using a passphrase.
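The same distinction is visible from code through the standard java.security.KeyStore API. The sketch below is a minimal illustration; the class name, method name and the labels it prints are made up. It lists each alias and whether it is a trusted certificate entry (as in the trust store) or a key entry (as in the identity store).

```java
import java.security.KeyStore;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class KeyStoreEntriesSketch {

    // Returns one line per alias, saying which kind of entry it is.
    static List<String> describe(KeyStore store) throws Exception {
        List<String> lines = new ArrayList<>();
        for (String alias : Collections.list(store.aliases())) {
            String kind;
            if (store.isKeyEntry(alias)) {
                kind = "key entry (identity store style)";
            } else if (store.isCertificateEntry(alias)) {
                kind = "trusted certificate (trust store style)";
            } else {
                kind = "other";
            }
            lines.add(alias + ": " + kind);
        }
        return lines;
    }
}
```

To try it on a real file, load the store first with KeyStore.getInstance("JKS") followed by load(inputStream, password), then pass it to describe.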

SSL Part 1 – Basic of SSL, PKI and HTTPS

This is very basic. It will help in case you are just starting with SSL. I have kept things simple to make them easy to understand. For folks seeking more in depth details, I will cover each area in more detail in follow up blogs.

 

SSL \ TLS is a protocol for communicating securely between 2 systems \ computers. It runs on top of the transport layer (TCP), below application protocols such as HTTP. The 2 systems can be client and server, or server and server. SSL uses PKI (public key infrastructure).

PKI uses the concept of a private key and a public key. As the name suggests, the private key is private and kept securely with the entity (either server or client) it belongs to. Let’s call this entity 1. Entity 1 distributes its public key to everyone. The private key and public key are a unique pair, generated by a complex mathematical algorithm, so it is computationally infeasible to derive the private key from the public key.

If you encrypt a message using the public key, it can only be decrypted with the private key, and vice versa.

So when entity 2 wants to send a secure message to entity 1, entity 2 uses entity 1’s public key (which is publicly available) to encrypt the message. Only entity 1, which has the private key, can decrypt the message.

Conversely, entity 1 can sign a message with its private key, so that when entity 2 receives it, entity 2 can use entity 1’s public key to validate that the message is indeed coming from entity 1.
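Both directions can be tried out with the standard Java crypto classes. This is a minimal sketch under my own assumptions: RSA with a 2048 bit key, OAEP padding for encryption and SHA256withRSA for signing are choices I made for the example, and the class and method names are made up.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import javax.crypto.Cipher;

final class KeyPairSketch {

    private static final String CIPHER = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    // Entity 1 generates its private \ public key pair.
    static KeyPair newKeyPair() throws Exception {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
        generator.initialize(2048);
        return generator.generateKeyPair();
    }

    // Entity 2 encrypts with entity 1's public key.
    static byte[] encrypt(PublicKey publicKey, byte[] message) throws Exception {
        Cipher cipher = Cipher.getInstance(CIPHER);
        cipher.init(Cipher.ENCRYPT_MODE, publicKey);
        return cipher.doFinal(message);
    }

    // Only entity 1, holding the private key, can decrypt.
    static byte[] decrypt(PrivateKey privateKey, byte[] encrypted) throws Exception {
        Cipher cipher = Cipher.getInstance(CIPHER);
        cipher.init(Cipher.DECRYPT_MODE, privateKey);
        return cipher.doFinal(encrypted);
    }

    // Entity 1 signs with its private key.
    static byte[] sign(PrivateKey privateKey, byte[] message) throws Exception {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(privateKey);
        signer.update(message);
        return signer.sign();
    }

    // Anyone with entity 1's public key can check the signature.
    static boolean verify(PublicKey publicKey, byte[] message, byte[] signature) throws Exception {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(publicKey);
        verifier.update(message);
        return verifier.verify(signature);
    }
}
```

A signature made by a different private key, or over a different message, fails the verify call.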

So this is simple.

 

 

But there is a loophole in this: public keys are exchanged over the internet. Communication between 2 systems passes through a lot of intermediate entities (routers, other servers), so how do we know for sure that the public key we received for entity 1 is truly entity 1’s public key? There could be an intermediate entity 3 which represents itself as entity 1 and provides its own public key as entity 1’s public key. This is a man in the middle (MITM) attack. (A more detailed explanation is at http://en.wikipedia.org/wiki/Man-in-the-middle_attack)

So we require a neutral third party that can be trusted on the internet. Those entities are called Certificate Authorities (CAs). Verisign, Geotrust, Comodo etc are some of the CAs out there. CAs exist to resolve the MITM issue.

All CAs have certs that are publicly available and distributed with software installations. There is a list of CAs already included in all browsers and Java installations. If you are interested you can google how to look at the CAs in your browser and in Java. This list is called the trust store: the store of all the Cert Authorities that the software or entity trusts.

So what is the difference between a cert and a public key? In simple terms, a certificate is nothing but an entity’s public key which is signed by a CA and has some additional information regarding the usage of that certificate. The certificate has information on which CA signed it and the CA’s own public certificate or key. It also has information on when the certificate was issued and until when it is valid.

 

 

Now coming back to the MITM attack. Instead of distributing the public key, we take entity 1’s public key and send it to a CA like Verisign to get a public certificate, and we distribute that public certificate instead of the public key. So Verisign is vouching that this public certificate belongs only to entity 1.

So when entity 2 receives entity 1’s public certificate it does the following:

1. Get the entity’s CA from the certificate.

2. Validate that the CA is present in the Trust Store.

3. Validate that the certificate is valid (not expired etc; there are a few more things it will check, but I am not going into details at this point).

If the presented certificate is valid, it initiates the communication.

In this case the MITM attack is avoided because a CA will (almost certainly) never issue a certificate to entity 3 saying it is entity 1.
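The three steps above can be sketched with the standard Java certificate classes. This is a simplified illustration only: real SSL libraries build and check a whole certificate chain, check revocation, host names and more. The class and method names are made up.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.Date;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

final class CertCheckSketch {

    // Loads the CA certificates from the JVM's default trust store.
    static X509Certificate[] defaultTrustedCas() throws Exception {
        TrustManagerFactory factory =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        factory.init((KeyStore) null); // null means: use the default trust store
        for (TrustManager manager : factory.getTrustManagers()) {
            if (manager instanceof X509TrustManager) {
                return ((X509TrustManager) manager).getAcceptedIssuers();
            }
        }
        return new X509Certificate[0];
    }

    // Simplified check for a certificate issued directly by a trusted CA.
    static boolean looksTrusted(X509Certificate cert, X509Certificate[] trustedCas) {
        for (X509Certificate ca : trustedCas) {
            // Steps 1 and 2: the CA named in the certificate must be in the trust store.
            if (!ca.getSubjectX500Principal().equals(cert.getIssuerX500Principal())) {
                continue;
            }
            try {
                cert.verify(ca.getPublicKey()); // the signature was really made by that CA
                cert.checkValidity(new Date()); // step 3: not expired, not yet-to-be-valid
                return true;
            } catch (Exception e) {
                // Wrong key, bad signature or expired: try the next CA with the same name.
            }
        }
        return false;
    }
}
```

With an empty trust store nothing is trusted, which is exactly why entity 3's self-made certificate gets rejected.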

 

HTTPS is nothing but the HTTP protocol using the SSL protocol to secure HTTP communication between 2 systems.

 

Different network exception and their causes

In client server communication, one will typically get a subclass of IOException if there is a connection issue between the client and the server. Based on the exact exception one can determine the possible causes.

A Java application gets java.net.UnknownHostException: spatel for the following reasons:

  1. The DNS lookup cannot be performed because of a network issue.
  2. There is some issue with the DNS server itself.
  3. The host’s IP address cannot be determined by DNS.

(The server url in this case was https://spatel:443/testapp)

A Java application gets java.net.NoRouteToHostException: No route to host: connect for the following reasons:

  1. The client is not able to reach the server because either the server or an intermediate router is down.
  2. There is some other sort of network issue.

A Java application gets java.net.ConnectException: Connection refused: connect for the following reasons:

  1. The client is able to ping the server, but the said service is not running on the specified port.
  2. The server may be down or under maintenance.

A Java application gets java.net.SocketException: Connection reset for the following reasons:

  1. The server terminates the connection abruptly instead of sending the complete response and then properly terminating the connection.
  2. The server goes down while processing the request.

A Java application gets java.net.SocketTimeoutException: Read timed out for the following reasons:

  1. The client has specified a read timeout or socket timeout on the connection and the server does not respond before this timeout occurs.
  2. This may indicate that the socket \ read timeout on the client is too short, or that the server is not responding as per the agreed SLA.
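The mapping above can be put into a small helper for logging. This is only a sketch: the class name, method name and message texts are made up, and the messages just summarise the causes listed above. The order of the checks matters because ConnectException and NoRouteToHostException are subclasses of SocketException.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.NoRouteToHostException;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

final class NetworkErrorSketch {

    // Returns a short hint about the likely cause of a connection failure.
    static String likelyCause(IOException e) {
        if (e instanceof UnknownHostException) {
            return "DNS could not resolve the host name";
        }
        // The two SocketException subclasses must be checked before SocketException itself.
        if (e instanceof NoRouteToHostException) {
            return "server or an intermediate router is unreachable";
        }
        if (e instanceof ConnectException) {
            return "host reachable but nothing is listening on the port";
        }
        if (e instanceof SocketTimeoutException) {
            return "server did not respond within the client timeout";
        }
        if (e instanceof SocketException) {
            return "connection was dropped, for example reset by the server";
        }
        return "other I/O failure";
    }
}
```

A catch (IOException e) block around the client call can then log likelyCause(e) next to the stack trace.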

enq: TX – row lock contention. Deadlock – When parent and child transaction try to update the same row.

In any system, careful attention should be paid to transaction boundaries, and also to transaction isolation and transaction scope. Otherwise we may shoot ourselves in the foot.

We have a service which updates a row. This service calls another service, which in turn calls another service, and so on. There is a long chain of service calls. Somewhere in the chain one of the services starts a new transaction and tries to update the same record, which causes the 2 transactions to create row lock contention.

This is a simplified version of a real life example.

Service Class

@Override
@Transactional (readOnly = false, propagation=Propagation.REQUIRES_NEW)
public void parentUpdatePassword (String password) throws BackingStoreException {
    updatePassword("shri", password+1);
    userDAO.getSessionFactory().getCurrentSession().flush(); // this causes the update to go to the database, but it does not commit.
    userService1.updatePassword("shri", password+2); // this method is in the Service 1 class, which is below
}

Service 1 Class

@Override
@Transactional (propagation=Propagation.REQUIRES_NEW)
public void updatePassword (String login, String password) {
    User user = loadUserByLogin(login);
    user.setPassword(password);
    updateUser(user);
}

In the above case, transaction T1 starts when parentUpdatePassword() is called. It updates the password, and as soon as flush is called it sends this update to the database, which locks the row in the database. Since transaction T1 is not finished, it does not commit and does not release the lock on the row. This method then calls userService1.updatePassword(), which starts a new transaction T2. T2 also tries to update the same record, but it has to wait for T1 to complete. But T1 cannot complete until T2 completes, so it is a sort of deadlock. The call to parentUpdatePassword() never terminates. That particular row in the user table is locked forever, unless we kill the method execution or kill the session from the database.
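The shape of the problem can be reproduced without a database. In the sketch below, which is an analogy and not Hibernate or Oracle code, a Semaphore with one permit stands in for the row lock. I use a timeout in the child so the example returns instead of hanging forever. All names are made up.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class RowLockSketch {

    // One permit stands in for the database row lock. A Semaphore is used because,
    // like a row lock held by another transaction, it cannot be re-acquired by the
    // same thread while it is still held.
    private final Semaphore rowLock = new Semaphore(1);

    // T1: takes the row lock (update + flush), then calls the child before "committing".
    boolean parentUpdate(long waitMillis) throws InterruptedException {
        rowLock.acquire();
        try {
            return childUpdate(waitMillis); // T1 cannot commit until this returns
        } finally {
            rowLock.release(); // T1 "commits" and releases the row
        }
    }

    // T2: wants the same row. Returns false if it could not get the lock in time.
    boolean childUpdate(long waitMillis) throws InterruptedException {
        if (!rowLock.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            return false; // in the real system there is no timeout, so this waits forever
        }
        try {
            return true;
        } finally {
            rowLock.release();
        }
    }
}
```

Calling childUpdate on its own succeeds, while calling it through parentUpdate always times out, which mirrors T2 waiting on T1.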

At this point, if we look at the sessions in Oracle with: select lockwait, event, program, type from v$Session

You will see a record like this – AFA025DC | enq: TX – row lock contention | JDBC Thin Client | USER

If we remove flush() from the method parentUpdatePassword(), the row contention does not happen. First, I do not recommend calling Hibernate flush manually; Hibernate will automatically flush before committing the transaction. (There are a few rare scenarios where the use of flush() is unavoidable because of the way Hibernate orders inserts, updates and deletes when the transaction commits. I will cover that in a separate blog.)

The more fundamental problem is creating a new child transaction and trying to update the same record in it; we should not be doing that. Even if removing flush resolves the row contention, the negative effect is that the update we intended to make in child transaction T2 will be overwritten when parent transaction T1 commits. So the password will always be password + 1.

There are valid reasons for creating a new transaction, but in this case analysis revealed that we did not require one.

In conclusion, be extra cautious whenever creating a new transaction. Keep in mind there is overhead associated with creating a new transaction. Always pay attention to transaction boundaries.

Optimizing inserts \ updates in hibernate

Optimizing insert \ update

(the previous article was about optimizing fetches)

Let’s say we have a big table, or a small one with a big clob \ blob column. We are trying to perform a big set of inserts or updates on this table. For simplicity this example will look like:

// begin transaction

for (int i = 0; i < 100000; i++) {
    TestTable tt = new TestTable();
    tt.setValue(t);
    tt.setCharStorage(charStorage);  // big clob, say we are storing some file
    //tt.setId(Long.valueOf(i)); // using hibernate sequence so not required
    session.save(tt); // the saved object stays in the session cache until the transaction ends
}

//end transaction

In this case the memory footprint of the application is huge:

Capture1

As seen in jconsole, my process memory jumped from 5 MB initially to 53 MB. The reason for this is that Hibernate caches all the newly created TestTable objects in the session cache. Once the transaction completes, the memory usage comes down. This has the potential of causing an out of memory error.

We can avoid this by using batch inserts and updates. We would use the code below:

// begin transaction

for (int i = 0; i < 100000; i++) {
    TestTable tt = new TestTable();
    tt.setValue(t);
    tt.setCharStorage(charStorage);  // big clob, say we are storing some file
    //tt.setId(Long.valueOf(i)); // using hibernate sequence so not required
    session.save(tt);

    if ( i%50 == 0 ) { //50 is our batch size.
        session.flush(); // send the pending inserts to the database
        session.clear(); // drop the saved objects from the session cache
    }
} // end of for loop

//end transaction

Now that this code is in place, the memory usage hardly increases.

Capture3
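Why the flush and clear help can be shown with a toy model. The sketch below is not Hibernate code: a plain list stands in for the session cache, and the class and method names are made up. It reports the largest number of objects held at any one time.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchFlushSketch {

    // Simulates saving `total` objects and returns the peak size of the cache.
    // A batchSize of 0 means "never flush and clear" (the first version of the loop).
    static int peakCached(int total, int batchSize) {
        List<Object> cache = new ArrayList<>(); // stand-in for the Hibernate session cache
        int peak = 0;
        for (int i = 0; i < total; i++) {
            cache.add(new Object()); // stand-in for session.save(tt)
            peak = Math.max(peak, cache.size());
            if (batchSize > 0 && (i + 1) % batchSize == 0) {
                cache.clear(); // stand-in for session.flush() followed by session.clear()
            }
        }
        return peak;
    }
}
```

Without clearing, the peak equals the total number of saved objects. With a batch size of 50 it never goes above 50, whatever the total.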

If I were not using a sequence generator for my primary key, I would also have used the Hibernate property

<property name="hibernate.jdbc.batch_size">50</property>

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.

This same approach works for updates as well.

Link to hibernate documentation on batching. http://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/batch.html