Optimizing inserts / updates in Hibernate

Optimizing inserts / updates

(the previous article was about optimizing fetches)

Let's say we have a large table, or a small one with a big CLOB/BLOB column, and we are trying to perform a large set of inserts or updates on it. For simplicity, the example looks like this:

// begin transaction
for (int i = 0; i < 100000; i++) {
    TestTable tt = new TestTable();
    tt.setValue(i);
    tt.setCharStorage(charStorage);  // big CLOB, say we are storing some file
    //tt.setId(Long.valueOf(i)); // not required, using a Hibernate sequence generator
    session.save(tt);
}
// end transaction

In this case the memory footprint of the application is huge:

[Screenshot: jconsole memory graph during the unbatched run]

As seen in jconsole, my process memory jumped from an initial 5 MB to 53 MB. The reason is that Hibernate caches all the newly created TestTable objects in the session cache. Once the transaction completes, the memory usage comes back down, but in the meantime this has the potential to cause an out-of-memory error.

We can avoid this by batching the inserts and updates, using the code below:

// begin transaction
for (int i = 0; i < 100000; i++) {
    TestTable tt = new TestTable();
    tt.setValue(i);
    tt.setCharStorage(charStorage);  // big CLOB, say we are storing some file
    //tt.setId(Long.valueOf(i)); // not required, using a Hibernate sequence generator
    session.save(tt);

    if ((i + 1) % 50 == 0) { // 50 is our batch size
        session.flush();  // push the pending inserts to the database
        session.clear();  // detach the saved objects so the session cache stays small
    }
} // end of for loop
// end transaction

Now that this code is in place, the memory usage hardly increases.

[Screenshot: jconsole memory graph during the batched run]

Since I am using a sequence generator for my primary key, I can also enable JDBC-level batching with the Hibernate property

<property name="hibernate.jdbc.batch_size">50</property>

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
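For completeness, a typical configuration fragment might combine the batch size with insert/update ordering, which lets the JDBC driver group statements against the same table (these are standard Hibernate settings; the values are illustrative):

```xml
<property name="hibernate.jdbc.batch_size">50</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>
```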

This same approach works for updates as well.
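For updates, the flush/clear rhythm is the same; a sketch along the lines of the Hibernate batch-processing documentation, scrolling through the results instead of loading them all at once (the getValue/setValue round trip is illustrative, assuming a numeric value property):

```java
// begin transaction
ScrollableResults rows = session.createQuery("from TestTable")
    .scroll(ScrollMode.FORWARD_ONLY);
int count = 0;
while (rows.next()) {
    TestTable tt = (TestTable) rows.get(0);
    tt.setValue(tt.getValue() + 1); // some change to each row

    if (++count % 50 == 0) { // 50 is our batch size
        session.flush();  // push the pending updates to the database
        session.clear();  // detach the updated objects
    }
}
// end transaction
```

FORWARD_ONLY scrolling matters here: it lets the driver stream the result set instead of materializing all 100,000 rows up front.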

Link to the Hibernate documentation on batching: http://docs.jboss.org/hibernate/orm/3.3/reference/en-US/html/batch.html
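One alternative worth mentioning (not covered above): Hibernate's StatelessSession bypasses the first-level cache entirely, so there is nothing to flush or clear. A minimal sketch, assuming the same TestTable entity:

```java
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    TestTable tt = new TestTable();
    tt.setValue(i);
    tt.setCharStorage(charStorage);
    session.insert(tt); // executed immediately; no session cache involved
}
tx.commit();
session.close();
```

The trade-off is that a stateless session skips cascades, interceptors, and the second-level cache, so it only suits straightforward bulk work.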

Optimizing fetches in Hibernate

Optimizing fetches 

(the next article will be on optimizing updates)

This helps in the case of one-to-many or many-to-many entity relationships. With lazy loading turned on, Hibernate typically loads the actual children (the "many" end of the relation) only when the getter is called. This behavior is desired in most cases. The downside is that every time the getter is invoked, we make a second call to the database to fetch that entity (if it is not already in the Hibernate cache).

Let's take a simple example for illustration: we have User and Address entities, and a user can have one or more addresses.

I have this code:

Query query = session.createQuery("from User");
List<User> users = (List<User>) query.list();

for (User user : users) {
    List<Address> addresses = user.getAddress();
    for (Address address : addresses) {
        // do some processing on the address
    }
}

With the show_sql property turned on in Hibernate, I see the following:

Hibernate: select user0_.id as id0_, user0_.email_address as email2_0_, user0_.last_access_time as last3_0_, user0_.login_name as login4_0_, user0_.password as password0_, user0_.registration_date as registra6_0_, user0_.verified as verified0_ from users user0_

Hibernate: select address0_.user_id as user4_1_, address0_.id as id1_, address0_.id as id1_0_, address0_.address as address1_0_, address0_.index1 as index3_1_0_, address0_.user_id as user4_1_0_ from address address0_ where address0_.user_id=?

... and so on, repeated until the addresses for all users have been read ...
Hibernate: select address0_.user_id as user4_1_, address0_.id as id1_, address0_.id as id1_0_, address0_.address as address1_0_, address0_.index1 as index3_1_0_, address0_.user_id as user4_1_0_ from address address0_ where address0_.user_id=?

This is not optimal: with N users we make N additional round trips to the database to fetch the addresses, the classic N+1 select problem.

One way to optimize this is to use the BatchSize annotation:

@Cascade({CascadeType.ALL, CascadeType.DELETE_ORPHAN})
@OneToMany(mappedBy = "user", fetch = FetchType.LAZY)
@JoinColumn(name = "user_id")
@org.hibernate.annotations.BatchSize(size = 4) // added this annotation later
public List<Address> getAddress() {
    return address;
}

The same code now executes with the following output:

Hibernate: select user0_.id as id0_, user0_.email_address as email2_0_, user0_.last_access_time as last3_0_, user0_.login_name as login4_0_, user0_.password as password0_, user0_.registration_date as registra6_0_, user0_.verified as verified0_ from users user0_

Hibernate: select address0_.user_id as user4_1_, address0_.id as id1_, address0_.id as id1_0_, address0_.address as address1_0_, address0_.index1 as index3_1_0_, address0_.user_id as user4_1_0_ from address address0_ where address0_.user_id in (?, ?, ?, ?)
Hibernate: select address0_.user_id as user4_1_, address0_.id as id1_, address0_.id as id1_0_, address0_.address as address1_0_, address0_.index1 as index3_1_0_, address0_.user_id as user4_1_0_ from address address0_ where address0_.user_id in (?, ?, ?, ?)

So in this case, we fetch addresses for 4 user ids at a time, because our batch size is 4. Here I have set it at the level of an individual entity-to-entity mapping; you can also set it globally using the Hibernate property

<prop key="hibernate.default_batch_fetch_size">4</prop>

If you set this at the global level, I would advise making sure it does not have an adverse effect on your application's memory footprint.
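When you know up front that you will touch every address, another option is to fetch everything in a single query with a join fetch. A sketch in standard HQL, assuming the same User/Address mapping as above:

```java
// one query brings back the users together with their addresses
Query query = session.createQuery(
    "select distinct u from User u left join fetch u.address");
List<User> users = (List<User>) query.list();
```

The distinct keyword avoids duplicate User entries produced by the join. This trades the extra round trips for one larger result set, so it suits cases where the collections are small and always needed.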