Monday, July 23, 2007

Hashing with Java: hashCode & equals, is that obvious?

Hash code and equals. If you ask Java developers what they are, you will get a simple answer to a so-called simple question:
- “Equals is how we determine object uniqueness”
- “Hash code is a numeric value representing object state”

So, is the answer that simple?

The answer is yes and no. The general goal of these basic methods is obvious, but some developers might still get them wrong.

I hope you are now wondering why.

Let us use a simple example:
-------------------------------------
public class Person {

    private String name;
    private String address;

    public Person(String name, String address) {
        this.name = name;
        this.address = address;
    }

    public String getAddress() {
        return address;
    }

    public void setAddress(String address) {
        this.address = address;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

-------------------------------------
Now we make an assumption:
Two people can have the same name, but not at the same address.
So, equivalence of Person objects is defined by name + address.

Let's add the equals method:
-------------------------------------
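@Override
public boolean equals(Object obj) {
    // Return true if name and address are the same in both objects.
    if (!(obj instanceof Person)) return false;
    Person other = (Person) obj;
    return java.util.Objects.equals(name, other.name)
            && java.util.Objects.equals(address, other.address);
}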

-------------------------------------

After adding the equals method we must also add hashCode:
-------------------------------------
@Override
public int hashCode() {
    // Compute the hash code from the same fields equals compares: name and address.
    return java.util.Objects.hash(name, address);
}

-------------------------------------

Now, say we want to hold a Set of Persons:
-------------------------------------
// Needs: import java.util.HashSet; import java.util.Set;
Set<Person> mySet = new HashSet<Person>();
Person madonna = new Person("Madonna", "LA");
Person bart = new Person("Bart Simpson", "NY");

mySet.add(madonna);
mySet.add(bart);

madonna.hashCode(); // say the value is 2
bart.hashCode();    // say the value is 4

mySet.size();            // returns 2
mySet.contains(madonna); // true
mySet.contains(bart);    // true

-------------------------------------



As you can see, equals and hashCode did the work for us in this case.
But let us continue:
-------------------------------------
bart.setAddress("California"); // Bart moved to California.

mySet.contains(madonna); // true
mySet.contains(bart);    // FALSE !!!

-------------------------------------

We got false when asking whether Bart exists in the set because we changed the hash code of his Person object while it was in the set.

mySet is holding Bart under his old hash code: 4.
We changed Bart's address, so his hashCode is no longer 4.
So when we ask the set whether Bart exists, it looks him up under the new hash code and does not find him, because of the hashCode change we made by mistake.
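
To see the effect end to end, here is a minimal runnable sketch. MutablePerson and LostInSetDemo are hypothetical names used only for this illustration; the class assumes equals and hashCode are both computed from name and address, as described above, and uses java.util.Objects for brevity.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for Person: equals and hashCode depend on a mutable field.
class MutablePerson {
    private final String name;
    private String address;

    MutablePerson(String name, String address) {
        this.name = name;
        this.address = address;
    }

    void setAddress(String address) {
        this.address = address;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof MutablePerson)) return false;
        MutablePerson other = (MutablePerson) obj;
        return Objects.equals(name, other.name)
                && Objects.equals(address, other.address);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, address);
    }
}

class LostInSetDemo {
    public static void main(String[] args) {
        Set<MutablePerson> people = new HashSet<>();
        MutablePerson bart = new MutablePerson("Bart Simpson", "NY");
        people.add(bart);

        // Mutate a field that feeds hashCode while the object is inside the set.
        bart.setAddress("California");

        System.out.println(people.contains(bart)); // false: looked up under the new hash
        System.out.println(people.size());         // 1: the object is still stored

        // Restoring the old state makes the object reachable again.
        bart.setAddress("NY");
        System.out.println(people.contains(bart)); // true
    }
}
```

Note that the object never left the set: it is still counted by size() and still visited by iteration. Only the hash-based lookup misses it.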

So, how should we implement hashCode and equals?
  • equals should define equivalence, not uniqueness.
  • hashCode and equals should always be implemented together.
  • If equals is true for two objects, they must return the same hash code.
  • If two objects return the same hash code, they do not have to be equal.
  • equals and hashCode should be based on immutable fields, which cannot change during the life cycle of the application.
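
The two rules about equal objects and equal hash codes can be checked with plain strings. This is only an illustrative sketch: HashContractDemo is a made-up class name, and "Aa" / "BB" are a well-known pair of different strings whose String.hashCode values happen to collide.

```java
class HashContractDemo {
    public static void main(String[] args) {
        // Two distinct objects that are equal must report the same hash code.
        String first = new String("Madonna");
        String second = new String("Madonna");
        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true

        // The reverse does not hold: same hash code, yet not equal.
        System.out.println("Aa".hashCode() == "BB".hashCode());    // true (both are 2112)
        System.out.println("Aa".equals("BB"));                     // false
    }
}
```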

So, we need to add another field to Person: id. This field will be immutable, and both hashCode and equals will be based on its value.
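
One possible shape for this, as a sketch rather than a definitive implementation: IdentifiedPerson and StableHashDemo are hypothetical names, and the choice of a long for the id (and how ids get assigned) is an assumption made for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of Person whose identity rests on an immutable id.
class IdentifiedPerson {
    private final long id; // assumed to be assigned once and never changed
    private String name;
    private String address;

    IdentifiedPerson(long id, String name, String address) {
        this.id = id;
        this.name = name;
        this.address = address;
    }

    void setAddress(String address) {
        this.address = address;
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof IdentifiedPerson)) return false;
        return id == ((IdentifiedPerson) obj).id; // only the id takes part
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id); // stable for the whole life of the object
    }
}

class StableHashDemo {
    public static void main(String[] args) {
        Set<IdentifiedPerson> people = new HashSet<>();
        IdentifiedPerson bart = new IdentifiedPerson(1L, "Bart Simpson", "NY");
        people.add(bart);

        bart.setAddress("California"); // no longer affects hashCode or equals

        System.out.println(people.contains(bart)); // true
    }
}
```

The trade-off is that two Person objects with the same name and address but different ids are no longer considered equal, so the id has to mean something in the application.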

So, don't forget: equals and hashCode are very easy, and yet very tricky.

3 comments:

Curt said...

It is clearer to consider equals as defining object equivalence, not object uniqueness.

Stas Ostapenko said...

Hi !

There is a really nice explanation of hashCode + equals issues in excellent book "Effective Java : Programming Language Guide" by Joshua Bloch – “Item 8 : Always override hashCode when you override equals”.

Lavnish said...

nice article man