Java Collections API Made Stupid

by Steven J. Owens (unless otherwise attributed)

This is a "stupid" take on the java Collections API. If I were smart and clever, I'd talk about interfaces and abstract classes and, and, and... so that by the time I get into how to actually use the concrete classes, you'll be fast asleep. Instead, I'm going to use brute force; I'll leap straight into the concrete classes and not get to the interfaces for about 500 lines of text. At least that way you'll start to use the Collections API and can some day aspire to use the Collections API Interfaces properly. If you want more ranting on this topic, skip to the end and read the rationale there, then return here and continue.

I will assume that a) you have a copy of the javadocs handy and b) you understand basic concepts of programming (like what it means to index an array, what looping over an array means (which we call "iterating" in the Collections world), and so forth).

Intro

Usually your java project has a bunch of objects, or bits of data, and you need to have some way to keep track of them in some coherent, organized fashion. Most beginners know about arrays, but they're pretty limited and you have to do a lot of roll-your-own code for common tasks. Enter the java Collections API.

The original version of this article had a couple paragraphs here about the early versions of java and the two classes it had to do this stuff, the Vector and Hashtable. I moved them to here:

History Lesson

Okay, on to the quick & dirty stuff.

Quick & Dirty Collections API

Oddly enough, the Collections API classes are not a separate package. The various java Collections API classes (and interfaces) are all under the java.util package. However, most of the classes in the Collections API implement the interface java.util.Collection, so that's a pretty good way to see most of the objects that are in the Collections API.

Near as I can tell, with some groveling through the java API docs for java 1.4.1, there are 8 concrete classes that you really want to know about. I'm going to list them, then briefly describe them and when you'd use them, then go into detail on them.

Originally I had a big table here listing all of the relevant classes, but then I decided that really kind of sidetracks you from the main point of this article, which is to get down and dirty, as quickly as possible, with the Collections themselves. So I moved it to the end, and instead this is just a list of the ones you'll use most of the time.

Workhorse Collection Classes
ArrayListlist of objects, an automatically resizable array; use this flavor of list most of the time, it's faster for indexing and looping
LinkedListlike ArrayList, use this flavor of list if you're going to be doing a lot of inserting and removing
HashMapbunch of key/value pairs of objects
Note that the keys are inherently a set - if you try to store a second value object under an already-used key, the first value will be dropped from the HashMap.
TreeMapsorted bunch of key/value pairs of objects
Not a hierarchical data structure; the tree refers to the data structure used to keep track of the sorted order.
The sort is done according to the object.equals() method of the objects you put in it.
HashSetset of objects (set in the mathematical sense, i.e. only one of each value, but not key/value pairs)
TreeSetset of objects (set in the mathematical sense, etc), kept sorted in some order
See the comments in TreeMap.
LinkedHashMapbunch of key/value pairs of objects kept in order-of-entry
LinkedHashSetset of objects (in the mathematical sense), kept in order-of-entry
Collection Support Classes
Iteratortool for iterating (looping) over List or Set collections;
usually implemented as an inner class of a given collection.
Collectionsholds a bunch of static methods for doing handy manipulations on various collections, mostly Lists

You quickly start to see different ways you can group these into categories, like:

Or:

This is what leads people writing about Collections to go off into questions of interfaces and abstract types. Which we're not going to do here.

ArrayList and LinkedList

ArrayList is the general-purpose workhorse List object. I use ArrayList unless I have a good reason to use LinkedList.

I have a little problem here; I want to avoid talking about interfaces, like the List interface, but it's hard to talk about lists without using the word List. If List is too vague for you, then think "ordered group of objects that may have duplicates".

Both ArrayList and LinkedList have methods like the following:

Typical List Methods
methoddoes
someList.add(Object)add an object to the end of the list
someList.get(int)get the object at position int in the list (remember, the first one is position 0)
someList.remove(int)remote the object at position int in the list (remember, the first one...)
someList.indexOf(Object)find the object's position in the list (remember, the first one...)
someList.size()get the size of the list (remember, the first one is position 0, so the last one in the list is at position (size() - 1))

ArrayList is backed by an array, and will resize or shrink the internal array as necessary. This is handy, and the array means it's faster for cases where you build a list and iterate over it a lot, but slower for where you have to insert things a lot, or remove things from the beginning or the middle a lot (removing things from the end is still fairly fast).

LinkedList is backed by a linked list data structure (each object in the series is stored in list node object, the node has a reference to the next node in the series; in a doubly-linked list the node has reference to the previous node as well). A LinkedList is slower for indexing, but generally faster for inserting and removing and rearranging.

Note: Of course, "faster" and "slower" in this context is almost irrelevant to 99.99% of your programming. It's an extremely rare case where this will actually matter, and of those extremely rare cases, it's an extremely rare case where the right choice is not blindingly obvious. If you have to spend more than a few seconds deciding it, odds are fairly high that you're over-braining it. Or that you've got a black swan.

LinkedList also has some handy extra methods for positional stuff like addFirst(), addLast(), removeFirst(), removeLast(), which also means you can use it as a queue. Also, a LinkedList can produce a ListIterator, which gives you more options to iterate forward, iterate backward, remove stuff while iterating, etc.

Note: In JDK 1.5 various queue interfaces and implementations were added to the API, and List was retrofitted to implement the Queue interface. Considerable new BlockingQueue implementations were added as part of java.util.concurrent. In JDK 1.6 a double-ended queue interface and implementation were added: Deque, BlockingDeque.

Neither ArrayList nor LinkedList are synchronized, but you can get a synchronized version by instantiating the list, then passing the list reference into java.util.Collections.synchronizedList() and then using the list reference that method returns.

// A synchronized List
List mySynchronizedList = java.util.Collections.SynchronizedList(new ArrayList()) ;

But remember that synchronizedList() only returns a List, so you can't use the special LinkedList methods with the object you get back from synchronizedList(). Hm, maybe you can downcast it to LinkedList and get those special LinkedList methods; I'll have to try that someday.

ArrayList and LinkedList and Iterator

For iterating (looping over the contents of a collection), ArrayList and LinkedList (and pretty much all of the various Collections classes) use an Iterator object, which you get by calling the .iterator() method on whatever Collection object you're dealing with.

Note: Iterator replaces the old Enumeration class that you used with Vectors and Hashtables, and has simpler method names, just .hasNext() and .next() instead of .hasMoreElements() and .nextElement().

List foo = new ArrayList() ;
foo.add("bar") ;
foo.add("bat") ;
foo.add("baz") ;
Iterator it = foo.iterator() ;
while (it.hasNext()) {
    String xyzzy = (String)it.next() ;
    System.out.println("" + xyzzy) ;
}

This produces output like:

bar
bat
baz

Oddly enough, Iterator is actually an interface. The implementation is produced by each Collection's .iterator() method, but there's no visible concrete Iterator class in the Java API docs. I guess they're all implemented as inner classes of the matching Collection (probably so they have permission to get at the innards of the Collection in question).

See Iterating, Enumerating and For Loop, below, for further details.

HashMap

HashMap keeps track of key/value pairs. Getting a key and looking up the corresponding value is sometimes called "mapping" in programming. A Hash is a mathematical technique for doing this very quickly, which I'm not going to get into here. The methods you'll generally care the most about are:

Typical Map Methods
methoddoes
someMap.put(keyObject, valueObject)add an entry (a key/value pair of objects) to the hashmap
someMap.get(keyObject)get the value object that matches this key object
someMap.containsKey(Object)see if the hashmap contains an entry with this key object
someMap.containsValue(Object)see if the hashmap contains an entry with this value object

GOTCHA: HashMap and related classes, and Collections in general use the key or value object's .equals() method when they're checking to see if object A is the same as object B. For many of the core java classes, .equals() does what you would generally hope it would do, that is return some common sense answer to the question. But the default behavior is to return object equality (e.g. is this object the same as the parameter object) so check the javadocs to make sure .equals will do something useful, before relying on it.

For example, String.equals() returns true if both Strings represent the same sequence of letters. Integer.equals() returns true if both Integers represent the same int value. This doesn't always work the way you might expect. A related gotcha is that you have to be careful not to use == when you should use .equals().

Integer foo = new Integer(5) ;
Integer bar = new Integer(5) ;
if (foo == bar) {  // bad programmer, no donut!
  System.out.println("No!") ;
} 
if (foo.equals(bar)) { // much better
  System.out.println("Yes!") ;
}

For more info on both these topics, see the section below Equality, Identity and Comparison.

If you need a synchronized version of HashMap you can get one by, when you first instantiate the Map, by passing it into java.util.Collections.synchronizedMap().

// A synchronized Map
Map = java.util.Collections.synchronizedMap(new HashMap()) ;

HashMap Iterating

Unlike most other Collections, HashMap does not have a .iterator() method. If you want to iterate through the keys you have to use someMap.keySet(), which returns a Set of all the keys in this hashMap. Of course then you have to call someSet.iterator() on the Set object:

Iterator it = someHashMap.keySet().iterator() ;

Or you could use .values(), which returns a Collection of all the values in this hashmap:

Iterator it = someHashMap.values().iterator() ;

Or if you were planning to iterate through all the keys and, for each key, get the matching value, and then do something with both the key and the value, you want to use .entrySet(), which returns a Set full of java.util.Map.Entry objects (Entry is similar to the Iterator, in that the implementation is hidden, probably an inner class inside Map).

Iterator it = someHashMap.entrySet().iterator() ;
while (it.hasNext()) {
    Map.Entry entry = (Map.Entry)it.next() ;
    String key = (String)entry.getKey() ;
    String value = (String)entry.getValue() ;
    System.out.println("Entry " + key + " has value " + value) ;
}

Of course, this example assumes that we used Strings for both keys and values when we created the HashMap. If we're not sure, we could just get the key and value as class Object, and see what their toString() method produces.

While we're on .entrySet(), here's a neat trick. This method returns a SortedSet (the interface that TreeSet implements). SortedSet has a .containsAll() method. So if you have two HashMaps and you want to see if one HashMap contains all of the keys and values in some other HashMap:

boolean aContainsB = someHashMap.entrySet().containsAll(someOtherHashMap.entrySet()) ;
GOTCHA: All of these methods (keySet(), values() and entrySet()) return objects that are essentially connected to ("backed by") the original HashMap, so if you make changes on these objects (the key set, or value set, or entry set), you're making changes on the original HashMap.

If you really want to make changes on a copy without changing the original, you're going to have to duplicate the Collection. It's been a while since I did this myself, so test this out for yourself before you go betting the company's future on it, but usually you can do this by simply instantiating the new Collection with the old Collection as a parameter:

HashMap foo = new HashMap(someOtherHashMap) ;
ArrayList bar = new ArrayList(someOtherList) ;

TreeMap

A TreeMap is pretty much like a HashMap, only it keeps the keys sorted in some order.

You might think that with a name like "TreeMap" it lets you keep stuff in some sort of hierarchy, but nope. It's called a "tree" because the data structure that it uses to keep track of the sorted order is a binary tree. Instead of being backed by a hash data structure, it's backed by a "red-black tree" data structure (go look it up somewhere, I skipped Data Structures and Algorithms in college), and it uses the key objects' Comparable interface to sort them. This means that the key objects have to implement Comparable, and the keys have to have compatible Comparable implementations (say that ten times fast).

Other than the sorted order stuff, it works just like a HashMap.

Why "sorted in some order"? Sorting is a bit tricky to explain, since you have lots of options for how things are sorted; we'll explain that in detail later.

GOTCHA: Typically, an object's .compareTo(Object) method will just blithely try to cast the argument object to the same class as it and then compare them. This means that .compareTo() can throw a runtime exception of ClassCastException, and thus so can TreeMap.put(key, value), if the keys aren't all safe to compare to each other.

However, once you get that all lined up, you just add the key/value pairs to the TreeMap and then iterate through the TreeMap keys, and by default they're sorted.

Many of the objects in java.lang already have Comparable implementations. See the javadocs for java.lang.Comparable for an up to date list. Also see the Equality, Identity and Comparison section below.

HashSet and TreeSet

HashSet and TreeSet are both like mathematical sets; they only allow one of any object to be added. "Hash" might mislead you to think it's some sort of key/value set, but no, it just contains individual values; the "hash" is because it uses a HashMap under the covers to implement the "only one of each" stuff.

They aren't classes I find myself using a lot, but then again I can think of more than a few times that I coded up little sequences like:

List foo = new ArrayList();
public void addItem(Integer item) {
  if (!(foo.contains(item)) {
    foo.add(item);
  }
}

With a HashSet or a TreeSet, you don't have to test to make sure you're not adding an object twice.

Since they're Sets, they're simpler than Lists, they don't need to know about order and where an object is in the Set.

HashSet, as mentioned above, is backed by a HashMap.

TreeSet is backed by a TreeMap, so it will automatically keep the objects in their "natural" order. See TreeMap for details about order.

LinkedHashMap and LinkedHashSet

LinkedHashMap and LinkedHashSet are a HashMap and HashSet that also keeps a doubly-linked list of the elements in "predictable iteration order" (usually the order in which you added the elements to it). They have the same methods as HashMap and HashSet.

IdentityHashMap and WeakHashMap

For experts only.

IdentityHashMap and WeakHashMap are kinda weird, and you should really study up on them before using them.

Unlike a normal HashMap, which just cares about the key object's .equals() and .hashcode() method, IdentityHashMap actually keys on the object reference value. This value will generally mean nothing at all to a human being, but there are situations where you would want to do this sort of thing, and I'm sure a good example will come to me sometime. More likely, you'll find some good examples if you do some googling on your own.

A WeakHashMap, on the other hand, is for when you want to keep track of things while they exist, but you don't want the WeakHashMap itself to keep the garbage collector from cleaning them up. This can be trickier than it sounds, which is why I note it's "for experts only" above. WeakHashMap uses a weak reference to keep track of the key. If nothing else but the WeakHashMap refers to the key, it will automatically disappear from the WeakHashMap (the garbage collector will clean it up).

GOTCHA: it only disappears if nothing else refers to it. So if you do someWeakMap.put(val.key, val), it won't get cleaned up because val has a reference to the key. You can avoid this by using a WeakReference, i.e. someWeakMap.put(val.key, new WeakReference(val)). This is all tricky, I suggest you go read up on this (a lot) before you use it.

All Those Wonderful Interfaces

Okay, now that we've introduced you to the concrete Collections classes, it's time to talk about interfaces. If you're not familiar and comfortable with the general concept of interfaces, check out this little tutorial and then come back here.

Now that you know about all of these concrete classes, you need to learn to immediately - usually in the same line of code that you instantiate a concrete class - cast the object to one of its interfaces, usually the one that's the most appropriate to what you're going to do with the object.

For example:

List foo = new ArrayList() ;
foo.add("one") ;
foo.add("two") ;
foo.add("three") ;
foo.add("four") ;
...
Map bar = new HashMap() ;
bar.add("one", "for the money,") ;
bar.add("two", "for the show,");
bar.add("three", "to get ready,") ;
bar.add("four", "to go!") ;
...

Generally you'll deal in terms of the Collections trinity - the generalist interfaces.

  • List
  • Map
  • Set

    There's also their helper, Iterator. You'll never really see a concrete implementation class for Iterator, because you always get an instance of Iterator from one of the collection classes .iterator() methods. But it actually is an interface.

    Sometimes you'll need to use the meta-interface, which you can cast most List or Set implementations to:

  • Collection

    (but not Map implementations, though they do seem to have most of the methods you need to meet the Collection interface requirements). Generally you'll run into this when you deal with somebody else's code that returns a Collection instead of something more specific.

    And on rare occasions you'll use some specialized interfaces:

  • SortedSet
  • SortedMap

    Every now and then you'll need the super-specialist interfaces:

  • LinkedList
  • ListIterator

    Actually, I'm lying about LinkedList, it's not an interface, it's an implementation, but there are times when you have to deal with a LinkedList object as a LinkedList object. (See the LinkedList entry in the Excruciating Detail section, below).

    Collection Class Summary

    Workhorse Collection Classes
    ArrayList resizable array
    LinkedList like ArrayList, but handier for insert/remove
    HashMap key/value pairs, backed by a HashMap
    TreeMap sorted key/value pairs
    HashSet set of unique values, backed by a HashMap
    TreeSet sorted set of unique values, backed by a red-black tree
    LinkedHashMap ordered key/value pairs (in order of entry)
    LinkedHashSet ordered set of unique value (in order of entry)
    Collection Support Classes
    Iterator Interface for a tool for iterating over the contents of a Collection (replacement for Enumeration, simpler method names), available for many Collections (the List and Set collections) by calling foo.iterator()
    Entry An Entry represents a single key/value pair of a HashMap, TreeMap, or LinkedHashMap. Like Iterator, you get it by calling a method, but it's slightly more complex - you call foo.entrySet() to get a Set of Entry objects, usually you use something like foo.entrySet().iterator() and just iterate over the Entry objects.
    java.lang.Comparable Interface that's used to figure out sorting order for TreeMap and TreeSet. A bunch of core java objects have default implementations (e.g. String, Integer, Long, etc).
    Comparator interface the programmer can implement to override the Comparable ordering that TreeMap and TreeSet use by default
    java.util.Collections A bunch of collection-munging methods, good for all sorts of fun, most notably producing synchronized or unmodifiable versions of ArrayList, LinkedList, HashMap, TreeMap or LinkedHashmap
    Oddball Collection Classes
    IdentityHashMap for experts only: essentially a HashMap-ish sort of thing (but technically not a HashMap) that keys on the key object's reference rather than the key object's value
    WeakHashMap for experts only: HashMap using weak references, useful for implementing caches without interfering with garbage collection
    Leftovers
    Vector Legacy version of List, inherently synchronized.
    Hashtable Legacy version of HashMap, inherently synchronized.
    Properties Legacy collection, essentially a String-only HashTable
    BitSet A Vector of bits (1/0 values), essentially an efficient way to simulate Vector of Booleans.
    Stack Wrapper around Vector, implements a classic last-in/first-out "stack" data structure: put something on the stack with push(), take it off the stack with pop(), take the next thing off the stack with another call to pop(), etc.
    Arrays Holds utility methods for munging arrays.
    System Mainly relevant for the System.arraycopy() method

    Note: I wrote this for JDK 1.4. A number of collections interfaces and implementations have been added in JDK 1.5 and JDK 1.6. I don't have time to compile a table as above, so for now I'll simply list them and point you at the Collections Framework Change Summary URLs.

    JDK 1.5

    http://java.sun.com/javase/6/docs/technotes/guides/collections/changes5.html

    JDK 1.5 had a bunch of new stuff added, mostly about queues but also some java.util.concurrent related stuff.

    http://java.sun.com/javase/6/docs/technotes/guides/collections/changes6.html

    "The primary theme of the API changes was better bi-directional collection access."

    Some day I'd like to come back here and build a huge cross-indexed table of classes and methods, so you can see (for example) what methods LinkedList has that ArrayList doesn't have.

    Before We Move On

    A couple of tips before we get into the Excruciating Details. First, check out the utility methods on the Collections class. I mentioned them only slightly, but they provide a lot of handy little methods for manipulating collections

    Second, why reinvent the wheel when other people have already reinvented it for you?

    http://commons.apache.org/collections/

    http://code.google.com/p/google-collections/

    Also, these are kinda neat:

    http://lavender.cime.net/~ricky/constrainedcollections.html

    http://code.google.com/p/concurrentlinkedhashmap/

    Excruciating Detail

    Warning: Beyond This Point Lies Madness (or at least boredom)

    Iterating, Enumerating and For Loop

    Remember with arrays, you loop through the elements in the array using a for loop:

    for (int i=0; i > someArray.length; i++) {
      System.out.println("" + someArray[i]) ;
    }
    

    You loop through an ArrayList by using the ArrayList.iterator() method to get an Iterator object, then using the Iterator object's .hasNext() and .next() methods.

    You can think of the iterator as sort of an object-oriented version of the contents of the for loop parentheses:

    (int i=0; i > someArray.length; i++)

    At any given time, the Iterator knows whether its counter is past the end of the ArrayList, which you can find out by calling iterator.hasNext() (which returns a boolean). The iterator.next() method returns the next object and increments the counter.

    List foo = new List(10) ;
    foo.add("bar") ;
    foo.add("bat") ;
    foo.add("baz") ;
    Iterator it = foo.iterator() ;
    while (it.hasNext()) {
      String xyzzy = (String)it.next() ;
      System.out.println("" + xyzzy) ;
    }
    

    In fact, I've seen people advocate using both a for loop and an interator:

    for (Iterator it = foo.iterator(); it.hasNext(); ) {  // 
      String xyzzy = (String)it.next() ;
      System.out.println("" + xyzzy) ;
    }
    

    Notice that the third chunk of the for loop (innards) is missing above. The normal pattern is (initialize; test; increment), but in this case we leave the increment portion blank, because calling it.next() in the body of the loop increments the iterator.

    People argue about which way is better. Using the for loop limits the Iterator to inside the scope of the loop, which is nice. But there's a case that the while loop is more legible. Personally, while I favor legibility, I also favor expressing intent cearly. A while loop seems to say "we're going to loop until we find some condition we're looking for". A for loop seems to say "we're going to loop across the whole thing."

    Sure, the contents of the for loop parentheses could be cleaner, but if your for loop parentheses are commonly so cluttered that you can't tell at a glance whether a given example is using an iterator or doing something clever, you should move your cleverness out to a separate method (unless that loop is a vital piece of high-performance code, which it rarely is).

    Having said all that, I've never gotten in the habit of using for loops with iterators. I'll have to work on that.

    Casting

    Either way you go, notice that cast to (String). The ArrayList just contains objects, it has no way of knowing (or caring) what class the object has in it, so you have to cast the object to String when you assign it to a String variable. See the section on Wrapper Classes, below.

    With the pre-1.2 Vector and Hashtable you use an Enumeration instead:

    Vector foo = new Vector(10) ;
    foo.add("bar") ;
    foo.add("bat") ;
    foo.add("baz") ;
    Enumeration en = foo.elements() ;
    while (en.hasMoreElements()) {
      String xyzzy = (String)en.nextElement() ;
      System.out.println("" + xyzzy) ;
    }
    

    If you're using a synchronized List that you got via java.util.Collections.synchronizedList(), you have to be careful when iterating. If some other thread modifies the List while you're iterating, your next call to an Iterator method will get you an exception.

    LinkedList

    LinkedList builds the list as a linked list (each object is stored inside a holder object, which has a reference to the next holder object and the previous holder object). For most List purposes, it works the exact same way as ArrayList:

    List foo = new LinkedList();
    String monday = "monday" ;
    foo.add(monday) ;
    String tuesday = "tuesday" ;
    foo.add(tuesday) ;
    String wednesday = "wednesday" ;
    foo.add(wednesday) ; 
    String thirdDay = (String)foo.get(3) ;
    System.out.println(thirdDay) ;
    

    There are one or two operations that you can only do with a LinkedList, they're not part of the standard List interface. They have to do with inserting and removing, and generally other stuff that you can do more easily with a LinkedList than with an ArrayList.

  • addFirst(Object)
  • Object getFirst()
  • removeFirst()
  • addLast(Object)
  • getLast()
  • removeLast()
  • lastIndexOf(Object)

    Also, besides the normal .iterator() method, LinkedList has a .listIterator() method, that returns a special ListIterator object instead of a normal Iterator. Okay, to be specific, ArrayList has a .listIterator() method too, but that returns a limited ListIterator that will throw UnsupportedOperationException at the drop of a hat, if you try to use ListIterator methods like remove(), set(), or add().

    A ListIterator is sort of like a WonkaVator. With a normal Iterator, you can't mess with the contents of the List while you're iterating over it. With a ListIterator, you can iterate forward, or backwards, or remove elements while iterating.

    Sets: HashSet and TreeSet

    A Set is like a mathematical set - it's a bag of objects, and you can be sure that there's only one of each object in the bag. Unlike a List, it's not ordered, and if you try to add a value that's already in it, it ignores it.

    Generally you'll use a HashSet, which is named that because it builds a Set by using a HashMap underneath.

    A TreeSet builds a Set by using a "red-black tree", which means that the elements of the set will automatically be sorted when you add them - it'll add each object to the Set in the right place.

  • boolean someSet.add(Object)
  • boolean someSet.remove(Object)
  • boolean someSet.contains(Object)
  • int someSet.size()

    Synchronization

    As mentioned repeatedly above, these collections are not synchronized, they are not thread-safe. If you need synchronization, you have two choices.

    1) Use the java.lang.Collections.synchronizedFoo() methods on your collection (see the javadocs for java.lang.Collections for more info, also check the java tutorial).

    2) Use a Vector or Hashtable instead.

    Equality, Identity and Comparison

    As said above, classes in the Collections API are supposed to use the .equals() method when they're checking for objects. If you do someArrayList.indexOf(object), or if you do someHashMap.containsKey(object), those classes should be calling object.equals() on each of their objects to see if that object matches.

    For a great many objects in the core java API, .equals() does what you would hope it would do. For example, String.equals() returns true if both Strings represent the same sequence of letters. Integer.equals() returns true if both Integers represent the same int value:

    HashMap foo = new HashMap() ;
    Integer one = new Integer(1) ;
    foo.put(one, "Right") ; 
    Integer anotherOne = new Integer(1) ;
    foo.put(anotherOne, "Wrong") ;
    System.out.println(foo.get(one) ;
    

    Unless I'm on crack (which I could be) this should print out: "Wrong". However, I've never particularly set out to break things in this way. Give it a try and see what happens (and email me to let me know).

    However, .equals() doing what you hope it will do is NOT universally true. The default Object.equals() implementation just checks to see if the object reference you pass in is the same as that object's reference. This is seldom what you want. Check the javadocs and make sure it'll work the way you want it to work before you rely on it.

    A related gotcha is that you have to be careful not to use == when you should use .equals(). For example, this does work:

    int foo = 5 ;
    int bar = 5 ;
    if (foo == bar) {
      System.out.println("Yes!") ;
    } else {
      System.out.println("No") ;
    }
    

    But this does not work:

    Integer foo = new Integer(5) ;
    Integer bar = new Integer(5) ;
    if (foo == bar) {
      System.out.println("Yes!") ;
    } else {
      System.out.println("No") ;
    }
    

    In this case, == just tells you whether foo and bar are the same object, not if they have the same int value.

    GOTCHA: If you decide you need to code up your own .equals() method so you can store your objects in a Collection and they'll work the way you want them to, then you also have to implement .hashCode() to work the same way. Classes like HashMap use object.hashCode() to figure out where to put an object. If the .hashCode() method of your custom class doesn't work the same way your .equals() method works, you're in for pain.

    For those cases where you actually do want to keep track of things by object reference, read up on IdentityHashMap.

    If you're working with a class that doesn't have Comparable already implemented on it (see below), you can either add your own compareTo() method to that class, or you can put the comparison logic on a separate object that implements the Comparator interface. TreeMap uses a default Comparator that just calls the key objects' .compareTo() methods, but it may not be convenient for you to go muck about, adding compareTo() everywhere.

    Many of the objects in java.lang already have Comparable implementations - see java.lang.Comparable in your javadocs for an up to date list, but in my javadocs (for 1.4.1) it says the following (but I've ordered the list according to how often I seem to run into these classes in coding):

    Wrapper Classes

    One big difference between a java array primitive and the Collection classes is that array works with primitive data types (int, boolean, char, byte, short, long, float, double). You can have, for example, an array of ints. Collection classes only know about objects (or more specifically, object references), so you can't have, for example, an ArrayList of boolean primitives, you need to use an ArrayList of Boolean object references.

    Note: For each primitive type, there's a corresponding wrapper class in java.lang, e.g. java.lang.Integer, java.lang.Boolean, java.lang.Character, etc.

    On the other end of the equation, another limitation of the Collections is that they only know about objects, not about classes, so your code has to remember what class of object you stuck in there, and cast the object reference back to that class when you pull it out.

    So you have to do a lot of casting when you pull objects out of an ArrayList, and you have to be careful only to put the right (same) class of object into an ArrayList. And sometimes you have to wrap primitive values in their corresponding java.lang wrapper class before sticking it in, and then (after casting the object reference you get back from the ArrayList) pull the primitive value out and return it. Anywhere you're going to be doing a lot of this sort of stuff, it's a good idea to build a wrapper class that has the ArrayList as an instance variable, and makes sure you put the right class of object in, and casts the outgoing objects back to that class.

    The same goes for any other Collections API class, of course.

    One place where wrappers are essential is anywhere you're dealing with keys or values that are primitive types, i.e int, float, etc. These aren't "real" objects, so you can't just stick them in an ArrayList or HashMap. Instead, you have to wrap each one in an instance of its matching java.lang class (e.g. Integer or Float) and stick that in the Collection. Then when you pull the object out, you have to extract the primitive value from it. After you've done this a few times, you'll probably want to just create a wrapper class that does the dirty work for you.

    Note: Java 1.5 has a feature called "autoboxing" that should make it a lot easier to use primitive types with Collection API classes.

    Before I get into an example of how to do this yourself, there's also this library, developed by a regular on freenode.org's #java:

    http://lavender.cime.net/~ricky/constrainedcollections.html

    Here's an example of a wrapper for ArrayList, for handling File objects:

    public class FileList() {
        public List fileList ;
        public List getFileList() {
            if (null == this.fileList) {
              this.fileList = new ArrayList() ;
            }
            return this.fileList
        }
        public void add(File file) {
            getFileList().add(file) ;
        }
        public File get(int index) {
            return (File)(getFileList().get(index)) ;
        }
        public boolean contains(File file) {
            return getFileList().contains(file) ;
        }
        public int indexOf(File file) {
            return getFileList().indexOf(file) ;
        }
        public int size() {
            return getFileList().size() ;
        }
    }
    

    Don't go overboard - add the delegation methods as you find you need them. I do this a fair bit, but I've never gone as far as creating a wrapper Iterator. It wouldn't be that hard - just have a wrapper.iterator() method that instantiates a special wrapper Iterator around the object returned by the ArrayList.iterator():

    public class FileList {
     ...see example above...
        public FileListIterator iterator() {
            return new FileListIterator(getFileList().iterator()) ; 
        }
    }
    public class FileIterator {
        public FileIterator(Iterator iterator) {
          this.iterator = iterator ;
        }
        public Iterator iterator = null ; 
        public boolean hasNext() {
            return this.iterator.hasNext() ;
        }
        public File next() {
            return (File)(this.iterator.next()) ; 
        }
    }
    

    Unfortunately, because the whole point of our specialized FileList and FileIterator is to accept and return File class objects, we can't declare the classes themselves as implementing List and implementing Iterator.

    Note: Java 1.5 has generics, which are supposed to make it easy to create typed Collections, so for example you'd create an ArrayList that just contains Strings. I haven't used it yet, so I couldn't comment on how well it works.

    Meta Collection Objects

    There are a few recurring patterns of Collection wrapper objects that I see:

  • HashMap of Lists
  • HashMap of TreeMaps
  • TreeMap of Lists
  • TreeMap of HashMaps
  • TreeMap of TreeMaps

    If I find time, I'll come back here and put some boilerplate implementations of these up for your convenience. It's kinda annoying that they aren't part of the standard Collections API, but I guess they had enough to do by that point. Meanwhile, before you go reinventing the wheel, check the links above in the section Before We Move On to see if there's already one.

    That's all (so far)

    Please feel free to email me with comments, questions, clarifications, suggestions, or just plain old ego-stroking.

    Why Another Collections API Tutorial?

    This little rant was originally at the beginning of this article, but I decided it was slowing things down, so I moved it down here.

    The java Collections API classes are the third or fourth most important thing you should learn about, when learning java. This is an attempt to give a really quick intro to them.

    When the Collections API first came out, there were a zillion articles on it, but they all got bogged down in explaining the various Interfaces first. For example, this article at javaworld:

    http://www.javaworld.com/javaworld/jw-11-1998/jw-11-collections-p2.html

    The article is three pages long but the first page and a half are about interfaces!

    It took me years to get around to really learning to use the Collections API, and even now I sometimes revert to old reflexes and use Vector or Hashtable instead, in a crunch. I still routinely run into otherwise-competent programmers who don't "get" Collections at all. So this tutorial is going to try a different approach. I'm going to ignore all those Interfaces and get right to the Concrete stuff (I can hear the screams of anguished OO gurus from here :-).

    Note: However, I do recommend you take a look at the javaworld article after you finish reading this article. It also has a nifty little HTML table telling you what concrete Collection classes implement what Collection interfaces.

    Or you may want to jump straight to:

    http://java.sun.com/docs/books/tutorial/collections/index.html


    History Lesson

    As I wrote in the intro, these paragraphs were originally in the intro. Well, several years have gone by and the java Collections stuff has been around long enough that we can pretty much forget about that old stuff, unless you run into it in old code or tutorials. Hopefully by now Vector and Hashtable are kinda irrelevant. So I moved them down here. Enjoy.

    In the first few releases of java, they just had Vector and Hashtable. For those of you familiar with perl, Vector is similar to a perl list (or array), and Hashtable is similar to a perl hash (or associative array). For those of you not familiar with perl, a Vector is like an array, except it grows and shrinks as needed. A Hashtable is a set of key/value pairs, which lets you put in a key and get back a value (a Hashtable also grows and shrinks as needed).

    Both Vector and Hashtable are considered fairly heavyweight and old-fashioned nowadays. For one thing, they're both inherently synchronized, which makes them thread-safe, but back in the day, was relatively performance-expensive, especially if you were using them in a situation where you didn't really need synchronization -- i.e. if only a single thread is using a particular Vector or Hashtable instance. However, I hear that these days, uncontested synchronization is actually quite cheap and fast. But if you need a synchronized list or map, use java.util.Collections.synchronizedList() and java.util.Collections.synchronizedMap() (see further down).

    Somewhere during Java 1.1 they came up with the Collections API, and by Java 1.2 the Collections API were made part of the core java APIs (earlier than that, they were available as a separate downloadable jar).