Generics and Type Erasure on the JVM

Nicolas Frankel

Introduction

In UML, it’s possible to parameterize types in a class. Those types can then be used in different locations: attribute types, parameter types and return types. This is called a template class.

Here’s an example of such a class in UML:

This Foo class should be read as the following:

  • The bar attribute is of type T
  • The baz() method returns a value of type T and requires an argument of type T as well

It means that the attribute, return value and parameter are all constrained by the same type.

At this point, the type T can be bound to a “concrete” type during instantiation:

Foo<String> foo = new Foo<String>();

The baz() method can now be called like this:

String value = foo.baz("argument");
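
For readers who prefer code to diagrams, the Foo class could be sketched in Java roughly as follows. The diagram only specifies signatures, so the body of baz() (store the argument, return the previous value) and the FooDemo class are assumptions made for illustration.

```java
// Sketch of the Foo template class; only the signatures come from the diagram
class Foo<T> {

    // The bar attribute is of type T
    private T bar;

    // baz() takes a T and returns a T. Storing the argument and returning
    // the previous value is purely illustrative behavior.
    public T baz(T argument) {
        T previous = bar;
        bar = argument;
        return previous;
    }
}

class FooDemo {
    public static void main(String[] args) {
        // T is bound to String for this instance
        Foo<String> foo = new Foo<String>();
        String first = foo.baz("argument");  // nothing stored yet
        String second = foo.baz("other");    // returns the value stored by the first call
        System.out.println(first + ", " + second); // prints "null, argument"
    }
}
```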

Note that nothing prevents a class from having multiple parameterized types:
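
In Java syntax, a class with two type parameters might look like the sketch below. The Pair name, its K and V parameters and its accessors are hypothetical and only serve to show the notation.

```java
// Hypothetical class with two type parameters, K and V
class Pair<K, V> {

    private final K key;
    private final V value;

    Pair(K key, V value) {
        this.key = key;
        this.value = value;
    }

    K getKey() { return key; }

    V getValue() { return value; }
}

class PairDemo {
    public static void main(String[] args) {
        // K is bound to String and V to Integer during instantiation
        Pair<String, Integer> pair = new Pair<String, Integer>("answer", 42);
        String key = pair.getKey();
        Integer value = pair.getValue();
        System.out.println(key + " = " + value); // prints "answer = 42"
    }
}
```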

Java implementation

Generics are the direct Java implementation of template classes. They have been available since version 5 (2004).

One major usage that takes advantage of generics was the rework of the Java Collections API to avoid casting:

// Pre-Java 5
List objects = new ArrayList();
objects.add("One");
objects.add("Two");
objects.add("Three");
Object object = objects.get(0);
String string = (String) object; // Explicit casting required

// Java 5 and more
List<String> strings = new ArrayList<String>();
strings.add("One");
strings.add("Two");
strings.add("Three");
String first = strings.get(0); // No more casting!

There have been many changes in the syntax and API throughout Java history. One of the most important guidelines regarding those changes is backward compatibility.

Type erasure

Regarding generics, backward compatibility means that parameterized types are not stored in the bytecode. This is called type erasure, because parameterized types are “erased”. Generics are enforced at compile time by the compiler itself.

For example, here’s a sample snippet:

List objects = new ArrayList(); 
List<String> strings = new ArrayList<String>(); 
List<Long> longs = new ArrayList<Long>();

Let’s check the corresponding bytecode:

L0 
 LINENUMBER 9 L0 
 NEW java/util/ArrayList 
 DUP 
 INVOKESPECIAL java/util/ArrayList.<init> ()V 
 ASTORE 1 
 L1 
 LINENUMBER 10 L1 
 NEW java/util/ArrayList 
 DUP 
 INVOKESPECIAL java/util/ArrayList.<init> ()V 
 ASTORE 2 
 L2 
 LINENUMBER 11 L2 
 NEW java/util/ArrayList 
 DUP 
 INVOKESPECIAL java/util/ArrayList.<init> ()V 
 ASTORE 3

Apart from line numbers and local variable slots, the bytecode is exactly the same for all three statements!
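
The same observation can be made from running code rather than from a bytecode listing. The following is a minimal sketch (the ErasureDemo class and its method names are made up for this illustration): it checks that two differently parameterized lists share one runtime class, and that a value of the wrong type can slip in through a raw reference, with the failure only appearing at the cast the compiler inserts on read.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {

    // Both lists share one runtime class because the type arguments are erased
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<String>();
        List<Long> longs = new ArrayList<Long>();
        return strings.getClass() == longs.getClass();
    }

    // A raw reference bypasses the compile-time check; the failure only
    // shows up at the cast the compiler inserts when reading a String
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean failsOnlyWhenRead() {
        List<String> strings = new ArrayList<String>();
        List raw = strings;          // same object, seen as a raw List
        raw.add(Long.valueOf(1L));   // accepted: no element type is checked at runtime
        try {
            String first = strings.get(0); // compiler-inserted cast to String
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());  // prints "true"
        System.out.println(failsOnlyWhenRead()); // prints "true"
    }
}
```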

Problems of type erasure

Type erasure hinders development in at least two different ways.

Method names

Since generics are not written in the bytecode, they do not affect the signature of a method. Hence, methods that have the same name and the same arguments, once stripped of generics, have the same signature.

For example, the following class cannot be compiled because the signature of its methods is the same, though they have different generic types.

public class Invalid { 
 
    public void invalid(List<String> strings) { } 
    public void invalid(List<Object> objects) { }
}

The output is:

name clash: invalid(java.util.List<java.lang.Object>) and invalid(java.util.List<java.lang.String>) have the same erasure
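
What the compiler calls the "erasure" can be observed at runtime. In the sketch below, the ErasedSignature class and its accept() method are hypothetical: the method is declared with a List<String> parameter, yet it is looked up with, and reports, a plain java.util.List.

```java
import java.lang.reflect.Method;
import java.util.List;

class ErasedSignature {

    // Hypothetical method used only for this illustration
    public void accept(List<String> strings) { }

    // Returns the runtime parameter type of accept(), i.e. its erasure
    static Class<?> erasedParameterType() {
        try {
            // The lookup uses List.class: there is no class literal for List<String>
            Method method = ErasedSignature.class.getDeclaredMethod("accept", List.class);
            return method.getParameterTypes()[0];
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedParameterType().getName()); // prints "java.util.List"
    }
}
```

A second accept(List<Object>) method would be looked up with exactly the same arguments, which is why the compiler rejects it.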

Reflection

As generics are not stored in the bytecode, there’s no way to get parameterized types using reflection.
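
As a small illustration of that limitation, the sketch below asks the runtime class of a List<String> for its type parameters. The class and method names are made up for the example; the only answer available is the variable declared by ArrayList itself, not String.

```java
import java.lang.reflect.TypeVariable;
import java.util.ArrayList;
import java.util.List;

class NoTypeArgumentAtRuntime {

    // Asks the runtime class of a List<String> for its type parameter
    static String typeParameterName() {
        List<String> strings = new ArrayList<String>();
        TypeVariable<?>[] parameters = strings.getClass().getTypeParameters();
        // Only the variable declared by ArrayList<E> is available, not String
        return parameters[0].getName();
    }

    public static void main(String[] args) {
        System.out.println(typeParameterName()); // prints "E"
    }
}
```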

Overcoming type erasure

There are several ways to overcome type erasure.

Changing the method name

The easiest way to work around the method signature collision is to use different names for the methods to get different signatures:

public class Valid { 
 
    public void validStrings(List<String> strings) { } 
    public void validObjects(List<Object> objects) { } 
}

However, this doesn’t solve the reflection issue. It’s still not possible to get the type of the list elements using the Reflection API.

Passing an extra Class parameter

To overcome that, and if it’s not required to have different implementations, the trick is to pass a Class object to match the collection parameterized type:

public class GenericTrick { 
 
   public <T> void withClass(List<T> list, Class<T> clazz) {
       if  (clazz == Integer.class) { } 
       else if (clazz == Date.class) { } 
       else { } 
   } 
}

This way, the compiler enforces that the collection and the class both have the same parameterized type. The class type is written in the bytecode, and thus it can be obtained using reflection.

A naive approach would be to get the first element of the list, check its type and infer that all elements in the list are of this type. Unfortunately, if T has any child class, there’s no way to know for sure whether any element of the list is of type T or a subclass of T.

The extra Class argument, by contrast, states T explicitly, so every element is known to be a T or one of its subclasses.
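
The difference between the two approaches can be sketched as follows. The ClassTokenDemo class, the describe() method and the returned strings are assumptions made for the example; describe() simply has the same shape as withClass() above.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class ClassTokenDemo {

    // Same shape as withClass(): the compiler forces list and clazz to agree on T
    static <T> String describe(List<T> list, Class<T> clazz) {
        if (clazz == Integer.class) {
            return "integers";
        } else if (clazz == Date.class) {
            return "dates";
        } else {
            return "other: " + clazz.getSimpleName();
        }
    }

    // Naive alternative: guess the element type from the first element
    static Class<?> guessFromFirstElement(List<?> list) {
        return list.get(0).getClass();
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<Number>();
        numbers.add(Integer.valueOf(1));
        numbers.add(Double.valueOf(2.5));

        // The Class argument carries the declared type, Number
        System.out.println(describe(numbers, Number.class)); // prints "other: Number"
        // The guess only sees the subclass of the first element
        System.out.println(guessFromFirstElement(numbers).getSimpleName()); // prints "Integer"
    }
}
```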

Using reflection

The final way is quite tricky. When I mentioned type erasure and that parameterized types cannot be accessed through the Reflection API, I deliberately omitted three cases:

  1. Superclass information, Class.getGenericSuperclass()
  2. Field information, Field.getGenericType()
  3. Method information, Method.getGenericParameterTypes()

Here’s a simple class:

public class ReflectionApi { 
 
    public void withoutClass(List<Date> list) { } 
}

Using the reflection API, it’s possible to get the type argument of List (Date in this case), though it’s not for the faint of heart:

// For brevity's sake the code has been stripped of exception handling and does no check before casting
Method method = ReflectionApi.class.getDeclaredMethod("withoutClass", List.class); 
Type[] parameterTypes = method.getGenericParameterTypes(); 
Type parameterType = parameterTypes[0]; 
ParameterizedType parameterizedType = (ParameterizedType) parameterType; 
Type[] typeArguments = parameterizedType.getActualTypeArguments(); 
for (Type typeArgument : typeArguments) { 
    System.out.println(typeArgument.getTypeName()); 
}

This correctly yields:

java.util.Date

Note that relevant data can only be obtained if the parameterized type is “real”. For example, if the method signature is changed to public <T> void withoutClass(List<T> list), the previous code now outputs:

T

Obviously, this cannot be considered helpful.
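
The two other cases listed earlier, field and superclass information, follow the same pattern. Here is a minimal sketch, assuming a hypothetical dates field and a hypothetical DateList subclass; unlike the snippet above, it checks the type before casting.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

class OtherReflectionCases {

    // Hypothetical field with a "real" type argument
    private List<Date> dates;

    // Hypothetical subclass that fixes the type argument of its superclass
    static class DateList extends ArrayList<Date> { }

    // Returns the name of the first type argument, or null if the type is not parameterized
    static String firstTypeArgument(Type type) {
        if (type instanceof ParameterizedType) {
            Type[] arguments = ((ParameterizedType) type).getActualTypeArguments();
            return arguments[0].getTypeName();
        }
        return null;
    }

    // Field information, through Field.getGenericType()
    static String fromField() {
        try {
            Field field = OtherReflectionCases.class.getDeclaredField("dates");
            return firstTypeArgument(field.getGenericType());
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Superclass information, through Class.getGenericSuperclass()
    static String fromSuperclass() {
        return firstTypeArgument(DateList.class.getGenericSuperclass());
    }

    public static void main(String[] args) {
        System.out.println(fromField());      // prints "java.util.Date"
        System.out.println(fromSuperclass()); // prints "java.util.Date"
    }
}
```

In both cases the type argument is part of a declaration (a field or an extends clause), not of an object created at runtime, which is why it remains available.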

Kotlin’s approach

By design, Kotlin aims to generate Java-compatible bytecode: it also suffers from type erasure. Hence, it has the same issues as Java regarding method signature clashes and reflection.

Here’s a sample of Kotlin code:

val objects = ArrayList<Any>() 
val strings = ArrayList<String>() 
val longs = ArrayList<Long>()

It yields the following bytecode, same as in Java:

L0 
 LINENUMBER 7 L0 
 NEW java/util/ArrayList 
 DUP 
 INVOKESPECIAL java/util/ArrayList.<init> ()V 
 ASTORE 1 
 L1 
 LINENUMBER 8 L1 
 NEW java/util/ArrayList 
 DUP 
 INVOKESPECIAL java/util/ArrayList.<init> ()V 
 ASTORE 2 
 L2 
 LINENUMBER 9 L2 
 NEW java/util/ArrayList 
 DUP 
 INVOKESPECIAL java/util/ArrayList.<init> ()V 
 ASTORE 3 
 L3

Likewise, the GenericTrick Java class above can be translated directly to Kotlin like this:

class GenericTrick { 
 
    fun <T: Any> withClass(list: List<T>, clazz: KClass<T>) { 
        when (clazz) {
            Int::class -> { } 
            Date::class -> { } 
            else -> { }
        } 
    } 
}

However, Kotlin offers a great way to do without the extra parameter: T can be used as it is. This is achieved by using the reified keyword when declaring T. There’s a caveat though: reified generics can only be used when functions are inlined.

A note on inline functions

inline functions are special in that they are not called. Instead, the compiler sort of copy-pastes the code of the inline function where it should have been called.

For example, here’s a snippet using inline:

fun foo() { 
    bar() 
} 
 
inline fun bar() { 
    doSomething() 
}

The compiler will replace it with the following:

fun foo() { 
    doSomething() 
}

Of course, this specific example is not very useful, but it’s meant to show how it works.

Let’s rewrite the above GenericTrick class using reified generics:

class GenericTrick { 
 
    inline fun <reified T: Any> withClass(list: List<T>) { 
        when (T::class) {
            Int::class -> { } 
            Date::class -> { } 
            else -> { }
        } 
    } 
}

Conclusion

This post showed the limitations of using parameterized types on the JVM, most importantly type erasure, and some ways to overcome them. It also showed how Kotlin improves the situation in some specific scenarios compared to Java.

About Nicolas Frankel

Nicolas Fränkel is a Developer and Software Architect with 15 years of experience. He also doubles as a teacher in universities and higher education schools and as a trainer, and triples as a book author. He shares his thoughts weekly on his blog, A Java Geek.