In this Q&A, we'll go over how Strings get duplicated in Java and how to avoid duplication.
String duplication and de-duplication
Let's first look at how String objects get duplicated. new String() always creates a new String object, even when the same string literal is passed in. See the code snippet below:
String s = new String("abc");
String s1 = new String("abc");
assertEquals("Reference inequality", false, s==s1);
assertEquals("String equality", true, s.equals(s1));
Java has a String pool. The String.intern() method checks whether an equal String is already in the pool. If it is, the pooled instance is returned; if not, the String is added to the pool and a reference to it is returned.
String s2 = new String("abcd").intern();
String s3 = new String("abcd").intern();
assertEquals("Reference equality", true, s2==s3);
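To tie the two snippets above together, here is a minimal, self-contained sketch that compares a literal, a new String() copy, and an interned copy. The class name InternDemo and the helper sameObject are made up for illustration and are not part of any library.

```java
public class InternDemo {
    // Hypothetical helper: true only when both arguments are the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "abcd";             // compile-time constant, served from the String pool
        String copy = new String("abcd");    // always a separate object on the heap
        String interned = copy.intern();     // the pooled instance for "abcd"

        System.out.println(sameObject(literal, copy));      // false
        System.out.println(sameObject(literal, interned));  // true
        System.out.println(literal.equals(copy));           // true
    }
}
```

String literals are interned automatically, so intern() on the copy hands back the same object the literal refers to.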
String pool performance
The pool behaves like a map of WeakReference objects: from Java 7 onward, a String in the pool is garbage collected once nothing else references it.
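From Java 7u40 through Java 8, the default String pool size is 60013 (earlier releases used 1009); JDK 11 and later default to 65536. Note that 60013 is a prime number, and that is by design: when the bucket is chosen as the hash code modulo the table size, a prime size tends to spread entries more evenly, which keeps lookups and insertions fast.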
Java lets you set the String pool size and view its statistics through JVM arguments. Use -XX:StringTableSize=<size> to set the pool size and -XX:+PrintStringTableStatistics to print the statistics when the JVM exits.
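One way to see these flags in action is to run a small program that fills the pool. The sketch below is only an illustration: the class name StringTableLoad, the method internMany, and the "key-" prefix are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class StringTableLoad {
    // Hypothetical helper: interns `count` distinct strings and returns them
    // so that they stay reachable (and therefore stay in the pool).
    static List<String> internMany(int count) {
        List<String> kept = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            // The concatenation happens at run time, so intern() has to add it to the pool.
            kept.add(("key-" + i).intern());
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> kept = internMany(100_000);
        System.out.println("Interned " + kept.size() + " strings");
    }
}
```

After compiling, it can be started with, for example, java -XX:StringTableSize=60013 -XX:+PrintStringTableStatistics StringTableLoad. Running it again with a different table size should make the change in bucket statistics visible; the exact output format varies between JDK versions.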
De-duplication without String.intern()
String.intern() is quite useful when the app developer knows well which Strings need to be interned. When that is not known, Java offers an option to de-duplicate Strings through JVM command-line arguments:
-XX:+UseG1GC -XX:+UseStringDeduplication
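The prerequisite for String de-duplication is the G1 garbage collector (GC). In Java 8 it cannot be used with the parallel or concurrent mark sweep (CMS) collectors. Use the -XX:+PrintStringDeduplicationStatistics option to check String de-duplication stats (this is a Java 8 flag; later releases report the same information through unified logging instead).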
This feature is available from Java 8u20. See JEP 192 for how de-duplication is done and how it might affect GC pause performance.
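A small program that holds many equal but separate Strings gives the de-duplication flags something to work on. This is a rough sketch under stated assumptions: the class name DedupCandidates and the method duplicates are invented here, and whether any Strings are actually de-duplicated in a given run depends on GC activity and object age.

```java
import java.util.ArrayList;
import java.util.List;

public class DedupCandidates {
    // Hypothetical helper: builds `count` separate String objects with the same characters.
    // new String(char[]) copies the array, so each String starts with its own backing storage.
    static List<String> duplicates(String value, int count) {
        List<String> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            result.add(new String(value.toCharArray()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> held = duplicates("some repeated value", 200_000);
        System.gc(); // only a hint; the JVM decides when and how to collect

        // De-duplication shares the underlying character data, not the String objects,
        // so reference comparison is unaffected by it.
        System.out.println(held.get(0) == held.get(1));       // false
        System.out.println(held.get(0).equals(held.get(1)));  // true
    }
}
```

On Java 8u20 or later it can be run with the flags from this section: java -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics DedupCandidates. Unlike intern(), de-duplication never makes two String references equal under ==; it only lets equal Strings share their backing array.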