Crash ! Boom ! Bang ! What Happens When A Serializable Object Contains a Non-Serializable Field?

This blog was created by Arafat Tanin, Software Security Engineer, OpenRefactory and edited by Charlie Bedard.

Introduction
In the enchanting realm of Java, a powerful sorcery, known as serialization, enables objects to transcend their earthly forms and be reborn as byte streams.

While working recently as part of a team that looks into various kinds of bugs in Java code, I encountered numerous inconsistencies in the use of serialization and deserialization within many well-known open source projects. We found that many projects declare a Serializable class that contains a non-Serializable field.

At first glance, the code may not appear to have significant security implications. However, our findings suggest otherwise. These issues have the potential to lead to program crashes during runtime and, in more severe cases, can result in greater security vulnerabilities, including the possibility of remote code execution through insecure deserialization.

In this blog post, we will delve into a critical logical issue associated with the serialization process, specifically addressing what occurs when a serializable object contains a non-serializable field and its implications.

Serialization
First, let us briefly review the concept of serialization in Java.

Serialization is the process of converting an object into a byte stream that can be saved to a file, sent over a network, or stored in a database. Deserialization is the reverse process, where a byte stream is used to recreate an object. Not all objects can be serialized, and not all fields of an object should be serialized.

By default, when you make a class implement the Serializable interface, all of its non-static, non-transient fields are included in the serialized representation of the object. For that to work, every object those fields refer to must itself be serializable.

Some languages, such as Java, use binary serialization formats. These are more difficult to read, but serialized data can still be identified if you know how to recognize a few tell-tale signs. For example, serialized Java objects always begin with the same bytes, which are encoded as ac ed in hexadecimal and rO0 in Base64. This comes in handy when investigating other types of attacks, such as Insecure Deserialization. To know more about Insecure Deserialization, see here.

To verify the statement, we can serialize any object and inspect the first bytes of the resulting stream, both in hexadecimal and in Base64.
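A minimal sketch of such a check follows. The `Point` class and the helper names are hypothetical and used only for illustration; any Serializable object should produce the same leading bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

class MagicBytesDemo {
    // Serializes an object into an in-memory byte array.
    static byte[] serialize(Object obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns the first two bytes of the stream as lowercase hex.
    static String magicHex(byte[] bytes) {
        return String.format("%02x%02x", bytes[0] & 0xFF, bytes[1] & 0xFF);
    }

    // Returns the first three characters of the Base64-encoded stream.
    static String base64Prefix(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes).substring(0, 3);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new Point());
        System.out.println(magicHex(bytes));     // aced
        System.out.println(base64Prefix(bytes)); // rO0
    }
}

// Hypothetical example class; not taken from any real project.
class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    int x = 1;
    int y = 2;
}
```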

Transient in Java
Transient is a field modifier that is used during serialization. It marks a member variable that must not be written when the object is persisted to a stream of bytes. By default, every non-static field of a serializable object becomes part of its persistent state. When, for a particular use case or design reason, a field should not be persisted, it is declared as transient and its value is skipped during serialization. On deserialization, a transient field receives the default value for its type (null, 0, or false).
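To see the effect, here is a small sketch that serializes an object and reads it back. The `Account` class and its fields are hypothetical, chosen only to show that a transient field does not survive the round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Serializes the account to memory and deserializes a copy of it.
    static Account roundTrip(Account original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Account) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account copy = roundTrip(new Account("alice", "s3cret"));
        System.out.println(copy.username); // alice
        System.out.println(copy.password); // null
    }
}

// Hypothetical class used only for illustration.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;
    String username;
    transient String password; // excluded from the serialized form

    Account(String username, String password) {
        this.username = username;
        this.password = password;
    }
}
```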

Why is the Transient modifier used?

The transient keyword in Java is used to indicate that a field should not be serialized. Here are some common scenarios where you might want to use transient:
  1. Security

    Fields that contain sensitive data, such as passwords, should be marked as transient to prevent them from being included in the serialized object. This helps protect sensitive information.
  2. Caching or Temporary Data

    If a field contains data that can be easily recomputed or is meant for temporary caching, marking it as transient can help reduce the size of the serialized object and improve performance.
  3. Non-Serializable Objects

    If a field references an object that is not serializable, making that field transient is essential to avoid serialization errors.
  4. Derived Data

    Fields that can be derived from other fields or can be recalculated should be marked as transient to avoid redundancy in the serialized form.
In this blog post, we will explore the use case of the transient keyword when dealing with references to non-serializable objects within a serializable class.

Transient with Non-Serializable Fields

Use transient on fields whose type does not implement Serializable, whether that type comes from the JDK or from application code. An object of a class that does not implement the Serializable interface cannot be serialized, so when a Serializable object holds a reference to such an object, the serialization attempt throws a java.io.NotSerializableException. These non-serializable references should therefore be marked transient before the enclosing class is serialized. We can illustrate the value of using transient via an example. Let’s consider a class X that does not implement Serializable.

Now, imagine a serializable class Y that chooses to have a private field of type X.
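A minimal sketch of the two classes could look as follows. The field `name`, the `String` constructor argument, and the helper `hasX()` are assumptions made for illustration; the post only tells us that Y has a zero-parameter constructor that creates the X instance and a parameterized one that does not.

```java
import java.io.Serializable;

// X does not implement Serializable.
class X {
    int value = 42; // hypothetical content
}

// Y is Serializable but declares a non-transient field of the non-serializable type X.
class Y implements Serializable {
    private static final long serialVersionUID = 1L;

    private X x;         // neither Serializable nor transient
    private String name; // hypothetical extra field

    // Zero-parameter constructor: creates the X instance.
    Y() {
        this.x = new X();
    }

    // Parameterized constructor: leaves the X field as null.
    Y(String name) {
        this.name = name;
    }

    boolean hasX() {
        return x != null;
    }
}
```

Nothing in this declaration fails at compile time; the problem only shows up when an instance is actually serialized.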

Clearly, class Y is serializable and, inside this serializable class, there is a non-serializable private field of type X. For any Java program that uses class Y and needs to serialize its instances, there are three cases to consider:
  1. There is a reference to an instance of class X within some particular instance of class Y;
  2. There is no reference to an instance of class X within some particular instance of class Y; and
  3. There is a reference to an instance of class X within an object of class Y where the class X field is declared as transient.
We will look into each scenario and will illustrate the differences between the behaviors.

Scenario 1: Reference of non-serializable object inside a serializable object

Here is an example program using class Y:
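The sketch below is one way such a program might look. The class `Scenario1` and the helper `trySerialize` are hypothetical names; X and Y are reduced to the parts this scenario needs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Scenario1 {
    // Tries to serialize obj; returns the exception thrown, or null on success.
    static IOException trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // The zero-parameter constructor creates the X instance.
        IOException failure = trySerialize(new Y());
        // Expected to be a java.io.NotSerializableException that names class X.
        System.out.println(failure);
    }
}

// Not Serializable.
class X { }

class Y implements Serializable {
    private static final long serialVersionUID = 1L;
    private X x; // neither Serializable nor transient

    Y() {
        this.x = new X();
    }
}
```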

In this code snippet, the instance of Y is constructed with the zero-parameter constructor, which initializes the private field of type X. Consequently, when we try to serialize this instance of Y, a “java.io.NotSerializableException” is thrown, because the instance of X is not serializable and the field is not transient. The exception message names the offending class, X.

Here is a video that shows the execution and the source code used to generate this example.

But, what if it does not contain any reference of X? This is scenario 2…

Scenario 2: No Reference of non-serializable object inside a serializable object

Here is another example of a program using class Y.
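A possible version of this program is sketched below. As before, `Scenario2`, `trySerialize`, and the `String` constructor argument are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Scenario2 {
    // Tries to serialize obj; returns the exception thrown, or null on success.
    static IOException trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // The parameterized constructor never creates an X, so the field stays null.
        IOException failure = trySerialize(new Y("no X here"));
        System.out.println(failure == null ? "serialized without exception" : failure.toString());
    }
}

// Not Serializable.
class X { }

class Y implements Serializable {
    private static final long serialVersionUID = 1L;
    private X x;         // non-transient, but null in this scenario
    private String name; // hypothetical extra field

    Y() {
        this.x = new X();
    }

    Y(String name) {
        this.name = name;
    }
}
```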
In this code snippet, the constructor was invoked with an argument, which bypasses the creation of the private field of type X. Consequently, this instance of Y does not contain any reference to an instance of type X. Serialization looks at the objects that are actually referenced at run time, not at the declared types of the fields, so the null field is simply written as null and no exception is thrown.

Here is a video that shows the execution and the source code used to generate the example.

Now, what if the instance of Y contains a reference to an instance of X but where the field was marked as transient?

Scenario 3: Reference of non-serializable transient object inside a serializable object

If the non-serializable field is transient (the purpose of transient was discussed earlier), it will not cause any problem during serialization, because the serialization mechanism skips transient fields entirely and never tries to write the non-serializable object. To prove the point, consider a modified version of class Y in which the private field of type X is declared as transient.
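A minimal sketch of the modified class and a round trip through serialization follows. `Scenario3`, `roundTrip`, and `hasX()` are hypothetical helpers, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class Scenario3 {
    // Serializes y to memory and deserializes a copy of it.
    static Y roundTrip(Y y) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(y);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Y) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Y original = new Y();           // holds an X instance
        Y copy = roundTrip(original);   // no NotSerializableException
        System.out.println(original.hasX()); // true
        System.out.println(copy.hasX());     // false: transient fields are not restored
    }
}

// Not Serializable.
class X { }

class Y implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient X x; // skipped during serialization
    private String name;   // hypothetical extra field

    Y() {
        this.x = new X();
    }

    Y(String name) {
        this.name = name;
    }

    boolean hasX() {
        return x != null;
    }
}
```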

Now, it does not matter whether we use the empty or the parameterized constructor to create an instance of type Y. Because the field of type X is marked as transient, serialization skips it and completes without an exception in both cases. Keep in mind that the field will be null after deserialization, so it has to be re-created if it is needed.

Here is a video that shows the execution and the source code used to generate this case.

Conclusion

In Java, when working with serializable classes, the decision to use transient for a field depends upon the specific requirements of the application and the nature of the data being handled. Use transient for fields that should not be included in the serialized form, such as sensitive data, temporary caches, or references to non-serializable objects.

Understanding the implications of this keyword is crucial for ensuring the correctness, security, and performance of serialized objects. By making informed decisions about which fields to mark as transient, developers can create efficient serializable classes in Java applications.

In summary, the careful use of transient ensures that serialized objects are both accurate and secure, while also optimizing performance where necessary.

End Note:

  • Image at the start is the cover image of the popular Nintendo DS game.
  • The work done by OpenRefactory has been supported by a grant from the Alpha Omega project. 
