Ghost in the Shell: PhantomData

GIF from the 1995 anime movie "Ghost in the Shell"
I feel confined, only free to expand myself within boundaries.

The Rust compiler is notoriously strict, but not without good reason. Everything it does is meant to help guarantee type and memory safety so that (for the most part) the only bugs that can occur at runtime are logical in nature. Sometimes, the errors that the compiler reports may seem arbitrary and pedantic. Let’s investigate an interesting case where the compiler complains:

generics_error.rs
Rust
struct Car<Make> { // Unused generic type parameter
    color: String,
}

If we attempt to compile code that contains this struct definition, the Rust compiler would throw an error message:

❯ cargo build
Compiling phantom v0.1.0 (/home/workspace/phantom)
error[E0392]: type parameter `Make` is never used
--> src/generics_error.rs:1:12
|
1 | struct Car<Make> {
| ^^^^^ unused type parameter
|
= help: consider removing `Make`, referring to it in a field, or using a marker such as `PhantomData`
= help: if you intended `Make` to be a const parameter, use `const Make: /* Type */` instead
For more information about this error, try `rustc --explain E0392`.
error: could not compile `phantom` (bin "phantom") due to 1 previous error

In the above example, the Car struct has a generic type parameter named Make. This can be used to create a stronger typing regime for delineating instantiations of the Car struct at compile time by tagging them with extra information, i.e., the Make of the car. The Rust compiler, however, is complaining that the type parameter is never used in the struct. Typically, when defining a type parameter for a struct, function, etc., the type parameter would be used to implement a polymorphic member field or perhaps generalize the implementation of a function to allow multiple types. The compiler is telling us that we need to use Make in this way; it expects a member field in the struct to be of the Make type.
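For contrast, here is a minimal sketch of the conventional case, where the type parameter is actually used by a member field and the compiler has nothing to complain about. The `Labeled` struct, its fields, and its methods are hypothetical names chosen for illustration only.

```rust
use std::fmt::Debug;

// Hypothetical struct: `T` appears in the `value` field,
// so the type parameter counts as used.
struct Labeled<T> {
    label: String,
    value: T,
}

impl<T> Labeled<T> {
    fn new(label: &str, value: T) -> Self {
        Labeled {
            label: label.to_string(),
            value,
        }
    }

    // Works for any `T` that can be formatted with `{:?}`.
    fn describe(&self) -> String
    where
        T: Debug,
    {
        format!("{}: {:?}", self.label, self.value)
    }
}

fn main() {
    // The same struct definition stores two different field types.
    let number = Labeled::new("answer", 42_u32);
    let flag = Labeled::new("enabled", true);
    println!("{}", number.describe());
    println!("{}", flag.describe());
}
```

Here every `Labeled<T>` really stores a `T`, which is exactly what we want to avoid for `Car<Make>`.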

What if we don’t want to have to drag along a member field to satisfy the compiler, especially a variable that may not be used? Doing so would unnecessarily increase the size of the struct and could potentially have undesired side effects at runtime. What if we just want to use the type parameter for extra static type checking and nothing else? Luckily, the Rust designers implemented a feature in the Standard Library to accommodate this use case while still appeasing the compiler’s demands.

ghosts.rs
Rust
use std::marker::PhantomData;

struct Car<Make> {
    color: String,
    make: PhantomData<Make>,
}

But wait, it looks like we ended up adding a member field anyway. What gives? PhantomData<Make> is a zero-sized marker type: the compiler uses the Make type parameter to perform static type checking, but the field occupies no space in the struct and carries no value at runtime. This is very useful for specialization (here meaning separate impl blocks per concrete type, not Rust’s unstable specialization feature), wherein we can provide distinct behaviors to the same struct based on the type argument assigned to it at compile time.

ghosts.rs
Rust
use std::marker::PhantomData;

struct Car<Make> {
    color: String,
    make: PhantomData<Make>,
}

// Marker types that exist only to tag a Car at compile time
struct Nissan {}
struct Chevrolet {}
struct Ford {}

impl Car<Nissan> {
    fn make(&self) -> &str {
        "Nissan"
    }
}

impl Car<Chevrolet> {
    fn make(&self) -> &str {
        "Chevrolet"
    }
}

impl Car<Ford> {
    fn make(&self) -> &str {
        "Ford"
    }
}

fn main() {
    // The field is named `make`, matching the struct definition
    let nissan: Car<Nissan> = Car {
        color: "blue".into(),
        make: PhantomData,
    };
    let chevy: Car<Chevrolet> = Car {
        color: "silver".into(),
        make: PhantomData,
    };
    let ford: Car<Ford> = Car {
        color: "red".into(),
        make: PhantomData,
    };
    println!("{}", nissan.make());
    println!("{}", chevy.make());
    println!("{}", ford.make());
}

For validation, let’s run it:

❯ cargo run --release --bin ghosts
Compiling phantom v0.1.0 (/home/workspace/phantom)
Finished `release` profile [optimized] target(s) in 0.15s
Running `target/release/ghosts`
Nissan
Chevrolet
Ford
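The tag also lets functions refuse the wrong kind of Car at compile time. The following is a small sketch under stated assumptions: the `Car::new` constructor and the `nissan_recall_notice` function are hypothetical helpers that do not appear in the code above, and the marker types are written as unit structs for brevity.

```rust
use std::marker::PhantomData;

struct Nissan;
struct Ford;

struct Car<Make> {
    color: String,
    make: PhantomData<Make>,
}

impl<Make> Car<Make> {
    // Hypothetical constructor: the caller chooses `Make`
    // with a turbofish or a type annotation.
    fn new(color: &str) -> Self {
        Car {
            color: color.to_string(),
            make: PhantomData,
        }
    }
}

// Hypothetical function that only accepts cars tagged as Nissan.
fn nissan_recall_notice(car: &Car<Nissan>) -> String {
    format!("Recall notice for the {} Nissan", car.color)
}

fn main() {
    let nissan = Car::<Nissan>::new("blue");
    let ford = Car::<Ford>::new("red");

    println!("{}", nissan_recall_notice(&nissan));

    // Uncommenting the next line is a compile-time error (mismatched types):
    // a `Car<Ford>` is a different type from a `Car<Nissan>`.
    // println!("{}", nissan_recall_notice(&ford));

    println!("The Ford is {}", ford.color);
}
```

Both values have the same fields and the same layout, yet the compiler treats them as unrelated types, so the mix-up is caught before the program ever runs.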

Let’s also prove that the compiler is actually optimizing out the PhantomData. We will compare the sizes of two structs: one with an unmarked member field and one with a marked member field. Everything else about the struct definitions will be the same for control.

spirits.rs
Rust
use std::marker::PhantomData;
use std::mem::size_of;

// Just using this to suppress warnings for cleaner
// terminal output
#[allow(dead_code)]
struct NoGhosts<T> {
    id: u32,
    data: T,
}

#[allow(dead_code)]
struct Haunted<T> {
    id: u32,
    data: PhantomData<T>,
}

fn main() {
    println!("Size of NoGhosts: {}", size_of::<NoGhosts<u32>>());
    println!("Size of Haunted: {}", size_of::<Haunted<u32>>());
}

Running this will give us the sizes of the structs so that we can see if the compiler optimized out the phantom type.

❯ cargo run --release --bin spirits
Compiling phantom v0.1.0 (/home/workspace/phantom)
Finished `release` profile [optimized] target(s) in 0.13s
Running `target/release/spirits`
Size of NoGhosts: 8
Size of Haunted: 4

It did! The Haunted struct is half the size of the NoGhosts struct. The phantom type declared in the Haunted struct informs the compiler that it is just being used for static typing, whereas the same (unmarked) field in the NoGhosts struct is actually handled as a member field, thus the size of the struct is increased to allow that data to be stored.
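To push the comparison a little further, here is a short sketch showing that the phantom field stays zero-sized no matter how large the tagged type is. The choice of `[u8; 1024]` and `String` as type arguments is arbitrary and only for illustration.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

#[allow(dead_code)]
struct Haunted<T> {
    id: u32,
    data: PhantomData<T>,
}

fn main() {
    // PhantomData<T> itself takes up no space, whatever T is.
    println!(
        "Size of PhantomData<[u8; 1024]>: {}",
        size_of::<PhantomData<[u8; 1024]>>()
    );

    // So Haunted<T> is only ever as large as its `id` field.
    println!(
        "Size of Haunted<[u8; 1024]>: {}",
        size_of::<Haunted<[u8; 1024]>>()
    );
    println!("Size of Haunted<String>: {}", size_of::<Haunted<String>>());
}
```

All three Haunted variants share the size of a lone u32, because the only field that stores anything is `id`.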

We have appeased the compiler while gaining stronger static type checking and specialization, all without implementing or storing a member field of the generic type. Thanks to the Rust standard library for giving us flexibility without sacrificing type or memory safety!

PhantomData can mark unused lifetime parameters in the same way, a situation that typically comes up in unsafe Rust code that stores raw pointers.

undead.rs
Rust
use std::marker::PhantomData;

// Does not compile: the lifetime parameter 'a is declared,
// but no member field uses it (error E0392)
// struct Undead<'a, T> {
//     arbitrary_memory: *const T,
// }

// Corrected implementation that uses phantom data
// to make the compiler treat the struct as if it
// held a &'a T, so the lifetime counts as used
struct Undead<'a, T> {
    arbitrary_memory: *const T,
    phantom: PhantomData<&'a T>,
}

In the commented snippet, the Undead struct is marked with a lifetime parameter 'a and a generic type parameter T, but its sole member field, arbitrary_memory, does not use the lifetime parameter. The compiler (rightly) flags this as an error, just like it does with unused generic type parameters.

If we need to have the struct marked with the lifetime parameter without any of its members actually using it (again, this is a typical use case for unsafe Rust code), and we don’t want to unnecessarily increase the size of the struct, we can use the PhantomData struct to feign using the lifetime parameter. Now, the compiler treats the Undead struct as if it held a reference to a T with lifetime 'a, but this is only true at compile time. Since the field is a phantom, the compiler does its type and borrow checking, and the field itself takes up no space in the struct.
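As a rough sketch of what the phantom lifetime buys us, the snippet below adds two hypothetical methods, `new` and `get`, to the Undead struct. Neither is part of the original definition, and the unsafe dereference is sound only under the assumption stated in the comment.

```rust
use std::marker::PhantomData;

struct Undead<'a, T> {
    arbitrary_memory: *const T,
    phantom: PhantomData<&'a T>,
}

impl<'a, T> Undead<'a, T> {
    // Hypothetical constructor: captures the lifetime of `source`
    // in the phantom field while storing only a raw pointer.
    fn new(source: &'a T) -> Self {
        Undead {
            arbitrary_memory: source as *const T,
            phantom: PhantomData,
        }
    }

    // Hypothetical accessor.
    fn get(&self) -> &'a T {
        // SAFETY (assumption of this sketch): the pointer was created
        // from a `&'a T` in `new`, so it is valid for all of 'a.
        unsafe { &*self.arbitrary_memory }
    }
}

fn main() {
    let value = String::from("still here");
    let undead = Undead::new(&value);

    // Uncommenting the next line is a compile-time error: `value` cannot
    // be moved while `undead`, which borrows it for 'a, is still in use.
    // drop(value);

    println!("{}", undead.get());
}
```

Without the phantom field there would be no lifetime to tie the raw pointer to, and the borrow checker could not stop `value` from being dropped while the pointer is still in use.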

The Rust compiler is very strict when it comes to static type checking, and this is, of course, for many good reasons. Using this knowledge in tandem with the utilities provided in the standard library can yield very powerful results to enable the development of stronger and more granular type checking and specialization with generics. The PhantomData struct is one such example of the standard library’s usefulness. There are many more interesting quirks and features of the Rust compiler and standard library to explore, so if you want to stay informed when posts are published on these topics, subscribe to stay in the know.

  1. https://doc.rust-lang.org/rust-by-example/generics.html
  2. https://doc.rust-lang.org/std/marker/struct.PhantomData.html
  3. https://doc.rust-lang.org/std/marker/struct.PhantomData.html#unused-lifetime-parameters
