String interning  
Jeff Key
10/16/2002 3:13:54 AM
I looked into string interning earlier this evening based on an
incorrect assumption I'd made regarding it.  I revisited Richter and
wrote up a bit on it:

---

The CLR maintains a hashtable of strings on a per-process basis; it is
shared across all app domains in that process.  When a method is JITted,
the runtime takes all the string literals in that method and adds them
to the hashtable, if they don't already exist.  This lessens the number
of duplicate string instances in the process, sometimes significantly.

The following code returns true because of this:

	string x = "hi";
	Object.ReferenceEquals("hi", x);

(Remember that strings are reference types; two separately created
instances of any other reference type would return false here.)

The following code would return false, because the "x" string is
constructed at runtime and thus isn't part of the intern pool ("h" and
"i" WILL be part of the table):

	string y = "h";
	string x = y + "i";
	Object.ReferenceEquals("hi", x);

You can add strings to the pool at runtime by calling the String.Intern
method.  The Intern method returns the instance of the input string from
the intern pool (if it doesn't exist in the pool, it is added).  The
following will return true:

	string y = "h";
	string x = String.Intern(y + "i");
	Object.ReferenceEquals("hi", x);

The String class also gives us the IsInterned method that, given a
string parameter, returns a reference to the interned string if it is
indeed interned already, or null if it isn't.
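
To see IsInterned next to the earlier examples, here's a minimal sketch.
The class and method names are made up for illustration, and the results
noted in the comments are what the behavior described above predicts,
not something I'd claim holds for every runtime or hosting mode:

```csharp
using System;

class IsInternedSketch   // hypothetical name, illustration only
{
    // Returns the pooled instance equal to (prefix + suffix),
    // or null if no equal string is in the intern pool.
    public static string Lookup(string prefix, string suffix)
    {
        string built = prefix + suffix;    // constructed at run time
        return String.IsInterned(built);
    }

    static void Main()
    {
        string literal = "hi";             // a literal, so it lives in the pool

        string pooled = Lookup("h", "i");
        // Per the description above: non-null, and the same object as the literal.
        Console.WriteLine(pooled != null);
        Console.WriteLine(Object.ReferenceEquals(pooled, literal));

        // A freshly generated string that was never a literal and never passed
        // to Intern should not be found, so IsInterned should return null.
        string unique = Guid.NewGuid().ToString();
        Console.WriteLine(String.IsInterned(unique) == null);
    }
}
```

Note that IsInterned never adds anything to the pool; it only looks.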

I had incorrectly assumed that all strings were interned -- even those
created at run time.  Going off of that assumption, I had also assumed
that String.Equals and the overridden equality operator had the same
behavior as Object (i.e. reference equality).  I was wrong.  Bad
programmer!  Equality is evaluated on a per-character basis.  Ouch!
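
The difference is easy to check.  A small sketch (the class and the
Build helper are hypothetical, just there to force a run-time
concatenation):

```csharp
using System;

class EqualitySketch   // hypothetical name, illustration only
{
    // Concatenates at run time, so the result is a new string object.
    public static string Build(string a, string b)
    {
        return a + b;
    }

    static void Main()
    {
        string literal = "hi";
        string built = Build("h", "i");    // same characters, separate object

        // Value equality: both of these compare the characters.
        Console.WriteLine(built == literal);                         // True
        Console.WriteLine(built.Equals(literal));                    // True

        // Reference equality: these are two distinct objects.
        Console.WriteLine(Object.ReferenceEquals(built, literal));   // False

        // Casting to object bypasses the overloaded == operator.
        Console.WriteLine((object)built == (object)literal);         // False
    }
}
```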

Now you're probably thinking to yourself "well, that's all fine and
dandy, but what about the switch statement?".  Our friend the C#
compiler helps us out here, too.  Consider the following:

	string y = "h";
	string x = "i";

	switch (y + x)
	{
		case "hi":
			Console.WriteLine("hi");
			break;
		default:
			Console.WriteLine("not hi");
			break;
	}
	
Given what we've learned so far, you'd expect two things:  1) C#
wouldn't resort to doing an actual string comparison, as it's too
expensive and 2) you would therefore wind up in the default case, as y +
x is NOT the same object as the literal "hi".  The compiler actually
inserts a call to IsInterned in the switch statement expression, so in
IL it is actually evaluated as

	switch (String.IsInterned(y + x))

Since the case label "hi" is a literal and therefore already in the
pool, IsInterned hands back that very instance and the case matches by
reference.
	
Thank you, Mr. Framework.
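
Written out by hand, the rewritten switch behaves roughly like the
following.  This is only a sketch of the idea under the assumptions
above, not the IL the compiler emits; the Describe method and its class
are hypothetical:

```csharp
using System;

class SwitchSketch   // hypothetical name, illustration only
{
    public static string Describe(string y, string x)
    {
        string hi = "hi";                        // case label: a literal, hence pooled

        // Look the run-time string up in the pool instead of comparing characters.
        string key = String.IsInterned(y + x);   // null if no equal string is pooled

        // Cheap reference comparison against the pooled case label.
        if (Object.ReferenceEquals(key, hi))
            return "hi";
        return "not hi";                         // the default case
    }

    static void Main()
    {
        Console.WriteLine(Describe("h", "i"));
        Console.WriteLine(Describe("h", "o"));
    }
}
```

A string that isn't in the pool comes back as null, which can't match
any case label, so it falls through to the default.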

-jk
		