2.06.2007

C# Objects - Does it create a new one?

In response to a question about whether a new object is created during a particular type of loop, I gave the following answer. It depends on what type of variable you are using, but sometimes the CLR creates a new object when you wouldn't expect it. Strings are one well-known cause of this, and it's important to understand why. The string is a special case because of the way the .Net Framework implements the String class, not because of anything in the C# language or the compiler itself; it's an implementation detail, not a language feature. For most objects, changing a value through a reference will not result in a new instance of the object. For string though...

string s = "myString";
s = "another value";

That involves two string instances: the assignment doesn't alter the first string, it points s at a second one, and the one that says "myString" is now "lost" as far as s is concerned. (These two are literals, which the runtime interns and keeps around; a string built at run time would become eligible for garbage collection at this point.) This is because strings are implemented as immutable objects, meaning their value can never be changed. In order to allow for normal programming constructs, any operation that appears to change a string creates a new string and assigns it to the reference, leaving the old string behind. There are historical reasons for choosing this implementation, security being one of them (notably, the buffer overrun exploit). The .Net Framework provides the StringBuilder class to overcome the performance cost of building up strings this way. For most other types of objects, that doesn't happen. If it's an int, you only get one instance, no matter how often you change the value.

int i = 25;
i++;
i = i + 20;
for (i = 0; i < 20; i++) {
 //do some stuff
}

That whole block only results in one instance of i, since you're only changing the value. This:

for (int i = 0; i < 20; i++) {
 //do some stuff
}

This also results in only one instance of i. This on the other hand:

for (int i = 0; i < 20; i++) {
 int k = 10;
 //do some stuff with k
}

That declares k 20 times, once per iteration, but since int is a value type each k is just a local slot holding a value, so there is nothing here for the garbage collector to clean up. The picture changes if you create a reference type inside the loop with new: you will get 20 instances of the thing, and they will all become eligible for garbage collection, unless you pass a reference to some other thing that holds on to it. In some cases... this is what you want. Like this:

for (int i = 0; i < 20; i++) {
  ListItem li = new ListItem();
  //add the li to a ListView
  listView1.Items.Add(li);
}

That creates 20 ListItem objects (maybe 40, see below), but they will not be garbage-collected after the loop because someone else (the ListView) is holding a reference to those 20 new things. The fact that I called it 'li' each time is of no consequence; I can recycle the variable name as much as I want. Now, here's where it gets confusing. If I did this the other way...

ListItem li = new ListItem();
for (int i = 0; i < 20; i++) {
  //set some values for li and add it to the view
  listView1.Items.Add(li);
}

This will not create 20 instances of li; there will only be one, and I can change its values around all I want. So... am I adding the same object to the ListView 20 times, or is the ListView making a copy of the object 20 times, resulting in 21 ListItem objects total? I know it will result in the ListView showing 20 things and they can all have different values... so I think it's generating a copy with the Add() method. I am pretty sure of that, but I'm asking if anyone knows for sure.
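One way to test a question like this is to ask the runtime whether two entries are the same object. The sketch below does not use ListItem or a ListView at all: Item is a hypothetical stand-in class and List<T> is a stand-in for the control's item collection, so it only shows how the check works and how a plain List<T> behaves. A real control's collection may do something different in its Add() method, and that would need checking against the control itself.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for ListItem: any class (reference type) will do.
class Item
{
    public string Text = "";
}

static class AddDemo
{
    // Adds the SAME Item object to a List<Item> 20 times, changing it each time.
    public static List<Item> Build()
    {
        var items = new List<Item>();   // stand-in for the item collection
        var li = new Item();            // one object, created outside the loop

        for (int i = 0; i < 20; i++)
        {
            li.Text = "item " + i;      // mutate the single object
            items.Add(li);              // List<T>.Add stores the reference, not a copy
        }
        return items;
    }

    static void Main()
    {
        List<Item> items = Build();
        Console.WriteLine(items.Count);                                  // 20
        Console.WriteLine(object.ReferenceEquals(items[0], items[19]));  // True
        Console.WriteLine(items[0].Text);                                // item 19
    }
}
```

With List<T>, all 20 entries point at one object, so every entry shows the last value written. If a collection really did copy on Add(), the ReferenceEquals check would come back False and the earlier entries would keep their own values.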

My overall point is that you can't use the rule "ref type, one behavior, value type, other behavior", because the situation is more complicated than that. You can see there are some obvious cases where you get a new instance, but there are other, less obvious cases (such as string) where you get new instances because of some implementation detail of the objects involved. With string, I get a new instance because string is implemented to create a new one when the value is changed. With my ListItem, I think I'm getting a new instance because of the implementation of the ListItemCollection which I'm adding it to. Am I right about this?
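The string half of that can be checked directly. Here is a minimal sketch, with class and method names made up for illustration, that compares references before and after an append, and then does the same with the StringBuilder mentioned earlier. The first string is built at run time with new string('a', 3) so that literal interning doesn't muddy the result.

```csharp
using System;
using System.Text;

static class StringDemo
{
    // True if appending to a string produces a new instance and leaves the old one intact.
    public static bool AppendMakesNewInstance()
    {
        string original = new string('a', 3);   // built at run time: "aaa"
        string s = original;                    // second reference to the same instance
        s += "b";                               // builds a new string and rebinds s

        return !object.ReferenceEquals(original, s)
            && original == "aaa"
            && s == "aaab";
    }

    // True if StringBuilder.Append changes the builder in place.
    public static bool BuilderKeepsSameInstance()
    {
        var sb = new StringBuilder("aaa");
        StringBuilder same = sb.Append("b");    // Append returns the builder it was called on
        return object.ReferenceEquals(sb, same) && sb.ToString() == "aaab";
    }

    static void Main()
    {
        Console.WriteLine(AppendMakesNewInstance());    // True
        Console.WriteLine(BuilderKeepsSameInstance());  // True
    }
}
```

The same ReferenceEquals trick works for any reference type when you want to know whether an operation handed you back the object you started with or a new one.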

The person who asked this question was asking what is "best" in their case, but it's pretty hard to figure out what is best, and it won't be best in all cases. However, if you know how things operate at a lower level, you can decide what's best with much more confidence. BTW, this is one thing I don't like about .Net (not C# really, I think .Net is the Evildoer here)... I'm not always clear about the performance costs of particular methods, since the .Net Framework likes to create new object instances willy-nilly like memory has no limits.

Clear as mud?
Jasmine :)
