I asked a question similar to this one a few days ago: C# Does an array of objects store the pointers of said objects?
Now, what I'm asking is a bit more code related. A brief overview: I'm trying to store many objects in a list, and those objects may be very large (1 ~ 2 GB). From my research, a .NET data structure can use at most 2 GB of memory. So my question is: am I able to fit more than 1 or 2 objects into a list?
Scenario: I have a human class and I'm trying to store it in a list.
Are these two pieces of code different?
List<Human> humanList = new List<Human>();
Human human = new Human(attr1, attr2, attr3);
humanList.Add(human);
vs.
List<Human> humanList = new List<Human>();
humanList.Add(new Human(attr1, attr2, attr3));
Is the first set of code using a reference to the human object so I'll be able to store more objects in the list? Similarly, is the second set of code trying to store the whole human object into the list?
Your two code samples are equivalent (assuming you aren't doing anything else with the human variable after you create it). The new expression creates the object on the heap and evaluates to a reference to it; that reference is what gets stored in the list. In the second sample you simply pass the reference inline instead of storing it in a variable first.
Assuming Human is a class and not a struct, the list will contain references to the objects, not the objects themselves.
The size of a Human object therefore has little bearing on how many references you can store in the list (overall system memory will likely be more of an issue than the 2 GB .NET limit).
Both code examples have the same effect.
Both examples store references to your objects.
You will be able to store many more than 1 or 2 objects in the list.
This is managed by the .NET CLR. In either scenario the list stores a reference to the Human object. The only difference is that the first example also keeps the reference in a local variable of the calling method.
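A minimal sketch of the point made in these answers, assuming a hypothetical Human class with a single Name field (the original question does not show the class). It illustrates that both ways of adding an element store a reference, and that the list element and the local variable refer to the same object:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Human class in the question.
class Human
{
    public string Name;
    public Human(string name) { Name = name; }
}

static class ListReferenceDemo
{
    // Returns true when the list element and the local variable
    // refer to the very same object.
    public static bool Run()
    {
        var humanList = new List<Human>();

        // Form 1: keep the reference in a local variable first.
        Human human = new Human("Ada");
        humanList.Add(human);

        // Form 2: pass the reference produced by `new` directly.
        humanList.Add(new Human("Grace"));

        // A change made through the local is visible through the list,
        // because both hold a reference to the same object.
        human.Name = "Ada Lovelace";

        return ReferenceEquals(human, humanList[0])
            && humanList[0].Name == "Ada Lovelace"
            && humanList.Count == 2;
    }
}
```

Calling ListReferenceDemo.Run() from a Main method should return true. The list itself only grows by one reference (4 or 8 bytes) per element, regardless of how large each Human is.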
Consider the example where I have an Employee object with attributes like age, name, gender, title, salary. I now have a list I want to populate with a bunch of Employees (each Employee instance is unique).
In terms of just speed and memory footprint, is it preferable to create the employee as a Struct or a Class?
Any additional caveats regarding Struct vs Class in the above scenario are welcome.
Structs are to be used only for relatively small structures that should have value-like behaviour.
Class and struct differences
Choosing Between Classes and Structures
Do not define a structure unless the type has all of the following characteristics:
It logically represents a single value, similar to primitive types (integer, double, and so on).
It has an instance size smaller than 16 bytes.
It is immutable.
It will not have to be boxed frequently.
Your type breaks the first two guidelines, and probably also the third. So you should definitely use a class here.
It isn't as simple as that - you need to describe the behaviour. For example, a struct is copied on every assignment and every by-value method call, so in some ways structs can consume more memory. However, an array of structs as a raw data dump can avoid a (very small) object header (and an extra dereference) per element, so it can have efficiencies.
A bigger issue, though, is in how you conceptualise them; an Employee is not a value; if I assign:
var emp = GetEmployee();
var tmp = emp;
does that mean I should now have 2 employees? It probably doesn't. Now what if I change:
tmp.Name = "Fred";
Whether this impacts emp depends on whether it is a struct or a class. I wager it should be a class here. I also propose that a struct should almost always be immutable to avoid this type of data-loss scenario. There is a case for mutable structs, but they are so often used incorrectly that I don't want to accidentally encourage you to do that.
Other issues arise with encapsulation with structs; consider that an employee has a manager; is this possible?
[struct|class] Manager {
private Manager boss;
}
This only works for a class: a struct cannot contain a field of its own type, because its size would then be unbounded. Again, structs are not well suited to this type of usage, nor to polymorphism, abstraction, etc.
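To make the emp/tmp example concrete, here is a small sketch that runs the same assignment against a struct and a class. The type names EmployeeStruct and EmployeeClass are made up for illustration and are not from the original answer:

```csharp
using System;

// Two hypothetical Employee variants that differ only in struct vs class.
struct EmployeeStruct { public string Name; }
class EmployeeClass { public string Name; }

static class CopySemanticsDemo
{
    public static (string StructName, string ClassName) Run()
    {
        var s1 = new EmployeeStruct { Name = "Ann" };
        var s2 = s1;        // copies the whole value: now two employees
        s2.Name = "Fred";   // s1 is not affected

        var c1 = new EmployeeClass { Name = "Ann" };
        var c2 = c1;        // copies only the reference: still one employee
        c2.Name = "Fred";   // visible through c1 as well

        return (s1.Name, c1.Name);   // ("Ann", "Fred")
    }
}
```

With the struct the rename is silently applied to a copy, which is the data-loss scenario described above; with the class both variables see the change.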
That depends; a class would probably be fine in this case. I hope the points below help you decide.
A struct is desirable in the following conditions:
When you have a small data structure.
When the data structure does not need to be extended (structs do not support inheritance).
When you want faster operations on the data structure. (Structs held in local variables are typically allocated on the stack, so creating and discarding them is cheap.)
A class is desirable in the following conditions:
When you have a complex data structure.
When the data structure needs to be extended.
When you want to optimize memory usage. (Class instances are stored on the heap and become eligible for garbage collection once no reference points to them.)
Look here for more details.
I'm fairly new to C#. In C++ if I wanted two collections that contained some or all of the same data it's really easy. For example, you just create the objects on the heap and use a collection of (auto) pointers in each collection. C# doesn't seem to have a concept of pointers so how do you do the same thing in C#?
One collection (probably an array) will contain all objects. The other (probably a queue) will contain a subset of what is in the array. Eventually the objects will be removed from the queue but remain in the array.
This, I am sure, is a really simple question but I'm still getting my head around the differences between C++ and C#.
C# has pointers in an unsafe context, as you're used to in C++. However, instances of classes are reference types in C# to begin with, meaning (simplified) that a single object you add to two collections will be the same object. Integers and structs, among others, are value types and are copied instead; strings are reference types, but they are immutable.
More on types in C#: http://msdn.microsoft.com/en-us/library/3ewxz6et.aspx
Lengthy blogpost on immutability: http://blogs.msdn.com/b/ericlippert/archive/2007/11/13/immutability-in-c-part-one-kinds-of-immutability.aspx
C# has a garbage collector that takes care of memory and reference management for you, and any orphaned references will usually be cleaned up within a reasonable amount of time.
More on memory management: http://msdn.microsoft.com/en-us/library/f144e03t(v=VS.100).aspx
Maybe you need to understand reference and value types first.
http://www.codeproject.com/KB/cs/Types.aspx
In C#, data is referenced by default (not copied, unless it is a struct, i.e. a value type), so assigning an object from one variable to another creates a second reference to the same object, not a copy.
Example:
class C {}
var a = new C();
var b = a;
// a and b point to the same object
You can also use pointers in C# in an unsafe context (though this is not recommended).
Almost all objects in C# are constructed on the heap and are accessed via references, which behave much like pointers. The . operator is used almost like -> in C++. IMHO it just works more naturally.
When thinking about the "pointers", a collection doesn't contain data, it references data.
So in your example, it will just work as you described it.
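The array-plus-queue scenario from the question can be sketched as follows. The Job class and its fields are hypothetical, chosen only to have something to store:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical element type; any class would behave the same way.
class Job
{
    public int Id;
    public bool Done;
}

static class SharedCollectionsDemo
{
    public static bool Run()
    {
        // The array holds references to all objects.
        Job[] all = { new Job { Id = 1 }, new Job { Id = 2 }, new Job { Id = 3 } };

        // The queue holds references to a subset of the same objects.
        var pending = new Queue<Job>();
        pending.Enqueue(all[0]);
        pending.Enqueue(all[2]);

        // Removing from the queue does not remove from the array,
        // and a change made via the queue is visible via the array.
        Job next = pending.Dequeue();
        next.Done = true;

        return all[0].Done
            && all.Length == 3
            && pending.Count == 1
            && ReferenceEquals(pending.Peek(), all[2]);
    }
}
```

No explicit pointer type is needed: a variable or collection slot of class type already plays the role of the (auto) pointer from the C++ version, and the garbage collector reclaims an object once neither collection refers to it.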
I have implemented an evolutionary algorithm in C# which kind of works but I have the assumption that the cloning does not. The algorithm maintains a population of object trees. Each object (tree) can be the result of cloning (e.g. after ‘natural selection’) and should be a unique object consisting of unique objects. Is there a simple way to determine whether the ‘object population’ contains unique/distinct objects – in other words whether objects are shared by more than one tree? I hope my question makes sense.
Thanks.
Best wishes,
Christian
PS: I implemented (I think) deep copy cloning via serialization see:
http://www.codeproject.com/KB/tips/SerializedObjectCloner.aspx
The way to verify whether two objects are the same objects in memory is by comparing them using Object.ReferenceEquals. This checks whether the "Pointers" are the same.
The built-in cloning in C# (Object.MemberwiseClone) produces a shallow copy. The keyword you probably need to google for tutorials is "deep cloning", in order to create object graphs that don't share references.
What about the following: add a static counter to every class used for member references of your 'main tree class'. Increment the counter in every constructor. Determine how many 'subobjects' should be contained in a tree object (or in all tree objects) and compare that to the counter.
OK, first, let me see if I understand you correctly:
RESULT is an object tree that has some data.
GENERATION is a collection of RESULT objects.
You have some 'evolution' method that moves each GENERATION to the next step.
If you want to check if one RESULT is equal to the other, you should implement IComparable on it, and for each of its members, do the same.
ADDITION:
Try to get rid of that kind of cloning, and make the clones manually - it WILL be faster. And speed is crucial to you here, because all heuristic comes down to muscle.
If what you're asking is "how do I check whether two object variables refer to the same bit of memory", you can call Object.ReferenceEquals.
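Building on the Object.ReferenceEquals suggestion, one possible way to check a whole population is to walk every tree and record each node by identity. This is only a sketch: the Node class with a Children list is an assumed shape for the trees, and IdentityComparer and SharingCheck are illustrative names, not part of any library:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Assumed shape of a tree node; the real trees may look different.
class Node
{
    public List<Node> Children = new List<Node>();
}

// Compares nodes by identity, ignoring any Equals/GetHashCode overrides.
sealed class IdentityComparer : IEqualityComparer<Node>
{
    public bool Equals(Node x, Node y) => ReferenceEquals(x, y);
    public int GetHashCode(Node obj) => RuntimeHelpers.GetHashCode(obj);
}

static class SharingCheck
{
    // Returns true if any Node instance is reached more than once
    // while walking all trees in the population.
    public static bool HasSharedNodes(IEnumerable<Node> roots)
    {
        var seen = new HashSet<Node>(new IdentityComparer());
        var stack = new Stack<Node>(roots);
        while (stack.Count > 0)
        {
            Node n = stack.Pop();
            if (!seen.Add(n)) return true;   // already visited: shared
            foreach (Node c in n.Children) stack.Push(c);
        }
        return false;
    }
}
```

If HasSharedNodes returns true for a population that was produced by cloning, at least one clone was not a full deep copy. Nodes with fields other than Children would need those fields walked as well.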
In a couple of weeks, I'll be teaching a class of first-year engineers the salient points of references in C# as part of their first-year programming course. Most of them have never programmed before, and had enough trouble learning objects, so teaching references is going to be an uphill battle. I plan to have lots of examples available for the students to go through on their own, but just showing a bunch of examples tends to be pretty overwhelming if the underlying concept doesn't 'click'.
So I'll put the question out to the SO community: what's the best way you've seen references taught? What made it 'click' for you? Is there any reference-related material that I'm missing?
My tentative lesson plan is:
1. What is a reference (using an argument like Eric Lippert's)
2. References and the Garbage Collector
3. Reference Types and Value Types
4. Immutable Types
5. Passing by Reference versus Passing by Value (and all of the subtleties of object references being passed by value)
6. A handful of nasty examples that produce unexpected results if you don't understand 1-5.
One way that I've heard it explained is to use a cell phone or walkie-talkie. You (the instructor) hold one end and declare that you are an object instance. You stay in one place (i.e. the heap) while the students pass the other end (which is on speaker phone if it's a cell phone) around the classroom.
They can interact with you through the "reference" they have to you, but they don't really have "you" in their possession.
Binky! (or # http://cslibrary.stanford.edu/104/)
I like the URL analogy that describes the differences between Reference and Value types. You can pass around a URL as a reference to some content. You can modify that URL without modifying that content. You can also get to the content via the URL to perhaps modify the content.
This is a useful reference:
http://www.yoda.arachsys.com/csharp/parameters.html
Try to explain references with figures, as pure text sometimes doesn't get through to most people. Many resources and books on the topic try to explain through figures, as it is difficult to convey allocation through verbal communication alone (this is mostly an issue of attention span for most people).
At least try to point out how objects relate to each other, a simple example would be a simple reference.
Given:
class A {
    B b = new B();          // A holds a reference to a separate B object
}
class B {
    public int mine = 1;    // public so that code outside B can read it
}
When instantiating class A as object a from some context the following figure will illustrate how it will all look in the heap. The point of the illustration is to show how the different objects relate to each other and have a mental model for how the heap works.
         +-A-----+
a: *---->|       |
         |       |    +-B--------+
         | b: *--+--->|          |
         |       |    | mine: 1  |
         +-------+    |          |
                      +----------+
Also try to explain the difference between heap and stack allocation. Calling a method with parameters. Simple example would be something like this:
Given the following method:
public void doSomething(B b) {
int doMine = b.mine + 1;
}
When calling doSomething and letting it do its stuff, at the end doSomething's stack frame will look something like the figure below. The point is to show that objects do not reside directly inside the stack; the stack only holds a reference to an object in the heap, and objects are shared through references.
whoever called doSomething *
                           |
                           v
+-doSomething-+       +-B--------+
| b: *--------+------>|          |
|-------------|       | mine: 1  |
| doMine: 2   |       +----------+
+-------------+
Another illustrative example would be an array, which is itself an object; a jagged array is an array of references to other arrays.
I found this article really useful for explaining parameter passing in C#. The article also does a good job explaining value and reference types in general terms.
It's more of a visual representation which helped me a lot.
Pictures and diagrams.
People form mental images of the concepts they're learning, and a visual representation of references and their relation to their associated objects is a good way to start. Likewise, visualizing objects as containing member variables (which include references to other objects) and member methods, a la UML diagrams, is very helpful.
Later, you can delve into the details of how references and primitive types are actually implemented, if you feel the need to do so. But delay these discussions as long as possible, as people can get bogged down in trying to pair abstract concepts to the representational details, which distracts from learning the abstract concepts.
When I was learning VB6, references actually confused me a bit. Then I tried learning C++, and after dealing with pointers, references made perfect sense to me. Understanding it from a what-is-actually-happening perspective was easier to me than understanding it from an oo-concepts perspective. Maybe you can go over the under-the-hood stuff in your lesson.
I would suggest minimizing one's use of the bare term "reference" altogether, since it can be used in .net to refer to two very different things: the content of class-type storage locations, and parameters passed with a "ref" qualifier. Use the term "object reference" for the former, and "ref parameter" for the latter.
In describing what an "object reference" is, I would suggest using the term "object ID". Object ID's have a few things that make them different from "addresses":
One can't do very many things with object ID's. One can test whether one is blank, check whether two of them are equal, copy one to a storage location of suitable type, or look up the object referred to by one and ask it to do something. Most requests to do something with a class-type value or variable are really requests to do something with the referred-to object. Note that one cannot manipulate an ID of one object in such a way as to get the ID of another, as one can do with addresses.
While the system must have a means of converting object ID's to addresses, there is no guarantee that it will use any particular means of doing so. Nor is there any guarantee that the bit pattern associated with any object ID won't spontaneously change; all that is guaranteed is that if the bit pattern changes, the new pattern will refer to the same object as the old.
The system keeps track of every place that object ID's are stored. As long as any copy of an Object ID exists, that object ID will never refer to anything other than the object instance for which it was created. By contrast, in general, systems that use addresses for things do not track every single place where an address might be copied. It's possible that an object might cease to exist while somebody still has a copy of its address, and some new object might be created with the same address.
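The distinction between an "object reference" and a "ref parameter" drawn above can be shown with a short sketch. The Box class and the three helper methods are hypothetical names used only for this illustration:

```csharp
using System;

// Hypothetical class used only to have something to refer to.
class Box
{
    public int Value;
}

static class ParameterDemo
{
    // The object reference is passed by value: the method receives a copy
    // of the "object ID" and can ask the referred-to object to change.
    static void Mutate(Box b) { b.Value = 42; }

    // Reassigning the parameter only changes the method's local copy.
    static void Reassign(Box b) { b = new Box { Value = 99 }; }

    // A ref parameter aliases the caller's variable, so reassigning it
    // changes which object the caller's variable refers to.
    static void ReassignRef(ref Box b) { b = new Box { Value = 99 }; }

    public static (int AfterMutate, int AfterReassign, int AfterRef) Run()
    {
        var box = new Box { Value = 1 };

        Mutate(box);
        int afterMutate = box.Value;      // 42

        Reassign(box);
        int afterReassign = box.Value;    // still 42

        ReassignRef(ref box);
        int afterRef = box.Value;         // 99

        return (afterMutate, afterReassign, afterRef);
    }
}
```

The middle case is usually the one that surprises beginners, which may make it a useful candidate for the "nasty examples" part of a lesson.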
Are there practical reasons to use structures instead of some classes in Microsoft .NET 2.0/3.5?
"What is the difference between structures and classes?" - this is probably the most popular question in interviews for ".NET developer" vacancies. The only answer that the interviewer considers to be right is "structures are allocated on the stack and classes are allocated on the heap", and no further questions are asked about that.
Some Google searching showed that:
a) structures have numerous limitations and no additional abilities in comparison to classes, and
b) the stack (and as such structures) can be faster under very specialized conditions, including:
the size of the data chunk is less than 16 bytes
no extensive boxing/unboxing
the structure's members are nearly immutable
the whole set of data is not big (otherwise we get a stack overflow)
(please correct/add to this list if it is wrong or incomplete)
As far as I know, most typical commercial projects (ERM, accounting, solutions for banks, etc.) do not define even a single structure; all custom data types are defined as classes instead. Is there something wrong or at least imperfect in this approach?
NOTE: question is about run-of-the-mill business apps, please don't list "unusual" cases like game development, real-time animation, backward compatibility (COM/Interop), unmanaged code and so on - these answers are already under this similar question:
When to use struct?
As far as I know, most typical commercial projects (ERM, accounting, solutions for banks, etc.) do not define even a single structure; all custom data types are defined as classes instead. Is there something wrong or at least imperfect in this approach?
No! Everything is perfectly fine with that. Your general rule should be to use classes by default. After all, we are talking about object-oriented programming for a reason, not structure-oriented programming (structs themselves lack some OO features such as inheritance and abstraction).
However structures are sometimes better if:
You need precise control over the amount of memory used (depending on their size, structures use a little to FAR less memory than objects).
You need precise control of memory layout. This is especially important for interop with Win32 or other native APIs
You need the fastest possible speed. (In lots of scenarios with larger sets of data you can get a decent speedup when correctly using structs).
You need to waste less memory and have large amounts of structured data in arrays. Especially in conjunction with Arrays you could get huge amount of memory savings with structures.
You are working extensively with pointers. Then structures offer lots of interesting characteristics.
IMO the most important use case is large arrays of small composite entities. Imagine an array containing 10^6 complex numbers. Or a 2d array containing 1000x1000 24-bit RGB values. Using structs instead of classes can make a huge difference in cases like these.
EDIT:
To clarify: Assume you have a struct
struct RGB
{
public byte R,G,B;
}
If you declare an array of 1000x1000 RGB values, this array will take about 3 MB of memory, because value types are stored inline.
If you used a class instead of a struct, the array would contain 1,000,000 references. That alone would take 4 MB (on a 32-bit machine) or 8 MB (on a 64-bit machine) of memory. If you initialized all items with separate objects, so that you can modify the values separately, you'd have 1,000,000 objects on the managed heap keeping the GC busy. Each object has an overhead (IIRC) of 2 references, i.e. the objects themselves would use about 11 or 19 MB of memory. In total that's at least 5 times as much memory as the simple struct version.
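The arithmetic in this answer can be reproduced with a small sketch. It uses Marshal.SizeOf<T>() to obtain the size of the struct, which for a simple struct of three bytes should match the inline size in an array; the class-based figure is only the answer's own estimate (one reference per element plus a two-reference object header plus 3 bytes of data), not a measurement, and ignores alignment and minimum object sizes:

```csharp
using System;
using System.Runtime.InteropServices;

struct RGB
{
    public byte R, G, B;
}

static class RgbSizeDemo
{
    // Size of one RGB value: three 1-byte fields, no padding.
    public static int ElementSize() => Marshal.SizeOf<RGB>();

    // Payload of a width x height array of RGB structs stored inline.
    public static long StructArrayBytes(int width, int height)
        => (long)width * height * ElementSize();

    // Rough estimate for a class-based version, following the answer:
    // one reference in the array + a header of two references + 3 data bytes.
    // pointerSize is 4 on a 32-bit and 8 on a 64-bit process.
    public static long ClassArrayBytesEstimate(int width, int height, int pointerSize)
        => (long)width * height * (pointerSize + 2 * pointerSize + 3);
}
```

For a 1000x1000 image this gives 3,000,000 bytes for the struct array and an estimated 15,000,000 bytes (32-bit) or 27,000,000 bytes (64-bit) for the class version, i.e. roughly 5 to 9 times as much.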
One advantage of stack allocated value types is that they are local to the thread. That means that they are inherently thread safe. That cannot be said for objects on the heap.
This of course assumes we're talking about safe, managed code.
Another difference from classes is that when you assign a structure instance to a variable, you are not just copying a reference but copying the whole structure. So if you modify one of the instances (you shouldn't anyway, since structure instances are intended to be immutable), the other one is not modified.
All good answers thus far...I only have to add that by definition value types are not nullable and hence are a good candidate for use in scenarios where you do not want to be bothered with creating a new instance of a class and assigning it to fields, for example...
struct Aggregate1
{
int A;
}
struct Aggregate2
{
Aggregate1 A;
Aggregate1 B;
}
Note if Aggregate1 were a class then you would have had to initialize the fields in Aggregate2 manually...
Aggregate2 ag2 = new Aggregate2();
ag2.A = new Aggregate1();
ag2.B = new Aggregate1();
This is obviously not required as long as Aggregate1 is a struct. This may prove to be useful when you are creating a class/struct hierarchy for the express purpose of serialization/deserialization with the XmlSerializer. Many seemingly mysterious exceptions will disappear just by using structs in this case.
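A short sketch of the difference described here, using made-up type names (InnerStruct, InnerClass, Outer) rather than the Aggregate types above so that both variants can sit side by side:

```csharp
using System;

struct InnerStruct { public int A; }
class InnerClass { public int A; }

struct Outer
{
    public InnerStruct S;   // always a usable, zero-initialised value
    public InnerClass C;    // null until something is assigned
}

static class DefaultValueDemo
{
    public static (int StructField, bool ClassFieldIsNull) Run()
    {
        var o = new Outer();

        // o.S.A can be read immediately; o.C.A would throw a
        // NullReferenceException because o.C has not been assigned.
        return (o.S.A, o.C == null);   // (0, true)
    }
}
```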
If the purpose of a type is to bind a small fixed collection of independent values together with duct tape (e.g. the coordinates of a point, a key and associated value of an enumerated dictionary entry, a six-item 2d transformation matrix, etc.), the best representation, from the standpoint of both efficiency and semantics, is likely to be a mutable exposed-field structure. Note that this represents a very different usage scenario from the case where a struct represents a single unified concept (e.g. a Decimal or DateTime), and Microsoft's advice for when to use structures is only applicable to the latter one. The style of "immutable" structure Microsoft describes is only really suitable for representing a single unified concept; if one needs to represent a small fixed collection of independent values, the proper alternative is not an immutable class (which offers inferior performance), nor a mutable class (which will in many cases offer incorrect semantics), but rather an exposed-field struct (which--when used properly--offers superior semantics and performance). For example, if one has a struct Transform2d which holds a 2d transformation matrix, a method like:
static void Offset(ref Transform2d it, double x, double y)
{
    it.dx += x;
    it.dy += y;
}
is both faster and clearer than
static void Offset(ref Transform2d it, double x, double y)
{
    it = new Transform2d(it.xx, it.xy, it.yx, it.yy, it.dx+x, it.dy+y);
}
or
Transform2d Offset(double x, double y)
{
    // instance method: returns a shifted copy built from this instance's fields
    return new Transform2d(xx, xy, yx, yy, dx+x, dy+y);
}
Knowing that dx and dy are fields of Transform2d is sufficient to know that the first method modifies those fields and has no other side-effect. By contrast, to know what the other methods do, one would have to examine the code for the constructor.
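For completeness, here is one way the Transform2d struct assumed by these snippets could look. The field names xx, xy, yx, yy, dx, dy come from the snippets; the constructor argument order and the TransformOps helper class are assumptions made for this sketch:

```csharp
using System;

// Exposed-field struct holding a six-item 2d transformation matrix.
struct Transform2d
{
    public double xx, xy, yx, yy, dx, dy;

    // Assumed argument order: matrix part first, then the offset.
    public Transform2d(double xx, double xy, double yx, double yy,
                       double dx, double dy)
    {
        this.xx = xx; this.xy = xy;
        this.yx = yx; this.yy = yy;
        this.dx = dx; this.dy = dy;
    }
}

static class TransformOps
{
    // Mutating version: touches only the dx and dy fields of the caller's struct.
    public static void OffsetInPlace(ref Transform2d it, double x, double y)
    {
        it.dx += x;
        it.dy += y;
    }

    // Copying version: builds a new value and leaves the argument unchanged.
    public static Transform2d OffsetCopy(Transform2d it, double x, double y)
    {
        return new Transform2d(it.xx, it.xy, it.yx, it.yy, it.dx + x, it.dy + y);
    }
}
```

Both helpers produce the same matrix; the difference is that the first modifies the caller's variable through the ref parameter, while the second relies on the constructor to carry the other four fields over correctly.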
There have been some excellent answers that touch on the practicality of using structs vs. classes and vice versa, but I think your original comment about structs being immutable is a pretty good argument for why classes are used more often in the high-level design of LOB applications.
In Domain Driven Design http://www.infoq.com/minibooks/domain-driven-design-quickly there is somewhat of a parallel between Entities/Classes and Value Objects/Structs. Entities in DDD are items within the business domain whose identity we need to track with an identifier, e.g. CustomerId, ProductId, etc. Value Objects are items whose values we might be interested in, but whose identity we don't track with an identifier, e.g. Price or OrderDate. Entities are mutable in DDD except for their Identity Field, while Value Objects do not have an identity.
So when modeling a typical business entity, a class is usually designed along with an identity attribute, which tracks the identity of the business object on its round trip from the persistence store and back again. Although at runtime we might change all the property values on a business object instance, the entity's identity is retained as long as the identifier is immutable. With business concepts that correspond to Money or Time, a struct is sort of a natural fit: even though a new instance is created whenever we perform a computation, that's OK because we aren't tracking an identity, only storing a value.
Sometimes you just want to transfer data between components; then a struct can be a better fit than a class, e.g. for a Data Transfer Object (DTO), which only carries data.