C# Using Two Collections That Contain The Same Data

I'm fairly new to C#. In C++, if I wanted two collections that contained some or all of the same data, it was really easy. For example, you just create the objects on the heap and use a collection of (auto) pointers in each collection. C# doesn't seem to have a concept of pointers, so how do you do the same thing in C#?
One collection (probably an array) will contain all objects. The other (probably a queue) will contain a subset of what is in the array. Eventually the objects will be removed from the queue but remain in the array.
This, I am sure, is a really simple question but I'm still getting my head around the differences between C++ and C#.

C# has pointers in an unsafe context, as you're used to in C++. However, most complex objects are passed by reference in C# to begin with, meaning (simplified) that a single object you add to two collections will be the same object. Integers, among other primitives, are value types; strings are reference types but immutable, so they behave much like values.
More on types in C#: http://msdn.microsoft.com/en-us/library/3ewxz6et.aspx
Lengthy blogpost on immutability: http://blogs.msdn.com/b/ericlippert/archive/2007/11/13/immutability-in-c-part-one-kinds-of-immutability.aspx
C# has a garbage collector that takes care of memory and reference management for you, and any orphaned references will usually be cleaned up within a reasonable amount of time.
More on memory management: http://msdn.microsoft.com/en-us/library/f144e03t(v=VS.100).aspx
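To make this concrete, here is a minimal sketch (my own illustration, not from the answer above) of the asker's scenario: a list holding all objects and a queue holding a subset, both referring to the same instances.
using System;
using System.Collections.Generic;

class Item { public string Name; }

class Program
{
static void Main()
{
var all = new List<Item>();      // holds every object
var pending = new Queue<Item>(); // holds a subset of the same objects

var item = new Item { Name = "first" };
all.Add(item);
pending.Enqueue(item); // no copy is made; both collections reference the same instance

pending.Dequeue();              // removed from the queue...
Console.WriteLine(all[0].Name); // ...but the object is still reachable through the list
}
}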

Maybe you need to understand reference and value types first.
http://www.codeproject.com/KB/cs/Types.aspx

In C#, data is referenced by default (not copied, unless it is a struct, i.e. a value type), so when you assign one variable to another, it is a reference that is created.
Example:
class C {}
var a = new C();
var b = a;
//a and b point to the same object
Anyway, you can use pointers in C# in an unsafe context (though this is not recommended).

Almost all objects in C# are constructed on the heap and are accessed via references, much as they would be via pointers in C++. The . is used almost like -> in C++. IMHO it just works more naturally.
When thinking in terms of "pointers", a collection doesn't contain data, it references data.
So in your example, it will just work as you described it.


Where should I store different types of value?

After Python and JavaScript I started using C# and can't understand some basic concepts.
In Python and JavaScript I used to store everything in a heap without thinking about the type of the object. But in C# I can't create a Dictionary or List with different types of objects.
I want to store some mouse and keyboard events. For that, I use instances of a class, like this:
class UserActionEvent
{
public MacroEventType Type;
public int[] MouseCoordinate = new int[2];
public string MouseKey;
public string KeyBoardKey;
public int TimeSinceLastEvent;
}
And all instances are saved in a Queue. But I worry whether it is normal to store several thousand objects like this. Maybe there is a more universal way to store data of different types?
Storage in C# is not much different from Python or JavaScript in that it uses a garbage-collected heap (of course every runtime has its own way of implementing the GC). So for "normal" classes you can just go ahead and treat them as you would in JS.
C#, however, also has the concept of value types, which are typically allocated on the stack. The stack has much more limited space than the heap, so this is where you need to be a bit more careful. It is unlikely that you will accidentally allocate a large amount of stack space, though, since the collection types are all reference types (with the exception of the more exotic stackalloc arrays, which you should stay away from unless you are sure of what you are doing). When value types are passed between methods they are copied; you can instead pass them by reference with the ref (or in/out) keywords. Casting a value type to object wraps it in a reference type, a process called boxing (the opposite process is called unboxing).
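For instance, a minimal illustration of boxing and unboxing (my own example, not from the answer):
int n = 42;
object boxed = n;         // boxing: the int is copied into a heap-allocated wrapper object
int unboxed = (int)boxed; // unboxing: the value is copied back out of the wrapper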
To create a value type, use struct instead of class. In your example above, using a value type for the mouse coordinate, e.g.
struct Point {
public int X, Y;
}
instead of an int array would likely save memory (and GC CPU time), since in your example you would otherwise have to allocate a reference object (the array) just to hold eight bytes (the two ints). But this only matters in more exotic cases, maybe in the render loop of a game engine, or if you have huge data sets. For most types of programs this is likely to be premature optimization (though one could argue creating the struct makes the code more readable, which would then be the main benefit).
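As a rough sketch (my own, building on the suggestion above; the MacroEventType field from the question is left out for brevity), the event class might look like this with the Point struct, stored in a generic Queue:
using System.Collections.Generic;

struct Point { public int X, Y; }

class UserActionEvent
{
public Point MouseCoordinate; // value type: stored inline, no separate array allocation
public string MouseKey;
public string KeyBoardKey;
public int TimeSinceLastEvent;
}

class Demo
{
static void Main()
{
var events = new Queue<UserActionEvent>();
events.Enqueue(new UserActionEvent { MouseCoordinate = new Point { X = 10, Y = 20 }, MouseKey = "Left" });
// several thousand entries like this is a perfectly normal workload for the GC
}
}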
Some useful reads:
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/value-types
https://medium.com/fhinkel/confused-about-stack-and-heap-2cf3e6adb771
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/stackalloc
If you want to store different types of objects in C#, I recommend the use of ArrayList.
With ArrayList you can store any type of object since it is a dynamic collection of objects.
ArrayList myAL = new ArrayList();
myAL.Add("Hello");
myAL.Add("World");
myAL.Add("!");
You will need a
using System.Collections;
directive to be able to use this collection.

Can C# structs coming from the unmanaged world be "live"-updating?

Suppose I get an IntPtr pointer to a struct from an unmanaged library. Is there any way, in C#, to obtain a "live" struct from this pointer, so that if I make a call that modifies the unmanaged struct, my "live" struct reflects this immediately?
I believe the standard approach is to construct a copy of the data using marshalling, which can't be "live" like this for various reasons (struct layout, data type compatibility, not residing in the .NET managed memory). But I couldn't find any explicit confirmation that "live" structs are impossible in C# though. Are they?
What's the closest I can get to such "live" structs without going to C++/CLI?
Try using the UnmanagedMemoryStream:
This class supports access to unmanaged memory using the existing stream-based model and does not require that the contents in the unmanaged memory be copied to the heap.
This means you will be seeking/reading/resetting the stream, but this avoids the marshalling. It's still not live in the sense you'd want; you'd probably want to wrap these accesses in .NET properties.
Another alternative: maybe you could use System.Buffer, after getting the unmanaged pointer. You may need to do some clever casting.
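A rough sketch of the property-wrapper idea based on UnmanagedMemoryStream (my own illustration; the field layout, offset, and length are assumptions, and it must be compiled with /unsafe):
using System;
using System.IO;

unsafe class LiveView
{
private readonly byte* basePtr;
private readonly long length;

public LiveView(IntPtr ptr, long length)
{
this.basePtr = (byte*)ptr.ToPointer();
this.length = length;
}

// hypothetical field assumed to be a 32-bit int at offset 0 of the unmanaged struct
public int FirstField
{
get
{
using (var stream = new UnmanagedMemoryStream(basePtr, length))
using (var reader = new BinaryReader(stream))
{
return reader.ReadInt32(); // re-read on every access, so it tracks the unmanaged side
}
}
}
}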
Technically, you CAN set up a structure whose data is "live" to changes made elsewhere. However, you want to think VERY carefully about whether you SHOULD.
By its very definition in C#, a struct is a "value type". That means that one instance is one value, like "5", and any change to that value conceptually results in a new value. 5+1==6; that doesn't mean that 5 "becomes" 6 when you add 1, it means that two values 5 and 1, when added, equal 6.
Value types in programming also have another idiosyncrasy compared with reference types; they are passed "by value", meaning that they are considered cheap enough to "clone" when a value is passed as a parameter. Any change that could be made to the variable's value (or child properties) while in the method is discarded when the call is complete, because all of the changes were made to a new copy of the struct at the top level of the stack, instead of through a reference to the original object residing lower in the stack. You must explicitly override this behavior by using the ref or out keywords, in effect specifically stating that the original value SHOULD change based on what happens in the method.
Most objects implemented as structs force you to deal with them according to these rules by being immutable; once you've created one, you cannot set its fields/properties directly. You must instead call various methods on that struct which will result in the creation of a new struct.
If you wanted to create a class that reflects the most current data coming from unmanaged land, first off I would make it a "class", so that there is no confusion about the behavior of the object when you pass it or attempt to change its members. Then, you would basically create a "wrapper" that used Kit's aforementioned UnmanagedMemoryStream to get/set values that you exposed as properties. That would give you a "reactive" object that could be polled to get whatever the unmanaged code had most recently set, and also to write out new values to the correct places in memory. Be VERY careful; this code will not be "safe" (especially if you write back out to it), and hooking into unmanaged code via pointers is one of the few places in .NET where you can intentionally crash not just your program and the unmanaged C++ program, but the entire machine.

Why is this implemented as a struct?

In System.Data.Linq, EntitySet<T> uses a couple of ItemList<T> structs which look like this:
internal struct ItemList<T> where T : class
{
private T[] items;
private int count;
...(methods)...
}
(Took me longer than it should have to discover this - couldn't understand why the entities field in EntitySet<T> was not throwing null reference exceptions!)
My question is what are the benefits of implementing this as a struct over a class?
Let's assume that you want to store ItemList<T> in an array.
Allocating an array of value types (struct) will store the data inside the array. If, on the other hand, ItemList<T> were a reference type (class), only references to ItemList<T> objects would be stored inside the array; the actual ItemList<T> objects would be allocated on the heap. An extra level of indirection is required to reach an ItemList<T> instance, and since it is simply an array combined with a length, it is more efficient to use a value type.
After inspecting the code for EntitySet<T> I can see that no array is involved. However, an EntitySet<T> still contains two ItemList<T> instances. As ItemList<T> is a struct, the storage for these instances is allocated inside the EntitySet<T> object. If a class were used instead, the EntitySet<T> would contain references pointing to ItemList<T> objects allocated separately.
The performance difference between using one or the other may not be noticeable in most cases, but perhaps the developer decided that he wanted to treat the array and the tightly coupled count as a single value simply because it seemed like the best thing to do.
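To visualise the layout difference, a hypothetical sketch (this is not the actual System.Data.Linq source; the second field name is my own guess):
// With ItemList<T> as a struct, each field's storage (array reference + count)
// is laid out inline inside the EntitySet<T> object: a single heap allocation.
internal class EntitySetSketch<T> where T : class
{
private ItemList<T> entities;        // inline, no separate ItemList object
private ItemList<T> removedEntities; // inline as well
}
// Had ItemList<T> been a class, each field would instead hold a reference to a
// separately allocated ItemList<T> object: more allocations and an extra
// indirection on every access.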
For small critical internal data structures like ItemList<T>, we often have the choice of using either a reference type or a value type. If the code is written well, switching from one to the other is a trivial change.
We can speculate that a value type avoids heap allocation and a reference type avoids struct copying so it's not immediately clear either way because it depends so much on how it is used.
The best way to find out which one is better is to measure it. Whichever is faster is the clear winner. I'm sure they did their benchmarking and struct was faster. After you've done this a few times your intuition is pretty good and the benchmark just confirms that your choice was correct.
Maybe it's important that... (a quote about structs from here):
"The new variable and the original variable therefore contain two separate copies of the same data. Changes made to one copy do not affect the other copy."
Just thinking, don't judge me hard :)
There are really only two reasons to ever use a struct, and that is either to get value type semantics, or for better performance.
As the struct contains an array, value type semantics doesn't work well. When you copy the struct you get a copy of the count, but you only get a copy of the reference to the array, not a copy of the items in the array. Therefore you would have to use special care whenever the struct is copied so that you don't get inconsistent instances of it.
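As a self-contained illustration of that hazard (my own analogue, not the actual ItemList<T>):
struct ListLike
{
public int[] Items; // reference: shared between copies
public int Count;   // value: copied independently
}

class CopyHazardDemo
{
static void Main()
{
var a = new ListLike { Items = new int[4], Count = 0 };
var b = a;            // struct copy: Count is duplicated, but Items still points at the same array
b.Items[b.Count] = 7; // writes into the array that a also sees
b.Count++;            // only b's Count changes; a.Count is now out of sync with the shared array
}
}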
So, the only remaining valid reason would be performance. There is a small overhead for each reference type instance, so if you have a lot of them there may be a noticeable performance gain.
One nifty feature of such a structure is that you can create an array of them, and you get an array of empty lists without having to initialise each list:
ItemList<string>[] lists = new ItemList<string>[42];
As the items in the array are zero-filled, the count member will be zero and the items member will be null.
Purely speculating here:
Since the object is fairly small (only has two member variables), it is a good candidate for making it a struct to allow it to be passed as a ValueType.
Also, as Martin Liversage points out, by being a ValueType it can be stored more efficiently in larger data structures (e.g. as an item in an array), without the overhead of having an individual object and a reference to it.

I want to allocate an object on the stack with C#

Say I have this C# class:
public class HttpContextEx
{
public HttpContext context = null;
public HttpRequest req = null;
public HttpResponse res = null;
}
How do I declare an object of it, inside a function, which will be allocated on the stack and not on the heap?
In other words I want to avoid using the 'new' keyword for this one. This code is bad:
HttpContextEx ctx = new HttpContextEx(); // << allocates on the heap!
I know what stack/heap are perfectly and I've heard of the wonderful C# GC, yet I insist to allocate this tiny object, which is here only for convenience, on the stack.
This attitude comes from C++ (my main tool) so I can't ignore this, I mean it really ruins the fun for me here (:
If you change it to a value type using struct and create a new instance within the body of a method, it will be created on the stack. However, the members, being reference types, will still be on the heap. The language will still require the new operator whether it is a value or a reference type, but you can use var to eliminate the double use of the type name:
var ctx = new HttpContextEx();
Otherwise, take C# as it is since the GC does a great job.
You can't (and shouldn't) do that. Even if you would use a struct (which will be put on the stack), you'd have to use the new operator for the contained classes. On a serious note, if you switch to another language, also switch your attitudes.
You must come from C++. Things don't work like that in .Net.
All reference types are allocated on the managed heap, where they are tracked by the GC. For object references scoped by a function which exits quickly, allocated objects most likely remain only in Generation 0 of the Managed Heap, and this results in very efficient memory collections. The Managed Heap is tuned to handle short lived objects like this. It doesn't even have the same allocation strategy as the C++ heap you may be used to.
This is how the CLR works. If you want to work in another way, try an unmanaged runtime.
In .NET classes are reference types, and structures are value types.
If you really want the type allocated on the stack, you can make a structure. There are however several reasons not to:
Structures are more complicated to implement correctly. You should stick to classes until you have a good reason to create a structure.
Structures are intended for types that represent a single unit, while your type is just a container for three separate units.
A structure should not be larger than 16 bytes to work efficiently. On a 32 bit system you get under that limit, but on a 64 bit system the structure is larger than that.
A structure should be immutable to work well as a value type, which your type is not.
Finally, there isn't anything inherently bad about allocating small objects on the heap. The memory manager is actually specifically designed to handle small, short-lived objects efficiently.
Here is an example of code that doesn't work if it's a structure, but works fine if it's a class:
public void SetContext(HttpContextEx ex) {
ex.context = HttpContext.Current;
ex.req = ex.context.Request;
ex.res = ex.context.Response;
}
HttpContextEx ctx = new HttpContextEx();
SetContext(ctx);
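For completeness, here is a sketch (my own illustration) of how the call would have to look if HttpContextEx were declared as a struct: the parameter must be passed with ref, otherwise SetContext would mutate a copy and the caller's fields would remain null.
public void SetContext(ref HttpContextEx ex) {
ex.context = HttpContext.Current;
ex.req = ex.context.Request;
ex.res = ex.context.Response;
}
HttpContextEx ctx = new HttpContextEx();
SetContext(ref ctx); // the changes are visible to the caller only because of ref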

C# to C++ 'Gotchas'

I have been developing a project that I absolutely must develop part-way in C++. I need develop a wrapper and expose some C++ functionality into my C# app. I have been a C# engineer since the near-beginning of .NET, and have had very little experience in C++. It still looks very foreign to me when attempting to understand the syntax.
Is there anything that is going to knock me off my feet that would prevent me from just picking up C++ and going for it?
C++ has so many gotchas that I can't enumerate them all. Do a search for "C# vs C++". A few basic things to know:
In C++:
A struct and a class are basically the same thing (default member visibility is public for a struct, private for a class).
Both struct and class can be created either on the heap or the stack.
You have to manage the heap yourself. If you create something with "new", you have to delete it manually at some point.
If performance isn't an issue and you have very little data to move around, you can avoid the memory management issue by having everything on the stack and using references (& operator).
Learn to deal with .h and .cpp files. Unresolved externals can be your worst nightmare.
You shouldn't call a virtual method from a constructor. The compiler will never tell you, so I am.
A switch case doesn't enforce "break" and falls through by default.
There is no such thing as an interface. Instead, you have classes with pure virtual methods.
C++ aficionados are dangerous people living in caves and surviving on the fresh blood of C#/Java programmers. Talk with them about their favorite language carefully.
Garbage collection!
Remember that every time you new an object, you are responsible for calling delete.
There are a lot of differences, but the biggest one I can think of that programmers coming from Java/C# always get wrong, and which they never realize they've got wrong, is C++'s value semantics.
In C#, you're used to using new any time you wish to create an object. And whenever we talk about a class instance, we really mean "a reference to the class instance". Foo x = y doesn't copy the object y, it simply creates another reference to whatever object y references.
In C++, there's a clear distinction between local objects, allocated without new (Foo f or Foo f(x, y)), and dynamically allocated ones (Foo* f = new Foo() or Foo* f = new Foo(x, y)). And in C# terms, everything is a value type. Foo x = y actually creates a copy of the Foo object itself.
If you want reference semantics, you can use pointers or references: Foo& x = y creates a reference to the object y. Foo* x = &y creates a pointer to the address at which y is located. And copying a pointer does just that: it creates another pointer, which points to whatever the original pointer pointed to. So this is similar to C#'s reference semantics.
Local objects have automatic storage duration -- that is, a local object is automatically destroyed when it goes out of scope. If it is a class member, then it is destroyed when the owning object is destroyed. If it is a local variable inside a function, it is destroyed when execution leaves the scope in which it was declared.
Dynamically allocated objects are not destroyed until you call delete.
So far, you're probably with me. Newcomers to C++ are taught this pretty soon.
The tricky part is in what this means, how it affects your programming style:
In C++, the default should be to create local objects. Don't allocate with new unless you absolutely have to.
If you do need dynamically allocated data, make it the responsibility of a class. A (very) simplified example:
class IntArrayWrapper {
public:
explicit IntArrayWrapper(int size) : arr(new int[size]) {} // allocate memory in the constructor, and set arr to point to it
~IntArrayWrapper() { delete[] arr; } // deallocate memory in the destructor
private:
int* arr; // hold the pointer to the dynamically allocated array
};
this class can now be created as a local variable, and it will internally do the necessary dynamic allocations. And when it goes out of scope, it'll automatically delete the allocated array again.
So say we needed an array of x integers, instead of doing this:
void foo(int x){
int* arr = new int[x];
... use the array ...
delete[] arr; // if the middle of the function throws an exception, delete will never be called, so technically, we should add a try/catch as well, and also call delete there. Messy and error-prone.
}
you can do this:
void foo(int x){
IntArrayWrapper arr(x);
... use the array ...
// no delete necessary
}
Of course, this use of local variables instead of pointers or references means that objects are copied around quite a bit:
Bar Foo(){
Bar bar;
... do something with bar ...
return bar;
}
in the above, what we return is a copy of the bar object. We could return a pointer or a reference, but as the instance created inside the function goes out of scope and is destroyed the moment the function returns, we couldn't point to that. We could use new to allocate an instance that outlives the function, and return a pointer to that -- and then we get all the memory management headaches of figuring out whose responsibility it is to delete the object, and when that should happen. That's not a good idea.
Instead, the Bar class should simply be designed so that copying it does what we need. Perhaps it should internally call new to allocate an object that can live as long as we need it to. We could then make copying or assignment "steal" that pointer. Or we could implement some kind of reference-counting scheme where copying the object simply increments a reference counter and copies the pointer -- which should then be deleted not when the individual object is destroyed, but when the last object is destroyed and the reference counter reaches 0.
But often, we can just perform a deep copy, and clone the object in its entirety. If the object includes dynamically allocated memory, we allocate more memory for the copy.
It may sound expensive, but the C++ compiler is good at eliminating unnecessary copies (and is in fact in most cases allowed to eliminate copy operations even if they have side effects).
If you want to avoid copying even more, and you're prepared to put up with a little more clunky usage, you can enable "move semantics" in your classes as well as (or instead of) "copy semantics". It's worth getting into this habit because (a) some objects can't easily be copied, but they can be moved (e.g. a Socket class), (b) it's a pattern established in the standard library and (c) it's getting language support in the next version.
With move semantics, you can use objects as a kind of "transferable" container. It's the contents that move. In the current approach, it's done by calling swap, which swaps the contents of two objects of the same type. When an object goes out of scope, it is destructed, but if you swap its contents into a reference parameter first, the contents escape being destroyed when the scope ends. Therefore, you don't necessarily need to go all the way and use reference counted smart pointers just to allow complex objects to be returned from functions. The clunkiness comes from the fact that you can't really return them - you have to swap them into a reference parameter (somewhat similar to a ref parameter in C#). But the language support in the next version of C++ will address that.
So the biggest C# to C++ gotcha I can think of: don't make pointers the default. Use value semantics, and instead tailor your classes to behave the way you want when they're copied, created and destroyed.
A few months ago, I attempted to write a series of blog posts for people in your situation:
Part 1
Part 2
Part 3
I'm not 100% happy with how they turned out, but you may still find them useful.
And when you feel that you're never going to get a grip on pointers, this post may help.
No run-time checks
One C++ pitfall is the behaviour when you try to do something that might be invalid, but which can only be checked at runtime - for example, dereferencing a pointer that could be null, or accessing an array with an index that might be out of range.
The C# philosophy emphasises correctness; all behaviour should be well-defined and, in cases like this, it performs a run-time check of the preconditions and throws well-defined exceptions if they fail.
The C++ philosophy emphasises efficiency, and the idea that you shouldn't pay for anything you might not need. In cases like this, nothing will be checked for you, so you must either check the preconditions yourself or design your logic so that they must be true. Otherwise, the code will have undefined behaviour, which means it might (more or less) do what you want, it might crash, or it might corrupt completely unrelated data and cause errors that are horrendously difficult to track down.
Just to throw in some others that haven't been mentioned yet by other answers:
const: C# has a limited idea of const. In C++ 'const-correctness' is important. Methods that don't modify their reference parameters should take const-references, eg.
void func(const MyClass& x)
{
// x cannot be modified, and you can't call non-const methods on x
}
Member functions that don't modify the object should be marked const, ie.
int MyClass::GetSomething() const // <-- here
{
// Doesn't modify the instance of the class
return some_member;
}
This might seem unnecessary, but is actually very useful (see the next point on temporaries), and sometimes required, since libraries like the STL are fully const-correct, and you can't cast const things to non-const things (don't use const_cast! Ever!). It's also useful for callers to know something won't be changed. It is best to think about it in this way: if you omit const, you are saying the object will be modified.
Temporary objects: As another answer mentioned, C++ is much more about value-semantics. Temporary objects can be created and destroyed in expressions, for example:
std::string str = std::string("hello") + " world" + "!";
Here, the first + creates a temporary string with "hello world". The second + combines the temporary with "!", giving a temporary containing "hello world!", which is then copied to str. After the statement is complete, the temporaries are immediately destroyed. To further complicate things, C++0x adds rvalue references to solve this, but that's way out of the scope of this answer!
You can also bind temporary objects to const references (another useful part of const). Consider the previous function again:
void func(const MyClass& x)
This can be called explicitly with a temporary MyClass:
func(MyClass()); // create temporary MyClass - NOT the same as 'new MyClass()'!
A MyClass instance is created, on the stack, func accesses it, and then the temporary MyClass is destroyed automatically after func returns. This is convenient and also usually very fast, since the heap is not involved. Note 'new' returns a pointer - not a reference - and requires a corresponding 'delete'. You can also directly assign temporaries to const references:
const int& blah = 5; // 5 is a temporary
const MyClass& myClass = MyClass(); // creating temporary MyClass instance
// The temporary MyClass is destroyed when the const reference goes out of scope
Const references and temporaries are frequent in good C++ style, and the way these work is very different to C#.
RAII, exception safety, and deterministic destructors. This is actually a useful feature of C++, possibly even an advantage over C#, and it's worth reading up on since it's also good C++ style. I won't cover it here.
Finally, I'll just throw in that "this" is a pointer, not a reference :)
The traditional stumbling blocks for people coming to C++ from C# or Java are memory management and polymorphic behavior:
While objects always live on the heap and are garbage collected in C#/Java, you can have objects in static storage, on the stack, or on the heap ('free store' in standard speak) in C++. You have to clean up the stuff you allocate from the heap (new/delete). An invaluable technique for dealing with that is RAII.
Inheritance/polymorphism work only through pointer or reference in C++.
There are many others, but these will probably get you first.
Virtual destructors.
Header files! You'll find yourself asking, "so why do I need to write method declarations twice every time?"
Pointers and Memory Allocation
...I'm a C# guy too and I'm still trying to wrap my head around proper memory practices in C/C++.
There is a brief overview of Managed C++ here, an article about writing an unmanaged wrapper using Managed C++ here, and another article about mixing unmanaged with managed C++ code here.
Using Managed C++ would IMHO make it easier to use as a bridge to the C# world and vice versa.
Hope this helps,
Best regards,
Tom.
The biggest difference is C#'s reference semantics (for most types) vs. C++'s value semantics. This means that objects are copied far more often than they are in C#, so it's important to ensure that objects are copied correctly. This means implementing a copy constructor and operator= for any class that has a destructor.
Raw memory twiddling. Unions, memsets, and other direct memory writes. Anytime someone writes to memory as a sequence of bytes (as opposed to as objects), you lose much of the ability to reason about the code.
Linking
Linking with external libraries is not as forgiving as it is in .Net, $DEITY help you if you mix something compiled with different flavors of the same msvcrt (debug, multithread, unicode...)
Strings
And you'll have to deal with Unicode vs Ansi strings, these are not exactly the same.
Have fun :)
The following isn't meant to dissuade in any way :D
C++ is a minefield of gotchas. It's relatively tame if you don't use templates and the STL -- and just use object orientation -- but even then it is a monster. In that case object-based programming (rather than object-oriented programming) makes it even tamer -- often this form of C++ is enforced in certain projects (i.e., don't use any features that have even a chance of being naively used).
However you should learn all those things, as it's a very powerful language if you do manage to traverse the minefield. If you want to learn about gotchas you had better get the books from Herb Sutter, Scott Meyers, and Bjarne Stroustrup. Also, systematically going over the C++ FAQ Lite will help you realize that it indeed does require 10 or so books to turn into a good C++ programmer.
