How to have transactions on objects - c#

How can I imitate transactions on objects? For example, I want to delete an item from one collection and then add the same item to another collection as an atomic action. It is possible to add a lot of checks for when something fails and roll everything back, but this is annoying.
Is there any technique (the language does not matter: Java/C++/C#) to achieve this?

This sort of thing becomes easier when you use immutable collections. In an immutable collection, adding or removing a member does not change the collection, it returns a new collection. (Implementing immutable collections which can do that using acceptably little time and space is a tricky problem.)
But if you have immutable collections, the logic becomes much easier. Suppose you want to move an item from the left collection to the right collection:
newLeft = left.Remove(item);
newRight = right.Add(item);
left and right have not changed; they are immutable. Now the problem you have to solve is an atomic set of left = newLeft and right = newRight, which isn't that hard a problem to solve.
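One way to get that atomic set, sketched here under some assumptions: keep both collections inside a single immutable snapshot object and replace the one reference to it with a compare-and-swap. Snapshot<T> and TwoLists<T> are hypothetical names made up for this illustration, and ImmutableList<T> comes from System.Collections.Immutable (a separate NuGet package on older .NET Framework versions).

```csharp
using System.Collections.Immutable;
using System.Threading;

// Hypothetical holder: both collections live in one immutable snapshot,
// so replacing the snapshot reference replaces both at once.
public sealed class Snapshot<T>
{
    public ImmutableList<T> Left { get; }
    public ImmutableList<T> Right { get; }

    public Snapshot(ImmutableList<T> left, ImmutableList<T> right)
    {
        Left = left;
        Right = right;
    }
}

public sealed class TwoLists<T>
{
    private Snapshot<T> _state;

    public TwoLists(ImmutableList<T> left, ImmutableList<T> right)
    {
        _state = new Snapshot<T>(left, right);
    }

    public Snapshot<T> Current => Volatile.Read(ref _state);

    // Moves item from Left to Right. Returns false, changing nothing, if it is absent.
    public bool MoveLeftToRight(T item)
    {
        while (true)
        {
            Snapshot<T> seen = Volatile.Read(ref _state);
            if (!seen.Left.Contains(item))
                return false;

            // Remove and Add return new lists; 'seen' itself is untouched.
            var next = new Snapshot<T>(seen.Left.Remove(item), seen.Right.Add(item));

            // Publish only if nobody replaced the snapshot in the meantime; otherwise retry.
            if (ReferenceEquals(Interlocked.CompareExchange(ref _state, next, seen), seen))
                return true;
        }
    }
}
```

A reader that grabs Current once always sees either the state before the move or the state after it, never a mix of the two.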

For small, simple objects, you can use a copy-modify-swap idiom. Copy the original object. Make the changes. If all the changes succeeded, swap the copy with the original. (In C++, swap is typically efficient and no-fail.) The destructor will then clean up the original, instead of the copy.
In your case, you'd copy both collections. Remove the object from the first, add it to the second, and then swap the original collections with the copies.
However, this may not be practical if you have large or hard-to-copy objects. In those cases, you generally have to do more work manually.

Yes, Memento pattern http://en.wikipedia.org/wiki/Memento_pattern
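A rough sketch of how Memento could be applied to the move-between-collections case. The Inventory class, its cart capacity rule, and all member names are invented for this example; the only point is that the state is saved before the change and restored if any step throws.

```csharp
using System;
using System.Collections.Generic;

public sealed class Inventory
{
    private readonly int _cartCapacity;
    private List<string> _shelf = new List<string>();
    private List<string> _cart = new List<string>();

    public Inventory(int cartCapacity) { _cartCapacity = cartCapacity; }

    public IReadOnlyList<string> Shelf => _shelf;
    public IReadOnlyList<string> Cart => _cart;

    public void Stock(string item) => _shelf.Add(item);

    // The memento: a copy of the state. Its contents are internal,
    // so code outside this assembly cannot inspect or change them.
    public sealed class Memento
    {
        internal readonly List<string> Shelf;
        internal readonly List<string> Cart;

        internal Memento(List<string> shelf, List<string> cart)
        {
            Shelf = new List<string>(shelf);
            Cart = new List<string>(cart);
        }
    }

    public Memento Save() => new Memento(_shelf, _cart);

    public void Restore(Memento memento)
    {
        _shelf = new List<string>(memento.Shelf);
        _cart = new List<string>(memento.Cart);
    }

    public void MoveToCart(string item)
    {
        Memento saved = Save();
        try
        {
            if (!_shelf.Remove(item))
                throw new InvalidOperationException("Item is not on the shelf.");
            if (_cart.Count >= _cartCapacity)
                throw new InvalidOperationException("Cart is full."); // shelf was already changed
            _cart.Add(item);
        }
        catch
        {
            Restore(saved); // undo the partial change, then let the caller see the error
            throw;
        }
    }
}
```

The cost is a full copy of both lists on every operation, so this fits small objects better than large ones.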

Software transactional memory is one approach. There is no language-agnostic technology for this that I know of.

You can use Herb Sutter's method, like this:
using System.Collections; // ArrayList

class EmployeeDatabase
{
    public void TerminateEmployee(int index)
    {
        // Clone sensitive objects.
        ArrayList tempActiveEmployees =
            (ArrayList) activeEmployees.Clone();
        ArrayList tempTerminatedEmployees =
            (ArrayList) terminatedEmployees.Clone();
        // Perform actions on temp objects.
        object employee = tempActiveEmployees[index];
        tempActiveEmployees.RemoveAt( index );
        tempTerminatedEmployees.Add( employee );
        // Now commit the changes.
        ArrayList tempSpace = null;
        ListSwap( ref activeEmployees,
                  ref tempActiveEmployees,
                  ref tempSpace );
        ListSwap( ref terminatedEmployees,
                  ref tempTerminatedEmployees,
                  ref tempSpace );
    }
    void ListSwap(ref ArrayList first,
                  ref ArrayList second,
                  ref ArrayList temp)
    {
        temp = first;
        first = second;
        second = temp;
        temp = null;
    }
    private ArrayList activeEmployees;
    private ArrayList terminatedEmployees;
}
Mainly it means dividing the code into two parts:
void ExceptionNeutralMethod()
{
    //——————————
    // All code that could possibly throw exceptions is in this
    // first section. In this section, no changes in state are
    // applied to any objects in the system including this.
    //——————————
    //——————————
    // All changes are committed at this point using operations
    // strictly guaranteed not to throw exceptions.
    //——————————
}
Of course, ArrayList is used here only to show the method I mean :). It is better to use generics if possible, etc.
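For what it's worth, here is roughly how the same two-part idea might look with generics. This is only a sketch: EmployeeRegistry and its members are made-up names, employees are plain strings, and the commit is two reference assignments, so it is exception-safe but not atomic with respect to other threads.

```csharp
using System.Collections.Generic;

public class EmployeeRegistry
{
    private List<string> _active = new List<string>();
    private List<string> _terminated = new List<string>();

    public IReadOnlyList<string> Active => _active;
    public IReadOnlyList<string> Terminated => _terminated;

    public void Hire(string name) => _active.Add(name);

    public void Terminate(int index)
    {
        // Part 1: everything that can throw works on copies only.
        var newActive = new List<string>(_active);
        var newTerminated = new List<string>(_terminated);

        string employee = newActive[index]; // throws ArgumentOutOfRangeException on a bad index
        newActive.RemoveAt(index);
        newTerminated.Add(employee);

        // Part 2: commit with plain reference assignments, which do not throw.
        _active = newActive;
        _terminated = newTerminated;
    }
}
```

If Terminate throws, the fields still point at the original lists, so the object is left exactly as it was.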
EDIT
Additionally, if you have extreme reliability requirements, please have a look at Constrained Execution Regions as well.

Related

c# clearing a list vs assigning a new list to existing variable

I am learning C#.
If I first make a variable to hold a list.
List<int> mylist = new List<int>();
Say I did some work with the list, and now I want to clear it to use it for something else. So I do one of the following:
Method 1:
mylist.Clear();
Method 2:
mylist = new List<int>();
The purpose is just to empty all values from the list so that I can reuse it.
Is there any side effect of using method 2? Should I favor one method over the other?
I also found a similar question,
Using the "clear" method vs. New Object
I will let other readers decide what's best for their own use case. So I won't pick a correct answer.
Using method 2 could result in unexpected behaviour within your program depending on how you are using the list.
If you were to do something like:
List<int> myList = new List<int> { 1, 2, 3 };
someObj.listData = myList;
myList = new List<int>(); // clearing the list.
the data in "someObj" will still be 1,2,3.
However, if you did myList.Clear() instead, then the data in "someObj" would also get cleared.
An additional thought I just had: if other references to the original list still exist and you reassign the variable using new in order to clear it, the GC cannot reclaim the original list for as long as those references are alive. I would say it's always safer to use the .Clear() method if you need to empty the contents of a list.
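To make the difference concrete, here is a small self-contained sketch. The ClearVersusNew class and the alias variable are made up for the example; alias simply stands in for someObj.listData.

```csharp
using System.Collections.Generic;

public static class ClearVersusNew
{
    // Method 2: the variable is pointed at a new list; the old list is untouched.
    public static int CountSeenByAliasAfterReassign()
    {
        var myList = new List<int> { 1, 2, 3 };
        List<int> alias = myList;   // second reference to the same list
        myList = new List<int>();   // only myList changes; alias still sees 1, 2, 3
        return alias.Count;         // 3
    }

    // Method 1: the shared list itself is emptied, so every reference sees it empty.
    public static int CountSeenByAliasAfterClear()
    {
        var myList = new List<int> { 1, 2, 3 };
        List<int> alias = myList;
        myList.Clear();
        return alias.Count;         // 0
    }
}
```

Which behaviour is the right one depends on whether the other holders of the list are supposed to see it emptied.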
Method 2 will cause a reallocation, while method 1 just clears the internal array so that the garbage collector can reclaim the elements it referenced:
From source:
// Clears the contents of List.
public void Clear() {
    if (_size > 0)
    {
        Array.Clear(_items, 0, _size); // Don't need to doc this but we clear the elements so that the gc can reclaim the references.
        _size = 0;
    }
    _version++;
}
https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,2765070d40f47b98
I think reallocating is going to be less expensive than clearing the array. Either way the performance is probably negligible unless you are doing some real super intensive work. In that case you would probably consider using a data structure that is faster than a list anyways.
Is there any side effect with using method 2?
Yes. First, there's an allocation of a new object.
Second, the first list might get collected the next time the garbage collector runs.
Should I favor one method over the other?
You should favor the first method, since it expresses your intention, to clear the list, more clearly.

Why do we need two interfaces to enumerate a collection?

I have been trying for quite a while to understand the idea behind IEnumerable and IEnumerator. I read all the questions and answers I could find on the net, and on StackOverflow in particular, but I am not satisfied. I got to the point where I understand how those interfaces should be used, but not why they are used this way.
I think that the essence of my misunderstanding is that we need two interfaces for one operation. I realized that if both are needed, one was probably not enough. So I took the "hard coded" equivalent of foreach (as I found here):
while (enumerator.MoveNext())
{
    object item = enumerator.Current;
    // logic
}
and tried to get it to work with one interface, thinking something would go wrong which would make me understand why another interface is needed.
So I created a collection class, and implemented IForeachable:
class Collection : IForeachable
{
    private int[] array = { 1, 2, 3, 4, 5 };
    private int index = -1;
    public int Current => array[index];
    public bool MoveNext()
    {
        if (index < array.Length - 1)
        {
            index++;
            return true;
        }
        index = -1;
        return false;
    }
}
and used the foreach equivalent to enumerate the collection:
var collection = new Collection();
while (collection.MoveNext())
{
    object item = collection.Current;
    Console.WriteLine(item);
}
And it works! So what is missing here that makes another interface required?
Thanks.
Edit:
My question is not a duplicate of the questions listed in the comments:
This question is why interfaces are needed for enumerating in the first place.
This question and this question are about what are those interfaces and how should they be used.
My question is why they are designed the way they are, not what are they, how they work, and why do we need them in the first place.
What are the two interfaces and what do they do?
The IEnumerable interface is placed on the collection object and defines the GetEnumerator() method, which returns a (normally new) object that implements the IEnumerator interface. The foreach statement in C# and the For Each statement in VB.NET use IEnumerable to access the enumerator in order to loop over the elements in the collection.
The IEnumerator interface is essentially the contract placed on the object that actually does the iteration. It stores the state of the iteration and updates it as the code moves through the collection.
Why not just have the collection be the enumerator too? Why have two separate interfaces?
There is nothing to stop IEnumerator and IEnumerable being implemented on the same class. However, there is a penalty for doing this – It won’t be possible to have two, or more, loops on the same collection at the same time. If it can be absolutely guaranteed that there won’t ever be a need to loop on the collection twice at the same time then that’s fine. But in the majority of circumstances that isn’t possible.
When would someone iterate over a collection more than once at a time?
Here are two examples.
The first example is when there are two loops nested inside each other on the same collection. If the collection was also the enumerator then it wouldn’t be possible to support nested loops on the same collection, when the code gets to the inner loop it is going to collide with the outer loop.
The second example is when there are two, or more, threads accessing the same collection. Again, if the collection was also the enumerator then it wouldn’t be possible to support safe multithreaded iteration over the same collection. When the second thread attempts to loop over the elements in the collection the state of the two enumerations will collide.
Also, because the iteration model used in .NET does not permit alterations to a collection during enumeration these operations are otherwise completely safe.
-- This was from a blog post I wrote many years ago: https://colinmackay.scot/2007/06/24/iteration-in-net-with-ienumerable-and-ienumerator/
Your IForeachable cannot even be iterated from two different threads (you cannot have multiple active iterations at all, even from the same thread), because the current enumeration state is stored in IForeachable itself. You also have to reset the current position each time you finish an enumeration, and if you forget to do that, the next caller will think your collection is empty. I can only imagine all kinds of hard-to-track bugs this might lead to.
On the other hand, because IEnumerable returns a new IEnumerator for each caller, you can have multiple enumerations in progress simultaneously, because each caller has its own enumeration state. I think this reason alone is enough to justify two interfaces. Enumeration is essentially a read operation, and it would be very confusing if you could not read the same thing simultaneously in multiple places.
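The nested-loop case can be sketched as follows. SingleCursor mirrors the question's IForeachable idea (the class names here are invented for the example), while Numbers implements IEnumerable<int> and hands every loop its own enumerator.

```csharp
using System.Collections;
using System.Collections.Generic;

// The collection is its own cursor, as in the question.
public class SingleCursor
{
    private readonly int[] _items = { 1, 2, 3 };
    private int _index = -1;

    public int Current => _items[_index];

    public bool MoveNext()
    {
        if (_index < _items.Length - 1) { _index++; return true; }
        _index = -1;
        return false;
    }
}

// The collection only creates enumerators; each one has private state.
public class Numbers : IEnumerable<int>
{
    private readonly int[] _items = { 1, 2, 3 };

    public IEnumerator<int> GetEnumerator()
    {
        foreach (int item in _items)
            yield return item;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class NestedLoopDemo
{
    // Two independent enumerators: the inner loop sees all 3 items on every outer pass.
    public static int CountPairs(Numbers numbers)
    {
        int pairs = 0;
        foreach (int a in numbers)
            foreach (int b in numbers)
                pairs++;
        return pairs; // 9
    }

    // Shared cursor: after the outer loop takes the first item, the inner loop
    // only sees the remaining two. (A full outer 'while' would never finish,
    // because the inner loop resets the shared cursor every time it ends.)
    public static int InnerItemsSeenOnFirstPass(SingleCursor cursor)
    {
        int seen = 0;
        if (cursor.MoveNext())
        {
            while (cursor.MoveNext())
                seen++;
        }
        return seen; // 2, not 3
    }
}
```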

How to design an api to a persistent collection in C#?

I am thinking about creating a persistent collection (lists or other) in C#, but I can't figure out a good API.
I use 'persistent' in the Clojure sense: a persistent list is a list that behaves as if it has value semantics instead of reference semantics, but does not incur the overhead of copying large value types. Persistent collections use copy-on-write to share internal structure. Pseudocode:
l1 = PersistentList()
l1.add("foo")
l1.add("bar")
l2 = l1
l1.add("baz")
print(l1) # ==> ["foo", "bar", "baz"]
print(l2) # ==> ["foo", "bar"]
# l1 and l2 share a common structure of ["foo", "bar"] to save memory
Clojure uses such datastructures, but additionally in Clojure all data structures are immutable. There is some overhead in doing all the copy-on-write stuff so Clojure provides a workaround in the form of transient datastructures that you can use if you are sure you're not sharing the datastructure with anyone else. If you have the only reference to a datastructure, why not mutate it directly instead of going through all the copy-on-write overhead.
One way to get this efficiency gain would be to keep a reference count on your datastructure (though I don't think Clojure works that way). If the refcount is 1, you're holding the only reference so do the updates destructively. If the refcount is higher, someone else is also holding a reference to it that's supposed to behave like a value type, so do copy-on-write to not disturb the other referrers.
In the API to such a datastructure, one could expose the refcounting, which makes the API seriously less usable, or one could not do the refcounting, leading to unnecessary copy-on-write overhead if every operation is COW'ed, or the API loses its value type behaviour and the user has to manage when to do COW manually.
If C# had copy constructors for structs, this would be possible. One could define a struct containing a reference to the real datastructure, and do all the incref()/decref() calls in the copy constructor and destructor of the struct.
Is there a way to do something like reference counting or struct copy constructors automatically in C#, without bothering the API users?
Edit:
Just to be clear, I'm just asking about the API. Clojure already has an implementation of this written in Java.
It is certainly possible to make such an interface by using a struct with a reference to the real collection that is COW'ed on every operation. The use of refcounting would be an optimisation to avoid unnecessary COWing, but apparently isn't possible with a sane API.
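A minimal sketch of that "COW on every operation" variant, to show what the API could look like. CowList<T> is a made-up name, the internal List<T> copy is the naive strategy (no structural sharing, no refcounting), and the usual mutable-struct caveat applies: calling Add on a readonly field or on a value returned from a property would modify a temporary copy.

```csharp
using System;
using System.Collections.Generic;

public struct CowList<T>
{
    private List<T> _items; // null in default(CowList<T>), which is treated as empty

    public int Count => _items == null ? 0 : _items.Count;

    public T this[int index]
    {
        get
        {
            if (_items == null) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Every mutation copies first, so struct copies never observe each other's changes.
    public void Add(T item)
    {
        List<T> copy = CopyItems();
        copy.Add(item);
        _items = copy;
    }

    public void RemoveAt(int index)
    {
        List<T> copy = CopyItems();
        copy.RemoveAt(index); // throws ArgumentOutOfRangeException on a bad index
        _items = copy;
    }

    private List<T> CopyItems()
    {
        return _items == null ? new List<T>() : new List<T>(_items);
    }
}
```

With this, `var l2 = l1;` copies only the struct (one reference), and a later `l1.Add("baz")` leaves l2 with its original two elements, which matches the pseudocode in the question at the price of an O(n) copy per write.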
What you're looking to do isn't possible, strictly speaking. You could get close by using static functions that do the reference counting, but I understand that that isn't a terribly palatable option.
Even if it were possible, I would stay away from this. While the semantics you describe may well be useful in Clojure, this cross between value type and reference type semantics will be confusing to most C# developers (mutable value types--or types with value type semantics that are mutable--are also usually considered Evil).
You may use the WeakReference class as an alternative to refcounting and achieve some of the benefits that refcounting gives you. When the only remaining reference to an object is held in a WeakReference, the object can be garbage collected. WeakReference lets you inspect whether that has happened.
EDIT 3: While this approach does do the trick, I'd urge you to stay away from pursuing value semantics on C# collections. Users of your structure do not expect this kind of behavior on the platform. These semantics add confusion and the potential for mistakes.
EDIT 2: Added an example. @AdamRobinson: I'm afraid I was not clear about how WeakReference can be of use. I must warn that performance-wise, most of the time it might be even worse than doing a naive copy-on-write at every operation. This is due to the garbage collector call. Therefore this is merely an academic solution, and I cannot recommend its use in production systems. It does do exactly what you ask, however.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

class Program
{
    static void Main(string[] args)
    {
        var l1 = default(COWList);
        l1.Add("foo"); // initialize
        l1.Add("bar"); // no copy
        l1.Add("baz"); // no copy
        var l2 = l1;
        l1.RemoveAt(0); // copy
        l2.Add("foobar"); // no copy
        l1.Add("barfoo"); // no copy
        l2.RemoveAt(1); // no copy
        var l3 = l2;
        l3.RemoveAt(1); // copy
        Trace.WriteLine(l1.ToString()); // bar baz barfoo
        Trace.WriteLine(l2.ToString()); // foo baz foobar
        Trace.WriteLine(l3.ToString()); // foo foobar
    }
}
struct COWList
{
    List<string> theList; // Contains the actual data
    object dummy; // helper variable to facilitate detection of copies of this struct instance.
    WeakReference weakDummy; // helper variable to facilitate detection of copies of this struct instance.
    /// <summary>
    /// Check whether this COWList has already been constructed properly.
    /// </summary>
    /// <returns>true when this COWList has already been initialized.</returns>
    bool EnsureInitialization()
    {
        if (theList == null)
        {
            theList = new List<string>();
            dummy = new object();
            weakDummy = new WeakReference(dummy);
            return false;
        }
        else
        {
            return true;
        }
    }
    void EnsureUniqueness()
    {
        if (EnsureInitialization())
        {
            // If the COWList has been copied, removing the 'dummy' reference will not kill weakDummy because the copy retains a reference.
            dummy = new object();
            GC.Collect(2); // OUCH! This is expensive. You may replace it with GC.Collect(0), but that will cause spurious Copy-On-Write behaviour.
            if (weakDummy.IsAlive) // I don't know if the GC guarantees detection of all GC'able objects, so there might be cases in which the weakDummy is still considered to be alive.
            {
                // At this point there is probably a copy.
                // To be safe, do the expensive Copy-On-Write
                theList = new List<string>(theList);
                // Prepare for the next modification
                weakDummy = new WeakReference(dummy);
                Trace.WriteLine("Made copy.");
            }
            else
            {
                // At this point it is guaranteed there is no copy.
                weakDummy.Target = dummy;
                Trace.WriteLine("No copy made.");
            }
        }
        else
        {
            Trace.WriteLine("Initialized an instance.");
        }
    }
    public void Add(string val)
    {
        EnsureUniqueness();
        theList.Add(val);
    }
    public void RemoveAt(int index)
    {
        EnsureUniqueness();
        theList.RemoveAt(index);
    }
    public override string ToString()
    {
        if (theList == null)
        {
            return "Uninitialized COWList";
        }
        else
        {
            var sb = new StringBuilder("[ ");
            foreach (var item in theList)
            {
                sb.Append("\"").Append(item).Append("\" ");
            }
            sb.Append("]");
            return sb.ToString();
        }
    }
}
This outputs:
Initialized an instance.
No copy made.
No copy made.
Made copy.
No copy made.
No copy made.
No copy made.
Made copy.
[ "bar" "baz" "barfoo" ]
[ "foo" "baz" "foobar" ]
[ "foo" "foobar" ]
I read what you're asking for, and I'm thinking of a "terminal-server"-type API structure.
First, define an internal, thread-safe singleton class that will be your "server"; it actually holds the data you're looking at. It will expose a Get and Set method that will take the string of the value being set or gotten, controlled by a ReaderWriterLock to ensure that the value can be read by anyone, but not while anyone's writing and only one person can write at a time.
Then, provide a factory for a class that is your "terminal"; this class will be public, and contains a reference to the internal singleton (which otherwise cannot be seen). It will contain properties that are really just pass-throughs for the singleton instance. In this way, you can provide a large number of "terminals" that will all see the same data from the "server", and will be able to modify that data in a thread-safe way.
You could use copy constructors and a list of the values accessed by each instance to provide copy-type knowledge. You can also mash up the value names with the object's handle to support cases where L1 and L2 share an A, but L3 has a different A because it was declared separately. Or, L3 can get the same A that L1 and L2 have. However you structure this, I would very clearly document how it should be expected to behave, because this is NOT the way things behave in basic .NET.
I'd like to have something like this on a flexible tree collection object of mine, though it wouldn't be by using value-type semantics (which would be essentially impossible in .net) but by having a clone generate a "virtual" deep clone instead of actually cloning every node within the collection. Instead of trying to keep an accurate reference count, every internal node would have three states:
Flexible
SharedImmutable
UnsharedMutable
Calling Clone() on a sharedImmutable node would simply yield the original object; calling Clone on a Flexible node would turn it into a SharedImmutable one. Calling Clone on an unshared mutable node would create a new node holding clones of all its descendents; the new object would be Flexible.
Before an object could be written, it would have to be made UnsharedMutable. To make an object UnsharedMutable if it isn't already, make its parent (the node via which it was accessed) UnsharedMutable (recursively). Then if the object was SharedImmutable, clone it (using a ForceClone method) and update the parent's link to point to the new object. Finally, set the new object's state to UnsharedMutable.
An essential aspect of this technique would be having separate classes for holding the data and providing the interface to it. A statement like MyCollection["this"]["that"]["theOther"].Add("George") needs to be evaluated by having the indexing operations return an indexer class which holds a reference to MyCollection. At that point, the "Add" method would be able to act upon whatever intermediate nodes it had to in order to perform any necessary copy-on-write operations.

LinQ optimization

Here is a piece of code:
void MyFunc(List<MyObj> objects)
{
    MyFunc1(objects);
    foreach( MyObj obj in objects.Where(obj1=>obj1.Good))
    {
        // Do Action With Good Object
    }
}
void MyFunc1(List<MyObj> objects)
{
    int iGoodCount = objects.Where(obj1=>obj1.Good).Count();
    BeHappy(iGoodCount);
    // do other stuff with 'objects' collection
}
Here we see that the collection is analyzed twice, and each time the value of the 'Good' property is checked for each member: the first time when calculating the count of good objects, the second time when iterating through all good objects.
It is desirable to have that optimized, and here is a straightforward solution:
before the call to MyFunc1, create an additional temporary collection of good objects only (goodObjects, it can be IEnumerable);
get the count of these objects and pass it as an additional parameter to MyFunc1;
in the 'MyFunc' method, iterate not through 'objects.Where(...)' but through the 'goodObjects' collection.
Not too bad an approach (as far as I can see), but an additional variable has to be created in the 'MyFunc' method and an additional parameter has to be passed.
Question: is there any LINQ out-of-the-box functionality that allows caching during the first Where().Count(), remembering the processed collection and reusing it in the next iteration?
Any thoughts are welcome.
Thanks.
No, LINQ queries are not optimized in this way (what you describe is similar to the way SQL Server reuses a query execution plan). LINQ does not (and, for practical purposes, cannot) know enough about your objects in order to optimize this way. As far as it knows, your collection has changed (or is entirely different) between the two calls.
You're obviously aware of the ability to persist your query into a new List<T>, but apart from that there's really nothing that I can recommend without knowing more about your class and where else MyFunc is used.
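For completeness, persisting the query could look something like the sketch below. The MyObj shape, the GoodReads counter (added purely so the number of predicate evaluations can be observed) and the reportCount callback standing in for BeHappy are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MyObj
{
    private readonly bool _good;

    public MyObj(bool good) { _good = good; }

    // Instrumentation for the sketch only: counts how often Good was evaluated.
    public int GoodReads { get; private set; }

    public bool Good
    {
        get { GoodReads++; return _good; }
    }
}

public static class Processor
{
    // Returns the number of good objects that were handled.
    public static int Process(List<MyObj> objects, Action<int> reportCount)
    {
        // ToList() runs the filter once and keeps the result.
        List<MyObj> goodObjects = objects.Where(o => o.Good).ToList();

        reportCount(goodObjects.Count); // Count is a stored value here, no re-enumeration

        int handled = 0;
        foreach (MyObj obj in goodObjects) // iterates the cached list, Good is not read again
        {
            handled++;
        }
        return handled;
    }
}
```

Note that the cached list is a snapshot: if Good changes on some object afterwards, goodObjects will not reflect it.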
As long as MyFunc1 doesn't need to modify the list by adding/removing objects, this will work.
void MyFunc(List<MyObj> objects)
{
    ILookup<bool, MyObj> objLookup = objects.ToLookup(obj1 => obj1.Good);
    MyFunc1(objLookup[true]);
    foreach(MyObj obj in objLookup[true])
    {
        //..
    }
}
void MyFunc1(IEnumerable<MyObj> objects)
{
    //..
}

Partially thread-safe dictionary

I have a class that maintains a private Dictionary instance that caches some data.
The class writes to the dictionary from multiple threads using a ReaderWriterLockSlim.
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
Right now, I have the following:
public ReadOnlyCollection<MyClass> Values() {
    using (sync.ReadLock())
        return new ReadOnlyCollection<MyClass>(cache.Values.ToArray());
}
Is there a way to do this without copying the collection many times?
I'm using .Net 3.5 (not 4.0)
I want to expose the dictionary's values outside the class.
What is a thread-safe way of doing that?
You have three choices.
1) Make a copy of the data, hand out the copy. Pros: no worries about thread safe access to the data. Cons: Client gets a copy of out-of-date data, not fresh up-to-date data. Also, copying is expensive.
2) Hand out an object that locks the underlying collection when it is read from. You'll have to write your own read-only collection that has a reference to the lock of the "parent" collection. Design both objects carefully so that deadlocks are impossible. Pros: "just works" from the client's perspective; they get up-to-date data without having to worry about locking. Cons: More work for you.
3) Punt the problem to the client. Expose the lock, and make it a requirement that clients lock all views on the data themselves before using it. Pros: No work for you. Cons: Way more work for the client, work they might not be willing or able to do. Risk of deadlocks, etc, now become the client's problem, not your problem.
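As a very rough illustration of option 2, a view object could share the parent's lock and take the read lock inside each member. LockedValuesView is a hypothetical name, the member set is deliberately tiny, and enumeration is offered only as a copy (ToArray) so that the lock is never held while client code runs; that is one possible way to keep deadlocks out of the design, not the only one.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class LockedValuesView<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _source;
    private readonly ReaderWriterLockSlim _sync; // the same lock the owning class uses for writes

    public LockedValuesView(Dictionary<TKey, TValue> source, ReaderWriterLockSlim sync)
    {
        _source = source;
        _sync = sync;
    }

    // Always reflects the current state of the dictionary.
    public int Count
    {
        get
        {
            _sync.EnterReadLock();
            try { return _source.Count; }
            finally { _sync.ExitReadLock(); }
        }
    }

    public bool Contains(TValue value)
    {
        _sync.EnterReadLock();
        try { return _source.ContainsValue(value); }
        finally { _sync.ExitReadLock(); }
    }

    // Copies the values while holding the lock and releases it before returning.
    public TValue[] ToArray()
    {
        _sync.EnterReadLock();
        try
        {
            var result = new TValue[_source.Count];
            _source.Values.CopyTo(result, 0);
            return result;
        }
        finally { _sync.ExitReadLock(); }
    }
}
```

Count and Contains stay up to date without copying; only callers that actually need to walk all values pay for a copy.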
If you want a snapshot of the current state of the dictionary, there's really nothing else you can do with this collection type. This is the same technique used by the ConcurrentDictionary<TKey, TValue>.Values property.
If you don't mind throwing an InvalidOperationException if the collection is modified while you are enumerating it, you could just return cache.Values since it's readonly (and thus can't corrupt the dictionary data).
EDIT: I personally believe the below code is technically answering your question correctly (as in, it provides a way to enumerate over the values in a collection without creating a copy). Some developers far more reputable than I strongly advise against this approach, for reasons they have explained in their edits/comments. In short: This is apparently a bad idea. Therefore I'm leaving the answer but suggesting you not use it.
Unless I'm missing something, I believe you could expose your values as an IEnumerable<MyClass> without needing to copy values by using the yield keyword:
public IEnumerable<MyClass> Values {
    get {
        using (sync.ReadLock()) {
            foreach (MyClass value in cache.Values)
                yield return value;
        }
    }
}
Be aware, however (and I'm guessing you already knew this), that this approach provides lazy evaluation, which means that the Values property as implemented above can not be treated as providing a snapshot.
In other words... well, take a look at this code (I am of course guessing as to some of the details of this class of yours):
var d = new ThreadSafeDictionary<string, string>();
// d is empty right now
IEnumerable<string> values = d.Values;
d.Add("someKey", "someValue");
// if values were a snapshot, this would output nothing...
// but in FACT, since it is lazily evaluated, it will now have
// what is CURRENTLY in d.Values ("someValue")
foreach (string s in values) {
    Console.WriteLine(s);
}
So if it's a requirement that this Values property be equivalent to a snapshot of what is in cache at the time the property is accessed, then you're going to have to make a copy.
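(begin 280Z28): The following is an example of how someone unfamiliar with the "C# way of doing things" could leave the read lock held:
IEnumerator<MyClass> enumerator = obj.Values.GetEnumerator();
MyClass first = null;
if (enumerator.MoveNext())
    first = enumerator.Current;
// The enumerator is never disposed or run to completion, so the iterator's
// using block never exits and the read lock is not released.
(end 280Z28)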
Another possibility to consider: expose only the ICollection interface, so that Values() can return your own implementation. That implementation would hold only a reference to Dictionary.Values and always take the ReadLock when accessing items.
