I'm in a scenario where I'm looping through data and formatting it in specific ways based on a setting, and I'm concerned that what I feel is best stylistically might impede performance.
The basic pattern of the code is as follows:
enum setting {single, multiple, foo, bar};
Data data = getData(conn, id);
setting blah = data.getSetting();
foreach (Item item in data)
{
    switch (blah)
    {
        case setting.single:
            processDataSingle(item, blah);
            break;
        ...
    }
}
My concern is that there might be thousands, or even tens of thousands of items in data. I was wondering if having the switch inside the loop where it may be evaluated repeatedly might cause some serious performance issues. I know I could put the switch before the loop, but then each case contains it, which seems much less readable, in that it's less apparent that the basic function remains the same.
You could set up a delegate/action once, then call it every time in the loop:
Data data = getData(conn, id);
setting blah = data.getSetting();
Action<Item> doThis;
switch (blah)
{
    case setting.single:
        doThis = i => processDataSingle(i, blah);
        break;
    ...
}
foreach (Item item in data)
{
    doThis(item);
}
Basically, put the body of each "case" in an Action, select that Action in your switch outside the loop, and call the Action in the loop.
You could create a method to keep readability, then pass the data to the method:
void processAllData(IEnumerable<Item> data, setting blah)
{
    switch (blah)
    {
        case setting.single:
            foreach (Item item in data)
            {
                processDataSingle(item, blah);
            }
            break;
        // next case, next loop ...
    }
}
Then it's just a one-liner:
processAllData(data, blah);
This approach is readable since it encapsulates complexity, concise since you only see what you have to see and efficient since you can optimize the cases.
By using an Action delegate this way, you can factor the loop out of your code:
enum setting {single, multiple, foo, bar};
Data data = getData(conn, id);
var processAll = new Action<Action<Item>>(action =>
{
    foreach (var item in data)
        action(item);
});
setting blah = data.getSetting();
switch (blah)
{
    case setting.single:
        processAll(item => processDataSingle(item, blah));
        break;
    ...
}
It certainly does have the potential to affect performance if you're talking about possibly running the comparison tens of thousands of times or more. The other problem that could potentially arise in the code that you've written here is what happens if you need to add to your enum. Then you'd need to open up this code and adjust it to take care of that circumstance, which violates the Open/Closed Principle.
The best way, IMO, to solve both problems at once would be to use a Factory pattern to take care of this (see posts here and here for some advice on starting that). All you'd need to do is have an interface whose implementations would call the method that you'd want to call in your switch code above. Create a factory and have it pick which implementation to return back to your code (before the loop) based on the enum passed in. At that point all your loop needs to do is to call that interface method which will do exactly what you wanted.
Afterwards, any future feature additions will only require you to create another implementation of that interface, and adjust the enum accordingly. No muss, no fuss.
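A minimal sketch of that factory idea, under stated assumptions: `IItemProcessor`, `SingleProcessor`, `MultipleProcessor` and `ProcessorFactory` are hypothetical names, items are reduced to strings, and only two enum values are wired up.

```csharp
using System;
using System.Collections.Generic;

public enum Setting { Single, Multiple, Foo, Bar }

// Hypothetical interface: one implementation per enum value.
public interface IItemProcessor
{
    string Process(string item);
}

public sealed class SingleProcessor : IItemProcessor
{
    public string Process(string item) => "single:" + item;
}

public sealed class MultipleProcessor : IItemProcessor
{
    public string Process(string item) => "multiple:" + item;
}

public static class ProcessorFactory
{
    // The only place that inspects the enum; it runs once, before the loop.
    public static IItemProcessor Create(Setting setting)
    {
        switch (setting)
        {
            case Setting.Single: return new SingleProcessor();
            case Setting.Multiple: return new MultipleProcessor();
            default: throw new ArgumentOutOfRangeException(nameof(setting));
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var items = new List<string> { "a", "b" };
        IItemProcessor processor = ProcessorFactory.Create(Setting.Single);
        foreach (string item in items)
        {
            Console.WriteLine(processor.Process(item)); // single:a, then single:b
        }
    }
}
```

Supporting a new enum value then means adding one class and one `case` in the factory; the loop itself stays untouched.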
It's almost certainly slower to put the switch in the loop like that. Whether it's significant or not is impossible to say - use a Stopwatch to see.
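One way such a measurement might look, as a rough sketch only: `SwitchTiming`, its enum and the summing work are hypothetical stand-ins for the real processing. The first call to each method includes JIT time, so repeat the runs before drawing conclusions.

```csharp
using System;
using System.Diagnostics;

public static class SwitchTiming
{
    public enum Setting { Single, Multiple }

    // Variant A: the switch is evaluated for every item.
    public static long SwitchInsideLoop(int[] items, Setting setting)
    {
        long sum = 0;
        foreach (int item in items)
        {
            switch (setting)
            {
                case Setting.Single: sum += item; break;
                case Setting.Multiple: sum += 2L * item; break;
            }
        }
        return sum;
    }

    // Variant B: the switch runs once and the loop calls the chosen delegate.
    public static long SwitchOutsideLoop(int[] items, Setting setting)
    {
        Func<int, long> work;
        switch (setting)
        {
            case Setting.Single: work = i => i; break;
            default: work = i => 2L * i; break;
        }
        long sum = 0;
        foreach (int item in items) sum += work(item);
        return sum;
    }

    public static void Main()
    {
        int[] items = new int[50000];
        for (int i = 0; i < items.Length; i++) items[i] = i;

        Stopwatch watch = Stopwatch.StartNew();
        long a = SwitchInsideLoop(items, Setting.Multiple);
        watch.Stop();
        Console.WriteLine("switch inside loop:  " + watch.Elapsed.TotalMilliseconds + " ms");

        watch.Restart();
        long b = SwitchOutsideLoop(items, Setting.Multiple);
        watch.Stop();
        Console.WriteLine("switch outside loop: " + watch.Elapsed.TotalMilliseconds + " ms");

        Console.WriteLine("same result: " + (a == b));
    }
}
```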
If the values in the switch statement are close to one another, the compiler will produce a lookup table instead of N if statements. That improves performance, but it's hard to say when the compiler will decide to do this.
Instead you can create a Dictionary<switchType, Action>, populate it with value-action pairs, and then selecting the appropriate action takes roughly O(1), since a dictionary is a hash table:
dictionary[value].Invoke()
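A small sketch of the dictionary idea, assuming string items and two hypothetical handlers that stand in for the real processing methods. The lookup happens once, before the loop.

```csharp
using System;
using System.Collections.Generic;

public enum Setting { Single, Multiple }

public static class Dispatch
{
    // Hypothetical handlers standing in for processDataSingle and friends.
    public static readonly Dictionary<Setting, Func<string, string>> Handlers =
        new Dictionary<Setting, Func<string, string>>
        {
            { Setting.Single, item => "single:" + item },
            { Setting.Multiple, item => "multiple:" + item },
        };

    public static void Main()
    {
        // One hash lookup selects the handler; the loop only invokes it.
        Func<string, string> handler = Handlers[Setting.Multiple];
        foreach (string item in new[] { "a", "b" })
        {
            Console.WriteLine(handler(item)); // multiple:a, then multiple:b
        }
    }
}
```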
I'm a bit new to C# (coming from PHP) and I was a bit shocked that, when looping through a list, I can't pass a reference to the loop variable, i.e. the following code is not valid:
foreach (ref string var in arr) {
var = "new value";
}
I researched a bit and found a suggestion to create an "updatable enumerator", but I can't figure out exactly how I should do that. I followed an example and tried to add a setter for Current, both on IEnumerator.Current and on my custom enumerator (PeopleEnum.Current), but, to be honest, that was blind guessing and didn't work. I'm pasting the whole code at pastebin, as it's quite long to paste here - custom enumerator attempt. In this code, trying to access the current element by
badClass baddie = new badClass(ref tmp.Current);
results in an expected error that "A property or indexer may not be passed as an out or ref parameter"
What I'm aiming to do in the end is something like this - iterate through a list of objects, generate a button for each of them and add an onclick event for that button which will open a new form, passing the reference for that object, so that its contents can be edited in that new form. I did all this, but passing the object as a reference, instead of read-only data, is killing me. I would appreciate any answers, links where I can read about updatable enumerators or ideas.
First of all - without wanting to blame you - I would say: if you learn a new language, learn the new language! Don't try to write PHP in C#. If all computer languages were the same, we would not have so many of them. ;-)
I don't see exactly how your example is related to the actual job you want to do, but you should probably learn about events, delegates and LINQ first. Might something like this help:
foreach (Obj obj in yourBaseObjects) {
    Obj localObj = obj; // See Dan's comment!
    Button button = new Button(); // however you create your buttons
    button.Click += (sender, e) => {
        // do something with localObj
        Console.WriteLine(localObj);
    };
}
Yes, that works in C# and each event handler will be using the correct object. If it does not fit your needs, you have to provide more details.
Why are you using a foreach loop? Use a for loop instead. I don't know the exact code or syntax, but something like this:
int sizeOfArray = objectArray.Length; // use .Count for a List<T>
for (int i = 0; i < sizeOfArray; i++)
{
    obj = objectArray[i];
    // use obj however you want
}
It sounds like you're not trying to pass a reference to the Object (the Object is already a reference type), but rather a reference to the Object's location in the array, correct? This latter is not directly possible in .NET, due to the way it manages memory and references. You can accomplish something like it using a wrapper class (no error handling, but this is the basic idea):
public sealed class ListReference<T>
{
private readonly IList<T> _list;
private readonly int _index;
public ListReference(IList<T> list, int index)
{
_list = list;
_index = index;
}
public T Value
{
get { return _list[_index]; }
set { _list[_index] = value; }
}
}
You can now construct this and pass it along, with all the associated complexity risks that come with passing around multiple references to an array. It would be better to change the design to avoid this, if possible, but it is possible to accomplish what you're after.
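To illustrate how the wrapper might be passed along, here is a self-contained sketch that repeats the class from above. `Rename` is a hypothetical stand-in for the form that edits one element.

```csharp
using System;
using System.Collections.Generic;

public sealed class ListReference<T>
{
    private readonly IList<T> _list;
    private readonly int _index;

    public ListReference(IList<T> list, int index)
    {
        _list = list;
        _index = index;
    }

    public T Value
    {
        get { return _list[_index]; }
        set { _list[_index] = value; }
    }
}

public static class Program
{
    // Hypothetical editor: it only sees the slot, not the whole list.
    public static void Rename(ListReference<string> slot, string newName)
    {
        slot.Value = newName;
    }

    public static void Main()
    {
        var names = new List<string> { "Ann", "Bob" };
        Rename(new ListReference<string>(names, 1), "Robert");
        Console.WriteLine(names[1]); // prints "Robert"
    }
}
```

Note that the wrapper stores an index, so inserting or removing elements before that index makes it point at a different element.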
I am thinking about creating a persistent collection (lists or other) in C#, but I can't figure out a good API.
I use 'persistent' in the Clojure sense: a persistent list is a list that behaves as if it has value semantics instead of reference semantics, but does not incur the overhead of copying large value types. Persistent collections use copy-on-write to share internal structure. Pseudocode:
l1 = PersistentList()
l1.add("foo")
l1.add("bar")
l2 = l1
l1.add("baz")
print(l1) # ==> ["foo", "bar", "baz"]
print(l2) # ==> ["foo", "bar"]
# l1 and l2 share a common structure of ["foo", "bar"] to save memory
Clojure uses such datastructures, but additionally in Clojure all data structures are immutable. There is some overhead in doing all the copy-on-write stuff so Clojure provides a workaround in the form of transient datastructures that you can use if you are sure you're not sharing the datastructure with anyone else. If you have the only reference to a datastructure, why not mutate it directly instead of going through all the copy-on-write overhead.
One way to get this efficiency gain would be to keep a reference count on your datastructure (though I don't think Clojure works that way). If the refcount is 1, you're holding the only reference so do the updates destructively. If the refcount is higher, someone else is also holding a reference to it that's supposed to behave like a value type, so do copy-on-write to not disturb the other referrers.
In the API to such a datastructure, one could expose the refcounting, which makes the API seriously less usable; or one could skip the refcounting, leading to unnecessary copy-on-write overhead if every operation is COW'ed; or the API loses its value type behaviour and the user has to manage when to do COW manually.
If C# had copy constructors for structs, this would be possible. One could define a struct containing a reference to the real datastructure, and do all the incref()/decref() calls in the copy constructor and destructor of the struct.
Is there a way to do something like reference counting or struct copy constructors automatically in C#, without bothering the API users?
Edit:
Just to be clear, I'm just asking about the API. Clojure already has an implementation of this written in Java.
It is certainly possible to make such an interface by using a struct with a reference to the real collection that is COW'ed on every operation. The use of refcounting would be an optimisation to avoid unnecessary COWing, but apparently isn't possible with a sane API.
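The unoptimised variant, a struct that copies on every write, could look roughly like this. `NaiveCowList` is a hypothetical name, and the sketch only covers `Add` for strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical name; copies the backing list on every mutation.
public struct NaiveCowList
{
    private List<string> _items; // may be shared with copies of this struct

    public void Add(string value)
    {
        // Never mutate the shared list: build a new one and repoint this copy only.
        List<string> copy = _items == null ? new List<string>() : new List<string>(_items);
        copy.Add(value);
        _items = copy;
    }

    public int Count
    {
        get { return _items == null ? 0 : _items.Count; }
    }

    public override string ToString()
    {
        return _items == null ? "" : string.Join(", ", _items);
    }
}

public static class Program
{
    public static void Main()
    {
        var l1 = new NaiveCowList();
        l1.Add("foo");
        l1.Add("bar");
        var l2 = l1;          // struct copy shares the backing list
        l1.Add("baz");        // l1 gets a fresh list, l2 is unaffected
        Console.WriteLine(l1); // foo, bar, baz
        Console.WriteLine(l2); // foo, bar
    }
}
```

Every `Add` costs a full copy here, which is exactly the overhead the refcount idea was meant to avoid.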
What you're looking to do isn't possible, strictly speaking. You could get close by using static functions that do the reference counting, but I understand that that isn't a terribly palatable option.
Even if it were possible, I would stay away from this. While the semantics you describe may well be useful in Clojure, this cross between value type and reference type semantics will be confusing to most C# developers (mutable value types--or types with value type semantics that are mutable--are also usually considered Evil).
You may use the WeakReference class as an alternative to refcounting and achieve some of the benefits that refcounting gives you. When the only remaining reference to an object is a WeakReference, the object becomes eligible for garbage collection. WeakReference exposes IsAlive and Target so you can check whether that has happened.
EDIT 3: While this approach does do the trick, I'd urge you to stay away from pursuing value semantics on C# collections. Users of your structure do not expect this kind of behavior on the platform. These semantics add confusion and the potential for mistakes.
EDIT 2: Added an example. @AdamRobinson: I'm afraid I was not clear about how WeakReference can be of use. I must warn that, performance-wise, most of the time it might be even worse than doing a naive copy-on-write at every operation, because of the garbage collector call. Therefore this is merely an academic solution, and I cannot recommend its use in production systems. It does do exactly what you ask, however.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
class Program
{
static void Main(string[] args)
{
var l1 = default(COWList);
l1.Add("foo"); // initialize
l1.Add("bar"); // no copy
l1.Add("baz"); // no copy
var l2 = l1;
l1.RemoveAt(0); // copy
l2.Add("foobar"); // no copy
l1.Add("barfoo"); // no copy
l2.RemoveAt(1); // no copy
var l3 = l2;
l3.RemoveAt(1); // copy
Trace.WriteLine(l1.ToString()); // bar baz barfoo
Trace.WriteLine(l2.ToString()); // foo baz foobar
Trace.WriteLine(l3.ToString()); // foo foobar
}
}
struct COWList
{
List<string> theList; // Contains the actual data
object dummy; // helper variable to facilitate detection of copies of this struct instance.
WeakReference weakDummy; // helper variable to facilitate detection of copies of this struct instance.
/// <summary>
/// Check whether this COWList has already been constructed properly.
/// </summary>
/// <returns>true when this COWList has already been initialized.</returns>
bool EnsureInitialization()
{
if (theList == null)
{
theList = new List<string>();
dummy = new object();
weakDummy = new WeakReference(dummy);
return false;
}
else
{
return true;
}
}
void EnsureUniqueness()
{
if (EnsureInitialization())
{
// If the COWList has been copied, removing the 'dummy' reference will not kill weakDummy because the copy retains a reference.
dummy = new object();
GC.Collect(2); // OUCH! This is expensive. You may replace it with GC.Collect(0), but that will cause spurious Copy-On-Write behaviour.
if (weakDummy.IsAlive) // I don't know if the GC guarantees detection of all GC'able objects, so there might be cases in which the weakDummy is still considered to be alive.
{
// At this point there is probably a copy.
// To be safe, do the expensive Copy-On-Write
theList = new List<string>(theList);
// Prepare for the next modification
weakDummy = new WeakReference(dummy);
Trace.WriteLine("Made copy.");
}
else
{
// At this point it is guaranteed there is no copy.
weakDummy.Target = dummy;
Trace.WriteLine("No copy made.");
}
}
else
{
Trace.WriteLine("Initialized an instance.");
}
}
public void Add(string val)
{
EnsureUniqueness();
theList.Add(val);
}
public void RemoveAt(int index)
{
EnsureUniqueness();
theList.RemoveAt(index);
}
public override string ToString()
{
if (theList == null)
{
return "Uninitialized COWList";
}
else
{
var sb = new StringBuilder("[ ");
foreach (var item in theList)
{
sb.Append("\"").Append(item).Append("\" ");
}
sb.Append("]");
return sb.ToString();
}
}
}
This outputs:
Initialized an instance.
No copy made.
No copy made.
Made copy.
No copy made.
No copy made.
No copy made.
Made copy.
[ "bar" "baz" "barfoo" ]
[ "foo" "baz" "foobar" ]
[ "foo" "foobar" ]
I read what you're asking for, and I'm thinking of a "terminal-server"-type API structure.
First, define an internal, thread-safe singleton class that will be your "server"; it actually holds the data you're looking at. It will expose a Get and Set method that will take the string of the value being set or gotten, controlled by a ReaderWriterLock to ensure that the value can be read by anyone, but not while anyone's writing and only one person can write at a time.
Then, provide a factory for a class that is your "terminal"; this class will be public, and contains a reference to the internal singleton (which otherwise cannot be seen). It will contain properties that are really just pass-throughs for the singleton instance. In this way, you can provide a large number of "terminals" that will all see the same data from the "server", and will be able to modify that data in a thread-safe way.
You could use copy constructors and a list of the values accessed by each instance to provide copy-type knowledge. You can also mash up the value names with the object's handle to support cases where L1 and L2 share an A, but L3 has a different A because it was declared separately. Or, L3 can get the same A that L1 and L2 have. However you structure this, I would very clearly document how it should be expected to behave, because this is NOT the way things behave in basic .NET.
I'd like to have something like this on a flexible tree collection object of mine, though it wouldn't be by using value-type semantics (which would be essentially impossible in .NET) but by having a clone generate a "virtual" deep clone instead of actually cloning every node within the collection. Instead of trying to keep an accurate reference count, every internal node would have three states:
Flexible
SharedImmutable
UnsharedMutable
Calling Clone() on a SharedImmutable node would simply yield the original object; calling Clone() on a Flexible node would turn it into a SharedImmutable one. Calling Clone() on an UnsharedMutable node would create a new node holding clones of all its descendants; the new object would be Flexible.
Before an object could be written, it would have to be made UnsharedMutable. To make an object UnsharedMutable if it isn't already, make its parent (the node via which it was accessed) UnsharedMutable (recursively). Then if the object was SharedImmutable, clone it (using a ForceClone method) and update the parent's link to point to the new object. Finally, set the new object's state to UnsharedMutable.
An essential aspect of this technique would be having separate classes for holding the data and providing the interface to it. A statement like MyCollection["this"]["that"]["theOther"].Add("George") needs to be evaluated by having the indexing operations return an indexer class which holds a reference to MyCollection. At that point, the "Add" method would be able to act upon whatever intermediate nodes it had to in order to perform any necessary copy-on-write operations.
One common pattern I see and use frequently in C++ is to temporarily set a variable to a new value, and then reset it when I exit that scope. In C++, this is easily accomplished with references and templated scope classes, and allows for increased safety and prevention of errors where the variable is set to a new value, then reset to an incorrect assumed initial value.
Here is a simplified example of what I mean (in C++):
void DoSomething()
{
// The following line captures GBL.counter by reference, stores its current
// value, and sets it to 1
ScopedReset<int> resetter(GBL.counter, 1);
// In this function and all below, GBL.counter will be 1
CallSomethingThatNeedsCounterOf1();
// When I hit the close brace, ~ScopedReset will be called, and it will
// reset GBL.counter to its previous value
}
Is there any way to do this in C#? I've found the hard way that I can't capture a ref parameter inside an IEnumerator or a lambda, which were my first two thoughts. I don't want to use the unsafe keyword if possible.
The first challenge to doing this in C# is dealing with non-deterministic destruction. Since C# has finalizers rather than deterministic destructors, you need a mechanism to control scope in order to execute the reset. IDisposable helps there, and the using statement mimics C++ deterministic destruction semantics.
The second is getting at the value you want to reset without using pointers. Lambdas and delegates can do that.
using System;
class Program
{
class ScopedReset<T> : IDisposable
{
T originalValue = default(T);
Action<T> _setter;
public ScopedReset(Func<T> getter, Action<T> setter, T v)
{
originalValue = getter();
setter(v);
_setter = setter;
}
public void Dispose()
{
_setter(originalValue);
}
}
static int counter = 0;
static void Main(string[] args)
{
counter++;
counter++;
Console.WriteLine(counter);
using (new ScopedReset<int>(() => counter, i => counter = i, 1))
Console.WriteLine(counter);
Console.WriteLine(counter);
}
}
Can you not simply copy the reference value to a new local variable, and use this new variable throughout your method, i.e. copy value by value?
Indeed, changing it from a ref to regular value parameter will accomplish this!
I don't think you can capture a ref parameter in a local variable and have it stay a ref - a local copy will be created.
GBL.counter is effectively an implicit, hidden parameter to CallSomethingThatNeedsCounterOf1. If you could convert it to a regular, declared parameter, your problem would go away. Also, if that would result in too many parameters, a solution would be a pair of methods which set up and reset the environment so that CallSomethingThatNeedsCounterOf1() can run.
You can create a class that calls the SetUp method in its constructor and the Reset method in Dispose(). You can use this class with the using statement to approximate the C++ behaviour. You would, however, have to create one of these classes for each scenario.
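A sketch of one such scenario-specific class, assuming a static `Globals.Counter` field that stands in for GBL.counter; `CounterScope` is a hypothetical name.

```csharp
using System;

public static class Globals
{
    public static int Counter; // stands in for GBL.counter
}

// Hypothetical scope class for this one scenario.
public sealed class CounterScope : IDisposable
{
    private readonly int _previous;

    public CounterScope(int temporaryValue)
    {
        _previous = Globals.Counter;      // set up: remember the old value
        Globals.Counter = temporaryValue;
    }

    public void Dispose()
    {
        Globals.Counter = _previous;      // reset: restore on scope exit
    }
}

public static class Program
{
    public static void Main()
    {
        Globals.Counter = 5;
        using (new CounterScope(1))
        {
            Console.WriteLine(Globals.Counter); // prints 1
        }
        Console.WriteLine(Globals.Counter);     // prints 5
    }
}
```

Because `using` compiles to try/finally, the reset also runs when the body throws.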
I have various classes for handling form data and querying a database. I need some advice on reducing the amount of code I write from site to site.
The following code is for handling a form posted via ajax to the server. It simply instantiates a Form class, validates the data and processes any errors:
public static string submit(Dictionary<string, string> d){
Form f = new Form("myform");
if (!f.validate(d)){
return f.errors.toJSON();
}
//process form...
}
Is there a way to reduce this down to 1 line as follows:
if (!Form.validate("myform", d)){ return Form.errors.toJSON(); }
Let's break that down into two questions.
1) Can I write the existing logic all in one statement?
The local variable has to be declared in its own statement, but the initializer doesn't have to be there. It's perfectly legal to say:
Form f;
if (!(f=new Form("myform")).validate(d))return f.errors.toJSON();
Why you would want to is beyond me; doing so is ugly, hard to debug, hard to understand, and hard to maintain. But it's perfectly legal.
2) Can I make this instance method into a static method?
Probably not directly. Suppose you had two callers validating stuff on two different threads, both calling the static Form.Validate method, and both producing errors. Now you have a race. One of them is going to win and fill in Form.Errors. And now you have two threads reporting the same set of errors, but the errors are wrong for one of them.
The better way to make this into a static method is to make the whole thing into a static method that has the desired semantics, as in plinth's answer.
Errors errors = Validator.Validate(d);
if (errors != null) return errors.toJSON();
Now the code is very clear, and the implementation of Validate is straightforward. Create a form, call the validator, either return null or the errors.
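A possible shape for that `Validate` method, as a sketch only: the `Form` and `Errors` classes below are hypothetical stand-ins for the question's own classes, the "every field must be non-empty" rule is an assumption, and `toJSON` does no escaping.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the question's Form and Errors classes.
public sealed class Errors
{
    public readonly List<string> Messages = new List<string>();

    // Naive serialisation for illustration; no escaping is done.
    public string toJSON()
    {
        return "[\"" + string.Join("\",\"", Messages) + "\"]";
    }
}

public sealed class Form
{
    public readonly Errors errors = new Errors();

    public Form(string name) { }

    // Assumed rule for illustration only: every field must be non-empty.
    public bool validate(Dictionary<string, string> d)
    {
        foreach (KeyValuePair<string, string> pair in d)
        {
            if (string.IsNullOrEmpty(pair.Value))
                errors.Messages.Add(pair.Key + " is required");
        }
        return errors.Messages.Count == 0;
    }
}

public static class Validator
{
    // Each call builds its own Form, so no error state is shared between threads.
    public static Errors Validate(Dictionary<string, string> d)
    {
        var form = new Form("myform");
        return form.validate(d) ? null : form.errors;
    }
}

public static class Program
{
    public static void Main()
    {
        var bad = new Dictionary<string, string> { { "name", "" } };
        Errors errors = Validator.Validate(bad);
        if (errors != null) Console.WriteLine(errors.toJSON()); // ["name is required"]
    }
}
```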
I would suggest that you don't need advice on reducing the amount of code you write. Rather, get advice on how to make the code read more like the meaning it intends to represent. Sometimes that means writing slightly more code, but that code is clear and easy to understand.
I would move all common validation logic to a superclass.
I think the main problem with your code is not that it is long, but that you're repeating it in many places. Even if you managed to make it a one-liner, it still would not be DRY.
Take a look at the Template Method pattern, it might help here (The abstract class with the validation would be the Template and your specific 'actions' would be the subclasses).
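Roughly, the Template Method idea could be sketched like this. `FormHandler` and `ContactForm` are hypothetical names, and the non-empty validation rule is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;

public abstract class FormHandler
{
    // Template method: the validate-then-process flow is written once here.
    public string Submit(Dictionary<string, string> d)
    {
        string error = Validate(d);
        return error ?? Process(d);
    }

    // Common validation shared by every form. The rule itself is assumed.
    private static string Validate(Dictionary<string, string> d)
    {
        foreach (KeyValuePair<string, string> pair in d)
        {
            if (string.IsNullOrEmpty(pair.Value))
                return pair.Key + " is required";
        }
        return null;
    }

    // Each concrete form supplies only its own action.
    protected abstract string Process(Dictionary<string, string> d);
}

public sealed class ContactForm : FormHandler
{
    protected override string Process(Dictionary<string, string> d)
    {
        return "saved " + d.Count + " fields";
    }
}

public static class Program
{
    public static void Main()
    {
        var handler = new ContactForm();
        Console.WriteLine(handler.Submit(new Dictionary<string, string> { { "name", "" } }));    // name is required
        Console.WriteLine(handler.Submit(new Dictionary<string, string> { { "name", "Ann" } })); // saved 1 fields
    }
}
```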
Of course you could write this:
public static string FormValidate(Dictionary<string, string> d)
{
    Form f = new Form("myform");
    if (!f.validate(d))
        return f.errors.toJSON();
    return null;
}
then your submit can be:
public static string submit(Dictionary<string, string> d)
{
    string errs = FormValidate(d);
    if (errs != null) { return errs; }
    // process form
}
That cuts down your code and doesn't hurt readability much at all.
If you really, really wanted to, you could store the error text in a thread-local property.
Does C# have a "ThreadLocal" analog (for data members) to the "ThreadStatic" attribute?
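For what it's worth, a thread-local error slot might be sketched with `ThreadLocal<T>` from System.Threading (available from .NET 4). `FormErrors` and `ReadOnWorker` are hypothetical names used only for this illustration.

```csharp
using System;
using System.Threading;

public static class FormErrors
{
    // One independent slot per thread.
    private static readonly ThreadLocal<string> _last = new ThreadLocal<string>(() => null);

    public static string Last
    {
        get { return _last.Value; }
        set { _last.Value = value; }
    }
}

public static class Program
{
    // Reads FormErrors.Last on a fresh thread and hands the result back.
    public static string ReadOnWorker()
    {
        string seen = "unset";
        var worker = new Thread(() =>
        {
            seen = FormErrors.Last ?? "none"; // the worker has its own, empty slot
            FormErrors.Last = "worker error"; // does not affect the calling thread
        });
        worker.Start();
        worker.Join();
        return seen;
    }

    public static void Main()
    {
        FormErrors.Last = "main error";
        Console.WriteLine(ReadOnWorker());  // none
        Console.WriteLine(FormErrors.Last); // main error
    }
}
```

This avoids the race between threads, but it still hides state behind a static member, so the explicit return value shown earlier is usually easier to follow.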