Using Regex in a MultiThreaded Environment - c#

I have a constructor that uses several Regex objects which are static readonly in a class called RegexLib (mainly because this project uses a whole lot of Regex patterns that need to be used all over the place.
Upon the user adding some files to the application, this constructor gets called once for each file (run aynchronously across several threads). I've attached the relevant function that the constructor calls below.
private void GetSymbolsFromLines()
{
for (int i = 0; i < Lines.Length; i++)
{
string line = Lines[i];
if (RegexLib.InstString.IsMatch(line))
{
int instString = i;
int userdataString = 0;
for (int j = i; j < Lines.Length; j++)
{
if (RegexLib.UserdataString.IsMatch(Lines[j]))
{
userdataString = j;
break;
}
else if (Lines[j].Contains("userdata"))
{
break;
}
}
if (userdataString != 0)
{
_symbols.Add(new Symbol(RegexLib.InstString.Match(Lines[instString]),
RegexLib.UserdataString.Match(Lines[userdataString])));
}
}
}
}
The Regex objects are all fairly similar to these and have been tested using Regex Hero.
public static readonly Regex AliasFromUserdata = new Regex(#"text_alias=(?<AliasName>\w+).*?value=(?<AliasValue>(.*?))\^(?=(?:text_alias|\""))");
public static readonly Regex UpdateFromUserdata = new Regex("FOX_VAR=.*?attr=(?<AttributeType>.+?)\\^(?<AttributePropertyString>.*?)\\^(?:(?=(?:FOX_VAR|END_FOXV)))");
For some reason, the use of Regex seems to cause some issues in this multithreaded environment and a dig into the documentation revealed that this could be because:
However, result objects (Match and MatchCollection) returned by Regex should be used on a single thread.
So my question is, is there an easy way to use Regex accross mutliple threads whilst structuring them inside a library class for organisational reasons?
The only likely solution I can think of short of moving the Regex declaration closer to use is to clone the objects before use, but this seems like it could be quite slow.
For reference, here is the Worker Function that runs concurrently on 4 different threads.
private void FoxFileConvWorker(ConcurrentQueue<string> queue,QueueProgressData qpd)
{
string[] extensions = {".fdf", ".m1", ".g"};
while (!queue.IsEmpty)
{
string file;
if (queue.TryDequeue(out file))
{
if (extensions.Any(extension => Path.GetExtension(file) == extension))
{
try
{
_jobGraphics.Add(new Graphic(file));
IncrementProgress(qpd);
}
catch (Exception e)
{
ThreadSafeControlMethods.SetText(qpd.LblStatus, "Non-Fatal Error");
WriteLog(e, $"Creating Graphic DOM for {file}");
#if DEBUG
throw;
#endif
}
}
}
}
}

Can you modify your static Regex class ? If so, you can use a Factory instead of static properties:
static class RegexLib
{
static Regex CreateInstString(){
{
return new Regex("YourRegex");
}
static Regex CreateUserdataString(){
{
return new Regex("YourOtherRegex");
}
[..]
}
This way, you regex will not be shared among threads.
You could also use some dependency injection but this means a lot of refactoring in your code.

Related

Update console text from multiple threads not working

I am executing/processing very big files in multi threaded mode in a console app.
When I don't update/write to the console from threads, for testing the whole process take about 1 minute.
But when I try to update/write to console from threads to show the progress, the process stuck and it never finishes (waited several minutes even hours). And also console text/window does not updated as it should.
Update-1: As requested by few kind responder, i added minimal code that can reproduce the same error/problem
Here is the code from the thread function/method:
using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
namespace Large_Text_To_Small_Text
{
class Program
{
static string sAppPath;
static ArrayList objThreadList;
private struct ThreadFileInfo
{
public string sBaseDir, sRFile;
public int iCurFile, iTFile;
public bool bIncludesExtension;
}
static void Main(string[] args)
{
string sFileDir;
DateTime dtStart;
Console.Clear();
sAppPath = Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location);
sFileDir = #"d:\Test";
dtStart = DateTime.Now;
///process in multi threaded mode
List<string> lFiles;
lFiles = new List<string>();
lFiles.AddRange(Directory.GetFiles(sFileDir, "*.*", SearchOption.AllDirectories));
if (Directory.Exists(sFileDir + "-Processed") == true)
{
Directory.Delete(sFileDir + "-Processed", true);
}
Directory.CreateDirectory(sFileDir + "-Processed");
sPrepareThreading();
for (int iFLoop = 0; iFLoop < lFiles.Count; iFLoop++)
{
//Console.WriteLine(string.Format("{0}/{1}", (iFLoop + 1), lFiles.Count));
sThreadProcessFile(sFileDir + "-Processed", lFiles[iFLoop], (iFLoop + 1), lFiles.Count, Convert.ToBoolean(args[3]));
}
sFinishThreading();
Console.WriteLine(DateTime.Now.Subtract(dtStart).ToString());
Console.ReadKey();
return;
}
private static void sProcSO(object oThreadInfo)
{
var inputLines = new BlockingCollection<string>();
long lACounter, lCCounter;
ThreadFileInfo oProcInfo;
lACounter = 0;
lCCounter = 0;
oProcInfo = (ThreadFileInfo)oThreadInfo;
var readLines = Task.Factory.StartNew(() =>
{
foreach (var line in File.ReadLines(oProcInfo.sRFile))
{
inputLines.Add(line);
lACounter++;
}
inputLines.CompleteAdding();
});
var processLines = Task.Factory.StartNew(() =>
{
Parallel.ForEach(inputLines.GetConsumingEnumerable(), line =>
{
lCCounter++;
/*
some process goes here
*/
/*If i Comment out these lines program get stuck!*/
//Console.SetCursorPosition(0, oProcInfo.iCurFile);
//Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
});
});
Task.WaitAll(readLines, processLines);
}
private static void sPrepareThreading()
{
objThreadList = new ArrayList();
for (var iTLoop = 0; iTLoop < 5; iTLoop++)
{
objThreadList.Add(null);
}
}
private static void sThreadProcessFile(string sBaseDir, string sRFile, int iCurFile, int iTFile, bool bIncludesExtension)
{
Boolean bMatched;
Thread oCurThread;
ThreadFileInfo oProcInfo;
Salma_RecheckThread:
bMatched = false;
for (int iTLoop = 0; iTLoop < 5; iTLoop++)
{
if (objThreadList[iTLoop] == null || ((System.Threading.Thread)(objThreadList[iTLoop])).IsAlive == false)
{
oProcInfo = new ThreadFileInfo()
{
sBaseDir = sBaseDir,
sRFile = sRFile,
iCurFile = iCurFile,
iTFile = iTFile,
bIncludesExtension = bIncludesExtension
};
oCurThread = new Thread(sProcSO);
oCurThread.IsBackground = true;
oCurThread.Start(oProcInfo);
objThreadList[iTLoop] = oCurThread;
bMatched = true;
break;
}
}
if (bMatched == false)
{
System.Threading.Thread.Sleep(250);
goto Salma_RecheckThread;
}
}
private static void sFinishThreading()
{
Boolean bRunning;
Salma_RecheckThread:
bRunning = false;
for (int iTLoop = 0; iTLoop < 5; iTLoop++)
{
if (objThreadList[iTLoop] != null && ((System.Threading.Thread)(objThreadList[iTLoop])).IsAlive == true)
{
bRunning = true;
}
}
if (bRunning == true)
{
System.Threading.Thread.Sleep(250);
goto Salma_RecheckThread;
}
}
}
}
And here is the screenshot, if I try to update console window:
You see? Nor the line number (oProcInfo.iCurFile) or the whole line is correct!
It should be like this:
1 = xxxxx
2 = xxxxx
3 = xxxxx
4 = xxxxx
5 = xxxxx
Update-1: To test just change the sFileDir to any folder that has some big text file or if you like you can download some big text files from following link:
https://wetransfer.com/downloads/8aecfe05bb44e35582fc338f623ad43b20210602005845/bcdbb5
Am I missing any function/method to update console text from threads?
I can't reproduce it. In my tests the process always runs to completion, without getting stuck. The output is all over the place though, because the two lines below are not synchronized:
Console.SetCursorPosition(0, oProcInfo.iCurFile);
Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
Each thread of the many threads involved in the computation invokes these two statements concurrently with the other threads. This makes it possible for one thread to preempt another, and move the cursor before the first thread has the chance to write in the console. To solve this problem you must add proper synchronization, and the easiest way to do it is to use the lock statement:
class Program
{
static object _locker = new object();
And in the sProcSO method:
lock (_locker)
{
Console.SetCursorPosition(0, oProcInfo.iCurFile);
Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
}
If you want to know more about thread synchronization, I recommend this online resource: Threading in C# - Part 2: Basic Synchronization
If you would like to hear my opinion about the code in the question, and you don't mind receiving criticism, my opinion is that honestly the code is so much riddled with problems that the best course of action would be to throw it away and restart from scratch. Use of archaic data structures (ArrayList???), liberal use of casting from object to specific types, liberal use of the goto statement, use of hungarian notation in public type members, all make the code difficult to follow, and easy for bugs to creep in. I found particularly problematic that each file is processed concurrently with all other files using a dedicated thread, and then each dedicated thread uses a ThreadPool thread (Task.Factory.StartNew) to starts a parallel loop (Parallel.ForEach) with unconfigured MaxDegreeOfParallelism. This setup ensures that the ThreadPool will be saturated so badly, that there is no hope that the availability of threads will ever match the demand. Most probably it will also result to a highly inefficient use of the storage device, especially if the hardware is a classic hard disk.
Your freezing problem may not be C# or code related
on the top left of your console window, on the icon .. right click
select Properties
remove the option of Quick Edit Mode and Insert Mode
you can google that feature, but essentially manifests in the problem you describe above
The formatting problem on the other hand does seem to be, here you need to create a class that serializes writes to the console window from a singe thread. a consumer/producer pattern would work (you could use a BlockingCollection to implement this quite easily)

Writing a unit test for concurrent C# code?

I've been trying to solve this issue for quite some time now. I've written some example code showcasing the usage of lock in C#. Running my code manually I can see that it works the way it should, but of course I would like to write a unit test that confirms my code.
I have the following ObjectStack.cs class:
enum ExitCode
{
Success = 0,
Error = 1
}
public class ObjectStack
{
private readonly Stack<Object> _objects = new Stack<object>();
private readonly Object _lockObject = new Object();
private const int NumOfPopIterations = 1000;
public ObjectStack(IEnumerable<object> objects)
{
foreach (var anObject in objects) {
Push(anObject);
}
}
public void Push(object anObject)
{
_objects.Push(anObject);
}
public void Pop()
{
_objects.Pop();
}
public void ThreadSafeMultiPop()
{
for (var i = 0; i < NumOfPopIterations; i++) {
lock (_lockObject) {
try {
Pop();
}
//Because of lock, the stack will be emptied safely and no exception is ever caught
catch (InvalidOperationException) {
Environment.Exit((int)ExitCode.Error);
}
if (_objects.Count == 0) {
Environment.Exit((int)ExitCode.Success);
}
}
}
}
public void ThreadUnsafeMultiPop()
{
for (var i = 0; i < NumOfPopIterations; i++) {
try {
Pop();
}
//Because there is no lock, an exception is caught when popping an already empty stack
catch (InvalidOperationException) {
Environment.Exit((int)ExitCode.Error);
}
if (_objects.Count == 0) {
Environment.Exit((int)ExitCode.Success);
}
}
}
}
And Program.cs:
public class Program
{
private const int NumOfObjects = 100;
private const int NumOfThreads = 10000;
public static void Main(string[] args)
{
var objects = new List<Object>();
for (var i = 0; i < NumOfObjects; i++) {
objects.Add(new object());
}
var objectStack = new ObjectStack(objects);
Parallel.For(0, NumOfThreads, x => objectStack.ThreadUnsafeMultiPop());
}
}
I'm trying to write a unit that tests the thread unsafe method, by checking the exit code value (0 = success, 1 = error) of the executable.
I tried to start and run the application executable as a process in my test, a couple of 100 times, and checked the exit code value each time in the test. Unfortunately, it was 0 every single time.
Any ideas are greatly appreciated!
Logically, there is one, very small, piece of code where this problem can happen. Once one of the threads enters the block of code that pops a single element, then either the pop will work in which case the next line of code in that thread will Exit with success OR the pop will fail in which case the next line of code will catch the exception and Exit.
This means that no matter how much parallelization you put into the program, there is still only one single point in the whole program execution stack where the issue can occur and that is directly before the program exits.
The code is genuinely unsafe, but the probability of an issue happening in any single execution of the code is extremely low as it requires the scheduler to decide not to execute the line of code that will exit the environment cleanly and instead let one of the other Threads raise an exception and exit with an error.
It is extremely difficult to "prove" that a concurrency bug exists, except for really obvious ones, because you are completely dependent on what the scheduler decides to do.
Looking up some other posts I see this post which is written related to Java but references C#: How should I unit test threaded code?
It includes a link to this which might be useful to you: http://research.microsoft.com/en-us/projects/chess/
Hope this is useful and apologies if it is not. Testing concurrency is inherently unpredictable as is writing example code to cause it.
Thanks for all the input! Although I do agree that this is a concurrency issue quite hard to detect due to the scheduler execution among other things, I seem to have found an acceptable solution to my problem.
I wrote the following unit test:
[TestMethod]
public void Executable_Process_Is_Thread_Safe()
{
const string executablePath = "Thread.Locking.exe";
for (var i = 0; i < 1000; i++) {
var process = new Process() {StartInfo = {FileName = executablePath}};
process.Start();
process.WaitForExit();
if (process.ExitCode == 1) {
Assert.Fail();
}
}
}
When I ran the unit test, it seemed that the Parallel.For execution in Program.cs threw strange exceptions at times, so I had to change that to traditional for-loops:
public class Program
{
private const int NumOfObjects = 100;
private const int NumOfThreads = 10000;
public static void Main(string[] args)
{
var objects = new List<Object>();
for (var i = 0; i < NumOfObjects; i++) {
objects.Add(new object());
}
var tasks = new Task[NumOfThreads];
var objectStack = new ObjectStack(objects);
for (var i = 0; i < NumOfThreads; i++)
{
var task = new Task(objectStack.ThreadUnsafeMultiPop);
tasks[i] = task;
}
for (var i = 0; i < NumOfThreads; i++)
{
tasks[i].Start();
}
//Using this seems to throw exceptions from unit test context
//Parallel.For(0, NumOfThreads, x => objectStack.ThreadUnsafeMultiPop());
}
Of course, the unit test is quite dependent on the machine you're running it on (a fast processor may be able to empty the stack and exit safely before reaching the critical section in all cases).
1.) You could inject IL Inject Context switches on a post build of your code in the form of Thread.Sleep(0) using ILGenerator which would most likely help these issues to arise.
2.) I would recommend you take a look at the CHESS project by Microsoft research team.

Design pattern for dynamic C# object

I have a queue that processes objects in a while loop. They are added asynchronously somewhere.. like this:
myqueue.pushback(String value);
And they are processed like this:
while(true)
{
String path = queue.pop();
if(process(path))
{
Console.WriteLine("Good!");
}
else
{
queue.pushback(path);
}
}
Now, the thing is that I'd like to modify this to support a TTL-like (time to live) flag, so the file path would be added o more than n times.
How could I do this, while keeping the bool process(String path) function signature? I don't want to modify that.
I thought about holding a map, or a list that counts how many times the process function returned false for a path and drop the path from the list at the n-th return of false. I wonder how can this be done more dynamically, and preferably I'd like the TTL to automatically decrement itself at each new addition to the process. I hope I am not talking trash.
Maybe using something like this
class JobData
{
public string path;
public short ttl;
public static implicit operator String(JobData jobData) {jobData.ttl--; return jobData.path;}
}
I like the idea of a JobData class, but there's already an answer demonstrating that, and the fact that you're working with file paths give you another possible advantage. Certain characters are not valid in file paths, and so you could choose one to use as a delimiter. The advantage here is that the queue type remains a string, and so you would not have to modify any of your existing asynchronous code. You can see a list of reserved path characters here:
http://en.wikipedia.org/wiki/Filename#Reserved_characters_and_words
For our purposes, I'll use the percent (%) character. Then you can modify your code as follows, and nothing else needs to change:
const int startingTTL = 100;
const string delimiter = "%";
while(true)
{
String[] path = queue.pop().Split(delimiter.ToCharArray());
int ttl = path.Length > 1?--int.Parse(path[1]):startingTTL;
if(process(path[0]))
{
Console.WriteLine("Good!");
}
else if (ttl > 0)
{
queue.pushback(string.Format("{0}{1}{2}", path[0], delimiter,ttl));
}
else
{
Console.WriteLine("TTL expired for path: {0}" path[0]);
}
}
Again, from a pure architecture standpoint, a class with two properties is a better design... but from a practical standpoint, YAGNI: this option means you can avoid going back and changing other asynchronous code that pushes into the queue. That code still only needs to know about the strings, and will work with this unmodified.
One more thing. I want to point out that this is a fairly tight loop, prone to running away with a cpu core. Additionally, if this is the .Net queue type and your tight loop gets ahead of your asynchronous produces to empty the queue, you'll throw an exception, which would break out of the while(true) block. You can solve both issues with code like this:
while(true)
{
try
{
String[] path = queue.pop().Split(delimiter.ToCharArray());
int ttl = path.Length > 1?--int.Parse(path[1]):startingTTL;
if(process(path[0]))
{
Console.WriteLine("Good!");
}
else if (ttl > 0)
{
queue.pushback(string.Format("{0}{1}{2}", path[0], delimiter,ttl));
}
else
{
Console.WriteLine("TTL expired for path: {0}" path[0]);
}
}
catch(InvalidOperationException ex)
{
//Queue.Dequeue throws InvalidOperation if the queue is empty... sleep for a bit before trying again
Thread.Sleep(100);
}
}
If the constraint is that bool process(String path) cannot be touched/changed then put the functionality into myqueue. You can keep its public signatures of void pushback(string path) and string pop(), but internally you can track your TTL. You can either wrap the string paths in a JobData-like class that gets added to the internal queue, or you can have a secondary Dictionary keyed by path. Perhaps even something as simple as saving the last poped path and if the subsequent push is the same path you can assume it was a rejected/failed item. Also, in your pop method you can even discard a path that has been rejected too many time and internally fetch the next path so the calling code is blissfully unaware of the issue.
You could abstract/encapsulate the functionality of the "job manager". Hide the queue and implementation from the caller so you can do whatever you want without the callers caring. Something like this:
public static class JobManager
{
private static Queue<JobData> _queue;
static JobManager() { Task.Factory.StartNew(() => { StartProcessing(); }); }
public static void AddJob(string value)
{
//TODO: validate
_queue.Enqueue(new JobData(value));
}
private static StartProcessing()
{
while (true)
{
if (_queue.Count > 0)
{
JobData data = _queue.Dequeue();
if (!process(data.Path))
{
data.TTL--;
if (data.TTL > 0)
_queue.Enqueue(data);
}
}
else
{
Thread.Sleep(1000);
}
}
}
private class JobData
{
public string Path { get; set; }
public short TTL { get; set; }
public JobData(string value)
{
this.Path = value;
this.TTL = DEFAULT_TTL;
}
}
}
Then your processing loop can handle the TTL value.
Edit - Added a simple processing loop. This code isn't thread safe, but should hopefully give you an idea.

Classification of instances in Weka

I'm trying to use Weka in my C# application. I've used IKVM to bring the Java parts into my .NET application. This seems to be working quite well. However, I am at a loss when it comes to Weka's API. How exactly do I classify instances if they are programmatically passed around in my application and not available as ARFF files.
Basically, I am trying to integrate a simple co-reference analysis using Weka's classifiers. I've built the classification model in Weka directly and saved it to disk, from where my .NET application opens it and uses the IKVM port of Weka to predict the class value.
Here is what I've got so far:
// This is the "entry" method for the classification method
public IEnumerable<AttributedTokenDecorator> Execute(IEnumerable<TokenPair> items)
{
TokenPair[] pairs = items.ToArray();
Classifier model = ReadModel(); // reads the Weka generated model
FastVector fv = CreateFastVector(pairs);
Instances instances = new Instances("licora", fv, pairs.Length);
CreateInstances(instances, pairs);
for(int i = 0; i < instances.numInstances(); i++)
{
Instance instance = instances.instance(i);
double classification = model.classifyInstance(instance); // array index out of bounds?
if(AsBoolean(classification))
MakeCoreferent(pairs[i]);
}
throw new NotImplementedException(); // TODO
}
// This is a helper method to create instances from the internal model files
private static void CreateInstances(Instances instances, IEnumerable<TokenPair> pairs)
{
instances.setClassIndex(instances.numAttributes() - 1);
foreach(var pair in pairs)
{
var instance = new Instance(instances.numAttributes());
instance.setDataset(instances);
for (int i = 0; i < instances.numAttributes(); i++)
{
var attribute = instances.attribute(i);
if (pair.Features.ContainsKey(attribute.name()) && pair.Features[attribute.name()] != null)
{
var value = pair.Features[attribute.name()];
if (attribute.isNumeric()) instance.setValue(attribute, Convert.ToDouble(value));
else instance.setValue(attribute, value.ToString());
}
else
{
instance.setMissing(attribute);
}
}
//instance.setClassMissing();
instances.add(instance);
}
}
// This creates the data set's attributes vector
private FastVector CreateFastVector(TokenPair[] pairs)
{
var fv = new FastVector();
foreach (var attribute in _features)
{
Attribute att;
if (attribute.Type.Equals(ArffType.Nominal))
{
var values = new FastVector();
ExtractValues(values, pairs, attribute.FeatureName);
att = new Attribute(attribute.FeatureName, values);
}
else
att = new Attribute(attribute.FeatureName);
fv.addElement(att);
}
{
var classValues = new FastVector(2);
classValues.addElement("0");
classValues.addElement("1");
var classAttribute = new Attribute("isCoref", classValues);
fv.addElement(classAttribute);
}
return fv;
}
// This extracts observed values for nominal attributes
private static void ExtractValues(FastVector values, IEnumerable<TokenPair> pairs, string featureName)
{
var strings = (from x in pairs
where x.Features.ContainsKey(featureName) && x.Features[featureName] != null
select x.Features[featureName].ToString())
.Distinct().ToArray();
foreach (var s in strings)
values.addElement(s);
}
private Classifier ReadModel()
{
return (Classifier) SerializationHelper.read(_model);
}
private static bool AsBoolean(double classifyInstance)
{
return classifyInstance >= 0.5;
}
For some reason, Weka throws an IndexOutOfRangeException when I call model.classifyInstance(instance). I have no idea why, nor can I come up with an idea how to rectify this issue.
I am hoping someone might know where I went wrong. The only documentation for Weka I found relies on ARFF files for prediction, and I don't really want to go there.
For some odd reason, this exception was raised by the DTNB classifier (I was using three in a majority vote classification model). Apparently, not using DTNB "fixed" the issue.

How should I refactor a long chain of try-and-catch-wrapped speculative casting operations

I have some C# code that walks XML schemata using the Xml.Schema classes from the .NET framework. The various simple type restrictions are abstracted in the framework as a whole bunch of classes derived from Xml.Schema.XmlSchemaFacet. Unless there is something I've missed, the only way to know which of the derived facet types a given facet is, is to speculatively cast it to one of them, catching the resultant InvalidCastOperation in case of failure. Doing this leaves me with a really ugly function like this:
private void NavigateFacet(XmlSchemaFacet facet)
{
try
{
handler.Length((XmlSchemaLengthFacet)facet);
}
catch(InvalidCastException)
{
try
{
handler.MinLength((XmlSchemaMinLengthFacet)facet);
}
catch(InvalidCastException)
{
try
{
handler.MaxLength((XmlSchemaMaxLengthFacet)facet);
}
catch(InvalidCastException)
{
...
}
}
}
}
I assume there must be more elegant ways to do this; either using some property I've missed from the .NET framework, or with some clever piece of OO trickery. Can anyone enlighten me?
Because I prefer debugging data to debugging code, I'd do it like this, especially if the code had to handle all of the XmlSchemaFacet subclasses:
Dictionary<Type, Action<XmlSchemaFacet>> HandlerMap =
new Dictionary<Type, Action<XmlSchemaFacet>>
{
{typeof(XmlSchemaLengthFacet), handler.Length},
{typeof(XmlSchemaMinLengthFacet), handler.MinLength},
{typeof(XmlSchemaMaxLengthFacet), handler.MaxLength}
};
HandlerMap[facet.GetType()](facet);
This'll throw a KeyNotFoundException if facet isn't of a known type. Note that all of the handler methods will have to cast their argument from XmlSchemaFacet, so you're probably not saving on total lines of code, but you're definitely saving on the number of paths through your code.
There also comes a point where (assuming that the map is pre-built) mapping types to methods with a dictionary will be faster than traversing a linear list of types, which is essentially what using a bunch of if blocks gets you.
You can try using the as keyword. Some others have recommended using the is keyword instead. I found this to be a great explanation of why as is better.
Some sample code:
private void NavigateFacet(XmlSchemaFacet facet)
{
XmlSchemaLengthFacet lengthFacet = facet as XmlSchemaLengthFacet;
if (lengthFacet != null)
{
handler.Length(lengthFacet);
}
else
{
// Re-try with XmlSchemaMinLengthFacet, etc.
}
}
private void NavigateFacet(XmlSchemaFacet facet)
{
if (facet is XmlSchemaLengthFacet)
{
handler.Length((XmlSchemaLengthFacet)facet);
}
else if (facet is XmlSchemaMinLengthFacet)
{
handler.MinLength((XmlSchemaMinLengthFacet)facet);
}
else if (facet is XmlSchemaMaxLengthFacet)
{
handler.MinLength((XmlSchemaMaxLengthFacet)facet);
}
// etc.
}
Update: I decided to benchmark the different methods discussed here (is vs. as). Here is the code that I used:
object c1 = new Class1();
int trials = 10000000;
Class1 tester;
Stopwatch watch = Stopwatch.StartNew();
for (int i = 0; i < trials; i++)
{
if (c1 is Class1)
{
tester = (Class1)c1;
}
}
watch.Stop();
MessageBox.Show(watch.ElapsedMilliseconds.ToString()); // ~104 ms
watch.Reset();
watch.Start();
for (int i = 0; i < trials; i++)
{
tester = c1 as Class1;
if (tester != null)
{
//
}
}
watch.Stop();
MessageBox.Show(watch.ElapsedMilliseconds.ToString()); // ~86 ms
watch.Reset();
watch.Start();
for (int i = 0; i < trials; i++)
{
if (c1 is Class1)
{
//
}
}
watch.Stop();
MessageBox.Show(watch.ElapsedMilliseconds.ToString()); // ~74 ms
watch.Reset();
watch.Start();
for (int i = 0; i < trials; i++)
{
//
}
watch.Stop();
MessageBox.Show(watch.ElapsedMilliseconds.ToString()); // ~50 ms
As expected, using the as keyword and then checking for null is faster than using the is keyword and casting (36 ms vs. 54 ms, subtracting the cost of the loop itself).
However, using the is keyword and then not casting is faster still (24ms), which means that finding the correct type with a series of is checks and then only casting once when the correct type is identified could actually be faster (depending upon the number of different type checks that have to be done in this method before the correct type is identified).
The deeper point, however, is that the number of trials in this test is 10 million, which means it really doesn't make much difference which method you use. Using is and casting takes 0.0000054 milliseconds, while using as and checking for null takes 0.0000036 milliseconds (on my ancient notebook).
You could try using the as keyword - if the cast fails, you get a null instead of an exception.
private void NavigateFacet(XmlSchemaFacet facet)
{
var length = facet as XmlSchemaLengthFacet;
if (length != null)
{
handler.Length(length);
return;
}
var minlength = facet as XmlSchemaMinLengthFacet;
if (minlength != null)
{
handler.MinLength(minlength);
return;
}
var maxlength = facet as XmlSchemaMaxLengthFacet;
if (maxlength != null)
{
handler.MaxLength(maxlength);
return;
}
...
}
If you had control over the classes, I'd suggest using a variant of the Visitor pattern (aka Double Despatch to recover the type information more cleanly, but since you don't, this is one relatively simple approach.
Update: Using the variable to store the result of the as cast avoids the need to go through the type checking logic twice.
Update 2: When C# 4 becomes available, you'll be able to use dynamic to do the dispatching for you:
public class HandlerDemo
{
public void Handle(XmlSchemaLengthFacet facet) { ... }
public void Handle(XmlSchemaMinLengthFacet facet) { ... }
public void Handle(XmlSchemaMaxLengthFacet facet) { ... }
...
}
private void NavigateFacet(XmlSchemaFacet facet)
{
dynamic handler = new HandlerDemo();
handler.Handle(facet);
}
This will work because method dispatch on dynamic objects uses the same rules as normal method overriding, but evaluated at run-time instead of compile-time.
Under the hood, the Dynamic Language Runtime (DLR) will be doing much the same kind of trick as the code shown in this (and other answers), but with the addition of caching for performance.
This is a proper way to do this without any try-catches.
if (facet is XmlSchemaLengthFacet)
{
handler.Length((XmlSchemaLengthFacet)facet);
}
else if (facet is XmlSchemaMinLengthFacet)
{
handler.MinLength((XmlSchemaMinLengthFacet)facet);
}
else if (facet is XmlSchemaMaxLengthFacet)
{
handler.MaxLength((XmlSchemaMaxLengthFacet)facet);
}
else
{
//Handle Error
}
use "is" to determine whether an object is of a given type
use "as" for type conversion, it is faster than normal casting and won't throw exceptions, it reuturns null on error instead
You can do it like this:
private void NavigateFacet(XmlSchemaFacet facet)
{
if (facet is XmlSchemaLengthFacet)
{
handler.Length(facet as XmlSchemaLengthFacet);
}
else if (facet is XmlSchemaMinLengthFacet)
{
handler.MinLength(facet as XmlSchemaMinLengthFacet);
}
else if (facet is XmlSchemaMaxLengthFacet)
{
handler.MaxLength(facet as XmlSchemaMaxLengthFacet);
}
}

Categories