classes and threading - c#

I have the following code:
public class Search
{
    StringBuilder sb = new StringBuilder();
    string[] myparams;
    // Assumed declarations (used below but not shown in the original snippet):
    List<Task> tasks = new List<Task>();
    CancellationToken ct;
    string url;
    public void Start()
    {
        //Start search threads
        for (int i = 0; i < 50; i++)
        {
            tasks.Add(Task.Factory.StartNew(() =>
            {
                string text1 = GetFirstRequest(url, myparams);
                string text2 = GetSecondRequest(url, myparams);
            }, ct, TaskCreationOptions.LongRunning, TaskScheduler.Default));
        }
    }
    private string GetFirstRequest(string url, string[] myparams)
    {
        //Use stringbuilder to build the complete url with params
        //Use webrequest, response and stream to return the url contents
    }
    private string GetSecondRequest(string url, string[] myparams)
    {
        //Similar to GetFirstRequest
    }
}
For my main form I call:
Search search = new Search();
search.Start();
As you can see from the code above, individual threads are created. However, each thread is calling the same private functions in the Search class in order to access the url.
Is the code thread-safe? Is it better to place the private functions into a separate class and create a class for each thread?

Without seeing the actual code for GetFirstRequest and GetSecondRequest, we can't tell - but the fact that you've got an instance variable of type StringBuilder makes me skeptical. StringBuilder itself isn't thread-safe, and if you're modifying a single object in multiple threads I doubt that you'll get the result you want anyway.
If you're using StringBuilder to build a complete URL, why not just create that StringBuilder in each method? If you don't need to change any of the state of your object, you'll be a long way towards being thread-safe.
Also note that your methods take a myparams parameter but could equally access the myparams instance variable (which could not simply be called params anyway, as params is a keyword in C#). Do you really need that duplication? Why not just use the instance variable from the method?
It feels like this class can be made thread-safe, but almost certainly isn't yet. You need to design it to be thread-safe - which means either avoiding any state mutation, or using appropriate locking. (The former approach is usually cleaner where it's possible.)
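A minimal sketch of the "avoid state mutation" approach, assuming the URL is built by appending each entry of the parameter array as a query-string part. The class name ThreadSafeSearch, the page parameter, and the fetch delegate (a stand-in for the real web request code) are all hypothetical. The fields are only read after construction, and each call creates its own StringBuilder, so the tasks share nothing mutable.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

public class ThreadSafeSearch
{
    // Set once in the constructor and only read afterwards, so every task can use them safely.
    private readonly string baseUrl;
    private readonly string[] parameters;

    public ThreadSafeSearch(string baseUrl, string[] parameters)
    {
        this.baseUrl = baseUrl;
        // Defensive copy: the caller cannot change our array after construction.
        this.parameters = (string[])parameters.Clone();
    }

    // Each call gets its own local StringBuilder, so no builder is shared between threads.
    public string BuildUrl(int page)
    {
        var sb = new StringBuilder(baseUrl).Append('?');
        foreach (string p in parameters)
            sb.Append(p).Append('&');
        sb.Append("page=").Append(page);
        return sb.ToString();
    }

    // Hypothetical: 'fetch' stands in for the WebRequest/response/stream code.
    public Task<string[]> StartAsync(int count, Func<string, string> fetch)
    {
        Task<string>[] tasks = Enumerable.Range(0, count)
            .Select(i => Task.Run(() => fetch(BuildUrl(i))))
            .ToArray();
        return Task.WhenAll(tasks); // results come back in the same order as the tasks
    }
}
```

Because nothing in the instance is written after construction, no locking is needed; the only shared data is read-only.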

Related

Access to modified closure: ReSharper

I created a library that handles database access. I recently added transaction handling; however, I came across a small issue. To outline this, I wrote this sample for demonstration purposes:
class Program
{
static void Main(string[] args)
{
String data = null;
DoAction(ref data, () =>
{
Console.WriteLine(data);
});
Console.ReadLine();
}
private static void DoAction(ref String data, Action action)
{
if (data == null)
data = "Initialized Data";
action();
}
}
I get "Access to modified closure" underline on the following code line for the 'data' variable:
Console.WriteLine(data);
I understand that modifying the ref data variable can cause issues (e.g. when running foreach loops). However, in the following case, I don't see this happening.
Here is another version with a loop changing the variable further - the output is as expected:
class Program
{
static void Main(string[] args)
{
String data = null;
for (var i = 0; i < 10; i++)
DoAction(ref data, () =>
{
Console.WriteLine(data);
});
Console.ReadLine();
}
private static void DoAction(ref String data, Action action)
{
if (data == null)
data = "Initialized Data";
else
data += "|";
action();
}
}
ReSharper offers to create a local variable, but I explicitly want to use the string created by the DoAction() method. If I accepted ReSharper's approach, it would actually break the code. Is there any other way to solve this problem? I'd like to use this Action approach, but I don't want ReSharper to complain about it either (and preferably without disabling ReSharper's inspection).
Any suggestions?
I would suggest avoid using a ref parameter for this in the first place - it seems needlessly complicated to me. I'd rewrite DoAction as:
static string DoAction(string data, Action<string> action)
{
data = data == null ? "Initialized Data" : data + "|";
action(data);
return data;
}
Then you can have:
data = DoAction(data, Console.WriteLine);
or if you want to use a lambda expression:
data = DoAction(data, txt => Console.WriteLine(txt));
You can make DoAction a void method if you don't actually need the result afterwards. (It's not clear why you need the result to be returned and a delegate to execute in DoAction, but presumably that makes more sense in your wider context.)
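A small self-contained version of the return-value approach, handy for checking the loop behaviour from the question. The class name DoActionDemo and the RunLoop helper are hypothetical, and RunLoop collects the strings instead of printing them. Each lambda receives the current value as its argument, so there is no captured variable being modified and nothing for ReSharper to flag.

```csharp
using System;
using System.Collections.Generic;

public static class DoActionDemo
{
    // Returns the updated string instead of mutating a ref parameter,
    // so the callback gets the value as an argument rather than via a closure.
    public static string DoAction(string data, Action<string> action)
    {
        data = data == null ? "Initialized Data" : data + "|";
        action(data);
        return data;
    }

    // Same loop as in the question; each value passed to the callback is recorded.
    public static List<string> RunLoop(int iterations)
    {
        var seen = new List<string>();
        string data = null;
        for (var i = 0; i < iterations; i++)
            data = DoAction(data, txt => seen.Add(txt));
        return seen;
    }
}
```

Replacing `txt => seen.Add(txt)` with `Console.WriteLine` gives the printing version from the question.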
In case you feel certain that the warning is not appropriate, there is the InstantHandleAttribute which is documented as:
Tells code analysis engine if the parameter is completely handled
when the invoked method is on stack. If the parameter is a delegate,
indicates that delegate is executed while the method is executed.
If the parameter is an enumerable, indicates that it is enumerated
while the method is executed.
I think this is exactly what you want.
You can get the attribute from the JetBrains.Annotations package or alternatively as copy-paste from ReSharper options.
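A sketch of the attribute route, assuming you paste the annotation source instead of referencing the package. The local InstantHandleAttribute below only mimics what the pasted annotations contain (as far as I know ReSharper matches it by its full name in the JetBrains.Annotations namespace); the ClosureDemo class is hypothetical. The attribute has no runtime effect; it only informs the code analysis. If the project already references JetBrains.Annotations, drop the local declaration to avoid a duplicate type.

```csharp
using System;
using JetBrains.Annotations;

namespace JetBrains.Annotations
{
    // Minimal stand-in for the pasted annotation; analysis-only, no runtime behaviour.
    [AttributeUsage(AttributeTargets.Parameter)]
    public sealed class InstantHandleAttribute : Attribute { }
}

public static class ClosureDemo
{
    // [InstantHandle] states that 'action' runs before DoAction returns,
    // so a lambda capturing the ref-passed variable should not be reported
    // as "Access to modified closure".
    public static void DoAction(ref string data, [InstantHandle] Action action)
    {
        if (data == null)
            data = "Initialized Data";
        else
            data += "|";
        action();
    }
}
```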

C# usage of lock() statements and caching of data

I have a number of static lists in my application, which are used to store data from my database and are used when looking up information:
public static IList<string> Names;
I also have some methods to refresh this data from the database:
public static void GetNames()
{
SQLEngine sql = new SQLEngine(ConnectionString);
lock (Names)
{
Names = sql.GetDataTable("SELECT * FROM Names").ToList<string>();
}
}
I initially didn't have the lock() in place; however, I noticed that very occasionally the requesting thread couldn't find the information in the list. Now, I am assuming that if the requesting thread tries to access the Names list, it can't until it has been fully updated.
Is this the correct methodology and usage of the lock() statement?
As a side note, I noticed on MSDN that one shouldn't use lock() on public variables. Could someone please elaborate on my particular scenario?
lock is only useful if every place intended to be synchronized also takes the same lock. So every time you access Names you would be required to lock. At the moment, the lock only stops two threads swapping Names at the same time, which frankly isn't a problem here, as reference swaps are atomic anyway.
Another problem; presumably Names starts off null? You can't lock a null. Equally, you shouldn't lock on something that may change reference. If you want to synchronize, a common approach is something like:
// do not use for your scenario - see below
private static readonly object lockObj = new object();
then lock(lockObj) instead of your data.
With regards to not locking things that are visible externally; yes. That is because some other code could randomly choose to lock on it, which could cause unexpected blocking, and quite possibly deadlocks.
The other big risk is that some of your code obtains the names, and then does a sort/add/remove/clear/etc - anything that mutates the data. Personally, I would be using a read-only list here. In fact, with a read-only list, all you have is a reference swap; since that is atomic, you don't need any locking:
public static IList<string> Names { get; private set; }
public static void UpdateNames() {
List<string> tmp = SomeSqlQuery();
Names = tmp.AsReadOnly();
}
And finally: public fields are very very rarely a good idea. Hence the property above. This will be inlined by the JIT, so it is not a penalty.
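A self-contained sketch of the read-only snapshot idea. loadFromDatabase is a hypothetical stand-in for the SQL query. One deliberate deviation: the property is typed IReadOnlyList&lt;string&gt; rather than IList&lt;string&gt;, so an attempted Add or Clear fails at compile time instead of throwing NotSupportedException at run time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameCache
{
    // Readers always see either the old or the new complete snapshot, never a half-built list.
    public static IReadOnlyList<string> Names { get; private set; } = Array.Empty<string>();

    // 'loadFromDatabase' is hypothetical: it stands in for the SQL query.
    public static void UpdateNames(Func<IEnumerable<string>> loadFromDatabase)
    {
        List<string> tmp = loadFromDatabase().ToList(); // build the new list off to the side
        Names = tmp.AsReadOnly();                        // publish it with a single reference swap
    }
}
```

A caller that grabbed the old snapshot keeps a consistent view of it even after a refresh, which is exactly what removes the need for locking.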
No, it's not correct since anyone can use the Names property directly.
public class SomeClass
{
    // volatile is the usual recommendation for the unlocked first read in double-checked locking
    private volatile List<string> _names;
    private readonly object _namesLock = new object();
    public IEnumerable<string> Names
    {
        get
        {
            if (_names == null)
            {
                lock (_namesLock)
                {
                    if (_names == null)
                        _names = LoadNames();
                }
            }
            return _names;
        }
    }
    public void UpdateNames()
    {
        lock (_namesLock)
            _names = LoadNames();
    }
    // SQLEngine and ConnectionString come from the question's code.
    private List<string> LoadNames()
    {
        SQLEngine sql = new SQLEngine(ConnectionString);
        return sql.GetDataTable("SELECT * FROM Names").ToList<string>();
    }
}
Try to avoid static methods. At least use a singleton.
Check, lock, check is faster than lock, check, because the write only happens once; after that, readers never need to take the lock.
Assigning a property on usage is called lazy loading.
The _namesLock is required since you can't lock on null.
From the code you have shown, the first time GetNames() is called the Names property is null, and locking on a null reference throws an ArgumentNullException. I would add a separate variable to lock on.
static object namesLock = new object();
Then in GetNames()
lock (namesLock)
{
if (Names == null)
Names = ...;
}
We do the if test inside of the lock() to stop race conditions. I'm assuming that the caller of GetNames() also does the same test.
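The lock-then-test pattern above can be checked with a small sketch. The hard-coded list stands in for the database query, and LoadCount is a hypothetical counter that only exists to show the load runs once even when many threads call GetNames() at the same time.

```csharp
using System;
using System.Collections.Generic;

public static class LazyNames
{
    private static readonly object namesLock = new object();
    private static List<string> names; // null until the first load

    // Hypothetical counter: how many times the (stand-in) load actually ran.
    public static int LoadCount;

    public static List<string> GetNames()
    {
        lock (namesLock)
        {
            // Testing inside the lock means two threads that both arrive
            // while names is null cannot both run the load.
            if (names == null)
            {
                LoadCount++;
                names = new List<string> { "Alice", "Bob" }; // stand-in for the SQL query
            }
            return names;
        }
    }
}
```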

C# Delegates & guid.newguid()

I just started using C# this afternoon, so be a little gentle.
Currently I am working on a type of "template engine" where one of the callbacks needs to generate a globally unique ID. I am using delegates to manage the callbacks.
Currently the code looks like this (though I have also tried an anonymous function and returning NewGuid directly without a variable):
static string UID(List<string> p)
{
string s = Guid.NewGuid().ToString();
return s;
}
Which, when called directly, works fine. However if I try to call it via the delegate (added to a StringDictionary via addCallback("generate UID", new CallbackWrapper(UID))), the program will generate the same GUID regardless of how many times I duplicate it; even though calling the method directly both before & after the event occurs results in a unique ID as expected. I'v
No doubt it's just something simple I've missed, inevitably stemming from me being relatively inexperienced at C#.
Any help would be appreciated.
Thanks.
Well, I've now tried Dictionary with the same result.
CallbackWrapper is just the delegate, it's defined like this:
delegate string CallbackWrapper(List<string> parameters);
The remainder of the work is done in another class, which looks like this:
class TemplateParser
{
private Dictionary<string, CallbackWrapper> callbackMap;
public TemplateParser(string directivePrefix, string directiveSuffix)
{
...
callbackMap = new Dictionary<string,CallbackWrapper>();
}
public TemplateParser() : this("<!-- {", "} -->") {}
public void addCallback(string name, CallbackWrapper callback)
{
callbackMap.Add(name, callback);
}
public string parse(string filename)
{
...
string replacement =
callbackMap[directiveName](new List<string>(parameters.Split(new string[] { ";", " " }, StringSplitOptions.RemoveEmptyEntries));
...
}
}
I've stripped out the majority of the string handling code to save some space.
The issue is in your calling code, not in the code itself, nor in the delegate.
Using delegates here definitely works if called correctly.
Furthermore, your code can be slightly simplified:
static string UID(List<string> p)
{
return Guid.NewGuid().ToString();
}
(The variable is utterly redundant.)
Use delegate.Invoke.
The difference between direct function call and delegate.invoke is here
http://social.msdn.microsoft.com/Forums/en/csharplanguage/thread/f629c34d-6523-433a-90b3-bb5d445c5587
StringDictionary will automatically cast your CallbackWrapper to a string, meaning it will only run once and store the output of CallbackWrapper.ToString(). This is probably not what you want.
Try using Dictionary<string, CallbackWrapper> instead.
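A minimal sketch showing that a Dictionary&lt;string, CallbackWrapper&gt; stores the delegate itself, so every lookup-and-invoke runs UID again and produces a fresh GUID. The CallbackDemo class and its Expand method are hypothetical stand-ins for the template parser's lookup code.

```csharp
using System;
using System.Collections.Generic;

public delegate string CallbackWrapper(List<string> parameters);

public static class CallbackDemo
{
    static string UID(List<string> p) => Guid.NewGuid().ToString();

    // The dictionary holds the delegate, not the string it returned once.
    public static readonly Dictionary<string, CallbackWrapper> Callbacks =
        new Dictionary<string, CallbackWrapper>
        {
            ["generate UID"] = UID
        };

    // Hypothetical: mirrors callbackMap[directiveName](...) in the parse method.
    public static string Expand(string directiveName) =>
        Callbacks[directiveName](new List<string>());
}
```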

Using string as a lock to do thread synchronization

While I was looking at some legacy application code, I noticed it is using a string object to do thread synchronization. I'm trying to resolve some thread contention issues in this program and was wondering if this could lead to some strange situations. Any thoughts?
private static string mutex= "ABC";
internal static void Foo(Rpc rpc)
{
lock (mutex)
{
//do something
}
}
String literals like that (from the code) are "interned". This means all occurrences of "ABC" point to the same object. Even across AppDomains you can end up pointing to the same object (thanks to Steven for the tip).
If you have a lot of string-mutexes, from different locations, but with the same text, they could all lock on the same object.
The intern pool conserves string storage. If you assign a literal string constant to several variables, each variable is set to reference the same constant in the intern pool instead of referencing several different instances of String that have identical values.
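A small sketch to see the interning described above for yourself; the class and field names are hypothetical. Two fields initialized from the same literal share one object (so a lock on either blocks the other), a string built at run time is a separate object, and string.Intern maps it back to the literal's instance.

```csharp
using System;

public static class InternDemo
{
    private static readonly string mutexA = "ABC"; // imagine this in one class...
    private static readonly string mutexB = "ABC"; // ...and this in an unrelated one

    // Same literal, same interned object.
    public static bool LiteralsShareInstance() => object.ReferenceEquals(mutexA, mutexB);

    // A string constructed at run time is a new instance.
    public static bool RuntimeStringShares()
    {
        string built = new string(new[] { 'A', 'B', 'C' });
        return object.ReferenceEquals(built, mutexA);
    }

    // string.Intern returns the pooled instance, i.e. the literal's object.
    public static bool InternedShares()
    {
        string built = new string(new[] { 'A', 'B', 'C' });
        return object.ReferenceEquals(string.Intern(built), mutexA);
    }
}
```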
It's better to use:
private static readonly object mutex = new object();
Also, since your string is neither const nor readonly, it can be reassigned. So (in theory) one thread can lock on your mutex and change mutex to another reference, and then another thread can enter the critical section, because its lock uses the new object/reference. Example:
private static string mutex = "1";
private static string mutex2 = "1"; // for 'lock' mutex2 and mutex are the same
private static void CriticalButFlawedMethod() {
lock(mutex) {
mutex += "."; // Hey, now mutex points to another reference/object
// You are free to re-enter
...
}
}
To answer your question (as some others already have), there are some potential problems with the code example you provided:
private static string mutex= "ABC";
The variable mutex is not immutable.
The string literal "ABC" will refer to the same interned object reference everywhere in your application.
In general, I would advise against locking on strings. However, there is one case I've run into where it is useful to do this.
There have been occasions where I have maintained a dictionary of lock objects where the key is something unique about some data that I have. Here's a contrived example:
void Main()
{
var a = new SomeEntity{ Id = 1 };
var b = new SomeEntity{ Id = 2 };
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(b));
Task.Run(() => DoSomething(b));
}
ConcurrentDictionary<int, object> _locks = new ConcurrentDictionary<int, object>();
void DoSomething(SomeEntity entity)
{
var mutex = _locks.GetOrAdd(entity.Id, id => new object());
lock(mutex)
{
Console.WriteLine("Inside {0}", entity.Id);
// do some work
}
}
The goal of code like this is to serialize concurrent invocations of DoSomething() within the context of the entity's Id. The downside is the dictionary. The more entities there are, the larger it gets. It's also just more code to read and think about.
I think .NET's string interning can simplify things:
void Main()
{
var a = new SomeEntity{ Id = 1 };
var b = new SomeEntity{ Id = 2 };
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(b));
Task.Run(() => DoSomething(b));
}
void DoSomething(SomeEntity entity)
{
lock(string.Intern("dee9e550-50b5-41ae-af70-f03797ff2a5d:" + entity.Id))
{
Console.WriteLine("Inside {0}", entity.Id);
// do some work
}
}
The difference here is that I am relying on the string interning to give me the same object reference per entity id. This simplifies my code because I don't have to maintain the dictionary of mutex instances.
Notice the hard-coded UUID string that I'm using as a namespace. This is important if I choose to adopt the same approach of locking on strings in another area of my application.
Locking on strings can be a good idea or a bad idea depending on the circumstances and the attention that the developer gives to the details.
If you need to lock a string, you can create an object that pairs the string with an object that you can lock with.
class LockableString
{
public string _String;
public object MyLock; //Provide a lock to the data in.
public LockableString()
{
MyLock = new object();
}
}
My 2 cents:
ConcurrentDictionary is 1.5X faster than interned strings. I did a benchmark once.
To solve the "ever-growing dictionary" problem you can use a dictionary of semaphores instead of a dictionary of objects. AKA use ConcurrentDictionary<string, SemaphoreSlim> instead of <string, object>. Unlike the lock statements, Semaphores can track how many threads have locked on them. And once all the locks are released - you can remove it from the dictionary. See this question for solutions like that: Asynchronous locking based on a key
Semaphores are even better because you can also control the concurrency level: instead of "limit to one concurrent run" you can "limit to 5 concurrent runs". Awesome free bonus, isn't it? I had to code an email service that needed to limit the number of concurrent connections to a server, and this came in very handy.
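A sketch of the dictionary-of-semaphores idea with cleanup. It deviates from the answer in one respect: instead of a ConcurrentDictionary, it uses a plain Dictionary guarded by a lock, so the reference count and the removal happen atomically. KeyedLock, LockAsync, and maxConcurrencyPerKey are hypothetical names; passing 5 gives the "limit to 5 concurrent runs" behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Keyed async lock: entries are reference-counted and removed once nobody holds or waits on them.
public sealed class KeyedLock<TKey>
{
    private sealed class Entry
    {
        public SemaphoreSlim Semaphore;
        public int RefCount; // holders + waiters; only touched under the _entries lock
    }

    private readonly Dictionary<TKey, Entry> _entries = new Dictionary<TKey, Entry>();
    private readonly int _maxConcurrencyPerKey;

    // 1 behaves like lock; a larger value allows that many concurrent runs per key.
    public KeyedLock(int maxConcurrencyPerKey = 1)
    {
        _maxConcurrencyPerKey = maxConcurrencyPerKey;
    }

    // Number of keys currently tracked; returns to 0 once everything is released.
    public int Count
    {
        get { lock (_entries) return _entries.Count; }
    }

    public async Task<IDisposable> LockAsync(TKey key)
    {
        Entry entry;
        lock (_entries)
        {
            if (!_entries.TryGetValue(key, out entry))
            {
                entry = new Entry
                {
                    Semaphore = new SemaphoreSlim(_maxConcurrencyPerKey, _maxConcurrencyPerKey)
                };
                _entries[key] = entry;
            }
            entry.RefCount++;
        }
        await entry.Semaphore.WaitAsync().ConfigureAwait(false);
        return new Releaser(this, key, entry);
    }

    private void Release(TKey key, Entry entry)
    {
        lock (_entries)
        {
            if (--entry.RefCount == 0)
                _entries.Remove(key); // last user gone: the dictionary no longer grows forever
        }
        entry.Semaphore.Release();
    }

    private sealed class Releaser : IDisposable
    {
        private readonly KeyedLock<TKey> _owner;
        private readonly TKey _key;
        private readonly Entry _entry;
        private int _disposed;

        public Releaser(KeyedLock<TKey> owner, TKey key, Entry entry)
        {
            _owner = owner;
            _key = key;
            _entry = entry;
        }

        public void Dispose()
        {
            // Guard against a double Dispose releasing the semaphore twice.
            if (Interlocked.Exchange(ref _disposed, 1) == 0)
                _owner.Release(_key, _entry);
        }
    }
}
```

Usage is `using (await keyedLock.LockAsync(entity.Id)) { ... }`, which also works across awaits, unlike the lock statement.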
I imagine that locking on interned strings could lead to memory bloat if the strings generated are many and all unique. Another approach that should be more memory efficient and solve the immediate deadlock issue is:
// Returns an object to lock with, based on a string value.
// ConditionalWeakTable compares keys by reference, not by value, so only callers that
// pass the same string instance share a lock. Keying on a derived string such as
// value.ToLower() would not work: it can be a new instance on every call.
private static readonly ConditionalWeakTable<string, object> _weakTable = new ConditionalWeakTable<string, object>();
public static object GetLock(string value)
{
    if (value == null) throw new ArgumentNullException(nameof(value));
    return _weakTable.GetOrCreateValue(value);
}

Large static arrays are slowing down class load, need a better/faster lookup method

I have a class with a couple static arrays:
an int[] with 17,720 elements
a string[] with 17,720 elements
I noticed when I first access this class it takes almost 2 seconds to initialize, which causes a pause in the GUI that's accessing it.
Specifically, it's a lookup for Unicode character names. The first array is an index into the second array.
static readonly int[] NAME_INDEX = {
0x0000, 0x0001, 0x0005, 0x002C, 0x003B, ...
static readonly string[] NAMES = {
"Exclamation Mark", "Digit Three", "Semicolon", "Question Mark", ...
The following code is how the arrays are used (given a character code). [Note: This code isn't a performance problem]
int nameIndex = Array.BinarySearch<int>(NAME_INDEX, code);
if (nameIndex >= 0) // BinarySearch returns a negative value when not found; 0 is a valid index
{
return NAMES[nameIndex];
}
I guess I'm looking at other options on how to structure the data so that 1) The class is quickly loaded, and 2) I can quickly get the "name" for a given character code.
Should I not be storing all these thousands of elements in static arrays?
Update
Thanks for all the suggestions. I've tested out a Dictionary approach and the performance of adding all the entries seems to be really poor.
Here is some code with the Unicode data to test out Arrays vs Dictionaries
http://drop.io/fontspace/asset/fontspace-unicodesupport-zip
Solution Update
I tested out my original dual arrays (which are faster than both dictionary options) with a background thread to initialize and that helped performance a bit.
However, the real surprise is how well the binary files in resource streams work. It is the fastest solution discussed in this thread. Thanks everyone for your answers!
So a couple of observations. Binary Search is only going to work if your array is sorted, and from your above code snippet, it doesn't look to be sorted.
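If the codes are not already in ascending order, one way to keep the two-array layout is to sort both arrays together with Array.Sort(keys, items), which reorders the second array to match the first. A minimal sketch with three sample entries; the class and method names are hypothetical. BinarySearch returns a negative number when the key is missing, so the hit test is `>= 0`, since 0 is a valid match.

```csharp
using System;

public static class CharNames
{
    // Parallel arrays: NAME_INDEX[i] is the code whose name is NAMES[i] (deliberately unsorted here).
    private static readonly int[] NAME_INDEX = { 0x3B, 0x21, 0x33 };
    private static readonly string[] NAMES = { "Semicolon", "Exclamation Mark", "Digit Three" };

    static CharNames()
    {
        // Sorts NAME_INDEX and applies the same reordering to NAMES, keeping pairs aligned.
        Array.Sort(NAME_INDEX, NAMES);
    }

    public static string GetName(int code)
    {
        int i = Array.BinarySearch(NAME_INDEX, code);
        return i >= 0 ? NAMES[i] : null; // negative result means "not found"
    }
}
```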
Since your primary goal is to find a specific name, your code is begging for a hash table. I would suggest using a Dictionary, it will give you O(1) (on average) lookup, without much more overhead than just having the arrays.
As for the load time, I agree with Andrey that the best way is going to be using a separate thread. You are going to have some initialization overhead with the amount of data you are using. Normal practice with GUIs is to use a separate thread for these activities so you don't lock up the UI.
First
A Dictionary<int, string> is going to perform far better than your duelling arrays will. Putting aside how this data gets into the arrays/Dictionary (hardcoded vs. read in from another location, like a resource file), this is still a better and more intuitive storage mechanism.
Second
As others have suggested, do your loading in another thread. I'd use a helper function to help you deal with this. You could use an approach like this:
public class YourClass
{
    private static Dictionary<int, string> characterLookup;
    private static readonly ManualResetEvent lookupCreated = new ManualResetEvent(false);
    static YourClass()
    {
        ThreadPool.QueueUserWorkItem(LoadLookup);
    }
    static void LoadLookup(object state)
    {
        var lookup = new Dictionary<int, string>();
        // add your pairs by calling lookup.Add(...)
        characterLookup = lookup;
        lookupCreated.Set(); // releases every caller blocked in GetDescription
    }
    public static string GetDescription(int code)
    {
        // Blocks until the background load has finished; once the event is set,
        // WaitOne returns immediately. The event is never closed, so concurrent
        // callers cannot run into a disposed handle.
        lookupCreated.WaitOne();
        string output;
        characterLookup.TryGetValue(code, out output); // output is null if the code is missing
        return output;
    }
}
In your code, call GetDescription in order to translate your integer into the corresponding string. If the UI doesn't call this until later, then you should see a marked decrease in startup time. To be safe, though, I've included a ManualResetEvent that will cause any calls to GetDescription to block until the dictionary has been fully loaded.
"Should I not be storing all these thousands of elements in static arrays?"
A much better way would be to store your data as a binary stream in the assembly's resources and then load it from there. It takes some more programming, but in return it doesn't need any object initialization.
Basic idea would be (no real code):
// Load data (two streams):
indices = ResourceManager.GetStream ("indexData");
strings = ResourceManager.GetStream ("stringData");
// Retrieving an entry:
stringIndex = indices.GetIndexAtPosition (char);
string = strings.GetStringFromPosition (stringIndex);
If you want a really good solution (for even more work), look into using memory-mapped data files.
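A runnable take on the binary-resource idea above, under an assumed file layout (the layout, the class name BinaryNameTable, and the Write helper are all hypothetical). Loading still allocates two int arrays and one byte array, but none of the 17,720 string objects; a name is only decoded when it is looked up.

```csharp
using System;
using System.IO;
using System.Text;

// Assumed layout:
//   int32 count
//   int32 codes[count]          character codes, ascending
//   int32 offsets[count + 1]    start of each UTF-8 name in the string area (last = total length)
//   byte  stringArea[offsets[count]]
public sealed class BinaryNameTable
{
    private readonly int[] codes;
    private readonly int[] offsets;
    private readonly byte[] stringArea;

    public BinaryNameTable(Stream stream)
    {
        using (var reader = new BinaryReader(stream))
        {
            int count = reader.ReadInt32();
            codes = ReadInts(reader, count);
            offsets = ReadInts(reader, count + 1);
            stringArea = reader.ReadBytes(offsets[count]);
        }
    }

    private static int[] ReadInts(BinaryReader reader, int count)
    {
        var result = new int[count];
        for (int i = 0; i < count; i++)
            result[i] = reader.ReadInt32();
        return result;
    }

    // Only the requested name is turned into a string object.
    public string GetName(int code)
    {
        int i = Array.BinarySearch(codes, code);
        if (i < 0)
            return null;
        return Encoding.UTF8.GetString(stringArea, offsets[i], offsets[i + 1] - offsets[i]);
    }

    // Build-time helper that writes the layout above; sortedCodes must be ascending.
    public static void Write(Stream output, int[] sortedCodes, string[] names)
    {
        using (var writer = new BinaryWriter(output, Encoding.UTF8, true)) // true = leave output open
        {
            writer.Write(sortedCodes.Length);
            foreach (int code in sortedCodes)
                writer.Write(code);
            byte[][] encoded = Array.ConvertAll(names, name => Encoding.UTF8.GetBytes(name));
            int offset = 0;
            writer.Write(offset);
            foreach (byte[] bytes in encoded)
            {
                offset += bytes.Length;
                writer.Write(offset);
            }
            foreach (byte[] bytes in encoded)
                writer.Write(bytes); // raw bytes, no length prefix
        }
    }
}
```

At run time the stream would come from something like `typeof(YourClass).Assembly.GetManifestResourceStream("YourApp.UnicodeNames.bin")`, where the resource name is hypothetical and depends on how the file is embedded.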
Initialize your arrays in a separate thread so that it does not lock the UI:
http://msdn.microsoft.com/en-us/library/hz49h034.aspx
If you store the arrays in a file, you could do a lazy load:
using System;
using System.Collections.Generic;
using System.IO;
public class Class1
{
    const int CountOfEntries = 17700; //or whatever the count is
    IEnumerable<KeyValuePair<int, string>> load()
    {
        using (var reader = File.OpenText("somefile"))
        {
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                // split on the first comma only, in case a name contains one
                var pair = line.Split(new[] { ',' }, 2);
                yield return new KeyValuePair<int, string>(int.Parse(pair[0]), pair[1]);
            }
        }
    }
    private static Dictionary<int, string> _lookup = new Dictionary<int, string>();
    private static IEnumerator<KeyValuePair<int, string>> _loader = null;
    private string LookUp(int index)
    {
        if (_lookup.Count < CountOfEntries && !_lookup.ContainsKey(index))
        {
            if (_loader == null)
            {
                _loader = load().GetEnumerator();
            }
            // keep reading the file, caching every pair, until the requested index turns up
            while (_loader.MoveNext())
            {
                var pair = _loader.Current;
                _lookup.Add(pair.Key, pair.Value);
                if (pair.Key == index)
                {
                    return pair.Value;
                }
            }
        }
        string name;
        if (_lookup.TryGetValue(index, out name))
        {
            return name;
        }
        throw new KeyNotFoundException("The given index was not found");
    }
}
The code expects the file to have one pair on each line, like so:
index0,name0
index1,name1
If the first index sought is at the end of the file, this will probably be slower (mainly due to I/O). If access is random, on average the first lookup reads half of the values; if access is not random, make sure to keep the most frequently used entries at the top of the file.
There are a few more issues to consider. The above code is not thread-safe for the load operation, and to keep the rest of the code responsive, do the loading in a background thread.
Hope this helps.
What about using a dictionary instead of two arrays? You could initialize the dictionary asynchronously using a thread or thread pool. The lookup would be O(1) instead of O(log(n)) as well.
public static class Lookup
{
    private static readonly ManualResetEvent m_Initialized = new ManualResetEvent(false);
    private static readonly Dictionary<int, string> m_Dictionary = new Dictionary<int, string>();
    // Static constructors cannot have an access modifier.
    static Lookup()
    {
        // Start an asynchronous operation to initialize the dictionary.
        // You could use ThreadPool.QueueUserWorkItem instead of creating a new thread.
        Thread thread = new Thread(() => { Initialize(); });
        thread.Start();
    }
    // Named GetName because a member cannot have the same name as its enclosing type.
    public static string GetName(int code)
    {
        m_Initialized.WaitOne();
        lock (m_Dictionary)
        {
            return m_Dictionary[code];
        }
    }
    private static void Initialize()
    {
        lock (m_Dictionary)
        {
            m_Dictionary.Add(0x0000, "Exclamation Point");
            // Keep adding items to the dictionary here.
        }
        m_Initialized.Set();
    }
}
