When an online user send a message to a offline user , I keep those messages in a ConcurrentDictionary . each user is running/spinning in its own Task(thread).
public static ConcurrentDictionary<int, List<MSG>> DIC_PROFILEID__MSGS = new ConcurrentDictionary...
So the method look like :
/*1*/ public static void SaveLaterMessages(MSG msg)
/*2*/ {
/*3*/ var dic = Globals.DIC_PROFILEID__MSGS;
/*4*/
/*5*/
/*6*/ lock (saveLaterMssagesLocker)
/*7*/ {
/*8*/ List<MSG> existingLst;
/*9*/ if (!dic.TryGetValue(msg.To, out existingLst))
/*10*/ {
/*11*/ existingLst = new List<MSG>();
/*12*/ dic.TryAdd(msg.To, existingLst);
/*13*/ }
/*14*/ existingLst.Add(msg);
/*15*/ }
/*16*/ }
Please notice the lock at #6. I did this because if 2 threads are in #10 , they will both cause creation of a new List (which is bad).
But I'm bothered by the fact that I lock "too much".
In other words , If I send message to offline-user-20 , there is no reason that one will not be able to send message to offline-user-22.
So I'm thinking about creating additional dictionary of locks :
Dictionary <int , object> DicLocks = new Dictionary <int , object>();
Where the int key is the userID
Later , initialize each entry with new Object()
So now my method will look like :
public static void SaveLaterMessages(MSG msg)
{
var dic = Globals.DIC_PROFILEID__MSGS;
lock (Globals.DicLocks[msg.To]) //changed here !!!
{
List<MSG> existingLst;
if (!dic.TryGetValue(msg.To, out existingLst))
{
existingLst = new List<MSG>();
dic.TryAdd(msg.To, existingLst);
}
existingLst.Add(msg);
}
}
Now , users can inset messages to a different offline-users without interfering.
Question
1) Am I right with this approach , or is there any better approach ?
2) I really hate to lock around the ConcurrentDictionary, it is 100% not right. should I make it a regular dictionary ?
ConcurrentDictonary has tools to help you do the situation you are in, if you switch your retrieval/creation of your existing list to a single thread safe operation it makes the problem much simpler.
public static void SaveLaterMessages(MSG msg)
{
var dic = Globals.DIC_PROFILEID__MSGS;
List<MSG> existingLst = dic.GetOrAdd(msg.To, (key) => new List<MSG>());
lock(((ICollection)existingLst).SyncRoot)
{
existingLst.Add(msg);
}
}
This attempts to get the list from the dictionary and creates a new list if it did not exist, it then locks only on the non thread safe operation of adding to the list on the list object itself.
If possible a even better option is replace your List<MSG> with a thread safe collection like ConcurrentQueue<MSG> and you won't need to perform any locks at all (the ability to do this all depends on how messages are used once they are in the list). If you do need to use a List you don't need Globals.DicLocks[msg.To] to lock on, it is perfectly acceptable to lock on the list object that is returned from the collection.
One advantage you could get from a second lock object is if you are going to have lots of reads but very few writes you could use a ReaderWriterLockSlim to allow multiple concurrent readers but only one writer.
public static void SaveLaterMessages(MSG msg)
{
var dic = Globals.DIC_PROFILEID__MSGS;
List<MSG> existingLst = dic.GetOrAdd(msg.To, (key) => new List<MSG>());
var lockingObj = GetLockingObject(existingLst);
lockingObj.EnterWriteLock();
try
{
existingLst.Add(msg);
}
finally
{
lockingObj.ExitWriteLock();
}
}
private static ConcurrentDictionary<List<MSG>, ReaderWriterLockSlim> _msgLocks = new ConcurrentDictionary<List<MSG>, ReaderWriterLockSlim>();
public static ReaderWriterLockSlim GetLockingObject(List<MSG> msgList)
{
_msgLocks.GetOrAdd(msgList, (key) => new ReaderWriterLockSlim());
}
//Elsewhere in multiple threads.
public MSG PeekNewestMessage(int myId)
{
var dic = Globals.DIC_PROFILEID__MSGS;
var list = dic[myId];
var lockingObj = GetLockingObject(list);
lockingObj.EnterReadLock();
try
{
return list.FirstOrDefault();
}
finally
{
lockingObj.ExitReadLock();
}
}
However I would still recommend the ConcurrentQueue<MSG> approach over this approach.
You could also wrap your "List()" code in a class with a singleton pattern that manages the lifetime of the list per user and adding / removing from it, then your concurrent dictionary and support code becomes lock free. Your "List()" could also be a concurrent queue and further reduce the work you need to do around locking for future adds etc.
http://msdn.microsoft.com/en-us/library/dd267265(v=vs.110).aspx
For high volumes, you could also use a service bus type pattern to cross thread boundaries and create a queue for each user and push messages down it for consumption when the user comes back online.
Related
I have multiple threads writing data to a common source, and I would like two threads to block each other if and only if they are touching the same piece of data.
It would be nice to have a way to lock specifically on an arbitrary key:
string id = GetNextId();
AquireLock(id);
try
{
DoDangerousThing();
}
finally
{
ReleaseLock(id);
}
If nobody else is trying to lock the same key, I would expect they would be able to run concurrently.
I could achieve this with a simple dictionary of mutexes, but I would need to worry about evicting old, unused locks and that could become a problem if the set grows too large.
Is there an existing implementation of this type of locking pattern.
You can try using a ConcurrentDictionary<string, object> to create named object instances. When you need a new lock instance (that you haven't used before), you can add it to the dictionary (adding is an atomic operation through GetOrAdd) and then all threads can share the same named object once you pull it from the dictionary, based on your data.
For example:
// Create a global lock map for your lock instances.
public static ConcurrentDictionary<string, object> GlobalLockMap =
new ConcurrentDictionary<string, object> ();
// ...
var oLockInstance = GlobalLockMap.GetOrAdd ( "lock name", x => new object () );
if (oLockInstance == null)
{
// handle error
}
lock (oLockInstance)
{
// do work
}
You can use the ConcurrentDictionary<string, object> to create and reuse different locks. If you want to remove locks from the dictionary, and also to reopen in future the same named resource, you have always to check inside the critical region if the previously acquired lock has been removed or changed by other threads. And take care to remove the lock from the dictionary as the last step before leaving the critical region.
static ConcurrentDictionary<string, object> _lockDict =
new ConcurrentDictionary<string, object>();
// VERSION 1: single-shot method
public void UseAndCloseSpecificResource(string resourceId)
{
bool isSameLock;
object lockObj, lockObjCheck;
do
{
lock (lockObj = _lockDict.GetOrAdd(resourceId, new object()))
{
if (isSameLock = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
try
{
// ... open, use, and close resource identified by resourceId ...
// ...
}
finally
{
// This must be the LAST statement
_lockDict.TryRemove(resourceId, out lockObjCheck);
}
}
}
}
while (!isSameLock);
}
// VERSION 2: separated "use" and "close" methods
// (can coexist with version 1)
public void UseSpecificResource(string resourceId)
{
bool isSameLock;
object lockObj, lockObjCheck;
do
{
lock (lockObj = _lockDict.GetOrAdd(resourceId, new object()))
{
if (isSameLock = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
// ... open and use (or reuse) resource identified by resourceId ...
}
}
}
while (!isSameLock);
}
public bool TryCloseSpecificResource(string resourceId)
{
bool result = false;
object lockObj, lockObjCheck;
if (_lockDict.TryGetValue(resourceId, out lockObj))
{
lock (lockObj)
{
if (result = (_lockDict.TryGetValue(resourceId, out lockObjCheck) &&
object.ReferenceEquals(lockObj, lockObjCheck)))
{
try
{
// ... close resource identified by resourceId ...
// ...
}
finally
{
// This must be the LAST statement
_lockDict.TryRemove(resourceId, out lockObjCheck);
}
}
}
}
return result;
}
The lock keyword (MSDN) already does this.
When you lock, you pass the object to lock on:
lock (myLockObject)
{
}
This uses the Monitor class with the specific object to synchronize any threads using lock on the same object.
Since string literals are "interned" – that is, they are cached for reuse so that every literal with the same value is in fact the same object – you can also do this for strings:
lock ("TestString")
{
}
Since you aren't dealing with string literals you could intern the strings you read as described in: C#: Strings with same contents.
It would even work if the reference used was copied (directly or indirectly) from an interned string (literal or explicitly interned). But I wouldn't recommend it. This is very fragile and can lead to hard-to-debug problems, due to the ease with which new instances of a string having the same value as an interned string can be created.
A lock will only block if something else has entered the locked section on the same object. Thus, no need to keep a dictionary around, just the applicable lock objects.
Realistically though, you'll need to maintain a ConcurrentDictionary or similar to allow your objects to access the appropriate lock object.
There are a great number of articles available regarding thread safe caching, here's an example:
private static object _lock = new object();
public void CacheData()
{
SPListItemCollection oListItems;
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if(oListItems == null)
{
lock (_lock)
{
// Ensure that the data was not loaded by a concurrent thread
// while waiting for lock.
oListItems = (SPListItemCollection)Cache[“ListItemCacheName”];
if (oListItems == null)
{
oListItems = DoQueryToReturnItems();
Cache.Add("ListItemCacheName", oListItems, ..);
}
}
}
}
However, this example depends on the request for the cache also rebuilding the cache.
I'm looking for a solution where the request and rebuild are separate. Here's the scenario.
I have a web service that I want to monitor for certain types of error. If an error occurs, I create an monitor object and cache - it is updatable and is locked accordingly during update. Alls well so far.
Elsewhere, I check for the existence of the cached object, and the data it contains. This would work straight out of the box except for one particular scenario.
If the cache object is being updated - say a status change, I would like to wait and get the latest info rather than the current info, which if returned, would be out of date. So for my fetch code, I need to check if the object is currently being created/updating, and if so wait, then retry.
As I pointed out, there are many examples of cache locking patterns but I can't seem to find one that for this scenario. Any ideas as to how to go about this would be appreciated?
You can try the following code using two locks. Write lock in the setter is quite simple and protects cache from being written by more than one threads. The getter use a simple double-check lock.
Now, the trick is in Refresh() method, which uses the same lock as the getter. The method uses the lock and in the first step removes list from the cache. It will trigger any getter to fail the first null check and wait for the lock. The method in the meantime gets items, sets cache again and releases the lock.
When it comes back to the getter, it reads the cache again and now it contains the list.
public class CacheData
{
private static object _readLock = new object();
private static object _writeLock = new object();
public SPListItemCollection ListItem
{
get
{
var oListItems = (SPListItemCollection) Cache["ListItemCacheName"];
if (oListItems == null)
{
lock (_readLock)
{
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if (oListItems == null)
{
oListItems = DoQueryToReturnItems();
Cache.Add("ListItemCacheName", oListItems, ..);
}
}
}
return oListItems;
}
set
{
lock (_writeLock)
{
Cache.Add("ListItemCacheName", value, ..);
}
}
}
public void Refresh()
{
lock (_readLock)
{
Cache.Remove("ListItemCacheName");
var oListItems = DoQueryToReturnItems();
ListItem = oListItems;
}
}
}
You can make the method and property static, if you do not need CacheData instance.
Ok I was a little unsure on how best name this problem :) But assume this scenarion, you're
going out and fetching some webpage (with various urls) and caching it locally. The cache part is pretty easy to solve even with multiple threads.
However, imagine that one thread starts fetching an url, and a couple of milliseconds later another want to get the same url. Is there any good pattern for making the seconds thread's method wait on the first one to fetch the page , insert it into the cache and return it so you don't have to do multiple requests. With little enough overhead that it's worth doing even for requests that take about 300-700 ms? And without locking requests for other urls
Basically when requests for identical urls comes in tightly after each other I want the second request to "piggyback" the first request
I had some loose idea of having a dictionary where you insert an object with the key as url when you start fetching a page and lock on it. If there's any matching the key already it get's the object, locks on it and then tries to fetch the url for the actual cache.
I'm a little unsure of the particulars however to make it really thread-safe, using ConcurrentDictionary might be one part of it...
Is there any common pattern and solutions for scenarios like this?
Breakdown wrong behavior:
Thread 1: Checks the cache, it doesnt exists so starts fetching the url
Thread 2: Starts fetching the same url since it still doesn't exist in Cache
Thread 1: finished and inserts into the cache, returns the page
Thread 2: Finishes and also inserts into cache (or discards it), returns the page
Breakdown correct behavior:
Thread 1: Checks the cache, it doesnt exists so starts fetching the url
Thread 2: Wants the same url, but sees it's currently being fetched so waits on thread 1
Thread 1: finished and inserts into the cache, returns the page
Thread 2: Notices that thread 1 is finished and returns the page thread 1 it fetched
EDIT
Most solutions sofar seem to misunderstand the problem and only addressing the caching, as I said that isnt the problem, the problem is when doing an external web fetch to make the second fetch that is done before the first one has cached it to use the result from the first rather then doing a second
You could use a ConcurrentDictionary<K,V> and a variant of double-checked locking:
public static string GetUrlContent(string url)
{
object value1 = _cache.GetOrAdd(url, new object());
if (value1 == null) // null check only required if content
return null; // could legitimately be a null string
var urlContent = value1 as string;
if (urlContent != null)
return urlContent; // got the content
// value1 isn't a string which means that it's an object to lock against
lock (value1)
{
object value2 = _cache[url];
// at this point value2 will *either* be the url content
// *or* the object that we already hold a lock against
if (value2 != value1)
return (string)value2; // got the content
urlContent = FetchContentFromTheWeb(url); // todo
_cache[url] = urlContent;
return urlContent;
}
}
private static readonly ConcurrentDictionary<string, object> _cache =
new ConcurrentDictionary<string, object>();
EDIT: My code is quite a bit uglier now, but uses a separate lock per URL. This allows different URLs to be fetched asynchronously, however each URL will only be fetched once.
public class UrlFetcher
{
static Hashtable cache = Hashtable.Synchronized(new Hashtable());
public static String GetCachedUrl(String url)
{
// exactly 1 fetcher is created per URL
InternalFetcher fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
lock( cache.SyncRoot )
{
fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
fetcher = new InternalFetcher(url);
cache[url] = fetcher;
}
}
}
// blocks all threads requesting the same URL
return fetcher.Contents;
}
/// <summary>Each fetcher locks on itself and is initilized with null contents.
/// The first thread to call fetcher.Contents will cause the fetch to occur, and
/// block until completion.</summary>
private class InternalFetcher
{
private String url;
private String contents;
public InternalFetcher(String url)
{
this.url = url;
this.contents = null;
}
public String Contents
{
get
{
if( contents == null )
{
lock( this ) // "this" is an instance of InternalFetcher...
{
if( contents == null )
{
contents = FetchFromWeb(url);
}
}
}
return contents;
}
}
}
}
Will the Semaphore please stand up! stand up! stand up!
use Semaphore you can easily synchronize your threads with it.
on both cases where
you are trying to load a page that is currently being cached
you are saving cache to a file where a page is loading from it.
in both scenarios you will face troubles.
it is just like writers and readers problem that is a common problem in Operating System Racing Issues. just when a thread wants to rebuild a cache or start caching a page no thread should read from it. if a thread is reading it it should wait until reading finished and replace the cache, no 2 threads should cache same page in to a same file. hence it is possible for all readers to read from a cache at anytime since no writer is writing on it.
you should read some semaphore using samples on msdn, it is very easy to use. just the thread that wants to do something is call the semaphore and if the resource can granted it do the works otherwise sleeps and wait to be woken up when the resource is ready.
Disclaimer: This might be a n00bish answer. Please pardon me, if it is.
I'd recommend using some shared dictionary object with locks to keep a track of the url being currently fetched or have already been fetched.
At every request, check the url against this object.
If an entry for the url is present, check the cache. (this means one of the threads has either fetched it or is currently fetching it)
If its available in the cache, use it, else put the current thread to sleep for a while and check back again. (if not in cache, some thread is still fetching it, so wait while its done)
If the entry is not found in the dictionary object, add the url to it and send the request. Once it obtains a response, add it to cache.
This logic should work, however, you would need to take care of cache expiration and removal of the entry from the dictionary object.
my solution is use atomicBoolean to control access database when cache is timeout or unexist;
at the same moment, only one thread(i call it read-th) can access database, the other threads spin until the read-th return data and write it into cache;
here codes; implement by java;
public class CacheBreakDownDefender<K, R> {
/**
* false = do not write null to cache when get null value from database;
*/
private final boolean writeNullToCache;
/**
* cache different query key
*/
private final ConcurrentHashMap<K, AtomicBoolean> selectingDBTagMap = new ConcurrentHashMap<>();
public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType) {
return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(false));
}
public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType, boolean writeNullToCache) {
return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(writeNullToCache));
}
private CacheBreakDownDefender(boolean writeNullToCache) {
this.writeNullToCache = writeNullToCache;
}
public R readFromCache(K key, Function<K, ? extends R> getFromCache, Function<K, ? extends R> getFromDB, BiConsumer<K, R> writeCache) throws InterruptedException {
R result = getFromCache.apply(key);
if (result == null) {
final AtomicBoolean selectingDB = selectingDBTagMap.computeIfAbsent(key, x -> new AtomicBoolean(false));
if (selectingDB.compareAndSet(false, true)) {
try {
result = getFromDB.apply(key);
if (result != null || writeNullToCache) {
writeCache.accept(key, result);
}
} finally {
selectingDB.getAndSet(false);
selectingDBTagMap.remove(key);
}
} else {
while (selectingDB.get()) {
TimeUnit.MILLISECONDS.sleep(0L);
//do nothing...
}
return getFromCache.apply(key);
}
}
return result;
}
public static void main(String[] args) throws InterruptedException {
Map<String, String> map = new ConcurrentHashMap<>();
CacheBreakDownDefender<String, String> instance = CacheBreakDownDefender.getInstance(String.class, String.class, true);
for (int i = 0; i < 9; i++) {
int finalI = i;
new Thread(() -> {
String kele = null;
try {
if (finalI == 6) {
kele = instance.readFromCache("kele2", map::get, key -> "helloword2", map::put);
} else
kele = instance.readFromCache("kele", map::get, key -> "helloword", map::put);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
log.info("resut= {}", kele);
}).start();
}
TimeUnit.SECONDS.sleep(2L);
}
}
This is not exactly for concurrent caches but for all caches:
"A cache with a bad policy is another name for a memory leak" (Raymond Chen)
While i was looking at some legacy application code i noticed it is using a string object to do thread synchronization. I'm trying to resolve some thread contention issues in this program and was wondering if this could lead so some strange situations. Any thoughts ?
private static string mutex= "ABC";
internal static void Foo(Rpc rpc)
{
lock (mutex)
{
//do something
}
}
Strings like that (from the code) could be "interned". This means all instances of "ABC" point to the same object. Even across AppDomains you can point to the same object (thx Steven for the tip).
If you have a lot of string-mutexes, from different locations, but with the same text, they could all lock on the same object.
The intern pool conserves string storage. If you assign a literal string constant to several variables, each variable is set to reference the same constant in the intern pool instead of referencing several different instances of String that have identical values.
It's better to use:
private static readonly object mutex = new object();
Also, since your string is not const or readonly, you can change it. So (in theory) it is possible to lock on your mutex. Change mutex to another reference, and then enter a critical section because the lock uses another object/reference. Example:
private static string mutex = "1";
private static string mutex2 = "1"; // for 'lock' mutex2 and mutex are the same
private static void CriticalButFlawedMethod() {
lock(mutex) {
mutex += "."; // Hey, now mutex points to another reference/object
// You are free to re-enter
...
}
}
To answer your question (as some others already have), there are some potential problems with the code example you provided:
private static string mutex= "ABC";
The variable mutex is not immutable.
The string literal "ABC" will refer to the same interned object reference everywhere in your application.
In general, I would advise against locking on strings. However, there is a case I've ran into where it is useful to do this.
There have been occasions where I have maintained a dictionary of lock objects where the key is something unique about some data that I have. Here's a contrived example:
void Main()
{
var a = new SomeEntity{ Id = 1 };
var b = new SomeEntity{ Id = 2 };
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(b));
Task.Run(() => DoSomething(b));
}
ConcurrentDictionary<int, object> _locks = new ConcurrentDictionary<int, object>();
void DoSomething(SomeEntity entity)
{
var mutex = _locks.GetOrAdd(entity.Id, id => new object());
lock(mutex)
{
Console.WriteLine("Inside {0}", entity.Id);
// do some work
}
}
The goal of code like this is to serialize concurrent invocations of DoSomething() within the context of the entity's Id. The downside is the dictionary. The more entities there are, the larger it gets. It's also just more code to read and think about.
I think .NET's string interning can simplify things:
void Main()
{
var a = new SomeEntity{ Id = 1 };
var b = new SomeEntity{ Id = 2 };
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(a));
Task.Run(() => DoSomething(b));
Task.Run(() => DoSomething(b));
}
void DoSomething(SomeEntity entity)
{
lock(string.Intern("dee9e550-50b5-41ae-af70-f03797ff2a5d:" + entity.Id))
{
Console.WriteLine("Inside {0}", entity.Id);
// do some work
}
}
The difference here is that I am relying on the string interning to give me the same object reference per entity id. This simplifies my code because I don't have to maintain the dictionary of mutex instances.
Notice the hard-coded UUID string that I'm using as a namespace. This is important if I choose to adopt the same approach of locking on strings in another area of my application.
Locking on strings can be a good idea or a bad idea depending on the circumstances and the attention that the developer gives to the details.
If you need to lock a string, you can create an object that pairs the string with an object that you can lock with.
class LockableString
{
public string _String;
public object MyLock; //Provide a lock to the data in.
public LockableString()
{
MyLock = new object();
}
}
My 2 cents:
ConcurrentDictionary is 1.5X faster than interned strings. I did a benchmark once.
To solve the "ever-growing dictionary" problem you can use a dictionary of semaphores instead of a dictionary of objects. AKA use ConcurrentDictionary<string, SemaphoreSlim> instead of <string, object>. Unlike the lock statements, Semaphores can track how many threads have locked on them. And once all the locks are released - you can remove it from the dictionary. See this question for solutions like that: Asynchronous locking based on a key
Semaphores are even better because you can even control the concurrency level. Like, instead of "limiting to one concurrent run" - you can "limit to 5 concurrent runs". Awesome free bonus isn't it? I had to code an email-service that needed to limit the number of concurrent connections to a server - this came very very handy.
I imagine that locking on interned strings could lead to memory bloat if the strings generated are many and are all unique. Another approach that should be more memory efficient and solve the immediate deadlock issue is
// Returns an Object to Lock with based on a string Value
private static readonly ConditionalWeakTable<string, object> _weakTable = new ConditionalWeakTable<string, object>();
public static object GetLock(string value)
{
if (value == null) throw new ArgumentNullException(nameof(value));
return _weakTable.GetOrCreateValue(value.ToLower());
}
I have a class with a couple static arrays:
an int[] with 17,720 elements
a string[] with 17,720 elements
I noticed when I first access this class it takes almost 2 seconds to initialize, which causes a pause in the GUI that's accessing it.
Specifically, it's a lookup for Unicode character names. The first array is an index into the second array.
static readonly int[] NAME_INDEX = {
0x0000, 0x0001, 0x0005, 0x002C, 0x003B, ...
static readonly string[] NAMES = {
"Exclamation Mark", "Digit Three", "Semicolon", "Question Mark", ...
The following code is how the arrays are used (given a character code). [Note: This code isn't a performance problem]
int nameIndex = Array.BinarySearch<int>(NAME_INDEX, code);
if (nameIndex > 0)
{
return NAMES[nameIndex];
}
I guess I'm looking at other options on how to structure the data so that 1) The class is quickly loaded, and 2) I can quickly get the "name" for a given character code.
Should I not be storing all these thousands of elements in static arrays?
Update
Thanks for all the suggestions. I've tested out a Dictionary approach and the performance of adding all the entries seems to be really poor.
Here is some code with the Unicode data to test out Arrays vs Dictionaries
http://drop.io/fontspace/asset/fontspace-unicodesupport-zip
Solution Update
I tested out my original dual arrays (which are faster than both dictionary options) with a background thread to initialize and that helped performance a bit.
However, the real surprise is how well the binary files in resource streams works. It is the fastest solution discussed in this thread. Thanks everyone for your answers!
So a couple of observations. Binary Search is only going to work if your array is sorted, and from your above code snippet, it doesn't look to be sorted.
Since your primary goal is to find a specific name, your code is begging for a hash table. I would suggest using a Dictionary, it will give you O(1) (on average) lookup, without much more overhead than just having the arrays.
As for the load time, I agree with Andrey that the best way is going to be by using a separate thread. You are going to have some initialization overhead when using the amount of data you are using. Normal practice with GUIs is to use a separate thread for these activites so you don't lock up the UI.
First
A Dictionary<int, string> is going to perform far better than your duelling arrays will. Putting aside how this data gets into the arrays/Dictionary (hardcoded vs. read in from another location, like a resource file), this is still a better and more intuitive storage mechanism
Second
As others have suggested, do your loading in another thread. I'd use a helper function to help you deal with this. You could use an approach like this:
public class YourClass
{
private static Dictionary<int, string> characterLookup;
private static ManualResetEvent lookupCreated;
static YourClass()
{
lookupCreated = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(LoadLookup);
}
static void LoadLookup(object garbage)
{
// add your pairs by calling characterLookup.Add(...)
lookupCreated.Set();
}
public static string GetDescription(int code)
{
if (lookupCreated != null)
{
lookupCreated.WaitOne();
lookupCreated.Close();
lookupCreated = null;
}
string output;
if(!characterLookup.TryGetValue(code, out output)) output = null;
return output;
}
}
In your code, call GetDescription in order to translate your integer into the corresponding string. If the UI doesn't call this until later, then you should see a marked decrease in startup time. To be safe, though, I've included a ManualResetEvent that will cause any calls to GetDescription to block until the dictionary has been fully loaded.
"Should I not be storing all these thousands of elements in static arrays?"
A much better way would be to store your data as binary stream in resources in the assembly and then load from the resources. Will be some more programming overhead but therefore doesn't need any object initialization.
Basic idea would be (no real code):
// Load data (two streams):
indices = ResourceManager.GetStream ("indexData");
strings = ResourceManager.GetStream ("stringData");
// Retrieving an entry:
stringIndex = indices.GetIndexAtPosition (char);
string = strings.GetStringFromPosition (stringIndex);
If you want a really good solution (for even some more work) look into using memmapped data files.
Initialize your arrays in separate thread that will not lock the UI
http://msdn.microsoft.com/en-us/library/hz49h034.aspx
if you store the arrays in a file you could do a lazy load
public class Class1
{
const int CountOfEntries = 17700; //or what ever the count is
IEnumerable<KeyValuePair<int, string>> load()
{
using (var reader = File.OpenText("somefile"))
{
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
var pair = line.Split(',');
yield return new KeyValuePair<int, string>(int.Parse(pair[0]), pair[1]);
}
}
}
private static Dictionary<int, string> _lookup = new Dictionary<int, string>();
private static IEnumerator<KeyValuePair<int, string>> _loader = null;
private string LookUp(int index)
{
if (_lookup.Count < CountOfEntries && !_lookup.ContainsKey(index))
{
if(_loader == null)
{
_loader = load().GetEnumerator();
}
while(_loader.MoveNext())
{
var pair = _loader.Current;
_lookup.Add(pair.Key,pair.Value);
if (pair.Key == index)
{
return index;
}
}
}
string name;
if (_lookup.TryGetValue(index,out name))
{
return return name;
}
throw new KeyNotFoundException("The given index was not found");
}
}
the code expectes the file to have one pair on each line like so:
index0,name0
index1,name1
If the first index sought is at the end this will perform slower probably (due to IO mainly) but if the access is random the average case woul be reading half of the values the first time if the access is not random make sure to keep the most used in the top of the file
there are a few more issues to considere. The above code is not threadsafe for the load operation and to increase responsiveness of the rest of the code keep the loading in a background thread
hope this helps
What about using a dictionary instead of two arrays? You could initialize the dictionary asynchronously using a thread or thread pool. The lookup would be O(1) instead of O(log(n)) as well.
public static class Lookup
{
private static readonly ManualResetEvent m_Initialized = new ManualResetEvent(false);
private static readonly Dictionary<int, string> m_Dictionary = new Dictionary<int, string>();
public static Lookup()
{
// Start an asynchronous operation to intialize the dictionary.
// You could use ThreadPool.QueueUserWorkItem instead of creating a new thread.
Thread thread = new Thread(() => { Initialize(); });
thread.Start();
}
public static string Lookup(int code)
{
m_Initialized.WaitOne();
lock (m_Dictionary)
{
return m_Dictionary[code];
}
}
private static void Initialize()
{
lock (m_Dictionary)
{
m_Dictionary.Add(0x0000, "Exclamation Point");
// Keep adding items to the dictionary here.
}
m_Initialized.Set();
}
}