I have an application that needs to create files with a unique, sequential number as part of the file name. Since the application has no other data storage, my first thought was to keep the number in a text file and increment it each time, so that every file the application creates gets a unique id.
Then it occurred to me that when more than one user submits to the application at the same time, one process might read the text file before the previous process has written the new value. So I am looking for a way to read and write the file in the same 'process' without unlocking it in between (with try/catch, so that I can tell when another process is using the file and then wait and retry a few times).
If what I describe above sounds like a bad option, could you please suggest an alternative? How would you keep track of unique identification numbers in a case like mine?
Thanks.
If it's a single application then you can store the current number in your application settings. Load that number at startup. Then with each request you can safely increment it and use the result. Save the sequential number when the program shuts down. For example:
private int _fileNumber;
// at application startup
_fileNumber = LoadFileNumberFromSettings();
// to increment
public int GetNextFile()
{
return Interlocked.Increment(ref _fileNumber);
}
// at application shutdown
SaveFileNumberToSettings(_fileNumber);
Or, you might want to make sure that the file number is saved whenever it's incremented. If so, change your GetNextFile method:
private readonly object _fileLock = new object();
public int GetNextFile()
{
lock (_fileLock)
{
int result = ++_fileNumber;
SaveFileNumberToSettings(_fileNumber);
return result;
}
}
Note also that it might be reasonable to use the registry for this, rather than a file.
Edit: As Alireza pointed out in the comments, this is not a valid way to lock between multiple applications.
You can always lock the access to the file (so you won't need to rely on exceptions).
e.g:
// Create a lock in your class
private static object LockObject = new object();
// and then lock on this object when you access the file like this:
lock(LockObject)
{
... access to the file
}
Edit2: It seems that you can use Mutex to perform inter-application signalling.
private static System.Threading.Mutex m = new System.Threading.Mutex(false, "LockMutex");
void AccessMethod()
{
// Acquire before entering try, so that finally never releases a mutex we do not own.
m.WaitOne();
try
{
// Access the file
}
finally
{
m.ReleaseMutex();
}
}
But this is not the best pattern for generating unique ids. Maybe a sequence in a database would be better? If you don't have a database, you can use Guids or a local database (even Access would be better, I think).
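If the numbers do not actually have to be sequential, a GUID-based name avoids shared state altogether. A minimal sketch of that idea; the class name, the `report_` prefix and the `.txt` extension in the usage line are placeholders, not anything from the question:

```csharp
using System;
using System.IO;

public static class UniqueFileNames
{
    // Builds a file path whose name embeds a freshly generated GUID.
    // The "N" format renders the GUID as 32 hex digits without dashes.
    public static string Next(string directory, string prefix, string extension)
    {
        return Path.Combine(directory, prefix + Guid.NewGuid().ToString("N") + extension);
    }
}
```

Usage would look like `UniqueFileNames.Next(outputDir, "report_", ".txt")`. No lock is needed, but the names carry no ordering, so this only fits if "unique" matters more than "sequential".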
I would prefer a more complex but universal solution based on a global mutex. It uses a mutex whose name is prefixed with "Global\", which makes it system-wide, i.e. one mutex instance is shared across all processes. If your program runs in a friendly environment, or you can specify strict permissions limited to a user account you trust, then it works well.
Keep in mind that this solution is not transactional and is not protected against thread-abortion/process-termination.
Not transactional means that if your process/thread is caught in the middle of storage file modification and is terminated/aborted then the storage file will be left in unknown state. For instance it can be left empty. You can protect yourself against loss of data (loss of last used index) by writing the new value first, saving the file and only then removing the previous value. Reading procedure should expect a file with multiple numbers and should take the greatest.
Not protected against thread abortion means that if a thread that has obtained the mutex is aborted unexpectedly, and/or you do not have proper exception handling, the mutex could stay locked for the life of the process that created that thread. To make the solution abort-protected you will have to implement timeouts on obtaining the lock, i.e. replace the following line, which waits forever,
blnResult = iLock.Mutex.WaitOne();
with something with timeout.
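A rough sketch of what the timeout variant could look like. The helper name `TryWithMutex` and its shape are my own illustration and are not part of the code further down; it only relies on `WaitOne(TimeSpan)`, which returns false when the wait times out:

```csharp
using System;
using System.Threading;

public static class MutexTimeoutSketch
{
    // Hypothetical helper: runs 'action' while holding 'mutex'.
    // Returns false (without running 'action') if the mutex could not be obtained in time.
    public static bool TryWithMutex(Mutex mutex, TimeSpan timeout, Action action)
    {
        bool held = false;
        try
        {
            try
            {
                held = mutex.WaitOne(timeout);
            }
            catch (AbandonedMutexException)
            {
                // The previous owner exited without releasing; this thread now owns the mutex.
                held = true;
            }
            if (!held)
            {
                return false; // timed out, the caller decides whether to retry or give up
            }
            action();
            return true;
        }
        finally
        {
            if (held)
            {
                mutex.ReleaseMutex();
            }
        }
    }
}
```

The caller still has to decide what a timeout means for the application (retry, report an error, fall back to something else); the sketch only makes sure the wait cannot hang forever.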
To sum up: if you are looking for a really robust solution, you will end up either using some kind of transactional database or writing something like one yourself :)
Here is the working code without timeout handling (I do not need it in my solution). It is robust enough to begin with.
using System;
using System.IO;
using System.Security.AccessControl;
using System.Security.Principal;
using System.Threading;
namespace ConsoleApplication31
{
class Program
{
//You only need one instance of that Mutex for each application domain (commonly each process).
private static SMutex mclsIOLock;
static void Main(string[] args)
{
//Initialize the mutex. Here you need to know the path to the file you use to store application data.
string strEnumStorageFilePath = Path.Combine(
Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
"MyAppEnumStorage.txt");
mclsIOLock = IOMutexGet(strEnumStorageFilePath);
}
//Template for the main processing routine.
public static void RequestProcess()
{
//This flag is used to protect against unwanted lock releases in case of recursive routines.
bool blnLockIsSet = false;
try
{
//Obtain the lock.
blnLockIsSet = IOLockSet(mclsIOLock);
//Read file data, update file data. Do not put much of long-running code here.
//Other processes may be waiting for the lock release.
}
finally
{
//Release the lock if it was obtained in this particular call stack frame.
IOLockRelease(mclsIOLock, blnLockIsSet);
}
//Put your long-running code here.
}
private static SMutex IOMutexGet(string iMutexNameBase)
{
SMutex clsResult = null;
clsResult = new SMutex();
string strSystemObjectName = @"Global\" + iMutexNameBase.Replace('\\', '_');
//Give permissions to all authenticated users.
SecurityIdentifier clsAuthenticatedUsers = new SecurityIdentifier(WellKnownSidType.AuthenticatedUserSid, null);
MutexSecurity clsMutexSecurity = new MutexSecurity();
MutexAccessRule clsMutexAccessRule = new MutexAccessRule(
clsAuthenticatedUsers,
MutexRights.FullControl,
AccessControlType.Allow);
clsMutexSecurity.AddAccessRule(clsMutexAccessRule);
//Create the mutex or open an existing one.
bool blnCreatedNew;
clsResult.Mutex = new Mutex(
false,
strSystemObjectName,
out blnCreatedNew,
clsMutexSecurity);
clsResult.IsMutexHeldByCurrentAppDomain = false;
return clsResult;
}
//Release IO lock.
private static void IOLockRelease(
SMutex iLock,
bool? iLockIsSetInCurrentStackFrame = null)
{
if (iLock != null)
{
lock (iLock)
{
if (iLock.IsMutexHeldByCurrentAppDomain &&
(!iLockIsSetInCurrentStackFrame.HasValue ||
iLockIsSetInCurrentStackFrame.Value))
{
iLock.MutexOwnerThread = null;
iLock.IsMutexHeldByCurrentAppDomain = false;
iLock.Mutex.ReleaseMutex();
}
}
}
}
//Set the IO lock.
private static bool IOLockSet(SMutex iLock)
{
bool blnResult = false;
try
{
if (iLock != null)
{
if (iLock.MutexOwnerThread != Thread.CurrentThread)
{
blnResult = iLock.Mutex.WaitOne();
iLock.IsMutexHeldByCurrentAppDomain = blnResult;
if (blnResult)
{
iLock.MutexOwnerThread = Thread.CurrentThread;
}
else
{
throw new ApplicationException("Failed to obtain the IO lock.");
}
}
}
}
catch (AbandonedMutexException iMutexAbandonedException)
{
blnResult = true;
iLock.IsMutexHeldByCurrentAppDomain = true;
iLock.MutexOwnerThread = Thread.CurrentThread;
}
return blnResult;
}
}
internal class SMutex
{
public Mutex Mutex;
public bool IsMutexHeldByCurrentAppDomain;
public Thread MutexOwnerThread;
}
}
Related
Here's the situation.
I have an application which for all intents and purposes I have to treat like a black box.
I need to be able to open multiple instances of this application each with a set of files. The syntax for opening this is executable.exe file1.ext file2.ext.
If I run executable.exe x amount of times with no arguments, new instances open fine.
If I run executable.exe file1.ext followed by executable.exe file2.ext then the second call opens file 2 in the existing window rather than creating a new instance. This interferes with the rest of my solution and is the problem.
My solution wraps this application and performs various management operations on it, here's one of my wrapper classes:
public class myWrapper
{
public event EventHandler<IntPtr> SplashFinished;
public event EventHandler ProcessExited;
private const string aaTrendLocation = @"redacted";
//private const string aaTrendLocation = "notepad";
private readonly Process _process;
private readonly Logger _logger;
public myWrapper(string[] args, Logger logger =null)
{
_logger = logger;
_logger?.WriteLine("Initialising new wrapper object...");
if (args == null || args.Length < 1) args = new[] {""};
ProcessStartInfo info = new ProcessStartInfo(aaTrendLocation,args.Aggregate((s,c)=>$"{s} {c}"));
_process = new Process{StartInfo = info};
}
public void Start()
{
_logger?.WriteLine("Starting process...");
_logger?.WriteLine($"Process: {_process.StartInfo.FileName} || Args: {_process.StartInfo.Arguments}");
_process.Start();
Task.Run(()=>MonitorSplash());
Task.Run(() => MonitorLifeTime());
}
private void MonitorLifeTime()
{
_logger?.WriteLine("Monitoring lifetime...");
while (!_process.HasExited)
{
_process.Refresh();
Thread.Sleep(50);
}
_logger?.WriteLine("Process exited!");
_logger?.WriteLine("Invoking!");
ProcessExited?.BeginInvoke(this, null, null, null);
}
private void MonitorSplash()
{
_logger?.WriteLine("Monitoring Splash...");
while (!_process.MainWindowTitle.Contains("Trend"))
{
_process.Refresh();
Thread.Sleep(500);
}
_logger?.WriteLine("Splash finished!");
_logger?.WriteLine("Invoking...");
SplashFinished?.BeginInvoke(this,_process.MainWindowHandle,null,null);
}
public void Stop()
{
_logger?.WriteLine("Killing trend...");
_process.Kill();
}
public IntPtr GetHandle()
{
_logger?.WriteLine("Fetching handle...");
_process.Refresh();
return _process.MainWindowHandle;
}
public string GetMainTitle()
{
_logger?.WriteLine("Fetching Title...");
_process.Refresh();
return _process.MainWindowTitle;
}
}
My wrapper class all works fine until I start providing file arguments and this unexpected instancing behaviour kicks in.
I can't modify the target application and nor do I have access to its source to determine whether this instancing is managed with Mutexes or through some other feature. Consequently, I need to determine if there is a way to prevent the new instance seeing the existing one. Would anyone have any suggestions?
TLDR: How do I prevent an application that is limited to a single instance from determining that an instance is already running?
To clarify (following suspicious comments), my company's R&D team wrote executable.exe but I don't have time to wait for their help in this matter (I have days not months) and have permission to do whatever required to deliver the required functionality (there's a lot more to my solution than this question mentions) swiftly.
With some decompiling work I can see that the following is being used to find the existing instance.
Process[] processesByName = Process.GetProcessesByName(Process.GetCurrentProcess().ProcessName);
Is there any way to mess with this short of creating multiple copies of the application with different names? I looked into renaming the Process on the fly but apparently this isn't possible short of writing kernel exploits...
I have solved this problem in the past by creating copies of the source executable. In your case, you could:
Save the 'original.exe' in a specific location.
Each time you need to call it, create a copy of original.exe and name it 'instance_xxxx.exe', where xxxx is a unique number.
Execute your new instance exe as required, and when it completes you can delete it
You could possibly even re-use the instances by creating a pool of them
Building on Dave Lucre's answer I solved it by creating new instances of the executable bound to my wrapper class. Initially, I inherited IDisposable and removed the temporary files in the Disposer but for some reason that was causing issues where the cleanup would block the application, so now my main program performs cleanup at the end.
My constructor now looks like:
public AaTrend(string[] args, ILogger logger = null)
{
_logger = logger;
_logger?.WriteLine("Initialising new aaTrend object...");
if (args == null || args.Length < 1) args = new[] { "" };
_tempFilePath = GenerateTempFileName();
CreateTempCopy(); //Needed to bypass lazy single instance checks
HideTempFile(); //Stops users worrying
ProcessStartInfo info = new ProcessStartInfo(_tempFilePath, args.Aggregate((s, c) => $"{s} {c}"));
_process = new Process { StartInfo = info };
}
With the two new methods:
private void CreateTempCopy()
{
_logger?.WriteLine("Creating temporary file...");
_logger?.WriteLine(_tempFilePath);
File.Copy(AaTrendLocation, _tempFilePath);
}
private string GenerateTempFileName(int increment = 0)
{
string directory = Path.GetDirectoryName(AaTrendLocation); //Obtain path components.
string fileNameWithoutExtension = Path.GetFileNameWithoutExtension(AaTrendLocation);
string extension = Path.GetExtension(AaTrendLocation);
string tempName = $"{directory}\\{fileNameWithoutExtension}-{increment}{extension}"; //Re-assemble path with increment inserted.
return File.Exists(tempName) ? GenerateTempFileName(++increment) : tempName; //If this name is already used, increment and recurse, otherwise return the new path.
}
Then in my main program:
private static void DeleteTempFiles()
{
string dir = Path.GetDirectoryName(AaTrend.AaTrendLocation);
foreach (string file in Directory.GetFiles(dir, "aaTrend-*.exe", SearchOption.TopDirectoryOnly))
{
File.Delete(file);
}
}
As a side-note, this approach will only work for applications with (lazy) methods of determining instancing that rely on Process.GetProcessesByName(); it won't work if a Mutex is used or if the executable name is explicitly set in the manifests.
There are a great number of articles available regarding thread safe caching, here's an example:
private static object _lock = new object();
public void CacheData()
{
SPListItemCollection oListItems;
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if(oListItems == null)
{
lock (_lock)
{
// Ensure that the data was not loaded by a concurrent thread
// while waiting for lock.
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if (oListItems == null)
{
oListItems = DoQueryToReturnItems();
Cache.Add("ListItemCacheName", oListItems, ..);
}
}
}
}
However, this example depends on the request for the cache also rebuilding the cache.
I'm looking for a solution where the request and rebuild are separate. Here's the scenario.
I have a web service that I want to monitor for certain types of error. If an error occurs, I create a monitor object and cache it; it is updatable and is locked accordingly during updates. All's well so far.
Elsewhere, I check for the existence of the cached object, and the data it contains. This would work straight out of the box except for one particular scenario.
If the cache object is being updated - say a status change, I would like to wait and get the latest info rather than the current info, which if returned, would be out of date. So for my fetch code, I need to check if the object is currently being created/updating, and if so wait, then retry.
As I pointed out, there are many examples of cache locking patterns, but I can't seem to find one for this scenario. Any ideas as to how to go about this would be appreciated.
You can try the following code, which uses two locks. The write lock in the setter is quite simple and protects the cache from being written by more than one thread at a time. The getter uses a simple double-checked lock.
Now, the trick is in the Refresh() method, which uses the same lock as the getter. The method takes the lock and, as its first step, removes the list from the cache. This causes any getter to fail the first null check and wait for the lock. In the meantime the method gets the items, sets the cache again and releases the lock.
When control returns to the getter, it reads the cache again, and this time it contains the list.
public class CacheData
{
private static object _readLock = new object();
private static object _writeLock = new object();
public SPListItemCollection ListItem
{
get
{
var oListItems = (SPListItemCollection) Cache["ListItemCacheName"];
if (oListItems == null)
{
lock (_readLock)
{
oListItems = (SPListItemCollection)Cache["ListItemCacheName"];
if (oListItems == null)
{
oListItems = DoQueryToReturnItems();
Cache.Add("ListItemCacheName", oListItems, ..);
}
}
}
return oListItems;
}
set
{
lock (_writeLock)
{
Cache.Add("ListItemCacheName", value, ..);
}
}
}
public void Refresh()
{
lock (_readLock)
{
Cache.Remove("ListItemCacheName");
var oListItems = DoQueryToReturnItems();
ListItem = oListItems;
}
}
}
You can make the method and property static, if you do not need CacheData instance.
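For the "wait while the object is being updated" part of the question, a reader/writer lock is another option. This is a different technique from the two-lock code above, sketched with a hypothetical `GuardedValue<T>` wrapper rather than the ASP.NET `Cache`: readers block only while an update is in progress and then see the new value.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper, not part of the original answer.
public class GuardedValue<T>
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private T _value;

    // Many readers may hold the read lock at once; they wait only while a writer is active.
    public T Read()
    {
        _lock.EnterReadLock();
        try { return _value; }
        finally { _lock.ExitReadLock(); }
    }

    // The update delegate runs under the exclusive write lock.
    public void Update(Func<T, T> update)
    {
        _lock.EnterWriteLock();
        try { _value = update(_value); }
        finally { _lock.ExitWriteLock(); }
    }
}
```

The monitor object would be stored in the cache once and mutated through `Update`, so a fetch that arrives mid-update simply waits in `Read` instead of returning stale data.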
In C# (.NET), can two threads running in the same application have DIFFERENT "WorkingFolders"??
As best I can tell, the answer would be "NO". I think the WORKING DIR is set by the PROCESS in Win32.. Am I wrong here?
According to the following test code, (as well the Win32 SetCurrentDirectory API call), this is NOT possible, but has anyone figured out a way to MAKE it possible?
using System;
using System.Threading;
public class TestClass {
public ManualResetEvent _ThreadDone = new ManualResetEvent(false);
public static void Main() {
Console.WriteLine(Environment.CurrentDirectory);
Thread _Thread = new Thread(new ParameterizedThreadStart(Go));
TestClass test = new TestClass();
_Thread.Start(test);
if(test._ThreadDone.WaitOne()) {
Console.WriteLine("Thread done. Checking Working Dir...");
Console.WriteLine(Environment.CurrentDirectory);
}
}
public static void Go(object instance) {
TestClass m_Test = instance as TestClass;
Console.WriteLine(Environment.CurrentDirectory);
System.IO.Directory.SetCurrentDirectory("L:\\Projects\\");
Console.WriteLine(Environment.CurrentDirectory);
m_Test._ThreadDone.Set();
}
}
I know SOMEONE out there has to have run across this before!
I'm going to guess what you're trying to do is to make code such as File.Open("Foo.txt") behave differently on different threads. Can you do this? The short answer is No - nor should you be trying to do this. On Windows, the current working directory is set at the process level. The .NET framework does not violate that rule.
A better approach would be to create an abstraction on top of Environment.CurrentDirectory that is thread specific. Something like:
public static class ThreadEnvironment
{
[ThreadStatic]
static string _currentDir;
public static string CurrentDirectory
{
get
{
if (_currentDir == null) // If Current Directory has not been set on this thread yet, set it to the process default
{
_currentDir = Environment.CurrentDirectory;
}
return _currentDir;
}
set
{
if (value == null)
throw new ArgumentException("Cannot set Current Directory to null.");
_currentDir = value;
}
}
}
You can then refer to ThreadEnvironment.CurrentDirectory to get that thread's current directory, which will default to the process directory if it has not been set on that thread. For example:
static void Main(string[] args)
{
(new Thread(Thread1)).Start();
(new Thread(Thread2)).Start();
}
static void Thread1()
{
Console.WriteLine("Thread1 Working Dir is: {0}", ThreadEnvironment.CurrentDirectory);
ThreadEnvironment.CurrentDirectory = @"C:\";
Console.WriteLine("Thread1 Working Dir is: {0}", ThreadEnvironment.CurrentDirectory);
}
static void Thread2()
{
Console.WriteLine("Thread2 Working Dir is: {0}", ThreadEnvironment.CurrentDirectory);
ThreadEnvironment.CurrentDirectory = @"C:\Windows";
Console.WriteLine("Thread2 Working Dir is: {0}", ThreadEnvironment.CurrentDirectory);
}
You would, of course, then need to qualify that path whenever dealing with file IO, however this is arguably a safer design anyway.
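Qualifying a path could look roughly like this. The `Resolve` helper is an illustrative name of mine; it takes the base directory as a parameter so that the sketch stands on its own:

```csharp
using System;
using System.IO;

public static class ThreadPathSketch
{
    // Hypothetical helper: resolves 'relativePath' against an explicit base directory
    // instead of relying on the process-wide current directory.
    public static string Resolve(string baseDirectory, string relativePath)
    {
        return Path.GetFullPath(Path.Combine(baseDirectory, relativePath));
    }
}
```

With the class above, a call such as `File.Open("Foo.txt", FileMode.Open)` would become `File.Open(ThreadPathSketch.Resolve(ThreadEnvironment.CurrentDirectory, "Foo.txt"), FileMode.Open)`.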
has anyone figured out a way to MAKE it possible?
It's simply not possible. You can't even have different working directories per App Domain.
The windows rule is: one Environment set per Process. Running in .NET won't change the basic rules.
Instead, if you are having problems loading assemblies, consider adding the corresponding folder to the PATH environment variable.
I'm building a T4 template that will help people construct Azure queues in a consistent and simple manner. I'd like to make this self-documenting, and somewhat consistent.
First I made the queue name at the top of the file, the queue names have to be in lowercase so I added ToLower()
The public constructor uses the built-in StorageClient API's to access the connection strings. I've seen many different approaches to this, and would like to get something that works in almost all situations. (ideas? do share)
I dislike the unneeded HTTP requests to check if the queues have been created, so I made it a static bool. I didn't implement a lock(monitorObject) since I don't think one is needed.
Instead of using a string and parsing it with commas (like most MSDN documentation) I'm serializing the object when passing it into the queue.
For further optimization I'm using a JSON serializer extension method to get the most out of the 8k limit. Not sure if an encoding will help optimize this any more
Added retry logic to handle certain scenarios that occur with the queue (see html link)
Q: Is "DataContext" appropriate name for this class?
Q: Is it a poor practice to name the Queue Action Name in the manner I have done?
What additional changes do you think I should make?
public class AgentQueueDataContext
{
// Queue names must always be in lowercase
// Is named like a const, but isn't one because .ToLower won't compile...
static string AGENT_QUEUE_ACTION_NAME = "AgentQueueActions".ToLower();
static bool QueuesWereCreated { get; set; }
DataModel.SecretDataSource secDataSource = null;
CloudStorageAccount cloudStorageAccount = null;
CloudQueueClient cloudQueueClient = null;
CloudQueue queueAgentQueueActions = null;
static AgentQueueDataContext()
{
QueuesWereCreated = false;
}
public AgentQueueDataContext() : this(false)
{
}
public AgentQueueDataContext(bool CreateQueues)
{
// This pattern of setting up queues is from:
// ttp://convective.wordpress.com/2009/11/15/queues-azure-storage-client-v1-0/
//
this.cloudStorageAccount = CloudStorageAccount.FromConfigurationSetting("DataConnectionString");
this.cloudQueueClient = cloudStorageAccount.CreateCloudQueueClient();
this.secDataSource = new DataModel.SecretDataSource();
queueAgentQueueActions = cloudQueueClient.GetQueueReference(AGENT_QUEUE_ACTION_NAME);
if (QueuesWereCreated == false || CreateQueues)
{
queueAgentQueueActions.CreateIfNotExist();
QueuesWereCreated = true;
}
}
// This is the method that will be spawned using ThreadStart
public void CheckQueue()
{
while (true)
{
try
{
CloudQueueMessage msg = queueAgentQueueActions.GetMessage();
bool DoRetryDelayLogic = false;
if (msg != null)
{
// Deserialize using JSON (allows more data to be stored)
AgentQueueEntry actionableMessage = msg.AsString.FromJSONString<AgentQueueEntry>();
switch (actionableMessage.ActionType)
{
case AgentQueueActionEnum.EnrollNew:
{
// Add to
break;
}
case AgentQueueActionEnum.LinkToSite:
{
// Link within Agent itself
// Link within Site
break;
}
case AgentQueueActionEnum.DisableKey:
{
// Disable key in site
// Disable key in AgentTable (update modification time)
break;
}
default:
{
break;
}
}
//
// Only delete the message if the requested agent has been missing for
// at least 10 minutes
//
if (DoRetryDelayLogic)
{
if (msg.InsertionTime != null)
if (msg.InsertionTime > DateTime.UtcNow - TimeSpan.FromMinutes(10))
continue; // message is younger than 10 minutes: leave it on the queue
// ToDo: Log error: AgentID xxx has not been found in table for xxx minutes.
// It is likely the result of the registration host crashing.
// Data is still consistent. Deleting queued message.
}
//
// If execution made it to this point, then we are either fully processed, or
// there is sufficient reason to discard the message.
//
try
{
queueAgentQueueActions.DeleteMessage(msg);
}
catch (StorageClientException ex)
{
// As of July 2010, this is the best way to detect this class of exception
// Description: ttp://blog.smarx.com/posts/deleting-windows-azure-queue-messages-handling-exceptions
if (ex.ExtendedErrorInformation.ErrorCode == "MessageNotFound")
{
// pop receipt must be invalid
// ignore or log (so we can tune the visibility timeout)
}
else
{
// not the error we were expecting
throw;
}
}
}
else
{
// allow control to fall to the bottom, where the sleep timer is...
}
}
catch (Exception e)
{
// Justification: Thread must not fail.
//Todo: Log this exception
// allow control to fall to the bottom, where the sleep timer is...
// Rationale: not doing so may cause queue thrashing on a specific corrupt entry
}
// todo: Thread.Sleep() is bad
// Replace with something better...
Thread.Sleep(9000);
}
}
}
Q: Is "DataContext" appropriate name for this class?
In .NET we have a lot of DataContext classes, so in the sense that you want names to appropriately communicate what the class does, I think XyzQueueDataContext properly communicates what the class does - although you can't query from it.
If you want to stay more aligned with accepted pattern languages, Patterns of Enterprise Application Architecture calls any class that encapsulates access to an external system a Gateway, while more specifically you may want to use the term Channel in the language of Enterprise Integration Patterns - that's what I would do.
Q: Is it a poor practice to name the Queue Action Name in the manner I have done?
Well, it certainly tightly couples the queue name to the class. This means that if you later decide that you want to decouple those, you can't.
As a general comment I think this class might benefit from trying to do less. Using the queue is not the same thing as managing it, so instead of having all of that queue management code there, I'd suggest injecting a CloudQueue into the instance. Here's how I implement my AzureChannel constructor:
private readonly CloudQueue queue;
public AzureChannel(CloudQueue queue)
{
if (queue == null)
{
throw new ArgumentNullException("queue");
}
this.queue = queue;
}
This better fits the Single Responsibility Principle and you can now implement queue management in its own (reusable) class.
OK, I was a little unsure how best to name this problem :) But assume this scenario: you're going out and fetching some web pages (with various URLs) and caching them locally. The caching part is pretty easy to solve, even with multiple threads.
However, imagine that one thread starts fetching a URL, and a couple of milliseconds later another thread wants the same URL. Is there a good pattern for making the second thread's method wait for the first one to fetch the page, insert it into the cache and return it, so that you don't have to make multiple requests? It should have little enough overhead that it's worth doing even for requests that take about 300-700 ms, and it should not block requests for other URLs.
Basically, when requests for identical URLs come in right after each other, I want the second request to "piggyback" on the first.
I had a loose idea of having a dictionary where, when you start fetching a page, you insert an object keyed by the URL and lock on it. If the key already exists, the thread gets the existing object, locks on it and then tries to fetch the URL from the actual cache.
I'm a little unsure of the particulars needed to make it really thread-safe, however; using ConcurrentDictionary might be one part of it...
Is there any common pattern and solutions for scenarios like this?
Breakdown of the wrong behavior:
Thread 1: Checks the cache; the page doesn't exist, so it starts fetching the URL
Thread 2: Starts fetching the same URL, since it still doesn't exist in the cache
Thread 1: Finishes, inserts the page into the cache and returns it
Thread 2: Finishes and also inserts into the cache (or discards its copy), returns the page
Breakdown of the correct behavior:
Thread 1: Checks the cache; the page doesn't exist, so it starts fetching the URL
Thread 2: Wants the same URL, but sees it's currently being fetched, so it waits on thread 1
Thread 1: Finishes, inserts the page into the cache and returns it
Thread 2: Notices that thread 1 is finished and returns the page that thread 1 fetched
EDIT
Most solutions so far seem to misunderstand the problem and only address the caching. As I said, that isn't the problem. The problem is that when a second fetch for the same URL starts before the first one has been cached, it should use the result of the first fetch rather than doing a second external web request.
You could use a ConcurrentDictionary<K,V> and a variant of double-checked locking:
public static string GetUrlContent(string url)
{
object value1 = _cache.GetOrAdd(url, new object());
if (value1 == null) // null check only required if content
return null; // could legitimately be a null string
var urlContent = value1 as string;
if (urlContent != null)
return urlContent; // got the content
// value1 isn't a string which means that it's an object to lock against
lock (value1)
{
object value2 = _cache[url];
// at this point value2 will *either* be the url content
// *or* the object that we already hold a lock against
if (value2 != value1)
return (string)value2; // got the content
urlContent = FetchContentFromTheWeb(url); // todo
_cache[url] = urlContent;
return urlContent;
}
}
private static readonly ConcurrentDictionary<string, object> _cache =
new ConcurrentDictionary<string, object>();
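A related way to get "fetch once per URL" is to store a `Lazy<string>` per key. This is a different technique from the lock-object trick above, shown only as a sketch; the class name `LazyUrlCache` and the injected `fetch` delegate are assumptions made so that the example is self-contained:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical class, not part of the answer above.
public class LazyUrlCache
{
    private readonly ConcurrentDictionary<string, Lazy<string>> _cache =
        new ConcurrentDictionary<string, Lazy<string>>();
    private readonly Func<string, string> _fetch;

    public LazyUrlCache(Func<string, string> fetch)
    {
        _fetch = fetch ?? throw new ArgumentNullException(nameof(fetch));
    }

    public string Get(string url)
    {
        // Under contention GetOrAdd may construct more than one Lazy wrapper,
        // but only one is stored, and ExecutionAndPublication makes sure the
        // stored wrapper runs its factory a single time.
        Lazy<string> entry = _cache.GetOrAdd(
            url,
            key => new Lazy<string>(() => _fetch(key), LazyThreadSafetyMode.ExecutionAndPublication));

        // Threads asking for the same URL block here until the first fetch completes.
        return entry.Value;
    }
}
```

One thing to keep in mind: in this mode a `Lazy<T>` also caches an exception thrown by its factory, so a failed fetch would have to be evicted from the dictionary before it can be retried.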
EDIT: My code is quite a bit uglier now, but uses a separate lock per URL. This allows different URLs to be fetched concurrently, while each URL will only be fetched once.
public class UrlFetcher
{
static Hashtable cache = Hashtable.Synchronized(new Hashtable());
public static String GetCachedUrl(String url)
{
// exactly 1 fetcher is created per URL
InternalFetcher fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
lock( cache.SyncRoot )
{
fetcher = (InternalFetcher)cache[url];
if( fetcher == null )
{
fetcher = new InternalFetcher(url);
cache[url] = fetcher;
}
}
}
// blocks all threads requesting the same URL
return fetcher.Contents;
}
/// <summary>Each fetcher locks on itself and is initialized with null contents.
/// The first thread to call fetcher.Contents will cause the fetch to occur, and
/// block until completion.</summary>
private class InternalFetcher
{
private String url;
private volatile String contents; // volatile so that the unlocked null check sees the published value
public InternalFetcher(String url)
{
this.url = url;
this.contents = null;
}
public String Contents
{
get
{
if( contents == null )
{
lock( this ) // "this" is an instance of InternalFetcher...
{
if( contents == null )
{
contents = FetchFromWeb(url);
}
}
}
return contents;
}
}
}
}
Will the Semaphore please stand up! stand up! stand up!
Use a Semaphore; you can easily synchronize your threads with it.
In both of the following cases:
you are trying to load a page that is currently being cached
you are saving the cache to a file while a page is being loaded from it
you will run into trouble.
This is just like the readers-writers problem, a classic synchronization problem from operating systems. While a thread is rebuilding the cache or starting to cache a page, no thread should read from it. If a thread is reading, the writer should wait until the read has finished before replacing the cache, and no two threads should cache the same page into the same file. All readers can read from the cache at any time, as long as no writer is writing to it.
You should read some of the semaphore samples on MSDN; it is very easy to use. A thread that wants to do something calls the semaphore; if the resource can be granted it does its work, otherwise it sleeps and waits to be woken up when the resource is ready.
Disclaimer: This might be a n00bish answer. Please pardon me, if it is.
I'd recommend using a shared dictionary object, protected by locks, to keep track of the URLs that are currently being fetched or have already been fetched.
At every request, check the url against this object.
If an entry for the URL is present, check the cache. (This means one of the threads has either fetched it or is currently fetching it.)
If it's available in the cache, use it; otherwise put the current thread to sleep for a while and check again. (If it's not in the cache, some thread is still fetching it, so wait until it's done.)
If the entry is not found in the dictionary object, add the url to it and send the request. Once it obtains a response, add it to cache.
This logic should work, however, you would need to take care of cache expiration and removal of the entry from the dictionary object.
My solution is to use an AtomicBoolean to control access to the database when the cache entry has expired or does not exist.
At any given moment only one thread (I call it the read thread) can access the database; the other threads spin until the read thread returns the data and writes it into the cache.
Here is the code, implemented in Java:
public class CacheBreakDownDefender<K, R> {
/**
* false = do not write null to cache when get null value from database;
*/
private final boolean writeNullToCache;
/**
* cache different query key
*/
private final ConcurrentHashMap<K, AtomicBoolean> selectingDBTagMap = new ConcurrentHashMap<>();
public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType) {
return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(false));
}
public static <K, R> CacheBreakDownDefender<K, R> getInstance(Class<K> keyType, Class<R> resultType, boolean writeNullToCache) {
return Singleton.get(keyType.getName() + resultType.getName(), () -> new CacheBreakDownDefender<>(writeNullToCache));
}
private CacheBreakDownDefender(boolean writeNullToCache) {
this.writeNullToCache = writeNullToCache;
}
public R readFromCache(K key, Function<K, ? extends R> getFromCache, Function<K, ? extends R> getFromDB, BiConsumer<K, R> writeCache) throws InterruptedException {
R result = getFromCache.apply(key);
if (result == null) {
final AtomicBoolean selectingDB = selectingDBTagMap.computeIfAbsent(key, x -> new AtomicBoolean(false));
if (selectingDB.compareAndSet(false, true)) {
try {
result = getFromDB.apply(key);
if (result != null || writeNullToCache) {
writeCache.accept(key, result);
}
} finally {
selectingDB.getAndSet(false);
selectingDBTagMap.remove(key);
}
} else {
while (selectingDB.get()) {
TimeUnit.MILLISECONDS.sleep(0L);
//do nothing...
}
return getFromCache.apply(key);
}
}
return result;
}
public static void main(String[] args) throws InterruptedException {
Map<String, String> map = new ConcurrentHashMap<>();
CacheBreakDownDefender<String, String> instance = CacheBreakDownDefender.getInstance(String.class, String.class, true);
for (int i = 0; i < 9; i++) {
int finalI = i;
new Thread(() -> {
String kele = null;
try {
if (finalI == 6) {
kele = instance.readFromCache("kele2", map::get, key -> "helloword2", map::put);
} else
kele = instance.readFromCache("kele", map::get, key -> "helloword", map::put);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
log.info("result= {}", kele);
}).start();
}
TimeUnit.SECONDS.sleep(2L);
}
}
This is not exactly for concurrent caches but for all caches:
"A cache with a bad policy is another name for a memory leak" (Raymond Chen)