I have an application that runs both as a console application and as a Windows service. It processes files and can handle several files at once using threading. Every now and then it fetches a file that is "tagged" as a testfile.
The flow is something like this:
Read file
Determine if test
Validate file
Save contents of file into db
Move file
Report to other service about the file.
All the steps need to know whether the file is a testfile or not, but not all of the steps have access to the file per se.
Instead of passing a bool isTest parameter to every method, I would like to create a context variable that applies only to that file and that particular execution (specific to the call stack), somewhat similar to OperationContext in WCF. I would use ThreadStatic, but I have a lot of async/await, and I'm afraid that the thread can be reused by another context.
Is there a way to keep a variable in a sort of session or context that is bound to a specific execution? Something like this (just example code):
var isTest = true;
var fileProcessor = new FileProcessor(); // It's injected, but just an example.
using (ContextFactory.CreateContext(isTest))
{
    // Process file.
    // All methods in this stack should be able to determine if it's a testfile or not.
    fileProcessor.ProcessFile(myFilePath);
}
public class FileProcessor
{
    public void ProcessFile(string fileName)
    {
        // Should be able to determine if it's a test or not.
        var isTest = ContextFactory.IsTest();
        // This method will call other classes and other methods in a long chain.
    }
}
I'm using Unity for IoC and .NET 4.5.
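To make the requested behaviour concrete, here is a minimal sketch of such an ambient context built on AsyncLocal<T>. It is only an illustration under assumptions: AsyncLocal<T> requires .NET Framework 4.6 or later (on 4.5, CallContext.LogicalSetData and CallContext.LogicalGetData play a similar role), and the ContextFactory, Scope and Demo types are hypothetical names that merely mirror the example code in the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper; the name mirrors the ContextFactory used in the question.
public static class ContextFactory
{
    // AsyncLocal<T> flows with the logical call context across awaits,
    // so the value follows the operation rather than the thread.
    private static readonly AsyncLocal<bool> IsTestFlag = new AsyncLocal<bool>();

    public static bool IsTest()
    {
        return IsTestFlag.Value;
    }

    public static IDisposable CreateContext(bool isTest)
    {
        bool previous = IsTestFlag.Value;
        IsTestFlag.Value = isTest;
        return new Scope(previous);
    }

    // Restores the previous value when the using block ends.
    private sealed class Scope : IDisposable
    {
        private readonly bool _previous;
        public Scope(bool previous) { _previous = previous; }
        public void Dispose() { IsTestFlag.Value = _previous; }
    }
}

public static class Demo
{
    // Stands in for a step deep in the call chain.
    public static async Task<bool> ProcessFileAsync()
    {
        await Task.Delay(10);            // may resume on a different thread
        return ContextFactory.IsTest();  // still sees the value set by the caller
    }

    public static async Task<bool> RunAsync(bool isTest)
    {
        using (ContextFactory.CreateContext(isTest))
        {
            return await ProcessFileAsync();
        }
    }
}
```

Running Demo.RunAsync(true) and Demo.RunAsync(false) concurrently should give each call its own value, because a value set inside an async method does not leak back to its caller.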
Related
I have the following C# code in an AspNet WebApi controller:
private static async Task<string> SaveDocumentAsync(HttpContent content) {
    var path = "something";
    using (var file = File.OpenWrite(path)) {
        await content.CopyToAsync(file);
    }
    return path;
}
public async Task<IHttpActionResult> Put() {
    var path = await SaveDocumentAsync(Request.Content);
    await SaveDbRecordAsync(path); // writes something to the database using System.Data and awaiting Async methods
    return Ok();
}
I am sometimes seeing the database record visible before the document has finished being written. Is this a possible execution sequence? (It is also possible my file system isn't giving me the semantics I want).
To clarify how I'm observing this. It is an application that is reading the path out of the database and then trying to read the file and finding it isn't there. The file does appear shortly afterwards.
This doesn't happen every time, normally the file comes first. Maybe 1 in 1000 it happens the wrong way.
This turned out to be down to file system semantics. I thought I'd excluded my replicated file system, but I'd done it wrong. The code is behaving as expected.
Since you await SaveDocumentAsync before you call SaveDbRecordAsync, SaveDbRecordAsync only starts after SaveDocumentAsync has completed.
If you were to fire the tasks in parallel then await them:
var saveTask = SaveDocumentAsync(Request.Content);
var dbTask = SaveDbRecordAsync("a/path.ext");
await saveTask;
await dbTask;
then you wouldn't be able to guarantee the completion order.
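The difference can be seen in a small self-contained sketch. StepAsync is a hypothetical stand-in for SaveDocumentAsync and SaveDbRecordAsync, and the delays are arbitrary values chosen only to make the effect visible.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class OrderingDemo
{
    // Hypothetical stand-in for an asynchronous save; records when it finishes.
    private static async Task StepAsync(string name, int delayMs, List<string> log)
    {
        await Task.Delay(delayMs);
        lock (log) { log.Add(name); }
    }

    // Each await completes before the next call starts, so the order is fixed.
    public static async Task<List<string>> SequentialAsync()
    {
        var log = new List<string>();
        await StepAsync("document", 50, log);
        await StepAsync("database", 1, log);
        return log;
    }

    // Both operations are started before either is awaited,
    // so the completion order depends on which one finishes first.
    public static async Task<List<string>> ConcurrentAsync()
    {
        var log = new List<string>();
        Task doc = StepAsync("document", 50, log);
        Task db = StepAsync("database", 1, log);
        await Task.WhenAll(doc, db);
        return log;
    }
}
```

SequentialAsync always logs "document" before "database", even though the document step is slower. ConcurrentAsync gives no such guarantee.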
@Neiston raises a good point: the app you're using to view the results might be updating with a delay, causing you to think the order is switched.
As you are writing to 2 different files (one file, one database), the OS is perfectly within its remit to perform the writes in whatever order is 'best' for the storage medium.
In the old days of spinning storage, the 2 requests would be in the write queue, and if the r/w heads were currently nearer to the tracks for the database than to those for the file, then the OS (or maybe the HDD controller) would write the database data first, followed by the file data.
This assumes that both your file and your database server are running on the same physical machine. If you are writing to a shared folder, and/or the DB server is also on a different machine, then who knows what order they will finish in.
I have an application that we are developing using .NET 4.0 and EF 6.0. Premise of the program is quite simple. Watch a particular folder on the file system. As a new file gets dropped into this folder, look up information about this file in the SQL Server database (using EF), and then based on what is found, move the file to another folder on the file system. Once the file move is complete, go back to the DB and update the information about this file (Register File move).
These are large media files so it might take a while for each of them to move to the target location. Also, we might start this service with hundreds of these media files sitting in the source folder already that will need to be dispatched to the target location(s).
So to speed things up, I started out with using Task parallel library (async/await not available as this is .NET 4.0). For each file in the source folder, I look up info about it in the DB, determine which target folder it needs to move to, and then start a new task that begins to move the file…
LookupFileinfoinDB(filename)
{
// use EF DB Context to look up file in DB
}
// start a new task to begin the file move
var moveFileTask = Task<bool>.Factory.StartNew(
    () =>
    {
        var success = false;
        try
        {
            // the code that actually moves the file goes here…
            .......
            success = true;
        }
        catch (Exception)
        {
            // the move failed; success stays false
        }
        return success;
    });
Now, once this task completes, I have to go back to the DB and update the info about the file. And that is where I am running into problems. (Keep in mind that I might have several of these 'move file tasks' running in parallel, and they will finish at different times.) Currently, I am using task continuations to register the file move in the DB:
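filemoveTask.ContinueWith(
    t =>
    {
        // IsCompleted is always true inside a continuation; check that the task
        // ran to completion so that reading Result cannot rethrow a failure.
        if (t.Status == TaskStatus.RanToCompletion && t.Result)
        {
            RegisterFileMoveinDB();
        }
    });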
The problem is that I am using the same DB context for looking up the file info in the main task as well as inside the RegisterFileMoveinDB() method later, which executes on the nested task. I was getting all kinds of weird exceptions thrown at me (mostly about the SQL Server data reader etc.) when moving several files together. An online search for the answer revealed that sharing a DB context among several tasks like I am doing here is a big no-no, as EF is not thread safe.
I would rather not create a new DB context for each file move as there could be dozens or even hundreds of them going at the same time. What would be a good alternative approach? Is there a way to 'signal' the main task when a nested task completes and finish the File move registration in the main task? Or am I approaching this problem in a wrong way all together and there is a better way to go about this?
Your best bet is to scope your DbContext to each thread. Parallel.ForEach has overloads that are useful for this (the overloads that take a Func<TLocal> localInit):
Parallel.ForEach(
    fileNames, // the filenames IEnumerable<string> to be processed
    () => new YourDbContext(), // Func<TLocal> localInit
    ( fileName, parallelLoopState, dbContext ) => // body
    {
        // your logic goes here
        // LookUpFileInfoInDB( dbContext, fileName )
        // MoveFile( ... )
        // RegisterFileMoveInDB( dbContext, ... )
        // pass dbContext along to the next iteration
        return dbContext;
    },
    ( dbContext ) => // Action<TLocal> localFinally
    {
        dbContext.SaveChanges(); // single SaveChanges call for each thread
        dbContext.Dispose();
    } );
You can call SaveChanges() within the body expression/RegisterFileMoveInDB if you prefer to have the DB updated ASAP. I would suggest tying the file system operations in with the DB transaction so that if the DB update fails, the file system operations are rolled back.
You could also pass the ExclusiveScheduler of a ConcurrentExclusiveSchedulerPair instance as a parameter of ContinueWith. This way the continuations will run sequentially instead of concurrently with respect to each other.
TaskScheduler exclusiveScheduler
= new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;
//...
filemoveTask.ContinueWith(t =>
{
if (t.Result)
{
RegisterFileMoveinDB();
}
}, exclusiveScheduler);
Regarding @Moho's question:
Threads used by, for example, the built-in asynchronous I/O operations are taken from the thread pool of the .NET runtime (CLR), so it is a very efficient mechanism. If you create threads yourself, you are doing it the old way, which is inefficient, especially for I/O operations.
When you call an async method you don't have to wait for it immediately. Postpone waiting until the result is actually needed.
Best Regards.
I am writing a WCF service that has source data from multiple sources. These are large files in various formats.
I have implemented Caching and set-up a polling interval so these files are kept up to date with fresh data.
I have constructed a manager class that basically is responsible for returning XDocument objects back to the caller. The manager class first checks the cache for existence. If it doesn't exist - it makes the call to retrieve fresh data. Nothing big here.
What I would like to do to keep the response snappy is serialize the file previously downloaded and pass that back to the caller - again nothing new...however...I want to spawn a new thread as soon as the serialization is complete to retrieve the fresh data and overwrite the old file. This is my problem...
Admittedly an intermediate programmer - I came across a few examples on multi-threading (here for that matter)...The problem is it introduced the concept of delegates and I am really struggling with this.
Here is some of my code:
//this method invokes another object that is responsible for making the
//http call, decompressing the file and persisting to the hard drive.
private static void downloadFile(string url, string LocationToSave)
{
using (WeatherFactory wf = new WeatherFactory())
{
wf.getWeatherDataSource(url, LocationToSave);
}
}
//A new thread variable
private static Thread backgroundDownload;
//the delegate...but I am so confused on how to use this...
delegate void FileDownloader(string url, string LocationToSave);
//The method that should be called in the new thread....
//right now the compiler is complaining that I don't have the arguments from
//the delegate (Url and LocationToSave...
//the problem is I don't pass URL and LocationToSave here...
static void Init(FileDownloader download)
{
backgroundDownload = new Thread(new ThreadStart(download));
backgroundDownload.Start();
}
I'd like to implement this the correct way...so a bit of education on how to make this work would be appreciated.
I would use the Task Parallel library to do this:
//this method invokes another object that is responsible for making the
//http call, decompressing the file and persisting to the hard drive.
private static void downloadFile(string url, string LocationToSave)
{
using (WeatherFactory wf = new WeatherFactory())
{
wf.getWeatherDataSource(url, LocationToSave);
}
//Update cache here?
}
private void StartBackgroundDownload(string url, string LocationToSave)
{
    //Things to consider:
    // 1. what if we are already downloading, start new anyway?
    // 2. when/how to update your cache
    var task = Task.Factory.StartNew(() => downloadFile(url, LocationToSave));
}
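For the first point in the comments above (what to do if a download is already running), one possible approach is to skip the new request. This is only a sketch under that assumption; BackgroundRefresher and TryStart are hypothetical names, and the refresh delegate stands in for the call to downloadFile.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BackgroundRefresher
{
    private static int _running; // 0 = idle, 1 = a refresh is in progress

    // Starts the refresh on a thread pool thread.
    // Returns the started task, or null if a refresh was already in progress.
    public static Task TryStart(Action refresh)
    {
        // Atomically flip the flag from 0 to 1; only one caller can win.
        if (Interlocked.CompareExchange(ref _running, 1, 0) != 0)
        {
            return null;
        }

        return Task.Factory.StartNew(() =>
        {
            try
            {
                refresh();
            }
            finally
            {
                // Always release the flag, even if the refresh throws.
                Interlocked.Exchange(ref _running, 0);
            }
        });
    }
}
```

A caller would use it as BackgroundRefresher.TryStart(() => downloadFile(url, LocationToSave)) and simply ignore a null result.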
I have to restrict my .net 4 WPF application so that it can be run only once per machine. Note that I said per machine, not per session.
I implemented single instance applications using a simple mutex until now, but unfortunately such a mutex is per session.
Is there a way to create a machine wide mutex or is there any other solution to implement a single instance per machine application?
I would do this with a global Mutex object that must be kept for the life of your application.
bool bFirstInstance;
MutexSecurity oMutexSecurity;
//Set the security object
oMutexSecurity = new MutexSecurity();
oMutexSecurity.AddAccessRule(new MutexAccessRule(new SecurityIdentifier(WellKnownSidType.BuiltinUsersSid, null), MutexRights.FullControl, AccessControlType.Allow));
//Create the global mutex and set its security
moGlobalMutex = new Mutex(true, "Global\\{5076d41c-a40a-4f4d-9eed-bf274a5bedcb}", out bFirstInstance);
moGlobalMutex.SetAccessControl(oMutexSecurity);
Here bFirstInstance indicates whether this is the first instance of your application running on the machine. If you omitted the Global\ prefix from the mutex name or replaced it with Local\, the mutex would only be per session (this is probably how your current code works).
I believe that I got this technique first from Jon Skeet.
The MSDN topic on the Mutex object explains about the two scopes for a Mutex object and highlights why this is important when using terminal services (see second to last note).
I think what you need to do is use a system semaphore to track the instances of your application.
If you create a Semaphore object using a constructor that accepts a name, it is associated with an operating-system semaphore of that name.
Named system semaphores are visible throughout the operating system, and can be used to synchronize the activities of processes.
EDIT: Note that I don't know whether this approach works across multiple Windows sessions on a machine. I think it should, as it's an OS-level construct, but I can't say for sure as I haven't tested it that way.
EDIT 2: I did not know this, but after reading Stevo2000's answer I did some looking up as well, and I think that the "Global\" prefix, which places the object in the global namespace, applies to semaphores as well, so a semaphore created this way should work.
You could open a file with exclusive rights somewhere in %PROGRAMDATA%
The second instance that starts will try to open the same file and fail if it's already open.
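A minimal sketch of that idea follows. SingleInstanceFileLock and TryAcquire are hypothetical names, and the sketch assumes that every user who may start the application has write access to the lock file's folder.

```csharp
using System;
using System.IO;

public static class SingleInstanceFileLock
{
    // Returns an open stream that holds the exclusive lock,
    // or null if the file is already open elsewhere.
    // Keep the returned stream alive for the lifetime of the application.
    public static FileStream TryAcquire(string lockPath)
    {
        // Make sure the folder exists so a missing directory is not mistaken for a lock.
        Directory.CreateDirectory(Path.GetDirectoryName(lockPath));
        try
        {
            // FileShare.None denies every other open of the same file.
            return new FileStream(lockPath, FileMode.OpenOrCreate,
                                  FileAccess.ReadWrite, FileShare.None);
        }
        catch (IOException)
        {
            return null; // another instance already holds the file open
        }
    }
}
```

The path could be built with Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.CommonApplicationData), "MyApp", "instance.lock"), where "MyApp" and "instance.lock" are placeholder names. The lock is released automatically when the process exits, even after a crash.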
How about using the registry?
You can create a registry entry under HKEY_LOCAL_MACHINE.
Let the value be the flag if the application is started or not.
Encrypt the key using some standard symmetric key encryption method so that no one else can tamper with the value.
On application start-up check for the key and abort\continue accordingly.
Do not forget to obfuscate your assembly, which does this encryption\decryption part, so that no one can hack the key in registry by looking at the code in reflector.
I did something similar once.
When starting up the application, I checked the list of running processes for a process with an identical name, and if one existed I would not allow the program to start.
This is not bulletproof of course, since if another application has the exact same process name, your application will never start, but if you use a non-generic name it will probably be more than good enough.
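A rough sketch of such a check, using Process.GetProcessesByName. The ProcessNameCheck class name is made up for illustration, and whether processes from other sessions are visible can depend on the permissions of the calling user.

```csharp
using System;
using System.Diagnostics;

public static class ProcessNameCheck
{
    // Returns true if another process with the same name as the current one is running.
    public static bool AnotherInstanceIsRunning()
    {
        Process current = Process.GetCurrentProcess();
        foreach (Process candidate in Process.GetProcessesByName(current.ProcessName))
        {
            // Skip ourselves; any other match counts as a second instance.
            if (candidate.Id != current.Id)
            {
                return true;
            }
        }
        return false;
    }
}
```

At start-up the application would call ProcessNameCheck.AnotherInstanceIsRunning() and exit if it returns true.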
For the sake of completeness, I'd like to add the following which I just found now:
This web site has an interesting approach in sending Win32 messages to other processes. This would fix the problem of the user renaming the assembly to bypass the test and of other assemblies with the same name.
They're using the message to activate the main window of the other process, but it seems like the message could be a dummy message only used to see whether the other process is responding to it to know whether it is our process or not.
Note that I haven't tested it yet.
See below for a full example of how a single-instance app is done in WPF 3.5.
public class SingleInstanceApplicationWrapper :
Microsoft.VisualBasic.ApplicationServices.WindowsFormsApplicationBase
{
public SingleInstanceApplicationWrapper()
{
// Enable single-instance mode.
this.IsSingleInstance = true;
}
// Create the WPF application class.
private WpfApp app;
protected override bool OnStartup(
Microsoft.VisualBasic.ApplicationServices.StartupEventArgs e)
{
app = new WpfApp();
app.Run();
return false;
}
// Direct multiple instances.
protected override void OnStartupNextInstance(
Microsoft.VisualBasic.ApplicationServices.StartupNextInstanceEventArgs e)
{
if (e.CommandLine.Count > 0)
{
app.ShowDocument(e.CommandLine[0]);
}
}
}
Second part:
public class WpfApp : System.Windows.Application
{
protected override void OnStartup(System.Windows.StartupEventArgs e)
{
base.OnStartup(e);
WpfApp.current = this;
// Load the main window.
DocumentList list = new DocumentList();
this.MainWindow = list;
list.Show();
// Load the document that was specified as an argument.
if (e.Args.Length > 0) ShowDocument(e.Args[0]);
}
public void ShowDocument(string filename)
{
try
{
Document doc = new Document();
doc.LoadFile(filename);
doc.Owner = this.MainWindow;
doc.Show();
// If the application is already loaded, it may not be visible.
// This attempts to give focus to the new window.
doc.Activate();
}
catch
{
MessageBox.Show("Could not load document.");
}
}
}
Third part:
public class Startup
{
[STAThread]
public static void Main(string[] args)
{
SingleInstanceApplicationWrapper wrapper =
new SingleInstanceApplicationWrapper();
wrapper.Run(args);
}
}
You may need to add some references and some using statements, but it should work.
You can also download a VS example complete solution by downloading the source code of the book from here.
Taken from "Pro WPF in C#3 2008", Apress, Matthew MacDonald. Buy the book, it is gold. I did.
Background
I have a Windows service that uses various third-party DLLs to perform work on PDF files. These operations can use quite a bit of system resources, and occasionally seem to suffer from memory leaks when errors occur. The DLLs are managed wrappers around other unmanaged DLLs.
Current Solution
I'm already mitigating this issue in one case by wrapping a call to one of the DLLs in a dedicated console app and calling that app via Process.Start(). If the operation fails and there are memory leaks or unreleased file handles, it doesn't really matter. The process will end and the OS will recover the handles.
I'd like to apply this same logic to the other places in my app that use these DLLs. However, I'm not terribly excited about adding more console projects to my solution, and writing even more boiler-plate code that calls Process.Start() and parses the output of the console apps.
New Solution
An elegant alternative to dedicated console apps and Process.Start() seems to be the use of AppDomains, like this: http://blogs.geekdojo.net/richard/archive/2003/12/10/428.aspx
I've implemented similar code in my application, but the unit tests have not been promising. I create a FileStream to a test file in a separate AppDomain, but don't dispose it. I then attempt to create another FileStream in the main domain, and it fails due to the unreleased file lock.
Interestingly, adding an empty DomainUnload event to the worker domain makes the unit test pass. Regardless, I'm concerned that maybe creating "worker" AppDomains won't solve my problem.
Thoughts?
The Code
/// <summary>
/// Executes a method in a separate AppDomain. This should serve as a simple replacement
/// of running code in a separate process via a console app.
/// </summary>
public T RunInAppDomain<T>( Func<T> func )
{
AppDomain domain = AppDomain.CreateDomain ( "Delegate Executor " + func.GetHashCode (), null,
new AppDomainSetup { ApplicationBase = Environment.CurrentDirectory } );
domain.DomainUnload += ( sender, e ) =>
{
// this empty event handler fixes the unit test, but I don't know why
};
try
{
domain.DoCallBack ( new AppDomainDelegateWrapper ( domain, func ).Invoke );
return (T)domain.GetData ( "result" );
}
finally
{
AppDomain.Unload ( domain );
}
}
public void RunInAppDomain( Action func )
{
RunInAppDomain ( () => { func (); return 0; } );
}
/// <summary>
/// Provides a serializable wrapper around a delegate.
/// </summary>
[Serializable]
private class AppDomainDelegateWrapper : MarshalByRefObject
{
private readonly AppDomain _domain;
private readonly Delegate _delegate;
public AppDomainDelegateWrapper( AppDomain domain, Delegate func )
{
_domain = domain;
_delegate = func;
}
public void Invoke()
{
_domain.SetData ( "result", _delegate.DynamicInvoke () );
}
}
The unit test
[Test]
public void RunInAppDomainCleanupCheck()
{
const string path = @"../../Output/appdomain-hanging-file.txt";
using( var file = File.CreateText ( path ) )
{
file.WriteLine( "test" );
}
// verify that file handles that aren't closed in an AppDomain-wrapped call are cleaned up after the call returns
Portal.ProcessService.RunInAppDomain ( () =>
{
// open a test file, but don't release it. The handle should be released when the AppDomain is unloaded
new FileStream ( path, FileMode.Open, FileAccess.ReadWrite, FileShare.None );
} );
// sleeping for a while doesn't make a difference
//Thread.Sleep ( 10000 );
// creating a new FileStream will fail if the DomainUnload event is not bound
using( var file = new FileStream ( path, FileMode.Open, FileAccess.ReadWrite, FileShare.None ) )
{
}
}
Application domains and cross-domain interaction are a very delicate matter, so one should make sure one really understands how things work before doing anything... Mmm... Let's say, "non-standard" :-)
First of all, your stream-creating method actually executes on your "default" domain (surprise-surprise!). Why? Simple: the method that you pass into AppDomain.DoCallBack is defined on an AppDomainDelegateWrapper object, and that object exists on your default domain, so that is where its method gets executed. MSDN doesn't mention this little "feature", but it's easy enough to check: just set a breakpoint in AppDomainDelegateWrapper.Invoke.
So, basically, you have to make do without a "wrapper" object. Use a static method as DoCallBack's argument.
But how do you pass your "func" argument into the other domain so that your static method can pick it up and execute?
The most obvious way is to use AppDomain.SetData, or you can roll your own, but regardless of how exactly you do it, there is another problem: if "func" is a non-static method, then the object that it's defined on must somehow be passed into the other appdomain. It may be passed either by value (in which case it gets copied, field by field) or by reference (creating a cross-domain object reference with all the beauty of Remoting). To do the former, the class has to be marked with a [Serializable] attribute. To do the latter, it has to inherit from MarshalByRefObject. If the class is neither, an exception will be thrown upon an attempt to pass the object to the other domain. Keep in mind, though, that passing by reference pretty much kills the whole idea, because your method will still be called on the same domain that the object exists on - that is, the default one.
Concluding the above paragraph, you are left with two options: either pass a method defined on a class marked with a [Serializable] attribute (and keep in mind that the object will be copied), or pass a static method. I suspect that, for your purposes, you will need the former.
And just in case it has escaped your attention, I would like to point out that your second overload of RunInAppDomain (the one that takes Action) passes a method defined on a class that isn't marked [Serializable]. Don't see any class there? You don't have to: with anonymous delegates containing bound variables, the compiler will create one for you. And it just so happens that the compiler doesn't bother to mark that autogenerated class [Serializable]. Unfortunate, but this is life :-)
Having said all that (a lot of words, isn't it? :-), and assuming your vow not to pass any non-static and non-[Serializable] methods, here are your new RunInAppDomain methods:
/// <summary>
/// Executes a method in a separate AppDomain. This should serve as a simple replacement
/// of running code in a separate process via a console app.
/// </summary>
public static T RunInAppDomain<T>(Func<T> func)
{
AppDomain domain = AppDomain.CreateDomain("Delegate Executor " + func.GetHashCode(), null,
new AppDomainSetup { ApplicationBase = Environment.CurrentDirectory });
try
{
domain.SetData("toInvoke", func);
domain.DoCallBack(() =>
{
var f = AppDomain.CurrentDomain.GetData("toInvoke") as Func<T>;
AppDomain.CurrentDomain.SetData("result", f());
});
return (T)domain.GetData("result");
}
finally
{
AppDomain.Unload(domain);
}
}
[Serializable]
private class ActionDelegateWrapper
{
public Action Func;
public int Invoke()
{
Func();
return 0;
}
}
public static void RunInAppDomain(Action func)
{
RunInAppDomain<int>( new ActionDelegateWrapper { Func = func }.Invoke );
}
If you're still with me, I appreciate it :-)
Now, after spending so much time on fixing that mechanism, I am going to tell you that it was pointless anyway.
The thing is, AppDomains won't help you for your purposes. They only take care of managed objects, while unmanaged code can leak and crash all it wants. Unmanaged code doesn't even know there are such things as appdomains. It only knows about processes.
So, in the end, your best option remains your current solution: just spawn another process and be happy about it. And, I would agree with the previous answers, you don't have to write another console app for each case. Just pass the fully qualified name of a static method, and have the console app load your assembly, load your type, and invoke the method. You can actually package it pretty neatly in very much the same way as you tried with AppDomains. You can create a method called something like "RunInAnotherProcess", which will examine the argument, get the full type name and method name out of it (while making sure the method is static) and spawn the console app, which will do the rest.
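The console-app side of that idea might look roughly like the sketch below. The WorkerHost name and the argument convention (assembly-qualified type name first, method name second) are assumptions made for illustration, and only public static methods without parameters are handled.

```csharp
using System;
using System.Reflection;

public static class WorkerHost
{
    // args[0]: assembly-qualified type name, args[1]: name of a public static,
    // parameterless method on that type (an assumed convention).
    // Returns 0 on success and 1 if the arguments are unusable.
    public static int Run(string[] args)
    {
        if (args == null || args.Length < 2)
        {
            return 1;
        }

        // Throws if the type cannot be found or loaded.
        Type type = Type.GetType(args[0], true);

        MethodInfo method = type.GetMethod(
            args[1],
            BindingFlags.Public | BindingFlags.Static,
            null,
            Type.EmptyTypes,
            null);
        if (method == null)
        {
            return 1;
        }

        method.Invoke(null, null); // static method, no arguments
        return 0;
    }
}
```

The console app's Main would simply return WorkerHost.Run(args). The parent process would then start it with Process.Start, pass the two names on the command line, and read the exit code once the process has ended.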
You don't have to create many console applications; you can create a single application that receives the fully qualified type name as a parameter. The application will load that type and execute it.
Separating everything into tiny processes is the best method to really dispose all the resources. An application domain cannot do full resources disposing, but a process can.
Have you considered opening a pipe between the main application and the sub applications? This way you could pass more structured information between the two applications without parsing standard output.