We have some basic C# logic that iterates over a directory and returns the folders and files within. When run against a network share (\\server\share\folder) that is inaccessible or invalid, the code seems to 'hang' for about 30 seconds before returning from the call.
I'd like to end up with a method that will attempt to get folders and files from the given path, but without the timeout period. In other words, to reduce or eliminate the timeout altogether.
I've tried something as simple as validating the existence of the directory ahead of time, thinking that an 'unavailable' network drive would quickly return false, but that did not work as expected.
System.IO.Directory.Exists(path) //hangs
System.IO.DirectoryInfo di = new System.IO.DirectoryInfo(path); //hangs
Any suggestions on what may help me achieve an efficient (and hopefully managed) solution?
You can use this code:
// Run the existence check on a separate task and wait at most 100 ms for it.
var task = new Task<bool>(() => { var fi = new FileInfo(uri.LocalPath); return fi.Exists; });
task.Start();
// Wait(100) returns false on timeout, so an unreachable share no longer blocks the caller.
return task.Wait(100) && task.Result;
Place it on its own thread; if it doesn't come back within a certain amount of time, move on.
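For the original goal of listing the folders and files (not just checking existence), a minimal sketch of the same idea - the helper name and the two-second timeout are arbitrary assumptions:
// Sketch: run the listing on a worker task and give up after a timeout,
// so an unreachable share cannot block the caller for ~30 seconds.
// Note: the abandoned task keeps running in the background until the I/O call returns.
private static string[] TryGetEntries(string path, int timeoutMs = 2000)
{
    var task = Task.Run(() => Directory.GetFileSystemEntries(path));
    // Wait(timeoutMs) returns false on timeout; fall back to an empty result and move on.
    return task.Wait(timeoutMs) ? task.Result : new string[0];
}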
Perhaps you could try pinging the server first, and only ask for the directory info if you get a response?
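A rough sketch of that suggestion - the host name is assumed to be parsed out of the UNC path (e.g. "server" from \\server\share\folder), and the 500 ms timeout is arbitrary:
// Sketch: ping the file server before touching the share.
// Uses System.Net.NetworkInformation.
private static bool ServerResponds(string hostName, int timeoutMs = 500)
{
    try
    {
        using (var ping = new Ping())
        {
            var reply = ping.Send(hostName, timeoutMs);
            return reply != null && reply.Status == IPStatus.Success;
        }
    }
    catch (PingException)
    {
        // Name resolution or network failure - treat the server as unreachable.
        return false;
    }
}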
See Faster DirectoryExists function? for a way of setting an execution time limit on Directory.Exists.
I have the following C# code in an AspNet WebApi controller:
private static async Task<string> SaveDocumentAsync(HttpContent content) {
var path = "something";
using (var file = File.OpenWrite(path)) {
await content.CopyToAsync(file);
}
return path;
}
public async Task<IHttpActionResult> Put() {
var path = await SaveDocumentAsync(Request.Content);
await SaveDbRecordAsync(path); // writes something to the database using System.Data and awaiting Async methods
return Ok();
}
I am sometimes seeing the database record visible before the document has finished being written. Is this a possible execution sequence? (It is also possible my file system isn't giving me the semantics I want).
To clarify how I'm observing this: an application is reading the path out of the database, then trying to read the file and finding it isn't there. The file does appear shortly afterwards.
This doesn't happen every time; normally the file comes first. Maybe 1 in 1000 times it happens the wrong way round.
This turned out to be down to file system semantics. I thought I'd excluded my replicated file system, but I'd done it wrong. The code is behaving as expected.
Since you await SaveDocumentAsync before you call SaveDbRecordAsync, SaveDbRecordAsync only executes after SaveDocumentAsync has completed.
If you were to fire the tasks in parallel and then await them:
var saveTask = SaveDocumentAsync(Request.Content);
var dbTask = SaveDbRecordAsync("a/path.ext");
await saveTask;
await dbTask;
then you wouldn't be able to guarantee the completion order.
@Neiston touches on a good point: it might be that the app you're using to view the results updates with a delay, making you think the order is switched.
As you are writing to two different files (one the document, one the database), the OS is perfectly within its remit to perform the writes in whatever order is 'best' for the storage medium.
In the old days of spinning storage, the two requests would sit in the write queue, and if the r/w heads were currently nearer to the tracks for the database than to those for the file, the OS (or maybe the HDD controller) would write the database data first, followed by the file data.
This assumes that both your file and your database server are running on the same physical machine. If you are writing to a shared folder, and/or the DB server is on a different machine, then who knows what order they will finish in.
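If you did need the bytes to be pushed past the OS write cache before the database row is written, one option - a sketch only, not what the original code does, and whether it helps on a replicated share is an assumption to verify - is to open the stream with write-through:
// Sketch (assumption): variant of SaveDocumentAsync that opens the file with
// FileOptions.WriteThrough so writes bypass the OS write cache.
private static async Task<string> SaveDocumentAsync(HttpContent content)
{
    var path = "something";
    using (var file = new FileStream(path, FileMode.Create, FileAccess.Write,
        FileShare.None, 4096, FileOptions.Asynchronous | FileOptions.WriteThrough))
    {
        await content.CopyToAsync(file);
        await file.FlushAsync(); // flush the stream's own buffer before closing
    }
    return path;
}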
This might be a long shot, but I might as well try here. There is a block of C# code that is rebuilding a Solr core. The steps are as follows:
Delete all the existing documents
Get the core entities
Split the entities into batches of 1000
Spin off threads to perform the next set of processes:
Serialize each batch to JSON and write the JSON to a file on the server hosting the core
Send a command to the core to upload that file using System.Net.WebClient: solrurl/corename/update/json?stream.file=myfile.json&stream.contentType=application/json;charset=utf-8
Delete the file. I've also tried deleting the files after all the batches are done, as well as not deleting the files at all
After all batches are done it commits. I've also tried committing after each batch is done.
My problem is the last batch will not upload if it's much less than the batch size. It flows through like the command was called but nothing happens. It throws no exceptions and I see no errors in the solr logs. My questions are Why? and How can I ensure the last batch always gets uploaded? We think it's a timing issue, but we've added Thread.Sleep(30000) in many parts of the code to test that theory and it still happens.
The only time it doesn't happen is:
if the batch is full or almost full
we don't run it on multiple threads
we put a breakpoint at the File.Delete line on the last batch and wait for 30 seconds or so, then continue
Here is the code for writing the file and calling the update command. This is called for each batch.
private const string
FileUpdateCommand = "{1}/update/json?stream.file={0}&stream.contentType=application/json;charset=utf-8",
SolrFilesDir = @"\\MYSERVER\SolrFiles",
SolrFileNameFormat = SolrFilesDir + @"\{0}-{1}.json",
_solrUrl = "http://MYSERVER:8983/solr/",
CoreName = "MyCore";
public void UpdateCoreByFile(List<CoreModel> items)
{
if (items.Count == 0)
return;
var settings = new JsonSerializerSettings { DateTimeZoneHandling = DateTimeZoneHandling.Utc };
var dir = new DirectoryInfo(SolrFilesDir);
if (!dir.Exists)
dir.Create();
var filename = string.Format(SolrFileNameFormat, Guid.NewGuid(), CoreName);
using (var sw = new StreamWriter(filename))
{
sw.Write(JsonConvert.SerializeObject(items, settings));
}
var file = HttpUtility.UrlEncode(filename);
var command = string.Format(FileUpdateCommand, file, CoreName);
using (var client = _clientFactory.GetClient())//System.Net.WebClient
{
client.DownloadData(new Uri(_solrUrl + command));
}
//Thread.Sleep(30000);//doesn't work if I add this
File.Delete(filename);//works here if add breakpoint and wait 30 sec or so
}
I'm just trying to figure out why this is happening and how to address it. I hope this makes sense, and I have provided enough information and code. Thanks for any help.
Since changing the size of the data set and adding a breakpoint "fixes" it, this is most certainly a race condition. Since you haven't added the code that actually indexes the content, it's impossible to say what the issue really is, but my guess is that the last commit happens before all the threads have finished, and only works when all threads are done (if you sleep the threads, the issue will still be there, since all threads sleep for the same time).
The easy fix - use commitWithin instead, and never issue explicit commits. The commitWithin parameter makes sure that the documents become available in the index within the given time frame (given in milliseconds). To make sure that the documents you submit become available within ten seconds, append &commitWithin=10000 to your URL.
If there are already documents pending a commit, the documents added will be committed before the ten seconds have elapsed, but even if there's just one last document being submitted as the last batch, it will never be more than ten seconds before it becomes visible (... and no documents will be left forever in an uncommitted limbo).
That way you won't have to keep your threads synchronized or issue a final commit, as long as you wait until all threads have finished before exiting your application (if it's an application that actually terminates).
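Applied to the constant in the code above, that could look like the following (the ten-second window is just an example value):
// Hypothetical variant of the update command template with commitWithin,
// so Solr commits on its own and no explicit final commit is required.
private const string FileUpdateCommand =
    "{1}/update/json?stream.file={0}"
    + "&stream.contentType=application/json;charset=utf-8"
    + "&commitWithin=10000"; // documents become searchable within ten seconds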
I have the goal of uploading a Products CSV of ~3000 records to my e-commerce site. I want to utilise the REST API that my e-comm platform provides so I have something I can re-use and build upon for future sites that I may create.
My main issue that I am having trouble working through is:
- System.Threading.ThreadAbortException
which I can only attribute to how long it takes to push all 3K records through individual POST requests. My code:
public ActionResult WriteProductsFromFile()
{
string fileNameIN = "19107.txt";
string fileNameOUT = "19107_output.txt";
string jsonUrl = $"/api/products";
List<string> ls = new List<string>();
var engine = new FileHelperAsyncEngine<Prod1>();
using (engine.BeginReadFile(fileNameIN))
{
foreach (Prod1 prod in engine)
{
outputProduct output = new outputProduct();
if (!string.IsNullOrEmpty(prod.name))
{
output.product.name = prod.name;
string productJson = JsonConvert.SerializeObject(output);
ls.Add(productJson);
}
}
}
foreach (String s in ls)
nopApiClient.Post(jsonUrl, s);
return RedirectToAction("GetProducts");
}
Since I'm new to web-coding, am I going about this the wrong way? Is there a preferred way to bulk-upload that I haven't come across?
I've attempted to use the TaskCreationOptions.LongRunning flag, which helps the cause slightly but doesn't get me anywhere near my goal.
Web and API controller actions are not meant to do long-running tasks - besides tying up a request thread, you will be introducing a series of opportunities for failure with little recourse for recovering from them.
But it's not all bad: you have a lot of options here, and there is a lot of literature on async/cloud architecture that explains how to deal with files and these sorts of scenarios.
What you want to do is disconnect the processing of your file from the API request (in your application, not the 3rd party's).
It will take a little more work but will ultimately create a more reliable application.
Step 1:
Drop the file to disk immediately. I see you already have the file on disk - I'm not sure how it gets there, but either way it works out the same.
Step 2:
Use a process running as
- a console app (easiest)
- a service (requires some sort of install/uninstall of the service)
- or even a thread in your web app (but you will struggle to know when it fails)
Whichever way you choose, the process will watch a directory for file changes; when there is a change, it will kick off your method to process the file as you like.
Check out FileSystemWatcher - here is a basic example: https://www.dotnetperls.com/filesystemwatcher
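A minimal sketch of the console-app variant - the drop folder, the *.txt filter, and the ProcessFile method are placeholders for your own import logic:
// Sketch: watch a drop folder and hand each new file to the existing import code.
using System;
using System.IO;

internal class Watcher
{
    private static void Main()
    {
        const string watchPath = @"C:\ProductImports"; // assumed drop folder

        using (var watcher = new FileSystemWatcher(watchPath, "*.txt"))
        {
            // Fires when a new file appears in the watched folder.
            watcher.Created += (sender, e) => ProcessFile(e.FullPath);
            watcher.EnableRaisingEvents = true;

            Console.WriteLine("Watching {0} - press Enter to exit.", watchPath);
            Console.ReadLine();
        }
    }

    private static void ProcessFile(string path)
    {
        // Placeholder: call the parse-and-POST logic here. The file may still be
        // locked by the writer when Created fires, so retry on IOException.
        Console.WriteLine("Processing " + path);
    }
}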
Additionally:
If you are interested in running a thread in your Api/Web app, take a look at https://www.hanselman.com/blog/HowToRunBackgroundTasksInASPNET.aspx for some options.
You don't have to use a FileSystemWatcher, of course; you could trigger via a flag in a DB that is checked periodically, or via a system event.
In my WebApi action method, I want to create/over-write a folder using this code:
string myDir = "...";
if(Directory.Exists(myDir))
{
Directory.Delete(myDir, true);
}
Directory.CreateDirectory(myDir);
// 1 - Check the dir
Debug.WriteLine("Double check if the Dir is created: " + Directory.Exists(myDir));
// Some other stuff here...
// 2 - Check the dir again
Debug.WriteLine("Check again if the Dir still exists: " + Directory.Exists(myDir));
Issue
Strangely, sometimes right after creating the directory, the directory does not exist!
Sometimes when checking the dir for the first time (where the number 1 is), Directory.Exists() returns true; other times false. The same happens when checking the dir for the second time (where the number 2 is).
Notes
None of this code throws any exception.
I can only reproduce this when the website is published on the server (Windows Server 2008).
It happens when concurrent requests access the same folder.
Questions
Is this a concurrency issue / race condition?
Doesn't WebApi or the Operating System handle the concurrency?
Is this the correct way to overwrite a folder?
Should I lock files manually when we have many API requests to the same file?
Or in General:
What's the reason for this strange behavior?
UPDATE:
Using DirectoryInfo and Refresh() instead of Directory does not solve the problem.
It only happens when the recursive option of Delete is true (and the directory is not empty).
Many filesystem operations are not synchronous on some filesystems (in the case of Windows, NTFS). Take for example the RemoveDirectory call (which is called by Directory.Delete at some point):
The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed.
As you can see, it will not really delete the directory until all handles to it are closed, but Directory.Delete will complete just fine. In your case it is most likely the same kind of concurrency problem - the directory is not really created yet while you are executing Directory.Exists.
So, just periodically check what you need and don't consider filesystem calls in .NET to be synchronous. You can also use FileSystemWatcher in some cases to avoid polling.
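For example, a small polling helper along these lines (the name, timeout, and poll interval are assumptions):
// Sketch: retry Directory.Exists for a bounded time instead of trusting a single call.
private static bool WaitForDirectory(string path, int timeoutMs = 5000, int pollMs = 50)
{
    var deadline = DateTime.UtcNow.AddMilliseconds(timeoutMs);
    while (DateTime.UtcNow < deadline)
    {
        if (Directory.Exists(path))
            return true;
        Thread.Sleep(pollMs);
    }
    return Directory.Exists(path);
}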
EDIT: I was thinking how to reproduce it, and here is the code:
internal class Program {
private static void Main(string[] args) {
const string path = "G:\\test_dir";
while (true) {
if (Directory.Exists(path))
Directory.Delete(path);
Directory.CreateDirectory(path);
if (!Directory.Exists(path))
throw new Exception("Confirmed");
}
}
}
You can see that if all filesystem calls were synchronous (in .NET), this code should run without problems. Now, before running the code, create an empty directory at the specified path (preferably don't use an SSD for that) and open it with Windows Explorer. Now run the code. For me it either throws Confirmed (which exactly reproduces your issue) or throws on Directory.Delete saying that the directory does not exist (almost the same case). It does this 100% of the time for me.
Here is another piece of code which, when run on my machine, confirms that it is certainly possible for File.Exists to return true directly after a File.Delete call:
internal class Program {
private static void Main(string[] args) {
while (true) {
const string path = @"G:\test_dir\test.txt";
if (File.Exists(path))
File.Delete(path);
if (File.Exists(path))
throw new Exception("Confirmed");
File.Create(path).Dispose();
}
}
}
To do this, I opened the G:\test_dir folder and, during execution of this code, tried to open the constantly appearing and disappearing test.txt file. After a couple of tries, the Confirmed exception was thrown (while I didn't create or delete that file myself, and after the exception is thrown, it is no longer present on the filesystem). So race conditions are possible in multiple cases and my answer is the correct one.
I wrote myself a little C# method for synchronous folder deletion using Directory.Delete(). Feel free to copy:
private bool DeleteDirectorySync(string directory, int timeoutInMilliseconds = 5000)
{
    if (!Directory.Exists(directory))
    {
        return true;
    }

    // Watch the parent directory for the deletion of this directory.
    // (FileSystemWatcher.Filter matches names, not full paths.)
    using (var watcher = new FileSystemWatcher
    {
        Path = Path.Combine(directory, ".."),
        NotifyFilter = NotifyFilters.DirectoryName,
        Filter = Path.GetFileName(directory.TrimEnd(Path.DirectorySeparatorChar)),
    })
    {
        var task = Task.Run(() => watcher.WaitForChanged(WatcherChangeTypes.Deleted, timeoutInMilliseconds));

        // We must not start deleting before the watcher is running.
        while (task.Status != TaskStatus.Running)
        {
            Thread.Sleep(100);
        }

        try
        {
            Directory.Delete(directory, true);
        }
        catch
        {
            return false;
        }

        // Blocks until the watcher reports the deletion or the timeout expires.
        return !task.Result.TimedOut;
    }
}
Note that reading task.Result blocks the thread until the task has finished, without burning CPU on this thread. That is the point where the method becomes synchronous.
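In the overwrite scenario from the question, it would slot in roughly like this (hypothetical usage; the error handling is up to you):
// Only recreate the directory once the deletion has been confirmed.
if (DeleteDirectorySync(myDir))
{
    Directory.CreateDirectory(myDir);
}
else
{
    // deletion not confirmed within the timeout - report or retry as needed
}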
Sounds like a race condition to me. I'm not sure why - you did not provide enough details - but what you can do is wrap everything in a lock statement and see if the problem goes away. This is certainly not a production-ready solution; it is only a quick way to check. If it is indeed a race condition, you need to rethink your approach to rewriting folders. Maybe create a GUID-named folder and, when done, update the DB with the most recent GUID so it points to the most recent folder?
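A rough sketch of that idea - the base path and the SaveLatestPathToDb call are hypothetical placeholders for however you record which folder is current:
// Sketch: write into a fresh GUID-named folder instead of deleting and
// recreating the same path, then publish the new location.
string baseDir = @"D:\data\output"; // assumed base location
string myDir = Path.Combine(baseDir, Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(myDir);

// ... write files into myDir ...

SaveLatestPathToDb(myDir); // hypothetical: point readers at the newest folder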
Before I go into too much detail, my program is written in Visual Studio 2010 using C# .NET 4.0.
I wrote a program that generates a separate log file for each run. The log file is named after the time, accurate to the millisecond (for example, 20130726103042375.log). The program will also generate a master log file for the day if it does not already exist (for example, 20130726_Master.log).
At the end of each run, I want to append the log file to a master log file. Is there a way to check if I can append successfully? And retry after Sleep for like a second or something?
Basically, I have 1 executable, and multiple users (let's say there are 5 users).
All 5 users will access and run this executable at the same time. Since it's nearly impossible for all users to start at the exact same time (down to the millisecond), there is no problem generating the individual log files.
However, the issue comes in when I attempt to merge those log files to the master log file. Though it is unlikely, I think the program will crash if multiple users are appending to the same master log file.
The method I use is
File.AppendAllText(masterLogFile, File.ReadAllText(individualLogFile));
I have looked into the lock object, but I don't think it works in my case, as there are multiple instances running instead of multiple threads in one instance.
Another way I look into is try/catch, something like this
try
{
stream = file.Open(FileMode.Open, FileAccess.ReadWrite, FileShare.None);
}
catch {}
But I don't think this solves the problem, because the status of the masterLogFile can change in that brief millisecond.
So my overall question is: Is there a way to append to masterLogFile if it's not in use, and retry after a short timeout if it is? Or if there is an alternative way to create the masterLogFile?
Thank you in advance, and sorry for the long message. I want to make sure I get my message across and explain what I've tried or look into so we are not wasting anyone's time.
Please let me know if there's anymore information I can provide to help you help me.
Your try/catch is the way to do things. If the call to open the file succeeds, then you can write to the file. The idea is to keep the file open while you copy. I would suggest something like:
bool openSuccessful = false;
int attempts = 0;
const int maxAttempts = 10; // retry for roughly ten seconds, then give up
while (!openSuccessful && attempts < maxAttempts)
{
    try
    {
        using (var writer = new StreamWriter(masterlog, true)) // append
        {
            // successfully opened file
            openSuccessful = true;
            try
            {
                foreach (var line in File.ReadLines(individualLogFile))
                {
                    writer.WriteLine(line);
                }
            }
            catch (IOException)
            {
                // something unexpected happened while writing.
                // handle the error and exit the loop.
                break;
            }
        }
    }
    catch (IOException)
    {
        // Couldn't open the file, most likely because another instance has it open.
        // Delay and retry. Exceptions of other types will simply propagate.
        attempts++;
        Thread.Sleep(1000);
    }
}
if (!openSuccessful)
{
    // notify of error (every retry failed)
}
So if you fail to open the file, you sleep and try again.
See my blog post, File.Exists is only a snapshot, for a little more detail.
I would do something along these lines, as I think it incurs the least overhead. Try/catch is going to generate a stack trace (which could take a whole second) if an exception is thrown. There has to be a better way to do this atomically, still. If I find one I'll post it.
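One exception-free option (a sketch only, not necessarily what this poster had in mind) is to serialize the appends across processes with a named mutex shared by all instances:
// Sketch: coordinate appends to the master log across processes with a named mutex.
// The mutex name is arbitrary, but every instance must use the same one.
using (var mutex = new Mutex(false, @"Global\MyAppMasterLogMutex"))
{
    // Wait up to ten seconds for another instance to finish its append.
    if (mutex.WaitOne(TimeSpan.FromSeconds(10)))
    {
        try
        {
            File.AppendAllText(masterLogFile, File.ReadAllText(individualLogFile));
        }
        finally
        {
            mutex.ReleaseMutex();
        }
    }
    else
    {
        // Could not acquire the lock in time; retry or report as appropriate.
    }
}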