Getting `Cannot access a closed file` when asynchronously copying files - c#

I want to copy multiple files asynchronously but i am getting this error,
System.ObjectDisposedException: Cannot access a closed file.
Here is my method,
public Task CopyAllAsync(IList<ProductsImage> productsImage)
{
var tasks = new List<Task>();
foreach (var productImage in productsImage)
{
var task = _fileService.CopyAsync(productImage.ExistingFileName, productImage.NewFileName);
tasks.Add(task);
}
return Task.WhenAll(tasks);
}
here is the FileService.CopyAsync method,
public Task CopyAsync(string sourcePath, string destinationPath)
{
using (var source = File.Open(sourcePath, FileMode.Open))
{
using (var destination = File.Create(destinationPath))
{
return source.CopyToAsync(destination);
}
}
}
Then I am awaiting this,
await _imageService.CopyAllAsync(productsImage);
If I debug then I will not get this error?

You need to await the copy operation instead of simply returning the task. That would make sure you're not ending the using scope too soon which means calling Dispose on your FileStreams
public async Task CopyAsync(string sourcePath, string destinationPath)
{
using (var source = File.Open(sourcePath, FileMode.Open))
{
using (var destination = File.Create(destinationPath))
{
await source.CopyToAsync(destination);
}
}
}

Related

Is my approach correct for concurrent network requests?

I wrote a web crawler and I want to know if my approach is correct. The only issue I'm facing is that it stops after some hours of crawling. No exception, it just stops.
1 - the private members and the constructor:
private const int CONCURRENT_CONNECTIONS = 5;
private readonly HttpClient _client;
private readonly string[] _services = new string[2] {
"https://example.com/items?id=ID_HERE",
"https://another_example.com/items?id=ID_HERE"
}
private readonly List<SemaphoreSlim> _semaphores;
public Crawler() {
ServicePointManager.DefaultConnectionLimit = CONCURRENT_CONNECTIONS;
_client = new HttpClient();
_semaphores = new List<SemaphoreSlim>();
foreach (var _ in _services) {
_semaphores.Add(new SemaphoreSlim(CONCURRENT_CONNECTIONS));
}
}
Single HttpClient instance.
The _services is just a string array that contains the URL, they are not the same domain.
I'm using semaphores (one per domain) since I read that it's not a good idea to use the network queue (I don't remember how it calls).
2 - The Run method, which is the one I will call to start crawling.
public async Run(List<int> ids) {
const int BATCH_COUNT = 1000;
var svcIndex = 0;
var tasks = new List<Task<string>>(BATCH_COUNT);
foreach (var itemId in ids) {
tasks.Add(DownloadItem(svcIndex, _services[svcIndex].Replace("ID_HERE", $"{itemId}")));
if (++svcIndex >= _services.Length) {
svcIndex = 0;
}
if (tasks.Count >= BATCH_COUNT) {
var results = await Task.WhenAll(tasks);
await SaveDownloadedData(results);
tasks.Clear();
}
}
if (tasks.Count > 0) {
var results = await Task.WhenAll(tasks);
await SaveDownloadedData(results);
tasks.Clear();
}
}
DownloadItem is an async function that actually makes the GET request, note that I'm not awaiting it here.
If the number of tasks reaches the BATCH_COUNT, I will await all to complete and save the results to file.
3 - The DownloadItem function.
private async Task<string> DownloadItem(int serviceIndex, string link) {
var needReleaseSemaphore = true;
var result = string.Empty;
try {
await _semaphores[serviceIndex].WaitAsync();
var r = await _client.GetStringAsync(link);
_semaphores[serviceIndex].Release();
needReleaseSemaphore = false;
// DUE TO JSON SIZE, I NEED TO REMOVE A VALUE (IT'S USELESS FOR ME)
var obj = JObject.Parse(r);
if (obj.ContainsKey("blah")) {
obj.Remove("blah");
}
result = obj.ToString(Formatting.None);
} catch {
result = string.Empty;
// SINCE I GOT AN EXCEPTION, I WILL 'LOCK' THIS SERVICE FOR 1 MINUTE.
// IF I RELEASED THIS SEMAPHORE, I WILL LOCK IT AGAIN FIRST.
if (!needReleaseSemaphore) {
await _semaphores[serviceIndex].WaitAsync();
needReleaseSemaphore = true;
}
await Task.Delay(60_000);
} finally {
// RELEASE THE SEMAPHORE, IF NEEDED.
if (needReleaseSemaphore) {
_semaphores[serviceIndex].Release();
}
}
return result;
}
4- The function that saves the result.
private async Task SaveDownloadedData(List<string> myData) {
using var fs = new FileStream("./output.dat", FileMode.Append);
foreach (var res in myData) {
var blob = Encoding.UTF8.GetBytes(res);
await fs.WriteAsync(BitConverter.GetBytes((uint)blob.Length));
await fs.WriteAsync(blob);
}
await fs.DisposeAsync();
}
5- Finally, the Main function.
static async Task Main(string[] args) {
var crawler = new Crawler();
var items = LoadItemIds();
await crawler.Run(items);
}
After all this, is my approach correct? I need to make millions of requests, will take some weeks/months to gather all data I need (due to the connection limit).
After 12 - 14 hours, it just stops and I need to manually restart the app (memory usage is ok, my VPS has 1 GB and it never used more than 60%).

Parallel.ForEach blocking calling method

I am having a problem with Parallel.ForEach. I have written simple application that adds file names to be downloaded to the queue, then using while loop it iterates through the queue, downloads file one at a time, then when file has been downloaded, another async method is called to create object from downloaded memoryStream. Returned result of this method is not awaited, it is discarded, so the next download starts immediately. Everything works fine if I use simple foreach in object creation - objects are being created while download is continuing. But if I would like to speed up the object creation process and use Parallel.ForEach it stops download process until the object is created. UI is fully responsive, but it just won't download the next object. I don't understand why is this happening - Parallel.ForEach is inside await Task.Run() and to my limited knowledge about asynchronous programming this should do the trick. Can anyone help me understand why is it blocking first method and how to avoid it?
Here is a small sample:
public async Task DownloadFromCloud(List<string> constructNames)
{
_downloadDataQueue = new Queue<string>();
var _gcsClient = StorageClient.Create();
foreach (var item in constructNames)
{
_downloadDataQueue.Enqueue(item);
}
while (_downloadDataQueue.Count > 0)
{
var memoryStream = new MemoryStream();
await _gcsClient.DownloadObjectAsync("companyprojects",
_downloadDataQueue.Peek(), memoryStream);
memoryStream.Position = 0;
_ = ReadFileXml(memoryStream);
_downloadDataQueue.Dequeue();
}
}
private async Task ReadFileXml(MemoryStream memoryStream)
{
var reader = new XmlReader();
var properties = reader.ReadXmlTest(memoryStream);
await Task.Run(() =>
{
var entityList = new List<Entity>();
foreach (var item in properties)
{
entityList.Add(CreateObjectsFromDownloadedProperties(item));
}
//Parallel.ForEach(properties item =>
//{
// entityList.Add(CreateObjectsFromDownloadedProperties(item));
//});
});
}
EDIT
This is simplified object creation method:
public Entity CreateObjectsFromDownloadedProperties(RebarProperties properties)
{
var path = new LinearPath(properties.Path);
var section = new Region(properties.Region);
var sweep = section.SweepAsMesh(path, 1);
return sweep;
}
Returned result of this method is not awaited, it is discarded, so the next download starts immediately.
This is also dangerous. "Fire and forget" means "I don't care when this operation completes, or if it completes. Just discard all exceptions because I don't care." So fire-and-forget should be extremely rare in practice. It's not appropriate here.
UI is fully responsive, but it just won't download the next object.
I have no idea why it would block the downloads, but there's a definite problem in switching to Parallel.ForEach: List<T>.Add is not threadsafe.
private async Task ReadFileXml(MemoryStream memoryStream)
{
var reader = new XmlReader();
var properties = reader.ReadXmlTest(memoryStream);
await Task.Run(() =>
{
var entityList = new List<Entity>();
Parallel.ForEach(properties, item =>
{
var itemToAdd = CreateObjectsFromDownloadedProperties(item);
lock (entityList) { entityList.Add(itemToAdd); }
});
});
}
One tip: if you have a result value, PLINQ is often cleaner than Parallel:
private async Task ReadFileXml(MemoryStream memoryStream)
{
var reader = new XmlReader();
var properties = reader.ReadXmlTest(memoryStream);
await Task.Run(() =>
{
var entityList = proeprties
.AsParallel()
.Select(CreateObjectsFromDownloadedProperties)
.ToList();
});
}
However, the code still suffers from the fire-and-forget problem.
For a better fix, I'd recommend taking a step back and using something more suited to "pipeline"-style processing. E.g., TPL Dataflow:
public async Task DownloadFromCloud(List<string> constructNames)
{
// Set up the pipeline.
var gcsClient = StorageClient.Create();
var downloadBlock = new TransformBlock<string, MemoryStream>(async constructName =>
{
var memoryStream = new MemoryStream();
await gcsClient.DownloadObjectAsync("companyprojects", constructName, memoryStream);
memoryStream.Position = 0;
return memoryStream;
});
var processBlock = new TransformBlock<MemoryStream, List<Entity>>(memoryStream =>
{
var reader = new XmlReader();
var properties = reader.ReadXmlTest(memoryStream);
return proeprties
.AsParallel()
.Select(CreateObjectsFromDownloadedProperties)
.ToList();
});
var resultsBlock = new ActionBlock<List<Entity>>(entities => { /* TODO */ });
downloadBlock.LinkTo(processBlock, new DataflowLinkOptions { PropagateCompletion = true });
processBlock.LinkTo(resultsBlock, new DataflowLinkOptions { PropagateCompletion = true });
// Push data into the pipeline.
foreach (var constructName in constructNames)
await downloadBlock.SendAsync(constructName);
downlodBlock.Complete();
// Wait for pipeline to complete.
await resultsBlock.Completion;
}

How to download correctly files simultaneously?

I am trying to download mltiple files simultaneosly. But all files are downloading one by one, sequantilly. So, at first this file downloaded #"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf", and then this file is started to dowload #"http://download.geofabrik.de/europe/finland-latest.osm.pbf",, and the next file to be downloaded is #"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf" and so on.
But I would like to download simultaneously.
So I've the following code based on the code from this answer:
static void Main(string[] args)
{
Task.Run(async () =>
{
await DownloadFiles();
}).GetAwaiter().GetResult();
}
public static async Task DownloadFiles()
{
IList<string> urls = new List<string>
{
#"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf",
#"http://download.geofabrik.de/europe/finland-latest.osm.pbf",
#"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf",
#"http://download.geofabrik.de/europe/belgium-latest.osm.pbf",
#"http://download.geofabrik.de/europe/belgium-latest.osm.pbf"
};
foreach (var url in urls)
{
string fileName = url.Substring(url.LastIndexOf('/'));
await DownloadFile(url, fileName);
}
}
public static async Task DownloadFile(string url, string fileName)
{
string address = #"D:\Downloads";
using (var client = new WebClient())
{
await client.DownloadFileTaskAsync(url, $"{address}{fileName}");
}
}
However, when I see in my file system, then I see that files are downloading one by one, sequantially, not simultaneosuly:
In addition, I've tried to use this approach, however there are no simultaneous downloads:
static void Main(string[] args)
{
IList<string> urls = new List<string>
{
#"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf",
#"http://download.geofabrik.de/europe/finland-latest.osm.pbf",
#"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf",
#"http://download.geofabrik.de/europe/belgium-latest.osm.pbf",
#"http://download.geofabrik.de/europe/belgium-latest.osm.pbf"
};
Parallel.ForEach(urls,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
DownloadFile);
}
public static void DownloadFile(string url)
{
string address = #"D:\Downloads";
using (var sr = new StreamReader(WebRequest.Create(url)
.GetResponse().GetResponseStream()))
using (var sw = new StreamWriter(address + url.Substring(url.LastIndexOf('/'))))
{
sw.Write(sr.ReadToEnd());
}
}
Could you tell me how it is possible to download simultaneosly?
Any help would be greatly appreciated.
foreach (var url in urls)
{
string fileName = url.Substring(url.LastIndexOf('/'));
await DownloadFile(url, fileName); // you wait to download the item and then move the next
}
Instead you should create tasks and wait all of them to complete.
public static Task DownloadFiles()
{
IList<string> urls = new List<string>
{
#"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf",
#"http://download.geofabrik.de/europe/finland-latest.osm.pbf",
#"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf",
#"http://download.geofabrik.de/europe/belgium-latest.osm.pbf",
#"http://download.geofabrik.de/europe/belgium-latest.osm.pbf"
};
var tasks = urls.Select(url=> {
var fileName = url.Substring(url.LastIndexOf('/'));
return DownloadFile(url, fileName);
}).ToArray();
return Task.WhenAll(tasks);
}
Rest of your code can remain same.
Eldar's solution works with some minor edits. This is the full working DownloadFiles method that was edited:
public static async Task DownloadFiles()
{
IList<string> urls = new List<string>
{
#"http://download.geofabrik.de/europe/cyprus-latest.osm.pbf",
#"http://download.geofabrik.de/europe/finland-latest.osm.pbf",
#"http://download.geofabrik.de/europe/great-britain-latest.osm.pbf",
#"http://download.geofabrik.de/europe/belgium-latest.osm.pbf",
#"http://download.geofabrik.de/europe/belgium-latest.osm.pbf"
};
var tasks = urls.Select(t => {
var fileName = t.Substring(t.LastIndexOf('/'));
return DownloadFile(t, fileName);
}).ToArray();
await Task.WhenAll(tasks);
}
this will download them asynchronously one after each other.
await DownloadFile(url, fileName);
await DownloadFile(url2, fileName2);
this will do what you actually want to achieve:
var task1 = DownloadFile(url, fileName);
var task2 = DownloadFile(url2, fileName2);
await Task.WhenAll(task1, task2);

calling an async task in a windows store app?

I had a working "save reader" that would strip unwanted data out of a winform, then return a list of strings referred to as "items" .
My problem is that when converting this for use with a winstore app; I cannot call an async task, and I need to use one to access the file.
im not 100% sure how to do this, so any help is appreciated; this is what I have so far.
the public SaveGame(StorageFile file)
{
File = promptUser();
}
throws an error, I have tried making the constructor async to, but that throws more errors
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.IO;
using Windows.Storage;
using Windows.Storage.Pickers;
namespace save_game_reader
{
class SaveGame
{
private StorageFile File { get; set; }
public SaveGame(StorageFile file)
{
File = promptUser();
Task<StorageFile> returnedTaskTResult = promptUser();
StorageFile intResult = await promptUser();
}
public async Task<List<string>> GetAllItems()
{
string WholeSaveString = await Windows.Storage.FileIO.ReadTextAsync(File);
List<string> ToReturn = new List<string>();
List<string> SplitList = WholeSaveString.Split(new string[] { "data" }, StringSplitOptions.None).ToList();
foreach (string line in SplitList)
{
var start = line.IndexOf("value") + 6;
var end = line.IndexOf("type", start);
if (end != -1)
{
string substring = line.Substring(start, (end - start) - 8);
if (substring.Length >= 4)
{
ToReturn.Add(substring);
}
}
}
return ToReturn;
}
private async Task<StorageFile> promptUser()
{
var picker = new FileOpenPicker();
picker.ViewMode = PickerViewMode.Thumbnail;
picker.SuggestedStartLocation = PickerLocationId.DocumentsLibrary;
StorageFile sf = await picker.PickSingleFileAsync();
return sf;
}
}
}
It's pretty rare to need to do so, but the typical way to work around constructors in async is to just use a static method:
class SaveGame
{
SaveGame(StorageFile file)
{
File = file;
}
public async Task<SaveGame> Create()
{
return new SaveGame(await promptUser());
}
}
SaveGame g = await SaveGame.Create();
It may be worth it in your case to consider a cleaner separation between UI code and backend code. If something seems weird or difficult to do, it's often a sign of design flaw.
Also, one does not "call an async task". You call an async method and await a Task.
Constructors cannot be async, change your code as follows:
public SaveGame()
{}
public async Task Init(Storage file)
{
File = promptUser();
Task<StorageFile> returnedTaskTResult = promptUser();
StorageFile intResult = await promptUser();
}
somewhere else in an async method you call it like this:
var save = new SaveGame();
await save.Init(file);

How to use HttpClient PostAsync() with threadpool in C#?

I'm using the following code to post an image to a server.
var image= Image.FromFile(#"C:\Image.jpg");
Task<string> upload = Upload(image);
upload.Wait();
public static async Task<string> Upload(Image image)
{
var uriBuilder = new UriBuilder
{
Host = "somewhere.net",
Path = "/path/",
Port = 443,
Scheme = "https",
Query = "process=false"
};
using (var client = new HttpClient())
{
client.DefaultRequestHeaders.Add("locale", "en_US");
client.DefaultRequestHeaders.Add("country", "US");
var content = ConvertToHttpContent(image);
content.Headers.ContentType = MediaTypeHeaderValue.Parse("image/jpeg");
using (var mpcontent = new MultipartFormDataContent("--myFakeDividerText--")
{
{content, "fakeImage", "myFakeImageName.jpg"}
}
)
{
using (
var message = await client.PostAsync(uriBuilder.Uri, mpcontent))
{
var input = await message.Content.ReadAsStringAsync();
return "nothing for now";
}
}
}
}
I'd like to modify this code to run multiple threads. I've used "ThreadPool.QueueUserWorkItem" before and started to modify the code to leverage it.
private void UseThreadPool()
{
int minWorker, minIOC;
ThreadPool.GetMinThreads(out minWorker, out minIOC);
ThreadPool.SetMinThreads(1, minIOC);
int maxWorker, maxIOC;
ThreadPool.GetMaxThreads(out maxWorker, out maxIOC);
ThreadPool.SetMinThreads(4, maxIOC);
var events = new List<ManualResetEvent>();
foreach (var image in ImageCollection)
{
var resetEvent = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(
arg =>
{
var img = Image.FromFile(image.getPath());
Task<string> upload = Upload(img);
upload.Wait();
resetEvent.Set();
});
events.Add(resetEvent);
if (events.Count <= 0) continue;
foreach (ManualResetEvent e in events) e.WaitOne();
}
}
The problem is that only one thread executes at a time due to the call to "upload.Wait()". So I'm still executing each thread in sequence. It's not clear to me how I can use PostAsync with a thread-pool.
How can I post images to a server using multiple threads by tweaking the code above? Is HttpClient PostAsync the best way to do this?
I'd like to modify this code to run multiple threads.
Why? The thread pool should only be used for CPU-bound work (and I/O completions, of course).
You can do concurrency just fine with async:
var tasks = ImageCollection.Select(image =>
{
var img = Image.FromFile(image.getPath());
return Upload(img);
});
await Task.WhenAll(tasks);
Note that I removed your Wait. You should avoid using Wait or Result with async tasks; use await instead. Yes, this will cause async to grow through you code, and you should use async "all the way".

Categories