How do I properly implement async/await for image data migration? - c#

I'm writing a console app to do some data migration between a legacy system and a new version. Each record has associated images stored on one web server and I'm downloading/altering/uploading each image to Azure (and also recording some data about each image in a database).
Here's a rough outline, in code:
public void MigrateData()
{
var records = GetRecords();
foreach (var record in records)
{
// ...
MigrateImages(record.Id, record.ImageCount);
}
}
public void MigrateImages(int recordId, int imageCount)
{
for (int i = 1; i <= imageCount; i++)
{
var legacyImageData = DownloadImage("the image url");
if (legacyImageData != null && legacyImageData.Length > 0)
{
// discard because we don't need the image id here, but it's used in other workflows
var _ = InsertImage(recordId, legacyImageData);
}
}
}
// This method can be used elsewhere, so the return of int is necessary and cannot be changed
public int InsertImage(int recordId, byte[] imageData)
{
var urls = UploadImage(imageData).Result;
return // method call to save image and return image ID
}
public async Task<(Uri LargeUri, Uri ThumbnailUri)> UploadImage(byte[] imageData)
{
byte[] largeData = ResizeImageToLarge(imageData);
byte[] thumbnailData = ResizeImageToThumbnail(imageData);
var largeUpload = largeBlob.UploadFromByteArrayAsync(largeData, 0, largeData.Length);
var thumbUpload = thumbsBlob.UploadFromByteArrayAsync(thumbnailData, 0, thumbnailData.Length);
await Task.WhenAll(largeUpload, thumbUpload);
var largeUrl = "";// logic to build url
var thumbUrl = "";// logic to build url
return (largeUrl, thumbUrl);
}
I'm using async/await for UploadImage() to allow parallel uploads for the large and thumbnail images, saving time.
My question is: how (if it's possible or even makes sense) can I use async/await in MigrateImages() to run image uploads in parallel and reduce the overall time the migration takes? Does the fact that I'm already using async/await within UploadImage() hinder that goal?
It's probably obvious from my question, but async/await is still something I can't fully wrap my head around in terms of how to correctly utilize and implement it.

The async/await technology is intended for facilitating asynchrony, not concurrency. In your case you want to speed up the image-uploading process by uploading multiple images concurrently. It is not important if you are wasting some thread-pool threads by blocking them, and you have no UI thread that needs to remain unblocked. You are making a tool that you are going to use once or twice and that's it. So I suggest that you spare yourself the trouble of trying to understand why your app is not behaving the way you expect, and avoid async/await altogether. Stick with the simple and familiar synchronous programming model for this assignment.
Combining synchronous and asynchronous code is dangerous, and even more so if your experience and understanding of async/await is limited. There are many intricacies in this technology. Using the Task.Result property in particular is a red flag. When you become more proficient with async/await code, you are going to treat any use of Result like an unlocked grenade, ready to explode in your face at any time and make you look like a fool. When used in apps with a synchronization context (Windows Forms, ASP.NET) it can introduce deadlocks so easily that it's not even funny.
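To illustrate that deadlock risk, here is a minimal sketch (not taken from the code above, and assuming a Windows Forms app where the UI thread has a synchronization context):
// Runs on the UI thread, which has a synchronization context.
private void Button_Click(object sender, EventArgs e)
{
    // .Result blocks the UI thread while it waits for GetDataAsync to finish...
    var data = GetDataAsync().Result;
}
private async Task<string> GetDataAsync()
{
    // ...but without ConfigureAwait(false) the continuation after the await tries to
    // resume on that same blocked UI thread, so the two wait on each other forever.
    await Task.Delay(1000);
    return "data";
}
A console app has no synchronization context, which is why you can get away with blocking there, but it is still a habit best avoided.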
Here is how you can achieve the desired concurrency, without having to deal with the complexities of asynchrony:
public (Uri LargeUri, Uri ThumbnailUri) UploadImage(byte[] imageData)
{
byte[] largeData = ResizeImageToLarge(imageData);
byte[] thumbnailData = ResizeImageToThumbnail(imageData);
var largeUpload = largeBlob.UploadFromByteArrayAsync(
largeData, 0, largeData.Length);
var thumbUpload = thumbsBlob.UploadFromByteArrayAsync(
thumbnailData, 0, thumbnailData.Length);
Task.WaitAll(largeUpload, thumbUpload);
var largeUrl = "";// logic to build url
var thumbUrl = "";// logic to build url
return (largeUrl, thumbUrl);
}
I just replaced await Task.WhenAll with Task.WaitAll, removed the async keyword, and removed the wrapping Task from the method's return type.
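If you also want multiple images in flight at the same time, a rough sketch in the same synchronous spirit (this is not part of the answer above; the MaxDegreeOfParallelism value of 4 is just an assumption to tune against what the source server and Azure can handle) would be to drive MigrateImages with Parallel.ForEach:
// Requires System.Linq and System.Threading.Tasks.
public void MigrateImages(int recordId, int imageCount)
{
    var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
    Parallel.ForEach(Enumerable.Range(1, imageCount), options, i =>
    {
        var legacyImageData = DownloadImage("the image url");
        if (legacyImageData != null && legacyImageData.Length > 0)
        {
            // The image ID isn't needed in this workflow, so the return value is discarded.
            InsertImage(recordId, legacyImageData);
        }
    });
}
This keeps every method synchronous while still overlapping the download and upload work of several images.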

Related

how to apply multi threading to a method

I have a program that takes a user's integer input (e.g. "1") and increments it based on the number of files in a directory, then stamps that number on each file (the first is 1, and so on). The foreach loop goes into each directory, gets its files, increments the input and calls the stamp method until all files are done. In this process the order is important. However, multithreading (Parallel.ForEach) doesn't always guarantee order; in my understanding it returns whichever thread finishes first, and it may also break the i++ logic (correct me if I'm wrong).
The question is how to apply multithreading in this case. I'm thinking of saving the values of the foreach at the end, passing them to the stamping method, and having the method stamp x files at a time. I don't know if that's possible or how to do it.
Here is my watermark method:
//text comes from the foreach already set.
public void waterMark(string text, string sourcePath, string destinationPathh)
{
using (Bitmap bitmap = new Bitmap(sourcePath))
{
//somecode
using (Graphics graphics = Graphics.FromImage(tempBitmap))
{
//somecode
tempBitmap.Save(destinationPathh, ImageFormat.Tiff);
// Error^: A generic error occurred in GDI+
// I think due to trying to save multiple files at once
}
}
}
The foreach loop:
var files = folder.GetFiles();
Parallel.ForEach(files, new ParallelOptions { MaxDegreeOfParallelism = 4 }, (file, state,indexer) =>
{
//somecode that calls the waterMark method in multiple spots as of now
});
Thank you in advance.
There is an overload of Parallel.ForEach that also provides an index for the item being processed:
Parallel.ForEach(someEnumerable, (val, state, idx) => Console.WriteLine(idx))
You can use it to keep track of the index in a thread-safe fashion.
As for the GDI+ stuff (Bitmap), I think you're safe as long as you use a single thread for all interactions with the bitmap. Don't try to do anything clever with async between instantiation and disposal.
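For example, a rough sketch (not the asker's actual loop body; startNumber and destinationFolder are placeholder names) of deriving each file's stamp number from the long index, so the numbering stays deterministic no matter which thread finishes first:
int startNumber = 1; // the user's input
var files = folder.GetFiles();
Parallel.ForEach(files, new ParallelOptions { MaxDegreeOfParallelism = 4 }, (file, state, idx) =>
{
    // idx is the item's position in the source sequence, so the stamp number
    // is stable even though the completion order of the threads is not.
    int stampNumber = startNumber + (int)idx;
    string destination = Path.Combine(destinationFolder, file.Name);
    waterMark(stampNumber.ToString(), file.FullName, destination);
});
Each call to waterMark creates, uses and disposes its own Bitmap on the thread that runs it, which is why the parallel saves are safe as long as every file gets its own destination path.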

Is Marshal.Copy too processor-intensive in this situation?

I am working on a realtime simulation model. The models are written in unmanaged code, but they are controlled by C# managed code, called the ExecutiveManager. An ExecutiveManager runs multiple models at a time and controls the timing of the running models (for example, if a model has a "framerate" of 20 frames per second, the executive tells the model when to start its next frame).
We are seeing a consistently high load on the CPU when running the simulation, it can get up to 100% and stay there on a machine that should be totally appropriate. I have used a processor profiler to determine where the issues are, and it pointed me to two methods: WriteMemoryRegion and ReadMemoryRegion. The ExecutiveManager makes the calls to these methods. Models have shared memory regions, and the ExecutiveManager is used to read and write these regions using these Methods. Both read and write make calls to Marshal.Copy, and my gut tells me that's where the issue is, but I don't want to trust my gut! We are going to do further testing to narrow things down more, but I wanted to do a quick sanity check on Marshal.Copy. WriteMemoryRegion and ReadMemoryRegion are called each frame, and furthermore they're called by each model in the ExecutiveManager, and each model typically has 6 shared regions. So for 10 models each with 6 regions running at 20 frames per second calling both WriteMemoryRegion and ReadMemoryRegion, that's 2400 calls of Marshal.Copy per second. Is this unreasonable, or could my problem lie elsewhere?
public async Task ReadMemoryRegion(MemoryRegionDefinition g) {
if (!cache.ContainsKey(g.Name)) {
cache.Add(g.Name, mmff.CreateOrOpen(g.Name, g.Size));
}
var mmf = cache[g.Name];
using (var stream = mmf.CreateViewStream())
using (var reader = brf.Create(stream)) {
var buffer = reader.ReadBytes(g.Size);
await WriteIcBuffer(g, buffer).ConfigureAwait(false);
}
}
private Task WriteIcBuffer(MemoryRegionDefinition g, byte[] buffer) {
Marshal.Copy(buffer, 0, new IntPtr(g.BaseAddress),
buffer.Length);
return Task.FromResult(0);
}
public async Task WriteMemoryRegion(MemoryRegionDefinition g) {
if (!cache.ContainsKey(g.Name)) {
if (g.Size > 0) {
cache.Add(g.Name, mmff.CreateOrOpen(g.Name, g.Size));
} else if (g.Size == 0){
throw new EmptyGlobalException($@"Global {g.Name} not
created as it does not contain any variables.");
} else {
throw new NegativeSizeGlobalException($@"Global {g.Name}
not created as it has a negative size.");
}
}
var mmf = cache[g.Name];
using (var stream = mmf.CreateViewStream())
using (var writer = bwf.Create(stream)) {
var buffer = await ReadIcBuffer(g);
writer.Write(buffer);
}
}
private Task<byte[]> ReadIcBuffer(MemoryRegionDefinition g) {
var buffer = new byte[g.Size];
Marshal.Copy(new IntPtr(g.BaseAddress), buffer, 0, g.Size);
return Task.FromResult(buffer);
}
I need to come up with a solution so that my processor isn't catching on fire. I'm very green in this area so all ideas are welcome. Again, I'm not sure Marshal.Copy is the issue, but it seems possible. Please let me know if you see other issues that could contribute to the processor problem.
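One quick sanity check before restructuring anything is to time Marshal.Copy in isolation at roughly your call rate. A minimal benchmark sketch (the 64 KB region size and the 2400 iterations are assumptions; substitute your real g.Size values):
// Requires System.Diagnostics and System.Runtime.InteropServices.
var size = 64 * 1024;                       // assumed region size
var source = new byte[size];
var unmanaged = Marshal.AllocHGlobal(size); // stands in for a model's shared memory region
try
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 2400; i++)          // roughly one second's worth of calls
    {
        Marshal.Copy(source, 0, unmanaged, size);
    }
    sw.Stop();
    Console.WriteLine($"2400 copies of {size} bytes took {sw.ElapsedMilliseconds} ms");
}
finally
{
    Marshal.FreeHGlobal(unmanaged);
}
If that number is a tiny fraction of a second, the copies themselves are unlikely to be what is pinning the CPU, and the bottleneck probably lies elsewhere in the per-frame work.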

MediaCapture VideoStabilization fails with 0xC00D4A3E

I'm working on a video recording app that supports the VideoStabilization effect, but when I start recording, I receive the following through the MediaCapture.Failed event almost instantly:
The sample allocator is currently empty, due to outstanding requests.
(0xC00D4A3E)
It only happens when I use the recommended configuration from the effect, though. If I don't call SetUpVideoStabilizationRecommendationAsync, it works fine.
Here is how I'm setting it up:
private MediaEncodingProfile _encodingProfile = MediaEncodingProfile.CreateMp4(VideoEncodingQuality.Auto);
private async Task CreateVideoStabilizationEffectAsync()
{
var definition = new VideoStabilizationEffectDefinition();
_videoStabilizationEffect = (VideoStabilizationEffect)await _mediaCapture.AddVideoEffectAsync(definition, MediaStreamType.VideoRecord);
_videoStabilizationEffect.Enabled = true;
await SetUpVideoStabilizationRecommendationAsync();
}
private async Task SetUpVideoStabilizationRecommendationAsync()
{
var properties = _mediaCapture.VideoDeviceController.GetMediaStreamProperties(MediaStreamType.VideoRecord) as VideoEncodingProperties;
var recommendation = _videoStabilizationEffect.GetRecommendedStreamConfiguration(_mediaCapture.VideoDeviceController, properties);
if (recommendation.InputProperties != null)
{
await _mediaCapture.VideoDeviceController.SetMediaStreamPropertiesAsync(MediaStreamType.VideoRecord, recommendation.InputProperties);
}
if (recommendation.OutputProperties != null)
{
_encodingProfile.Video = recommendation.OutputProperties;
}
}
private async Task StartRecordingAsync()
{
var videoFile = await KnownFolders.PicturesLibrary.CreateFileAsync("StableVideo.mp4", CreationCollisionOption.GenerateUniqueName);
await _mediaCapture.StartRecordToStorageFileAsync(_encodingProfile, videoFile);
}
The desiredProperties parameter of the GetRecommendedStreamConfiguration method needs to be given the MediaEncodingProfile.Video that will be used when calling your choice of MediaCapture.StartRecordTo* method (i.e. the "output properties"), so the effect can see what your desired VideoEncodingProperties are.
The error is being triggered because the VideoEncodingProperties from the VideoDeviceController (i.e. the "input properties") are being passed instead. If you think about it, an instance of the VideoDeviceController is already being passed in as a parameter to the method, so the effect can already access the information in that properties var; it wouldn't make much sense to have to pass those in separately at the same time. Instead, what it needs is information about the other endpoint. Does that make sense? At least that's how I try to rationalize it.
The official SDK sample for VideoStabilization on the Microsoft github repo shows how to do this correctly:
/// <summary>
/// Configures the pipeline to use the optimal resolutions for VS based on the settings currently in use
/// </summary>
/// <returns></returns>
private async Task SetUpVideoStabilizationRecommendationAsync()
{
Debug.WriteLine("Setting up VS recommendation...");
// Get the recommendation from the effect based on our current input and output configuration
var recommendation = _videoStabilizationEffect.GetRecommendedStreamConfiguration(_mediaCapture.VideoDeviceController, _encodingProfile.Video);
// Handle the recommendation for the input into the effect, which can contain a larger resolution than currently configured, so cropping is minimized
if (recommendation.InputProperties != null)
{
// Back up the current input properties from before VS was activated
_inputPropertiesBackup = _mediaCapture.VideoDeviceController.GetMediaStreamProperties(MediaStreamType.VideoRecord) as VideoEncodingProperties;
// Set the recommendation from the effect (a resolution higher than the current one to allow for cropping) on the input
await _mediaCapture.VideoDeviceController.SetMediaStreamPropertiesAsync(MediaStreamType.VideoRecord, recommendation.InputProperties);
Debug.WriteLine("VS recommendation for the MediaStreamProperties (input) has been applied");
}
// Handle the recommendations for the output from the effect
if (recommendation.OutputProperties != null)
{
// Back up the current output properties from before VS was activated
_outputPropertiesBackup = _encodingProfile.Video;
// Apply the recommended encoding profile for the output, which will result in a video with the same dimensions as configured
// before VideoStabilization was added if an appropriate padded capture resolution was available. Otherwise, it will be slightly
// smaller (due to cropping). This prevents upscaling back to the original size, which can result in a loss of quality
_encodingProfile.Video = recommendation.OutputProperties;
Debug.WriteLine("VS recommendation for the MediaEncodingProfile (output) has been applied");
}
}
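The _inputPropertiesBackup and _outputPropertiesBackup fields in the sample exist so the configuration can be rolled back once the effect is removed. A rough sketch of what that cleanup might look like (an assumption based on the fields above, not code quoted from the sample):
// Inside an async cleanup method, after the VideoStabilization effect has been removed.
if (_inputPropertiesBackup != null)
{
    // Restore the capture resolution that was configured before the effect was added.
    await _mediaCapture.VideoDeviceController.SetMediaStreamPropertiesAsync(MediaStreamType.VideoRecord, _inputPropertiesBackup);
    _inputPropertiesBackup = null;
}
if (_outputPropertiesBackup != null)
{
    // Restore the original encoding profile used for recording.
    _encodingProfile.Video = _outputPropertiesBackup;
    _outputPropertiesBackup = null;
}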

Retrieve a string containing html Document source using Task parallel

I really hope there's someone experienced with both TPL and the System.Net classes and methods.
What started as a simple thought of using TPL on a currently sequential set of actions has led me to a halt in my project.
As I am still fresh with .NET, I jumped straight into deep water with TPL...
I'm trying to extract an ASPX page's source/content (HTML) using WebClient.
With multiple requests per day (around 20-30 pages to go through), extracting specific values out of the source code is only one of a few daily tasks the server has on its list, which led me to try implementing it with TPL and gain some speed.
Although I tried using Task.Factory.StartNew() to iterate over a few WebClient instances, on the first execution the application just does not get any result from the WebClient.
This is my last try at it:
static void Main(string[] args)
{
EnumForEach<Act>(Execute);
Task.WaitAll();
}
public static void EnumForEach<Mode>(Action<Mode> Exec)
{
foreach (Mode mode in Enum.GetValues(typeof(Mode)))
{
Mode Curr = mode;
Task.Factory.StartNew(() => Exec(Curr) );
}
}
string ResultsDirectory = Environment.CurrentDirectory,
URL = "",
TempSourceDocExcracted ="",
ResultFile="";
enum Act
{
dolar, ValidateTimeOut
}
void Execute(Act Exc)
{
switch (Exc)
{
case Act.dolar:
URL = "http://www.AnyDomainHere.Com";
ResultFile =ResultsDirectory + "\\TempHtm.htm";
TempSourceDocExcracted = IeNgn.AgilityPacDocExtraction(URL).GetElementbyId("Dv_Main").InnerHtml;
File.WriteAllText(ResultFile, TempSourceDocExcracted);
break;
case Act.ValidateTimeOut:
URL = "http://www.AnotherDomainHere.Com";
ResultFile += "\\TempHtm.htm";
TempSourceDocExcracted = IeNgn.AgilityPacDocExtraction(URL).GetElementbyId("Dv_Main").InnerHtml;
File.WriteAllText(ResultFile, TempSourceDocExcracted);
break;
}
}
//usage of HtmlAgilityPack to extract Values of elements by their attributes/properties
public HtmlAgilityPack.HtmlDocument AgilityPacDocExtraction(string URL)
{
using (WC = new WebClient())
{
WC.Proxy = null;
WC.Encoding = Encoding.GetEncoding("UTF-8");
tmpExtractedPageValue = WC.DownloadString(URL);
retAglPacHtmDoc.LoadHtml(tmpExtractedPageValue);
return retAglPacHtmDoc;
}
}
What am I doing wrong? Is it possible to use a WebClient using TPL at all or should I use another tool (not being able to use IIS 7 / .net4.5)?
I see several issues:
Naming - identifiers like WC, IeNgn and TempSourceDocExcracted are not good names - Visual Studio is a modern IDE with smart code completion, so there's no need to save keystrokes (the main thing is to keep it consistent: see the C# Coding Conventions).
If you're using multithreading, you need to care about resource sharing. For example, ResultFile is a field shared by all the tasks and it is assigned inside each one, so its value is not deterministic (and even if this ran sequentially the code would still be faulty - you keep appending the file name to the path, so it ends up like c:\TempHtm.htm\TempHtm.htm\TempHtm.htm).
You're writing to the same file from different threads (well, at least that was your intent I think) - usually that's a recipe for disaster in multithreading. The real question is whether you need to write anything to disk at all, or whether the page can be downloaded as a string and parsed without touching the disk.
Overall I think you should parallelize only the downloading and keep HtmlAgilityPack out of the multithreaded part, since you don't know whether it is thread-safe. Downloading has a good performance/thread-count ratio; HTML parsing does not - at best with a thread count equal to the number of cores, but not more. Moreover, I would separate downloading from parsing, as that is easier to test, understand and maintain.
Update: I don't understand your full intent, but this may help you get started (it's not production code; you should add retries, error handling, etc.).
Also at the end is an extended WebClient class that lets you get more threads spinning, because by default WebClient allows only two concurrent connections.
class Program
{
static void Main(string[] args)
{
var urlList = new List<string>
{
"http://google.com",
"http://yahoo.com",
"http://bing.com",
"http://ask.com"
};
var htmlDictionary = new ConcurrentDictionary<string, string>();
Parallel.ForEach(urlList, new ParallelOptions { MaxDegreeOfParallelism = 20 }, url => Download(url, htmlDictionary));
foreach (var pair in htmlDictionary)
{
Process(pair);
}
}
private static void Process(KeyValuePair<string, string> pair)
{
// do the html processing
}
private static void Download(string url, ConcurrentDictionary<string, string> htmlDictionary)
{
using (var webClient = new SmartWebClient())
{
htmlDictionary.TryAdd(url, webClient.DownloadString(url));
}
}
}
public class SmartWebClient : WebClient
{
private readonly int maxConcurentConnectionCount;
public SmartWebClient(int maxConcurentConnectionCount = 20)
{
this.maxConcurentConnectionCount = maxConcurentConnectionCount;
}
protected override WebRequest GetWebRequest(Uri address)
{
var httpWebRequest = (HttpWebRequest)base.GetWebRequest(address);
if (httpWebRequest == null)
{
return null;
}
if (maxConcurentConnectionCount != 0)
{
httpWebRequest.ServicePoint.ConnectionLimit = maxConcurentConnectionCount;
}
return httpWebRequest;
}
}
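As a side note on the design choice: if subclassing WebClient feels like too much, a similar effect can be had by raising the process-wide default once at startup (an alternative sketch, not part of the answer above):
// Set before any requests are made, e.g. at the top of Main.
// The classic default connection limit per host is 2.
System.Net.ServicePointManager.DefaultConnectionLimit = 20;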

An elegant / performant way to "Touch" a file (update ModifiedTime) in WinRT?

An elegant / performant way to "Touch" a file (update its ModifiedTime) in WinRT?
I have some code which needs to delete files that are older than 30 days. This works well, but in some cases I need to update the time on the file to reset the 30-day window and prevent deletion. In the file's basic properties, DateModified is read-only, so I need to find another way to update it...
Method 1: Rename twice
// Ugly, and may have side-effects depending on what's using the file
// Sometimes gives access denied...
public static async Task TouchFileAsync(this StorageFile file)
{
var name = file.Name;
await file.RenameAsync("~" + name).AsTask().ContinueWith(
async (task) => { await file.RenameAsync(name); }
);
}
Method 2: Modify a file property
// Sometimes works, but currently throwing an ArgumentException for
// me, and I have no idea why. Also tried many other properties:
// http://msdn.microsoft.com/en-us/library/windows/desktop/bb760658(v=vs.85).aspx
public static async Task TouchFileAsync(this StorageFile file)
{
var prop = new KeyValuePair<string, object>("System.Comment", DateTime.Now.Ticks.ToString());
await file.Properties.SavePropertiesAsync(new[] { prop });
}
Method 3: Use a Win32 API via P/Invoke?
Not sure if this would work on ARM devices, pass certification, or be performant.
Is there a best way to do this? Code sample?
Anyone got any other ideas? I'm a bit stuck :-)
Many thanks,
Jon
I just had a need for this and here is my solution.
usage
await storageFileToTouch.TouchAsync();
code
public static class StorageFileExtensions
{
/// <summary>
/// Touches a file to update the DateModified property.
/// </summary>
public static async Task TouchAsync(this StorageFile file)
{
using (var touch = await file.OpenTransactedWriteAsync())
{
await touch.CommitAsync();
}
}
}
Assuming you're planning on combing through a list of files that exist locally on an RT machine, and not somewhere in the cloud (otherwise we wouldn't have to worry about the WinRT file-modification process), you could easily use the application data container provided to each app to store very thin data (key/value pairs fit very well).
In this way you would store a future delete date for each file that needs to be persisted, so that the next time it comes up for deletion, the app checks the application data before the deletion process runs. Then you won't need to worry about the permissions of the files you're iterating over, when you're only trying to make sure they don't get deleted by your own process.
Windows.Storage.ApplicationDataContainer localSettings = Windows.Storage.ApplicationData.Current.LocalSettings;
// Create a setting in a container
Windows.Storage.ApplicationDataContainer container =
localSettings.CreateContainer("FilesToPersist", Windows.Storage.ApplicationDataCreateDisposition.Always);
StorageFile file = fileYouWantToPersist;
if (localSettings.Containers.ContainsKey("FilesToPersist"))
{
localSettings.Containers["FilesToPersist"].Values[file.FolderRelativeId] = DateTime.Now.AddDays(30);
}
// Read data from a setting in a container
bool hasContainer = localSettings.Containers.ContainsKey("FilesToPersist");
bool hasSetting = false;
if (hasContainer)
{
hasSetting = localSettings.Containers["FilesToPersist"].Values.ContainsKey(file.FolderRelativeId);
if (hasSetting)
{
string dt = (string)localSettings.Containers["FilesToPersist"].Values[file.FolderRelativeId];
if(Convert.ToDateTime(dt) < DateTime.Now)
{
//Delete the file
}
}
}
Resources:
http://msdn.microsoft.com/en-us/library/windows/apps/windows.storage.applicationdata.aspx
http://lunarfrog.com/blog/2011/10/10/winrt-storage-accesscache/
