Abandoned memory in posting image data to server - c#

My app shows high memory consumption while posting image data to the server, and the memory is never released. reportModel in the following source code holds a base64 string of the image data. Here is a snapshot of the source code:
public async Task<FaultReportResponseModel> ReportFault(ReportFaultRequestModel reportModel)
{
    try
    {
        App.IsConnectedToInternet(true);
        reportModel.Token = App.WebOpsToken;
        //var httpContent = CreateHttpContent(reportModel);
        var jsonBody = JsonConvert.SerializeObject(reportModel);
        _log.Trace("ReportFault api jsonBody length: {0}", jsonBody.Length);
        var content = new StringContent(jsonBody, Encoding.UTF8, "application/json");
        AddAuthorizationHeader();
        string serviceURL;
        if (reportModel.IssueType == IssueTypes.CantFind)
        {
            serviceURL = Constants.CantFindSvcURL;
        }
        else
        {
            serviceURL = Constants.ReportFaultSvcURL;
        }
        //var url = string.Format("{0}{1}", Constants.DataSVCBaseURL, serviceURL);
        var url = GetURLStringForService(serviceURL, ServiceType.WebOpsData);
        var response = await _restClient.PostAsync(url, content);
        var responseStr = await response.Content.ReadAsStringAsync();
        var parsedResponse = JsonConvert.DeserializeObject<FaultReportResponseModel>(responseStr);
        _log.Trace("Uploaded fault text: {0}", parsedResponse.OK);
        content.Dispose();
        return parsedResponse;
    }
    catch (Exception ex)
    {
        _log.Trace("Exception: {0}", ex.Message);
    }
    return null;
}
Snapshot of the memory footprint:
It shows that the JSON serialization allocates memory that never gets released. Because of this abandoned memory, the app crashes after a few image-upload cycles.
What I tried:
Used StreamContent to post to the server. In that case the profiler points at the Stream instead; the problem pointer changed, but the problem is the same.
On the Internet I found that this is caused by the Large Object Heap (LOH), so I tried invoking the GC manually, but there was no change in the memory footprint.
Any help or pointer to get out of this problem would be appreciated.

You are creating large blocks of memory on the LOH (Large Object Heap). This is likely not a memory leak, though it definitely isn't optimal in high-throughput applications.
Assuming you want to keep using Json.NET for serialization, you can avoid the large intermediate string with JsonTextWriter, serializing directly to a stream (ideally the HttpClient NetworkStream). Note that System.Text.Json also has very efficient methods for serializing to a stream.
To get access to the underlying NetworkStream in HttpClient, you could create a derived HttpContent class.
Example
public class SerializedStreamedContent<T> : HttpContent
{
    private readonly T _value;
    public SerializedStreamedContent(T value) => _value = value;

    protected override Task SerializeToStreamAsync(Stream stream, TransportContext? context)
    {
        try
        {
            using var writer = new StreamWriter(stream, leaveOpen: true);
            using var jsonWriter = new JsonTextWriter(writer);
            var ser = new JsonSerializer();
            ser.Serialize(jsonWriter, _value);
            jsonWriter.Flush();
            return Task.CompletedTask;
        }
        catch (Exception e)
        {
            return Task.FromException(e);
        }
    }

    protected override bool TryComputeLength(out long length)
    {
        length = -1;
        return false;
    }
}
Note 1: This is not intended to be a complete solution, just an example; there are many considerations that you will need to weigh up when using this approach.
Note 2: In .NET 5+ there is a JsonContent class that does all of this for you (and more) with the System.Text.Json implementation.
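For illustration, a minimal usage sketch under the question's names (_restClient, url and reportModel are carried over from the question's code; MediaTypeHeaderValue is from System.Net.Http.Headers, and JsonContent.Create is from System.Net.Http.Json):
// Option A: stream Json.NET output straight into the request body via the
// custom content above, so no LOH-sized intermediate string is ever built.
var content = new SerializedStreamedContent<ReportFaultRequestModel>(reportModel);
content.Headers.ContentType = new MediaTypeHeaderValue("application/json");
var response = await _restClient.PostAsync(url, content);

// Option B (.NET 5+): let System.Net.Http.Json do the streamed serialization.
using var jsonContent = JsonContent.Create(reportModel);
var response2 = await _restClient.PostAsync(url, jsonContent);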


Add BackgroundImage with EPPlus only allows path but cannot get path in Blazor WASM

This may not be 100% an EPPlus issue, but since this is Blazor WASM it appears I cannot get a file path to a static image in the wwwroot/images folder. I can get the URL and paste it into a browser and that works; even adding that same path to the src attribute of an img works. Neither of those helps me, though.
FYI: "background" in this context means a watermark.
It appears that the EPPlus dev team only wants a drive path to the file (e.g. C:\SomeFolder\SomeFile.png), and I am not seeing how to get that within Blazor WASM. I can get the bytes of the file in C#, and even a stream, but no direct path.
My code is the following:
using (var package = new ExcelPackage(fileName))
{
    var sheet = package.Workbook.Worksheets.Add(exportModel.OSCode);
    sheet.BackgroundImage.SetFromFile("https://localhost:44303/images/Draft.png");
    ...
}
This returns an exception:
Unhandled exception rendering component: Can't find file /https:/localhost:44303/images/Draft.png
Noticing that leading / I even tried:
sheet.BackgroundImage.SetFromFile("images/Draft.png");
Which returned the same error:
Unhandled exception rendering component: Can't find file /images/Draft.png
So, I am perhaps needing one of two possible answers:
1. A way to get a local drive path to the file, so the SetFromFile method does not error.
2. A way to set that BackgroundImage property from a byte array or stream of the image. There is the BackgroundImage.Image property, but it is read-only.
Thanks to a slap in the face from @Panagiotis-Kanavos, I wound up taking the processing out of the client and moving it to the server. With that, I was able to use Static Files to add the watermark with relatively little pain.
In case anyone may need the full solution (which I always find helpful) here it is:
Here is the code within the button click on the Blazor component or page:
private async Task GenerateFile(bool isFinal)
{
    ...
    var fileStream = await excelExportService.ProgramMap(exportModel);
    var fileName = "SomeFileName.xlsx";
    using var streamRef = new DotNetStreamReference(stream: fileStream);
    await jsRuntime.InvokeVoidAsync("downloadFileFromStream", fileName, streamRef);
}
That calls a client-side service that really just passes control over to the server:
public class ExcelExportService : IExcelExportService
{
    private const string baseUri = "api/excel-export";
    private readonly IHttpService httpService;

    public ExcelExportService(IHttpService httpService)
    {
        this.httpService = httpService;
    }

    public async Task<Stream> ProgramMap(ProgramMapExportModel exportModel)
    {
        return await httpService.PostAsJsonForStreamAsync<ProgramMapExportModel>($"{baseUri}/program-map", exportModel);
    }
}
Here is the server-side controller that catches the call from the client:
[Route("api/excel-export")]
[ApiController]
public class ExcelExportController : ControllerBase
{
    private readonly ExcelExportService excelExportService;

    public ExcelExportController(ExcelExportService excelExportService)
    {
        this.excelExportService = excelExportService;
    }

    [HttpPost]
    [Route("program-map")]
    public async Task<Stream> ProgramMap([FromBody] ProgramMapExportModel exportModel)
    {
        return await excelExportService.ProgramMap(exportModel);
    }
}
And that in-turn calls the server-side service where the magic happens:
public async Task<Stream> ProgramMap(ProgramMapExportModel exportModel)
{
    var result = new MemoryStream();
    ExcelPackage.LicenseContext = LicenseContext.Commercial;
    var fileName = @$"Gets Overwritten";
    using (var package = new ExcelPackage(fileName))
    {
        var sheet = package.Workbook.Worksheets.Add(exportModel.OSCode);
        if (!exportModel.IsFinal)
        {
            var pathToDraftImage = @$"{Directory.GetCurrentDirectory()}\StaticFiles\Images\Draft.png";
            sheet.BackgroundImage.SetFromFile(pathToDraftImage);
        }
        ...
        sheet.Cells.AutoFitColumns();
        package.SaveAs(result);
    }
    result.Position = 0; // Without this, data does not get written
    return result;
}
For some reason this next method was not needed when doing this on the client side, but now that the work is on the server I had to add a helper that returns a stream specifically, using ReadAsStreamAsync instead of ReadAsJsonAsync:
public async Task<Stream> PostAsJsonForStreamAsync<TValue>(string requestUri, TValue value, CancellationToken cancellationToken = default)
{
    Stream result = default;
    var responseMessage = await httpClient.PostAsJsonAsync(requestUri, value, cancellationToken);
    try
    {
        result = await responseMessage.Content.ReadAsStreamAsync(cancellationToken: cancellationToken);
    }
    catch (HttpRequestException e)
    {
        ...
    }
    return result;
}
Lastly, in order for it to give the end-user a download link, this was used (taken from the Microsoft Docs):
window.downloadFileFromStream = async (fileName, contentStreamReference) => {
    const arrayBuffer = await contentStreamReference.arrayBuffer();
    const blob = new Blob([arrayBuffer]);
    const url = URL.createObjectURL(blob);
    const anchorElement = document.createElement("a");
    anchorElement.href = url;
    anchorElement.download = fileName ?? "";
    anchorElement.click();
    anchorElement.remove();
    URL.revokeObjectURL(url);
}

C# - OutOfMemoryException saving a List on a JSON file

I'm trying to save the streaming data of a pressure map.
Basically I have a pressure matrix defined as:
double[,] pressureMatrix = new double[e.Data.GetLength(0), e.Data.GetLength(1)];
Basically, I'm getting one of these pressureMatrix snapshots every 10 milliseconds, and I want to save all the information in a JSON file to be able to reproduce it later.
What I do is, first of all, write what I call the header with all the settings used to do the recording like this:
recordedData.softwareVersion = Assembly.GetExecutingAssembly().GetName().Version.Major.ToString() + "." + Assembly.GetExecutingAssembly().GetName().Version.Minor.ToString();
recordedData.calibrationConfiguration = calibrationConfiguration;
recordedData.representationConfiguration = representationSettings;
recordedData.pressureData = new List<PressureMap>();
var json = JsonConvert.SerializeObject(csvRecordedData, Formatting.None);
File.WriteAllText(this.filePath, json);
Then, every time I get a new pressure map I create a new Thread to add the new PressureMatrix and re-write the file:
var newPressureMatrix = new PressureMap(datos, DateTime.Now);
recordedData.pressureData.Add(newPressureMatrix);
var json = JsonConvert.SerializeObject(recordedData, Formatting.None);
File.WriteAllText(this.filePath, json);
After about 20-30 minutes I get an OutOfMemoryException, because the system cannot hold the recordedData variable: the List<PressureMap> in it has grown too big.
How can I handle this and still save the data? I would like to record 24-48 hours of information.
Your basic problem is that you are holding all of your pressure map samples in memory, rather than writing each one individually and then allowing it to be garbage collected. What's worse, you are doing this in two different places:
1. You serialize your entire list of samples to a JSON string (the json variable) before writing the string to a file. Instead, as explained in Performance Tips: Optimize Memory Usage, you should serialize and deserialize directly to and from your file in such situations. For instructions on how to do this see this answer to Can Json.NET serialize / deserialize to / from a stream? and also Serialize JSON to a file. (A minimal sketch follows this list.)
2. The list recordedData.pressureData = new List<PressureMap>(); accumulates all pressure map samples, so all of them get rewritten every time a new sample is taken. A better solution would be to write each sample once and then forget it, but the requirement for each sample to be nested inside some container objects in the JSON makes it non-obvious how to do that.
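To make issue #1 concrete, here is a minimal sketch of streaming serialization straight to the file with Json.NET (recordedData and filePath are the names from the question):
// Serialize directly to the FileStream; no intermediate JSON string is
// allocated, so no large string ever lands on the large object heap.
using (var stream = new FileStream(this.filePath, FileMode.Create))
using (var textWriter = new StreamWriter(stream))
using (var jsonWriter = new JsonTextWriter(textWriter))
{
    JsonSerializer.CreateDefault().Serialize(jsonWriter, recordedData);
}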
So, how to attack issue #2?
First, let's modify your data model as follows, partitioning the header data into a separate class:
public class PressureMap
{
    public double[,] PressureMatrix { get; set; }
}

public class CalibrationConfiguration
{
    // Data model not included in question
}

public class RepresentationConfiguration
{
    // Data model not included in question
}

public class RecordedDataHeader
{
    public string SoftwareVersion { get; set; }
    public CalibrationConfiguration CalibrationConfiguration { get; set; }
    public RepresentationConfiguration RepresentationConfiguration { get; set; }
}

public class RecordedData
{
    // Ensure the header is serialized first.
    [JsonProperty(Order = 1)]
    public RecordedDataHeader RecordedDataHeader { get; set; }
    // Ensure the pressure data is serialized last.
    [JsonProperty(Order = 2)]
    public IEnumerable<PressureMap> PressureData { get; set; }
}
Option #1 is a version of the producer-consumer pattern. It involves spinning up two threads: one to generate PressureMap samples, and one to serialize the RecordedData. The first thread will generate samples and add them to a BlockingCollection<PressureMap> that is passed to the second thread. The second thread will then serialize BlockingCollection<PressureMap>.GetConsumingEnumerable() as the value of RecordedData.PressureData.
The following code gives a skeleton for how to do this:
var sampleCount = 400; // Or whatever stopping criterion you prefer
var sampleInterval = 10; // in ms

using (var pressureData = new BlockingCollection<PressureMap>())
{
    // Adapted from
    // https://learn.microsoft.com/en-us/dotnet/standard/collections/thread-safe/blockingcollection-overview
    // https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=netframework-4.7.2

    // Spin up a Task to sample the pressure maps
    using (Task t1 = Task.Factory.StartNew(() =>
    {
        for (int i = 0; i < sampleCount; i++)
        {
            var data = GetPressureMap(i);
            Console.WriteLine("Generated sample {0}", i);
            pressureData.Add(data);
            System.Threading.Thread.Sleep(sampleInterval);
        }
        pressureData.CompleteAdding();
    }))
    {
        // Spin up a Task to consume the BlockingCollection
        using (Task t2 = Task.Factory.StartNew(() =>
        {
            var recordedDataHeader = new RecordedDataHeader
            {
                SoftwareVersion = softwareVersion,
                CalibrationConfiguration = calibrationConfiguration,
                RepresentationConfiguration = representationConfiguration,
            };
            var settings = new JsonSerializerSettings
            {
                ContractResolver = new CamelCasePropertyNamesContractResolver(),
            };
            using (var stream = new FileStream(this.filePath, FileMode.Create))
            using (var textWriter = new StreamWriter(stream))
            using (var jsonWriter = new JsonTextWriter(textWriter))
            {
                int j = 0;
                var query = pressureData
                    .GetConsumingEnumerable()
                    .Select(p =>
                    {
                        // Flush the writer periodically in case the process terminates abnormally
                        jsonWriter.Flush();
                        Console.WriteLine("Serializing item {0}", j++);
                        return p;
                    });
                var recordedData = new RecordedData
                {
                    RecordedDataHeader = recordedDataHeader,
                    // Since PressureData is declared as IEnumerable<PressureMap>, evaluation will be lazy.
                    PressureData = query,
                };
                Console.WriteLine("Beginning serialization of {0} to {1}:", recordedData, this.filePath);
                JsonSerializer.CreateDefault(settings).Serialize(textWriter, recordedData);
                Console.WriteLine("Finished serialization of {0} to {1}.", recordedData, this.filePath);
            }
        }))
        {
            Task.WaitAll(t1, t2);
        }
    }
}
Notes:
This solution uses the fact that, when serializing an IEnumerable<T>, Json.NET will not materialize the enumerable as a list. Instead it will take full advantage of lazy evaluation and simply enumerate through it, writing then forgetting each individual item encountered.
The first thread generates PressureMap samples and adds them to the blocking collection.
The second thread wraps the blocking collection in an IEnumerable<PressureMap> and then serializes that as RecordedData.PressureData.
During serialization, the serializer will enumerate through the IEnumerable<PressureMap>, streaming each sample to the JSON file and then proceeding to the next, effectively blocking until one becomes available.
You will need to do some experimentation to make sure that the serialization thread can "keep up" with the sampling thread, possibly by setting a BoundedCapacity during construction (see the sketch after these notes). If not, you may need to adopt a different strategy.
PressureMap GetPressureMap(int count) should be some method of yours (not shown in the question) that returns the current pressure map sample.
In this technique the JSON file remains open for the duration of the sampling session. If sampling terminates abnormally the file may be truncated. I make some attempt to ameliorate the problem by flushing the writer periodically.
While data serialization will no longer require unbounded amounts of memory, deserializing a RecordedData later will deserialize the PressureData array into a concrete List<PressureMap>. This may possibly cause memory issues during downstream processing.
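For reference, bounding the collection is a one-line change at construction (the capacity of 100 below is purely illustrative); Add then blocks once the buffer is full, throttling the sampling thread:
// A bounded collection: Add blocks while 100 samples are already waiting to be serialized.
using (var pressureData = new BlockingCollection<PressureMap>(boundedCapacity: 100))
{
    // ... same producer/consumer code as above ...
}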
Demo fiddle #1 here.
Option #2 would be to switch from a JSON file to a Newline Delimited JSON file. Such a file consists of sequences of JSON objects separated by newline characters. In your case, you would make the first object contain the RecordedDataHeader information, and the subsequent objects be of type PressureMap:
var sampleCount = 100; // Or whatever
var sampleInterval = 10;
var recordedDataHeader = new RecordedDataHeader
{
    SoftwareVersion = softwareVersion,
    CalibrationConfiguration = calibrationConfiguration,
    RepresentationConfiguration = representationConfiguration,
};
var settings = new JsonSerializerSettings
{
    ContractResolver = new CamelCasePropertyNamesContractResolver(),
};

// Write the header
Console.WriteLine("Beginning serialization of sample data to {0}.", this.filePath);
using (var stream = new FileStream(this.filePath, FileMode.Create))
{
    JsonExtensions.ToNewlineDelimitedJson(stream, new[] { recordedDataHeader });
}

// Write each sample incrementally
for (int i = 0; i < sampleCount; i++)
{
    Thread.Sleep(sampleInterval);
    Console.WriteLine("Performing sample {0} of {1}", i, sampleCount);
    var map = GetPressureMap(i);
    using (var stream = new FileStream(this.filePath, FileMode.Append))
    {
        JsonExtensions.ToNewlineDelimitedJson(stream, new[] { map });
    }
}
Console.WriteLine("Finished serialization of sample data to {0}.", this.filePath);
Using the extension methods:
public static partial class JsonExtensions
{
    // Adapted from the answer to
    // https://stackoverflow.com/questions/44787652/serialize-as-ndjson-using-json-net
    // by dbc https://stackoverflow.com/users/3744182/dbc
    public static void ToNewlineDelimitedJson<T>(Stream stream, IEnumerable<T> items)
    {
        // Let caller dispose the underlying stream
        using (var textWriter = new StreamWriter(stream, new UTF8Encoding(false, true), 1024, true))
        {
            ToNewlineDelimitedJson(textWriter, items);
        }
    }

    public static void ToNewlineDelimitedJson<T>(TextWriter textWriter, IEnumerable<T> items)
    {
        var serializer = JsonSerializer.CreateDefault();
        foreach (var item in items)
        {
            // Formatting.None is the default; I set it here for clarity.
            using (var writer = new JsonTextWriter(textWriter) { Formatting = Formatting.None, CloseOutput = false })
            {
                serializer.Serialize(writer, item);
            }
            // http://specs.okfnlabs.org/ndjson/
            // Each JSON text MUST conform to the [RFC7159] standard and MUST be written to the stream followed by the newline character \n (0x0A).
            // The newline character MAY be preceded by a carriage return \r (0x0D). The JSON texts MUST NOT contain newlines or carriage returns.
            textWriter.Write("\n");
        }
    }

    // Adapted from the answer to
    // https://stackoverflow.com/questions/29729063/line-delimited-json-serializing-and-de-serializing
    // by Yuval Itzchakov https://stackoverflow.com/users/1870803/yuval-itzchakov
    public static IEnumerable<TBase> FromNewlineDelimitedJson<TBase, THeader, TRow>(TextReader reader)
        where THeader : TBase
        where TRow : TBase
    {
        bool first = true;
        using (var jsonReader = new JsonTextReader(reader) { CloseInput = false, SupportMultipleContent = true })
        {
            var serializer = JsonSerializer.CreateDefault();
            while (jsonReader.Read())
            {
                if (jsonReader.TokenType == JsonToken.Comment)
                    continue;
                if (first)
                {
                    yield return serializer.Deserialize<THeader>(jsonReader);
                    first = false;
                }
                else
                {
                    yield return serializer.Deserialize<TRow>(jsonReader);
                }
            }
        }
    }
}
Later, you can process the newline delimited JSON file as follows:
using (var stream = File.OpenRead(filePath))
using (var textReader = new StreamReader(stream))
{
    foreach (var obj in JsonExtensions.FromNewlineDelimitedJson<object, RecordedDataHeader, PressureMap>(textReader))
    {
        if (obj is RecordedDataHeader)
        {
            var header = (RecordedDataHeader)obj;
            // Process the header
            Console.WriteLine(JsonConvert.SerializeObject(header));
        }
        else
        {
            var row = (PressureMap)obj;
            // Process the row.
            Console.WriteLine(JsonConvert.SerializeObject(row));
        }
    }
}
Notes:
This approach looks simpler because the samples are added incrementally to the end of the file, rather than inserted inside some overall JSON container.
With this approach both serialization and downstream processing can be done with bounded memory use.
The sample file does not remain open for the duration of sampling, so is less likely to be truncated.
Downstream applications may not have built-in tools for processing newline delimited JSON.
This strategy may integrate more simply with your current threading code.
Demo fiddle #2 here.

Finding a memory leak

I have an issue with the following code. I create a memory stream in the GetDb function, and the return value is used in a using block. For some unknown reason, if I dump my objects I see that the MemoryStream is still around at the end of the Main method. This causes a massive leak for me. Any idea how I can clean up this buffer?
I have actually checked that the Dispose method has been called on the MemoryStream, but the object seems to stay around; I used the diagnostic tools of Visual Studio 2017 for this task.
class Program
{
    static void Main(string[] args)
    {
        List<CsvProduct> products;
        using (var s = GetDb())
        {
            products = Utf8Json.JsonSerializer.Deserialize<List<CsvProduct>>(s).ToList();
        }
    }

    public static Stream GetDb()
    {
        var filepath = Path.Combine("c:/users/tom/Downloads", "productdb.zip");
        using (var archive = ZipFile.OpenRead(filepath))
        {
            var data = archive.Entries.Single(e => e.FullName == "productdb.json");
            using (var s = data.Open())
            {
                var ms = new MemoryStream();
                s.CopyTo(ms);
                ms.Seek(0, SeekOrigin.Begin);
                return (Stream)ms;
            }
        }
    }
}
For some unknown reason if I dump my objects I see that the MemoryStream is still around at the end of the Main method.
That isn't particularly abnormal; GC happens separately.
This cause me a massive leak.
That isn't a leak, it is just memory usage.
Any idea how I can clean this buffer ?
I would probably just not use a MemoryStream, and instead return something that wraps the live uncompressing stream (from s = data.Open()). The problem here, though, is that you can't just return s, as archive would still be disposed upon leaving the method. So if I needed to solve this, I would create a custom Stream that wraps an inner stream and disposes a second object when disposed, i.e.
class MyStream : Stream
{
    private readonly Stream _source;
    private readonly IDisposable _parent;
    public MyStream(Stream source, IDisposable parent)
    { _source = source; _parent = parent; }
    // not shown: implement all Stream members via the `_source` proxy
    protected override void Dispose(bool disposing)
    {
        if (disposing) { _source.Dispose(); _parent.Dispose(); }
        base.Dispose(disposing);
    }
}
then have:
public static Stream GetDb()
{
    var filepath = Path.Combine("c:/users/tom/Downloads", "productdb.zip");
    var archive = ZipFile.OpenRead(filepath);
    var data = archive.Entries.Single(e => e.FullName == "productdb.json");
    var s = data.Open();
    return new MyStream(s, archive);
}
(could be improved slightly to make sure that archive is disposed if an exception happens before we return with success)
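For completeness, here is a sketch of what those proxied members and an exception-safe GetDb might look like; the WrappedStream name and the member list are illustrative additions, not part of the original answer:
class WrappedStream : Stream
{
    private readonly Stream _source;
    private readonly IDisposable _parent;

    public WrappedStream(Stream source, IDisposable parent)
    {
        _source = source;
        _parent = parent;
    }

    // Proxy every Stream member to the wrapped source stream.
    public override bool CanRead => _source.CanRead;
    public override bool CanSeek => _source.CanSeek;
    public override bool CanWrite => _source.CanWrite;
    public override long Length => _source.Length;
    public override long Position
    {
        get => _source.Position;
        set => _source.Position = value;
    }
    public override void Flush() => _source.Flush();
    public override int Read(byte[] buffer, int offset, int count) => _source.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => _source.Seek(offset, origin);
    public override void SetLength(long value) => _source.SetLength(value);
    public override void Write(byte[] buffer, int offset, int count) => _source.Write(buffer, offset, count);

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Disposing the stream also disposes the owning archive.
            _source.Dispose();
            _parent.Dispose();
        }
        base.Dispose(disposing);
    }
}

public static Stream GetDb()
{
    var filepath = Path.Combine("c:/users/tom/Downloads", "productdb.zip");
    var archive = ZipFile.OpenRead(filepath);
    try
    {
        var data = archive.Entries.Single(e => e.FullName == "productdb.json");
        return new WrappedStream(data.Open(), archive);
    }
    catch
    {
        // Dispose the archive if anything fails before ownership is handed off.
        archive.Dispose();
        throw;
    }
}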

Retrieving partial content using multiple HTTP requests to fetch data via parallel tasks

I am trying to be as thorough as I can in this post, as it is very important to me, though the issue is very simple, and you can get the idea just by reading the title of this question.
The question is:
With healthy bandwidth (30 Mb VDSL) available, how is it possible to issue multiple HttpWebRequests for a single piece of data / a single file, so that each request downloads only a portion of the data, and then, when all instances have completed, all the parts are joined back into one piece?
Code:
What I have got working so far is the same idea, except that each task = one HttpWebRequest = a different file, so the speedup is pure task parallelism rather than the acceleration of one download using multiple tasks/threads, as in my question. See the code below.
The next part is just a more detailed explanation and background on the subject, if you don't mind reading.
I am still working on a similar project that differs from this one in that it (see the code below) tries to fetch a different data source for each of the separate tasks (different downloads/files). The speedup was gained because each task does not have to wait for the previous one to complete before it gets a chance to execute.
What I am trying to do in this question (with almost everything ready in the code below) is target the same URL for the same data, so this time the speedup to gain is for the single task: the current download. The idea is to implement the same pattern as in the code below, only this time letting SmartWebClient target the same URL using multiple instances; then (only theory for now) each instance will request a partial range of the data.
The last issue is that I need to put the puzzle back into one piece, which is another problem I need to figure out.
As you can see in this code, the only part I have not got to work on yet is the data parsing/processing, which I find very easy using HtmlAgilityPack, so that is no problem.
Current code
Main entry:
var urlList = new urlsForExtraction().urlsConcrDict();
var htmlDictionary = new ConcurrentDictionary<string, string>();
Parallel.ForEach(
    urlList.Values,
    new ParallelOptions { MaxDegreeOfParallelism = 20 },
    url => Download(url, htmlDictionary)
);
foreach (var pair in htmlDictionary)
{
    ///Process(pair);
    MessageBox.Show(pair.Value);
}
public class urlsForExtraction
{
    const string URL_Dollar = "";
    const string URL_UpdateUsersTimeOut = "";

    public ConcurrentDictionary<string, string> urlsConcrDict()
    {
        // TODO: find a way to enumerate the field names so each can be iterated over instead of specifying them one by one
        var retDict = new ConcurrentDictionary<string, string>();
        retDict.TryAdd("URL_Dollar", "Any.Url.com");
        retDict.TryAdd("URL_UpdateUserstbl", "http://bing.com");
        return retDict;
    }
}
/// <summary>
/// Second-stage class: consumes the dictionary of URLs for extraction,
/// then downloads each in parallel using the SmartWebClient (Download()).
/// </summary>
public class InitConcurentHtmDictExtrct
{
    private void Download(string url, ConcurrentDictionary<string, string> htmlDictionary)
    {
        using (var webClient = new SmartWebClient())
        {
            webClient.Encoding = Encoding.GetEncoding("UTF-8");
            webClient.Proxy = null;
            htmlDictionary.TryAdd(url, webClient.DownloadString(url));
        }
    }

    private ConcurrentDictionary<string, string> htmlDictionary;

    public ConcurrentDictionary<string, string> LoopOnUrlsVia_SmartWC(Dictionary<string, string> urlList)
    {
        htmlDictionary = new ConcurrentDictionary<string, string>();
        Parallel.ForEach(
            urlList.Values,
            new ParallelOptions { MaxDegreeOfParallelism = 20 },
            url => Download(url, htmlDictionary)
        );
        return htmlDictionary;
    }
}
/// <summary>
/// The extraction process, done via HtmlAgilityPack:
/// easy collection of information within a given HTML document by referencing element attributes.
/// </summary>
public class Results
{
    public struct ExtracionParameters
    {
        public string FileNameToSave;
        public string directoryPath;
        public string htmlElementType;
    }

    public enum Extraction
    {
        ById, ByClassName, ByElementName
    }

    public void ExtractHtmlDict(ConcurrentDictionary<string, string> htmlResults, Extraction by)
    {
        // Helps with easy element extraction from the page.
        HtmlAttribute htAgPcAttrbs;
        HtmlDocument HtmlAgPCDoc = new HtmlDocument();
        // Will hold the name + content of each document part that was eventually extracted;
        // from this container the result page can then be built.
        Dictionary<string, HtmlDocument> dictResults = new Dictionary<string, HtmlDocument>();
        foreach (KeyValuePair<string, string> htmlPair in htmlResults)
        {
            Process(htmlPair);
        }
    }

    private static void Process(KeyValuePair<string, string> pair)
    {
        // do the html processing
    }
}
public class SmartWebClient : WebClient
{
    private readonly int maxConcurentConnectionCount;

    public SmartWebClient(int maxConcurentConnectionCount = 20)
    {
        this.Proxy = null;
        this.Encoding = Encoding.GetEncoding("UTF-8");
        this.maxConcurentConnectionCount = maxConcurentConnectionCount;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var httpWebRequest = (HttpWebRequest)base.GetWebRequest(address);
        if (httpWebRequest == null)
        {
            return null;
        }
        if (maxConcurentConnectionCount != 0)
        {
            httpWebRequest.ServicePoint.ConnectionLimit = maxConcurentConnectionCount;
        }
        return httpWebRequest;
    }
}
This allows me to take advantage of good bandwidth, but I am far from the solution in question; I would really appreciate any clue on where to start.
If the server supports what Wikipedia calls byte serving, you can multiplex a file download by spawning multiple requests with specific Range header values (using the AddRange method; see also How to download the data from the server discontinuously?). Most serious HTTP servers do support byte ranges.
Here is some sample code that implements a parallel download of a file using byte range:
public static void ParallelDownloadFile(string uri, string filePath, int chunkSize)
{
    if (uri == null)
        throw new ArgumentNullException("uri");

    // determine file size first
    long size = GetFileSize(uri);

    using (FileStream file = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.Write))
    {
        file.SetLength(size); // set the length first
        object syncObject = new object(); // synchronize file writes
        Parallel.ForEach(LongRange(0, 1 + size / chunkSize), (start) =>
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
            request.AddRange(start * chunkSize, start * chunkSize + chunkSize - 1);
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            lock (syncObject)
            {
                using (Stream stream = response.GetResponseStream())
                {
                    file.Seek(start * chunkSize, SeekOrigin.Begin);
                    stream.CopyTo(file);
                }
            }
        });
    }
}

public static long GetFileSize(string uri)
{
    if (uri == null)
        throw new ArgumentNullException("uri");

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
    request.Method = "HEAD";
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    return response.ContentLength;
}

private static IEnumerable<long> LongRange(long start, long count)
{
    long i = 0;
    while (true)
    {
        if (i >= count)
        {
            yield break;
        }
        yield return start + i;
        i++;
    }
}
And sample usage:
private static void TestParallelDownload()
{
    string uri = "http://localhost/welcome.png";
    string fileName = Path.GetFileName(uri);
    ParallelDownloadFile(uri, fileName, 10000);
}
PS: I'd be curious to know if it's really more interesting to do this parallel thing rather than to just use WebClient.DownloadFile... Maybe in slow network scenarios?

Locking with asynchronous httpwebrequest

I have an object that downloads a file from a server, saves it into Isolated Storage asynchronously, and provides a GetData method to retrieve the data. Would I use a
IsolatedStorageFile storageObj; //initialized in the constructor
lock (storageObj)
{
    //save code
}
in the response, and
lock (storageObj)
{
    //load code
}
in the GetData method?
Edit: I'll give some context here.
The app (for Windows Phone) needs to download and cache multiple files from a server, so I've created a type that takes two strings (a URI and a filename), requests the data from the given URI, and saves it. The same object also has the GetData method. Here's the code (simplified a bit):
public class ServerData : INotifyPropertyChanged
{
    public readonly string ServerUri;
    public readonly string Filename;
    IsolatedStorageFile appStorage;
    DownloadState _downloadStatus = DownloadState.NotStarted;

    public DownloadState DownloadStatus
    {
        protected set
        {
            if (_downloadStatus == value) return;
            _downloadStatus = value;
            OnPropertyChanged(new PropertyChangedEventArgs("DownloadStatus"));
        }
        get { return _downloadStatus; }
    }

    public ServerData(string serverUri, string filename)
    {
        ServerUri = serverUri;
        Filename = filename;
        appStorage = IsolatedStorageFile.GetUserStoreForApplication();
    }

    protected virtual void OnPropertyChanged(PropertyChangedEventArgs args)
    {
        if (PropertyChanged != null)
            PropertyChanged(this, args);
    }

    public void RequestDataFromServer()
    {
        DownloadStatus = DownloadState.Downloading;
        //this first bit adds a random unused query to the Uri,
        //so Silverlight won't cache the request
        Random rand = new Random();
        StringBuilder uriText = new StringBuilder(ServerUri);
        uriText.AppendFormat("?YouHaveGotToBeKiddingMeHack={0}",
            rand.Next().ToString());
        Uri uri = new Uri(uriText.ToString(), UriKind.Absolute);
        HttpWebRequest serverRequest = (HttpWebRequest)WebRequest.Create(uri);
        ServerRequestUpdateState serverState = new ServerRequestUpdateState();
        serverState.AsyncRequest = serverRequest;
        serverRequest.BeginGetResponse(new AsyncCallback(RequestResponse),
            serverState);
    }

    void RequestResponse(IAsyncResult asyncResult)
    {
        var serverState = (ServerRequestUpdateState)asyncResult.AsyncState;
        var serverRequest = (HttpWebRequest)serverState.AsyncRequest;
        Stream serverStream;
        try
        {
            // end the async request
            serverState.AsyncResponse =
                (HttpWebResponse)serverRequest.EndGetResponse(asyncResult);
            serverStream = serverState.AsyncResponse.GetResponseStream();
            Save(serverStream);
            serverStream.Dispose();
        }
        catch (WebException)
        {
            DownloadStatus = DownloadState.Error;
        }
        Deployment.Current.Dispatcher.BeginInvoke(() =>
        {
            DownloadStatus = DownloadState.FileReady;
        });
    }

    void Save(Stream streamToSave)
    {
        StreamReader reader = null;
        IsolatedStorageFileStream file;
        StreamWriter writer = null;
        reader = new StreamReader(streamToSave);
        lock (appStorage)
        {
            file = appStorage.OpenFile(Filename, FileMode.Create);
            writer = new StreamWriter(file);
            writer.Write(reader.ReadToEnd());
            reader.Dispose();
            writer.Dispose();
        }
    }

    public XDocument GetData()
    {
        XDocument xml = null;
        lock (appStorage)
        {
            if (appStorage.FileExists(Filename))
            {
                var file = appStorage.OpenFile(Filename, FileMode.Open);
                xml = XDocument.Load(file);
                file.Dispose();
            }
        }
        if (xml != null)
            return xml;
        else
            return new XDocument();
    }
}
Your question doesn't provide an awful lot of context, and with the amount of information given, people could be inclined to simply tell you yes, perhaps with small but pertinent additions.
Common practice is to lock on an instance of a dedicated object, and to stay away from locking on this, since that locks down the whole instance of the current object, which is scarcely, if ever, the intent. In your case we don't rightly know the full extent of things, but I hardly think locking your storage instance is the way to go.
Also, since you mention client and server interaction, it isn't as straightforward.
Depending on the load and many other factors, you might want to allow many concurrent reads of the file from the server, yet only a single write at any one time on the client that is downloading; for this purpose I would recommend the ReaderWriterLockSlim class, which exposes TryEnterReadLock, TryEnterWriteLock and corresponding release methods.
For more detailed information on this class see this MSDN link.
Also, remember to use try, catch and finally when coding within the scope of a lock, always releasing the lock in the finally block.
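A minimal sketch of that pattern against the question's appStorage usage (the storageLock field name is illustrative):
private readonly ReaderWriterLockSlim storageLock = new ReaderWriterLockSlim();

void Save(Stream streamToSave)
{
    storageLock.EnterWriteLock(); // only one writer at a time
    try
    {
        // ... write the file to appStorage ...
    }
    finally
    {
        storageLock.ExitWriteLock(); // always release in the finally block
    }
}

public XDocument GetData()
{
    storageLock.EnterReadLock(); // multiple concurrent readers are fine
    try
    {
        // ... read the file from appStorage ...
        return new XDocument();
    }
    finally
    {
        storageLock.ExitReadLock();
    }
}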
What class contains this code? That matters, as it's important whether it's created more than once. If it's created once in the process's lifetime, you can do this; if not, you should lock on a static object instance.
I believe, though, that it's good practice to create a separate object that's used only for the purpose of locking; I've forgotten why. E.g.:
IsolatedStorageFile storageObj; // initialized in the constructor
static readonly object storageObjLock = new object();
...
// in some method
lock (storageObjLock)
{
    //save code
}
