Dispose IRandomAccessStream after DataPackage.SetData or DataPackage.GetDataAsync? - c#

Consider putting data onto a Windows clipboard DataPackage using SetData and later retrieving it using GetDataAsync, like this:
IEnumerable<T> objects = ...;
var randomAccessStream = new InMemoryRandomAccessStream();
using (XmlDictionaryWriter xmlWriter = XmlDictionaryWriter.CreateTextWriter(randomAccessStream.AsStreamForWrite(), Encoding.Unicode)) {
    var serializer = new DataContractSerializer(typeof(T), knownTypes);
    foreach (T obj in objects) {
        serializer.WriteObject(xmlWriter, obj);
    }
}
dataPackage.SetData(formatId, randomAccessStream);
Then later on (e.g. in Clipboard.ContentsChanged),
randomAccessStream = await dataPackageView.GetDataAsync(formatId) as IRandomAccessStream;
xmlReader = XmlDictionaryReader.CreateTextReader(randomAccessStream.AsStreamForRead(), Encoding.Unicode, XmlDictionaryReaderQuotas.Max, (OnXmlDictionaryReaderClose?)null);
var serializer = new DataContractSerializer(typeof(T), knownTypes);
while (serializer.IsStartObject(xmlReader)) {
    object? obj = serializer.ReadObject(xmlReader);
    ...
}
xmlReader.Dispose(); // in the real code, this is in a finally clause
The question I have is, when do I dispose the randomAccessStream? I've done some searching, and all the examples I've seen using SetData and GetDataAsync do absolutely nothing about disposing the object that is put into or obtained from the data package.
Should I dispose it after the SetData, after the GetDataAsync, in DataPackage.OperationCompleted, in some combination of these, or none of them?
sjb
P.S. If I can squeeze in a second question here ... when I put a reference into a DataPackage using, for example, dataPackage.Properties.Add("IEnumerable<T>", entities), does it create a security risk -- can other apps access the reference and use it?

tldr
The Clipboard is designed to pass content between applications and can only pass string content or references to files; all other content must be serialized to a string, saved to a file, or made to behave like a file, in order to be accessed across application domains via the clipboard.
There is support and guidance for passing custom data and formats via the clipboard. Ultimately this involves managing two discrete concerns: how to prepare the content on the provider side, and how to interpret the content on the consumer side. If you can use simple serialization for this, then KISS.
IEnumerable<Test> objectsIn = new Test[] { new Test { Name = "One" }, new Test { Name = "two" } };
var dataPackage = new DataPackage();
dataPackage.SetData("MyCustomFormat", Newtonsoft.Json.JsonConvert.SerializeObject(objectsIn));
Clipboard.SetContent(dataPackage);
...
var dataPackageView = Clipboard.GetContent();
string contentJson = (await dataPackageView.GetDataAsync("MyCustomFormat")) as string;
IEnumerable<Test> objectsOut = Newtonsoft.Json.JsonConvert.DeserializeObject<IEnumerable<Test>>(contentJson);
In WinRT the DataPackageView class implementation does support passing streams; however, the normal rules apply for the stream in terms of lifecycle and whether or not the stream has been disposed. This is useful for transferring large content or when the consumer might request the content in different formats.
If you do not have an advanced need for it, or you are not transmitting file- or image-based resources, then you do not need to use a stream to transfer your data.
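For completeness, here is a minimal sketch of the advanced path using DataPackage.SetDataProvider, where the content for a format is only produced when a consumer actually requests it (objectsIn and the format id are the ones from the example above):

// Delay-rendered clipboard content: the lambda runs only when some consumer
// actually requests "MyCustomFormat", so nothing is serialized up front.
dataPackage.SetDataProvider("MyCustomFormat", request =>
{
    request.SetData(Newtonsoft.Json.JsonConvert.SerializeObject(objectsIn));
});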
DataPackageView - Remarks
During a share operation, the source app puts the data being shared in a DataPackage object and sends that object to the target app for processing. The DataPackage class includes a number of methods to support the following default formats: text, Rtf, Html, Bitmap, and StorageItems. It also has methods to support custom data formats. To use these formats, both the source app and target app must already be aware that the custom format exists.
OP's attempt to save a stream to the clipboard is in this case an example of saving an arbitrary or custom object to the clipboard; it is neither a string nor a reference to a file, so the OS has no native way to handle this information.
Historically, putting string data or a file reference onto the clipboard effectively broadcasts that information to ALL applications running on the same OS; Windows 10 extends this by making your clipboard content able to be synchronized across your devices as well. The DataTransfer namespace implementation allows you to affect the scope of this availability, but ultimately this feature is designed to let you push data outside of your current application's sandboxed domain.
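For instance, on Windows 10 version 1809 and later you can scope that availability when you set the content; a minimal sketch, assuming the ClipboardContentOptions API is available to your app:

// Keep this package off other devices and out of the clipboard history.
var options = new Windows.ApplicationModel.DataTransfer.ClipboardContentOptions
{
    IsRoamable = false,
    IsAllowedInHistory = false
};
Windows.ApplicationModel.DataTransfer.Clipboard.SetContentWithOptions(dataPackage, options);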
So whether you serialize the content yourself, or you want the DataTransfer implementation to try to do it for you, the content will be serialized if it is not already in a string or file-reference format, and that serialized content, if serialization succeeds, is what will be made available to consumers.
In this way there is no memory leak or security issue where you might inadvertently give external processes access to your current process memory or execution context, but data security is still a concern, so don't use the clipboard to pass sensitive content.
A simpler example for Arbitrary or Custom data
OP's example puts an IEnumerable<T> collection of objects onto the clipboard, to be retrieved later. OP chose XML serialization via the DataContractSerializer; however, a reference to the stream used by the serializer was saved to the clipboard, not the actual content.
There is a lot of plumbing and first-principles logic going on here for little benefit. Streams are useful if you are actually going to stream the content, that is, if you want the consumer to control the read. If instead you write to the stream in a single synchronous pass, then it is better to close off the stream altogether and pass around the buffer that you filled via the stream, rather than trying to re-use the same stream at a later point in time.
The following solution works for Clipboard access in WinRT to pre-serialize a collection of objects and pass them to a consumer:
IEnumerable<Test> objectsIn = new Test[] { new Test { Name = "One" }, new Test { Name = "two" } };
var dataPackage = new DataPackage();
string formatId = "MyCustomFormat";
var serial = Newtonsoft.Json.JsonConvert.SerializeObject(objectsIn);
dataPackage.SetData(formatId, serial);
Clipboard.SetContent(dataPackage);
Then in perhaps an entirely different application:
string formatId = "MyCustomFormat";
var dataPackageView = Clipboard.GetContent();
object content = await dataPackageView.GetDataAsync(formatId);
string contentString = content as string;
var objectsOut = Newtonsoft.Json.JsonConvert.DeserializeObject<IEnumerable<Test>>(contentString);
foreach (var o in objectsOut)
{
Console.WriteLine(o.Name); // Test has no ToString override, so print a property
}
The definition of Test, in both the provider and the consumer application contexts:
public class Test
{
    public string Name { get; set; }
}

when do I dispose the randomAccessStream?
Only dispose the stream when you have finished using it. Once you have disposed the stream, it will no longer be usable in any other context, even if you have stored or passed multiple references to it in other object instances.
If you are talking about the original stream referenced in the SetData() logic, then look at it from the other angle: if you dispose too early, the consuming code will no longer have access to the stream and will fail.
As a general rule, we should try to design the logic such that at any given point in time there is a clear, single owner for any given stream; that way it is clear who has responsibility for disposing it. This response to a slightly different scenario explains it well: https://stackoverflow.com/a/8791525/1690217. As a general pattern, only the scope that created the stream should be responsible for disposing it.
The one exception is when you need to access the stream outside of the creating method; then the parent class should hold a reference to it. In that scenario, make the parent class implement IDisposable and ensure it cleans up any resources that might be hanging around.
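A minimal sketch of that ownership pattern (the class name here is illustrative, not from OP's code):

// The scope that creates the stream owns it; callers may use the stream
// but must never dispose it themselves.
public sealed class StreamOwner : IDisposable
{
    private readonly InMemoryRandomAccessStream _stream = new InMemoryRandomAccessStream();

    public IRandomAccessStream Stream => _stream;

    public void Dispose()
    {
        _stream.Dispose();
    }
}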
The reason you don't see this in documentation is often that the nuances around the timing of calling Dispose() are out of scope, or would get lost in examples that are contrived for other purposes.
Specifically, for examples where streams are passed via some mechanism and used later, as with DataPackage, it is too hard to show all of the orchestration code covering the time between storing the stream with DataPackage.SetData(...) and later accessing it via DataPackage.GetDataAsync(...).
Also consider the most common scenario for DataPackage, where the consumer is not only in a different logical scope but most likely in an entirely different application domain; including all the code needed to show when or if to call Dispose would mean including the entire code base of two different applications.

Related

ParquetWriter not sending all information to blob storage

public async Task UploadParquetFromObjects<T>(string fileName, T objects)
{
    var stringJson = JArray.FromObject(objects).ToString();
    var parsedJson = ChoJSONReader.LoadText(stringJson);
    var desBlob = blobClient.GetBlockBlobClient(fileName);
    using (var outStream = await desBlob.OpenWriteAsync(true).ConfigureAwait(false))
    using (ChoParquetWriter parser = new ChoParquetWriter(outStream))
    {
        parser.Write(parsedJson);
    }
}
I'm using this code to send some data to a file in Azure Blob Storage. At first it seemed to work fine: it created the file, put some information in it, and the file was readable. But on closer investigation, it only writes a fraction of the data I send. For example, I send a list of 15 items and it only writes 3. I tried different datasets, of different sizes and composed of different objects; the number of records written varies, but it never reaches 100%.
Am I doing something wrong?
This issue is being tracked and addressed in the project's GitHub issues section:
https://github.com/Cinchoo/ChoETL/issues/230
The issue was that the input JSON has inconsistent members; missing datetime members are set to null by the JSON reader, and the Parquet writer couldn't handle such null datetime values. A fix has been applied.
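To illustrate the kind of input that triggered it (a hypothetical payload, not OP's data):

// "CreatedOn" is missing from the second record, so the JSON reader yields
// null for it, which the Parquet writer (before the fix) could not handle.
string stringJson = @"[
    { ""Id"": 1, ""CreatedOn"": ""2021-01-01T00:00:00"" },
    { ""Id"": 2 }
]";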
Sample fiddle: https://dotnetfiddle.net/PwxNWX
Packages used:
ChoETL.JSON.Core v1.2.1.49 (beta2)
ChoETL.Parquet v1.0.1.23 (beta6)

Filename vs. FileStream - what should I use as argument for file writing?

I am creating a class to handle file-writing to a custom file format our company uses.
Since the data-acquisition workflow that generates the file content is comprised of multiple steps (and as such of multiple "bursts" of data-recording alternated with variable-duration pauses so that the user can reconfigure the sensors during the same recording session), I wonder how to handle opening and closing of FileStream along the whole procedure.
I am in doubt about the following options:
====
// 1) WriterClass receives path as argument and handles FileStream internally
var writer = new WriterClass(filepath);
var datasource = new datasource();
writer.configure(configInfo);
writer.start();
writer.record();
// some time passes while data is received and saved
writer.pause();
writer.resumeRecording();
writer.stop();
writer.finish();
====
// 2) Client code handles FileStream itself, and passes it to WriterClass as argument
using (var stream = File.OpenWrite(filepath))
{
    var writer = new WriterClass(stream);
    var datasource = new datasource();
    writer.configure(configInfo);
    writer.start();
    writer.record();
    // some time passes while data is received and saved
    writer.pause();
    writer.resumeRecording();
    writer.stop();
    writer.finish();
}
====
Currently option 2 seems a lot easier to me, since the FileStream is guaranteed to remain open during the whole Writer lifecycle (which matches the application's intent, by the way), but is there anything I should take into account? Is there a problem with keeping a FileStream open for a potentially indefinite amount of time (given the arbitrary nature of pause and resumeRecording)?
On the other hand, option 1 leaves the WriterClass itself to handle stream opening and closing, but when I tried to implement it, it raised a lot of seemingly unnecessary complication, since I had to handle opening and closing of the stream in almost every method. Worse yet would be to represent a stateful stream as a property of WriterClass (I didn't even try).
Finally, it would be healthy to consider what would happen (file corruption vs file recoverability) with a partially written file if the application crashes before the acquisition procedure is complete.
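One way to hedge against that crash scenario under option 2 is to flush during the natural pauses; a sketch, reusing the names from the question and assuming the flush placement fits the workflow:

using (var stream = File.OpenWrite(filepath))
{
    var writer = new WriterClass(stream);
    writer.configure(configInfo);
    writer.start();
    writer.record();
    // Flushing at each pause pushes buffered bytes through to the OS and disk,
    // so a crash mid-session loses at most the current burst.
    writer.pause();
    stream.Flush(flushToDisk: true);
    writer.resumeRecording();
    writer.stop();
    writer.finish();
}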

Overriding WebHostBufferPolicySelector for Non-Buffered File Upload

In an attempt to create a non-buffered file upload I have extended System.Web.Http.WebHost.WebHostBufferPolicySelector, overriding function UseBufferedInputStream() as described in this article: http://www.strathweb.com/2012/09/dealing-with-large-files-in-asp-net-web-api/. When a file is POSTed to my controller, I can see in trace output that the overridden function UseBufferedInputStream() is definitely returning FALSE as expected. However, using diagnostic tools I can see the memory growing as the file is being uploaded.
The heavy memory usage appears to be occurring in my custom MediaTypeFormatter (something like the FileMediaFormatter here: http://lonetechie.com/). It is in this formatter that I would like to incrementally write the incoming file to disk, but I also need to parse json and do some other operations with the Content-Type:multipart/form-data upload. Therefore I'm using HttpContent method ReadAsMultiPartAsync(), which appears to be the source of the memory growth. I have placed trace output before/after the "await", and it appears that while the task is blocking the memory usage is increasing fairly rapidly.
Once I find the file content in the parts returned by ReadAsMultiPartAsync(), I am using Stream.CopyTo() in order to write the file contents to disk. This writes to disk as expected, but unfortunately the source file is already in memory by this point.
Does anyone have any thoughts about what might be going wrong? It seems that ReadAsMultiPartAsync() is buffering the whole post data; if that is true why do we require var fileStream = await fileContent.ReadAsStreamAsync() to get the file contents? Is there another way to accomplish the splitting of the parts without reading them into memory? The code in my MediaTypeFormatter looks something like this:
// save the stream so we can seek/read again later
Stream stream = await content.ReadAsStreamAsync();
var parts = await content.ReadAsMultipartAsync(); // <- memory usage grows rapidly
if (!content.IsMimeMultipartContent())
{
    throw new HttpResponseException(HttpStatusCode.UnsupportedMediaType);
}
//
// pull data out of parts.Contents, process json, etc.
//
// find the file data in the multipart contents
var fileContent = parts.Contents.FirstOrDefault(
    x => x.Headers.ContentDisposition.DispositionType.ToLower().Trim() == "form-data" &&
         x.Headers.ContentDisposition.Name.ToLower().Trim() == "\"" + DATA_CONTENT_DISPOSITION_NAME_FILE_CONTENTS + "\"");
// write the file to disk
using (var fileStream = await fileContent.ReadAsStreamAsync())
{
    using (FileStream toDisk = File.OpenWrite("myUploadedFile.bin"))
    {
        ((Stream)fileStream).CopyTo(toDisk);
    }
}
WebHostBufferPolicySelector only specifies whether the underlying request is buffered or bufferless. This is what Web API will do under the hood:
IHostBufferPolicySelector policySelector = _bufferPolicySelector.Value;
bool isInputBuffered = policySelector == null ? true : policySelector.UseBufferedInputStream(httpContextBase);
Stream inputStream = isInputBuffered
    ? requestBase.InputStream
    : httpContextBase.ApplicationInstance.Request.GetBufferlessInputStream();
So if your implementation returns false, then the request is bufferless.
However, ReadAsMultipartAsync() loads everything into MemoryStream - because if you don't specify a provider, it defaults to MultipartMemoryStreamProvider.
To get the files to save automatically to disk as each part is processed, use MultipartFormDataStreamProvider (if you deal with files and form data) or MultipartFileStreamProvider (if you deal with just files).
There is an example on asp.net or here. In these examples everything happens in controllers, but there is no reason why you couldn't use the same approach in, for example, a formatter.
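A minimal sketch of that approach (the App_Data path is illustrative):

// Save each multipart part to disk as it is read, instead of buffering it all in memory.
string root = HttpContext.Current.Server.MapPath("~/App_Data");
var provider = new MultipartFormDataStreamProvider(root);
await Request.Content.ReadAsMultipartAsync(provider);

// Form fields end up in provider.FormData; files are already on disk.
foreach (MultipartFileData file in provider.FileData)
{
    Trace.WriteLine("Original name: " + file.Headers.ContentDisposition.FileName);
    Trace.WriteLine("Saved to: " + file.LocalFileName);
}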
Another option, if you really want to play with streams, is to implement a custom class inheriting from MultipartStreamProvider that fires whatever processing you want as soon as it grabs part of the stream. The usage would be similar to the aforementioned providers - you'd need to pass it to the ReadAsMultipartAsync(provider) method.
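For example, a bare-bones custom provider might look like this (a sketch; the class name and file-naming scheme are made up):

// Hands each incoming part its own FileStream, so part bodies go straight
// to disk as ReadAsMultipartAsync pulls them off the wire.
public class DirectToDiskStreamProvider : MultipartStreamProvider
{
    private readonly string _root;

    public DirectToDiskStreamProvider(string root)
    {
        _root = root;
    }

    public override Stream GetStream(HttpContent parent, HttpContentHeaders headers)
    {
        string path = Path.Combine(_root, Guid.NewGuid().ToString("N") + ".part");
        return File.Create(path);
    }
}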
Finally - if you are feeling suicidal - since the underlying request stream is bufferless theoretically you could use something like this in your controller or formatter:
Stream stream = HttpContext.Current.Request.GetBufferlessInputStream();
byte[] b = new byte[32 * 1024];
int n;
while ((n = stream.Read(b, 0, b.Length)) > 0)
{
    // do stuff with this chunk of the stream
}
But of course that's very, for the lack of better word, "ghetto."

Issues with StreamReader, ThreadSafety and Read Mode

I have following code to read a file
StreamReader str = new StreamReader(File.Open(fileName, FileMode.Open, FileAccess.Read));
string fichier = str.ReadToEnd();
str.Close();
This is part of an ASP.NET web service and has been working fine for a year now in production. Now, with increasing load on the server, the customer has started getting a "File already in use" error. That file is only read by this code and is never written to by the application.
One problem that I clearly see is that we are not caching the contents of the file for future use. We will do that. But I need to understand why and how we are getting this issue.
Is it because of multiple threads trying to read the file? I read that StreamReader is not thread-safe, but why should that be a problem when I am opening the file in Read mode?
You need to open the file with shared read access allowed. Use the overload of File.Open that takes a file-sharing mode: FileShare.Read allows other readers to open this file concurrently.
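A minimal sketch of the original snippet with sharing enabled:

// FileShare.Read lets other threads and processes open the same file for
// reading concurrently; the overload without it opens the file unshared.
using (var stream = File.Open(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
using (var str = new StreamReader(stream))
{
    string fichier = str.ReadToEnd();
}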
Another possible solution is to load this file into memory once, in a static constructor of a class, and then store the contents in a static read-only variable. Since a static constructor is guaranteed to run only once and is thread-safe, you don't have to do anything special to make it work.
If you never change the contents in memory, you won't even need to lock when you access the data. If you do change the contents, you need to clone the data each time before changing it; but then again, you don't need a lock for the clone operation either, since your actual original data never changes.
For example:
public static class FileData
{
    private static readonly string s_sFileData;

    static FileData()
    {
        s_sFileData = ...; // read file data here using your code
    }

    public static string Contents
    {
        get
        {
            return string.Copy(s_sFileData);
        }
    }
}
This encapsulates your data and gives you read-only access to it.
You only need String.Copy() if your code may modify the file contents - this is just a precaution to force creating a new string instance to protect the original string. Since string is immutable, this is only necessary if your code uses string pointers - I only added this bit because I ran into an issue with a similar variable in my own code just last week where I used pointers to cached data. :)
FileMode and FileAccess just control what you can do (open/create, read/write).
Shared access to files is handled at the operating-system level, and you can request sharing behavior with FileShare (the fourth parameter of that File.Open overload); see the documentation.

ActiveMQ - deserialize an ActiveMQBytesMessage message

In my job, I work with an application developed partly in C++ and partly in C#. The C++ code is responsible for managing ActiveMQ (sending and receiving messages).
I've developed a C# application to monitor the messages sent to a topic by subscribing to it.
So when a message is sent to the topic, my application manages to handle it, but the message arrives serialized as an ActiveMQBytesMessage.
How can I deserialize this object?
public void OnMessage(IMessage message)
{
    if (message != null)
    {
        var content = (message as ActiveMQBytesMessage).Content; // This is a byte[]; I tried to deserialize it using BinaryFormatter but it throws an exception. I can't include it here because I'm at home.
    }
}
I just noticed that ActiveMQBytesMessage inherits IBytesMessage from the Apache.NMS namespace, but I see nothing there that helps me deserialize the message.
I use the latest version of ActiveMQ with NMS.
[NB] The goal of my C# application is simply to monitor what's happening inside an ActiveMQ channel. That's why I need to deserialize the ActiveMQBytesMessage, so I can display the name of the object and its content in a grid view.
[Added more information]
Here's what I tried in order to deserialize it:
var memoryStream = new MemoryStream((message as ActiveMQBytesMessage).Content);
var binaryFormatter = new BinaryFormatter();
memoryStream.Position = 0;
var deserializedMessage = binaryFormatter.Deserialize(memoryStream);
And I get this error when it deserializes:
The input stream is not a valid binary format. The starting contents (in bytes) are: 00-00-00-00-00-00-4F-8C-00-00-00-09-00-00-00-00-54 ...
(I am making a few assumptions here, since you didn't specify certain details.) The BinaryFormatter you are attempting to use will only work for .NET objects, not for C++ objects. Most likely, these objects have not been encoded in a platform-neutral way, and are in a C++ format specific to that particular compiler and platform. Therefore, it is up to you to parse the binary content directly to determine what object is encoded, and then to manually decode the data. If these are non-trivial objects, this will be a difficult task.
If at all possible, try to get the original application to encode the objects in a platform-neutral format that can be easily parsed and instantiated in C#. (I prefer using a TextMessage and XML encoding.) It won't be as efficient as the direct C++-to-C++ encoding/decoding that is apparently going on right now, but it will allow external monitoring of the message stream. When I do this, I put the full type name (including namespace) of the object in the NMSType header property. This tells me the internal structure of the message content, and I can instantiate the correct object for parsing the data out of the message.
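A rough sketch of that consumer-side pattern (assuming Apache.NMS; MyEvent is a hypothetical payload type agreed on with the producer):

public void OnMessage(IMessage message)
{
    // The NMSType header carries the full type name the producer wrote,
    // so the monitor knows which object the XML represents.
    if (message is ITextMessage textMessage &&
        textMessage.NMSType == "MyCompany.Messages.MyEvent")
    {
        var serializer = new System.Xml.Serialization.XmlSerializer(typeof(MyEvent));
        using (var reader = new System.IO.StringReader(textMessage.Text))
        {
            var evt = (MyEvent)serializer.Deserialize(reader);
            // display evt and its properties in the grid view
        }
    }
}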
If all of that doesn't help, or the assumption is wrong and you are using Managed C++, perhaps this question/answer will help you: What serialization method is used for an ActiveMQ NMS C# object message?
