IEnumerable emit as stream? - C#

Is there a way to take an IEnumerable<T> and emit it as a readable stream kinda like this?
private void DoTheThing(IEnumerable<Foo> foos)
{
    using (var myStream = foos.EmitAsStream(f =>
    {
        var line = JsonConvert.SerializeObject(f);
        return line;
    }))
    using (var streamReader = new StreamReader(myStream))
    {
        while (!streamReader.EndOfStream)
        {
            var line = streamReader.ReadLine();
            Console.WriteLine(line);
        }
    }
}
Obviously there are some problems with this; for example, it seems to imply a StreamWriter with none specified. But the idea is that the stream reader would lazily enumerate the IEnumerable under the hood, apply the transform delegate, and pull the results without materializing any new, potentially large objects in memory.
I have some very large enumerable objects in memory, and I need to push them to external places (like Amazon S3) which accepts a stream. I really can't afford to build a MemoryStream with the collection and send that; I can spool to disk and read from disk, but I'd prefer not to if I have the option, since it seems like an extra step.
Is this possible? If it's possible, is it practical?

You can achieve that by using TakeWhile, if I understood your problem correctly.
Something like:
while (..end..)
{
    var enumeration = foos.TakeWhile(..line bounds..);
    var stream = StreamFromEnum(enumeration); // custom implementation
    // stream --> S3
}
I presume you have some custom definition of "line", which might be some sort of stride/slice of data from the stream.
A lazily-evaluated stream wrapper for IEnumerable is one possible implementation of an IEnumerable<T> -> Stream conversion.
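Here is a minimal sketch of such a wrapper, assuming UTF-8 output and one transformed line per element; EnumerableStream is a made-up name for illustration, and EmitAsStream just matches the shape used in the question:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public class EnumerableStream<T> : Stream
{
    private readonly IEnumerator<T> _source;
    private readonly Func<T, string> _transform;
    private byte[] _current = new byte[0];
    private int _pos;

    public EnumerableStream(IEnumerable<T> source, Func<T, string> transform)
    {
        _source = source.GetEnumerator();
        _transform = transform;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int written = 0;
        while (written < count)
        {
            if (_pos == _current.Length)
            {
                // Pull the next element lazily; only one line is ever in memory.
                if (!_source.MoveNext()) break;
                _current = Encoding.UTF8.GetBytes(_transform(_source.Current) + "\n");
                _pos = 0;
            }
            int n = Math.Min(count - written, _current.Length - _pos);
            Array.Copy(_current, _pos, buffer, offset + written, n);
            _pos += n;
            written += n;
        }
        return written; // 0 signals end of stream
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _source.Dispose();
        base.Dispose(disposing);
    }
}

public static class EnumerableStreamExtensions
{
    public static Stream EmitAsStream<T>(this IEnumerable<T> source, Func<T, string> transform)
    {
        return new EnumerableStream<T>(source, transform);
    }
}

Because Read encodes at most one element per refill, a consumer such as an S3 upload can drain the stream without the whole collection ever being serialized at once.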

The EnumerableToStream package will do exactly what you ask:
using EnumerableToStream;

using (var myStream = foos.Select(f =>
{
    var line = JsonConvert.SerializeObject(f);
    return line;
}).ToStream())
That ToStream() method is the one that does the magic. If you are interested in the implementation details, have a look at the source code.


Out of memory exception reading "Large" file

I'm trying to serialize an object into a string
The first problem I encountered was that the XmlSerializer.Serialize method threw an Out of memory exception. I tried all kinds of solutions and none worked, so I serialized it into a file.
The file is about 300 MB (32-bit process, 8 GB RAM), and trying to read it with StreamReader.ReadToEnd also results in an Out of memory exception.
The XML format and loading it into a string are not an option but a must.
The question is:
Any reason that a 300 MB file will throw that kind of exception? 300 MB is not really a large file.
Serialization code that fails on .Serialize
using (MemoryStream ms = new MemoryStream())
{
    var type = obj.GetType();
    if (!serializers.ContainsKey(type))
        serializers.Add(type, new XmlSerializer(type));
    // new XmlSerializer(obj.GetType()).Serialize(ms, obj);
    serializers[type].Serialize(ms, obj);
    ms.Position = 0;
    using (StreamReader sr = new StreamReader(ms))
    {
        return sr.ReadToEnd();
    }
}
Serialization and read-from-file code that fails on ReadToEnd
var type = obj.GetType();
if (!serializers.ContainsKey(type))
    serializers.Add(type, new XmlSerializer(type));
FileStream fs = new FileStream(@"c:/temp.xml", FileMode.Create);
TextWriter writer = new StreamWriter(fs, new UTF8Encoding());
serializers[type].Serialize(writer, obj);
writer.Close();
fs.Close();
using (StreamReader sr = new StreamReader(@"c:/temp.xml"))
{
    return sr.ReadToEnd();
}
The object is large because it's the entire configuration object of an elaborate system...
UPDATE:
Reading the file in chunks (8*1024 chars) will load the file into a StringBuilder, but the builder fails on ToString()... starting to think there is no way, which is really strange.
Yeah, if you're using 32-bit, trying to load 300 MB in one chunk is going to be awkward, especially when using approaches that don't know the final size (number of characters, not bytes) in advance and thus have to keep doubling an internal buffer. And that is just when processing the string! It then needs to be ripped into a DOM, which can often take several times as much space as the underlying data. And finally, you need to deserialize it into the actual objects, usually taking about the same again.
So - indeed, trying to do this in 32-bit will be tough.
The first thing to try is: don't use ReadToEnd - just use XmlReader.Create with either the file path or the FileStream, and let XmlReader worry about how to load the data. Don't load the contents for it.
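For example, here is a minimal sketch of that approach - MyConfig stands in for whatever root type was serialized, and the path is just illustrative:

using System.Xml;
using System.Xml.Serialization;

var serializer = new XmlSerializer(typeof(MyConfig));
using (XmlReader reader = XmlReader.Create(@"c:/temp.xml"))
{
    // XmlReader pulls from the file incrementally; no 300 MB string is ever built.
    MyConfig config = (MyConfig)serializer.Deserialize(reader);
}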
After that... the next thing to do is: don't limit it to 32-bit.
Well, you could try enabling the 3GB switch, but... moving to 64-bit would be preferable.
Aside: xml is not a good choice for large volumes of data.
Exploring the source code for StreamReader.ReadToEnd reveals that it internally makes use of the StringBuilder.Append method:
public override String ReadToEnd()
{
    if (stream == null)
        __Error.ReaderClosed();
#if FEATURE_ASYNC_IO
    CheckAsyncTaskInProgress();
#endif
    // Call ReadBuffer, then pull data out of charBuffer.
    StringBuilder sb = new StringBuilder(charLen - charPos);
    do
    {
        sb.Append(charBuffer, charPos, charLen - charPos);
        charPos = charLen; // Note we consumed these characters
        ReadBuffer();
    } while (charLen > 0);
    return sb.ToString();
}
which most probably throws the exception, as discussed in this question/answer: interesting OutOfMemoryException with StringBuilder

Issues using StreamReader.EndOfStream?

So I'm doing a project where I am reading in a config file. The config file is just a list of strings like "D 1 1", "C 2 2", etc. Now I haven't ever done a read/write in C#, so I looked it up online expecting to find some sort of rendition of C/C++'s .eof(). I couldn't find one.
So what I have is...
TextReader tr = new StreamReader("/mypath");
Of all the examples I found online of how to read to the end of a file, the two that kept occurring were
while ((line = tr.ReadLine()) != null)
or
while (tr.Peek() >= 0)
I noticed that StreamReader has a bool EndOfStream, but no one was suggesting it, which led me to believe something was wrong with that solution. I ended up trying it like this...
while (!(tr as StreamReader).EndOfStream)
and it seems to work just fine.
So I guess my question is would I experience issues with casting a TextReader as a StreamReader and checking EndOfStream?
One obvious downside is that it makes your code StreamReader specific. Given that you can easily write the code using just TextReader, why not do so? That way if you need to use a StringReader (or something similar) for unit tests etc, there won't be any difficulties.
Personally I always use the "read a line until it's null" approach - sometimes via an extension method so that I can use
foreach (string line in reader.EnumerateLines())
{
}
EnumerateLines would then be an extension method on TextReader using an iterator block. (This means you can also use it for LINQ etc easily.)
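A sketch of what that extension method might look like, using an iterator block as described:

using System.Collections.Generic;
using System.IO;

public static class TextReaderExtensions
{
    public static IEnumerable<string> EnumerateLines(this TextReader reader)
    {
        // Lines are yielded lazily, one at a time, until the reader is drained.
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}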
Or you could use File.ReadAllLines to simplify your code:
http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx
This way, you let .NET take care of all the EOF/EOL management, and you focus on your content.
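For example (note that File.ReadAllLines loads the whole file into a string[] up front, which is fine for a small config file like this one):

foreach (string line in File.ReadAllLines("/mypath"))
{
    // parse entries like "D 1 1" and "C 2 2" here
}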
No, you won't experience any issues. If you look at the implementation of EndOfStream, you'll find that it just checks if there is still data in the buffer and, if not, whether it can read more data from the underlying stream:
public bool EndOfStream
{
    get
    {
        if (this.stream == null)
        {
            __Error.ReaderClosed();
        }
        if (this.charPos < this.charLen)
        {
            return false;
        }
        int num = this.ReadBuffer();
        return num == 0;
    }
}
Of course, casting in your code like that makes it dependent on StreamReader being the actual type of your reader, which isn't pretty to begin with.
Maybe read it all into a string and then parse it: StreamReader.ReadToEnd()
using (StreamReader sr = new StreamReader(path))
{
    // This allows you to do one Read operation.
    string contents = sr.ReadToEnd();
}
Well, StreamReader is a specialisation of TextReader, in the sense that StreamReader inherits from TextReader. So there shouldn't be a problem. :)
var arpStream = ExecuteCommandLine(cmd, arg);
arpStream.ReadLine(); // Read entries
while (!arpStream.EndOfStream)
{
    var line1 = arpStream.ReadLine().Trim();
    // TeststandInt.SendLogPrint(line, true);
}

How to generate a binary file? C#

I need to make a method that generates a binary file, receives a List of integers (each written as 4 bytes), and writes this List one by one to the file. So, I have this:
public void FrameCodesBinaryWriter(List<int> frameCodes)
{
    using (FileStream fileStream = new FileStream(binaryFilePath, FileMode.Create)) // destination file path
    {
        using (BinaryWriter binaryWriter = new BinaryWriter(fileStream))
        {
            for (int i = 0; i < frameCodes.Count; i++)
            {
                binaryWriter.Write(frameCodes[i]);
            }
            binaryWriter.Close();
        }
    }
}
Is this correct, or is there some other solution?
As it is, it should work fine. Here is a refactored version; you can pick and choose which bits of the refactoring you like.
public void WriteFrameCodesAsBinary(IEnumerable<int> frameCodes)
{
    using (FileStream fileStream = new FileStream(binaryFilePath, FileMode.Create))
    using (BinaryWriter binaryWriter = new BinaryWriter(fileStream))
    {
        foreach (int frameCode in frameCodes)
        {
            binaryWriter.Write(frameCode);
        }
    }
}
I would rename the function to describe the action it will do. FrameCodesBinaryWriter sounds more like a class name to me.
If you don't need the ordering of the List<T>, it might be an idea to accept an IEnumerable<T> instead. That way you can be more flexible about what you pass in.
Some people like to stack their using statements to remove a layer of nesting (code indentation). Personally I'm not a massive fan of this, but it's a matter of personal taste and style.
Using an IEnumerable<T> forces us to use foreach, but even with List<T> it can look cleaner/make it more obvious you are iterating through the list.
As previously mentioned, if you are using using you don't need to explicitly close the binary writer - that will be done automatically when the using block is exited.
You don't need to close binaryWriter because you have a using clause anyway. binaryFilePath needs to be a field of the class; apart from that, it looks OK.

How to save a human readable file

Currently I have an application that reads and writes several properties from one or two basic classes to a .txt file using the BinaryFormatter.
I've opened up the .txt file in Notepad, and as it's formatted for the application it's not very readable to the human eye - not for me anyway =D
I've heard of using XML, but pretty much all of my searches seem to overcomplicate things.
The kind of data I'm trying to save is simply a collection of "Person.cs" classes, nothing more than a name and address, all private strings but with properties and marked as Serializable.
What would be the best way to save my data in a way that can be easily read by a person? It would also make it easier to make small changes to the application's data directly in the file instead of having to load it, change it and save it.
Edit:
I have added the current way I am saving and loading my data; my _userCollection is as it suggests, and nUser/nMember are integers.
#region I/O Operations
public bool SaveData()
{
    try
    {
        // Open the stream using the Data.txt file
        using (Stream stream = File.Open("Data.txt", FileMode.Create))
        {
            // Create a new formatter
            BinaryFormatter bin = new BinaryFormatter();
            // Copy data in collection to the file specified earlier
            bin.Serialize(stream, _userCollection);
            bin.Serialize(stream, nMember);
            bin.Serialize(stream, nUser);
            // Close stream to release any resources used
            stream.Close();
        }
        return true;
    }
    catch (IOException ex)
    {
        throw new ArgumentException(ex.ToString());
    }
}

public bool LoadData()
{
    // Check if file exists, otherwise skip
    if (File.Exists("Data.txt"))
    {
        try
        {
            using (Stream stream = File.Open("Data.txt", FileMode.Open))
            {
                BinaryFormatter bin = new BinaryFormatter();
                // Copy data back into collection fields
                _userCollection = (List<User>)bin.Deserialize(stream);
                nMember = (int)bin.Deserialize(stream);
                nUser = (int)bin.Deserialize(stream);
                stream.Close();
                // Sort data to ensure it is ordered correctly after being loaded
                _userCollection.Sort();
                return true;
            }
        }
        catch (IOException ex)
        {
            throw new ArgumentException(ex.ToString());
        }
    }
    else
    {
        // Console.WriteLine present for testing purposes
        Console.WriteLine("\nLoad failed, Data.txt not found");
        return false;
    }
}
Replace your BinaryFormatter with XmlSerializer and run the same exact code.
The only change you need to make is that BinaryFormatter takes an empty constructor, while for the XmlSerializer you need to declare the type in the constructor:
XmlSerializer serializer = new XmlSerializer(typeof(Person));
Using XmlSerializer is not really complicated. Have a look at this MSDN page for an example: http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.aspx
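For instance, a minimal sketch of that swap applied to the question's SaveData, assuming User is a public class with a parameterless constructor (both of which XmlSerializer requires):

XmlSerializer serializer = new XmlSerializer(typeof(List<User>));
using (Stream stream = File.Open("Data.txt", FileMode.Create))
{
    // Writes human-readable XML instead of the opaque binary format.
    serializer.Serialize(stream, _userCollection);
}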
You could implement your own PersonsWriter that takes a StreamWriter as a constructor argument and has a Write method that takes an IList<Person> as input to write out a nice text representation.
For example:
public class PersonsWriter : IDisposable
{
    private StreamWriter _wr;

    public PersonsWriter(StreamWriter writer)
    {
        this._wr = writer;
    }

    public void Write(IList<Person> people)
    {
        foreach (Person dude in people)
        {
            _wr.Write("{0} {1}\n{2}\n{3} {4}\n\n",
                dude.FirstName,
                dude.LastName,
                dude.StreetAddress,
                dude.ZipCode,
                dude.City);
        }
    }

    public void Dispose()
    {
        _wr.Flush();
        _wr.Dispose();
    }
}
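Usage would then look something like this (people being an IList<Person>):

using (var writer = new PersonsWriter(new StreamWriter("people.txt")))
{
    writer.Write(people);
}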
YAML is another option for human-readable markup that is also easy to parse. There are libraries available for C#, as well as almost all other popular languages. Here's a sample of what YAML looks like:
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
    address:
        lines: |
            458 Walkman Dr.
            Suite #292
        city   : Royal Oak
        state  : MI
        postal : 48046
Frankly, as a human, I don't find XML to be all that readable. In fact, it's not really designed to be read by humans.
If you want a human readable format, then you have to build it.
Say you have a Person class that has a first name, a last name and an SSN as properties. Create your file and have it write out 3 lines, with a description of the field in the first fifty characters (random number from my head) and the value starting at character 51.
This will produce a file that looks like:
First Name-------Stephen
Last Name -------Wrighton
SSN -------------XXX-XX-XXXX
Then, reading it back in, your program would know where the data begins on each line, and what each line is for (the program would know that Line 3 is the SSN value).
But remember, to truly gain human readability, you sacrifice data portability.
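A quick sketch of that idea, matching the sample output above - the 17-character label column is arbitrary, and person.Ssn is a hypothetical property:

using (var tw = new StreamWriter("people.txt"))
{
    // Pad each label with dashes so every value starts at the same column.
    tw.WriteLine("First Name".PadRight(17, '-') + person.FirstName);
    tw.WriteLine("Last Name ".PadRight(17, '-') + person.LastName);
    tw.WriteLine("SSN ".PadRight(17, '-') + person.Ssn);
}

// Reading back, the value always starts at the same fixed column:
string value = line.Substring(17);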
Try the DataContractSerializer
It serializes objects to XML and is very easy to use
Write a CSV reader/writer if you want a good compromise between human- and machine-readable in a Windows environment.
Loads into Excel too.
There's a discussion about it here:
http://knab.ws/blog/index.php?/archives/3-CSV-file-parser-and-writer-in-C-Part-1.html
EDIT
That is a C# article... it just confusingly has "C" in the URL.
I really think you should go with XML (look into DataContractSerializer). It's not that complicated. You could probably even just replace BinaryFormatter with XmlSerializer and go.
If you still don't want to do that, though, you can write a delimited text file. Then you'll have to write your own reader method (although it could almost just use the Split method).
// Inside the Person class:
public override string ToString()
{
    List<String> propValues = new List<String>();
    // Get the type.
    Type t = this.GetType();
    // Cycle through the properties.
    foreach (PropertyInfo p in t.GetProperties())
    {
        propValues.Add(String.Format("{0}:={1}", p.Name, p.GetValue(this, null)));
    }
    return String.Join(",", propValues.ToArray());
}

using (System.IO.TextWriter tw = new System.IO.StreamWriter("output.txt"))
{
    tw.WriteLine(person.ToString());
}

Bytes consumed by StreamReader

Is there a way to know how many bytes of a stream have been used by StreamReader?
I have a project where we need to read a file that has a text header followed by the start of the binary data. My initial attempt to read this file was something like this:
private int _dataOffset;

void ReadHeader(string path)
{
    using (FileStream stream = File.OpenRead(path))
    {
        StreamReader textReader = new StreamReader(stream);
        string line;
        do
        {
            line = textReader.ReadLine();
            handleHeaderLine(line);
        } while (line != "DATA"); // Yes, they used "DATA" to mark the end of the header
        _dataOffset = (int)stream.Position;
    }
}

private byte[] ReadDataFrame(string path, int frameNum)
{
    using (FileStream stream = File.OpenRead(path))
    {
        stream.Seek(_dataOffset + frameNum * cbFrame, SeekOrigin.Begin);
        byte[] data = new byte[cbFrame];
        stream.Read(data, 0, cbFrame);
        return data;
    }
}
The problem is that when I set _dataOffset to stream.Position, I get the position that the StreamReader has read to, not the end of the header. As soon as I thought about it this made sense, but I still need to be able to know where the end of the header is and I'm not sure if there's a way to do it and still take advantage of StreamReader.
You can find out how many bytes the StreamReader has actually returned (as opposed to read from the stream) in a number of ways, none of them too straightforward I'm afraid.

1. Get the result of textReader.CurrentEncoding.GetByteCount(allTextReadSoFar) and then seek to this position in the stream.

2. Use some reflection hackery to retrieve the value of the private variable of the StreamReader object that corresponds to the current byte position within the internal buffer (different from that of the stream - usually behind, but no more than equal to it, of course). Judging by .NET Reflector, this variable seems to be named bytePos.

3. Don't bother using a StreamReader at all, but instead implement your own custom ReadLine function built on top of the Stream or even BinaryReader (BinaryReader is guaranteed never to read further ahead than what you request). This custom function must read from the stream char by char, so you'd actually have to use the low-level Decoder object (unless the encoding is ASCII/ANSI, in which case things are a bit simpler due to single-byte encoding).

Option 1 is going to be the least efficient, I would imagine (since you're effectively re-encoding text you just decoded), and option 3 the hardest to implement, though perhaps the most elegant. I'd probably recommend against the ugly reflection hack (option 2), even though it looks tempting, being the most direct solution and only taking a couple of lines. (To be quite honest, the StreamReader class really ought to expose this variable via a public property, but alas it does not.) So in the end it's up to you, but either method 1 or 3 should do the job nicely enough...
Hope that helps.
So the data is UTF-8 (the default encoding for StreamReader). This is a multi-byte encoding, so IndexOf would be inadvisable. You could call:
Encoding.UTF8.GetByteCount(string)
on your data so far, adding 1 or 2 bytes for the missing line ending.
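A minimal sketch of that byte-counting approach - it assumes UTF-8 and "\r\n" line endings (adjust the +2 if the file uses bare "\n"), and handleHeaderLine is the method from the question:

long offset = 0;
using (FileStream stream = File.OpenRead(path))
using (StreamReader reader = new StreamReader(stream, Encoding.UTF8))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // Re-encode the decoded line to count the bytes it occupied on disk.
        offset += Encoding.UTF8.GetByteCount(line) + 2; // +2 for "\r\n"
        if (line == "DATA") break;
        handleHeaderLine(line);
    }
    // offset now points at the first byte after the "DATA" line.
}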
If you need to count bytes, I'd go with the BinaryReader. You can take the results and cast them about as needed, but I find its idea of its current position to be more reliable (in that, since it reads in binary, it's immune to character-set problems).
So your last line contains 'DATA' plus an unknown number of data bytes. You could extract the position by using IndexOf() on your last read line, then readjust stream.Position.
But I am not sure if you should use ReadLine() at all in this case. Maybe it would be better to read byte by byte until you reach the 'DATA' mark.
The line breaks are easily identifiable without needing to decode the stream first (except for some encodings rarely used for text files, like EBCDIC, UTF-16 and UTF-32), so you can just read each line as bytes and then decode the entire line:
using (FileStream stream = File.OpenRead(path))
{
    List<byte> buffer = new List<byte>();
    bool hasCr = false;
    bool done = false;
    while (!done)
    {
        int b = stream.ReadByte();
        if (b == -1) throw new IOException("End of file reached in header.");
        if (b == 13)
        {
            hasCr = true;
        }
        else if (b == 10 && hasCr)
        {
            string line = Encoding.UTF8.GetString(buffer.ToArray(), 0, buffer.Count);
            if (line == "DATA")
            {
                done = true;
            }
            else
            {
                HandleHeaderLine(line);
            }
            buffer.Clear();
            hasCr = false;
        }
        else
        {
            if (hasCr) buffer.Add(13);
            hasCr = false;
            buffer.Add((byte)b);
        }
    }
    _dataOffset = stream.Position;
}
Instead of closing the stream and opening it again, you could of course just keep on reading the data.
