I'm having trouble with StreamWriter when serializing a large number of objects in a foreach loop. Here's my code:
public bool Export(ItemToSerialize it) {
    try {
        using (StreamWriter sw = new StreamWriter(Path.Combine(MySettings.ExportPath, randomFileName + ".xml"))) {
            XmlSerializer ser = new XmlSerializer(typeof(ItemToSerialize));
            ser.Serialize(sw, it);
        }
        return true;
    }
    catch { throw; }
}

public bool ExportAll() {
    List<ItemToSerialize> lst = RetrieveListToSerialize();
    foreach (ItemToSerialize it in lst) {
        Export(it);
    }
    return true;
}
When I have a lot of data to export, it skips most of it. At first I thought I had to flush the writer, but flushing/closing when the export is done doesn't change anything.
What is surprising is that when I add a sleep (System.Threading.Thread.Sleep(1000)), it works. Even more surprising, when I reduce the sleep to 500 ms, it goes back to skipping some of them.
I suspect it's writing faster than the writer can open/close or simply write. However, I expect the Export function not to return until the file is completely written. Is there something like a 'background' task when writing with a StreamWriter?
With the provided code, I really don't understand this behaviour.
Thanks!
I'm writing a program that uses text files in C#.
I use a parser class as an interface between the file structure and the program.
This class contains a StreamReader, a StreamWriter and a FileStream. I use the FileStream as a common stream for the reader and the writer, because otherwise the two will conflict when both of them have the file open.
The parser class has a class variable called m_path, the path to the file. I've checked it extensively, and the path is correct. OpenStreams() and ResetStreams() work perfectly, but after calling CloseStreams() in the delete() function, the program goes to the catch clause, so File.Delete(m_path) never gets executed. In other situations the CloseStreams() function works perfectly. It goes wrong when closing the StreamWriter (m_writer), which gives an exception ("File is already closed").
/**
 * Function to close the streams.
 */
private void closeStreams() {
    if (m_streamOpen) {
        m_fs.Close();
        m_reader.Close();
        m_writer.Close(); // Goes wrong
        m_streamOpen = false;
    }
}
/**
 * Deletes the file.
 */
public int delete() {
    try {
        closeStreams(); // Catch after this
        File.Delete(m_path);
        return 0;
    }
    catch { return -1; }
}
I call the function like this:
parser.delete();
Could anybody give me some tips?
Your File.Delete(m_path); will never be called, because you get an exception here:
private void closeStreams() {
    if (m_streamOpen) {
        m_fs.Close();
        m_reader.Close();
        m_writer.Close(); // throws an exception here
        m_streamOpen = false;
    }
}
The exception is "Cannot access a closed file".
The cause is explained in the documentation of Close() in StreamReader:
Closes the System.IO.StreamReader object and the underlying stream, and releases any system resources associated with the reader.
There are also some articles about this behaviour:
Does disposing streamreader close the stream?
Is there any way to close a StreamWriter without closing its BaseStream?
Can you keep a StreamReader from disposing the underlying stream?
Avoiding dispose of underlying stream
You should consider rewriting your code to use using() statements.
However, I experimented a bit with your code, and it worked when calling Close() in the other order:
m_writer.Close();
m_reader.Close();
m_fs.Close();
However, I assume that this works only by coincidence (I used .NET 4.0, and probably this will not work in another .NET version). I would strongly advise against doing it this way.
I tested this:
using (FileStream fs = new FileStream(m_path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
using (StreamReader reader = new StreamReader(fs))
using (StreamWriter writer = new StreamWriter(fs))
{
    // do some work here
}
File.Delete(m_path);
But, I know that this may not be for you, since you may want the read and write streams available as fields in your class.
At least, you have some samples to start with ...
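One more option, if you can target .NET 4.5 or newer: both StreamReader and StreamWriter have constructor overloads with a leaveOpen parameter, so disposing the wrappers no longer closes the shared FileStream. A minimal sketch along the lines of the snippet above (the encoding and buffer size are just example values):
using (FileStream fs = new FileStream(m_path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
using (StreamReader reader = new StreamReader(fs, Encoding.UTF8, true, 1024, leaveOpen: true))
using (StreamWriter writer = new StreamWriter(fs, Encoding.UTF8, 1024, leaveOpen: true))
{
    // do some work here; disposing reader and writer no longer closes fs
}
// fs is closed exactly once, by its own using block
File.Delete(m_path);
With this, the order in which the three objects are disposed stops mattering.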
File.Delete should work; either you didn't call your delete method, or m_path is an invalid path.
I'm trying to serialize an object into a string.
The first problem I encountered was that the XmlSerializer.Serialize method threw an Out of memory exception. I tried all kinds of solutions and none worked, so I serialized it into a file instead.
The file is about 300 MB (32-bit process, 8 GB RAM), and trying to read it with StreamReader.ReadToEnd also results in an Out of memory exception.
The XML format and loading it into a string are not an option but a must.
The question is:
Any reason a 300 MB file would throw that kind of exception? 300 MB is not really a large file.
Serialization code that fails on .Serialize
using (MemoryStream ms = new MemoryStream())
{
    var type = obj.GetType();
    if (!serializers.ContainsKey(type))
        serializers.Add(type, new XmlSerializer(type));
    // new XmlSerializer(obj.GetType()).Serialize(ms, obj);
    serializers[type].Serialize(ms, obj);
    ms.Position = 0;
    using (StreamReader sr = new StreamReader(ms))
    {
        return sr.ReadToEnd();
    }
}
Serialization to a file and reading it back, which fails on ReadToEnd
var type = obj.GetType();
if (!serializers.ContainsKey(type))
    serializers.Add(type, new XmlSerializer(type));
FileStream fs = new FileStream(@"c:/temp.xml", FileMode.Create);
TextWriter writer = new StreamWriter(fs, new UTF8Encoding());
serializers[type].Serialize(writer, obj);
writer.Close();
fs.Close();
using (StreamReader sr = new StreamReader(@"c:/temp.xml"))
{
    return sr.ReadToEnd();
}
The object is large because it's the entire configuration object of an elaborate system...
UPDATE:
Reading the file in chunks (8*1024 chars) will load the file into a StringBuilder, but the builder fails on ToString()... starting to think there is no way, which is really strange.
Yeah, if you're using 32-bit, trying to load 300MB in one chunk is going to be awkward, especially when using approaches that don't know the final size (number of characters, not bytes) in advance, thus have to keep doubling an internal buffer. And that is just when processing the string! It then needs to rip that into a DOM, which can often take several times as much space as the underlying data. And finally, you need to deserialize it into the actual objects, usually taking about the same again.
So - indeed, trying to do this in 32-bit will be tough.
The first thing to try is: don't use ReadToEnd - just use XmlReader.Create with either the file path or the FileStream, and let XmlReader worry about how to load the data. Don't load the contents for it.
After that... the next thing to do is: don't limit it to 32-bit.
Well, you could try enabling the 3GB switch, but... moving to 64-bit would be preferable.
Aside: xml is not a good choice for large volumes of data.
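To make the first suggestion concrete: a minimal sketch of deserializing straight from the file with XmlReader, so the document is streamed rather than materialized as a single 300 MB string first (MyConfig is a stand-in for your actual configuration type):
XmlSerializer serializer = new XmlSerializer(typeof(MyConfig));
using (XmlReader reader = XmlReader.Create(@"c:/temp.xml"))
{
    // XmlSerializer pulls from the reader as it goes; no ReadToEnd, no giant string
    MyConfig config = (MyConfig)serializer.Deserialize(reader);
}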
Exploring the source code for StreamReader.ReadToEnd reveals that it internally makes use of the StringBuilder.Append method:
public override String ReadToEnd()
{
    if (stream == null)
        __Error.ReaderClosed();

#if FEATURE_ASYNC_IO
    CheckAsyncTaskInProgress();
#endif

    // Call ReadBuffer, then pull data out of charBuffer.
    StringBuilder sb = new StringBuilder(charLen - charPos);
    do {
        sb.Append(charBuffer, charPos, charLen - charPos);
        charPos = charLen;  // Note we consumed these characters
        ReadBuffer();
    } while (charLen > 0);
    return sb.ToString();
}
which most probably throws the exception that leads to this question/answer: interesting OutOfMemoryException with StringBuilder
I need some help from the experts. I'm trying to use the function below to print strings to a file. When I use Console.Write() or Console.WriteLine(), the output file grows by 3-4 MB per second, but when I use StreamWriter or File.AppendAllText as shown below, the file grows by only 20-30 KB per second.
Why does the print speed decrease so much when I use StreamWriter instead of Console.WriteLine()?
What method should I use to write to a file while maintaining the same speed as Console.WriteLine()?
public static void PrintFunction()
{
    //using (StreamWriter writer = File.AppendText(@"C:\OuputFile.txt"))
    using (StreamWriter writer = new StreamWriter(@"C:\OuputFile.txt", true))
    {
        //Console.Write("This is "); // Print speed is about 3-4 MB per second
        writer.Write("This is "); // Print decreases to 20-30 KB per second
        //File.AppendAllText(@"C:\OuputFile.txt", "This is "); // Print decreases to 20-30 KB per second

        // SOME CODE
        // SOME CODE

        //Console.WriteLine("the first line"); // Print speed is about 3-4 MB per second
        writer.WriteLine("the first line"); // Print decreases to 20-30 KB per second
        //File.AppendAllText(@"C:\OuputFile.txt", "the first line"); // Print decreases to 20-30 KB per second
    }
}
Update:
When I say I'm using Console.WriteLine(), I mean I'm using Console.WriteLine() inside the code, but to save those prints in a file I'm redirecting the output like this:
MyProgram.exe inputfile > outputfile.txt
I know the difference between memory and the hard disk, but when I redirect the output as above, Console.WriteLine() is also printing to the hard disk; why is it more than 1000 times faster than StreamWriter?
I've tried increasing the buffer size as below, but the print speed doesn't go up:
using (StreamWriter writer = new StreamWriter(@"C:\OuputFile.txt", true, Encoding.UTF8, 65536))
Update 2:
Hello to all, and thanks for all the help, you were right! Following your suggestions and examples, I defined the StreamWriter outside PrintFunction; this time the writer is created only once, the output file remains open till the end, and the printing speed is the same as with Console.WriteLine().
I passed the writer as an argument of the function, as shown below, and it works. I tested with buffer sizes of 4 KB and 64 KB and with the default values; the fastest result was with an explicit 4096-byte buffer. The function was called a little more than 10 million times and the output file was 670 MB.

StreamWriter(@"C:\OuputFile.txt", true, Encoding.UTF8, 4096) --> 660845.1181 ms --> 11.0140853 min
StreamWriter(@"C:\OuputFile.txt", true, Encoding.UTF8, 65536) --> 675755.0119 ms --> 11.2625835 min
StreamWriter(@"C:\OuputFile.txt") --> 712830.3706 ms --> 11.8805061 min

Thanks again for the help.
Regards
The code looks like this:
public static void ProcessFunction()
{
    using (StreamWriter writer = new StreamWriter(@"C:\OuputFile.txt", true, Encoding.UTF8, 4096))
    {
        while (condition)
        {
            PrintFunction(writer);
        }
    }
}

public static void PrintFunction(StreamWriter writer)
{
    // SOME CODE
    writer.Write("Some string...");
    // SOME CODE
}
I profiled this and it looks like it is completely the opposite. I was able to get about 0.25 GB/s written to a standard 10K rpm drive (no SSD). It looks like you're calling this function a lot and reconnecting to the file on every write. Try something like this (I snipped it together quickly from a piece of old console logging code, so it might be a bit buggy, and error handling is certainly not complete):
public static class LogWriter
{
    // We keep a static reference to the StreamWriter so the stream stays open.
    // It could be closed when not needed, but each open() takes resources.
    private static StreamWriter writer = null;
    private static string LogFilePath = null;

    public static void Init(string FilePath)
    {
        LogFilePath = FilePath;
    }

    public static void WriteLine(string LogText)
    {
        // create a writer if one does not exist yet
        if (writer == null)
        {
            writer = new StreamWriter(File.Open(LogFilePath, FileMode.OpenOrCreate, FileAccess.Write, FileShare.ReadWrite));
        }
        try
        {
            // do the actual work
            writer.WriteLine(LogText);
        }
        catch (Exception)
        {
            // very simplified exception logic... You might want to expand this.
            if (writer != null)
            {
                writer.Dispose();
                writer = null; // so the next call re-opens the file instead of hitting a disposed writer
            }
        }
    }

    // Make sure you call this before the program ends.
    public static void Close()
    {
        if (writer != null)
        {
            writer.Dispose();
            writer = null;
        }
    }
}
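For completeness, using the sketch above would look something like this (the path is just an example):
LogWriter.Init(@"C:\temp\log.txt");
for (int i = 0; i < 1000000; i++)
{
    LogWriter.WriteLine("line " + i); // the file handle is opened once and reused
}
LogWriter.Close(); // flush and release the file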
Why does the print speed decrease so much when I use StreamWriter instead of Console.WriteLine()?
When you redirect the command output to a file, cmd.exe acquires write access to the output file once, no matter how many times you call PrintFunction() with Console.Write().
If you create the StreamWriter inside PrintFunction(), the writer is initialized on every call; it has to acquire the file handle, write a line, then release the handle. That overhead kills the performance.
What method should I use to write to a file while maintaining the same speed as Console.WriteLine()?
You can try one of the following:
Buffer all your output in memory (e.g. using a StringBuilder), then write it to the file at once; see the sketch below
Pass the StreamWriter object to PrintFunction() to avoid the overhead, and handle StreamWriter.Close() properly at the end
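A minimal sketch of the first option, reusing the names from the question (when and how often you flush the buffer to disk is up to you):
StringBuilder buffer = new StringBuilder();

// inside your processing loop, append to memory instead of writing to disk
buffer.Append("This is ");
buffer.AppendLine("the first line");

// when done (or periodically, to bound memory usage), write everything at once
File.AppendAllText(@"C:\OuputFile.txt", buffer.ToString());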
I've got a list of 369 different names and I want to print these names into a CSV file. All goes well until I take a look at the output CSV file and it only has 251 rows. I've tried outputting to a .txt instead, and still it only outputs 251 rows. I've stepped through with the debugger and it is definitely calling writer.WriteLine() 369 times.
Is there some sort of writing restriction in place? If so, why 251? How do I write all 369 names?
Here's my code just in case:
List<String> names = new List<String>();
//Retrieve names from a separate source.
var writer = new StreamWriter(File.OpenWrite(@"C:names.txt"));
for (int i = 0; i < names.Count; i++)
{
    System.Console.WriteLine(names[i].ToString());
    writer.WriteLine(names[i].ToString());
}
System.Console.Write(names.Count);
The output on the console shows all 369 names and the names.Count prints 369.
You need to close your StreamWriter; the best way is to use a using block, like so:
using (StreamWriter writer = new StreamWriter(File.OpenWrite("C:\\names.txt")))
{
    // code here
}
The using block will always call the .Dispose method of StreamWriter which has the effect of flushing the stream. Presently you have buffered-but-unwritten data in your StreamWriter instance.
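A minimal illustration of that buffering (StreamWriter's default buffer is on the order of a kilobyte, so short writes typically sit in memory until a flush; the file name is just an example):
var writer = new StreamWriter("demo.txt");
writer.WriteLine("hello"); // lands in the writer's in-memory buffer
// at this point demo.txt is typically still empty on disk
writer.Dispose(); // flushes the buffer and closes the file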
You do not show anywhere that you properly close writer. If your program terminates abnormally, the writer would never be flushed to disk.
Try making use of a using block.
// NOTE: This should be C:\names.txt. The posted code is missing a \
using (var writer = new StreamWriter(File.OpenWrite(@"C:names.txt")))
{
    // Your code here
}
You have to flush the buffer after the last write. Put the writer inside a using statement.
The writer's Dispose method flushes the buffer. You can also call writer.Flush(), but since you still have to make sure the writer is disposed, just put it in a using statement as others suggested.
List<String> names = new List<String>();
var sb = new StringBuilder();
//Retrieve names from a separate source.
for (int i = 0; i < names.Count; i++)
{
    System.Console.WriteLine(names[i].ToString());
    sb.AppendLine(names[i].ToString());
}
using (var writer = new StreamWriter(File.OpenWrite(@"C:\names.txt")))
{
    writer.Write(sb.ToString());
}
I've written a service with a separate thread that reads roughly 400 records from a database and serializes them into XML files. It runs fine, there are no errors, and it reports that all files have been exported correctly, yet only a handful of XML files appear afterwards, and it's always a different number each time. I've checked whether a certain record is causing problems, but they all read out fine and all seem to write fine, but don't...
After playing around and putting a 250 ms delay between each write, they are all exported properly, so I assume it must have something to do with writing so many files in such quick succession, but I have no idea why. I would have thought it would report some kind of error if they didn't write properly, yet there's nothing.
Here is the code for anyone who wants to try it:
static void Main(string[] args)
{
    ExportTestData();
}

public static void ExportTestData()
{
    List<TestObject> testObjs = GetData();
    foreach (TestObject obj in testObjs)
    {
        ExportObj(obj);
        //Thread.Sleep(10);
    }
}

public static List<TestObject> GetData()
{
    List<TestObject> result = new List<TestObject>();
    for (int i = 0; i < 500; i++)
    {
        result.Add(new TestObject()
        {
            Date = DateTime.Now.AddDays(-1),
            AnotherDate = DateTime.Now.AddDays(-2),
            AnotherAnotherDate = DateTime.Now,
            DoubleOne = 1.0,
            DoubleTwo = 2.0,
            DoubleThree = 3.0,
            Number = 345,
            SomeCode = "blah",
            SomeId = "wobble wobble"
        });
    }
    return result;
}

public static void ExportObj(TestObject obj)
{
    try
    {
        string path = Path.Combine(@"C:\temp\exports", String.Format("{0}-{1}{2}", DateTime.Now.ToString("yyyyMMdd"), String.Format("{0:HHmmssfff}", DateTime.Now), ".xml"));
        SerializeTo(obj, path);
    }
    catch (Exception ex)
    {
        // NB: any exception is silently swallowed here
    }
}

public static bool SerializeTo<T>(T obj, string path)
{
    XmlSerializer xs = new XmlSerializer(obj.GetType());
    using (TextWriter writer = new StreamWriter(path, false))
    {
        xs.Serialize(writer, obj);
    }
    return true;
}
Try commenting/uncommenting the Thread.Sleep(10) to see the problem.
Does anybody have any idea why it does this? And can you suggest how I can avoid it?
Thanks
EDIT: Solved. The time-based filename wasn't unique enough and was overwriting previously written files. Should've spotted it earlier, thanks for your help.
Perhaps try putting the writer in a using block for immediate disposal? Something like
XmlSerializer xs = new XmlSerializer(obj.GetType());
using (TextWriter writer = new StreamWriter(path, false))
{
    xs.Serialize(writer, obj);
}
OK, I've found the problem. I was using a time-based filename that I thought would be unique enough for each file; it turns out that in a loop that tight, the files come out with the same names and overwrite each other.
If I change it to use actually unique filenames, it works! Thanks for your help
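For anyone hitting the same thing: one way to get genuinely unique names is to stop relying on the clock alone, e.g. by appending a GUID (the folder and timestamp format below are just the ones from the question):
// the timestamp keeps the name readable, the GUID guarantees uniqueness even in a tight loop
string fileName = String.Format("{0:yyyyMMdd-HHmmssfff}-{1}.xml", DateTime.Now, Guid.NewGuid().ToString("N"));
string path = Path.Combine(@"C:\temp\exports", fileName);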
Dispose the writer
public static bool SerializeTo<T>(T obj, string path)
{
    XmlSerializer xs = new XmlSerializer(obj.GetType());
    using (TextWriter writer = new StreamWriter(path, false))
    {
        xs.Serialize(writer, obj);
        writer.Close();
    }
    return true;
}
If you're not getting any exceptions, then the using statements proposed by other answers won't help - although you should change to use them anyway. At that point, you don't need the close call any more:
XmlSerializer xs = new XmlSerializer(obj.GetType());
using (TextWriter writer = new StreamWriter(path, false))
{
    xs.Serialize(writer, obj);
}
I don't think the problem lies in this code, however. I suspect it's something like the "capturing a loop variable in a lambda expression" problem which crops up so often. If you can come up with a short but complete program which demonstrates the problem, it will be a lot easier to diagnose.
I suggest you create a simple console application which tries to create (say) 5000 files serializing some simple object. See if you can get that to fail in the same way.
Multi-threading may cause that problem; the 250 ms delay is evidence of that.
Do you have multiple threads doing this?