I'm using a StreamWriter to write data to a file. The data is a (potentially) long string that should be saved in full.
I've seen cases where the file is created but does not contain the entire string; the string appears to have been cut off during saving.
The truncated file I have contains exactly 4096 characters, which is exactly the length of the internal buffer used by the StreamWriter class.
Example, similar to the code we're using:
string output = "......"; // long string
StreamWriter sw = File.CreateText(filename);
if (sw == null)
{
    return;
}
try
{
    sw.Write(output);
}
finally
{
    if (sw != null)
    {
        // Close flushes the writer's buffer to the file
        sw.Close();
    }
}
My question is: is this an expected scenario?
That is, can StreamWriter write only part of the string it is given? If so, is there a simple way to overcome this?
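One thing worth ruling out: a StreamWriter keeps text in an internal buffer and only pushes it to the file on Flush, Close, or Dispose, so a code path that skips the close can leave a file cut off at a buffer boundary. The posted code does close the writer, so this may not be the cause here. The following is a minimal sketch, assuming UTF-8 output and a hypothetical helper named SaveText, that lets a using block guarantee disposal:

```csharp
using System.IO;
using System.Text;

public static class TextSaver
{
    // Hypothetical helper (not from the original post): writes the whole
    // string and disposes the writer, which flushes any buffered characters.
    public static void SaveText(string filename, string output)
    {
        using (var sw = new StreamWriter(filename, false, Encoding.UTF8))
        {
            sw.Write(output);
        } // Dispose runs here, even if Write throws
    }
}
```

For a string that is already fully in memory, File.WriteAllText(filename, output) does the same job in one call.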
I am trying to modify a file in place through a stream, as the file can be very large and I don't want to load it into memory. The piece of information I'm editing will always be the same length, so in theory I can just swap the content out using a stream, but it doesn't seem to be writing to the correct place.
I have written some code that uses a stream reader to read line by line until it finds a regex match, and then attempts to overwrite the bytes with the edited line. The code is as follows:
private void UpdateFile(string newValue, string path, string pattern)
{
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
int index = 0;
string line = "";
using (var fileStream = File.OpenRead(path))
using (var streamReader = new StreamReader(fileStream, Encoding.Default, true, 128))
{
while ((line = streamReader.ReadLine()) != null)
{
if (regex.Match(line).Success)
{
break;
}
index += Encoding.Default.GetBytes(line).Length;
}
}
if (line != null)
{
using (Stream stream = File.Open(path, FileMode.Open))
{
stream.Position = index + 1;
var newLine = regex.Replace(line, newValue);
var oldBytes = Encoding.Default.GetBytes(line);
var newBytes = Encoding.Default.GetBytes("\n" + newLine);
stream.Write(newBytes, 0, newBytes.Length);
}
}
}
The code almost works as expected. It inserts the updated line, but always a little too early, and how early varies slightly with the file I'm editing. I expect it has something to do with the way I am managing the stream position, but I don't know the correct way to approach this.
Unfortunately the exact files I'm working on are under NDA.
The structure is as follows though:
A file will have an unknown amount of data followed by a line of a known format, for example:
Description: ABCDEF
I know the portion that follows "Description: " will always be 6 characters, so I do a replace on the line to replace with, for example, UVWXYZ.
The problem is that, for example, if a file reads as
'...
UNIMPORTANT UNKNOWN DATA
DESCRIPTION: ABCDEF
MORE DATA
...'
it will come out as something like
'...
UNIMPORTANT UNKNOWN DDESCRIPTION: UVWXYZDEF
MORE DATA
...'
I think the problem is that you are not accounting for the line feed ("\n") at the end of each line you read, so your index sets the stream position incorrectly. Try the following code:
private void UpdateFile(string newValue, string path, string pattern)
{
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
int index = 0;
string line = "";
using (var fileStream = File.OpenRead(path))
using (var streamReader = new StreamReader(fileStream, Encoding.Default, true, 128))
{
while ((line = streamReader.ReadLine()) != null)
{
if (regex.Match(line).Success)
{
break;
}
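// count the line terminator too, using the same encoding as the reader;
// this assumes "\n" line endings (a "\r\n" file has 2 bytes per line break)
index += Encoding.Default.GetBytes(line + "\n").Length;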
}
}
if (line != null)
{
using (Stream stream = File.Open(path, FileMode.Open))
{
stream.Position = index;
var newBytes = Encoding.Default.GetBytes(regex.Replace(line + "\n", newValue));
stream.Write(newBytes, 0, newBytes.Length);
}
}
}
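The approach above still has to guess how many bytes each line break occupies. As a hedged alternative, the offset can be taken from the stream itself by reading bytes and recording where each line starts. This is only a sketch under stated assumptions: the class and method names are hypothetical, the encoding must be one in which byte 0x0A always means "\n" (ASCII, UTF-8, and most ANSI code pages), and the replacement is refused if it would change the byte length of the line.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class InPlaceEdit
{
    // Hypothetical helper: overwrites the first line matching 'pattern' in place.
    // Returns false if no line matches or the new line has a different byte length.
    public static bool ReplaceFirstMatch(string path, string pattern, string newValue, Encoding encoding)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var lineBytes = new List<byte>();
            long lineStart = 0; // byte offset of the current line, taken from the stream
            while (true)
            {
                int b = stream.ReadByte();
                if (b != -1 && b != '\n')
                {
                    lineBytes.Add((byte)b);
                    continue;
                }
                // End of line or end of file: leave a trailing "\r" out so it is never rewritten.
                int bodyLength = lineBytes.Count;
                if (bodyLength > 0 && lineBytes[bodyLength - 1] == (byte)'\r')
                {
                    bodyLength--;
                }
                string line = encoding.GetString(lineBytes.ToArray(), 0, bodyLength);
                if (regex.IsMatch(line))
                {
                    byte[] newBytes = encoding.GetBytes(regex.Replace(line, newValue, 1));
                    if (newBytes.Length != bodyLength)
                    {
                        return false; // writing would shift the rest of the file
                    }
                    stream.Position = lineStart;
                    stream.Write(newBytes, 0, newBytes.Length);
                    return true;
                }
                if (b == -1)
                {
                    return false; // reached end of file without a match
                }
                lineStart = stream.Position; // first byte after the "\n"
                lineBytes.Clear();
            }
        }
    }
}
```

With the sample from the question, a call such as InPlaceEdit.ReplaceFirstMatch(path, "ABCDEF", "UVWXYZ", Encoding.ASCII) overwrites only the six value bytes and leaves the surrounding lines and line endings untouched.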
In your example, you are off by 4 characters. Not quite the common off-by-one error, but close. But maybe a different pattern would help the most?
Programs nowadays rarely work on the file in place like that. There is just too much that can go wrong, all the way up to a power loss mid-process. Instead they:
1. Create an empty new file in the same location, often with a temporary name and hidden.
2. Write the output to the new file.
3. Once you are done and everything is good, with all the caches flushed (done by Stream.Close() or Dispose()), replace the old file with the new file using the OS move operation.
The advantage is that the original file is never left half-written. Even if the computer loses power mid-operation, at worst the temporary file is corrupted. You still have the original file, and you can simply delete the temporary file and restart the work from scratch if you need to. Indeed, recovering the temporary file only makes sense in rare cases (word processors).
The old file is replaced by the new file with a move. If both are on the same partition, that is just a rename operation in the file system. Modern file systems are designed to be robust, much like a good relational database, so this step carries very little risk.
You can find that pattern in everything from your word processor of choice to backup programs, the download manager of Firefox (as you might be overwriting a file that was there before), and even zip programs. Whenever you have a long writing phase and want to minimize the danger, it is the go-to pattern.
And because you process each line in memory and never have to reposition the read/write position within a file, it will get around your issue too.
Edit: I made some source code for it from memory/documentation. Might contain syntax errors
string sourcepath; // contains the source file path, set by other code
string temppath;   // contains the path of the temp file; should be in the same folder, and thus the same partition
// Open both files. Stacked usings dispose (and flush) the writer before the reader.
// FileOptions.WriteThrough bypasses OS write caching; it is optional and is detrimental to performance.
using (var reader = new StreamReader(sourcepath))
using (var outStream = new FileStream(temppath, FileMode.Create, FileAccess.Write, FileShare.None, 4096, FileOptions.WriteThrough))
using (var writer = new StreamWriter(outStream))
{
    string line;
    // iterate over the input
    while ((line = reader.ReadLine()) != null)
    {
        // do processing on line here
        // note: WriteLine ends each line with Environment.NewLine
        writer.WriteLine(line);
    }
}
// Replace the original with the temp file. On .NET Framework, File.Move throws if the
// destination already exists, so File.Replace is used (third argument: optional backup path).
File.Replace(temppath, sourcepath, null);
Hello
I've been working on a terminal-like application to get better at programming in C#, just something to help me learn. I've decided to add a feature that copies a file exactly as it is to a new file. It seems to work almost perfectly: when opened in Notepad++, the files are only a few lines apart in length, and very close in actual file size. However, the duplicated copy of the file never runs; it says the file is corrupt. I have a feeling the problem is in the methods I created for reading and rewriting binary data. The code is as follows. Thanks for the help, and sorry for the spaghetti code; I get a bit sloppy when I'm messing around with new ideas.
Class that handles the file copying/writing
using System;
using System.IO;
//using System.Collections.Generic;
namespace ConsoleFileExplorer
{
class FileTransfer
{
private BinaryWriter writer;
private BinaryReader reader;
private FileStream fsc; // file to be duplicated
private FileStream fsn; // new location of file
int[] fileData;
private string _file;
public FileTransfer(String file)
{
_file = file;
fsc = new FileStream(file, FileMode.Open);
reader = new BinaryReader(fsc);
}
// Reads all the original files data to an array of bytes
public byte[] ReadAllDataToArray()
{
byte[] bytes = reader.ReadBytes((int)fsc.Length); // reading bytes from the original file
return bytes;
}
// writes the array of original byte data to a new file
public void WriteDataFromArray(byte[] fileData, string path) // got a feeling this is the problem :p
{
fsn = new FileStream(path, FileMode.Create);
writer = new BinaryWriter(fsn);
int i = 0;
while(i < fileData.Length)
{
writer.Write(fileData[i]);
i++;
}
}
}
}
Code that interacts with this class.
(The Sleep(5000) calls are there because I was expecting an error on the first attempt.)
case '3':
Console.Write("Enter source file: ");
string sourceFile = Console.ReadLine();
if (sourceFile == "")
{
Console.Clear();
Console.ForegroundColor = ConsoleColor.DarkRed;
Console.Error.WriteLine("Must input a proper file path.\n");
Console.ForegroundColor = ConsoleColor.White;
Menu();
} else {
Console.WriteLine("Copying Data"); System.Threading.Thread.Sleep(5000);
FileTransfer trans = new FileTransfer(sourceFile);
//copying the original files data
byte[] data = trans.ReadAllDataToArray();
Console.Write("Enter Location to store data: ");
string newPath = Console.ReadLine();
// Just for me to make sure it doesnt exit if i forget
if(newPath == "")
{
Console.Clear();
Console.ForegroundColor = ConsoleColor.DarkRed;
Console.Error.WriteLine("Cannot have empty path.");
Console.ForegroundColor = ConsoleColor.White;
Menu();
} else
{
Console.WriteLine("Writing data to file"); System.Threading.Thread.Sleep(5000);
trans.WriteDataFromArray(data, newPath);
Console.WriteLine("File stored.");
Console.ReadLine();
Console.Clear();
Menu();
}
}
break;
[Screenshots: original file compared with new file]
You're not properly disposing the file streams and the binary writer. Both tend to buffer data (which is a good thing, especially when you're writing one byte at a time). Use using, and your problem should disappear. Unless somebody is editing the file while you're reading it, of course.
BinaryReader and BinaryWriter do not just write "raw data". They also add metadata as needed; they're designed for serialization and deserialization rather than for reading and writing bytes. In the particular case of ReadBytes and Write(byte[]), those really are just raw bytes, but there's not much point in using these classes just for that. Reading and writing bytes is what every Stream gives you, and that includes FileStream. There's no reason to use BinaryReader/BinaryWriter here whatsoever; the file streams give you everything you need.
A better approach would be to simply use
using (var fsn = ...)
{
fsn.Write(fileData, 0, fileData.Length);
}
or even just
File.WriteAllBytes(fileName, fileData);
Maybe you're thinking that writing a byte at a time is closer to "the metal", but that simply isn't the case. At no point during this does the CPU pass a byte at a time to the hard drive. Instead, the hard drive copies data directly from RAM, with no intervention from the CPU. And most hard drives still can't write (or read) arbitrary amounts of data from the physical media - instead, you're reading and writing whole sectors. If the system really did write a byte at a time, you'd just keep rewriting the same sector over and over again, just to write one more byte.
An even better approach would be to use the fact that you've got file streams open, and stream the files from source to destination rather than first reading everything into memory, and then writing it back to disk.
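A minimal sketch of that streaming copy, assuming .NET 4 or later (where Stream.CopyTo is available); the class and method names are hypothetical:

```csharp
using System.IO;

public static class FileCopier
{
    // Hypothetical helper: copies sourcePath to destinationPath in chunks.
    public static void CopyStreamed(string sourcePath, string destinationPath)
    {
        using (var source = File.OpenRead(sourcePath))
        using (var destination = File.Create(destinationPath))
        {
            // CopyTo reads and writes through a fixed-size buffer, so memory
            // use stays flat regardless of the file size.
            source.CopyTo(destination);
        } // both streams are flushed and closed here
    }
}
```

Because both streams sit in using blocks, the buffered data is flushed even if the copy throws partway through.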
There is a File.Copy() method in C#; you can see it here: https://msdn.microsoft.com/ru-ru/library/c6cfw35a(v=vs.110).aspx
If you want to implement it yourself, try placing a breakpoint inside your methods and stepping through with the debugger. It is like the story of the fisherman who was given a rod rather than a fish.
Also, look at the int[] fileData field and the byte[] fileData parameter of the last method; maybe this is the problem.
I am able to perform read/write/append operations on a text file stored in isolated storage in a WP7 application.
My scenario is that I am storing space-separated values in a text file inside isolated storage.
If I have to find a particular line that starts with some key, how do I overwrite
the value for that key without affecting the lines before and after it?
Example:
Key Value SomeOtherValue
*status read good
status1 unread bad
status2 null cantsay*
So if I have to change the whole second line based on some condition, keeping the same key:
status1 read good
How can I achieve this?
There are a number of ways you could do this, and the method you choose should be best suited to the size and complexity of the data file.
One option to get you started is to use the string.Replace() method. This is crude, but if your file is small then there is nothing wrong with it.
class Program
{
static void Main(string[] args)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine("*status read good");
sb.AppendLine("status1 unread bad");
sb.AppendLine("status2 null cantsay*");
string input = sb.ToString();
var startPos = input.IndexOf("status1");
var endPos = input.IndexOf(Environment.NewLine, startPos);
var modifiedInput = input.Replace(input.Substring(startPos, endPos - startPos), "status1 read good");
Console.WriteLine(modifiedInput);
Console.ReadKey();
}
}
If you store this information in text files then there won't be a way around replacing whole files. The following code does exactly this and might even be what you are doing right now.
// replace a given line in a given text file with a given replacement line
private void ReplaceLine(string fileName, int lineNrToBeReplaced, string newLine)
{
using (IsolatedStorageFile isf = IsolatedStorageFile.GetUserStoreForApplication())
{
// the memory writer will hold the read and modified lines
using (StreamWriter memWriter = new StreamWriter(new MemoryStream()))
{
// this is for reading lines from the source file
using (StreamReader fileReader = new StreamReader(new IsolatedStorageFileStream(fileName, System.IO.FileMode.Open, isf)))
{
int lineCount = 0;
// iterate file and read lines
while (!fileReader.EndOfStream)
{
string line = fileReader.ReadLine();
// check if this is the line which should be replaced; check is done by line
// number but could also be based on content
if (lineCount++ != lineNrToBeReplaced)
{
// just copy line from file
memWriter.WriteLine(line);
}
else
{
// replace line from file
memWriter.WriteLine(newLine);
}
}
}
memWriter.Flush();
memWriter.BaseStream.Position = 0;
// re-create file and save all lines from memory to this file
using (IsolatedStorageFileStream fileStream = new IsolatedStorageFileStream(fileName, System.IO.FileMode.Create, isf))
{
memWriter.BaseStream.CopyTo(fileStream);
}
}
}
}
private void button1_Click(object sender, RoutedEventArgs e)
{
ReplaceLine("test.txt", 1, "status1 read good");
}
And I agree with slugster: using SQLCE database might be a solution with better performance.
I have an application that crunches a bunch of text files. Currently, I have code like this (snipped-together excerpt):
FileInfo info = new FileInfo(...);
if (info.Length > 0) {
string content = getFileContents(...);
// uses a StreamReader
// returns reader.ReadToEnd();
Debug.Assert(!string.IsNullOrEmpty(content)); // FAIL
}
private string getFileContents(string filename)
{
TextReader reader = null;
string text = "";
try
{
reader = new StreamReader(filename);
text = reader.ReadToEnd();
}
catch (IOException e)
{
// File is concurrently accessed. Come back later.
text = "";
}
finally
{
if (reader != null)
{
reader.Close();
}
}
return text;
}
Why am I getting a failed assert? The FileInfo.Length property was already used to validate that the file is non-empty.
Edit: This appears to be a bug in my code: I'm catching IO exceptions and returning an empty string. But, because of the discussion around fileInfo.Length, here's something interesting: fileInfo.Length returns 2 for an empty text file that contains only a BOM (created in Notepad).
You might have a file which is empty apart from a byte-order mark. I think TextReader.ReadToEnd() would remove the byte-order mark, giving you an empty string.
Alternatively, the file could have been truncated between checking the length and reading it.
For diagnostic purposes, I suggest you log the file length when you get an empty string.
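The byte-order-mark case can be reproduced with a small sketch. The names are hypothetical, and it uses the 3-byte UTF-8 preamble rather than the 2-byte one Notepad produced in the question:

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Writes a file containing only a UTF-8 byte-order mark, then returns the
    // length on disk and the number of characters a StreamReader reads back.
    public static Tuple<long, int> Measure(string path)
    {
        File.WriteAllBytes(path, Encoding.UTF8.GetPreamble());
        long bytesOnDisk = new FileInfo(path).Length;
        string text;
        using (var reader = new StreamReader(path))
        {
            // The reader detects the preamble and does not return it as text.
            text = reader.ReadToEnd();
        }
        return Tuple.Create(bytesOnDisk, text.Length);
    }
}
```

For such a file, the first value is 3 and the second is 0, so a Length > 0 check does not guarantee that ReadToEnd returns a non-empty string.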
See that catch (IOException) block you have? That's what returns an empty string and triggers the assert even when the file is not empty.
If I remember correctly, a file ends with an end-of-file marker, which won't be included when you call ReadToEnd.
In that case the file size would not be 0, but its content size would be.
What's in the getFileContents method?
It may be repositioning the stream's pointer to the end of the stream before ReadToEnd() is called.
I have a text file that contains about 100000 articles.
The structure of file is:
.Document ID 42944-YEAR:5
.Date 03\08\11
.Cat political
Article Content 1
.Document ID 42945-YEAR:5
.Date 03\08\11
.Cat political
Article Content 2
I want to open this file in c# for processing it line by line.
I tried this code:
String[] FileLines = File.ReadAllText(
TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray());
But it says:
Exception of type
'System.OutOfMemoryException' was
thrown.
The question is: how can I open this file and read it line by line?
File Size: 564 MB (591,886,626 bytes)
File Encoding: UTF-8
File contains Unicode characters.
You can open the file and read it as a stream rather than loading everything into memory all at once.
From MSDN:
using System;
using System.IO;
class Test
{
public static void Main()
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
String line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
Console.WriteLine(line);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
}
Your file is too large to be read into memory in one go, as File.ReadAllText is trying to do. You should instead read the file line by line.
Adapted from MSDN:
string line;
// Read the file and display it line by line.
using (StreamReader file = new StreamReader(@"c:\yourfile.txt"))
{
while ((line = file.ReadLine()) != null)
{
Console.WriteLine(line);
// do your processing on each line here
}
}
In this way, no more than a single line of the file is in memory at any one time.
If you are using .NET Framework 4, there is a new static method on System.IO.File called ReadLines that returns an IEnumerable of string. I believe it was added to the framework for this exact scenario; however, I have yet to use it myself.
MSDN Documentation - File.ReadLines Method (String)
Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0
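A minimal sketch of how File.ReadLines might be used on the file from the question, assuming .NET Framework 4 or later. The class and method names are hypothetical, and counting ".Document ID" lines simply stands in for whatever per-line processing is needed:

```csharp
using System;
using System.IO;

public static class ArticleCounter
{
    // Hypothetical helper: counts the lines that start an article.
    // File.ReadLines yields lines lazily, so the file is never fully in memory.
    public static int CountDocuments(string path)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path))
        {
            if (line.StartsWith(".Document ID", StringComparison.Ordinal))
            {
                count++;
            }
        }
        return count;
    }
}
```

Unlike File.ReadAllLines, which builds the whole array before returning, the enumeration here pulls one line at a time from the underlying reader.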
Something like this:
using (var reader = File.OpenText(@"path to file"))
{
    string fileLine;
    // ReadLine returns null at end of file, which also covers an empty file
    while ((fileLine = reader.ReadLine()) != null)
    {
        // process fileLine here
    }
}