Why is my filestream not writing correctly - c#

I am trying to modify a file in place through a file stream, as the file has the potential to be very large and I don't want to load it into memory. The piece of information I'm editing will always be the same length, so in theory I can just swap the content out using a stream, but it doesn't seem to be writing to the correct place.
I have written code that uses a StreamReader to read line by line until it finds a regex match and then attempts to overwrite the bytes with the edited line. The code is as follows:
private void UpdateFile(string newValue, string path, string pattern)
{
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
int index = 0;
string line = "";
using (var fileStream = File.OpenRead(path))
using (var streamReader = new StreamReader(fileStream, Encoding.Default, true, 128))
{
while ((line = streamReader.ReadLine()) != null)
{
if (regex.Match(line).Success)
{
break;
}
index += Encoding.Default.GetBytes(line).Length;
}
}
if (line != null)
{
using (Stream stream = File.Open(path, FileMode.Open))
{
stream.Position = index + 1;
var newLine = regex.Replace(line, newValue);
var oldBytes = Encoding.Default.GetBytes(line);
var newBytes = Encoding.Default.GetBytes("\n" + newLine);
stream.Write(newBytes, 0, newBytes.Length);
}
}
}
The code almost works as expected, it inserts the updated line but it always does it a little early, just how early varies slightly based on the file I'm editing. I expect it is something to do with the way I am managing the stream position but I don't know the correct way to approach this.
Unfortunately the exact files I'm working on are under NDA.
The structure is as follows though:
A file will have an unknown amount of data followed by a line of a known format, for example:
Description: ABCDEF
I know the portion that follows "Description: " will always be 6 characters, so I do a replace on the line to replace with, for example, UVWXYZ.
The problem is that for example if a file read as
'...
UNIMPORTANT UNKNOWN DATA
DESCRIPTION: ABCDEF
MORE DATA
...'
it will come out as something like
'...
UNIMPORTANT UNKNOWN DDESCRIPTION: UVWXYZDEF
MORE DATA
...'

I think the problem is that you are not accounting for the line feed ("\n") at the end of each line you read, so your index sets the stream position incorrectly. Try the following code:
private void UpdateFile(string newValue, string path, string pattern)
{
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
int index = 0;
string line = "";
using (var fileStream = File.OpenRead(path))
using (var streamReader = new StreamReader(fileStream, Encoding.Default, true, 128))
{
while ((line = streamReader.ReadLine()) != null)
{
if (regex.Match(line).Success)
{
break;
}
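// count the line feed that ReadLine() strips; assumes "\n" line endings (a "\r\n" file needs one more byte per line)
index += Encoding.Default.GetBytes(line + "\n").Length;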
}
}
if (line != null)
{
using (Stream stream = File.Open(path, FileMode.Open))
{
stream.Position = index;
var newBytes = Encoding.Default.GetBytes(regex.Replace(line + "\n", newValue));
stream.Write(newBytes, 0, newBytes.Length);
}
}
}
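The offset bookkeeping can also be done on raw bytes, which avoids guessing how many bytes each line terminator occupied. The following is only a sketch under stated assumptions: the class and method names (InPlacePatch, FindLineOffset, ReplaceInPlace) are hypothetical, the file is assumed to use an ASCII-compatible encoding in which a line feed is the single byte 0x0A (so not UTF-16), and the replacement is assumed to have the same byte length as the original line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class InPlacePatch
{
    // Scans the file byte by byte and returns the byte offset at which the first
    // line matching the regex starts, or -1 if no line matches.
    // Assumption: '\n' is encoded as the single byte 0x0A.
    public static long FindLineOffset(string path, Regex regex, Encoding encoding, out string matchedLine)
    {
        using (var stream = new BufferedStream(File.OpenRead(path)))
        {
            var lineBytes = new List<byte>();
            long lineStart = 0, position = 0;
            int b;
            while ((b = stream.ReadByte()) >= 0)
            {
                position++;
                if (b != '\n') { lineBytes.Add((byte)b); continue; }
                // End of a line: decode it and drop a trailing '\r' if present
                string line = encoding.GetString(lineBytes.ToArray()).TrimEnd('\r');
                if (regex.IsMatch(line)) { matchedLine = line; return lineStart; }
                lineBytes.Clear();
                lineStart = position; // the next line starts right after the '\n'
            }
            // Handle a final line that has no trailing newline
            string last = encoding.GetString(lineBytes.ToArray()).TrimEnd('\r');
            if (lineBytes.Count > 0 && regex.IsMatch(last)) { matchedLine = last; return lineStart; }
        }
        matchedLine = null;
        return -1;
    }

    // Overwrites the matching line in place. Returns false if no line matched.
    public static bool ReplaceInPlace(string path, string pattern, string newValue, Encoding encoding)
    {
        var regex = new Regex(pattern, RegexOptions.IgnoreCase);
        long offset = FindLineOffset(path, regex, encoding, out string line);
        if (offset < 0) return false;

        byte[] oldBytes = encoding.GetBytes(line);
        byte[] newBytes = encoding.GetBytes(regex.Replace(line, newValue));
        if (newBytes.Length != oldBytes.Length)
            throw new InvalidOperationException("Replacement must have the same byte length.");

        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            stream.Position = offset; // exact start of the matched line
            stream.Write(newBytes, 0, newBytes.Length);
        }
        return true;
    }
}
```

A call such as InPlacePatch.ReplaceInPlace(path, "(?<=Description: )\\w{6}", "UVWXYZ", Encoding.ASCII) would then overwrite only the six characters after the label, leaving the line terminator untouched.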

In your example you are off by four characters. Not quite the common off-by-one error, but close. But maybe a different pattern would help the most?
Programs nowadays rarely work directly on the file like that. There is just too much that can go wrong, all the way up to a power loss mid-process. Instead they:
1) Create an empty new file in the same location, often with a temporary name and hidden.
2) Write the output to the new file.
3) Once you are done and everything is good (all the caches are flushed and everything is on the disk, which Stream.Close() or Dispose() takes care of), replace the old file with the new file using the OS move operation.
The advantage is that the original data is never at risk. Even if the computer loses power mid-operation, at most the temporary file is corrupted. You still have the original file, and you can just delete the temporary file and restart the work from scratch if you need to. Indeed, recovery only makes sense in rare cases (word processors).
The replacement of the old file by the new file is done with a move operation. If both are on the same partition, that is literally just a rename in the file system. Modern file systems are designed much like robust, top-of-the-line relational databases, so there is little danger in this step.
You can find that pattern in everything from your word processor of choice to backup programs, the download manager of Firefox (as you might be overwriting a file that was there before) and even zip programs. Every time you have a long writing phase and want to minimize the danger, it is the go-to pattern.
And because you stream from one file into the other, you never have to reposition within a file, so it gets around your issue too.
Edit: I made some source code for it from memory/documentation. Might contain syntax errors
// sourcePath: the source file path, set by other code
// tempPath: the path of the temp file; should be in the same folder, and thus on the same partition
static void RewriteFile(string sourcePath, string tempPath)
{
    // Open both files; the two using statements share one block.
    // FileOptions.WriteThrough could be used on the output to bypass OS write caching,
    // but it is optional and detrimental to performance.
    using (var reader = new StreamReader(sourcePath))
    using (var writer = new StreamWriter(tempPath))
    {
        string line;
        // iterate over the input
        while ((line = reader.ReadLine()) != null)
        {
            // do processing on line here
            writer.WriteLine(line); // note: writes Environment.NewLine after every line
        }
    }
    // Replace the original with the temp file. File.Move(temp, source) throws if the
    // destination already exists (unless the overwrite overload is available), so
    // File.Replace is used here; null means no backup copy is kept.
    File.Replace(tempPath, sourcePath, null);
}

Related

StreamWriter adds extra character(s) on new line(s) at the end of file

I'm trying to modify an .ini file, in C# with .NET 5.0, using FileStream and StreamReader / StreamWriter. I just need to modify the first line of the file so I read the entire file into a list of strings called strList, modify the first line, and then write it all back to the same file.
List<string> strList = new List<string>();
using (FileStream fs = File.OpenRead(@"C:\MyFolder\test.ini"))
{
using (StreamReader sr = new StreamReader(fs))
{
while (!sr.EndOfStream)
{
strList.Add(sr.ReadLine());
}
}
}
strList[0] = "test01";
using (FileStream fs = File.OpenWrite(@"C:\MyFolder\test.ini"))
{
using (StreamWriter sw = new StreamWriter(fs))
{
for (int x = 0; x < strList.Count; x++)
{
sw.WriteLine(strList[x]);
}
}
}
The issue I'm running into is that I'll have new character(s) at the end of my file on new line(s). I verified that the number of lines I read from the file matches what is in the file and that the for loop only writes that same number of lines back into the file. I don't have any issues writing other strings except for "test01". This string is the only one that causes the issue that I just described. It seems to be grabbing characters from the last line like R or LAYER from MULTI_LAYER.
Ex 1: This
S10087_U1
Cq4InEq=TRUE
XtrVer=5.5
IOCUPDATEMDB=TRUE
ARCHITECTURE=MULTI_LAYER
Becomes this
test01
Cq4InEq=TRUE
XtrVer=5.5
IOCUPDATEMDB=TRUE
ARCHITECTURE=MULTI_LAYER
R
Ex 2: This
test01 - Copy
Cq4InEq=TRUE
XtrVer=5.5
IOCUPDATEMDB=TRUE
ARCHITECTURE=MULTI_LAYER
ER
Becomes this
test01
Cq4InEq=TRUE
XtrVer=5.5
IOCUPDATEMDB=TRUE
ARCHITECTURE=MULTI_LAYER
LAYER
Replacing the StreamWriter portion with the following seems to fix the issue but I'm trying to figure out why using StreamWriter doesn't work as I expect it to.
File.WriteAllLines(@"C:\MyFolder\test.ini", strList);
This is because you're using File.OpenWrite. From the remarks in the documentation:
The OpenWrite method opens a file if one already exists for the file path, or creates a new file if one does not exist. For an existing file, it does not append the new text to the existing text. Instead, it overwrites the existing characters with the new characters. If you overwrite a longer string (such as "This is a test of the OpenWrite method") with a shorter string (such as "Second run"), the file will contain a mix of the strings ("Second runtest of the OpenWrite method").
While you could just change your code to use File.Create instead, I'd suggest changing the code more significantly - not just the writing, but the reading too:
string path = @"C:\MyFolder\test.ini";
var lines = File.ReadAllLines(path);
lines[0] = "test01";
File.WriteAllLines(path, lines);
That's much simpler code to do the same thing.
The half-way house between the two would be to use File.OpenText (to return a StreamReader) and File.CreateText (to return a StreamWriter). There's no need to do the wrapping yourself.
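As a rough illustration of that half-way house, the sketch below reads with File.OpenText and writes with File.CreateText. The class and method names (IniEditor, ReplaceFirstLine) are made up for this example. Because File.CreateText replaces the existing contents instead of overwriting in place, no stale characters from the old file are left behind.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class IniEditor
{
    // Replaces the first line of a text file (hypothetical helper for illustration).
    public static void ReplaceFirstLine(string path, string newFirstLine)
    {
        var lines = new List<string>();

        // File.OpenText returns a StreamReader; no manual FileStream wrapping needed
        using (StreamReader reader = File.OpenText(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }

        if (lines.Count == 0)
            lines.Add(newFirstLine);
        else
            lines[0] = newFirstLine;

        // File.CreateText returns a StreamWriter and discards the old contents,
        // unlike File.OpenWrite, which leaves any bytes beyond what is written
        using (StreamWriter writer = File.CreateText(path))
        {
            foreach (string line in lines)
                writer.WriteLine(line);
        }
    }
}
```

Calling IniEditor.ReplaceFirstLine(path, "test01") on the sample file from the question should leave the remaining lines unchanged, with nothing trailing after the last one.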

C# - Read bytes from file from a specific string

I'm trying to parse a crg-file in C#. The file is mixed with plain text and binary data. The first section of the file contains plain text while the rest of the file is binary (lots of floats), here's an example:
$
$ROAD_CRG
reference_line_start_u = 100
reference_line_end_u = 120
$
$KD_DEFINITION
#:KRBI
U:reference line u,m,730.000,0.010
D:reference line phi,rad
D:long section 1,m
D:long section 2,m
D:long section 3,m
...
$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
�#z����RA����\�l
...
I know I can read bytes starting at a specific offset but how do I find out which byte to start from? The last row before the binary section will always contain at least four dollar signs "$$$$". Here's what I've got so far:
using var fs = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read);
var startByte = ??; // How to find out where to start?
using (BinaryReader reader = new BinaryReader(fs))
{
reader.BaseStream.Seek(startByte, SeekOrigin.Begin);
var f = reader.ReadSingle();
Debug.WriteLine(f);
}
When you have a mixture of text data and binary data, you need to treat everything as binary. This means you should be using raw Stream access, or something similar, and using binary APIs to look through the text data (often looking for CR/LF/CRLF bytes as sentinels, although it sounds like in your case you could just look for the $$$$ using binary APIs, then decode the entire block before it, and scan forwards). When you think you have an entire line, you can use Encoding to parse each line, the most convenient API being encoding.GetString(). When you've finished looking through the text data as binary, you can continue parsing the binary data, again using the binary API. I would usually recommend against BinaryReader here too, because frankly it doesn't gain you much over a more direct API. The other problem you might want to think about is CPU endianness, but assuming that isn't a problem: BitConverter.ToSingle() may be your friend.
If the data is modest in size, you may find it easiest to use byte[] for the data; either via File.ReadAllBytes, or by renting an oversized byte[] from the array-pool, and loading it from a FileStream. The Stream API is awkward for this kind of scenario, because once you've looked at data: it has gone - so you need to maintain your own back-buffers. The pipelines API is ideal for this, when dealing with large data, but is an advanced topic.
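For the modest-size case, the byte[] approach could look roughly like the sketch below. The names CrgReader, FindBinaryStart and ReadFloats are hypothetical, and it assumes the header is ASCII, the binary section begins right after the line feed that ends the "$$$$" line, and the floats are stored in the machine's own byte order (BitConverter does no endianness conversion).

```csharp
using System;
using System.IO;
using System.Text;

static class CrgReader
{
    // Returns the index of the first byte after the line containing the marker,
    // or -1 if the marker or the following line feed is not found.
    public static int FindBinaryStart(byte[] data, string marker = "$$$$")
    {
        byte[] markerBytes = Encoding.ASCII.GetBytes(marker);
        for (int i = 0; i + markerBytes.Length <= data.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < markerBytes.Length; j++)
            {
                if (data[i + j] != markerBytes[j]) { match = false; break; }
            }
            if (!match) continue;

            // Marker found: skip to the end of that line
            int newline = Array.IndexOf(data, (byte)'\n', i + markerBytes.Length);
            return newline < 0 ? -1 : newline + 1;
        }
        return -1;
    }

    // Decodes consecutive 4-byte floats starting at the given index.
    // Assumption: the file's byte order matches BitConverter.IsLittleEndian.
    public static float[] ReadFloats(byte[] data, int start)
    {
        int count = (data.Length - start) / sizeof(float);
        var result = new float[count];
        for (int k = 0; k < count; k++)
            result[k] = BitConverter.ToSingle(data, start + k * sizeof(float));
        return result;
    }
}
```

Typical use would be to load the file with File.ReadAllBytes, call FindBinaryStart, and pass the returned index to ReadFloats. If the file's byte order differs from the machine's, each 4-byte group would need to be reversed before conversion.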
UPDATE: This code may not work as expected. Please review the valuable information in the comments.
using (var fs = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read))
{
using (StreamReader sr = new StreamReader(fs, Encoding.ASCII, true, 1, true))
{
var line = sr.ReadLine();
while (!string.IsNullOrWhiteSpace(line) && !line.Contains("$$$$"))
{
line = sr.ReadLine();
}
}
using (BinaryReader reader = new BinaryReader(fs))
{
// TODO: Start reading the binary data
}
}
Solution
I know this is far from the most optimized solution but in my case it did the trick and since the plain text section of the file was known to be fairly small this didn't cause any noticeable performance issues. Here's the code:
using var fileStream = new FileStream(@"crg_sample.crg", FileMode.Open, FileAccess.Read);
using var reader = new BinaryReader(fileStream);
var newLine = '\n';
var markerString = "$$$$";
var currentString = "";
var foundMarker = false;
var foundNewLine = false;
while (!foundNewLine)
{
var c = reader.ReadChar();
if (!foundMarker)
{
currentString += c;
if (currentString.Length > markerString.Length)
currentString = currentString.Substring(1);
if (currentString == markerString)
foundMarker = true;
}
else
{
if (c == newLine)
foundNewLine = true;
}
}
if (foundNewLine)
{
// Read binary
}
Note: If you're dealing with larger or more complex files you should probably take a look at Marc Gravell's answer and the comment sections.

In C#, How can I copy a file with arbitrary encoding, reading line by line, without adding or deleting a newline

I need to be able to take a text file with unknown encoding (e.g., UTF-8, UTF-16, ...) and copy it line by line, making specific changes as I go. In this example, I am changing the encoding, however there are other uses for this kind of processing.
What I can't figure out is how to determine if the last line has a newline! Some programs care about the difference between a file with these records:
Rec1<newline>
Rec2<newline>
And a file with these:
Rec1<newline>
Rec2
How can I tell the difference in my code so that I can take appropriate action?
using (StreamReader reader = new StreamReader(sourcePath))
using (StreamWriter writer = new StreamWriter(destinationPath, false, outputEncoding))
{
bool isFirstLine = true;
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
if (isFirstLine)
{
writer.Write(line);
isFirstLine = false;
}
else
{
writer.Write("\r\n" + line);
}
}
//if (LastLineHasNewline)
//{
// writer.Write("\n");
//}
writer.Flush();
}
The commented out code is what I want to be able to do, but I can't figure out how to compute the condition LastLineHasNewline! Remember, I have no a priori knowledge of the input file encoding.
"Remember, I have no a priori knowledge of the input file encoding."
That's the fundamental problem to solve.
If the file could be using any encoding, then there is no concept of reading "line by line" as you can't possibly tell what the line ending is.
I suggest you first address this part, and the rest will be easy. Now, without knowing the context it's hard to say whether that means you should be asking the user for the encoding, or detecting it heuristically, or something else - but I wouldn't start trying to use the data before you can fully understand it.
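One of the heuristic options mentioned above is byte-order-mark detection, which StreamReader can do on its own. The sketch below is only a partial heuristic: the names EncodingSniffer and DetectFromBom are invented for illustration, and a file without a BOM simply comes back as the fallback encoding you pass in, which may well be wrong.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingSniffer
{
    // Returns the encoding indicated by the file's byte order mark,
    // or the supplied fallback when no BOM is present.
    public static Encoding DetectFromBom(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            // Detection only happens on the first read, so read one character
            reader.Read();
            return reader.CurrentEncoding;
        }
    }
}
```

Once the encoding is known, the bytes of a line terminator in that encoding are well defined, and checks like the end-of-file newline test become straightforward.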
As often happens, the moment you go to ask for help, the answer comes to the surface. The commented out code becomes:
if (LastLineHasNewline(reader))
{
writer.Write("\n");
}
And the function looks like this:
private static bool LastLineHasNewline(StreamReader reader)
{
byte[] newlineBytes = reader.CurrentEncoding.GetBytes("\n");
int newlineByteCount = newlineBytes.Length;
reader.BaseStream.Seek(-newlineByteCount, SeekOrigin.End);
byte[] inputBytes = new byte[newlineByteCount];
reader.BaseStream.Read(inputBytes, 0, newlineByteCount);
for (int i = 0; i < newlineByteCount; i++)
{
if (newlineBytes[i] != inputBytes[i])
return false;
}
return true;
}

Reading of particular line of text file in WP7 based on some starting string and replacing (overwriting) it

I am able to perform read/write/append operations on a text file stored in isolated storage in a WP7 application.
My scenario is that I am storing space-separated values in a text file inside isolated storage.
So if I need to find a particular line that starts with some key, how do I overwrite
the value for that key without affecting the lines before and after it?
Example:
Key Value SomeOtherValue
*status read good
status1 unread bad
status2 null cantsay*
So if I have to change the whole second line based on some condition, keeping the same key:
status1 read good
How can I achieve this?
There are a number of ways you could do this, and the method you choose should be best suited to the size and complexity of the data file.
One option to get you started is to use the string.Replace() method. This is crude, but if your file is small then there is nothing wrong with it.
class Program
{
static void Main(string[] args)
{
StringBuilder sb = new StringBuilder();
sb.AppendLine("*status read good");
sb.AppendLine("status1 unread bad");
sb.AppendLine("status2 null cantsay*");
string input = sb.ToString();
var startPos = input.IndexOf("status1");
var endPos = input.IndexOf(Environment.NewLine, startPos);
var modifiedInput = input.Replace(input.Substring(startPos, endPos - startPos), "status1 read good");
Console.WriteLine(modifiedInput);
Console.ReadKey();
}
}
If you store this information in text files then there won't be a way around replacing whole files. The following code does exactly this and might even be what you are doing right now.
// replace a given line in a given text file with a given replacement line
private void ReplaceLine(string fileName, int lineNrToBeReplaced, string newLine)
{
using (IsolatedStorageFile isf = IsolatedStorageFile.GetUserStoreForApplication())
{
// the memory writer will hold the read and modified lines
using (StreamWriter memWriter = new StreamWriter(new MemoryStream()))
{
// this is for reading lines from the source file
using (StreamReader fileReader = new StreamReader(new IsolatedStorageFileStream(fileName, System.IO.FileMode.Open, isf)))
{
int lineCount = 0;
// iterate file and read lines
while (!fileReader.EndOfStream)
{
string line = fileReader.ReadLine();
// check if this is the line which should be replaced; check is done by line
// number but could also be based on content
if (lineCount++ != lineNrToBeReplaced)
{
// just copy line from file
memWriter.WriteLine(line);
}
else
{
// replace line from file
memWriter.WriteLine(newLine);
}
}
}
memWriter.Flush();
memWriter.BaseStream.Position = 0;
// re-create file and save all lines from memory to this file
using (IsolatedStorageFileStream fileStream = new IsolatedStorageFileStream(fileName, System.IO.FileMode.Create, isf))
{
memWriter.BaseStream.CopyTo(fileStream);
}
}
}
}
private void button1_Click(object sender, RoutedEventArgs e)
{
ReplaceLine("test.txt", 1, "status1 read good");
}
And I agree with slugster: using SQLCE database might be a solution with better performance.

How do I locate a particular word in a text file using .NET

I am sending emails (in ASP.NET, C#) using a template stored in a text file (.txt) like the one below
User Name :<User Name>
Address : <Address>.
I used to replace the words within the angle brackets in the text file using the below code
StreamReader sr;
sr = File.OpenText(HttpContext.Current.Server.MapPath(txt));
copy = sr.ReadToEnd();
sr.Close(); //close the reader
copy = copy.Replace(word.ToUpper(),"#" + word.ToUpper()); //remove the word specified UC
//save new copy into existing text file
FileInfo newText = new FileInfo(HttpContext.Current.Server.MapPath(txt));
StreamWriter newCopy = newText.CreateText();
newCopy.WriteLine(copy);
newCopy.Write(newCopy.NewLine);
newCopy.Close();
Now I have a new problem,
the user will be adding new words within angle brackets; for example, they will be adding <Salary>.
In that case I have to read the file and find the word <Salary>.
In other words, I have to find all the words that are enclosed in angle brackets (<>).
How do I do that?
Having a stream for your file, you can build something similar to a typical tokenizer.
In general terms, this works as a finite state machine: you need an enumeration for the states (in this case could be simplified down to a boolean, but I'll give you the general approach so you can reuse it on similar tasks); and a function implementing the logic. C#'s iterators are quite a fit for this problem, so I'll be using them on the snippet below. Your function will take the stream as an argument, will use an enumerated value and a char buffer internally, and will yield the strings one by one. You'll need this near the start of your code file:
using System.Collections.Generic;
using System.IO;
using System.Text;
And then, inside your class, something like this:
enum States {
OUT,
IN,
}
IEnumerable<string> GetStrings(TextReader reader) {
States state=States.OUT;
StringBuilder buffer=new StringBuilder(); // initialized up front so the compiler sees it as definitely assigned
int ch;
while((ch=reader.Read())>=0) {
switch(state) {
case States.OUT:
if(ch=='<') {
state=States.IN;
buffer=new StringBuilder();
}
break;
case States.IN:
if(ch=='>') {
state=States.OUT;
yield return buffer.ToString();
} else {
buffer.Append((char)ch); // Read() returns one UTF-16 code unit; a plain cast also handles surrogates
}
break;
}
}
}
The finite-state machine model always has the same layout: while(READ_INPUT) { switch(STATE) {...}}: inside each case of the switch, you may be producing output and/or altering the state. Beyond that, the algorithm is defined in terms of states and state changes: for any given state and input combination, there is an exact new state and output combination (the output can be "nothing" on those states that trigger no output; and the state may be the same old state if no state change is triggered).
Hope this helps.
EDIT: forgot to mention a couple of things:
1) You get a TextReader to pass to the function by creating a StreamReader for a file, or a StringReader if you already have the file on a string.
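2) The time cost of this approach is O(n), with n being the length of the file. Memory use is bounded by the longest bracketed string being buffered, since the input is streamed one character at a time. Both seem quite reasonable for this kind of task.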
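To show how the reader is passed in, here is a compact, self-contained variant of the same state machine with the two states reduced to a boolean, as suggested above. The names TemplateScanner and GetPlaceholders are hypothetical and used only for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TemplateScanner
{
    // Yields every string found between '<' and '>' in the input.
    public static IEnumerable<string> GetPlaceholders(TextReader reader)
    {
        bool inside = false;              // false = outside brackets, true = inside
        var buffer = new StringBuilder();
        int ch;
        while ((ch = reader.Read()) >= 0)
        {
            if (!inside)
            {
                if (ch == '<') { inside = true; buffer.Clear(); }
            }
            else if (ch == '>')
            {
                inside = false;
                yield return buffer.ToString();
            }
            else
            {
                buffer.Append((char)ch);
            }
        }
    }
}
```

For a file you would pass File.OpenText(path) inside a using block; for text already in memory, something like new StringReader("User Name :<User Name>\nAddress : <Address>.") works, and iterating the result yields the bracketed names in order.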
Using regex.
var matches = Regex.Matches(text, "<(.*?)>");
List<string> words = new List<string>();
for (int i = 0; i < matches.Count; i++)
{
words.Add(matches[i].Groups[1].Value);
}
Of course, this assumes you already have the file's text in a variable. Since you have to read the entire file to achieve that, you could look for the words as you are reading the stream, but I don't know what the performance trade off would be.
This is not an answer, but comments can't do this:
You should place some of your objects into using blocks. Something like this:
using(StreamReader sr = File.OpenText(HttpContext.Current.Server.MapPath(txt)))
{
copy = sr.ReadToEnd();
} // reader is closed by the end of the using block
//remove the word specified UC
copy = copy.Replace(word.ToUpper(), "#" + word.ToUpper());
//save new copy into existing text file
FileInfo newText = new FileInfo(HttpContext.Current.Server.MapPath(txt));
using(var newCopy = newText.CreateText())
{
newCopy.WriteLine(copy);
newCopy.Write(newCopy.NewLine);
}
The using block ensures that resources are cleaned up even if an exception is thrown.
