I created a class responsible for generating a text file in which each line represents one object of the MyDataClass class. Below is a simplified version of my code:
public class Generator
{
    private readonly Stream _stream;
    private readonly StreamWriter _streamWriter;
    private readonly List<MyDataClass> _items;
    public Generator(Stream stream)
    {
        _stream = stream;
        _streamWriter = new StreamWriter(_stream, Encoding.GetEncoding("ISO-8859-1"));
    }
    public void Generate()
    {
        foreach (var item in _items)
        {
            var line = AnotherClass.GetLineFrom(item);
            _streamWriter.WriteLine(line);
        }
        _streamWriter.Flush();
        _stream.Position = 0;
    }
}
And I call this class like this:
using (var file = new FileStream("name", FileMode.OpenOrCreate, FileAccess.ReadWrite))
{
    new Generator(file).Generate();
}
When I run the application from Visual Studio (I tested Run (Ctrl+F5) and Debug (F5), in both Debug and Release configurations), everything works as planned. But when I publish the application to an IIS server, StreamWriter puts an extra \r before the end of each line.
Here are hex dumps of both generated files:
Running in Visual Studio:
http://www.jonataspiazzi.xpg.com.br/hex_vs.bmp
Running in IIS:
http://www.jonataspiazzi.xpg.com.br/hex_iis.bmp
Some things I already checked:
I wrote the line variable (from var line = AnotherClass.GetLineFrom(item);) to a log to see whether AnotherClass includes an extra '\r'.
That turned up nothing: the last character in line is a regular character, as expected (a space in the example above).
I wrote a separate test to see whether the problem affects every StreamWriter instance under IIS.
I tried this:
var ms = new MemoryStream();
var sw = new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1"));
sw.WriteLine("Test");
sw.WriteLine("Of");
sw.WriteLine("Lines");
sw.Flush();
ms.Position = 0;
In this case the code works correctly in both Visual Studio and IIS.
I've been stuck on this for 3 days and have tried everything I can think of. Does anyone have any clue what else I can try?
UPDATE
It gets weirder! I tried replacing the line _streamWriter.WriteLine(line); with:
_streamWriter.Write(line + Environment.NewLine);
And even worse:
_streamWriter.Write(line + "\r\n");
Both keep generating the extra \r character.
I then tried replacing it with this:
_streamWriter.Write(line + "#\r\n#");
And got:
http://www.jonataspiazzi.xpg.com.br/hex_sharp.bmp
According to MSDN, WriteLine:
Writes data followed by a line terminator to the text string or stream.
So your last line should be written with:
_streamWriter.Write(line);
Put it outside your loop and change the loop so it doesn't handle the last line.
My guess is that the extra \r is added during FTP (maybe try a binary transfer)
Like here
I've tested the code, and the extra \r is not caused by the code in the question.
I had a similar issue: Environment.NewLine and WriteLine gave me an extra \r character. The following worked for me:
StringBuilder sbFileContent = new StringBuilder();
sbFileContent.Append(line);
sbFileContent.Append("\n");
streamWriter.Write(sbFileContent.ToString());
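If the goal is a fixed line terminator no matter where the code runs, another option is to set the writer's NewLine property once instead of appending "\n" by hand. Below is a minimal sketch of that idea, not an explanation of where the extra \r comes from. LineEndingDemo and its two methods are hypothetical names, and a MemoryStream stands in for the real file. It also dumps the bytes as hex, so the terminator can be checked right after writing and before any file transfer.

```csharp
using System;
using System.IO;
using System.Text;

public static class LineEndingDemo
{
    // Writes every line followed by the given terminator and returns the raw bytes.
    public static byte[] WriteLines(string[] lines, string newLine)
    {
        using (var ms = new MemoryStream())
        {
            // Same encoding as in the question.
            using (var sw = new StreamWriter(ms, Encoding.GetEncoding("ISO-8859-1")))
            {
                sw.NewLine = newLine; // WriteLine now appends exactly this string
                foreach (string line in lines)
                {
                    sw.WriteLine(line);
                }
            } // disposing the writer flushes it into the MemoryStream
            return ms.ToArray();
        }
    }

    // Formats bytes as hex pairs separated by dashes, e.g. "41-0A".
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }
}
```

For example, LineEndingDemo.ToHex(LineEndingDemo.WriteLines(new[] { "A", "B" }, "\n")) gives 41-0A-42-0A, while passing "\r\n" gives 41-0D-0A-42-0D-0A. If the bytes are right at this point but wrong on the server, the change is probably happening after the write.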
I just now had a similar problem, where the code below would randomly insert blank lines in the output file (outFile):
using (StreamWriter outFile = new StreamWriter(outFilePath, true)) {
    foreach (string line in File.ReadLines(logPath)) {
        string concatLine = parse(line, out bool shouldWrite);
        if (shouldWrite) {
            outFile.WriteLine(concatLine);
        }
    }
}
Using Antar's idea, I changed my parse function so that it returns a line with Environment.NewLine appended, i.e.
return myStringBuilder.Append(Environment.NewLine).ToString();
and then in the foreach loop above, changed the
outFile.WriteLine(concatLine);
to
outFile.Write(concatLine);
and now it writes the file without a bunch of random new lines inserted. However, I still have absolutely no idea why I should have to do this.
I'm trying to modify an .ini file, in C# with .NET 5.0, using FileStream and StreamReader / StreamWriter. I just need to modify the first line of the file so I read the entire file into a list of strings called strList, modify the first line, and then write it all back to the same file.
List<string> strList = new List<string>();
using (FileStream fs = File.OpenRead(@"C:\MyFolder\test.ini"))
{
    using (StreamReader sr = new StreamReader(fs))
    {
        while (!sr.EndOfStream)
        {
            strList.Add(sr.ReadLine());
        }
    }
}
strList[0] = "test01";
using (FileStream fs = File.OpenWrite(@"C:\MyFolder\test.ini"))
{
    using (StreamWriter sw = new StreamWriter(fs))
    {
        for (int x = 0; x < strList.Count; x++)
        {
            sw.WriteLine(strList[x]);
        }
    }
}
The issue I'm running into is that I'll have new character(s) at the end of my file on new line(s). I verified that the number of lines I read from the file matches what is in the file and that the for loop only writes that same number of lines back into the file. I don't have any issues writing other strings except for "test01". This string is the only one that causes the issue that I just described. It seems to be grabbing characters from the last line like R or LAYER from MULTI_LAYER.
Ex 1: This
S10087_U1
Cq4InEq=TRUE
XtrVer=5.5
IOCUPDATEMDB=TRUE
ARCHITECTURE=MULTI_LAYER
Becomes this
test01
Cq4InEq=TRUE
XtrVer=5.5
IOCUPDATEMDB=TRUE
ARCHITECTURE=MULTI_LAYER
R
Ex 2: This
test01 - Copy
Cq4InEq=TRUE
XtrVer=5.5
IOCUPDATEMDB=TRUE
ARCHITECTURE=MULTI_LAYER
ER
Becomes this
test01
Cq4InEq=TRUE
XtrVer=5.5
IOCUPDATEMDB=TRUE
ARCHITECTURE=MULTI_LAYER
LAYER
Replacing the StreamWriter portion with the following seems to fix the issue but I'm trying to figure out why using StreamWriter doesn't work as I expect it to.
File.WriteAllLines(@"C:\MyFolder\test.ini", strList);
This is because you're using File.OpenWrite. From the remarks in the documentation:
The OpenWrite method opens a file if one already exists for the file path, or creates a new file if one does not exist. For an existing file, it does not append the new text to the existing text. Instead, it overwrites the existing characters with the new characters. If you overwrite a longer string (such as "This is a test of the OpenWrite method") with a shorter string (such as "Second run"), the file will contain a mix of the strings ("Second runtest of the OpenWrite method").
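The behaviour described in that remark is easy to reproduce. Here is a small sketch, assuming a scratch file from Path.GetTempFileName and a hypothetical helper named OpenWriteDemo.Overwrite. It writes a first string, overwrites it with a second one through either File.OpenWrite or File.Create, and returns what ends up in the file.

```csharp
using System;
using System.IO;

public static class OpenWriteDemo
{
    // Writes 'first', then writes 'second' over it, and returns the final file content.
    // truncate == false uses File.OpenWrite; truncate == true uses File.Create.
    public static string Overwrite(string first, string second, bool truncate)
    {
        string path = Path.GetTempFileName(); // scratch file, deleted below
        try
        {
            File.WriteAllText(path, first);
            using (FileStream fs = truncate ? File.Create(path) : File.OpenWrite(path))
            using (var sw = new StreamWriter(fs))
            {
                sw.Write(second);
            }
            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

With the strings from the documentation, Overwrite("This is a test of the OpenWrite method", "Second run", false) returns "Second runtest of the OpenWrite method", and the same call with true returns just "Second run". The leftover R and LAYER in the question are the same effect: the new first line is shorter, so the tail of the old content survives.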
While you could just change your code to use File.Create instead, I'd suggest changing the code more significantly - not just the writing, but the reading too:
string path = @"C:\MyFolder\test.ini";
var lines = File.ReadAllLines(path);
lines[0] = "test01";
File.WriteAllLines(path, lines);
That's much simpler code to do the same thing.
The half-way house between the two would be to use File.OpenText (to return a StreamReader) and File.CreateText (to return a StreamWriter). There's no need to do the wrapping yourself.
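A sketch of that middle ground might look like the following. FirstLineEditor.ReplaceFirstLine is a hypothetical name, and the sketch assumes the file is small enough to hold in a list, as in the question. File.OpenText gives a StreamReader, and File.CreateText gives a StreamWriter on a file that is created or truncated, so no old content is left behind.

```csharp
using System.Collections.Generic;
using System.IO;

public static class FirstLineEditor
{
    // Reads all lines, replaces the first one, and rewrites the whole file.
    public static void ReplaceFirstLine(string path, string newFirstLine)
    {
        var lines = new List<string>();
        using (StreamReader reader = File.OpenText(path))
        {
            string current;
            while ((current = reader.ReadLine()) != null)
            {
                lines.Add(current);
            }
        }

        if (lines.Count == 0)
        {
            lines.Add(newFirstLine); // empty file: the new line becomes the only line
        }
        else
        {
            lines[0] = newFirstLine;
        }

        // CreateText truncates an existing file before writing.
        using (StreamWriter writer = File.CreateText(path))
        {
            foreach (string text in lines)
            {
                writer.WriteLine(text);
            }
        }
    }
}
```

Both methods use UTF-8, so this is only a drop-in replacement if the .ini file is UTF-8 or plain ASCII.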
I am trying to modify a file in place through a stream, as the file can be very large and I don't want to load it into memory. The piece of information I'm editing will always be the same length, so in theory I can just swap the content out, but it doesn't seem to be writing to the correct place.
I have written code that uses a stream reader to read line by line until it finds a regex match, and then attempts to overwrite the bytes with the edited line. The code is as follows:
private void UpdateFile(string newValue, string path, string pattern)
{
    var regex = new Regex(pattern, RegexOptions.IgnoreCase);
    int index = 0;
    string line = "";
    using (var fileStream = File.OpenRead(path))
    using (var streamReader = new StreamReader(fileStream, Encoding.Default, true, 128))
    {
        while ((line = streamReader.ReadLine()) != null)
        {
            if (regex.Match(line).Success)
            {
                break;
            }
            index += Encoding.Default.GetBytes(line).Length;
        }
    }
    if (line != null)
    {
        using (Stream stream = File.Open(path, FileMode.Open))
        {
            stream.Position = index + 1;
            var newLine = regex.Replace(line, newValue);
            var oldBytes = Encoding.Default.GetBytes(line);
            var newBytes = Encoding.Default.GetBytes("\n" + newLine);
            stream.Write(newBytes, 0, newBytes.Length);
        }
    }
}
The code almost works as expected: it inserts the updated line, but always a little too early, and how early varies slightly with the file I'm editing. I suspect it has something to do with how I manage the stream position, but I don't know the correct way to approach this.
Unfortunately the exact files I'm working on are under NDA.
The structure is as follows though:
A file will have an unknown amount of data followed by a line of a known format, for example:
Description: ABCDEF
I know the portion that follows "Description: " will always be 6 characters, so I do a replace on the line to replace with, for example, UVWXYZ.
The problem is that for example if a file read as
'...
UNIMPORTANT UNKNOWN DATA
DESCRIPTION: ABCDEF
MORE DATA
...'
it will come out as something like
'...
UNIMPORTANT UNKNOWN DDESCRIPTION: UVWXYZDEF
MORE DATA
...'
I think the problem here is that you are not considering the line feed ("\n") for each line you are getting and therefore your index is incorrectly setting the position of your stream. Try the following code:
private void UpdateFile(string newValue, string path, string pattern)
{
    var regex = new Regex(pattern, RegexOptions.IgnoreCase);
    int index = 0;
    string line = "";
    using (var fileStream = File.OpenRead(path))
    using (var streamReader = new StreamReader(fileStream, Encoding.Default, true, 128))
    {
        while ((line = streamReader.ReadLine()) != null)
        {
            if (regex.Match(line).Success)
            {
                break;
            }
            // Count the line terminator as well. This assumes every line ends in a
            // single "\n" (a "\r\n" file needs both bytes) and uses the reader's encoding.
            index += Encoding.Default.GetBytes(line + "\n").Length;
        }
    }
    if (line != null)
    {
        using (Stream stream = File.Open(path, FileMode.Open))
        {
            stream.Position = index;
            var newBytes = Encoding.Default.GetBytes(regex.Replace(line + "\n", newValue));
            stream.Write(newBytes, 0, newBytes.Length);
        }
    }
}
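If guessing the terminator length feels fragile, an alternative is to track the byte offset directly by scanning the file's bytes, so that "\n" and "\r\n" files behave the same. This is only a sketch under stated assumptions. InPlaceLineEditor.ReplaceFirstMatch is a hypothetical name. The encoding must be one where the byte 0x0A always means a line feed (ASCII, Latin-1, UTF-8 and similar, but not UTF-16). The replacement must also encode to the same number of bytes as the original line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class InPlaceLineEditor
{
    // Overwrites the first line matching 'regex' in place; returns false if no line matches.
    public static bool ReplaceFirstMatch(string path, Regex regex, string replacement, Encoding encoding)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var lineBytes = new List<byte>();
            long lineStart = 0; // byte offset where the current line begins
            while (true)
            {
                int b = stream.ReadByte();
                if (b != -1 && b != '\n')
                {
                    lineBytes.Add((byte)b);
                    continue;
                }

                // Reached "\n" or end of file: ignore a trailing "\r" and decode the line.
                int length = lineBytes.Count;
                if (length > 0 && lineBytes[length - 1] == (byte)'\r')
                {
                    length--;
                }
                string line = encoding.GetString(lineBytes.ToArray(), 0, length);

                if (regex.IsMatch(line))
                {
                    byte[] newBytes = encoding.GetBytes(regex.Replace(line, replacement));
                    if (newBytes.Length != length)
                    {
                        throw new InvalidOperationException("Replacement changes the line's byte length.");
                    }
                    stream.Position = lineStart; // jump back to the start of the matched line
                    stream.Write(newBytes, 0, newBytes.Length);
                    return true;
                }

                if (b == -1)
                {
                    return false; // end of file, nothing matched
                }
                lineStart = stream.Position; // first byte after the "\n"
                lineBytes.Clear();
            }
        }
    }
}
```

Because the offset comes from the stream position rather than from re-encoding each line, the terminator bytes are never touched and the write cannot land early.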
In your example, you are "off" by 4 characters. Not quite the common "off by one error", but close. But maybe a different pattern would help the most?
Programs nowadays rarely work "on the file" like that. There is just too much that can go wrong, all the way up to a power loss mid-process. Instead they:
Create an empty new file in the same location, often with a temporary name and hidden.
Write the output to the new file.
Once you are done and everything is good (all the caches are flushed and everything is on the disk, which Stream.Close() or Dispose() takes care of), replace the old file with the new file using the OS move operation.
The advantage is that you cannot lose the original data. Even if the computer loses power mid-operation, at worst the temporary file is messed up. You still have the original file, and you can just delete the temporary file and restart the work from scratch if you need to. Indeed, recovery only makes sense in rare cases (word processors).
The replacement of the old file by the new file is done with a move operation. If they are on the same partition, that is literally just a rename operation in the file system. And as modern file systems are designed much like top-of-the-line, robust relational databases, there is little danger in this.
You can find that pattern in everything from your word processor of choice to backup programs, the download manager of Firefox (as you might be overwriting a file that was there before) and even zipping programs. Every time you have a long writing phase and want to minimize the danger, it is the go-to pattern.
And as you process the data in memory, without having to deal with moving the read/write position around, it will get around your issue too.
Edit: I wrote some source code for it from memory/documentation. It might contain syntax errors.
string sourcepath; // contains the source file path, set by other code
string temppath; // contains the path of the temp file; should be in the same folder, and thus on the same partition
// Open both files; the two using statements share one block.
// Suppressing OS write caching on the output (WriteThrough) is optional and will be detrimental to performance.
using (var reader = new StreamReader(sourcepath))
using (var writer = new StreamWriter(File.Create(temppath, 4096, FileOptions.WriteThrough)))
{
    string line;
    // iterate over the input
    while ((line = reader.ReadLine()) != null)
    {
        // do processing on line here
        writer.WriteLine(line);
    }
}
// Replace the original with the temp file. File.Move(temppath, sourcepath) would throw
// because the destination already exists, so File.Replace is used here instead.
File.Replace(temppath, sourcepath, null);
I'm trying to convert a file's encoding and replace some text along the way. Unfortunately, I'm getting an OutOfMemory exception. I'm not sure why. As I understand it, it streams the original file line by line into a var (str), completes a couple of string replacements, and then writes the converted line to the StreamWriter.
Can someone tell me what I'm doing wrong here?
EDIT 1
- I'm currently testing a single file: 1 GB, 2.5 million rows.
- Combined the read and the replace into a single line. Same results!
EDIT 2
By the way, can anyone tell me why the question was downvoted? I'd like to know for future postings.
The problem is with the file itself. It's output from SQL Server BCP, where I explicitly set the row terminator flag to a specific string. By default, when the row terminator flag is omitted, BCP adds a newline at the end of each row and the code below works perfectly.
What I still don't understand is this: when I set the row terminator flag to a specific string, each record still appears on its own line, so why doesn't StreamReader see each record as a separate line? Instead, it appears to treat the entire file as one long line. That still doesn't explain the OOM exception, since I have well over 100 GB of memory.
Unfortunately, explicitly setting the row terminator flag is a must. For now, I'll take this over to DBA Exchange.
Thanks
static void Main(string[] args)
{
    String msg = String.Empty;
    String str = String.Empty;
    DirectoryInfo dInfo = new DirectoryInfo(@"\\server\share");
    foreach (var f in dInfo.GetFiles())
    {
        using (StreamReader sr = new StreamReader(f.FullName, Encoding.Unicode, false))
        {
            using (StreamWriter sw = new StreamWriter(f.DirectoryName + "\\new\\" + f.Name, false, Encoding.UTF8))
            {
                try
                {
                    while (!sr.EndOfStream)
                    {
                        str = sr.ReadLine().Replace("this","that");
                        sw.WriteLine(str);
                    }
                }
                catch (Exception e)
                {
                    msg += f.Name + ": " + e.Message;
                }
            }
        }
    }
    Console.WriteLine(msg);
    Console.ReadLine();
}
Well, your main reading and writing code needs just one line of data at a time. Your msg string, on the other hand, keeps getting larger and larger with each exception.
You'd need many millions of files in the folder to get an OutOfMemory exception this way, though.
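On the observation that the whole file seems to be read as one long line: ReadLine only splits on \r, \n, or \r\n, so if the records end in some other terminator, one workaround is to split on that terminator yourself while reading in fixed-size chunks. Here is a sketch of that, assuming the terminator string is known in advance. RecordReader.ReadRecords is a hypothetical helper, and it has not been tried against real BCP output.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RecordReader
{
    // Yields the text between occurrences of 'terminator', reading 4096 chars at a time.
    public static IEnumerable<string> ReadRecords(TextReader reader, string terminator)
    {
        if (string.IsNullOrEmpty(terminator))
        {
            throw new ArgumentException("terminator must not be empty", nameof(terminator));
        }

        var pending = new StringBuilder(); // text read so far that has no terminator yet
        var buffer = new char[4096];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            pending.Append(buffer, 0, read);
            string text = pending.ToString();
            int start = 0;
            int end;
            while ((end = text.IndexOf(terminator, start, StringComparison.Ordinal)) >= 0)
            {
                yield return text.Substring(start, end - start);
                start = end + terminator.Length;
            }
            // Keep the unfinished tail; it may hold the first part of a split terminator.
            pending.Clear();
            pending.Append(text, start, text.Length - start);
        }

        if (pending.Length > 0)
        {
            yield return pending.ToString(); // last record had no trailing terminator
        }
    }
}
```

In the loop from the question, foreach (string record in RecordReader.ReadRecords(sr, terminator)) would then stand in for the ReadLine calls, so only one record plus one chunk is held in memory at a time.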
I need to be able to take a text file with unknown encoding (e.g., UTF-8, UTF-16, ...) and copy it line by line, making specific changes as I go. In this example, I am changing the encoding, however there are other uses for this kind of processing.
What I can't figure out is how to determine if the last line has a newline! Some programs care about the difference between a file with these records:
Rec1<newline>
Rec2<newline>
And a file with these:
Rec1<newline>
Rec2
How can I tell the difference in my code so that I can take appropriate action?
using (StreamReader reader = new StreamReader(sourcePath))
using (StreamWriter writer = new StreamWriter(destinationPath, false, outputEncoding))
{
    bool isFirstLine = true;
    while (!reader.EndOfStream)
    {
        string line = reader.ReadLine();
        if (isFirstLine)
        {
            writer.Write(line);
            isFirstLine = false;
        }
        else
        {
            writer.Write("\r\n" + line);
        }
    }
    //if (LastLineHasNewline)
    //{
    //    writer.Write("\n");
    //}
    writer.Flush();
}
The commented-out code is what I want to be able to do, but I can't figure out how to set the condition LastLineHasNewline! Remember, I have no a priori knowledge of the input file encoding.
Remember, I have no a priori knowledge of the input file encoding.
That's the fundamental problem to solve.
If the file could be using any encoding, then there is no concept of reading "line by line" as you can't possibly tell what the line ending is.
I suggest you first address this part, and the rest will be easy. Now, without knowing the context it's hard to say whether that means you should be asking the user for the encoding, or detecting it heuristically, or something else - but I wouldn't start trying to use the data before you can fully understand it.
As often happens, the moment you go to ask for help, the answer comes to the surface. The commented out code becomes:
if (LastLineHasNewline(reader))
{
    writer.Write("\n");
}
And the function looks like this:
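private static bool LastLineHasNewline(StreamReader reader)
{
    // Encode "\n" in the encoding the reader is using, then compare it with the file's final bytes.
    byte[] newlineBytes = reader.CurrentEncoding.GetBytes("\n");
    int newlineByteCount = newlineBytes.Length;
    // A stream shorter than one encoded newline cannot end with one (and the Seek below would throw).
    if (reader.BaseStream.Length < newlineByteCount)
        return false;
    reader.BaseStream.Seek(-newlineByteCount, SeekOrigin.End);
    byte[] inputBytes = new byte[newlineByteCount];
    reader.BaseStream.Read(inputBytes, 0, newlineByteCount);
    for (int i = 0; i < newlineByteCount; i++)
    {
        if (newlineBytes[i] != inputBytes[i])
            return false;
    }
    return true;
}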