StreamWriter is writing carriage returns? - c#

I have a very simple console application that creates a text file. Below is a recap of the code:
StreamWriter writer = File.CreateText("c:\\temp.txt");
foreach (blah...)
{
writer.Write(body.ToString() + "\n");
writer.Flush();
}
writer.Close();
The client is claiming there are carriage returns at the end of each line. Where are these carriage returns coming from?
Update: After opening in VS binary editor and Notepad++, there were no occurrences of 0d 0a. I'm going to go back to the client.

Open the file in the Visual Studio binary editor (File > Open > File, click the down-arrow on the Open button, choose Open With... and pick Binary Editor), and look for 0D bytes. If none are present, then either:
your client can't tell the difference between a line feed and a carriage return, or
your transmission method is modifying the file en route. Is there any FTP binary/ASCII mismatch going on?
If there are 0D bytes, then they are present in your body variable.
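The same check can be done without a binary editor. The following is only a minimal sketch: the class and method names are made up for illustration, the default path is a placeholder, and it assumes the file is ASCII or UTF-8 (which is what File.CreateText produces), so that a 0x0D byte can only be a carriage return.

```csharp
using System;
using System.IO;
using System.Linq;

static class LineEndingCheck
{
    // Counts carriage-return (0x0D) and line-feed (0x0A) bytes in a buffer.
    // Assumes an ASCII/UTF-8 file; in UTF-16 these byte values can be part of other characters.
    public static (int Cr, int Lf) CountLineEndingBytes(byte[] data)
    {
        int cr = data.Count(b => b == 0x0D);
        int lf = data.Count(b => b == 0x0A);
        return (cr, lf);
    }

    static void Main(string[] args)
    {
        // Placeholder path; pass the real file as the first argument.
        string path = args.Length > 0 ? args[0] : "temp.txt";
        var (cr, lf) = CountLineEndingBytes(File.ReadAllBytes(path));
        Console.WriteLine($"CR (0x0D): {cr}, LF (0x0A): {lf}");
    }
}
```

If the CR count is zero on your machine but not on the client's copy, the file was probably changed in transit.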

I tested your code.
The code you posted does not write any carriage returns (0D), only line feeds (0A). Either something else is adding the carriage returns, or the client does not know what a carriage return really is.

In your code you put a line feed (\n).
Your customer is talking about a carriage return (\r). Maybe your customer is mistaking a line feed for a carriage return?

The "\n" at the end of each write call
EDIT: I know this is a new line, not a carriage return but I bet any money the client is getting confused between the two and it's actually this that is causing the problem

Does the client distinguish between a CR and LF? Is the flush() necessary? Are you overloading the buffer if you don't flush?
Unless you have a massive amount of text you might find more use out of creating a StringBuilder to format the text exactly as you want it with \n, \r, \t or whatever and then pumping that directly into a StreamWriter.

If each body string's first character was '\r', it would explain what you're seeing.

Have you checked whether body is also ending with characters you don't want printed? This is the other potential problem source.

Related

Write text to file in C# with 513 space characters

Here is a code that writes the string to a file
System.IO.File.WriteAllText("test.txt", "P ");
It's basically the character 'P' followed by a total of 513 space characters.
When I open the file in Notepad++, it appears to be fine. However, when I open it in Windows Notepad, all I see is garbled characters.
If instead of 513 space characters I add 514 or 512, it opens fine in Notepad.
What am I missing?
What you are missing is that Notepad is guessing, and it is not because your length is specifically 513 spaces ... it is because it is an even number of bytes and the file size is >= 100 total bytes. Try 511 or 515 spaces ... or 99 ... you'll see the same misinterpretation of your file contents. With an odd number of bytes, Notepad can assume that your file is not any of the double-byte encodings, because those would all result in 2 bytes per character = even number of total bytes in the file. If you give the file a few more low-order ASCII characters at the beginning (e.g., "PICKLE" + spaces), Notepad does a much better job of understanding that it should treat the content as single-byte chars.
The suggested approach of including Encoding.UTF8 is the easiest fix ... it will write a BOM to the beginning of the file which tells Notepad (and Notepad++) what the format of the data is, so that it doesn't have to resort to this guessing behavior (you can see the difference between your original approach and the BOM approach by opening both in Notepad++, then look in the bottom-right corner of the app. With the BOM, it will tell you the encoding is UTF-8-BOM ... without it, it will just say UTF-8).
I should also say that the contents of your file are not 'wrong', per se... the weird format is purely due to Notepad's "guessing" algorithm. So unless it's a requirement that people use Notepad to read your file with 1 letter and a large, odd number of spaces ... maybe just don't sweat it. If you do change to writing the file with Encoding.UTF8, then you do need to ensure that any other system that reads your file knows how to honor the BOM, because it is a real change to the contents of your file. If you cannot verify that all consumers of your file can/will handle the BOM, then it may be safer to just understand that Notepad happens to make a bad guess for your specific use case, and leave the raw contents exactly how you want them.
You can verify the physical difference in your file with the BOM by doing a binary read and then converting them to a string (you can't "see" the change with ReadAllText, because it honors & strips the BOM):
byte[] contents = System.IO.File.ReadAllBytes("test.txt");
Console.WriteLine(Encoding.ASCII.GetString(contents));
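A more direct way to see the difference is to test for the three BOM bytes (EF BB BF) at the start of the file. This is a hedged sketch rather than part of the original answer: the class name, method name and file names are invented for illustration. It relies on File.WriteAllText writing UTF-8 without a BOM when no encoding is passed, and with a BOM when Encoding.UTF8 is passed.

```csharp
using System;
using System.IO;
using System.Text;

static class BomCheck
{
    // Returns true if the buffer starts with the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(byte[] data)
    {
        return data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF;
    }

    static void Main()
    {
        string text = "P" + new string(' ', 513);

        // Hypothetical file names, written to the current directory.
        File.WriteAllText("no_bom.txt", text);                   // no encoding argument: no BOM
        File.WriteAllText("with_bom.txt", text, Encoding.UTF8);  // Encoding.UTF8 emits a BOM

        Console.WriteLine(HasUtf8Bom(File.ReadAllBytes("no_bom.txt")));
        Console.WriteLine(HasUtf8Bom(File.ReadAllBytes("with_bom.txt")));
    }
}
```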
Try passing in a different encoding:
i. System.IO.File.WriteAllText(filename , stringVariable, Encoding.UTF8);
ii. System.IO.File.WriteAllText(filename , stringVariable, Encoding.UTF32);
iii. etc.
You could also build your string in a way that is easier to read, change and count, instead of tapping the space bar 513 times:
i. Use the string constructor (like @Tigran suggested)
var result = "P" + new String(' ', 513);
ii. Use the stringBuilder
var stringBuilder = new StringBuilder();
stringBuilder.Append("P");
for (var i = 1; i <= 513; i++) { stringBuilder.Append(" "); }
iii. Or both
public string AppendSpacesToString(string stringValue, int numberOfSpaces)
{
var stringBuilder = new StringBuilder();
stringBuilder.Append(stringValue);
stringBuilder.Append(new String(' ', numberOfSpaces));
return stringBuilder.ToString();
}

Check for carriage return\line feed in tab delimited text file in C#

I have what I believe to be a line feed/carriage return in a tab-delimited file that I am reading with a C# StreamReader. Please see the extract below: the second and third lines are actually a single line that contains what I believe to be a carriage return after "NL" on the second line. I have tried using the code below to detect the line feed/carriage return, but no luck.
Could someone please help?
Code extract
string line = sr.ReadLine();
if (line.EndsWith(Environment.NewLine))
{
MessageBox.Show("New line detected");
}
File extract
1224 TX68176 FR123 0.2241 2788848 JP31650 B62G7K6 J7618E108 8630
----------
1225 TX68176 NL
----------
128 0.2241 2788848 JP3165000 B62G7K6 J7618E108 8630
Because you are reading the line with ReadLine, you will never get an Environment.NewLine at the end of the line. Your real problem is that you have a line of data, which you are probably expecting to be a single line, split into multiple lines. The exception you are getting does not come from having a newline in the line you read, and you are not going to fix it by trying to detect a newline character.
The problem probably comes from the rest of your code expecting fields in the line that are not there, because this part of the code read a line of text data that was only a partial line of data. The rest of your code chokes on not getting all the fields in that data line. To detect that you have only a partial line of data, you will need to probably detect on line length, since it seems to be a fixed length formatted file, or detect on the number of fields after you split it with tabs.
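The field-count check suggested above could look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, the sample data is invented, and the expected number of fields is something you would have to take from your own file format.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class PartialLineCheck
{
    // Returns the 1-based numbers of lines whose tab-separated field count
    // differs from expectedFields.
    public static List<int> FindMalformedLines(TextReader reader, int expectedFields)
    {
        var malformed = new List<int>();
        string line;
        int lineNumber = 0;
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (line.Split('\t').Length != expectedFields)
                malformed.Add(lineNumber);
        }
        return malformed;
    }

    static void Main()
    {
        // Invented three-column sample; the second record is broken across lines 2 and 3.
        string data = "1224\tTX68176\tFR123\n1225\tTX68176\nNL128\n";
        foreach (int n in FindMalformedLines(new StringReader(data), 3))
            Console.WriteLine("Partial record on line " + n);
    }
}
```

Once the partial lines are identified, the calling code can decide whether to join them with the following line or reject the file.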

C# Reading files and encoding issue

I've searched everywhere for this answer so hopefully it's not a duplicate. I decided I'm just finally going to ask it here.
I have a file named Program1.exe. When I drag that file into Notepad or Notepad++ I get all kinds of random symbols and then some readable text. However, when I try to read this file in C#, I either get inaccurate results, or just a big MZ. I've tried all supported encodings in C#. How can notepad programs read a file like this when I simply can't? I tried converting the bytes to a string and it doesn't work. I tried reading it directly line by line and it doesn't work. I've even tried binary and it doesn't work.
Thanks for the help! :)
Reading a binary file as text is a peculiar thing to do, but it is possible. Any of the 8-bit encodings will do it just fine. For example, the code below opens and reads an executable and outputs it to the console.
const string fname = @"C:\mystuff\program.exe";
using (var sw = new StreamReader(fname, Encoding.GetEncoding("windows-1252")))
{
var s = sw.ReadToEnd();
s = s.Replace('\x0', ' '); // replace NUL bytes with spaces
Console.WriteLine(s);
}
The result is very similar to what you'll see in Notepad or Notepad++. The "funny symbols" will differ based on how your console is configured, but you get the idea.
By the way, if you examine the string in the debugger, you're going to see something quite different. Those funny symbols are encoded as C# character escapes. For example, nul bytes (value 0) will display as \0 in the debugger, as NUL in Notepad++, and as spaces on the console or in Notepad. Newlines show up as \r in the debugger, etc.
As I said, reading a binary file as text is pretty peculiar. Unless you're just looking to see if there's human-readable data in the file, I can't imagine why you'd want to do this.
Update
I suspect the reason that all you see in the Windows Forms TextBox is "MZ" is that the Windows textbox control (which is what the TextBox ultimately uses) treats the NUL character as a string terminator, so it won't display anything after the first NUL. And the first thing after the "MZ" is a NUL (shows as '\0' in the debugger). You'll have to replace the NULs in the string with spaces. I edited the code example above to show how you'd do that.
The exe is a binary file and if you try to read it as a text file you'll get the effect that you are describing. Try using something like a FileStream instead that does not care about the structure of the file but treats it just as a series of bytes.
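As a rough illustration of the byte-oriented approach, the sketch below reads the first bytes of a file with a FileStream and prints them as hex instead of decoding them as text. The class and method names are hypothetical and the default path is a placeholder.

```csharp
using System;
using System.IO;

static class ExeBytes
{
    // True if the buffer starts with the bytes 'M' (0x4D) and 'Z' (0x5A).
    public static bool StartsWithMz(byte[] data)
    {
        return data.Length >= 2 && data[0] == 0x4D && data[1] == 0x5A;
    }

    // Formats up to 'count' leading bytes as hex pairs separated by dashes.
    public static string ToHex(byte[] data, int count)
    {
        return BitConverter.ToString(data, 0, Math.Min(count, data.Length));
    }

    static void Main(string[] args)
    {
        // Placeholder path; pass the real file as the first argument.
        string path = args.Length > 0 ? args[0] : "Program1.exe";
        var buffer = new byte[16];
        int read;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            read = fs.Read(buffer, 0, buffer.Length);
        }
        Console.WriteLine(ToHex(buffer, read));
        Console.WriteLine("Starts with MZ: " + StartsWithMz(buffer));
    }
}
```

Working with the bytes directly avoids both the encoding guesswork and the NUL-terminator problem, because nothing is ever converted to a string for display.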

What is the use of Environment.NewLine and TextWriter.NewLine properties

I have a weird situation that I don't understand relating to newlines '\n' that I am sending to a file. The newlines do not seem to be treated according to the NewLine properties of TextWriter and Environment. This code demonstrates:
String baseDir = Environment.GetEnvironmentVariable("USERPROFILE") + '\\';
String fileName = baseDir + "deleteme5.txt";
FileInfo fi = new FileInfo(fileName);
fi.Delete();
FileStream fw = new FileStream(fileName, FileMode.CreateNew, FileAccess.Write);
StreamWriter sw = new StreamWriter(fw);
Console.WriteLine(Environment.NewLine.Length);
Console.WriteLine(sw.NewLine.Length);
sw.Write("1\uf0f1\n2\ue0e1\n3\ud0d1\n");
sw.Flush();
sw.Close();
When I run this the console output is
2
2
When I look at my file in hex mode I get:
00000000h: 31 EF 83 B1 0A 32 EE 83 A1 0A 33 ED 83 91 0A ;
1.2.3탑.
Clearly, the API says two characters and when you look in the file there is only one character. Now when I look to the description of the Write method in TextWriter it indicates that the Write method does not substitute 0A with the NewLine property. Well, if the Write method doesn't take that into account, what is the use of having not one but two NewLine properties? What are these things for?
Programmers have a very long history of not agreeing how text should be encoded when it is written to a file. ASCII and Unicode helped level the Tower of Babel to some degree. But what characters denote the ending of a line was never agreed upon.
Windows uses the carriage return + line feed control codes. "\r\n" in C# code.
Unix flavors use just a single line feed control code, '\n' in C# code.
Apple historically used just a single carriage return control code, '\r' in C# code.
.NET needed to be compatible with all these incompatible choices. So it added the Environment.NewLine property, it has the default line ending sequence for your operating system. Note how you can run .NET code on Unix and Apple machines with Mono or Silverlight.
The abstract TextWriter class needs to know what sequence to use since it writes text files. So it has a NewLine property, its default is the same as Environment.NewLine. Which you almost always use, but you might want to change it if you need to create a text file that's read by a program on another operating system.
The mistake you made in your program is that you hard-coded the line terminator. You used '\n' in your string. This completely bypasses the .NET properties, you'll only ever see the single line feed control code in the text file. 0x0A is a line feed. Your console output displays "2" since that just displays the string length of the NewLine property. Which is 2 on Windows for "\r\n".
The simplest way to use the .NET property is to use WriteLine() instead of Write():
sw.WriteLine("1\uf0f1");
sw.WriteLine("2\ue0e1");
sw.WriteLine("3\ud0d1");
Which makes your code nicely readable as well, it isn't any slower at runtime. If you want to keep the one-liner then you could use composite formatting:
sw.Write("1\uf0f1{0}2\ue0e1{0}3\ud0d1{0}", Environment.NewLine);
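To see that WriteLine honors the writer's NewLine property regardless of the operating system, a small experiment with a StringWriter is enough. This is only a sketch; the class and method names are made up for illustration, and StringWriter stands in for the StreamWriter used in the question because both inherit the behavior from TextWriter.

```csharp
using System;
using System.IO;

static class NewLineDemo
{
    // Writes two lines through WriteLine using the given terminator
    // and returns the exact characters that were produced.
    public static string WriteTwoLines(string newLine)
    {
        using (var writer = new StringWriter())
        {
            writer.NewLine = newLine;
            writer.WriteLine("1");
            writer.WriteLine("2");
            return writer.ToString();
        }
    }

    // Makes CR and LF visible when printed.
    public static string Escape(string s)
    {
        return s.Replace("\r", "\\r").Replace("\n", "\\n");
    }

    static void Main()
    {
        Console.WriteLine(Escape(WriteTwoLines("\n")));                 // Unix-style terminator
        Console.WriteLine(Escape(WriteTwoLines("\r\n")));               // Windows-style terminator
        Console.WriteLine(Escape(WriteTwoLines(Environment.NewLine)));  // platform default
    }
}
```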
If you choose to generate the line breaks yourself by sending \n to the StreamWriter, there is no way the framework is going to interfere with that. If you want the framework to honor the NewLine property, use the WriteLine method of the writer and set the NewLine property of the writer.
Adapt your code like so:
sw.NewLine = Environment.NewLine; // StreamWriter uses \r\n by default
sw.WriteLine("1\uf0f1");
sw.WriteLine("2\ue0e1");
sw.WriteLine("3\ud0d1");
Or have a Custom StreamWriter that overrides the Write method:
public class MyStreamWriter:StreamWriter
{
public MyStreamWriter(Stream s):base(s)
{
}
public override void Write(string s)
{
base.Write(s.Replace("\n",Environment.NewLine));
}
}
Or if you only have one line that you want to handle:
sw.Write("1\uf0f1\n2\ue0e1\n3\ud0d1\n".Replace("\n", Environment.NewLine));
If you use it implicitly, as in calling a WriteLine method, or explicitly, as in Write(String.Concat("Hello", Environment.NewLine)), you get the end-of-line character(s) defined for your environment. If you don't use it and use, say, '\n' or even '$', then you are saying that no matter what environment you are in, lines will end the way you specify. If you want to compare the behaviour, write a bit of code and run it under Windows and Linux (Mono).
All newlines escaped as \n in a string are single-character ASCII newlines (0x0A) (not Windows newlines 0D0A) and output to streams in writers as 0x0A unless the programmer takes some explicit step to convert these within the string to the format 0D0A.
The TextWriter.NewLine property is used only by methods like WriteLine, and controls the formatting of the implicit newline that is appended as part of the invocation.
The distinction between Environment.NewLine and TextWriter.NewLine is that Environment.NewLine is read-only, only meant to be queried by programmers. (This is different from Java, for instance, where you can change the "system-wide" newline formatting default with System.setProperty("line.separator", x);)
In C# you can modify the format of the implicit newline when writing by using TextWriter.NewLine, which is initialized to Environment.NewLine. There is no corresponding TextReader.NewLine property for the TextReader methods that read lines. The implicit newline behavior for readers is to break at any 0x0A, 0x0D, or 0D0A.
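The reader behavior can be checked with a short experiment. The sketch below uses a StringReader and an invented input string that mixes all three terminators; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ReadLineDemo
{
    // Splits text into lines using TextReader.ReadLine, which accepts
    // "\n", "\r" and "\r\n" as line terminators.
    public static List<string> ReadAllLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    static void Main()
    {
        // Mixed terminators: LF, CR, CRLF.
        foreach (string line in ReadAllLines("a\nb\rc\r\nd"))
            Console.WriteLine("[" + line + "]");
    }
}
```

None of the returned strings contains a terminator, which is why testing the result of ReadLine for a trailing newline never succeeds.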
As pointed out by rene the original problem could be resolved by writing:
sw.Write("1\uf0f1\n2\ue0e1\n3\ud0d1\n".Replace("\n", Environment.NewLine));

C# TextWriter inserting line break every 1024 characters

I'm using the textwriter to write data to a text file but if the line exceeds 1024 characters a line break is inserted and this is a problem for me. Any suggestions on how to work round this or increase the character limit?
textWriter.WriteLine(strOutput);
Many thanks
Use Write, not WriteLine
Well you're using TextWriter.WriteLine(string) which appends \r\n after strOutput. As the docs say:
Writes a string followed by a line terminator to the text stream.
(Emphasis mine.) That has nothing to do with 1024 characters though - my guess is that that's how you're reading it in (e.g. with a buffer of 1024 characters).
To avoid the extra line break, just use
textWriter.Write(strOutput);
EDIT: You say in the comment that you need a line break after "the full line has been written out" - but it sounds like strOutput isn't always the same line.
I suspect the easiest way of accomplishing what you want is to separate the "copying" side out from the "line break" side. Use Write for all the text you want to copy, and then just call
textWriter.WriteLine();
when you want a line break. If this doesn't help, I think we're going to need more context - please provide a code sample of exactly what you're doing.
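Separating the copying from the line break might look like the sketch below. It assumes the output arrives in chunks that together form one logical line; the class and method names and the sample chunks are invented for illustration, and StringWriter is used in place of a file-backed writer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ChunkedLineDemo
{
    // Writes every chunk with Write (no terminator), then ends the
    // logical line with a single parameterless WriteLine call.
    public static void WriteLogicalLine(TextWriter writer, IEnumerable<string> chunks)
    {
        foreach (string chunk in chunks)
            writer.Write(chunk);
        writer.WriteLine();
    }

    static void Main()
    {
        using (var writer = new StringWriter())
        {
            // Hypothetical chunks standing in for successive strOutput values.
            WriteLogicalLine(writer, new[] { "first part, ", "second part, ", "third part" });
            Console.Write(writer.ToString());
        }
    }
}
```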
I wrote a sample app that writes and reads a 1025-character string. The size never changes, although if I open the file with notepad.exe (Windows) I can see the extra character on the second line. This seems to be a Notepad limitation. Here is my sample code:
static void Main(string[] args)
{
using (TextWriter streamWriter = new StreamWriter("lineLimit.txt")) {
String s=String.Empty;
for(int i=0;i<1025;i++){
s+= i.ToString().Substring(0,1);
}
streamWriter.Write(s);
streamWriter.Close();
}
using (TextReader streamReader = new StreamReader("lineLimit.txt"))
{
String s = streamReader.ReadToEnd();
streamReader.Close();
Console.Out.Write(s.Length);
}
}
If you need to add line breaks at the end of your output, just append them:
textWriter.Write(strOutput +"\r\n");
