How to carriage return C# string without \r\n? - c#

This is my problem.
A user can enter text into a text area in the browser, which is then emailed out to users.
What I want to know is how to handle carriage returns. If I enter \r\n for a carriage return, the email (which is a plain-text email) has the literal text \r\n in it.
In other words:
On the SQL server end
Case 1:
if I do this before the email gets sent
(notice the line break after line 1)
update emails
set
body='line 1
line 2'
where
id=100
the email goes out correctly
Case 2:
update emails
set
body='line 1'+char(13) + char(10) +'line 2'
where
id=100
This email also goes out correctly
Case 3:
However if I do this
update emails
set
body='line 1 \r\n line 2'
where
id=100
the email would have the actual text \r\n in it.
How do I reproduce case 1 or 2 from C#?

SQL literals (at least those in SQL Server) do not support such escape sequences (although you can just hit enter within the string literal so that it spans multiple lines). See this answer for some alternatives if writing it as an SQL string is a requirement.
If running the SQL programmatically from C#, use parameters which will handle this just fine:
sqlCommand.CommandText = "update emails set body=@body where id=@id";
sqlCommand.Parameters.AddWithValue("@body", "line 1 \r\n line2");
sqlCommand.Parameters.AddWithValue("@id", 100);
Note that the handling of the string literal (and conversion of the \r and \n character escape sequences) happens in C# and the value (with CR and LF characters) is passed to SQL.
If the above didn't address the problem, keep reading.
4.10.13 The textarea element:
For historical reasons, the element's value is normalised in three different ways for three different purposes. The raw value is the value as it was originally set. It is not normalized. The API value is the value used in the value IDL attribute. It is normalized so that line breaks use "LF" (U+000A) characters. Finally, there is the form submission value. [Upon form submission the textarea] is normalized so that line breaks use U+000D CARRIAGE RETURN U+000A LINE FEED (CRLF) character pairs, and in addition, if necessary given the element's wrap attribute, additional line breaks are inserted to wrap the text at the given width.
Note that CR and LF represent characters and not the two-character sequence of \ followed by either the r or n characters - this form is often found in string literals. If it appears as such then something is doing the incorrect conversion and putting (or leaving) the \ there. Or, perhaps there is some misguided "add slashes" hack somewhere?
As pointed out, while URL decode is likely wrong, it won't directly do this conversion. However, if the conversion happened previously before being "URL Encoded", then it will (correctly) decode to (incorrect) values.
In either case, it's a bug. So find out where the incorrect data conversion is introduced and fix it (attach a debugger and/or monitor the network traffic for clues) - the required information to isolate where is simply not present in the post.
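As a rough way to check which kind of data you actually have, the sketch below compares a string holding real CR and LF characters with one holding the backslash sequences described above. The class and method names (NewlineCheck, HasLiteralEscapes) are made up for illustration and are not part of any library.

```csharp
using System;

public static class NewlineCheck
{
    // Hypothetical helper: true when the text contains a literal
    // backslash followed by 'r' or 'n' (the two-character sequences).
    public static bool HasLiteralEscapes(string text)
    {
        return text.Contains("\\r") || text.Contains("\\n");
    }

    public static void Main()
    {
        string real = "line 1\r\nline 2";      // compiler turns \r\n into CR and LF
        string literal = @"line 1\r\nline 2";  // verbatim literal: backslashes are kept

        Console.WriteLine(real.Length);                // 14
        Console.WriteLine(literal.Length);             // 16
        Console.WriteLine(HasLiteralEscapes(real));    // False
        Console.WriteLine(HasLiteralEscapes(literal)); // True
    }
}
```

Running a check like this at each hop (form post, database read, mail send) may help narrow down where the backslashes first appear.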

Use C#'s String.Replace method to replace "\\r\\n" with "\r\n"; that should fix it.
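A minimal sketch of that replacement, assuming the incoming text really does contain a backslash followed by r and a backslash followed by n. The class and method names are invented for the example. Note that this only patches the symptom; whatever writes the backslashes in the first place is still worth tracking down.

```csharp
using System;

public static class EscapedNewlines
{
    // Hypothetical helper: converts the four-character sequence
    // backslash, r, backslash, n into a real CR LF pair.
    public static string Unescape(string text)
    {
        return text.Replace("\\r\\n", "\r\n");
    }

    public static void Main()
    {
        // Verbatim literal, so the backslashes are part of the data.
        string fromForm = @"line 1\r\nline 2";
        Console.WriteLine(Unescape(fromForm));
    }
}
```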

Related

Field and text delimiters within cells in csv files

This is likely a very basic question that I could not, despite trying, find a satisfying answer to. Feel free to skip to the question at the end if you aren't interested in the background.
The task:
I wish to create an easy localisation solution for my unity projects. After some initial research I concluded it would be best to use a .csv file read by a streamreader, so that translators would only ever have to interact with the csv table, where information is neatly organized.
The main problem:
Due to the nature of the text, I need to account for line breaks and special characters in the actual fields. As such, I could not use the normal ReadLine() method.
I worked around this by using Read() and checking whether a line break falls inside a pair of text delimiters. But since I check for the text delimiter, I am afraid the parser might run into an unescaped delimiter that is part of the normal in-cell text (since the normal text delimiter is the quotation mark).
So I switched the delimiter to §. But now every time I open the file I have to re-enter § as a text delimiter in OpenOffice Calc, probably due to encoding differences. This is annoying but not the end of the world.
My question:
How does OpenOffice (or similar software) usually tell in-cell commas/quotation marks apart from the ones used as delimiters? If I knew that, I could probably incorporate a similar approach in my reading of the file.
I've tried looking at the files with Notepad++, which reveals a difference in line breaks (\r instead of \r\n), and obviously the in-cell line break is within a pair of text delimiters. But when it comes to how it separates its delimiters from ones just entered in the text/field, I am drawing a blank.
Translation file in OpenOffice Calc:
Translation file in NotePad++, showing all characters:
I'd appreciate any insight or links on the topic.
From https://en.wikipedia.org/wiki/Comma-separated_values:
The CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain commas or even embedded line breaks.
LibreOffice Calc has a reasonable way to handle these things.
Use LF for line breaks and CR at the end of each record. It seems your code already handles this.
Use quotes to delimit strings when needed. If the string contains one or more quotes, then duplicate the quote to make it literal.
From the example in your question, it looks like you told Calc not to use any quotes as string delimiters. Why did you do this? When I tried it, LibreOffice (or Apache OpenOffice) showed the fields in different columns after opening the file saved that way.
The following example CSV file has fields that contain commas, quotes and line breaks.
When viewed in Calc:
      A        B
   --------- --
1 | 1,",2",   3
   --------- --
2 | a         c
  | b
Calc correctly reads and saves the file as shown below. Settings when saving are Field delimiter , and String delimiter " which are the defaults.
"1,"",2"",",3[CR]
"a
b",c[CR]
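The quoting rule described above (quote a field only when needed, and double any quote inside it) can be sketched as follows. This is an illustration under the assumption that comma is the field delimiter and the double quote is the string delimiter; the CsvWriter class and its methods are hypothetical names, not an existing API.

```csharp
using System;
using System.Linq;

public static class CsvWriter
{
    // Wraps the field in quotes if it contains a comma, a quote or a
    // line break, doubling every embedded quote to make it literal.
    public static string QuoteField(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins the quoted fields with commas; the record terminator is
    // left to the caller.
    public static string ToRecord(params string[] fields)
    {
        return string.Join(",", fields.Select(f => QuoteField(f)));
    }

    public static void Main()
    {
        Console.WriteLine(ToRecord("1,\",2\",", "3"));
        Console.WriteLine(ToRecord("a\nb", "c"));
    }
}
```

The first call produces the same text as the first record of the saved file shown above, and the second produces a quoted field with a line feed inside it.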

Create multi line string object from string array - C#

I am trying to create multi-line string from string array
like
@"A minErr message has two parts:
the message itself and the url that contains the encoded message.
The message's parameters can contain other error messages which also include error urls.";
and I have a string array of these lines:
string[] miltilines = {"A minErr message has two parts:","the message itself and the url that contains the encoded message.","The message's parameters can contain other error messages which also include error urls."};
I have tried multiple approaches to get a multi-line string object, but each one ended up with \r\n in the string object:
1: String.Join(Environment.NewLine, miltilines)
2:
string multiline = @" ";
foreach (var item in miltilines)
{
multiline += @"" + item.Trim() + " ";
}
3:
StringBuilder stringBuilder = new StringBuilder();
foreach (var item in miltilines)
{
stringBuilder.AppendLine(item );
}
Is there any way to get a multi-line string object from a string array?
If you test your original code by assigning it into a variable:
var value = @"A minErr message has two parts:
the message itself and the url that contains the encoded message.
The message's parameters can contain other error messages which also include error urls.";
Then go to the debugger window and inspect the value variable, you will find this:
"A minErr message has two parts: \r\n the message itself and the url that contains the encoded message.\r\n The message's parameters can contain other error messages which also include error urls."
\r and \n are escape sequences:
Escape sequences are typically used to specify actions such as
carriage returns and tab movements on terminals and printers. They are
also used to provide literal representations of nonprinting characters
and characters that usually have special meanings, such as the double
quotation mark (").
Approach #1 should work fine. Environment.NewLine represents nonprinting characters:
A string containing "\r\n" for non-Unix platforms, or a string
containing "\n" for Unix platforms.
So they will not print as \r or \n but rather will be interpreted as a carriage return and line feed operations, respectively.
With strings there can be a huge gap between actual data and how it is represented to the user. Including you, the user of a debugger. If you do not know how to interpret a string, you can not get the letters or even figure out where the dang thing ends:
https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
It is not uncommon for the debugger and Windows (or the program) to have very different ideas about how a given string should be interpreted. Your example:
"A minErr message has two parts:\r\n the message itself and the url that contains the encoded message.\r\n The message's parameters can contain other error messages which also include error urls." is just one possible representation of the string you expect.
The debugger uses tooltips. Those tooltips do not support multiple lines. This can have technical reasons (tooltips do not support newlines, or did not when first written) or use-case reasons (it would make tooltips hard to read for strings that contain a lot of newlines).
\r\n are the "Carriage Return" and "Line Feed" characters that mark a line break on Windows. Lacking the ability to actually display the content over multiple lines (like a multiline textbox), the debugger does the next best thing: it displays the newline escape sequences as part of the string. Any code that can actually display text in multiple lines will properly interpret those characters as what they are supposed to represent.
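To confirm that approach 1 from the question yields real line breaks rather than backslash text, a small sketch like the one below can help. The MultiLine class and JoinLines method are names chosen for this example only.

```csharp
using System;

public static class MultiLine
{
    // Joins the lines with the platform's line terminator
    // (CR LF on Windows, LF on Unix-like systems).
    public static string JoinLines(string[] lines)
    {
        return string.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        string[] lines =
        {
            "A minErr message has two parts:",
            "the message itself and the url that contains the encoded message.",
            "The message's parameters can contain other error messages which also include error urls."
        };

        string text = JoinLines(lines);

        // Printed to a console, the text spans three lines.
        Console.WriteLine(text);

        // The string holds no backslash character at all; the \r\n seen
        // in the debugger is only how the line break is displayed.
        Console.WriteLine(text.Contains("\\"));
    }
}
```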

String processing / CSV challenge

Having used SQL Server Bulk insert of CSV file with inconsistent quotes (CsvToOtherDelimiter option) as my basis, I discovered a few weirdnesses with the RemoveCSVQuotes part [it chopped the last char from quoted strings that contained a comma!]. So.. rewrote that bit (maybe a mistake?)
One wrinkle is that the client has asked 'what about data like this?'
""17.5179C,""
I assume if I wanted to keep using the CsvToOtherDelimiter solution, I'd have to amend the RegExp...but it's WAY beyond me... what's the best approach?
To clarify: we are using C# to pre-process the file into a pipe-delimited format prior to running a bulk insert using a format file. Speed is pretty vital.
The accepted answer from your link starts with:
You are going to need to preprocess the file, period.
Why not transform your csv to xml? Then you would be able to verify your data against an xsd before storing into a database.
To convert a CSV string into a list of elements, you could write a program that keeps track of state (in quotes or out of quotes) as it processes the string one character at a time, and emits the elements it finds. The rules for quoting in CSV are weird, so you'll want to make sure you have plenty of test data.
The state machine could go like this:
1. scan until quote (go to 2) or comma (go to 3)
2. if the next character is a quote, add only one of the two quotes to the field and return to 1. Otherwise, go to 4 (or report an error if the quote isn't the first character in the field).
3. emit the field, go to 1
4. scan until quote (go to 5)
5. if the next character is a quote, add only one of the two quotes to the field and return to 4. Otherwise, emit the field, scan for a comma, and go to 1.
This should correctly scan stuff like:
hello, world, 123, 456
"hello world", 123, 456
"He said ""Hello, world!""", "and I said hi"
""17.5179C,"" (correctly reports an error, since there should be a
separator between the first quoted string "" and the second field
17.5179C).
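One possible way to write such a scanner for a single line is sketched below. It is a simplified reading of the steps above, not the answerer's exact machine, and it makes a few assumptions of its own: spaces before a field and after a closing quote are skipped, a quote inside an unquoted field is treated as an error, and quoted fields may not span lines. CsvLine and Parse are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line into fields, tracking whether the scanner is
    // inside or outside quotes. Throws FormatException on bad quoting.
    public static List<string> Parse(string line)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        int i = 0;
        while (true)
        {
            // Assumption: spaces before a field are not part of it.
            while (i < line.Length && line[i] == ' ') i++;
            field.Clear();

            if (i < line.Length && line[i] == '"')
            {
                i++; // consume the opening quote
                while (true)
                {
                    if (i >= line.Length)
                        throw new FormatException("Unterminated quoted field.");
                    char c = line[i++];
                    if (c != '"') { field.Append(c); continue; }
                    // A doubled quote stands for one literal quote.
                    if (i < line.Length && line[i] == '"') { field.Append('"'); i++; continue; }
                    break; // closing quote
                }
                // After the closing quote only spaces and a comma may follow.
                while (i < line.Length && line[i] == ' ') i++;
                if (i < line.Length && line[i] != ',')
                    throw new FormatException("Expected a comma at position " + i + ".");
            }
            else
            {
                while (i < line.Length && line[i] != ',')
                {
                    if (line[i] == '"')
                        throw new FormatException("Quote inside unquoted field at position " + i + ".");
                    field.Append(line[i++]);
                }
            }

            fields.Add(field.ToString());
            if (i >= line.Length) return fields;
            i++; // consume the comma
        }
    }
}
```

With this sketch, the first three sample lines above parse into four, three and two fields respectively, and the ""17.5179C,"" line raises a FormatException because text follows the empty quoted string without a separator.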
Another way would be to find some existing library that does it well. Surely, CSV is common enough that such a thing must exist?
edit:
You mention that speed is vital, so I wanted to point out that (so long as the quoted strings aren't allowed to include line returns...) each line may be processed independently in parallel.
I ended up using a CSV parser that I didn't know we already had (it comes as part of our code generation tool), and noting that ""17.5179C,"" is not valid and will cause errors.

Newlines escaped unexpectedly in C#/ASP.NET 1.1 code

Can someone explain to me why my code:
string messageBody = "abc\n" + stringFromDatabaseProcedure;
where stringFromDatabaseProcedure is a value from the SQL database entered as
'line1\nline2'
results in the string:
"abc\nline1\\nline2"
This has resulted in me scratching my head somewhat.
I am using ASP.NET 1.1.
To clarify,
I am creating string that I need to go into the body of an email on form submit.
I mention ASP.NET 1.1 as I do not get the same result using .NET 2.0 in a console app.
All I am doing is adding the strings and when I view the messageBody string I can see that it has escaped the value string.
Update
What is not helping me at all is that Outlook is not showing the \n in a text email correctly (unless you reply or forward it).
An online mail viewer (even the Exchange webmail) shows \n as a new line as it should.
I just did a quick test on a test NorthwindDb and put in some junk data with a \n in middle. I then queried the data back using straight up ADO.NET and what do you know, it does in fact escape the backslash for you automatically. It has nothing to do with the n it just sees the backslash and escapes it for you. In fact, I also put this into the db: foo"bar and when it came back in C# it was foo\"bar, it escaped that for me as well. My point is, it's trying to preserve the data as is on the SQL side, so it's escaping what it thinks it needs to escape. I haven't found a setting yet to turn that off, but if I do I'll let you know...
ASP.NET would use <br /> to make linebreaks. \n would work with Console Applications or Windows Forms applications. Are you outputting it to a webpage?
Method #1
string value = "line1<br />line2";
string messageBody = "abc<br />" + value;
If that doesn't work, try:
string value = "line1<br>line2";
string messageBody = "abc<br>" + value;
Method #2
Use System.Environment.NewLine:
string value = "line1"+ System.Environment.NewLine + "line2";
string messageBody = "abc" + System.Environment.NewLine + value;
One of these ways is guaranteed to work. If you're outputting a string to a Webpage (or an email, or a form submit), you'd have to use one of the ways I mentioned. The \n will never work there.
You need to set a watch and see where exactly your database result string gets double escaped.
Adding two strings together will never double escape strings, so its either happening before that, or after that.
When I get the string out of the database, .NET escapes it automagically. However, the little @ symbol is placed in front of the string, which I did not notice.
So it appeared to be non-escaped to my "about to go on holiday" eye inside the IDE.
Therefore when the non-escaped \n was added to the string (as the whole string is no longer escaped), it would remove the @ and show the database portion of the string escaped.
Gah, it was all an illusion.
Perhaps that holiday is overdue.
Thanks for your input.
If the actual string stored in the database is (spaces added for emphasis): "l i n e 1 \ n l i n e 2", then whatever stored it there probably has a bug. But assuming that is the exact string there, then "abc\nline1\\nline2" is what you see when you look at the string in a debugger that escapes it; printed, it would read "abc", a line break, and then "line1\nline2" (the escaping is a convenience, allowing you to copy-paste out of the debugger straight into code without errors).
Short answer: .NET is not escaping the string, your debugger is. The code which writes a literal "\n" into the database has a bug.
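To see how the display form in the question can arise without any change to the data, the sketch below imitates the escaping a debugger applies when it shows a string as a C# literal. DebuggerView and AsLiteral are invented names, and real debuggers escape more characters than the four handled here.

```csharp
using System;

public static class DebuggerView
{
    // Rough imitation of debugger display: backslashes are doubled first,
    // then line feeds, carriage returns and quotes become escape sequences.
    public static string AsLiteral(string s)
    {
        return "\"" + s.Replace("\\", "\\\\")
                       .Replace("\n", "\\n")
                       .Replace("\r", "\\r")
                       .Replace("\"", "\\\"") + "\"";
    }

    public static void Main()
    {
        // Assumption: the database holds a backslash followed by 'n',
        // not a line feed character.
        string fromDatabase = @"line1\nline2";
        string messageBody = "abc\n" + fromDatabase;

        Console.WriteLine(AsLiteral(messageBody)); // "abc\nline1\\nline2"
        Console.WriteLine(messageBody);            // the data as it really is
    }
}
```

The concatenation leaves both parts untouched; only the display form makes the stored backslash look doubled.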

C#: How do you go upon constructing a multi-lined string during design time?

How would I accomplish displaying a line as the one below in a console window by writing it into a variable during design time then just calling Console.WriteLine(sDescription) to display it?
Options:
-t Description of -t argument.
-b Description of -b argument.
If I understand your question right, what you need is the @ sign in front of your string. This will make the compiler take in your string literally (including newlines etc)
In your case I would write the following:
String sDescription =
@"Options:
-t Description of -t argument.";
So far for your question (I hope), but I would suggest to just use several WriteLines.
The performance loss is next to nothing and it just is more adaptable.
You could work with a format string so you would go for this:
string formatString = "{0,-10} {1}";
Console.WriteLine("Options:");
Console.WriteLine(formatString, "-t", "Description of -t argument.");
Console.WriteLine(formatString, "-b", "Description of -b argument.");
The format string makes sure your lines are formatted nicely without putting spaces in manually, and if you ever want to change the format you only need to do it in one place.
Console.Write("Options:\n\tSomething\t\tElse");
produces
Options:
Something Else
\n for next line, \t for tab, for more professional layouts try the field-width setting with format specifiers.
http://msdn.microsoft.com/en-us/library/txafckwd.aspx
If this is a /? screen, I tend to throw the text into a .txt file that I embed via a resx file. Then I just edit the txt file. This then gets exposed as a string property on the generated resx class.
If needed, I embed standard string.Format symbols into my txt for replacement.
Personally I'd normally just write three Console.WriteLine calls. I know that gives extra fluff, but it lines the text up appropriately and it guarantees that it'll use the right line terminator for whatever platform I'm running on. An alternative would be to use a verbatim string literal, but that will "fix" the line terminator at compile-time.
I know C# is mostly used on Windows machines, but please, please, please try to write your code as platform neutral. Not all platforms have the same end-of-line character. To properly retrieve the end-of-line character for the currently executing platform you should use:
System.Environment.NewLine
Maybe I'm just anal because I am a former Java programmer who ran apps on many platforms, but you never know what the platform of the future is.
The "best" answer depends on where the information you're displaying comes from.
If you want to hard code it, using an "@" string is very effective, though you'll find that getting it to display right plays merry hell with your code formatting.
For a more substantial piece of text (more than a couple of lines), embedding a text resource is good.
But, if you need to construct the string on the fly, say by looping over the commandline parameters supported by your application, then you should investigate both StringBuilder and Format Strings.
StringBuilder has methods like AppendFormat() that accept format strings, making it easy to build up formatted lines.
Format Strings make it easy to combine multiple items together. Note that Format strings may be used to format things to a specific width.
To quote the MSDN page linked above:
Format Item Syntax
Each format item takes the following
form and consists of the following
components:
{index[,alignment][:formatString]}
The matching braces ("{" and "}") are
required.
Index Component
The mandatory index component, also
called a parameter specifier, is a
number starting from 0 that identifies
a corresponding item in the list of
objects ...
Alignment Component
The optional alignment component is a
signed integer indicating the
preferred formatted field width. If
the value of alignment is less than
the length of the formatted string,
alignment is ignored and the length of
the formatted string is used as the
field width. The formatted data in
the field is right-aligned if
alignment is positive and left-aligned
if alignment is negative. If padding
is necessary, white space is used. The
comma is required if alignment is
specified.
Format String Component
The optional formatString component is
a format string that is appropriate
for the type of object being formatted
...
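The alignment component quoted above can be combined with StringBuilder to build the help text on the fly. The following is only a sketch: the UsageText class, the column width of 6 and the two-space indent are choices made for this example.

```csharp
using System;
using System.Text;

public static class UsageText
{
    // Builds the option list line by line. {0,-6} left-aligns the switch
    // in a six-character column; AppendLine adds Environment.NewLine.
    public static string Build()
    {
        var sb = new StringBuilder();
        sb.AppendLine("Options:");
        sb.AppendFormat("  {0,-6}{1}", "-t", "Description of -t argument.").AppendLine();
        sb.AppendFormat("  {0,-6}{1}", "-b", "Description of -b argument.").AppendLine();
        return sb.ToString();
    }

    public static void Main()
    {
        Console.Write(Build());
    }
}
```

Because every line ends with AppendLine, the text uses the line terminator of the platform it runs on rather than one fixed at compile time.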
