How do I write escape characters verbatim (without escaping) using StreamWriter? - c#

I'm writing a utility that takes in a .resx file and creates a JavaScript object containing properties for all the name/value pairs in the .resx file. This is all well and good, until one of the values in the .resx is
This dealer accepts electronic orders.\r\nClick to order {0} from this dealer.
I'm adding the name/value pairs to the JS object like this:
streamWriter.Write(string.Format("\n{0} : \"{1}\"", kvp.Key, kvp.Value));
When kvp.Value = "This dealer accepts electronic orders.\r\nClick to order {0} from this dealer."
This causes StreamWriter.Write() to actually place a newline between 'orders.' and 'Click', which naturally screws up my JavaScript output.
I've tried different things with @ and without using string.Format, but I've had no luck. Any suggestions?
Edit: This application is run during the build to generate some JavaScript files that are deployed later, so at no point is it accessible to or run by anyone but the app developers. So while I obviously need a way to escape characters here, XSS as such is not really a concern.

Your problem has already happened by the time you get to this code. String.Format will not "expand" literal \n and \r in the substituted strings ({0} etc) into newline and CR, so it must have happened at some earlier point, possibly while reading the .resx file.
You have two possible solutions. One, as you discovered in the comments to DonaldRay's answer, is to explicitly reverse this replacement, and replace literal newlines with the two characters \n:
kvp.Value.Replace("\r", // <-- replaced by the C# compiler with a literal CR character
"\\r"); // <-- "\\" replaced by the C# compiler with a single "\",
// leaving the two-char string "\r"
You will need to do the same for every character that could appear in your strings. \n and \r are the most common, and then \t (tab); that's probably enough for most dev tools.
string formatted = kvp.Value.Replace("\r", "\\r")
.Replace("\n", "\\n")
.Replace("\t", "\\t");
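A minimal sketch of the same idea as a single pass over the string, assuming the value is always written inside double quotes as in the question's string.Format call. EscapeForJs and the "DealerMessage" key are hypothetical names, and a StringWriter stands in for the StreamWriter so the result can be inspected (both are TextWriters, so Write behaves the same). Because the output is wrapped in quotes, the backslash and the double quote need escaping as well:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not a framework API): turns a .NET string into the body
// of a double-quoted JavaScript string literal.
static string EscapeForJs(string value)
{
    var sb = new StringBuilder(value.Length);
    foreach (char c in value)
    {
        switch (c)
        {
            case '\\': sb.Append(@"\\"); break;   // would otherwise start a JS escape
            case '"':  sb.Append("\\\""); break;  // would otherwise end the literal
            case '\r': sb.Append(@"\r"); break;
            case '\n': sb.Append(@"\n"); break;
            case '\t': sb.Append(@"\t"); break;
            default:   sb.Append(c); break;
        }
    }
    return sb.ToString();
}

var kvp = new KeyValuePair<string, string>(
    "DealerMessage", // hypothetical key
    "This dealer accepts electronic orders.\r\nClick to order {0} from this dealer.");

// StringWriter used in place of the StreamWriter so the output can be printed.
using var streamWriter = new StringWriter();
streamWriter.Write(string.Format("\n{0} : \"{1}\"", kvp.Key, EscapeForJs(kvp.Value)));
Console.WriteLine(streamWriter.ToString());
```

Handling every character in one pass sidesteps an ordering trap in chained Replace calls: if the backslash were doubled after \r had already become \\r, the newly inserted backslash would be doubled too.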
Alternatively, you could look upstream at the .resx file reading code, and try to find and remove the part that's explicitly expanding these character sequences. This would be a better general solution, if it's possible.

You need to escape the strings, using Microsoft's Anti-XSS Library.

Just escape the backslashes.
string escaped = kvp.Value.Replace(@"\", @"\\"); // KeyValuePair.Value is read-only, so store the result in a new variable
You may need to do this when you are reading from the resx file.

Related

Field and text delimiters within cells in csv files

This is likely a very basic question that I could not, despite trying, find a satisfying answer to. Feel free to skip to the question at the end if you aren't interested in the background.
The task:
I wish to create an easy localisation solution for my Unity projects. After some initial research I concluded it would be best to use a .csv file read by a StreamReader, so that translators would only ever have to interact with the CSV table, where information is neatly organized.
The main problem:
Due to the nature of the text, I need to account for line breaks and special characters in the actual fields. As such, I could not use the normal ReadLine() method.
This I worked with by using Read() and checking if a linebreak is within a text delimiter bracket. But as I check for the text delimiter, I am afraid it might run into an un-escaped delimiter part of the normal in-cell text (since the normal text delimiter is quotation marks).
So I switched the delimiter to §. But now every time I open the file, I have to re-enter § as the text delimiter in OpenOffice Calc, probably due to encoding differences. This is annoying but not the end of the world.
My question:
How does OpenOffice (or similar software) usually tell in-cell commas/quotation marks apart from the ones used as delimiters? If I knew that, I could probably incorporate a similar approach in my reading of the file.
I've tried looking at the files with Notepad++, which revealed a difference in line breaks (\r instead of \r\n), and obviously the text is within a text delimiter bracket, but when it comes to how it separates its delimiters from ones just entered in the text/field, I am drawing a blank.
Translation file in OpenOffice Calc:
Translation file in Notepad++, showing all characters:
I'd appreciate any insight or links on the topic.
From https://en.wikipedia.org/wiki/Comma-separated_values:
The CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain commas or even embedded line breaks.
LibreOffice Calc has a reasonable way to handle these things.
Use LF for line breaks and CR at the end of each record. It seems your code already handles this.
Use quotes to delimit strings when needed. If the string contains one or more quotes, then duplicate the quote to make it literal.
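Reading the file back only needs those two rules. The following is a minimal sketch of a hypothetical ParseCsv (not a library API) that assumes a comma as field delimiter, a double quote as string delimiter, and that the whole file has already been read into a string (for example with File.ReadAllText). A quote switches into "inside a quoted field"; while inside, commas and line breaks are ordinary data and a doubled quote stands for one literal quote; outside, CR, LF or CRLF ends the record:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical parser sketch: splits CSV text into records, each a list of fields.
static List<List<string>> ParseCsv(string text, char delimiter = ',', char quote = '"')
{
    var records = new List<List<string>>();
    var record = new List<string>();
    var field = new StringBuilder();
    bool inQuotes = false;

    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        if (inQuotes)
        {
            if (c == quote)
            {
                if (i + 1 < text.Length && text[i + 1] == quote)
                {
                    field.Append(quote); // doubled quote = one literal quote
                    i++;
                }
                else
                {
                    inQuotes = false;    // closing string delimiter
                }
            }
            else
            {
                field.Append(c);         // commas and line breaks are plain data here
            }
        }
        else if (c == quote)
        {
            inQuotes = true;             // opening string delimiter
        }
        else if (c == delimiter)
        {
            record.Add(field.ToString());
            field.Clear();
        }
        else if (c == '\r' || c == '\n')
        {
            if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++; // CRLF counts once
            record.Add(field.ToString());
            field.Clear();
            records.Add(record);
            record = new List<string>();
        }
        else
        {
            field.Append(c);
        }
    }

    // Flush the last record if the text does not end with a line break.
    if (field.Length > 0 || record.Count > 0)
    {
        record.Add(field.ToString());
        records.Add(record);
    }
    return records;
}

foreach (var row in ParseCsv("name,text\nhello,\"Hello, world\"\n"))
    Console.WriteLine(string.Join(" | ", row));
```

This is also why in-cell quotes never get confused with delimiters: inside a quoted field a quote character is always written twice, so a single quote followed by anything else can only be the closing delimiter. The sketch is lenient with malformed input (for example a stray quote in the middle of an unquoted field) rather than reporting an error.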
From the example in your question, it looks like you told Calc not to use any quotes as string delimiters. Why did you do this? When I tried it, LibreOffice (or Apache OpenOffice) showed the fields in different columns after opening the file saved that way.
The following example CSV file has fields that contain commas, quotes and line breaks.
When viewed in Calc:
    | A        | B
----+----------+---
  1 | 1,",2",  | 3
----+----------+---
  2 | a        | c
    | b        |
Calc correctly reads and saves the file as shown below. The settings used when saving are Field delimiter: comma (,) and String delimiter: double quote ("), which are the defaults.
"1,"",2"",",3[CR]
"a
b",c[CR]
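The saving side follows the same rules. A minimal sketch, with QuoteField and WriteRecord as hypothetical names, that quotes a field only when it contains a comma, a quote or a line break, doubles any quotes inside it, and ends each record with CR as in the example; it reproduces the two records above:

```csharp
using System;
using System.Linq;

// Hypothetical helper: wraps a field in quotes only when it needs them.
static string QuoteField(string field)
{
    bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
    // Inside a quoted field, every literal quote is written twice.
    return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
}

// Joins the fields with commas and ends the record with CR, like the Calc output.
static string WriteRecord(params string[] fields) =>
    string.Join(",", fields.Select(QuoteField)) + "\r";

Console.Write(WriteRecord("1,\",2\",", "3"));
Console.Write(WriteRecord("a\nb", "c"));
```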

How do you correctly escape a document name in .NET?

We store a bunch of weird document names on our web server (people upload them) that have various characters like spaces, ampersands, etc. When we generate links to these documents, we need to escape them so the server can look up the file by its raw name in the database. However, none of the built in .NET escape functions will work correctly in all cases.
Take the document Hello#There.docx:
UrlEncode will handle this correctly:
HttpUtility.UrlEncode("Hello#There");
"Hello%23There"
However, UrlEncode will not handle Hello There.docx correctly:
HttpUtility.UrlEncode("Hello There.docx");
"Hello+There.docx"
The + symbol is only valid for URL parameters, not document names. Interestingly enough, this actually works on the Visual Studio test web server but not on IIS.
The UrlPathEncode function works fine for spaces:
HttpUtility.UrlPathEncode("Hello There.docx");
"Hello%20There.docx"
However, it will not escape other characters such as the # character:
HttpUtility.UrlPathEncode("Hello#There.docx");
"Hello#There.docx"
This link is invalid as the # is interpreted as a URL hash and never even gets to the server.
Is there a .NET utility method to escape all non-alphanumeric characters in a document name, or would I have to write my own?
Have a look at the Uri.EscapeDataString Method:
Uri.EscapeDataString("Hello There.docx") // "Hello%20There.docx"
Uri.EscapeDataString("Hello#There.docx") // "Hello%23There.docx"
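A small sketch of building links this way, assuming a made-up "/documents/" route; the third file name is an invented example containing an ampersand, one of the characters mentioned in the question. Uri.UnescapeDataString reverses the encoding, which is what the server side would need to recover the raw name for the database look-up:

```csharp
using System;

string[] names = { "Hello There.docx", "Hello#There.docx", "Tom & Jerry.docx" };

foreach (string name in names)
{
    // Spaces become %20, '#' becomes %23 and '&' becomes %26.
    string escaped = Uri.EscapeDataString(name);
    string link = "/documents/" + escaped; // made-up route, for illustration only
    Console.WriteLine(link);

    // Decoding gives back the raw name for the database look-up.
    Console.WriteLine(Uri.UnescapeDataString(escaped));
}
```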
I would approach it a different way: Do not use the document name as key in your look-up - use a Guid or some other id parameter that you can map to the document name on disk in your database. Not only would that guarantee uniqueness but you also would not have this problem of escaping in the first place.
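A sketch of that design, with a Dictionary standing in for the database table and a made-up "/documents/" route; Store is a hypothetical helper. The link carries only the id, formatted with the "N" specifier as 32 hexadecimal digits, so there is nothing left to escape:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the table that maps ids to stored file names.
var documents = new Dictionary<Guid, string>();

Guid Store(string originalName)
{
    Guid newId = Guid.NewGuid();
    documents[newId] = originalName;
    return newId;
}

Guid id = Store("Hello#There & Co.docx");

// The link contains only the id, never the raw document name.
string link = "/documents/" + id.ToString("N"); // made-up route
Console.WriteLine(link);

// When the request arrives, parse the id back and look up the raw name.
Guid requested = Guid.ParseExact(link.Substring("/documents/".Length), "N");
Console.WriteLine(documents[requested]); // Hello#There & Co.docx
```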
You can use the @ character to create verbatim strings, in which backslash escape sequences are not processed. See the pieces of code below.
string str = @"\n\n\n\n";
Console.WriteLine(str);
Output: \n\n\n\n
string str1 = @"\df\%%^\^\)\t%%";
Console.WriteLine(str1);
Output: \df\%%^\^\)\t%%
This kind of string literal is very useful for path names and for regular expressions.
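A few more cases, as a sketch (the path is an invented example): a verbatim string keeps every backslash as written, the only escape left inside it is "" for a double quote, and regex patterns stay readable because their own backslash escapes reach the Regex engine untouched:

```csharp
using System;
using System.Text.RegularExpressions;

string path = @"C:\Users\Public\file.txt";   // backslashes kept as written
string same = "C:\\Users\\Public\\file.txt"; // regular literal needs doubled backslashes
Console.WriteLine(path == same);             // True

// Inside a verbatim string, "" produces one double quote.
string quoted = @"He said ""hi""";
Console.WriteLine(quoted);                   // He said "hi"

// \d+ is passed to the regex engine as-is.
bool hasDigits = Regex.IsMatch("order 42", @"\d+");
Console.WriteLine(hasDigits);                // True
```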

c# equivalent to stripcslashes function?

I am working on a project that involves getting MMS messages from an MMS gateway and storing the image on disk.
This includes taking a received base64-encoded string and storing it as a zip file on a web server. This zip file is then opened, and the image is retrieved.
We have managed to store it as a zip file, but it is corrupted and cannot be opened.
The documentation from the gateway is pretty sparse, and we have only a PHP example to rely on. I think we have figured out how to "translate" most of it, except for the PHP function stripcslashes(inputvalue). Can anyone shed any light on how to do the same thing in C#?
We are thankful for any help!
stripcslashes() looks for "\x"-style elements within longer strings (where 'x' could be any character, or sometimes more than one, as in octal and hexadecimal escapes). If the 'x' is not recognised as meaningful, it just removes the '\'; if it does recognise a valid C-style escape sequence (e.g. "\n" is newline, "\t" is tab), the corresponding character is inserted instead: \t is replaced by a tab character (0x09) in your string.
I'm not aware of any simple way to get the .NET Framework to do the same thing without building a similar function yourself. This obviously isn't very hard, but you need to know which escape sequences to process.
If you happen to know (or find out by inspecting your base64 text) that the only thing in your input that will need processing is a particular one or two sequences (say, tab characters), it becomes very easy and the following snippet shows use of String.Replace():
string input = @"Some\thing"; // '@' means the string is stored without processing '\t'
Console.WriteLine(input);
string output = input.Replace(@"\t", "\t");
Console.WriteLine(output);
Of course, if you really do simply want to remove all the backslashes:
string output = input.Replace(@"\", "");
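If more than one or two sequences turn up, a general helper may be easier. The following is a sketch of a hypothetical StripCSlashes function that tries to mirror the behaviour described above: the C-style letter escapes, up to three octal digits, \x with one or two hex digits, and dropping the backslash before anything else. Treat it as an approximation rather than an exact port; in particular, PHP works on bytes whereas this works on .NET chars, which is only equivalent for ASCII input such as base64 text:

```csharp
using System;
using System.Text;

static bool IsHex(char c) =>
    (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');

// Hypothetical helper approximating PHP's stripcslashes(); this is not a .NET API.
static string StripCSlashes(string s)
{
    var sb = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        // Ordinary characters, and a lone backslash at the very end, are copied unchanged.
        if (s[i] != '\\' || i + 1 >= s.Length)
        {
            sb.Append(s[i]);
            continue;
        }

        char next = s[++i]; // i now points at the character after the backslash
        switch (next)
        {
            case 'n': sb.Append('\n'); break;
            case 't': sb.Append('\t'); break;
            case 'r': sb.Append('\r'); break;
            case 'a': sb.Append('\a'); break;
            case 'b': sb.Append('\b'); break;
            case 'f': sb.Append('\f'); break;
            case 'v': sb.Append('\v'); break;
            case 'x' when i + 1 < s.Length && IsHex(s[i + 1]):
                // \x followed by one or two hex digits, e.g. \x41 -> 'A'.
                int hexLen = i + 2 < s.Length && IsHex(s[i + 2]) ? 2 : 1;
                sb.Append((char)Convert.ToInt32(s.Substring(i + 1, hexLen), 16));
                i += hexLen;
                break;
            default:
                // Up to three octal digits, e.g. \101 -> 'A'.
                int octLen = 0;
                while (octLen < 3 && i + octLen < s.Length && s[i + octLen] >= '0' && s[i + octLen] <= '7')
                    octLen++;
                if (octLen > 0)
                {
                    // PHP keeps a single byte, so values above 0xFF are truncated.
                    sb.Append((char)(Convert.ToInt32(s.Substring(i, octLen), 8) & 0xFF));
                    i += octLen - 1;
                }
                else
                {
                    // Unrecognised escape: drop the backslash, keep the character.
                    sb.Append(next);
                }
                break;
        }
    }
    return sb.ToString();
}

Console.WriteLine(StripCSlashes(@"Some\thing"));
```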

C# Special Characters in String Crashing Program

I have a slight problem with a path:
"D:\\Music\\DJ Ti%C3%ABsto\\Tiesto\\Adagio For Strings (Spirit of London).mp3"
"D:\\Music\\Dj Tiësto\\Tiesto\\Adagio For Strings (Spirit of London).mp3"
Currently, when it sends that path to my Audio Library, it cannot open the path. (The reason for the crash is that it tries to assign -1 to a trackbar, but that's irrelevant.)
So I'm wondering: is there any way to prevent C# from replacing special characters with %[code]? I've done a .Replace for "[" and "]", but I'd rather not have to look up every single special character and add a line of code to handle it. Is there any way around this?
Call Uri.UnescapeDataString.
By the way, when putting paths in strings, you can put an @ sign before the string to tell the compiler not to process escape codes, like this: @"D:\Music\DJ Tiësto\Tiesto\Adagio For Strings (Spirit of London).mp3". This way, you don't need to double up every backslash.
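A sketch applying this to the path from the question: %C3%AB is the UTF-8 encoding of "ë", and Uri.UnescapeDataString decodes such %XX sequences while leaving the backslashes and parentheses alone. The verbatim @ string is used so the backslashes need no doubling:

```csharp
using System;

// The path as received, with "ë" percent-encoded as its UTF-8 bytes C3 AB.
string received = @"D:\Music\DJ Ti%C3%ABsto\Tiesto\Adagio For Strings (Spirit of London).mp3";

// Only the %XX sequences are decoded; everything else is left as-is.
string path = Uri.UnescapeDataString(received);
Console.WriteLine(path);
```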

C#: How do you go upon constructing a multi-lined string during design time?

How would I accomplish displaying a line as the one below in a console window by writing it into a variable during design time then just calling Console.WriteLine(sDescription) to display it?
Options:
-t Description of -t argument.
-b Description of -b argument.
If I understand your question right, what you need is the @ sign in front of your string. This will make the compiler take your string literally (including newlines etc.).
In your case I would write the following:
String sDescription =
@"Options:
-t Description of -t argument.";
That answers your question (I hope), but I would suggest just using several WriteLine calls.
The performance loss is next to nothing and it just is more adaptable.
You could work with a format string so you would go for this:
string formatString = "{0,-10} {1}"; // left-align the first item in a 10-character field
Console.WriteLine("Options:");
Console.WriteLine(formatString, "-t", "Description of -t argument.");
Console.WriteLine(formatString, "-b", "Description of -b argument.");
The format string makes sure your lines are formatted nicely without inserting spaces manually, and if you ever want to change the format, you only need to do it in one place.
Console.Write("Options:\n\tSomething\t\tElse");
produces
Options:
Something Else
\n starts a new line and \t inserts a tab; for more professional layouts, try the field-width (alignment) setting in format items.
http://msdn.microsoft.com/en-us/library/txafckwd.aspx
If this is a /? screen, I tend to throw the text into a .txt file that I embed via a resx file. Then I just edit the txt file. This then gets exposed as a string property on the generated resx class.
If needed, I embed standard string.Format symbols into my txt for replacement.
Personally I'd normally just write three Console.WriteLine calls. I know that gives extra fluff, but it lines the text up appropriately and it guarantees that it'll use the right line terminator for whatever platform I'm running on. An alternative would be to use a verbatim string literal, but that will "fix" the line terminator at compile-time.
I know C# is mostly used on Windows machines, but please, please, please try to write your code to be platform-neutral. Not all platforms use the same end-of-line sequence. To retrieve the end-of-line string for the currently executing platform, use:
System.Environment.NewLine
Maybe I'm just anal because I am a former java programmer who ran apps on many platforms, but you never know what the platform of the future is.
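A minimal sketch of that advice applied to the help text from the question, building the multi-line string at run time rather than with a verbatim literal whose line breaks are fixed when the source file is compiled:

```csharp
using System;

// Environment.NewLine is "\r\n" on Windows and "\n" on Linux and macOS.
string sDescription = string.Join(Environment.NewLine,
    "Options:",
    "  -t  Description of -t argument.",
    "  -b  Description of -b argument.");

Console.WriteLine(sDescription);
```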
The "best" answer depends on where the information you're displaying comes from.
If you want to hard code it, using an "@" string is very effective, though you'll find that getting it to display right plays merry hell with your code formatting.
For a more substantial piece of text (more than a couple of lines), embedding a text resource is good.
But if you need to construct the string on the fly, say by looping over the command-line parameters supported by your application, then you should investigate both StringBuilder and format strings.
StringBuilder has methods like AppendFormat() that accept format strings, making it easy to build up formatted lines.
Format Strings make it easy to combine multiple items together. Note that Format strings may be used to format things to a specific width.
To quote the MSDN page linked above:
Format Item Syntax
Each format item takes the following form and consists of the following components:
{index[,alignment][:formatString]}
The matching braces ("{" and "}") are required.
Index Component
The mandatory index component, also called a parameter specifier, is a number starting from 0 that identifies a corresponding item in the list of objects ...
Alignment Component
The optional alignment component is a signed integer indicating the preferred formatted field width. If the value of alignment is less than the length of the formatted string, alignment is ignored and the length of the formatted string is used as the field width. The formatted data in the field is right-aligned if alignment is positive and left-aligned if alignment is negative. If padding is necessary, white space is used. The comma is required if alignment is specified.
Format String Component
The optional formatString component is a format string that is appropriate for the type of object being formatted ...
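A short sketch of both points, assuming a console app: the first lines exercise the alignment rules quoted above, and the rest builds the help text on the fly with StringBuilder.AppendFormat. The option table is hypothetical; in a real tool it would come from wherever the supported parameters are defined. CultureInfo.InvariantCulture is used only so the decimal separator in the numeric example does not depend on the machine's culture:

```csharp
using System;
using System.Globalization;
using System.Text;

// Alignment rules from the quoted documentation.
Console.WriteLine(string.Format("[{0,6}]", "ab"));     // [    ab]  positive: right-aligned
Console.WriteLine(string.Format("[{0,-6}]", "ab"));    // [ab    ]  negative: left-aligned
Console.WriteLine(string.Format("[{0,2}]", "abcdef")); // [abcdef]  smaller than the value: ignored
Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "[{0,8:F2}]", 3.14159)); // [    3.14]

// Hypothetical option table.
var options = new (string Flag, string Help)[]
{
    ("-t", "Description of -t argument."),
    ("-b", "Description of -b argument."),
};

var sb = new StringBuilder();
sb.AppendLine("Options:");
foreach (var (flag, help) in options)
{
    // {0,-6} pads the flag to six characters so the descriptions line up.
    sb.AppendFormat("  {0,-6}{1}", flag, help).AppendLine();
}

string sDescription = sb.ToString();
Console.Write(sDescription);
```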
