How do I escape with the @-sign when using variables?
File.Delete(@"c:\test"); // WORKS!
File.Delete(@path); // doesn't work :(
File.Delete(@"c:\test"+path); // WORKS
Anyone have any idea? It's the 2nd example I want to use!
Strings prefixed with the @ character are called verbatim string literals (their contents do not need to be escaped).
Therefore, you can only use @ with string literals, not string variables.
So File.Delete(path); is all you need, provided you assign path beforehand (from a verbatim string or any other string).
Verbatim strings are just a syntactic nicety that makes it easier to type strings containing backslashes (paths, regexes). The declarations
string path = "C:\\test";
string path = @"C:\test";
are completely identical in their result. Both result in a string containing C:\test. Either option is needed only because the C# language treats \ in strings as special.
The @ is not some magic pixie dust needed to make paths work properly; it has a defined meaning when prefixed to a string literal: the string is interpreted without the usual \ escape sequences.
The reason your second example doesn't work like you expect is that @ prefixed to a variable name does something different: it allows you to use reserved keywords as identifiers, so that you could use @class as an identifier, for example. For identifiers that don't clash with keywords, the result is the same as without the prefix.
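A minimal sketch of that identifier rule. The class and variable names here are made up for illustration; the point is only that @class is a legal name and that @path and path are the same variable.

```csharp
using System;

public static class IdentifierPrefixDemo
{
    public static string Describe()
    {
        // @class is a legal identifier even though class is a reserved keyword.
        string @class = "keyword used as a name";

        // For ordinary names the prefix changes nothing:
        // @path and path refer to the same variable.
        string path = @"c:\test";
        bool sameVariable = object.ReferenceEquals(@path, path);

        return @class + " / " + sameVariable;
    }

    public static void Main()
    {
        Console.WriteLine(Describe()); // keyword used as a name / True
    }
}
```

So File.Delete(@path) compiles when path is declared, but the @ has no effect on the string's contents.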
If you have a string variable containing a path, then you can usually assume that there is no escaping needed at all. After all it already is in a string. The things I mentioned above are needed to get text from source code correctly through the compiler into a string at runtime, because the compiler has different ideas. The string itself is just data that's always represented the same.
This still means that you have to initialise the string in a way that backslashes survive. If you read it from somewhere no special treatment should be necessary, if you have it as a constant string somewhere else in the code, then again, one of the options at the top has to be used.
string path = @"c:\test";
File.Delete(path);
The @ works only on a string literal. The equivalent regular (escaped) literal is "c:\\test".
Read more here.
There's a major problem with your understanding of the @ indicator.
@"whatever string" is a verbatim string literal. It tells the C# compiler not to look for escape sequences. Normally, \ starts an escape sequence in a string, and you can write things like "\n" to indicate a new line or "\t" to indicate a tab. However, @"\n" tells the compiler "no, I really want to treat the backslash as a backslash character, not an escape sequence."
If you don't like verbatim mode, use "\\" anywhere you want a single backslash, because the compiler treats an escaped backslash as that single character.
In either case, @"\n" and "\\n" will produce a 2-character string in memory, with the characters '\' and 'n'. It doesn't matter which way you get there; both are ways of telling the compiler you want those two characters.
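This is easy to check for yourself. A small sketch (the class name is arbitrary) comparing the two spellings with a real newline escape:

```csharp
using System;

public static class EscapeDemo
{
    public static void Main()
    {
        string verbatim = @"\n";  // backslash followed by n
        string escaped = "\\n";   // the same two characters, written with an escape
        string newline = "\n";    // a single line-feed character

        Console.WriteLine(verbatim.Length);     // 2
        Console.WriteLine(escaped.Length);      // 2
        Console.WriteLine(verbatim == escaped); // True
        Console.WriteLine(newline.Length);      // 1
    }
}
```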
In light of this, @path makes no sense, because you don't have any literal characters - just a variable. By the time you have the variable, you already have the characters you want in memory. It does compile ok, as explained by Joey, but it's not logically what you're looking for.
If you're looking for a way to get rid of occurrences of \\ within a variable, you simply want String.Replace:
string ugly = @"C:\\foo";
ugly = ugly.Replace(@"\\", @"\");
The first and third pass actual path literals, so they work.
The second only makes sense once path has been declared and assigned; the @ in front of it is not needed:
string path = @"c:\test";
File.Delete(path);
Is there any difference in how the C# compiler or .NET run-time handles verbatim string literals versus using escape sequences (i.e. performance) or is it just a matter of design time style? E.G.:
var pathA = "c:\\somewhere";
var pathB = @"c:\somewhere";
I would imagine they are compiled the same and it doesn't matter, but was just curious.
Any difference here is limited strictly to the compiler; the IL and runtime have no concept of verbatim vs escaped - it just has the string.
As for which to choose: whichever is more convenient ;p I almost always use verbatim string literals if there are unusual characters, as that allows for multi-line strings very easily and visually.
As an interesting case:
bool areSame = ReferenceEquals("c:\\somewhere", @"c:\somewhere"); // true
which tells us they are exactly the same string instance (thanks to "interning"). They aren't just equivalent; they are the same string instance to the runtime. It is therefore impossible that they can be (to the runtime) different in any way.
They are exactly the same. Try to decompile the two versions with a decompiler.
It's only a matter of convenience for developers when writing it in the code.
The @ sign in front of a string tells the compiler to ignore any embedded escape sequences.
string "\"" would yield a single double quote.
string "\\" would yield a single backslash.
string @"\\" would yield two backslashes.
I'm trying to extract information out of rc-files. In these files, " characters in strings are escaped by doubling them (""), analogous to C# verbatim strings. Is there a way to extract the string?
For example, if I have the following string "this is a ""test""" I would like to obtain this is a ""test"". It also must be non-greedy (very important).
I've tried to use the following regular expression;
"(?<text>[^""]*(""(.|""|[^"])*)*)"
However the performance was awful.
I've based it on the explanation here: http://ad.hominem.org/log/2005/05/quoted_strings.php
Does anybody have an idea how to handle this with a regular expression?
You've got some nested repetition quantifiers there. That can be catastrophic for the performance.
Try something like this:
(?<=")(?:[^"]|"")*(?=")
That can now only consume either two quotes at once or non-quote characters. The lookbehind and lookahead assert that the actual match is preceded and followed by a quote.
This also gets you around having to capture anything. Your desired result will simply be the full string you want (without the outer quotes).
I do not assert that the outer quotes are not doubled. Because if they were, there would be no way to distinguish them from an empty string anyway.
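A minimal C# sketch of this pattern, applied to a single quoted string like the one in the question. The class and method names are only for illustration, and the pattern is written as a verbatim literal, so each " in the regex is doubled.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Returns the text between the outer quotes, with doubled quotes left as they are.
    public static string Inner(string quoted)
    {
        // Regex: (?<=")(?:[^"]|"")*(?=")
        return Regex.Match(quoted, @"(?<="")(?:[^""]|"""")*(?="")").Value;
    }

    public static void Main()
    {
        // Input text: "this is a ""test"""
        Console.WriteLine(Inner("\"this is a \"\"test\"\"\"")); // this is a ""test""
    }
}
```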
This turns out to be a lot simpler than you'd expect. A string literal with escaped quotes looks exactly like a bunch of simple string literals run together:
"Some ""escaped"" quotes"
"Some " + "escaped" + " quotes"
So this is all you need to match it:
(?:"[^"]*")+
You'll have to strip off the leading and trailing quotes in a separate step, but that's not a big deal. You would need a separate step anyway, to unescape the escaped quotes (\" or "").
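A sketch of those steps in C#, assuming doubled quotes as the escape. The sample line and the names RcStringDemo and ExtractFirst are hypothetical, not taken from a real rc-file.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RcStringDemo
{
    // Regex: (?:"[^"]*")+  (quotes doubled because this is a verbatim literal)
    private static readonly Regex Literal = new Regex(@"(?:""[^""]*"")+");

    public static string ExtractFirst(string text)
    {
        Match m = Literal.Match(text);
        if (!m.Success)
        {
            return null;
        }

        // Step 1: strip the leading and trailing quote.
        string inner = m.Value.Substring(1, m.Value.Length - 2);

        // Step 2: unescape the doubled quotes.
        return inner.Replace("\"\"", "\"");
    }

    public static void Main()
    {
        // Line text: CAPTION "this is a ""test""", 10
        string line = "CAPTION \"this is a \"\"test\"\"\", 10";
        Console.WriteLine(ExtractFirst(line)); // this is a "test"
    }
}
```

Skip the second step if you want to keep the doubled quotes in the result, as in the question.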
Don't know if this is better or worse than m.buettner's (guessing not - he seems to know his stuff) but I thought I'd throw it out there for critique.
"(([^"]+(""[^"]+"")*)*)"
Try this: (?<=^")(.*?"{2}.*?"{2})(?="$)
It may be faster than the two previous answers, and without any bugs.
Match a " beginning the string
Multiple times match a non-" or two "
Match a " ending the string
"([^"]|(""))*?"
How can I remove a comma that sits between two double quotes? For example, in "a","b","c","d,d","e","f" there is one comma inside a quoted field; after removing it the result should be "a","b","c","dd","e","f". How can I do this with a regex in C#?
EDIT: I forgot to specify that there may be more than one comma between the quotes, as in "a","b","c","d,d,d","e","f", and the regex does not work for that. There can be any number of commas between quotes.
The string can also look like a,b,c,"d,d",e,f, in which case the result should be a,b,c,dd,e,f; for a,b,c,"d,d,d",e,f the result should be a,b,c,ddd,e,f.
Assuming the input is as simple as your examples (i.e., not full-fledged CSV data), this should do it:
string input = @"a,b,c,""d,d,d"",e,f,""g,g"",h";
Console.WriteLine(input);
string result = Regex.Replace(input,
@",(?=[^""]*""(?:[^""]*""[^""]*"")*[^""]*$)",
String.Empty);
Console.WriteLine(result);
output: a,b,c,"d,d,d",e,f,"g,g",h
a,b,c,"ddd",e,f,"gg",h
The regex matches any comma that is followed by an odd number of quotation marks.
EDIT: If fields are quoted with apostrophes (') instead of quotation marks ("), the technique is exactly the same--except you don't have to escape the quotes:
string input = @"a,b,c,'d,d,d',e,f,'g,g',h";
Console.WriteLine(input);
string result = Regex.Replace(input,
@",(?=[^']*'(?:[^']*'[^']*')*[^']*$)",
String.Empty);
Console.WriteLine(result);
If some fields were quoted with apostrophes while others were quoted with quotation marks, a different approach would be needed.
EDIT: Probably should have mentioned this in the previous edit, but you can combine those two regexes into one regex that will handle either apostrophes or quotation marks (but not both):
@",(?=[^']*'(?:[^']*'[^']*')*[^']*$|[^""]*""(?:[^""]*""[^""]*"")*[^""]*$)"
Actually, it will handle simple strings like 'a,a',"b,b". The problem is that there would be nothing to stop you from using one of the quote characters in a quoted field of the other type, like '9" Nails' (sic) or "Kelly's Heroes". That's taking us into full-fledged CSV territory (if not beyond), and we've already established that we're not going there. :D
They're called regular expressions for a reason — they are used to process strings that meet a very specific and academic definition for what is "regular". It looks like you have some fairly typical csv data here, and it happens that csv strings are outside of that specific definition: csv data is not formally "regular".
In spite of this, it can be possible to use regular expressions to handle csv data. However, to do so you must either use certain extensions to normal regular expressions to make them Turing complete, know certain constraints about your specific csv data that are not guaranteed in the general case, or both. Either way, the expressions required to do this are unwieldy and difficult to manage. It's often just not a good idea, even when it's possible.
A much better (and usually faster) solution is to use a dedicated CSV parser. There are two good ones hosted at CodeProject (FastCSV and Linq-to-CSV), there is one (actually several) built into the .NET Framework (Microsoft.VisualBasic.FileIO.TextFieldParser), and I have one here on Stack Overflow. Any of these will perform better and just plain work better than a solution based on regular expressions.
Note here that I'm not arguing it can't be done. Most regular expression engines today have the necessary extensions to make this possible, and most people parsing csv data know enough about the data they're handling to constrain it appropriately. I am arguing that it's slower to execute, harder to implement, harder to maintain, and more error-prone compared to a dedicated parser alternative, which is likely built into whichever platform you're using, and is therefore not in your best interests.
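As a rough sketch of the parser route, here is TextFieldParser applied to one of the sample lines from the question. The class and method names are made up for illustration, and on the .NET Framework the project needs a reference to the Microsoft.VisualBasic assembly.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvParserDemo
{
    // Splits one CSV line into fields, then removes commas inside each field.
    public static string[] StripCommas(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // The parser already removes the surrounding quotes from each field.
            string[] fields = parser.ReadFields();
            for (int i = 0; i < fields.Length; i++)
            {
                fields[i] = fields[i].Replace(",", "");
            }
            return fields;
        }
    }

    public static void Main()
    {
        string[] fields = StripCommas("a,b,c,\"d,d,d\",e,f");
        Console.WriteLine(string.Join(",", fields)); // a,b,c,ddd,e,f
    }
}
```

The quoting rules are left to the parser, so the only thing the code has to do itself is drop the commas from each field.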
var input = "\"a\",\"b\",\"c\",\"d,d\",\"e\",\"f\"";
var regex = new Regex("(\"\\w+),(\\w+\")");
var output = regex.Replace(input,"$1$2");
Console.WriteLine(output);
You'd need to evaluate whether or not \w is what you want to use.
You can use this:
var result = Regex.Replace(yourString, "([a-z]),", "$1");
Sorry, after seeing your edits, regular expressions are not appropriate for this.
This should be very simple using Regex.Replace and a callback:
string pattern = @"
"" # open quotes
[^""]* # some not quotes
"" # closing quotes
";
data = Regex.Replace(data, pattern, m => m.Value.Replace(",", ""),
RegexOptions.IgnorePatternWhitespace);
You can even make a slight modification to allow escaped quotes (here I have \", and the comments explain how to use ""):
string pattern = @"
\\. # escaped character (alternative is """")
|
(?<Quotes>
"" # open quotes
(?:\\.|[^""])* # some not quotes or escaped characters
# the alternative is (?:""""|[^""])*
"" # closing quotes
)
";
data = Regex.Replace(data, pattern,
m => m.Groups["Quotes"].Success ? m.Value.Replace(",", "") : m.Value,
RegexOptions.IgnorePatternWhitespace);
If you need a single quote replace all "" in the pattern with a single '.
Something like the following, perhaps?
"(,)"
I have a helper class pulling a string from an XML file. That string is a file path (so it has backslashes in it). I need to use that string as it is... How can I use it like I would with the literal command?
Instead of this:
string filePath = @"C:\somepath\file.txt";
I want to do this:
string filePath = @helper.getFilePath(); //getFilePath returns a string
This isn't how I am actually using it; it is just to make what I mean a little clearer. Is there some sort of .ToLiteral() or something?
I don't think you have to worry about it if you already have the value. The @ operator is for when you're specifying the string (like in your first code snippet).
What are you attempting to do with the path string that isn't working?
I'm not sure if I understand. In your example: if helper.getFilePath() returns "c:\somepath\file.txt", there will be no problem, since the @ is only needed if you are explicitly specifying a string with "".
When functions talk to each other, you will always get the literal path. If the XML contains c:\somepath\file.txt and your function returns c:\somepath\file.txt, then string filePath will also contain c:\somepath\file.txt as a valid path.
The @"" just makes it easier to write string literals.
string (C# Reference, MSDN)
Verbatim string literals start with @ and are also enclosed in double quotation marks. For example:
@"good morning" // a string literal
The advantage of verbatim strings is that escape sequences are not processed, which makes it easy to write, for example, a fully qualified file name:
@"c:\Docs\Source\a.txt" // rather than "c:\\Docs\\Source\\a.txt"
One place where I've used it is in a regex pattern:
string pattern = @"\b[DdFf][0-9]+\b";
If you have a string in a variable, you do not need to make a "literal" out of it, since if it is well formed, it already has the correct contents.
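To illustrate, here is a small sketch that pulls a path out of XML and compares it with the verbatim literal. GetFilePath and the config/path element names are hypothetical stand-ins for the helper in the question; the XML is inlined only to keep the example self-contained.

```csharp
using System;
using System.Xml.Linq;

public static class XmlPathDemo
{
    // Hypothetical stand-in for the helper class in the question.
    public static string GetFilePath(string xml)
    {
        return XDocument.Parse(xml).Root.Element("path").Value;
    }

    public static void Main()
    {
        // The XML text contains single backslashes, exactly as a file on disk would.
        string xml = @"<config><path>C:\somepath\file.txt</path></config>";

        string filePath = GetFilePath(xml); // no @ needed, and none possible

        Console.WriteLine(filePath);                            // C:\somepath\file.txt
        Console.WriteLine(filePath == @"C:\somepath\file.txt"); // True
    }
}
```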
In C# the @ symbol combined with double quotes allows you to write strings without escaping them. E.g.
print(@"c:\mydir\dont\have\to\escape\backslashes\etc");
If you don't use it then you need to use the escape character in your strings.
http://msdn.microsoft.com/en-us/library/aa691090(VS.71).aspx
You don't need to specify it anywhere else in code. Putting it in front of an identifier does not cause a compiler error, but it has no effect on the string's contents.
You've got it backwards. The @-operator is for turning literals into strings, while keeping all funky characters. Your path is already a string - you don't need to do anything at all to it. Just lose the @.
string filePath = helper.getFilePath();
The string returned from your helper class is not a literal string so you don't need to use the '@' character to remove the behaviour of the backslashes.