Although this is a basic example, there are so many questions around escaping quote marks that this basic question about string variables seems to get lost in the 'noise'.
For purposes of this question, C# is always in the context of Visual Studio C#, in this case Visual Studio-2019.
In C#, both the variable in the string that I want to test for a pattern match and the string that contains the pattern are surrounded by quote marks. These quote marks are present in the C# program-code and the debugger string variable values as well. This seems to be inevitable.
Since these quote marks are part of the C# string variables themselves, I would hope that they would just be ignored by regex as part of the standard syntax.
This appears to be the case.
However, I want to verify that this works correctly, and how it works.
Example:
string ourTestString = "Smith";
string ourRegexToMatch = "^(Sm)";
Regex ourRegexVar = new Regex(ourRegexToMatch, RegexOptions.Singleline);
var matchCollection = ourRegexVar.Matches(ourTestString);
bool ourMatch = matchCollection.Count == 1;
The intent is to match for Sm at the beginning of the line and it is currently case sensitive.
In the above code, ourMatch is indeed true, as expected/hoped for.
It appears in the debugger that the ourRegexVar itself does not have the quote marks that surround the C# variable. There are curly brackets around everything which I would suppose is standard for such Regex variables.
One could easily imagine complex scenarios that involve strings that really do have quotation marks and escaped quotation marks and so forth, so it could get much more complicated than the above rather simple example.
My question is:
For purposes of regex and C# variables, is it ALWAYS the case that, for both the ourTestString and the ourRegexToMatch C# string variables, everything behaves exactly as if the compiler-induced "" around the string were not there?
The quotes tell the compiler that it's a string. Strings in memory aren't delimited by quotes.
If you do have a quote in a string, like Sm"ith, you have to escape it, like so:
var s1 = "Sm\"ith";
var s2 = @"Sm""ith";
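A minimal sketch that may help confirm this, assuming a plain console program (the QuoteDemo class and MatchesOnce helper are hypothetical names used only for illustration). It checks the string lengths and shows that the regex only sees a quote when the quote is actually part of the data:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteDemo
{
    // True when the pattern matches the input exactly once,
    // mirroring the Matches(...).Count == 1 check in the question.
    public static bool MatchesOnce(string input, string pattern)
    {
        var regex = new Regex(pattern, RegexOptions.Singleline);
        return regex.Matches(input).Count == 1;
    }

    public static void Main()
    {
        string ourTestString = "Smith";
        string ourRegexToMatch = "^(Sm)";

        // The delimiting quotes are not stored: both strings hold 5 characters.
        Console.WriteLine(ourTestString.Length);    // 5
        Console.WriteLine(ourRegexToMatch.Length);  // 5
        Console.WriteLine(MatchesOnce(ourTestString, ourRegexToMatch)); // True

        // Here the quotes are real characters in the data,
        // so the string no longer starts with "Sm".
        string quoted = "\"Smith\"";
        Console.WriteLine(quoted.Length);                        // 7
        Console.WriteLine(MatchesOnce(quoted, ourRegexToMatch)); // False
    }
}
```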
Related
How do I escape with the @-sign when using variables?
File.Delete(@"c:\test"); // WORKS!
File.Delete(@path); // doesn't work :(
File.Delete(@"c:\test"+path); // WORKS
Anyone have any idea? It's the 2nd example I want to use!
Strings prefixed with the @ character are called verbatim string literals (whose backslashes do not need to be escaped).
Therefore, you can only use @ with string literals, not string variables.
So, just File.Delete(path); will do, after you assign the path in advance of course (from a verbatim string or some other string).
Verbatim strings are just a syntactic nicety to be able to type strings containing backslashes (paths, regexes) easier. The declarations
string path = "C:\\test";
string path = @"C:\test";
are completely identical in their result. Both result in a string containing C:\test. Note that either option is just needed because the C# language treats \ in strings as special.
The @ is not some magic pixie dust needed to make paths work properly. It has a defined meaning when prefixed to string literals: the strings are interpreted without the usual \ escape sequences.
The reason your second example doesn't work like you expect is that @ prefixed to a variable name does something different: it allows you to use reserved keywords as identifiers, so that you could use @class as an identifier, for example. For identifiers that don't clash with keywords, the result is the same as without the prefix.
If you have a string variable containing a path, then you can usually assume that there is no escaping needed at all. After all it already is in a string. The things I mentioned above are needed to get text from source code correctly through the compiler into a string at runtime, because the compiler has different ideas. The string itself is just data that's always represented the same.
This still means that you have to initialise the string in a way that backslashes survive. If you read it from somewhere no special treatment should be necessary, if you have it as a constant string somewhere else in the code, then again, one of the options at the top has to be used.
string path = @"c:\test";
File.Delete(path);
The verbatim prefix works only on a string literal. Written as a regular literal, the same string is "c:\\test".
There's a major problem with your understanding of the @ indicator.
@"whatever string" is a verbatim string literal. It tells the C# compiler not to look for escape sequences. Normally, \ introduces an escape sequence in a string, and you can write "\n" to indicate a new line or "\t" to indicate a tab. However, @"\n" tells the compiler "no, I really want to treat the backslash as a backslash character, not an escape sequence."
If you don't like literal mode, the way to do it is to use "\\" anywhere you want a single backslash, because the compiler knows to treat an escaped backslash as the single character.
In either case, @"\n" and "\\n" will produce a 2-character string in memory, with the characters '\' and 'n'. It doesn't matter which way you get there; both are ways of telling the compiler you want those two characters.
In light of this, @path makes no sense, because you don't have any literal characters, just a variable. By the time you have the variable, you already have the characters you want in memory. It does compile OK, as explained by Joey, but it's not logically what you're looking for.
If you're looking for a way to get rid of occurrences of \\ within a variable, you simply want String.Replace:
string ugly = @"C:\\foo";
ugly = ugly.Replace(@"\\", @"\");
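The equivalence described in these answers can be checked directly. The following is only a sketch (VerbatimDemo and its members are hypothetical names); it compares a regular and a verbatim literal for the same path and prints the lengths of the backslash-n variants:

```csharp
using System;

public static class VerbatimDemo
{
    // Two spellings of the same four-plus-three character path C:\test.
    public static string Regular() => "C:\\test";
    public static string Verbatim() => @"C:\test";

    public static void Main()
    {
        // Both literals produce identical strings at runtime.
        Console.WriteLine(Regular() == Verbatim()); // True

        // Backslash followed by 'n' is two characters either way...
        Console.WriteLine(@"\n".Length);  // 2
        Console.WriteLine("\\n".Length);  // 2
        // ...while the escape sequence is a single newline character.
        Console.WriteLine("\n".Length);   // 1

        // On an identifier the prefix changes nothing: @path and path
        // name the same variable.
        string path = Verbatim();
        Console.WriteLine(object.ReferenceEquals(path, @path)); // True
    }
}
```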
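The first and third pass string literals that are actual paths, hence would work.
The second compiles only if path is a declared string variable (the prefix has no effect on it), and would work if path holds a valid path:
string path = @"c:\test";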
File.Delete(path);
I have built a parser in Sprache and C# for files using a format I don't control. Using it I can correctly convert:
a = "my string";
into
my string
The parser (for the quoted text only) currently looks like this:
public static readonly Parser<string> QuotedText =
from open in Parse.Char('"').Token()
from content in Parse.CharExcept('"').Many().Text().Token()
from close in Parse.Char('"').Token()
select content;
However the format I'm working with escapes quotation marks using "double doubles" quotes, e.g.:
a = "a ""string"".";
When attempting to parse this nothing is returned. It should return:
a ""string"".
Additionally
a = "";
should be parsed into a string.Empty or similar.
I've tried regexes unsuccessfully based on answers like this doing things like "(?:[^;])*", or:
public static readonly Parser<string> QuotedText =
from content in Parse.Regex("""(?:[^;])*""").Token()
This doesn't work (i.e. no matches are returned in the above cases). I think my beginners regex skills are getting in the way. Does anybody have any hints?
EDIT: I was testing it here - http://regex101.com/r/eJ9aH1
If I'm understanding you correctly, this is the kind of regex you're looking for:
"(?:""|[^"])*"
See the demo.
1. " matches an opening quote
2. (?:""|[^"])* matches two quotes or any chars that are not a quote (including newlines), repeating
3. " matches the closing quote.
But it's always going to boil down to whether your input is balanced. If not, you'll be getting false positives. And if you have a string such as "string"", which should be matched: "string"", "", or nothing? That's a tough decision, one that, fortunately, you don't have to make if you are sure of your input.
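For reference, here is a sketch of how that pattern might be used from plain C# with System.Text.RegularExpressions rather than through Sprache. The QuotedTextDemo class and ExtractQuoted method are hypothetical names, and the capturing group around the contents is an addition made here so the surrounding quotes can be dropped:

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedTextDemo
{
    // The regex "((?:""|[^"])*)" written as a C# verbatim literal,
    // where every quote character has to be doubled.
    private static readonly Regex Quoted =
        new Regex(@"""((?:""""|[^""])*)""");

    // Returns the text between the outer quotes (doubled quotes are left
    // as they are), or null when the input has no quoted section.
    public static string ExtractQuoted(string input)
    {
        Match m = Quoted.Match(input);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractQuoted("a = \"my string\";"));             // my string
        Console.WriteLine(ExtractQuoted(@"a = ""a """"string""""."";"));   // a ""string"".
        Console.WriteLine(ExtractQuoted("a = \"\";") == string.Empty);      // True
    }
}
```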
You can likely adapt your desired output from this pattern:
"(.+".+")"|(".+?")|("")
example:
http://regex101.com/r/lO1vZ4
If you only want to ignore consecutive double quotes, try this:
("{2,})
Live demo
This regex, ("+), might help you to match extra unwanted double quotes.
here is the DEMO
Perhaps I didn't see or understand any of the answers I read, but I am having trouble using a verbatim string literal (@) with settings.Default.(mysetting). I am trying to do something like
Directory.GetFiles(@Setting.Default.(mysetting),"*.txt");
and can't seem to find the right syntax to make this work.
The @ marks a verbatim string literal, in which backslashes are not interpreted as escape characters. It has no such effect in front of a property access or method invocation, as you attempt here.
A valid assignment might be
string path = @"c:\temp\example.txt";
Usually a \t would be interpreted as a tabulation character thus making the file reference illegal. It is exactly identical to
string path = "c:\\temp\\example.txt" ;
but a bit easier to read.
The @ verbatim string prefix is used with string literals. So your code should be:
Directory.GetFiles(Setting.Default.(mysetting),@"*.txt");
because "*.txt" is the string literal in your code.
(Although not related, you can also use @ with variable names; see C# Variable Naming and the @ Symbol.)
To use @ as part of a verbatim string literal, the string literal must be right there, not just a property, method, etc. that returns a string.
string myStr = @"I'm verbatim, I contain a literal \n";
string myStr2 = "I'm not\nI have a newline";
string myStr3 = @myStr2; // still contains a newline, not a literal "\n"
Using @ in front of an identifier allows you to use reserved keywords as identifiers. For example:
string @if = "hello!"; // valid
It also works on non-reserved words, where it has no real effect.
string @myVar = "hello!"; // valid
string newVar = myVar; // can be referred to either way
Unless I'm missing it, you still need to wrap the string within quotation marks.
I have description field which is:
16" Alloy Upgrade
In CSV format it appears like this:
"16"" Alloy Upgrade "
What would be the best use of regex to maintain the original format? As I'm learning, I would appreciate it being broken down for my understanding.
I'm already using Regex to split some text separating 2 fields which are: code, description. I'm using this:
,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))
My thoughts are to remove the quotes, then remove the delimiter excluding use in sentences.
Thanks in advance.
If you don't want to/can't use a standard CSV parser (which I'd recommend), you can strip all non-doubled quotes using a regex like this:
Regex.Replace(text, @"(?<!"")""(?!"")",string.Empty)
That regex will match every " character not preceded or followed by another ".
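As a sketch of that idea applied to the field from the question (CsvQuoteDemo and its method names are hypothetical): a negative lookbehind and a negative lookahead together pick out quotes that have no quote neighbour. Collapsing the remaining doubled quotes is an extra step assumed here, not part of the answer above:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvQuoteDemo
{
    // Removes every quote that is neither preceded nor followed by another quote.
    // The regex is (?<!")"(?!") written as a verbatim literal.
    public static string StripLoneQuotes(string text)
    {
        return Regex.Replace(text, @"(?<!"")""(?!"")", string.Empty);
    }

    // Assumed follow-up step: turn each remaining doubled quote into one quote.
    public static string Unescape(string text)
    {
        return StripLoneQuotes(text).Replace("\"\"", "\"");
    }

    public static void Main()
    {
        string field = "\"16\"\" Alloy Upgrade \"";   // "16"" Alloy Upgrade "
        Console.WriteLine(StripLoneQuotes(field));     // 16"" Alloy Upgrade (plus a trailing space)
        Console.WriteLine(Unescape(field));            // 16" Alloy Upgrade (plus a trailing space)
    }
}
```

Note that a field consisting only of quotes, such as an empty quoted field, has no lone quotes to strip, which is one reason a standard CSV parser is the safer choice.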
I wouldn't use regex since they are usually confusing and totally unclear what they do (like the one in your question for example). Instead this method should do the trick:
public string CleanField(string input)
{
if (input.StartsWith("\"") && input.EndsWith("\""))
{
string output = input.Substring(1,input.Length-2);
output = output.Replace("\"\"","\"");
return output;
}
else
{
//If it doesn't start and end with quotes then it doesn't look like its been escaped so just hand it back
return input;
}
}
It may need tweaking but in essence it checks if the string starts and ends with a quote (which it should if it is an escaped field) and then if so takes the inside part (with the substring) and then replaces double quotes with single quotes. The code is a bit ugly due to all the escaping but there is no avoiding that.
The nice thing is this can be used easily with a bit of Linq to take an existing array and convert it.
processedFieldArray = inputfieldArray.Select(CleanField).ToArray();
I'm using arrays here purely because your linked page seems to use them where you are wanting this solution.
I am using .NET (C#) code to write to a database that interfaces with a Perl application. When a single quote appears in a string, I need to "escape" it. IOW, the name O'Bannon should convert to O\'Bannon for the database UPDATE. However, all efforts at string manipulation (e.g. .Replace) generate an escape character for the backslash and I end up with O\\'Bannon.
I know it is actually generating the second backslash, because I can read the resulting database field's value (i.e. it is not just the IDE debug value for the string).
How can I get just the single backslash in the output string?
Well I did
"O'Bannon".Replace("'","\\'")
and result is
"O\'Bannon"
Is this what you want?
You can use "\\", which is the escape char followed by a backslash.
See the list of Escape Sequences here: http://msdn.microsoft.com/en-us/library/h21280bw.aspx
Even better, assign the result of the replace to a variable so that you can inspect it if needed:
var RepName = "O'Bannon";
var Repstr = RepName.Replace("'","\\'");
You can also use a verbatim string
s = s.Replace("'", @"\'");
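A quick way to check how many backslashes really end up in the string is to look at its length and print it to the console instead of relying on the debugger's escaped display. This is only a sketch; EscapeDemo and EscapeSingleQuotes are hypothetical names, and it says nothing about what a database driver may do with the value afterwards:

```csharp
using System;

public static class EscapeDemo
{
    // Prefixes every single quote with exactly one backslash.
    // "\\'" is a two-character string: one backslash and one quote.
    public static string EscapeSingleQuotes(string s)
    {
        return s.Replace("'", "\\'");
    }

    public static void Main()
    {
        string name = "O'Bannon";
        string escaped = EscapeSingleQuotes(name);

        Console.WriteLine(name.Length);     // 8
        Console.WriteLine(escaped.Length);  // 9: only one character was added
        Console.WriteLine(escaped);         // O\'Bannon
    }
}
```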