I have the following code using pattern matching with a property pattern that always succeeds:
var result = new { Message = "Hello!" } is { Message: string message };
Console.WriteLine(message);
The match above succeeds (result is true). However, when I try to print message, I get:
CS0165: Use of unassigned local variable.
However, the following works:
var result = "Hello!" is { Length: int len };
Console.WriteLine(len);
Also, when I use it with an if, it just works fine:
if (new { Message = "Hello!" } is { Message: string message })
{
Console.WriteLine(message);
}
Why the discrepancies?
Well, it's a compile-time error, and at compile-time, it doesn't know that it would match. As far as the compiler is concerned, it's absolutely valid for result to be false, and message to be unassigned - therefore it's a compile-time error to try to use it.
In your second example, the compiler knows that it will always match because every string has a length, and in the third example you're only trying to use the pattern variable if it's matched.
The only oddity/discrepancy here is that the compiler doesn't know that new { Message = "Hello!" } will always match { Message: string message }. I can understand that, in that Message could be null for the same anonymous type... slightly more oddly, it looks like it doesn't know that it will always match { Message: var message } either (which should match any non-null reference to an instance of the anonymous type, even if Message is null).
I suspect that for your string example, the compiler is "more aware" that a string literal can't be null, therefore the pattern really, really will always match - whereas for some reason it doesn't do that for the anonymous type.
I note that if you extract the string literal to a separate variable, the compiler doesn't know it will always match:
string text = "hello";
var result = text is { Length: int length };
// Use of unassigned local variable
Console.WriteLine(length);
I suspect this is because a string literal is a constant expression, and that the compiler effectively has more confidence about what it can predict for constant expressions.
I agree that it does look a bit odd, but I'd be surprised if this actually affected much real code - because you don't tend to pattern match against string literals, and I'd say it's the string literal match that is the unexpected part here.
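To make the definite-assignment rule concrete, here is a minimal sketch. The PatternDemo class and its method names are made up for illustration. In both methods the pattern variable is read only on the branch where the match is known to have succeeded, so the compiler accepts it whether or not it can prove that the pattern always matches.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class PatternDemo
{
    public static string Describe(object value)
    {
        // 'len' is definitely assigned inside the if body, because that
        // body only runs when the pattern matched.
        if (value is string { Length: int len })
        {
            return $"string of length {len}";
        }
        return "not a string";
    }

    public static int LengthOrDefault(string text, int fallback)
    {
        // The conditional operator behaves the same way: 'length' is only
        // read on the branch taken when the match succeeded.
        return text is { Length: int length } ? length : fallback;
    }
}
```

A null string does not match a property pattern, so LengthOrDefault(null, -1) takes the fallback branch.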
An interpolated string is easy: it is just a string prefixed with the $ sign. But what if the string template comes from outside your code? For example, assume you have an XML file containing the following line:
<filePath from="C:\data\settle{date}.csv" to="D:\data\settle{date}.csv"/>
Then you can use LINQ to XML to read in the content of the attributes.
//assume the ele is the node <filePath></filePath>
string pathFrom = ele.Attribute("from").Value;
string pathTo = ele.Attribute("to").Value;
string date = DateTime.Today.ToString("MMddyyyy");
Now how can I inject the date into the pathFrom variable and pathTo variable?
If I have control of the string itself, things are easy: I can just do var xxx = $@"C:\data\settle{date}.csv"; But now all I have is a variable that I know contains the placeholder {date}.
String interpolation is a compiler feature, so it cannot be used at runtime. This should be clear from the fact that the names of the variables in scope will in general not be available at runtime.
So you will have to roll your own replacement mechanism. It depends on your exact requirements what is best here.
If you only have one (or very few) replacements, just do
output = input.Replace("{date}", date);
If the possible replacements are a long list, it might be better to use
output = Regex.Replace(input, @"\{\w+?\}",
match => GetValue(match.Value));
with
string GetValue(string variable)
{
switch (variable)
{
case "{date}":
return DateTime.Today.ToString("MMddyyyy");
default:
return "";
}
}
If you can get an IDictionary<string, string> mapping variable names to values you may simplify this to
output = Regex.Replace(input, @"\{\w+?\}",
match => replacements[match.Value.Substring(1, match.Value.Length-2)]);
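As a self-contained sketch of the dictionary approach: the TemplateFiller class and Fill method below are hypothetical names, and leaving unknown placeholders untouched is an assumption, not something the answer specifies. A capture group pulls out the name without the braces, which avoids the Substring arithmetic.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class TemplateFiller
{
    public static string Fill(string input, IDictionary<string, string> replacements)
    {
        // Group 1 captures the placeholder name without the braces.
        return Regex.Replace(input, @"\{(\w+)\}", match =>
            replacements.TryGetValue(match.Groups[1].Value, out string value)
                ? value          // known placeholder: substitute its value
                : match.Value);  // unknown placeholder: leave it as-is (assumption)
    }
}
```

For example, Fill(@"C:\data\settle{date}.csv", new Dictionary<string, string> { { "date", "08092020" } }) yields C:\data\settle08092020.csv. Because the replacement comes from a match evaluator, it is inserted literally and characters such as $ in a value are not treated as substitution patterns.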
You can't directly; the compiler turns your:
string world = "world";
var hw = $"Hello {world}";
Into something like:
string world = "world";
var hw = string.Format("Hello {0}", world);
(It chooses string.Concat, string.Format or FormattableString depending on the situation.)
You could engage in a similar process yourself, by replacing "{date" with "{0" and putting the date as the second argument to a string format, etc.
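A minimal sketch of that idea follows. IndexedTemplate and Apply are hypothetical names, and the use of the invariant culture is an assumption made so that the output is predictable. Replacing "{date" rather than "{date}" means a format specifier in the template carries over to the positional placeholder.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class IndexedTemplate
{
    public static string Apply(string template, DateTime date)
    {
        // Turn the named placeholder into a positional one: "{date" -> "{0".
        // A specifier such as "{date:MMddyyyy}" becomes "{0:MMddyyyy}".
        string indexed = template.Replace("{date", "{0");

        // Caveat: string.Format throws FormatException if the template
        // contains any other unescaped braces.
        return string.Format(CultureInfo.InvariantCulture, indexed, date);
    }
}
```

With the template C:\data\settle{date:MMddyyyy}.csv and 9 August 2020, this produces C:\data\settle08092020.csv.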
SOLUTION 1:
If you can change the XML template, change {date} to {0}.
<filePath from="C:\data\settle{0}.csv" to="D:\data\settle{0}.csv" />
Then you can set the value of that like this.
var elementString = string.Format(element.ToString(), DateTime.Now.ToString("MMddyyyy"));
Output: <filePath from="C:\data\settle08092020.csv" to="D:\data\settle08092020.csv" />
SOLUTION 2:
If you can't change the XML template, then this would be my preferred approach.
<filePath from="C:\data\settle{date}.csv" to="D:\data\settle{date}.csv" />
Set the placeholder like this.
element.Attribute("to").Value = element.Attribute("to").Value.Replace("{date}", DateTime.Now.ToString("MMddyyyy"));
element.Attribute("from").Value = element.Attribute("from").Value.Replace("{date}", DateTime.Now.ToString("MMddyyyy"));
Output: <filePath from="C:\data\settle08092020.csv" to="D:\data\settle08092020.csv" />
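Solution 2 can be wrapped into a small self-contained sketch. FilePathTemplate and FillDate are hypothetical names. The attribute names "from" and "to" are taken from the XML in the question, and skipping a missing attribute is an assumption.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class FilePathTemplate
{
    public static XElement FillDate(XElement element, string date)
    {
        // Only the two attributes from the question are touched.
        foreach (string name in new[] { "from", "to" })
        {
            XAttribute attribute = element.Attribute(name);
            if (attribute != null) // skip attributes that are not present (assumption)
            {
                attribute.Value = attribute.Value.Replace("{date}", date);
            }
        }
        return element;
    }
}
```

Formatting the date once and passing it in keeps both attributes consistent, even if the call happens to run across midnight.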
I hope it helps. Kind regards.
If you treat your original string as a user-input string (or anything that is not processed by the compiler to replace the placeholder), then the question is simple: just use String.Replace() to replace the placeholder {date} with the value of the date as you wish. Now the follow-up question is: are you sure that the compiler is not substituting it at compile time, and is leaving it untouched for handling at runtime?
String interpolation allows the developer to combine variables and text to form a string.
Example
Two int variables are created: foo and bar.
int foo = 34;
int bar = 42;
string resultString = $"The foo is {foo}, and the bar is {bar}.";
Console.WriteLine(resultString);
Output:
The foo is 34, and the bar is 42.
I'm looking for an efficient, case-insensitive string replace. If using Regex, I don't want to call Regex.IsMatch and then Regex.Replace, because that means two searches through the input instead of one. I could do the following, but this requires an additional local variable. Is there a way to do it in one line without a local variable? Something like Regex.TryReplace(ref string input, ...) that would return a bool.
string input = "string with pattern";
string replaced = Regex.Replace(input , Regex.Escape("pattern"), "replace value", RegexOptions.IgnoreCase);
if (!ReferenceEquals(replaced, input))
{
input = replaced;
// do something
}
You can do it with a try/catch using the Replace(String, String, String, RegexOptions, TimeSpan) overload.
try {
Console.WriteLine(Regex.Replace(words, pattern, evaluator,
RegexOptions.IgnorePatternWhitespace,
TimeSpan.FromSeconds(.25)));
}
catch (RegexMatchTimeoutException) {
Console.WriteLine("Returned words:");
}
But you are still performing two operations: trying to replace, and checking whether anything was replaced, which you'll always be doing. I'm curious why doing two operations in one line is such a concern.
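For what the question asks, one possible sketch is a helper that counts matches inside a match evaluator, so the input is searched only once. RegexHelpers.TryReplace is a hypothetical helper, not a framework API, and treating the search text as a literal (via Regex.Escape, as in the question) is carried over as an assumption.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; there is no Regex.TryReplace in the framework.
public static class RegexHelpers
{
    public static bool TryReplace(ref string input, string literal, string replacement)
    {
        int count = 0;

        // The evaluator runs once per match, so counting here costs no
        // extra pass over the input.
        string result = Regex.Replace(
            input,
            Regex.Escape(literal),
            match => { count++; return replacement; },
            RegexOptions.IgnoreCase);

        if (count == 0)
        {
            return false; // nothing matched; input is left untouched
        }

        input = result;
        return true;
    }
}
```

Usage then fits on one line: if (RegexHelpers.TryReplace(ref input, "pattern", "replace value")) { /* do something */ }. Unlike the ReferenceEquals check, this does not depend on Regex.Replace returning the same string instance when nothing matches.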
I'm trying to write a VBA parser; in order to create a ConstantNode, I need to be able to match all possible variations of a Const declaration.
These work beautifully:
Const foo = 123
Const foo$ = "123"
Const foo As String = "123"
Private Const foo = 123
Public Const foo As Integer = 123
Global Const foo% = 123
But I have 2 problems:
If there's a comment at the end of the declaration, I'm picking it up as part of the value:
Const foo = 123 'this comment is included as part of the value
If there are two or more constants declared in the same instruction, I'm failing to match the entire instruction:
Const foo = 123, bar = 456
Here is the regular expression I'm using:
/// <summary>
/// Gets a regular expression pattern for matching a constant declaration.
/// </summary>
/// <remarks>
/// Constants declared in class modules may only be <c>Private</c>.
/// Constants declared at procedure scope cannot have an access modifier.
/// </remarks>
public static string GetConstantDeclarationSyntax()
{
return @"^((Private|Public|Global)\s)?Const\s(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)(?<specifier>[%&@!#$])?(?<as>\sAs\s(?<reference>(((?<library>[a-zA-Z][a-zA-Z0-9_]*))\.)?(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)))?\s\=\s(?<value>.*)$";
}
Obviously both issues are caused by the (?<value>.*)$ part, which matches anything up until the end of the line. I got VariableNode to support multiple declarations in one instruction by enclosing the whole pattern in a capture group and adding an optional comma, but because constants have this value group, doing that resulted in the first constant having all following declarations captured as part of its value... which brings me back to problem #1.
I wonder if it's at all possible to solve problem #1 with a regular expression, given that the value may be a string that contains an apostrophe, and possibly some escaped (doubled-up) double quotes.
I think I can solve it in the ConstantNode class itself, in the getter for Value:
/// <summary>
/// Gets the constant's value. Strings include delimiting quotes.
/// </summary>
public string Value
{
get
{
return RegexMatch.Groups["value"].Value;
}
}
I mean, I could implement some additional logic in here, to do what I can't do with a regex.
If problem #1 can be solved with a regex, then I believe problem #2 can be as well... or am I on the right track here? Should I ditch the [pretty complex] regex patterns and think of another way? I'm not too familiar with greedy subexpressions, backreferences and other more advanced regex features - is this what's limiting me, or it's just that I'm using the wrong hammer for this nail?
Note: it doesn't matter that the patterns potentially match illegal syntax - this code will only run against compilable VBA code.
Let me go ahead and add the disclaimer on this one. This is absolutely not a good idea (but it was a fun challenge). The regex(s) I'm about to present will parse the test cases in the question, but they obviously are not bullet proof. Using a parser will save you a lot of headache later. I did try to find a parser for VBA, but came up empty handed (and I'm assuming everyone else has too).
Regex
For this to work nicely, you need to have some control over the VBA code coming in. If you can't do this, then you truly need to be looking at writing a parser instead of using Regexes. However, judging from what you already said, you may have a little bit of control. So maybe this will help out.
So for this, I had to split the regex into two distinct regexes. The reason for this is the .Net Regex library cannot handle capturing groups within a repeating group.
The first regex captures the line and places each variable (with its value) into a single group; the second regex then parses those captures. Just FYI, the regexes make use of negative lookaheads.
^(?:(?<Accessibility>Private|Public|Global)\s)?Const\s(?<variable>[a-zA-Z][a-zA-Z0-9_]*(?:[%&@!#$])?(?:\sAs)?\s(?:(?:[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s[^',]+(?:(?:(?!").)+")?(?:,\s)?){1,}(?:'(?<comment>.+))?$
Here's the regex to parse the variables
(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)(?<specifier>[%&@!#$])?(?:\sAs)?\s(?:(?<reference>[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s(?<value>[^',]+(?:(?:(?!").)+")?),?
And here's some C# code you can toss in to test everything out. This should make it easy to test any edge cases you have.
static void Main(string[] args)
{
List<String> test = new List<string> {
"Const foo = 123",
"Const foo$ = \"123\"",
"Const foo As String = \"1'2'3\"",
"Const foo As String = \"123\"",
"Private Const foo = 123",
"Public Const foo As Integer = 123",
"Global Const foo% = 123",
"Const foo = 123 'this comment is included as part of the value",
"Const foo = 123, bar = 456",
"'Const foo As String = \"123\"",
};
foreach (var str in test)
Parse(str);
Console.Read();
}
private static Regex parse = new Regex(@"^(?:(?<Accessibility>Private|Public|Global)\s)?Const\s(?<variable>[a-zA-Z][a-zA-Z0-9_]*(?:[%&@!#$])?(?:\sAs)?\s(?:(?:[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s[^',]+(?:(?:(?!"").)+"")?(?:,\s)?){1,}(?:'(?<comment>.+))?$", RegexOptions.Compiled | RegexOptions.Singleline, new TimeSpan(0, 0, 20));
private static Regex variableRegex = new Regex(@"(?<identifier>[a-zA-Z][a-zA-Z0-9_]*)(?<specifier>[%&@!#$])?(?:\sAs)?\s(?:(?<reference>[a-zA-Z][a-zA-Z0-9_]*)\s)?=\s(?<value>[^',]+(?:(?:(?!"").)+"")?),?", RegexOptions.Compiled | RegexOptions.Singleline, new TimeSpan(0, 0, 20));
public static void Parse(String str)
{
Console.WriteLine(String.Format("Parsing: {0}", str));
var match = parse.Match(str);
if (match.Success)
{
//Private/Public/Global
var accessibility = match.Groups["Accessibility"].Value;
//Since we defined this with atleast one capture, there should always be something here.
foreach (Capture variable in match.Groups["variable"].Captures)
{
//Console.WriteLine(variable);
var variableMatch = variableRegex.Match(variable.Value);
if (variableMatch.Success)
{
Console.WriteLine(String.Format("Identifier: {0}", variableMatch.Groups["identifier"].Value));
if (variableMatch.Groups["specifier"].Success)
Console.WriteLine(String.Format("specifier: {0}", variableMatch.Groups["specifier"].Value));
if (variableMatch.Groups["reference"].Success)
Console.WriteLine(String.Format("reference: {0}", variableMatch.Groups["reference"].Value));
Console.WriteLine(String.Format("value: {0}", variableMatch.Groups["value"].Value));
Console.WriteLine("");
}
else
{
Console.WriteLine(String.Format("FAILED VARIABLE: {0}", variable.Value));
}
}
if (match.Groups["comment"].Success)
{
Console.WriteLine(String.Format("Comment: {0}", match.Groups["comment"].Value));
}
}
else
{
Console.WriteLine(String.Format("FAILED: {0}", str));
}
Console.WriteLine("+++++++++++++++++++++++++++++++++++++++++++++");
Console.WriteLine("");
}
The c# code was just what I was using to test my theory, so I apologize for the craziness in it.
For completeness here's a small sample of the output. If you run the code you'll get more output, but this directly shows that it can handle the situations you were asking about.
Parsing: Const foo = 123 'this comment is included as part of the value
Identifier: foo
value: 123
Comment: this comment is included as part of the value
Parsing: Const foo = 123, bar = 456
Identifier: foo
value: 123
Identifier: bar
value: 456
What it handles
Here are the major cases I can think of that you're probably interested in. It should still handle everything you had before as I just added to the regex you provided.
Comments
Multiple variable declarations on a single line
The apostrophe (comment character) within a string value. Ie foo = "She's awesome"
If the line starts with a comment, the line should be ignored
What it doesn't handle
The one thing I didn't really handle was spacing, but it shouldn't be hard to add that in yourself if you need it. For instance, if you declare multiple variables there MUST be a space after the comma, i.e. (VALID: foo = 123, foobar = 124) (INVALID: foo = 123,foobar = 124)
You won't get much leniency on the format from it, but there's not a whole lot you can do with that when using regexes.
Hope this helps you out, and if you need any more explanation on how any of this works just let me know. Just know this is a bad idea. You'll run into situations that the regex can't handle. If I was in your position, I'd be considering writing a simple parser which would give you greater flexibility in the long run. Good luck.
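As a rough idea of what the "simple parser" route could look like, the sketch below scans a line character by character and tracks whether it is inside a string literal. VbaLineScanner and its methods are hypothetical names. It only covers apostrophe comments and comma-separated declarations, and it assumes no Rem comments and no line continuations.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scanner, for illustration only; not a full VBA parser.
public static class VbaLineScanner
{
    // Splits a line into code and trailing comment. An apostrophe inside
    // a string literal does not start a comment.
    public static (string Code, string Comment) SplitComment(string line)
    {
        bool inString = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                // A doubled quote ("") toggles twice, so the state after
                // an escaped quote is still "inside the string".
                inString = !inString;
            }
            else if (c == '\'' && !inString)
            {
                return (line.Substring(0, i).TrimEnd(), line.Substring(i + 1));
            }
        }
        return (line, null); // no comment found
    }

    // Splits "foo = 123, bar = 456" on commas that are outside strings.
    public static List<string> SplitDeclarations(string code)
    {
        var parts = new List<string>();
        bool inString = false;
        int start = 0;
        for (int i = 0; i < code.Length; i++)
        {
            if (code[i] == '"')
            {
                inString = !inString;
            }
            else if (code[i] == ',' && !inString)
            {
                parts.Add(code.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        parts.Add(code.Substring(start).Trim());
        return parts;
    }
}
```

Each piece returned by SplitDeclarations could then be matched with a much simpler per-declaration regex, since comments and commas have already been dealt with.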
Suppose I want to ask a user what format they want a certain output to be in and the output will include fill-in fields. So they provide something like this string:
"Output text including some field {FieldName1Value} and another {FieldName2Value} and so on..."
Anything bound by {} should be a column name in a table somewhere; the code I am writing will replace it with the stored value. Seems simple: I could just do a string.Replace on any instance that matches the pattern "{" + FieldName + "}". But what if I also want to give the user the option of using an escape so they can use brackets like any other character? I was thinking they provide "{{" or "}}" to escape that bracket - nice and easy for them. So, they could provide something like:
"Output text including some field {FieldName1Value} and another {FieldName2Value} but not this {{FieldName2Value}}"
But now "{{FieldName2Value}}" is to be treated like any other string and ignored by the Replace. Also, if they decided to put something like "{{{FieldName2Value}}}" with triple brackets, that would be interpreted by the code as the field value wrapped in brackets, and so on.
This is where I get stuck. I am trying with RegEx and came up with this:
public object Convert(object[] values, Type targetType, object parameter, CultureInfo culture)
{
string format = (string)values[0];
ObservableCollection<CalloutFieldAliasMap> oc = (ObservableCollection<CalloutFieldAliasMap>)values[1];
foreach (CalloutFieldMap map in oc)
format = Regex.Replace(format, @"(?<!{){" + map.FieldName + "(?<!})}", " " + map.FieldAlias + " ", RegexOptions.IgnoreCase);
return format;
}
This works in the situation with double brackets {{ }} but NOT if there are three, i.e. {{{ }}}. The triple brackets are treated like a literal string when they should be treated as {FieldValue}.
Thanks for any help.
By expanding on your regular expression, the presence of literals can be accommodated.
format = Regex.Replace(format,
@"(?<!([^{]|^){(?:{{)*){" + Regex.Escape(map.FieldName) + "}",
String.Format(" {0} ", map.FieldAlias),
RegexOptions.IgnoreCase | RegexOptions.Compiled);
The first part of the expression, (?<!([^{]|^){(?:{{)*){, designates that the { must be preceded by an even number of { characters for it to mark the beginning of a field token. Thus, {FieldName} and {{{FieldName} will denote the start of a field name, whereas {{FieldName} and {{{{FieldName} would not.
The closing } simply requires that the end of the field be a simple }. There is some ambiguity in the syntax in that {FieldName1Value}}} could be parsed as a token with FieldName1Value (followed by the literal }) or FieldName1Value}. The regex assumes the former. (If the latter is intended, you could replace this with }(?!}(}})*) instead.)
A couple of other notes. I added Regex.Escape(map.FieldName) so that all characters in the field name are treated as literals; and added the RegexOptions.Compiled flag. (Since this is both a complex expression and executed in a loop, it is a good candidate for compilation.)
After the loop executes, a simple:
format = format.Replace("{{", "{").Replace("}}", "}");
can be used to unescape the literal {{ and }} characters.
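A different technique, shown here only as a sketch, is to handle escapes and fields in one left-to-right pass with a single alternation, instead of a lookbehind per field plus a separate unescape step. BraceTemplate and Expand are hypothetical names. Restricting field names to \w+ and leaving unknown fields untouched are both assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class BraceTemplate
{
    // Alternatives are tried in order at each position:
    // escaped "{{", escaped "}}", then a "{Field}" token.
    private static readonly Regex Token = new Regex(@"\{\{|\}\}|\{(\w+)\}");

    public static string Expand(string format, IDictionary<string, string> values)
    {
        return Token.Replace(format, match =>
        {
            if (match.Value == "{{") return "{";
            if (match.Value == "}}") return "}";

            // Field token: substitute if known, otherwise leave as-is (assumption).
            return values.TryGetValue(match.Groups[1].Value, out string value)
                ? value
                : match.Value;
        });
    }
}
```

Under these rules {Field} becomes the value, {{Field}} becomes the literal {Field}, and {{{Field}}} becomes the value wrapped in braces. For case-insensitive field names, a dictionary built with StringComparer.OrdinalIgnoreCase can be passed in.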
The simplest way would be to use String.Replace to replace the double brackets with a character sequence that the user can not (or almost certainly will not) enter. Then do the replacement of your fields, and finally convert replacement back to the double brackets.
For example, given:
string replaceOpen = "{x"; // 'x' should be something like \u00ff, for example
string replaceClose = "x}";
string template = "Replace {ThisField} but not {{ThatField}}";
string temp = template.Replace("{{", replaceOpen).Replace("}}", replaceClose);
string converted = temp.Replace("{ThisField}", "Foo");
string final = converted.Replace(replaceOpen, "{{").Replace(replaceClose, "}}");
It's not particularly pretty, but it's effective.
How you go about it is going to depend in large part on how often you call this, and how fast you really need it to be.
I have an extension method I wrote that almost does what you ask, but, while it does escape using double braces, it doesn't do the triple braces like you suggested. Here is the method (also on GitHub at https://github.com/benallred/Icing/blob/master/Icing/Icing.Core/StringExtensions.cs):
private const string FormatTokenGroupName = "token";
private static readonly Regex FormatRegex = new Regex(@"(?<!\{)\{(?<" + FormatTokenGroupName + @">\w+)\}(?!\})", RegexOptions.Compiled);
public static string Format(this string source, IDictionary<string, string> replacements)
{
if (string.IsNullOrWhiteSpace(source) || replacements == null)
{
return source;
}
string replaced = replacements.Aggregate(source,
(current, pair) =>
FormatRegex.Replace(current,
new MatchEvaluator(match =>
(match.Groups[FormatTokenGroupName].Value == pair.Key
? pair.Value : match.Value))));
return replaced.Replace("{{", "{").Replace("}}", "}");
}
Usage:
"This is my {FieldName}".Format(new Dictionary<string, string>() { { "FieldName", "value" } });
Even easier if you add this:
public static string Format(this string source, object replacements)
{
if (string.IsNullOrWhiteSpace(source) || replacements == null)
{
return source;
}
IDictionary<string, string> replacementsDictionary = new Dictionary<string, string>();
foreach (PropertyDescriptor propertyDescriptor in TypeDescriptor.GetProperties(replacements))
{
string token = propertyDescriptor.Name;
object value = propertyDescriptor.GetValue(replacements);
replacementsDictionary.Add(token, (value != null ? value.ToString() : String.Empty));
}
return Format(source, replacementsDictionary);
}
Usage:
"This is my {FieldName}".Format(new { FieldName = "value" });
Unit tests for this method are at https://github.com/benallred/Icing/blob/master/Icing/Icing.Tests/Core/TestOf_StringExtensions.cs
If this doesn't work, what would your ideal solution do for more than three braces? In other words, if {{{FieldName}}} becomes {value}, what does {{{{FieldName}}}} become? What about {{{{{FieldName}}}}} and so on? While those cases are unlikely, they still need to be handled purposefully.
RegEx will not do what you want because it only knows its current state and what transitions are available. It has no concept of memory. The language you're trying to parse is not regular, so you will never be able to write a RegEx to handle the general case. You would need i expressions, where i is the number of matching braces.
There is a lot of theory behind this and I'll provide some links at the bottom if you're curious. But basically the language you're trying to parse is context-free, and to implement a general solution you'll need to model a pushdown automaton, which uses a stack to ensure that an opening brace has a matching closing brace (yes, this is why most languages have matching braces).
Each time you encounter { you put it on the stack. If you encounter } you pop from the stack. When you empty the stack you will know that you've reached the end of a field. Of course that's a major simplification of the problem, but if you're looking for a general solution it should get you moving in the right direction.
http://en.wikipedia.org/wiki/Regular_language
http://en.wikipedia.org/wiki/Context-free_language
http://en.wikipedia.org/wiki/Pushdown_automaton
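The push/pop idea described above can be sketched in a few lines. BraceMatcher and OutermostSpans are hypothetical names. This version only finds balanced outermost spans and silently ignores unmatched closing braces, which is an assumption rather than part of the answer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class BraceMatcher
{
    // Returns every outermost balanced {...} span in the text.
    public static List<string> OutermostSpans(string text)
    {
        var spans = new List<string>();
        var open = new Stack<int>(); // positions of unmatched '{'

        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '{')
            {
                open.Push(i);
            }
            else if (text[i] == '}' && open.Count > 0)
            {
                int start = open.Pop();
                if (open.Count == 0)
                {
                    // The stack is empty again: an outermost span just closed.
                    spans.Add(text.Substring(start, i - start + 1));
                }
            }
        }
        return spans;
    }
}
```

What to do with each span, such as counting the braces to decide between a field and a literal, is then a separate step on top of this.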
I have the code below.
The line string content = twitterMsg.text; is creating the error 'Use of unassigned local variable' for twitterMsg. I don't seem able to access my TwitterSearchResponse.results.text fields in my DataContractJsonSerializer<TwitterMain> collection.
TwitterSearchResponse.results is an array (set of object properties) with several string fields attached with names like text and user_info.
Can anyone help with this??
Updated code below. I am still highly confused about why I am not able to iterate over my TwitterSearchResponse.results properly and assign content = twitterMsg.text
For what it's worth, here is my DataContractJsonSerializer method:
String url = String.Format("http://search.twitter.com/search.json?q={0}&rpp=20", Server.UrlEncode(txtSearchFor.Text));
// parse the JSON data
using (MemoryStream ms = new MemoryStream(wc.DownloadData(url)))
{
DataContractJsonSerializer jsonSerializer =
new DataContractJsonSerializer(typeof(TwitterMain));
TwitterSearchResponse = jsonSerializer.ReadObject(ms) as TwitterMain; // read as JSON and map as TwitterOut
}
And here is the original posted code where the issue lies.
public List<MatchCollection> returnMatches(DataContractJsonSerializer<TwitterMain> TwitterSearchResponse)
{
List<MatchCollection> messageLinks = new List<MatchCollection>();
foreach (TwitterResult twitterMsg in TwitterSearchResponse.results)
{
string content = twitterMsg.text;
// capture internet protocol pre-fixed words from message
string pattern = @"...";
messageLinks.Add(Regex.Matches(content, pattern, RegexOptions.IgnoreCase));
// capture @username twitter users from message
string atUsernamePattern = @"@([a-zA-Z0-9-_]+)";
MatchCollection PeopleMatches = Regex.Matches(content, atUsernamePattern, RegexOptions.IgnoreCase);
}
return messageLinks;
}
I suspect it's actually reporting the use of the unassigned local variable MessageLinks. Your use of twitterMsg looks fine.
So, the big question is: what do you want to return if there aren't any results? If you're happy returning null, just assign the value when you declare MessageLinks.
Next question: do you really only want to return the last MatchCollection you find? That's what the current behaviour is: you're looping over all the variables, setting the same local variable each time (i.e. replacing the previous value) and then returning that last value.
Final question: any reason why you've got a camel-cased method name (returnMatches), a Pascal-cased local variable (MessageLinks), a Pascal-cased parameter name (TwitterSearchResponse) and a camel-cased property (text)? I would assume that text is due to it coming from JSON that way - but it's a good idea to follow normal .NET naming conventions otherwise.
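To illustrate the first two points, here is a minimal sketch with hypothetical names (MentionExtractor, ExtractMentions). It works on plain strings rather than the Twitter types from the question. The result list is assigned where it is declared, so it is never unassigned, and every match is accumulated instead of overwriting the previous one.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class MentionExtractor
{
    public static List<string> ExtractMentions(IEnumerable<string> messages)
    {
        // Assigned at declaration: an empty list is returned when there
        // are no messages or no matches.
        var mentions = new List<string>();

        foreach (string message in messages)
        {
            foreach (Match match in Regex.Matches(message, @"@([a-zA-Z0-9_-]+)"))
            {
                mentions.Add(match.Groups[1].Value); // accumulate, don't overwrite
            }
        }
        return mentions;
    }
}
```

Returning an empty list rather than null also spares callers a null check.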