Is there any function to query the expected inputs and formats from a format string - i.e. one intended as the first argument to the String.Format function?
e.g. given:
"On {0:yyyyy-MM-dd} do {1} and earn {2:C2}"
I'd like to get back something like:
{"yyyyy-MM-dd", null, "C2"}
I guess a regex is one possibility but is there anything precanned that hooks into the same logic as String.Format?
String.Format itself doesn't parse the format string. It ends up calling the internal StringBuilder.AppendFormatHelper method which treats the format strings only as delimited strings. It doesn't try to parse them. The format is passed directly to each argument type's formatter method. String formatting performance is critical, both for the runtime and applications.
You can use a regular expression to parse the format string. You'd need to take care of escaped braces ({{, }}) and alignment strings.
The regex {(?<index>\d+)(,(?<algn>-?\d+?))?(:(?<fmt>.*?))?} extracts the index, alignment and format segments as named groups. It doesn't handle escaped braces explicitly: it skips a bare {{ or }}, but not an escaped placeholder such as {{2,20:N}}:
var regex=new System.Text.RegularExpressions.Regex(@"{(?<index>\d+)(,(?<algn>-?\d+?))?(:(?<fmt>.*?))?}");
var matches=regex.Matches("asdf{0:d2} {1:yyyy-MM-dd} {2,-20:N2}");
foreach(Match match in matches)
{
Console.WriteLine("{0,-5} {1,-15} {2,-15}",
match.Groups["index"].Value,
match.Groups["algn"].Value,
match.Groups["fmt"].Value);
}
This prints:
0                     d2
1                     yyyy-MM-dd
2     -20             N2
The (?<name>...) syntax captures a pattern and exposes it as a named group. (?<index>\d+) captures a sequence of digits and exposes it as the group index.
The ? in .*? specifies a non-greedy match. By default a regex quantifier is greedy: it captures as many characters as can match the pattern. With .*? the regex captures as few characters as possible before the next part of the pattern matches. That's why the fmt group stops at the first }.
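The escaped-brace caveat above can be worked around in a simple way. The following is only a sketch, not how String.Format works internally: it assumes that dropping every {{ and }} pair before matching is good enough for the format strings you have. The class name FormatStringInspector and the method GetFormats are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET API.
public static class FormatStringInspector
{
    // Matches {index[,alignment][:format]} once escaped braces are gone.
    private static readonly Regex Placeholder = new Regex(
        @"\{(?<index>\d+)(,(?<algn>-?\d+))?(:(?<fmt>[^{}]*))?\}");

    // Returns the format segment of each placeholder, or null when it has none.
    public static List<string> GetFormats(string format)
    {
        // Assumption: removing "{{" and "}}" up front is an acceptable simplification.
        string stripped = format.Replace("{{", "").Replace("}}", "");

        var formats = new List<string>();
        foreach (Match m in Placeholder.Matches(stripped))
        {
            Group fmt = m.Groups["fmt"];
            formats.Add(fmt.Success ? fmt.Value : null);
        }
        return formats;
    }
}
```

Calling FormatStringInspector.GetFormats("On {0:yyyyy-MM-dd} do {1} and earn {2:C2}") yields the list "yyyyy-MM-dd", null, "C2". Custom format segments that themselves contain braces are not handled by this sketch.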
There is probably no built-in way to do this. Use a regex; it's easy:
var args = new List<string>();
var str = "On {0:yyyyy-MM-dd} do {1} and earn {2:C2}";
MatchCollection matches = Regex.Matches(str, @"\{\d+[^\{\}]*\}");
foreach (Match match in matches)
{
string obj = null;
var split = match.ToString().Split(':');
if (split.Length == 2) obj = split.Last().Trim(' ', '}', '{');
args.Add(obj);
}
// Result: args = {"yyyyy-MM-dd", null, "C2"}
We have a requirement to extract and manipulate strings in C# .NET. The requirement is: we have a string
($name$:('George') AND $phonenumer$:('456456') AND
$emailaddress$:("test@test.com"))
We need to extract the strings between the $ characters.
Therefore, in the end, we need to get a list of strings containing name, phonenumber, emailaddress.
What would be the ideal way to do it? Are there any out-of-the-box features available for this?
Regards,
John
The simplest way is to use a regular expression to match all word characters between $ characters:
var regex=new Regex(@"\$\w+\$");
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test@test.com\"))";
var matches=regex.Matches(input);
This will return a collection of matches. The .Value property of each match contains the matching string. \$ is used because $ has special meaning in regular expressions: it matches the end of a string. \w means a word character (a letter, digit or underscore). + means one or more.
Since this is a collection, you can use LINQ on it to get eg an array with the values:
var values=matches.OfType<Match>().Select(m=>m.Value).ToArray();
That array will contain the values $name$,$phonenumer$,$emailaddress$.
Capture by name
You can specify groups in the pattern and attach names to them. For example, you can group the field name values:
var regex=new Regex(@"\$(?<name>\w+)\$");
var names=regex.Matches(input)
.OfType<Match>()
.Select(m=>m.Groups["name"].Value);
This will return name, phonenumer, emailaddress. Parentheses are used for grouping. (?<somename>pattern) is used to attach a name to the group.
Extract both names and values
You can also capture the field values and extract them as a separate field. Once you have the field name and value, you can return them, eg as an object or anonymous type.
The pattern in this case is more complex:
@"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)"
Parentheses are escaped because we want to match the literal parentheses around the values. Both ' and " characters are used in values, so ['"] is used to specify a choice of characters. The pattern is a verbatim string (i.e. it starts with @), so the double quotes have to be escaped by doubling them: ['""]. Any character has to be matched (.+), but only up to the next character in the pattern (.+?). Without the ? the pattern .+ would match everything up to the last closing quote and parenthesis in the string.
Putting this together:
var regex = new Regex(@"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)");
var myValues = regex.Matches(input)
.OfType<Match>()
.Select(m=>new { Name=m.Groups["name"].Value,
Value=m.Groups["value"].Value
})
.ToArray();
Turn them into a dictionary
Instead of ToArray() you could convert the objects to a dictionary with ToDictionary(), eg with .ToDictionary(it=>it.Name,it=>it.Value). You could omit the select step and generate the dictionary from the matches themselves :
var myDict = regex.Matches(input)
.OfType<Match>()
.ToDictionary(m=>m.Groups["name"].Value,
m=>m.Groups["value"].Value);
Regular expressions are generally fast because they don't split the string. The pattern is converted to efficient code that parses the input and skips non-matching input immediately. Each match and group contains only the index of its starting and ending character in the input string. A string is only generated when .Value is called.
Regular expressions are thread-safe, which means a single Regex object can be stored in a static field and reused from multiple threads. That helps in web applications, as there's no need to create a new Regex object for each request.
Because of these two advantages, regular expressions are used extensively to parse log files and extract specific fields. Compared to splitting, performance can be 10 times better or more, while memory usage remains low. Splitting can easily result in memory usage that's multiple times bigger than the original input file.
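As an illustration of the reuse described above, here is a minimal sketch that keeps one Regex in a static field and shares it between callers. The class name FieldParser and the method Parse are hypothetical, and RegexOptions.Compiled is an optional choice that trades startup time for faster matching.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper showing a shared, reusable Regex instance.
public static class FieldParser
{
    // Created once; Regex matching methods are safe to call from multiple threads.
    private static readonly Regex FieldPattern = new Regex(
        @"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)",
        RegexOptions.Compiled);

    // Returns field name -> field value for every $name$:('value') pair found.
    public static Dictionary<string, string> Parse(string input)
    {
        return FieldPattern.Matches(input)
            .Cast<Match>()
            .ToDictionary(m => m.Groups["name"].Value,
                          m => m.Groups["value"].Value);
    }
}
```

Note that ToDictionary throws if the same field name appears twice, so this sketch assumes field names are unique within one input string.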
Can it go faster?
Yes. Regular expressions produce parsing code that may not be as efficient as possible. A hand-written parser could be faster. In this particular case, we want to start capturing text when a $ is detected and stop at the next $. This can be done with the following method:
IEnumerable<string> GetNames(string input)
{
    var builder=new StringBuilder(20);
    bool started=false;
    foreach(var c in input)
    {
        if (started)
        {
            if (c!='$')
            {
                builder.Append(c);
            }
            else
            {
                started=false;
                var value=builder.ToString();
                yield return value;
                builder.Clear();
            }
        }
        else if (c=='$')
        {
            started=true;
        }
    }
}
A string is an IEnumerable<char>, so we can inspect one character at a time without having to copy them. By using a single StringBuilder with a predetermined capacity we avoid reallocations, at least until we find a key that's longer than 20 characters.
Modifying this code to extract values though isn't so easy.
Here's one way to do it, but certainly not very elegant. Basically splitting the string on the '$' and taking every other item will give you the result (after some additional trimming of unwanted characters).
In this example, I'm also grabbing the value of each item and then putting both in a dictionary:
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test@test.com\"))";
var inputParts = input.Replace(" AND ", "")
.Trim(')', '(')
.Split(new[] {'$'}, StringSplitOptions.RemoveEmptyEntries);
var keyValuePairs = new Dictionary<string, string>();
for (int i = 0; i < inputParts.Length - 1; i += 2)
{
var key = inputParts[i];
var value = inputParts[i + 1].Trim('(', ':', ')', '"', '\'', ' ');
keyValuePairs[key] = value;
}
foreach (var kvp in keyValuePairs)
{
Console.WriteLine($"{kvp.Key} = {kvp.Value}");
}
// Wait for input before closing
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
I have the following string
{token1;token2;token3@somewhere.com;...;tokenn}
I need a Regex pattern, that would give a result in array of strings such as
token1
token2
token3@somewhere.com
...
...
...
tokenn
I would also appreciate a suggestion on whether the same pattern can be used to confirm the format of the string: the string should start and end with curly braces, and at least 2 values should exist within the anchors.
You may use an anchored regex with named repeated capturing groups:
\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z
\A - start of string
{ - a {
(?<val>[^;]*) - Group "val" capturing 0+ (due to * quantifier, if the value cannot be empty, use +) chars other than ;
(?:;(?<val>[^;]*))+ - 1 or more occurrences (thus, requiring at least 2 values inside {...}) of the sequence:
; - a semi-colon
(?<val>[^;]*) - Group "val" capturing 0+ chars other than ;
} - a literal }
\z - end of string.
.NET regex keeps each capture in a CaptureCollection stack, which is why all the values captured into the "val" group can be accessed after a match is found.
C# demo:
var s = "{token1;token2;token3;...;tokenn}";
var pat = @"\A{(?<val>[^;]*)(?:;(?<val>[^;]*))+}\z";
var caps = new List<string>();
var result = Regex.Match(s, pat);
if (result.Success)
{
caps = result.Groups["val"].Captures.Cast<Capture>().Select(t=>t.Value).ToList();
}
Read this (it is similar to your problem): How to keep the delimiters of Regex.Split?
For your RegEx testing use this: http://www.regexlib.com/RETester.aspx?AspxAutoDetectCookieSupport=1.
But a regex is a relatively resource-intensive, slow operation for a task this simple.
In your case it will be better to use the Split method of the string class, for example: "token1;token2;token3;...;tokenn".Split(';');. It returns an array of the strings you want to obtain.
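The Split suggestion can be combined with the format check the question asks for. The following is a rough sketch under the assumption that values never contain ; or braces; the names TokenList and TryParse are hypothetical.

```csharp
using System;

// Hypothetical helper: validates "{a;b;...}" and returns the values without a regex.
public static class TokenList
{
    public static bool TryParse(string input, out string[] tokens)
    {
        tokens = Array.Empty<string>();

        // The string must start with '{' and end with '}'.
        if (input == null || input.Length < 2 ||
            input[0] != '{' || input[input.Length - 1] != '}')
        {
            return false;
        }

        // Strip the braces, then split on the separator.
        string[] parts = input.Substring(1, input.Length - 2).Split(';');

        // At least two values are required.
        if (parts.Length < 2)
        {
            return false;
        }

        tokens = parts;
        return true;
    }
}
```

For "{token1;token2;token3@somewhere.com}" this returns true with three tokens; for "{only}" or "a;b" it returns false. Like the [^;]* version of the regex, it accepts empty values.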
I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.
Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = @"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.
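To see the difference between the greedy and the non-greedy quantifier side by side, a small sketch like the following can help. GreedyVsLazyDemo and FindTags are hypothetical names used only for this illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo helper: returns the text of every match of a pattern.
public static class GreedyVsLazyDemo
{
    public static string[] FindTags(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        var values = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            values[i] = matches[i].Value;
        }
        return values;
    }
}
```

With the input level=<device[195].level>&name=<device[195].name>, the greedy pattern <device\[\d+\]\.\S*> gives a single match spanning both tags, because \S* runs to the last > it can reach. The non-greedy pattern <device\[\d+\]\.\S+?> gives two matches, one per tag.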
It depends on how much of the structure of the angle blocks you need to match, but you can do:
"\\<device.+?\\>"
I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
String literals for use in programs:
@"(<device[^>]*>)"
Change your repetition operator and use \w instead of \S
var pattern = @"<device\[[0-9]+\]\.\w+>";
String s = @"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, @"<device\[[0-9]+\]\.\w+>"))
Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>
Use named match groups and create a LINQ projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = @"
    (?<variable>[^=]+)   # get the variable name
    (?:=<device\[)       # static '=<device['
    (?<index>[^\]]+)     # device number index
    (?:]\.)              # static ].
    (?<sub>[^>]+)        # get the sub command
    (?:>&?)              # match but don't capture the > and a possible &
";
// IgnorePatternWhitespace only allows the pattern to be documented; it does not affect processing.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/
I have a problem with replacing characters after a specific character. For example, I want to replace the first 'aa' with '33' using this code.
string str = "dc1aaaafg";
string pattern = @"a{2}(?!(1))";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(str, "33");
But the result is 'dc13333fg'. It also replaced the second group after '1'. I need to replace only the first group, giving 'dc133aafg'. How can I achieve this? I have a large string and there can be many replacements; this is just an example.
Regex.Replace() is global. It will replace as many times as the pattern matches*.
You could use Regex.Replace(String, String, Int32) to limit the number of operations.
string result = rgx.Replace(str, "33", 1);
Or you change the pattern to a look-behind.
Regex rgx = new Regex(@"(?<=1)a{2}");
string result = rgx.Replace(str, "33");
* Note that Replace() is global, but not incremental. Using the expression a{2} on "aaaaaa" with the replacement "ba" will result in "bababa", not in "bbbbba".
There is an overload to the Replace method in which you can specify the number of times. Specify 1 and it shall do only the first match.
string result = rgx.Replace(str, "33", 1);
A regex pattern cannot express that only the first match is relevant.
Use Regex.Match to get the position and length of the first match. Then use Substring (or Remove followed by Insert) to construct a new string from the old string, that has the replacement you want.
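The Match plus Remove/Insert approach described above could look roughly like this. ReplaceFirst is a hypothetical helper name, and the replacement is inserted as literal text, so substitutions such as $1 are not expanded in this sketch.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: replaces only the first match of a pattern.
public static class RegexFirstMatch
{
    public static string ReplaceFirst(string input, string pattern, string replacement)
    {
        Match m = Regex.Match(input, pattern);
        if (!m.Success)
        {
            return input; // nothing to replace
        }

        // Cut out the matched span and put the replacement in its place.
        return input.Remove(m.Index, m.Length).Insert(m.Index, replacement);
    }
}
```

RegexFirstMatch.ReplaceFirst("dc1aaaafg", "a{2}", "33") returns "dc133aafg".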
Try with a negative look behind : (a{2})(?<!\1{2})
(a{2}) # 'a' two times
(?<! # negative look behind
\1{2} # '\1' is the captured group 'a' twice to "jump" over the captured group
)
I'm trying to figure out a pattern where I run a regex match on a long string, and each time it finds a match, it runs a replace on it. The thing is, the replace will vary depending on the matched value. This new value will be determined by a method. For example:
var matches = Regex.Match(myString, myPattern);
while(matches.Success){
Regex.Replace(myString, matches.Value, GetNewValue(matches.Groups[1]));
matches = matches.NextMatch();
}
The problem (i think) is that if I run the Regex.Replace, all of the match indexes get messed up so the result ends up coming out wrong. Any suggestions?
If you replace each match with a fixed string, Regex.Replace does that for you. You don't need to iterate the matches:
Regex.Replace(myString, myPattern, "replacement");
Otherwise, if the replacement depends upon the matched value, use the MatchEvaluator delegate, as the 3rd argument to Regex.Replace. It receives an instance of Match and returns string. The return value is the replacement string. If you don't want to replace some matches, simply return match.Value:
string myString = "aa bb aa bb";
string myPattern = @"\w+";
string result = Regex.Replace(myString, myPattern,
match => match.Value == "aa" ? "0" : "1" );
Console.WriteLine(result);
// 0 1 0 1
If you really need to iterate the matches and replace them manually, you need to start replacement from the last match towards the first, so that the index of the string is not ruined for the upcoming matches. Here's an example:
var matches = Regex.Matches(myString, myPattern);
var matchesFromEndToStart = matches.Cast<Match>().OrderByDescending(m => m.Index);
var sb = new StringBuilder(myString);
foreach (var match in matchesFromEndToStart)
{
if (IsGood(match))
{
sb.Remove(match.Index, match.Length)
.Insert(match.Index, GetReplacementFor(match));
}
}
Console.WriteLine(sb.ToString());
Just be careful that your matches do not contain nested instances. If they do, you either need to remove matches which are inside another match, or rerun the regex pattern to generate new matches after each replacement. I still recommend the second approach, which uses the delegate.
If I understand your question correctly, you want to perform a replace based on a constant Regular Expression, but the replacement text you use will change based on the actual text that the regex matches on.
The Matches method of the Regex class (not the Match method) returns a collection of all the matches of your regex within the input string. Each match contains information like the position within the string, the matched value and the length of the match. If you iterate over this collection with a foreach loop you should be able to treat each match individually and perform some string manipulations where you can dynamically modify the replacement value.
I would use something like
Regex regEx = new Regex("some.*?pattern");
string input = "someBLAHpattern!";
foreach (Match match in regEx.Matches(input))
{
DoStuffWith(match.Value);
}
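One way to turn that loop into an actual replacement, without the index problem from the question, is to build a new string while walking the matches from left to right. This is only a sketch; MatchRewriter, ReplaceEach and the getNewValue callback are hypothetical names, and Regex.Replace with a MatchEvaluator does the same job with less code.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: rebuilds the string instead of editing it in place.
public static class MatchRewriter
{
    public static string ReplaceEach(string input, Regex regex, Func<Match, string> getNewValue)
    {
        var sb = new StringBuilder();
        int last = 0; // end of the previous match in the ORIGINAL string

        foreach (Match m in regex.Matches(input))
        {
            sb.Append(input, last, m.Index - last); // text between matches
            sb.Append(getNewValue(m));              // per-match replacement
            last = m.Index + m.Length;
        }

        sb.Append(input, last, input.Length - last); // trailing text
        return sb.ToString();
    }
}
```

Because all indexes refer to the original input, replacements of different lengths do not shift later matches. For example, ReplaceEach("aa bb aa bb", new Regex(@"\w+"), m => m.Value == "aa" ? "0" : "1") returns "0 1 0 1".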