Replacing the character between angle brackets - c#

Story:
I have a list box that shows all the methods of the current application. I need to colorize the method parameters' data type to blue.
Solution: FIRST, I extract the content between the parenthesis. SECOND, I want to split them by COMMA
Problem:
If the arguments entail something like IDictionary<string, string> which occurs multiple times, then the above solution faces problem!! Thus, I decided to FIRST grab all the content between the angle brackets and then Replace their comma with "#COMMA#" and after performing the task using the above solution simply replace the "#COMMA#" with ",". BUT, based on the solution found HERE, it is not possible to set any value to match.value. Here is my code:
if (methodArgumentType.Contains("<") && methodArgumentType.Contains(">"))
{
var regex = new Regex("(?<=<).*?(?=>)");
foreach (Match match in regex.Matches(methodArgumentType))
{
match.Value = match.Value.Replace(",", "#COMMA#");
}
}
Any suggestion highly appreciated.

You need to replace the matched value inside the match evaluator within Regex.Replace:
var methodArgumentType = "IDictionary<string, string>";
if (methodArgumentType.Contains("<") && methodArgumentType.Contains(">"))
{
methodArgumentType = Regex.Replace(methodArgumentType, #"<([^<>]+)>",
m => string.Format("<{0}>", m.Groups[1].Value.Replace(",", "#COMMA#")));
}
Console.WriteLine(methodArgumentType);
// => IDictionary<string#COMMA# string>
Here, m.Groups[1].Value will hold string, string and the replacement will be done on the input string itself, not the Regex.Match object.

Related

Trim().Split causes a problem in Contains()

I am getting a string and trimming it first, then splitting it and assigning it to a string[]. Then, I am using every element in the array for a string.Contains() or string.StartsWith() method. Interesting thing is that even if the string contains element, Contains() doesn't work properly. And situation is same for StartsWith(), too. Does anyone have any idea about the problem?
P.S.: I trimmed strings after splitting and problem was solved.
string inputTxt = "tasklist";
string commands = "net, netsh, tasklist";
string[] maliciousConsoleCommands = commands.Trim(' ').Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i])) {
return false;
}
}
//this code works but no idea why previous code didn't work.
string[] maliciousConsoleCommands = commands.Split(',');
for (int i = 0; i < maliciousConsoleCommands.Length; i++) {
if (inputTxt.StartsWith(maliciousConsoleCommands[i].Trim(' '))) {
return false;
}
}
I expected to work properly but it is solved by trimming after splitting.
Your delimiter is not a comma char, it's a comma followed by a white-space - so instead of splitting by ',', simply split by ", ":
string[] maliciousConsoleCommands = commands.Split(new string[] {", "});
This will return the items without the leading space so the trim will be redundant.
It seems, you should Trim each item :
// ["net", "netsh, "tasklist"]
string[] maliciousConsoleCommands = commands
.Split(',') // "net" " netsh", " tasklist" - note leading spaces
.Select(item => item.Trim()) // removing leading spaces from each item
.ToArray();
Finally, if you want to test if inputTxt is malicious:
if (commands
.Split(',')
.Select(item => item.Trim()) // You can combine Select and Any
.Any(item => inputTxt.StartsWith(item))
return false;
First code you presented won't work because you want to trim initial string, so "net, netsh, tasklist" will stay unchanged after trimming (no leading and trailing spaces), then splitting it by comma will produce entries, that have leading space. Thus, you will get unexpected results. You should be trimming after splitting the string.
Second code also won't work, because you use Trim after StartsWith, which return bool value. You can't apply Trim to bool, this code should not even compile.
Yet another way to split if the commands themselves have no spaces is to use ' ' itself as a delimiter, and discard empty entries :
var maliciousConsoleCommands = commands.Split(new[]{',',' '},StringSplitOptions.RemoveEmptyEntries)
.ToArray();
This avoids the temporary strings generated by every string manipulation command.
For your code to work though, you'd have use Contains for each command, instead of using StartWith :
var isSuspicious = maliciousCommands.Any(cmd=>input.Contains(cmd));
Or even :
var isSuspicious = maliciousCommands.Any(input.Contains);
This can get rather slow if you have multiple commands, or if the input text is large
Regular expression alternative
A far faster technique would be to use a Regular expression. This performs a lot faster than searching individual keywords :
var regex=new Regex("net|netsh|tasklist");
var isSuspicious=regex.IsMatch(inputTxt);
Regular expressions are thread-safe which means they can be created once and reused by different threads/requests.
By using Match/Matches instead of IsMatch the regex could return the actual keywords that were detected :
var detection=regex.Match(inputTxt);
if (detection.Success)
{
var detectedKeyword=detection.Value;
....
}
Converting the original comma-separated list to a regular expression can be performed with a single String.Replace(", ") or another regular expression that can handle any whitespace character :
string commands = "net , netsh, \ttasklist";
var pattern=Regex.Replace(commands,#"\s*,\s*","|").Dump();
var regex=new Regex(pattern);
Detecting whole words only
Both Contains and the original regular expression would match tasklist1 as well as tasklist. It's possible to match whole words only, if the pattern is surrounded by the word delimiter, \b :
#"\b(" + pattern + #")\b"
This will match tasklist and net but reject tasklist1

Extracting and Manipulating Strings in C#.Net

We have a requirement to extract and manipulate strings in C#. Net. The requirement is - we have a string
($name$:('George') AND $phonenumer$:('456456') AND
$emailaddress$:("test#test.com"))
We need to extract the strings between the character - $
Therefore, in the end, we need to get a list of strings containing - name, phonenumber, emailaddress.
What would be the ideal way to do it? are there any out of the box features available for this?
Regards,
John
The simplest way is to use a regular expression to match all non-whitespace characters between $ :
var regex=new Regex(#"\$\w+\$");
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test#test.com\"))";
var matches=regex.Matches(input);
This will return a collection of matches. The .Value property of each match contains the matching string. \$ is used because $ has special meaning in regular expressions - it matches the end of a string. \w means a non-whitespace character. + means one or more.
Since this is a collection, you can use LINQ on it to get eg an array with the values:
var values=matches.OfType<Match>().Select(m=>m.Value).ToArray();
That array will contain the values $name$,$phonenumer$,$emailaddress$.
Capture by name
You can specify groups in the pattern and attach names to them. For example, you can group the field name values:
var regex=new Regex(#"\$(?<name>\w+)\$");
var names=regex.Matches(input)
.OfType<Match>()
.Select(m=>m.Groups["name"].Value);
This will return name,phonenumer,emailaddress. Parentheses are used for grouping. (?<somename>pattern) is used to attach a name to the group
Extract both names and values
You can also capture the field values and extract them as a separate field. Once you have the field name and value, you can return them, eg as an object or anonymous type.
The pattern in this case is more comples:
#"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)"
Parentheses are escaped because we want them to match the values. Both ' and " characters are used in values, so ['"] is used to specify a choice of characters. The pattern is a literal string (ie starts with #) so the double quotes have to be escaped: ['""] . Any character has to be matched .+ but only up to the next character in the pattern .+?. Without the ? the pattern .+ would match everything to the end of the string.
Putting this together:
var regex = new Regex(#"\$(?<name>\w+)\$:\(['""](?<value>.+?)['""]\)");
var myValues = regex.Matches(input)
.OfType<Match>()
.Select(m=>new { Name=m.Groups["name"].Value,
Value=m.Groups["value"].Value
})
.ToArray()
Turn them into a dictionary
Instead of ToArray() you could convert the objects to a dictionary with ToDictionary(), eg with .ToDictionary(it=>it.Name,it=>it.Value). You could omit the select step and generate the dictionary from the matches themselves :
var myDict = regex.Matches(input)
.OfType<Match>()
.ToDictionary(m=>m.Groups["name"].Value,
m=>m.Groups["value"].Value);
Regular expressions are generally fast because they don't split the string. The pattern is converted to efficient code that parses the input and skips non-matching input immediatelly. Each match and group contain only the index to their starting and ending character in the input string. A string is only generated when .Value is called.
Regular expressions are thread-safe, which means a single Regex object can be stored in a static field and reused from multiple threads. That helps in web applications, as there's no need to create a new Regex object for each request
Because of these two advantages, regular expressions are used extensively to parse log files and extract specific fields. Compared to splitting, performance can be 10 times better or more, while memory usage remains low. Splitting can easily result in memory usage that's multiple times bigger than the original input file.
Can it go faster?
Yes. Regular expressions produce parsing code that may not be as efficient as possible. A hand-written parser could be faster. In this particular case, we want to start capturing text if $ is detected up until the first $. This can be done with the following method :
IEnumerable<string> GetNames(string input)
{
var builder=new StringBuilder(20);
bool started=false;
foreach(var c in input)
{
if (started)
{
if (c!='$')
{
builder.Append(c);
}
else
{
started=false;
var value=builder.ToString();
yield return value;
builder.Clear();
}
}
else if (c=='$')
{
started=true;
}
}
}
A string is an IEnumerable<char> so we can inspect one character at a time without having to copy them. By using a single StringBuilder with a predetermined capacity we avoid reallocations, at least until we find a key that's larger than 20 characters.
Modifying this code to extract values though isn't so easy.
Here's one way to do it, but certainly not very elegant. Basically splitting the string on the '$' and taking every other item will give you the result (after some additional trimming of unwanted characters).
In this example, I'm also grabbing the value of each item and then putting both in a dictionary:
var input = "($name$:('George') AND $phonenumer$:('456456') AND $emailaddress$:(\"test#test.com\"))";
var inputParts = input.Replace(" AND ", "")
.Trim(')', '(')
.Split(new[] {'$'}, StringSplitOptions.RemoveEmptyEntries);
var keyValuePairs = new Dictionary<string, string>();
for (int i = 0; i < inputParts.Length - 1; i += 2)
{
var key = inputParts[i];
var value = inputParts[i + 1].Trim('(', ':', ')', '"', '\'', ' ');
keyValuePairs[key] = value;
}
foreach (var kvp in keyValuePairs)
{
Console.WriteLine($"{kvp.Key} = {kvp.Value}");
}
// Wait for input before closing
Console.WriteLine("\nDone!\nPress any key to exit...");
Console.ReadKey();
Output

Replace word between two sets of brackets

I have a string:
"You have just received {{PowerUpName}} from {{PlayerName}}"
I then receive a set of parameters (in JSON format):
{"PowerUpName":"Super Boost","PlayerName":"John"}
I'm trying to work out how I replace the words within double brackets i.e. {{PowerUpName}} with a parameter. I guess I need to use regex but I have no idea what the expression should be. I'm coding in C# by the way (and can't use LINQ).
Any help would be much appreciated.
If the string "You have just received {{PowerUpName}} from {{PlayerName}}" is always the same you don't need regex.
You can simply use the String.Replace method for every parameter from JSON.
If you want to replace any words inside {{ and }} symbols, you do not need LINQ:
// Input string
string str = "You have just received {{PowerUpName}} from {{PlayerName}}";
// Initializing sample dictionary object
var obj = new Dictionary<string,string>();
// Filling it out
obj.Add("PowerUpName", "Super Boost");
obj.Add("PlayerName", "John");
// Replacing the values with those in the dictionary
string output = Regex.Replace(str, "(?<=\\{\\{)(.*?)(?=\\}\\})", match => obj[match.Groups[1].Value]);
// Display result
Console.WriteLine(output);
Result:
You have just received {{Super Boost}} from {{John}}
Provided that the brackets are correctly matched and there are no nested brackets, you can do this
var obj = {"PowerUpName":"Super Boost","PlayerName":"John"};
Regex.Replace(input, #"(?<=\{\{.*?(?=\}\})", delegate(match){
return obj[match];
});

what is a good pattern to processes each individual regex match through a method

I'm trying to figure out a pattern where I run a regex match on a long string, and each time it finds a match, it runs a replace on it. The thing is, the replace will vary depending on the matched value. This new value will be determined by a method. For example:
var matches = Regex.Match(myString, myPattern);
while(matches.Success){
Regex.Replace(myString, matches.Value, GetNewValue(matches.Groups[1]));
matches = matches.NextMatch();
}
The problem (i think) is that if I run the Regex.Replace, all of the match indexes get messed up so the result ends up coming out wrong. Any suggestions?
If you replace each pattern with a fixed string, Regex.replace does that for you. You don't need to iterate the matches:
Regex.Replace(myString, myPattern, "replacement");
Otherwise, if the replacement depends upon the matched value, use the MatchEvaluator delegate, as the 3rd argument to Regex.Replace. It receives an instance of Match and returns string. The return value is the replacement string. If you don't want to replace some matches, simply return match.Value:
string myString = "aa bb aa bb";
string myPattern = #"\w+";
string result = Regex.Replace(myString, myPattern,
match => match.Value == "aa" ? "0" : "1" );
Console.WriteLine(result);
// 0 1 0 1
If you really need to iterate the matches and replace them manually, you need to start replacement from the last match towards the first, so that the index of the string is not ruined for the upcoming matches. Here's an example:
var matches = Regex.Matches(myString, myPattern);
var matchesFromEndToStart = matches.Cast<Match>().OrderByDescending(m => m.Index);
var sb = new StringBuilder(myString);
foreach (var match in matchesFromEndToStart)
{
if (IsGood(match))
{
sb.Remove(match.Index, match.Length)
.Insert(match.Index, GetReplacementFor(match));
}
}
Console.WriteLine(sb.ToString());
Just be careful, that your matches do not contain nested instances. If so, you either need to remove matches which are inside another match, or rerun the regex pattern to generate new matches after each replacement. I still recommend the second approach, which uses the delegates.
If I understand your question correctly, you want to perform a replace based on a constant Regular Expression, but the replacement text you use will change based on the actual text that the regex matches on.
The Captures property of the Match Class (not the Match method) returns a collection of all the matches with your regex within the input string. It contains information like the position within the string, the matched value and the length of the match. If you iterate over this collection with a foreach loop you should be able to treat each match individually and perform some string manipulations where you can dynamically modify the replacement value.
I would use something like
Regex regEx = new Regex("some.*?pattern");
string input = "someBLAHpattern!";
foreach (Match match in regEx.Matches(input))
{
DoStuffWith(match.Value);
}

Regex questions

I'm trying to get some text from a large text file, the text I'm looking for is:
Type:Production
Color:Red
I pass the whole text in the following method to get (Type:Production , Color:Red)
private static void FindKeys(IEnumerable<string> keywords, string source)
{
var found = new Dictionary<string, string>(10);
var keys = string.Join("|", keywords.ToArray());
var matches = Regex.Matches(source, #"(?<key>" + #"\B\s" + keys + #"\B\s" + "):",
RegexOptions.Singleline);
foreach (Match m in matches)
{
var key = m.Groups["key"].ToString();
var start = m.Index + m.Length;
var nx = m.NextMatch();
var end = (nx.Success ? nx.Index : source.Length);
found.Add(key, source.Substring(start, end - start));
}
foreach (var n in found)
{
Console.WriteLine("Key={0}, Value={1}", n.Key, n.Value);
}
}
}
My problems are the following:
The search returns _Type: as well, where I only need Type:
The search return Color:Red/n/n/n/n/n (with the rest of the text, where I only need Color:Red
So, basically:
- How can I force Regex to get the exact match for Type and ignore _Type
- How to get only the text after : and ignore /n/n/ and any other text
I hope this is clear
Thanks,
Your regex currently looks like this:
(?<key>\B\sWord1|Word2|Word3\B\s):
I see the following issues here:
First, Word1|Word2|Word3 should be put in parenthesis. Otherwise, it will search for \B\sWord1 or Word2 or Word3\B\s, which is not what you want (I guess).
Why \B\s? A non-boundary followed by a whitespace? That doesn't make sense. I guess you want just \b (= word boundary). There's no need to use it in the end, because the colon already constitutes a word boundary.
So, I would suggest to use the following. It will fix the _Type problem, because there is no word boundary between _ and Type (since _ is considered to be a word character).
\b(?<key>Word1|Word2|Word3):
If the text following the key is always just a single word, I'd match it in the regex as well: (\s* allows for whitespace after the colon, I don't know if you need this. \w+ ensures that only word characters -- i.e. no line breaks etc. -- are matched as the value.)
\b(?<key>Word1|Word2|Word3):\s*(?<value>\w+)
Then you just need to iterate through all the matches and extract the key and value groups. No need for any string operations or index arithmetic.
So if I understand correctly, you have:
Pairs of key:values
Each pair is separated by a space
Within each pair, the key and value is separated by “:”
Then I would not use regex at all. I would:
use String.Split(' ') to get an array of pairs
loop over all the pairs
use String.Split(':') to get the key and value from each pair

Categories