regex replace matches with function and delete other matches - c#

I have a string like the one below, and I want to replace the FieldNN instances with the output from a function.
So far I have been able to replace the NN instances with the output from the function. But I am not sure how I can delete the static "field" portion with the same regex.
input string:
(Field30="2010002257") and Field1="yuan" not Field28="AAA"
required output:
(IncidentId="2010002257") and Author="yuan" not Recipient="AAA"
This is the code I have so far:
public string translateSearchTerm(string searchTerm) {
string result = "";
result = Regex.Replace(searchTerm.ToLower(), @"(?<=field).*?(?=\=)", delegate(Match Match) {
string fieldId = Match.ToString();
return String.Format("_{0}", getFieldName(Convert.ToInt64(fieldId)));
});
log.Info(String.Format("result={0}", result));
return result;
}
which gives:
(field_IncidentId="2010002257") and field_Author="yuan" not field_Recipient="aaa"
The issues I would like to resolve are:
Remove the static "field" prefixes from the output.
Make the regex case-insensitive on the "FieldNN" parts and not lowercase the quoted text portions.
Make the regex more robust so that the quoted string parts can use either double or single quotes.
Make the regex more robust so that spaces are ignored: FieldNN = "AAA" vs. FieldNN="AAA"
I really only need to address the first issue, the other three would be a bonus but I could probably fix those once I have discovered the right patterns for whitespace and quotes.
Update
I think the pattern below solves issues 2. and 4.
result = Regex.Replace(searchTerm, @"(?<=\b(?i:field)).*?(?=\s*\=)", delegate(Match Match)

To fix first issue use groups instead of positive lookbehind:
public string translateSearchTerm(string searchTerm) {
string result = "";
result = Regex.Replace(searchTerm.ToLower(), @"field(.*?)(?=\=)", delegate(Match Match) {
string fieldId = Match.Groups[1].Value;
return getFieldName(Convert.ToInt64(fieldId));
});
log.Info(String.Format("result={0}", result));
return result;
}
In this case "field" prefix will be included in each match and will be replaced.
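A minimal sketch that combines the capturing group with the case-insensitive and whitespace-tolerant pattern from the update, so that all four issues are covered in one pass. The FieldNames dictionary is a hypothetical stand-in for getFieldName, whose real mapping is not shown in the question; the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SearchTermTranslator
{
    // Hypothetical stand-in for getFieldName; the real lookup is not shown in the question.
    private static readonly Dictionary<long, string> FieldNames = new Dictionary<long, string>
    {
        { 30, "IncidentId" },
        { 1, "Author" },
        { 28, "Recipient" }
    };

    // \bfield   the literal prefix, consumed so it is replaced together with the digits
    // (\d+)     group 1: the numeric field id
    // (?=\s*=)  lookahead: optional whitespace and '=' must follow, but are left untouched
    private static readonly Regex FieldPattern =
        new Regex(@"\bfield(\d+)(?=\s*=)", RegexOptions.IgnoreCase);

    public static string Translate(string searchTerm)
    {
        return FieldPattern.Replace(searchTerm, match =>
        {
            long fieldId = Convert.ToInt64(match.Groups[1].Value);
            string name;
            // Unknown ids are left unchanged rather than guessed.
            return FieldNames.TryGetValue(fieldId, out name) ? name : match.Value;
        });
    }
}
```

Because the input is no longer lowercased and the pattern never touches anything after the equals sign, the quoted values keep their case and may use either quote style.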

Related

Remove list of words from string

I have a list of words that I want to remove from a string. I use the following method:
string stringToClean = "The.Flash.2014.S07E06.720p.WEB-DL.HEVC.x265.RMTeam";
string[] BAD_WORDS = {
"720p", "web-dl", "hevc", "x265", "Rmteam", "."
};
var cleaned = string.Join(" ", stringToClean.Split(' ').Where(w => !BAD_WORDS.Contains(w, StringComparer.OrdinalIgnoreCase)));
but it is not working, and the following text is output:
The.Flash.2014.S07E06.720p.WEB-DL.HEVC.x265.RMTeam
For this it would be a good idea to create a reusable method that splits a string into words. I'll do this as an extension method of string. If you are not familiar with extension methods, read extension methods demystified
public static IEnumerable<string> ToWords(this string text)
{
// TODO implement
}
Usage will be as follows:
string text = "This is some wild text!";
List<string> words = text.ToWords().ToList();
var first3Words = text.ToWords().Take(3);
var lastWord = text.ToWords().LastOrDefault();
Once you've got this method, the solution to your problem will be easy:
IEnumerable<string> badWords = ...
string inputText = ...
IEnumerable<string> validWords = inputText.ToWords().Except(badWords);
Or maybe you want to use Except(badWords, StringComparer.OrdinalIgnoreCase);
The implementation of ToWords depends on what you would call a word: everything delimited by a dot? or do you want to support whitespaces? or maybe even new-lines?
The implementation for your problem: A word is any sequence of characters delimited by a dot.
public static IEnumerable<string> ToWords(this string text)
{
// find the next dot:
const char dot = '.';
int startIndex = 0;
int dotIndex = text.IndexOf(dot, startIndex);
while (dotIndex != -1)
{
// found a Dot, return the substring until the dot:
int wordLength = dotIndex - startIndex;
yield return text.Substring(startIndex, wordLength);
// find the next dot
startIndex = dotIndex + 1;
dotIndex = text.IndexOf(dot, startIndex);
}
// read until the end of the text. Return everything after the last dot:
yield return text.Substring(startIndex);
}
TODO:
Decide what you want to return if text starts with a dot ".ABC.DEF".
Decide what you want to return if the text ends with a dot: "ABC.DEF."
Check if the return value is what you want if text is empty.
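One possible way to settle the TODO items, as a sketch: let String.Split drop empty entries, so leading, trailing, and doubled dots produce no words. The filter uses Where with a case-insensitive HashSet rather than Except, because Except has set semantics and would also collapse repeated words. The RemoveWords helper and the class name are hypothetical, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordExtensions
{
    // Splits on dots; empty entries (leading, trailing, or doubled dots) are dropped.
    public static IEnumerable<string> ToWords(this string text)
    {
        if (string.IsNullOrEmpty(text))
            return Enumerable.Empty<string>();
        return text.Split(new[] { '.' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Keeps every word that is not in badWords, compared case-insensitively.
    // Unlike Except, Where preserves repeated words.
    public static string RemoveWords(string text, IEnumerable<string> badWords)
    {
        var bad = new HashSet<string>(badWords, StringComparer.OrdinalIgnoreCase);
        return string.Join(" ", text.ToWords().Where(w => !bad.Contains(w)));
    }
}
```

With the input from the question, WordExtensions.RemoveWords(stringToClean, BAD_WORDS) yields "The Flash 2014 S07E06".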
Your split/join don't match up with your input.
That said, here's a quick one-liner:
string clean = BAD_WORDS.Aggregate(stringToClean, (acc, word) => acc.Replace(word, string.Empty));
This is basically a "reduce". Not fantastically performant but over strings that are known to be decently small I'd consider it acceptable. If you have to use a really large string or a really large number of "words" you might look at another option but it should work for the example case you've given us.
Edit: The downside of this approach is that you'll get partials. So for example in your token array you have "720p" but the code I suggested here will still match on "720px" but there are still ways around it. For example instead of using string's implementation of Replace you could use a regex that will match your delimiters something like Regex.Replace(acc, $"[. ]{word}([. ])", "$1") (regex not confirmed but should be close and I added a capture for the delimiter in order to put it back for the next pass)

check for a substring(of a string) in the dictionary and return the key's(substring) value

I have a dictionary like below,
PropStreetSuffixDict.Add("ROAD", "RD");
PropStreetSuffixDict.Add("STREET","ST"); and many more.
Now my requirement says that when a string contains either ROAD or STREET as a substring, I want to return the related value for that substring.
For example, CHURCH ACROSS ROAD should return RD.
This is what I tried, which only works if the input string is exactly the same as a key of the dict.
private string GetSuffix(string input)
{
string suffix=string.Empty;
suffix = PropStreetSuffixDict.Where(x => x.Key.ToUpper().Trim() ==
input.ToUpper().Trim()).FirstOrDefault().Value;
return suffix;
}
Note:
In case a string contains more than one such substring, it should return the value for the first occurrence of any of them.
i.e. if STREET CHURCH ACROSS ROAD is the input, it should return ST not RD
You can try something like this
private string GetSuffix(string input)
{
string suffix=string.Empty;
string[] test =input.ToUpper().Split(' ');
suffix =(from dic in PropStreetSuffixDict
join inp in test on dic.Key equals inp
select dic.Value).LastOrDefault();
return suffix;
}
Split the input and then use linq
If you want it to return the value for the first occurrence in the input string (GetSuffix("CHURCH STREET ACROSS ROAD") ==> "ST"), it becomes a little tricky.
Code below will find where in the input string all keys occur, and return value of first found position.
private string GetSuffix(string input)
{
var suffix = PropStreetSuffixDict
.Select(kvp => new
{
Position = input.IndexOf(kvp.Key.Trim(), StringComparison.CurrentCultureIgnoreCase),
Value = kvp.Value
})
.OrderBy(x => x.Position)
.FirstOrDefault(x => x.Position > -1)?.Value;
return suffix ?? string.Empty;
}
If you didn't care about the order of occurrence in input string you could simplify it to this:
private string GetSuffix(string input)
{
var suffix = PropStreetSuffixDict.FirstOrDefault(kvp => input.IndexOf(kvp.Key.Trim(), StringComparison.CurrentCultureIgnoreCase) >= 0).Value;
return suffix ?? string.Empty;
}
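Both substring-based versions also match a key that sits inside a longer word (BROADWAY contains ROAD). If whole-word matching is acceptable, which is an assumption about the requirement, a sketch like the following walks the words in input order and returns the value for the first one found in the dictionary. The dictionary holds only the two sample entries from the question, and the class name is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StreetSuffix
{
    // Sample entries from the question; the case-insensitive comparer avoids ToUpper calls.
    private static readonly Dictionary<string, string> PropStreetSuffixDict =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "ROAD", "RD" },
            { "STREET", "ST" }
        };

    public static string GetSuffix(string input)
    {
        // Walk the words in input order, so the first matching word wins.
        foreach (Match word in Regex.Matches(input ?? string.Empty, @"\w+"))
        {
            string suffix;
            if (PropStreetSuffixDict.TryGetValue(word.Value, out suffix))
                return suffix;
        }
        return string.Empty;
    }
}
```

This does one dictionary lookup per word instead of scanning the input once per key, which may matter when the dictionary has many entries.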
I would recommend using RegEx to split apart your words; that way you can efficiently split on multiple characters, not just spaces, if required. This solution also allows replacing the individual words very easily, without having to track the position and length of the matched word versus the length of the replacement value.
You could use a function like this:
public string ReplaceWords(string input, Dictionary<string,string> dictionary)
{
var result = Regex.Replace(input, @"\w*", (match) =>
{
if (dictionary.TryGetValue(match.Value, out var replacement))
{
return replacement;
}
return match.Value;
});
return result;
}
It will take an input string, split it up, and replace the individual words with those in the supplied dictionary. The particular RegEx of \w* will match any continuous run of "word" characters, so it will break on spaces, commas, dashes, and anything else that isn't part of a "word".
This code does use some newer C# language features that you may not have access to (inline out parameters). Just let me know if you can't use those and I'll update it to work without them.
You can use it like this:
Console.WriteLine(ReplaceWords("CHURCH ACROSS ROAD", PropStreetSuffixDict));
Console.WriteLine(ReplaceWords("CHURCH ACROSS STREET", PropStreetSuffixDict));
Console.WriteLine(ReplaceWords("CHURCH ACROSS ROAD, LEFT AT THE OTHER STREET", PropStreetSuffixDict));
For the following results:
CHURCH ACROSS RD
CHURCH ACROSS ST
CHURCH ACROSS RD, LEFT AT THE OTHER ST

Replace any string between quotes

Problem:
Cannot find a consistent way to replace a random string between quotes with a specific string I want. Any help would be greatly appreciated.
Example:
String str1 = "test=\"-1\"";
should become
String str2 = "test=\"31\"";
but also work for
String str3 = "test=\"foobar\"";
basically I want to turn this
String str4 = "test=\"antyhingCanGoHere\"";
into this
String str4 = "test=\"31\"";
Have tried:
Case insensitive Regex without using RegexOptions enumeration
How do you do case-insensitive string replacement using regular expressions?
Replace any character in between AnyText: and <usernameredacted#example.com> with an empty string using Regex?
Replace string in between occurrences
Replace a String between two Strings
Current code:
Regex RemoveName = new Regex("(?VARIABLE=\").*(?=\")", RegexOptions.IgnoreCase);
String convertSeccons = RemoveName.Replace(ruleFixed, "31");
Returns error:
System.ArgumentException was caught
Message=parsing "(?VARIABLE=").*(?=")" - Unrecognized grouping construct.
Source=System
StackTrace:
at System.Text.RegularExpressions.RegexParser.ScanGroupOpen()
at System.Text.RegularExpressions.RegexParser.ScanRegex()
at System.Text.RegularExpressions.RegexParser.Parse(String re, RegexOptions op)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options, Boolean useCache)
at System.Text.RegularExpressions.Regex..ctor(String pattern, RegexOptions options)
at application.application.insertGroupID(String rule) in C:\Users\winserv8\Documents\Visual Studio 2010\Projects\application\application\MainFormLauncher.cs:line 298
at application.application.xmlqueryDB(String xmlSaveLocation, TextWriter tw, String ruleName) in C:\Users\winserv8\Documents\Visual Studio 2010\Projects\application\application\MainFormLauncher.cs:line 250
InnerException:
found answer
string s = Regex.Replace(ruleFixed, "VARIABLE=\"(.*)\"", "VARIABLE=\"31\"");
ruleFixed = s;
I found this code sample at Replace any character in between AnyText: and with an empty string using Regex?, which is one of the links I previously posted. I had skipped over this syntax because I thought it wouldn't handle what I needed.
var str1 = "test=\"foobar\"";
var str2 = str1.Substring(0, str1.IndexOf("\"") + 1) + "31\"";
If needed add check for IndexOf != -1
I don't know if I understood you correctly, but if you want to replace all the characters inside the quotes, why not use a simple regular expression?
String str = "test=\"-\"1\"";
Regex regExpr = new Regex("\".*\"", RegexOptions.IgnoreCase);
String result = regExpr.Replace(str , "\"31\"");
Console.WriteLine(result);
prints:
test="31"
Note: You can take advantage of plain old XAttribute
String ruleFixed = "test=\"-\"1\"";
var splited = ruleFixed.Split('=');
var attribute = new XAttribute(splited[0], splited[1]);
attribute.Value = "31";
Console.WriteLine(attribute);//prints test="31"
var parts = given.Split('=');
return string.Format("{0}=\"{1}\"", parts[0], replacement);
In the case that your string has other things in it besides just the key/value pair of key="value", then you need to make the value-match part not match quote marks, or it will match all the way from the first value to the last quote mark in the string.
If that is true, then try this:
Regex.Replace(ruleFixed, "(?<=VARIABLE\\s*=\\s*\")[^\"]*(?=\")", "31");
This uses a positive look-behind to match the VARIABLE=" part (with optional white space around the equals sign, so VARIABLE = " works as well) and a positive look-ahead to match the closing ", without including either in the final match, so you replace only the value you want.
If not, then your solution will work, but is not optimal because you have to repeat the value and the quote marks in the replace text.
Assuming that the string within the quotes does not contain quotes itself, you can use this general pattern in order to find a position between a prefix and a suffix:
(?<=prefix)find(?=suffix)
In your case
(?<=\w+=").*?(?=")
Here we are using the prefix \w+=" where \w+ denotes word characters (the variable) and =" are the equal sign and the quote.
We want to find anything .*? until we encounter the next quote.
The suffix is simply the quote ".
string result = Regex.Replace(input, "(?<=\\w+=\").*?(?=\")", replacement);
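The prefix/find/suffix idea can be sketched as a small helper. This variant swaps the lazy .*? for the negated class [^"]*, which makes the "no quotes inside the value" assumption explicit; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class QuotedValueReplacer
{
    // (?<=\w+=")  lookbehind: a name, '=', and the opening quote precede the match
    // [^"]*       the value: any run of characters that are not quotes
    // (?=")       lookahead: the closing quote follows
    private static readonly Regex ValuePattern = new Regex("(?<=\\w+=\")[^\"]*(?=\")");

    public static string ReplaceValues(string input, string replacement)
    {
        // Note: '$' in replacement is interpreted as a substitution token by Regex.Replace.
        return ValuePattern.Replace(input, replacement);
    }
}
```

Since only the value itself is matched, every key="value" pair in the input is rewritten and the keys and quotes stay as they were.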
Try this:
[^"\r\n]*(?:""[\r\n]*)*
var pattern = "\"(.*)?\"";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var replacement = regex.Replace("test=\"hereissomething\"", "\"31\"");
String str1 = "test=\"-1\"";
string[] parts = str1.Split(new[] {'"'}, 3);
string str2 = parts.Length == 3 ? string.Join("\"", parts.First(), "31", parts.Last()) : str1;
String str1 = "test=\"-1\"";
string res = Regex.Replace(str1, "(^.+?\").+(\"+)", "${1}31$2");
I'm pretty bad at RegEx, but you could make a simple extension method using string functions to do this.
public static class StringExtensions
{
public static string ReplaceBetweenQuotes(this string str, string replacement)
{
if (str.Count(c => c.Equals('"')) == 2)
{
int start = str.IndexOf('"') + 1;
// Rebuild the string around the quoted part; this also works when the quotes are empty.
str = str.Substring(0, start) + replacement + str.Substring(str.LastIndexOf('"'));
}
return str;
}
}
Usage:
String str3 = "test=\"foobar\"";
str3 = str3.ReplaceBetweenQuotes("31");
returns: "test=\"31\""

A string replace function with support of custom wildcards and escaping these wildcards in C#

I need to write a string replace function with custom wildcards support. I also should be able to escape these wildcards. I currently have a wildcard class with Usage, Value and Escape properties.
So let's say I have a global list called Wildcards. Wildcards has only one member added here:
Wildcards.Add(new Wildcard
{
Usage = @"\Break",
Value = Environment.NewLine,
Escape = @"\\Break"
});
So I need a CustomReplace method to do the trick. I should replace the specified parameter in a given string with another one just like the string.Replace. The only difference here that it must use my custom wildcards.
string test = CustomReplace("Hi there! What's up?", "! ", "!\\Break");
// Value of the test variable should be: "Hi there!\r\nWhat's up?"
// Because \Break is specified in a custom wildcard in Wildcards
// But if I use the value of the wildcard's Escape member,
// it should be replaced with the value of Usage member.
test = CustomReplace("Hi there! What's up?", "! ", "!\\\\Break");
// Value of the test variable should be: "Hi there!\\BreakWhat's up?"
My current method doesn't support escape strings.
It also can't be good when it comes to performance since I call string.Replace two times and each one searches the whole string, I guess.
// My current method. Has no support for escape strings.
string CustomReplace(string text, string oldValue, string newValue)
{
string done = text.Replace(oldValue, newValue);
foreach (Wildcard wildcard in Wildcards)
{
// Doing this:
// done = done.Replace(wildcard.Escape, wildcard.Usage);
// ...would cause trouble when Escape contains Usage.
done = done.Replace(wildcard.Usage, wildcard.Value);
}
return done;
}
So, do I have to write a replace method which searches the string char by char with the logic to find and separate both Usage and Escape values, then replace Escape with Usage while replacing Usage with another given string?
Or do you know an already written one?
Can I use regular expressions in this scenario?
If I can, how? (Have no experience in this, a pattern would be nice)
If I do, would it be faster or slower than char by char searching?
Sorry for the long post, I tried to keep it clear and sorry for any typos and such; it's not my primary language. Thanks in advance.
You can try this:
public string CustomReplace(string text, string oldValue, string newValue)
{
string done = text.Replace(oldValue, newValue);
var builder = new StringBuilder();
foreach (var wildcard in Wildcards)
{
builder.AppendFormat("({0}|{1})|", Regex.Escape(wildcard.Usage),
Regex.Escape(wildcard.Escape));
}
builder.Length = builder.Length - 1; // Remove the last '|' character
return Regex.Replace(done, builder.ToString(), WildcardEvaluator);
}
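private string WildcardEvaluator(Match match)
{
// An escaped wildcard is turned back into its literal Usage text.
var escaped = Wildcards.Find(w => w.Escape == match.Value);
if (escaped != null)
return escaped.Usage;
// A plain wildcard is replaced by its Value.
var wildcard = Wildcards.Find(w => w.Usage == match.Value);
if (wildcard != null)
return wildcard.Value;
else
return match.Value;
}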
I think this is the easiest and fastest solution as there is only one Replace method call for all wildcards.
So if you are happy to just use Regex to fulfil your needs then you should check out this link. It has some great info for using it in .NET. The website also has loads of examples on how to construct Regex patterns for many different needs.
A basic example of a Replace on a string with wildcards might look like this...
string input = "my first regex replace";
string result = System.Text.RegularExpressions.Regex.Replace(input, "rep...e", "result");
//result is now "my first regex result"
Notice how the second argument in the Replace function takes a regex pattern string. In this case, the dots are acting as wildcard characters; each one means "match any single character".
Hopefully this will help you get what you need.
If you define a pattern for both your wildcard and your escape method, you can create a Regex which will find all the wildcards in your text. You can then use a MatchEvaluator to replace them.
class Program
{
static Dictionary<string, string> replacements = new Dictionary<string, string>();
static void Main(string[] args)
{
replacements.Add("\\Break", Environment.NewLine);
string template = @"This is an \\Break escaped newline and this should \Break contain a newline.";
// (?<=(^|[^\\])(\\\\){0,}) will handle double escaped items
string outcome = Regex.Replace(template, @"(?<=(^|[^\\])(\\\\){0,})\\\w+\b", ReplaceMethod);
}
public static string ReplaceMethod(Match m)
{
string replacement = null;
if (replacements.TryGetValue(m.Value, out replacement))
{
return replacement;
}
else
{
//return string.Empty?
//throw new FormatException()?
return m.Value;
}
}
}
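For the exact behaviour described in the question (Usage becomes Value, Escape becomes the literal Usage text), a single-pass sketch could look like this. The Wildcard class mirrors the three properties named in the question; everything else, including the class name WildcardReplacer, is an assumption made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public class Wildcard
{
    public string Usage { get; set; }
    public string Value { get; set; }
    public string Escape { get; set; }
}

public static class WildcardReplacer
{
    public static readonly List<Wildcard> Wildcards = new List<Wildcard>
    {
        new Wildcard { Usage = @"\Break", Value = Environment.NewLine, Escape = @"\\Break" }
    };

    public static string CustomReplace(string text, string oldValue, string newValue)
    {
        string done = text.Replace(oldValue, newValue);
        if (Wildcards.Count == 0)
            return done;

        // Escape forms are listed before Usage forms, so an Escape form is
        // preferred when both could match at the same position.
        string pattern = string.Join("|",
            Wildcards.Select(w => Regex.Escape(w.Escape))
                .Concat(Wildcards.Select(w => Regex.Escape(w.Usage))));

        return Regex.Replace(done, pattern, match =>
        {
            Wildcard escaped = Wildcards.Find(w => w.Escape == match.Value);
            if (escaped != null)
                return escaped.Usage; // escaped token becomes the literal wildcard text
            Wildcard used = Wildcards.Find(w => w.Usage == match.Value);
            return used != null ? used.Value : match.Value;
        });
    }
}
```

Because the regex scans the text once and each match is consumed, the output of one replacement is never re-examined, which avoids the "Escape contains Usage" problem of chained string.Replace calls.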

Regular expression to use which matches text before .html and after /

With this string
http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html
I need to get sdf-as
with this
hellow-1/yo-sdf.html
I need yo-sdf
This should get you what you need:
Regex re = new Regex(@"/([^/]*)\.html$");
Match match = re.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
Console.WriteLine(match.Groups[1].Value); //Or do whatever you want with the value
This needs using System.Text.RegularExpressions; at the top of the file to work.
There are many ways to do this. The following uses lookarounds to match only the filename portion. It actually allows no / if such is the case:
string[] urls = {
@"http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html",
@"hellow-1/yo-sdf.html",
@"noslash.html",
@"what-is/this.lol",
};
foreach (string url in urls) {
Console.WriteLine("[" + Regex.Match(url, @"(?<=/|^)[^/]*(?=\.html$)") + "]");
}
This prints:
[sdf-as]
[yo-sdf]
[noslash]
[]
How the pattern works
There are 3 parts:
(?<=/|^) : a positive lookbehind to assert that we're preceded by a slash /, or we're at the beginning of the string
[^/]* : match anything but slashes
(?=\.html$): a positive lookahead to assert that we're followed by ".html" (literally on the dot)
References
regular-expressions.info/Lookarounds, Anchors
A non-regex alternative
Knowing regex is good, and it can do wonderful things, but you should always know how to do basic string manipulations without it. Here's a non-regex solution:
static String getFilename(String url, String ext) {
if (url.EndsWith(ext)) {
int k = url.LastIndexOf("/");
return url.Substring(k + 1, url.Length - ext.Length - k - 1);
} else {
return "";
}
}
Then you'd call it as:
getFilename(url, ".html")
API links
String.Substring, EndsWith, and LastIndexOf
Attachments
Source code and output on ideone.com
Try this:
string url = "http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html";
Match match = Regex.Match(url, @"/([^/]+)\.html$");
if (match.Success)
{
string result = match.Groups[1].Value;
Console.WriteLine(result);
}
Result:
sdf-as
However, it would be a better idea to use the System.Uri class to parse the string so that you correctly handle things like http://example.com/foo.html?redirect=bar.html.
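A sketch of that suggestion, assuming the input is an absolute URL (a relative input such as hellow-1/yo-sdf.html would need separate handling and simply returns an empty string here). The class and method names are hypothetical.

```csharp
using System;
using System.IO;

public static class UrlFileName
{
    // Returns the last path segment without its extension, or "" when the
    // input is not an absolute URL or the path does not end in ext.
    public static string GetFileName(string url, string ext)
    {
        Uri uri;
        if (!Uri.TryCreate(url, UriKind.Absolute, out uri))
            return string.Empty;

        // AbsolutePath excludes the query string and the fragment.
        string path = uri.AbsolutePath;
        if (!path.EndsWith(ext, StringComparison.OrdinalIgnoreCase))
            return string.Empty;

        return Path.GetFileNameWithoutExtension(path);
    }
}
```

Since the query string is stripped before the extension check, a trailing ?redirect=bar.html no longer confuses the result.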
using System.Text.RegularExpressions;
Regex pattern = new Regex(@".*/([a-z\-]+)\.html");
Match match = pattern.Match("http://sfsdf.com/sdfsdf-sdfsdf/sdf-as.html");
if (match.Success)
{
Console.WriteLine(match.Groups[1].Value); // group 1 holds the name without the path and extension
}
else
{
Console.WriteLine("Not found :(");
}
This one makes the slash and dot parts optional, and allows the file to have any extension:
new Regex(@"^(.*/)?(?<fileName>[^/]*?)(\.[^/.]*)?$", RegexOptions.ExplicitCapture);
But I still prefer Substring(LastIndexOf(...)) because it is far more readable.
