I am trying to ensure that each phrase in a list starts on its own line by finding it and replacing it with \n + the phrase, e.g.
your name: joe your age: 28
becomes
your name: joe
your age: 28
I have a file of phrases that I read and loop through, doing the replace for each one. Since some phrases contain two words, I use \b to mark where the phrase starts and ends.
This doesn't seem to work. Does anybody know why?
Example: the string 'Name: xxxxxx' does not get edited.
output = output.Replace('\b' + "Name" + '\b', "match");
Use a regular expression. This one accounts for any number of words separated by any number of spaces:
using System.Text.RegularExpressions;
Regex re = new Regex("(?<key>\\w+(\\b\\s+\\w+)*)\\s*:\\s*(?<value>\\w+)");
MatchCollection mc = re.Matches("your name: joe your age: 28 ");
foreach (Match m in mc) {
string key = m.Groups["key"].Value;
string value = m.Groups["value"].Value;
//accumulate into a list, but I'll just write to console
Console.WriteLine(key + " : " + value);
}
Here is some explanation:
Suppose the text to the left of the colon (:) is called a key, and the text to the right a value.
These key/value pairs are separated by at least one space. Because of this, a value has to be exactly one word (otherwise we'd have ambiguity).
The above regular expression uses named groups, to make code more readable.
got it
for (int headerNo=0; headerNo<headersArray.Length; headerNo++)
{
string searchPhrase = @"\b" + PhraseArray[headerNo] + @"\b";
string newPhrase = "match";
output = Regex.Replace(output, searchPhrase, newPhrase);
}
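The reason the original attempt failed is that string.Replace does a literal search, and '\b' in a C# character literal is the backspace character, not a regex word boundary. Below is a minimal sketch of the full task from the question (putting each phrase on its own line). The class and method names are hypothetical, and it assumes every phrase begins and ends with a word character, since \b would not match otherwise.

```csharp
using System;
using System.Text.RegularExpressions;

static class PhraseBreaker
{
    // Hypothetical helper: puts a newline in front of every whole-phrase match.
    public static string StartOnOwnLine(string text, string[] phrases)
    {
        foreach (string phrase in phrases)
        {
            // Regex.Escape guards against phrases that contain regex metacharacters.
            string pattern = @"\b" + Regex.Escape(phrase) + @"\b";
            // The evaluator re-emits the matched text with a newline in front of it.
            text = Regex.Replace(text, pattern, m => "\n" + m.Value);
        }
        return text;
    }

    static void Main()
    {
        string output = StartOnOwnLine("your name: joe your age: 28", new[] { "your age" });
        Console.WriteLine(output);
    }
}
```

Note that the space before the inserted newline is left in place; trimming each line afterwards would remove it.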
Following your example, you could do this:
output = output.Replace("your", "\nyour");
I currently have a string which looks like this when it is returned :
//This is the url string
// the-great-debate---toilet-paper-over-or-under-the-roll
string name = string.Format("{0}",url);
name = Regex.Replace(name, "-", " ");
After the Regex operation above, it becomes:
the great debate toilet paper over or under the roll
However, as I mentioned in the question, I want to apply a regex to the url string so that I get the following output:
the great debate - toilet paper over or under the roll
I would really appreciate any assistance.
[EDIT] However, not all the strings look like this. Some of them only have single hyphens, so the above method works:
world-water-day-2016
and it changes to
world water day 2016
but for this one:
the-great-debate---toilet-paper-over-or-under-the-roll
I need a way to check if the string has 3 hyphens and then replace those 3 hyphens with [space][hyphen][space], and then replace all the remaining single hyphens between the words with a space.
First of all, there is always a very naive solution to this kind of problem: replace the special matches with some placeholder characters that are not normally used in the current environment, do the generic replacement, and then replace the placeholder with the text the special case needs.
var name = url.Replace("---", "[ \uFFFD ]").Replace("-", " ").Replace("[ \uFFFD ]", " - ");
You may also use a regex based replacement that matches either a 3-hyphen substring capturing it, or just match a single hyphen, and then check if Group 1 matched inside a match evaluator (the third parameter to Regex.Replace can be a Match evaluator method).
It will look like
var name = Regex.Replace(url, @"(---)|-", m => m.Groups[1].Success ? " - " : " ");
See the C# demo.
So, when (---) part matches, the 3 hyphens are put into Group 1 and the .Success property is set to true. Thus, m => m.Groups[1].Success ? " - " : " " replaces 3 hyphens with space+-+space and 1 hyphen (that may be actually 1 of the 2 consecutive hyphens) with a space.
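As a quick check of the match-evaluator approach, here is a small sketch that runs it against both inputs from the question. The SlugFormatter class and Humanize method are hypothetical names introduced only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class SlugFormatter
{
    // Hypothetical wrapper around the replacement shown above.
    public static string Humanize(string slug)
    {
        // "---" is tried first at each position; a lone "-" is the fallback.
        return Regex.Replace(slug, @"(---)|-",
            m => m.Groups[1].Success ? " - " : " ");
    }

    static void Main()
    {
        Console.WriteLine(Humanize("the-great-debate---toilet-paper-over-or-under-the-roll"));
        Console.WriteLine(Humanize("world-water-day-2016"));
    }
}
```

Because the alternation is ordered, the triple hyphen wins whenever it is present, so strings with only single hyphens are handled by the same call.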
Here's a solution using LINQ rather than Regex:
var str = "the-great-debate---toilet-paper-over-or-under-the-roll";
var result = str.Split(new string[] {"---"}, StringSplitOptions.None)
.Select(s => s.Replace("-", " "))
.Aggregate((c,n) => $"{c} - {n}");
// result = "the great debate - toilet paper over or under the roll"
Split the string up based on the ---, then remove hyphens from each substring, then join them back together.
The easy way:
name = Regex.Replace(name, @"\b-|-\b", " ");
The show-off way:
name = Regex.Replace(name, @"(\b)?-(?(1)|\b)", " ");
I want to retrieve characters separated by a specific delimiter.
Example :
Here, I want to access the string between the " " delimiters. But I want the 2nd set of characters between "".
abc"def"ghi"jklm // Output : ghi
"hello" yes "world" // output : world
How can I get that?
I know we can use split. But sometimes the string might not start with " character.
Can anyone please help me with this?
You can just find the first quote, and use your approach from there:
var firstQuote = str.IndexOf('"');
var startsWithQuote = str.Substring(firstQuote);
string valueStr = "abc\"def\"ghi\"jklm";
var result = valueStr.Split('"')[2];
Console.WriteLine(result);
https://dotnetfiddle.net/T3fMof
Obviously, check that the array has enough elements before indexing into it.
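A bounds-checked version of the Split approach might look like the sketch below. QuotedParts and PartAt are hypothetical names, and returning null for a missing element is just one possible convention.

```csharp
using System;

static class QuotedParts
{
    // Hypothetical helper: returns the piece at the given index after
    // splitting on double quotes, or null when there are too few pieces.
    public static string PartAt(string s, int index)
    {
        string[] parts = s.Split('"');
        return index >= 0 && index < parts.Length ? parts[index] : null;
    }

    static void Main()
    {
        Console.WriteLine(PartAt("abc\"def\"ghi\"jklm", 2));
        Console.WriteLine(PartAt("no quotes here", 2) ?? "(not found)");
    }
}
```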
You can use regular expressions to match them:
var test = "abc\"def\"ghi\"jklm";
var test2 = "\"hello\" yes \"world\"";
var match1 = Regex.Matches(test, ".+\"(.+)\"");
var match2 = Regex.Matches(test2, ".+\"(.+)\"");
Console.WriteLine("Match1: " + match1[0].Groups[1].Captures[0]);
Console.WriteLine("Match2: " + match2[0].Groups[1].Captures[0]);
// Match1: ghi
// Match2: world
I'm trying to parse some source files for some standard information.
The source files could look like this:
// Name: BoltBait
// Title: Some cool thing
or
// Name :
// Title : Another thing
or
// Title:
// Name:
etc.
The code I'm using to parse for the information looks like this:
Regex REName = new Regex(@"\/{2}\s*Name\s*:\s*(?<nlabel>.*)\n", RegexOptions.IgnoreCase);
Match mname = REName.Match(ScriptText); // entire source code file
if (mname.Success)
{
Name.Text = mname.Groups["nlabel"].Value.Trim();
}
Which works fine if the field has information. It doesn't work if the field is left blank.
For example, in the third example above, the Title field returns a match of "// Name:" and I want it to return the empty string.
I need help from a regex expert.
I thought the regex was too greedy, so I tried the following expression:
@"\/{2}\s*Name\s*:\s*(?<nlabel>.*?)\n"
However, it didn't help.
You can also use a class subtraction to avoid matching newline symbols:
//[\s-[\r\n]]*Name[\s-[\r\n]]*:[\s-[\r\n]]*(?<nlabel>.*)(?=\r?\n|$)
Note that:
[\s-[\r\n]]* - Matches any whitespace excluding newline symbols (a character class subtraction is used)
(?=\r?\n|$) - A positive look-ahead that checks if there is a line break or the end of the string.
See the regex demo.
\s includes line breaks, which is not wanted here.
It should suffice to match tabs and spaces explicitly after :
\/{2}\s*Name\s*:[\t ]*(?<nlabel>.*?)\n
This returns the empty string correctly in your third example (for both name and title).
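To illustrate the point that \s can run across a line break, here is a sketch of the same idea wrapped in a helper. HeaderParser and ReadName are hypothetical names. As an assumption of mine, the capture uses [^\r\n]* instead of .*? followed by \n, so that it also copes with \r\n line endings and with a last line that has no trailing newline.

```csharp
using System;
using System.Text.RegularExpressions;

static class HeaderParser
{
    // Only tabs and spaces are allowed around the label, so the match
    // can never spill over onto the next line.
    static readonly Regex NameLine = new Regex(
        @"//[\t ]*Name[\t ]*:[\t ]*(?<nlabel>[^\r\n]*)",
        RegexOptions.IgnoreCase);

    // Returns the Name value ("" when blank), or null when there is no Name line.
    public static string ReadName(string scriptText)
    {
        Match m = NameLine.Match(scriptText);
        return m.Success ? m.Groups["nlabel"].Value.Trim() : null;
    }

    static void Main()
    {
        Console.WriteLine("|" + ReadName("// Name: BoltBait\n// Title: Some cool thing\n") + "|");
        Console.WriteLine("|" + ReadName("// Title:\n// Name:\n") + "|");
    }
}
```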
My approach is to use an alternate in a non-capturing group to match the label from the colon to the end of the line. This matches either anything to the end of the line, or nothing.
var text1 = "// Name: BoltBait" + Environment.NewLine + "// Title: Some cool thing" + Environment.NewLine;
var text2 = "// Name :" + Environment.NewLine + "// Title : Another thing" + Environment.NewLine;
var text3 = "// Title:" + Environment.NewLine + "// Name:" + Environment.NewLine;
var texts = new List<string>() { text1, text2, text3 };
var options = RegexOptions.IgnoreCase | RegexOptions.Multiline;
var regex = new Regex("^//\\s*?Name\\s*?:(?<nlabel>(?:.*$|$))", options );
foreach (var text in texts){
var match = regex.Match( text );
Console.WriteLine( "|" + match.Groups["nlabel"].Value.Trim() + "|" );
}
Produces:
|BoltBait|
||
||
In the middle of a long string, I am looking for "No. 1234. "
The number (1234) in my example above can be any length whole number. It also has to match on the space at the end.
So I am looking for examples:
1) This is a test No. 42. Hello Nice People
2) I have no idea wtf No. 1234412344124. I am doing.
I have figured out a way to match on this pattern with the following regex:
(No. [\d]{1,}. )
What I cannot figure out, though, is how to do one simple thing when finding a match: Replace that last period with a darn comma!
So, with the two examples up above, I want to transform them into:
1) This is a test No. 42, Hello Nice People
2) I have no idea wtf No. 1234412344124, I am doing.
(Notice the commas now after the numbers)
How might one do this in C# and RegEx? Thank you!
EDIT:
Another way of looking at this is...
I can do this easily and have for years:
str = Replace(str, "Find this", "Replace it with this")
However, how can I do that by combining regex and the unknown portion of the string in the middle to replace the last period (not to be confused with the last character since the last character still needs to be a space)
This is a test No. 42. Hello Nice People
This is a test No. (some unknown length number). Hello Nice People
becomes
This is a test No. 42, Hello Nice People
This is a test No. (some unknown length number), Hello Nice People
(Notice the comma)
So you are essentially trying to match two adjacent groups, "\d+" and ". " then replace the second with ", ".
var r = new Regex(@"(\d+)(\. )");
var input = "This is a test No. 42. Hello Nice People";
var output = r.Replace(input, "$1, ");
Use parentheses to capture two groups, then in the replacement keep the first group ($1) and write ", " in place of the second.
Edit: derp, escape that period.
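The pattern above replaces the period after any number in the string. If only numbers preceded by "No. " should be touched, as in the question, the prefix can be pulled into the captured group. This is a sketch under that assumption; CommaFixer and Fix are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommaFixer
{
    // Group 1 keeps "No. <digits>"; the period and space that follow it
    // are replaced with a comma and a space.
    public static string Fix(string input)
    {
        return Regex.Replace(input, @"(No\. \d+)\. ", "$1, ");
    }

    static void Main()
    {
        Console.WriteLine(Fix("This is a test No. 42. Hello Nice People"));
        Console.WriteLine(Fix("I have no idea wtf No. 1234412344124. I am doing."));
    }
}
```

Strings without a "No. <digits>. " sequence come back unchanged, so no separate success check is needed.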
Edit - #1:
neilh's way is much better!
OK, I know the code looks ugly. I don't know how to edit the last char of a match directly in a regex.
string[] stringhe = new string[5] {
"This is a test No. 42, Hello Nice People",
"I have no idea wtf No. 1234412344124. I am doing.",
"Very long No. 74385748957348957893458934; Hello World",
"Nope No. 48394839!!!",
"Nope"
};
Regex reg = new Regex(@"No.\s*([0-9]+)");
Match match;
int idx = 0;
StringBuilder builder;
foreach(string stringa in stringhe)
{
match = reg.Match(stringa);
if (match.Success)
{
Console.WriteLine("No. Stringa #" + idx + ": " + stringhe[idx]);
int indexEnd = match.Groups[1].Index + match.Groups[1].Length;
builder = new StringBuilder(stringa);
builder[indexEnd] = '.';
stringhe[idx] = builder.ToString();
Console.WriteLine("New String: " + stringhe[idx]);
}
++idx;
}
Console.ReadKey(true);
If you want to edit the char after the number only if it's a ',':
int indexEnd = match.Groups[1].Index + match.Groups[1].Length;
if (stringa[indexEnd] == ',')
{
builder = new StringBuilder(stringa);
builder[indexEnd] = '.';
stringhe[idx] = builder.ToString();
Console.WriteLine("New String: " + stringhe[idx]);
}
Or, better anyway, we can edit the Regex so that it only matches when the number is followed by a comma:
No.\s*([0-9]+),
I'm not the best at Regex, but this should do what you want.
No.\s+([0-9]+)
If you expect zero or more whitespace characters between No. and {NUMBER}, this Regex should do the work:
No.\s*([0-9]+)
An example of how the C# code can look:
string[] stringhe = new string[4] {
"This is a test No. 42, Hello Nice People",
"I have no idea wtf No. 1234412344124. I am doing.",
"Very long No. 74385748957348957893458934; Hello World",
"Nope No. 48394839!!!"
};
Regex reg = new Regex(@"No.\s+([0-9]+)");
Match match;
int idx = 0;
foreach(string stringa in stringhe)
{
match = reg.Match(stringa);
if (match.Success)
{
Console.WriteLine("No. Stringa #" + idx + ": " + match.Groups[1].Value);
}
++idx;
}
Here is the code :
private string Format(string input)
{
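// Escape the periods so they match literally, and require at least one digit.
Match m = Regex.Match(input, @"No\. [0-9]+\.");
if (!m.Success) return input; // nothing to change
int targetIndex = m.Index + m.Length - 1; // index of the trailing period
return input.Remove(targetIndex, 1).Insert(targetIndex, ",");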
}
Doing a search for different strings using wildcards, such as doing a search for test0? (there is a space after the ?). The strings the search produces are:
test01
test02
test03
(and so on)
The replacement text should be for example:
test0? -
The wildcard above in test0? - represents the 1, 2, or 3...
So, the replacement strings should be:
test01 -
test02 -
test03 -
string pattern = WildcardToRegex(originalText);
fileName = Regex.Replace(originalText, pattern, replacementText);
public string WildcardToRegex(string pattern)
{
return "^" + System.Text.RegularExpressions.Regex.Escape(pattern).
Replace("\\*", ".*").Replace("\\?", ".") + "$";
}
The problem is saving the new string with the original character(s) plus the added characters. I could search the string and save the original with some string manipulation, but that seems like too much overhead. There has to be an easier way.
Thanks for any input.
EDIT:
Search for strings using the wildcard ?
Possible string are:
test01 someText
test02 someotherText
test03 moreText
Using Regex, the search string pattern will be:
test0? -
So, each string should then read:
test01 - someText
test02 - someotherText
test03 - moreText
How to keep the character that was replaced by the regex wildcard '?'
As my code stands, it will come out as test? - someText
That is wrong.
Thanks.
EDIT Num 2
First, thanks everyone for their answers and direction.
It did help and lead me to the right track and now I can better ask the exact question:
It has to do with substitution.
Inserting text after the Regex.
The sample strings I gave may not always be in that format. I have been looking into substitution but just can't seem to get the syntax right. And I am using VS 2008.
Any more suggestions?
Thanks
If you want to replace "test0? " with "test0? -", you would write:
string bar = Regex.Replace(foo, "^test0. ", "$0- ");
The key here is the $0 substitution, which will include the matched text.
So if I understand your question correctly, you just want your replacementText to be "$0- ".
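Tying this back to the WildcardToRegex conversion in the question, a sketch might look like the following. WildcardReplacer and AppendAfterWildcard are hypothetical names. It assumes the wildcard should match only at the start of the line, so the trailing "$" anchor from the question is dropped, and that the suffix does not begin with a digit (which would run into the "$0").

```csharp
using System;
using System.Text.RegularExpressions;

static class WildcardReplacer
{
    // Hypothetical helper: appends suffix right after the text matched
    // by a file-style wildcard at the start of the line.
    public static string AppendAfterWildcard(string line, string wildcard, string suffix)
    {
        // Same conversion as WildcardToRegex in the question, minus the "$" anchor.
        string pattern = "^" + Regex.Escape(wildcard)
            .Replace("\\*", ".*")
            .Replace("\\?", ".");
        // "$0" re-inserts the whole match; any literal '$' in the suffix is doubled.
        return Regex.Replace(line, pattern, "$0" + suffix.Replace("$", "$$"));
    }

    static void Main()
    {
        Console.WriteLine(AppendAfterWildcard("test01 someText", "test0? ", "- "));
        Console.WriteLine(AppendAfterWildcard("test02 someotherText", "test0? ", "- "));
    }
}
```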
If I understand the question correctly, couldn't you just use a match?
//Convert pattern to regex (I'm assuming this can be done with your "originalText")
Regex regex = new Regex(pattern);
//Replace each match with the original matched value + "- "
//("input" stands for the string being searched)
string result = regex.Replace(input, m => m.Groups[0].Value + "- ");
So I'm not 100% clear on what you're doing, but I'll give it a try.
I'm going with the assumption that you want to use "file wildcards" (?/*) to search for a set of values that match (while retaining the values matched by the placeholders), then replace each with the new value (re-inserting those placeholder values). Given that, and with probably a lot of overkill (since your requirement is kind of weird), here's what I came up with:
// Helper function to turn the file search pattern in to a
// regex pattern.
private Regex BuildRegexFromPattern(String input)
{
String pattern = String.Concat(input.ToCharArray().Select(i => {
String c = i.ToString();
return c == "?" ? "(.)"
: c == "*" ? "(.*)"
: c == " " ? "\\s"
: Regex.Escape(c);
}));
return new Regex(pattern);
}
// perform the actual replacement
private IEnumerable<String> ReplaceUsingPattern(IEnumerable<String> items, String searchPattern, String replacementPattern)
{
Regex searchRe = BuildRegexFromPattern(searchPattern);
return items.Where(s => searchRe.IsMatch(s)).Select (s => {
Match match = searchRe.Match(s);
Int32 m = 1;
return String.Concat(replacementPattern.ToCharArray().Select(i => {
String c = i.ToString();
if (m > match.Groups.Count)
{
throw new InvalidOperationException("Replacement placeholders exceeds locator placeholders.");
}
return c == "?" ? match.Groups[m++].Value
: c == "*" ? match.Groups[m++].Value
: c;
}));
});
}
Then, in practice:
String[] samples = new String[]{
"foo01", "foo02 ", "foo 03",
"bar0?", "bar0? ", "bar03 -",
"test01 ", "test02 ", "test03 "
};
String searchTemplate = "test0? ";
String replaceTemplate = "test0? -";
var results = ReplaceUsingPattern(samples, searchTemplate, replaceTemplate);
Which, from the samples list above, gives me:
matched: & modified to:
test01 test01 -
test02 test02 -
test03 test03 -
However, if you really want to save yourself headaches, you should use replacement references; there's no need to reinvent the wheel. With replacements, the above could be reduced to:
Regex searchRe = new Regex(@"test0(.*)\s");
var results = samples.Select(s => searchRe.Replace(s, "test0$1 -"));
You can capture any part of the matched string and place it anywhere in the replacement, using the $ symbol followed by the index of the captured group (numbering starts at 1).
You capture a part by wrapping it in parentheses "()".
Example:
If I have several strings with testXYZ, being XYZ a 3-digit number, and I need to replace it, say, with testZYX, inverting the 3 digits, I would do:
string result = Regex.Replace(source, "test([0-9])([0-9])([0-9])", "test$3$2$1");
So, in your case, it can be done:
string result = Regex.Replace(source, "test0([0-9]) ", "test0$1 - ");
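Both replacements can be tried out with a short sketch like this one. GroupDemo and its method names are hypothetical, introduced only to make the two calls above runnable.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Reverses the three digits after "test" using numbered groups.
    public static string ReverseDigits(string source)
    {
        return Regex.Replace(source, "test([0-9])([0-9])([0-9])", "test$3$2$1");
    }

    // Keeps the captured digit and inserts "- " after the space.
    public static string AddDash(string source)
    {
        return Regex.Replace(source, "test0([0-9]) ", "test0$1 - ");
    }

    static void Main()
    {
        Console.WriteLine(ReverseDigits("test123"));
        Console.WriteLine(AddDash("test01 someText"));
    }
}
```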