C# Regex, extract strings by reference string

Edit: Solution by @Heinzi
https://stackoverflow.com/a/1731641/87698
I have two strings, for example someText-test-stringSomeMoreText? and a pattern string such as {0}test-string{1}?.
I'm trying to extract the substrings from the first string that match the position of the placeholders in the second string.
The resulting substrings should be: someText- and SomeMoreText.
I tried to extract them with Regex.Split("someText-test-stringSomeMoreText?", "[.]*test-string[.]*\?"). However, this doesn't work.
I hope somebody has another idea...

One option you have is to use named groups:
(?<prefix>.*)test-string(?<suffix>.*)\?
This will return 2 groups containing the wanted prefix and the suffix.
var match = Regex.Match("someText-test-stringSomeMoreText?",
    @"(?<prefix>.*)test-string(?<suffix>.*)\?");
Console.WriteLine(match.Groups["prefix"]);
Console.WriteLine(match.Groups["suffix"]);

I found a solution that is at least somewhat dynamic.
First I split up the pattern string {0}test-string{1}? with
string[] patternElements = Regex.Split(inputPattern, @"(\{[a-zA-Z0-9]*\})");
Then I split up the input string someText-test-stringSomeMoreText? with
string[] inputElements = inputString.Split(patternElements, StringSplitOptions.RemoveEmptyEntries);
Now inputElements holds the text pieces corresponding to the placeholders {0}, {1}, ...
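
The two ideas above can be combined. What follows is only a sketch, assuming placeholders always look like {0}, {1}, ... and that the literal text between them has to match exactly; the class and method names are made up for illustration. It turns the pattern string into a regular expression, escaping the literal pieces with Regex.Escape and replacing each placeholder with a lazy capture group.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class TemplateExtractor
{
    public static string[] Extract(string input, string template)
    {
        // The capturing group makes Regex.Split keep the placeholders in the result.
        string[] parts = Regex.Split(template, @"(\{[0-9]+\})");

        var pattern = new StringBuilder("^");
        foreach (string part in parts)
        {
            if (Regex.IsMatch(part, @"^\{[0-9]+\}$"))
                pattern.Append("(.*?)");              // placeholder: capture lazily
            else
                pattern.Append(Regex.Escape(part));   // literal text: match as-is
        }
        pattern.Append("$");

        Match match = Regex.Match(input, pattern.ToString(), RegexOptions.Singleline);
        if (!match.Success)
            return new string[0];

        // Group 0 is the whole match, so the captures start at index 1.
        var result = new List<string>();
        for (int i = 1; i < match.Groups.Count; i++)
            result.Add(match.Groups[i].Value);
        return result.ToArray();
    }
}
```

For the example strings, TemplateExtractor.Extract("someText-test-stringSomeMoreText?", "{0}test-string{1}?") should yield someText- and SomeMoreText. The captures come back in the order the placeholders appear in the pattern string, not in the order of their numbers.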

Related

Replacing Characters in String C#

I need to replace a series of characters in a file name in C#. After doing many searches, I can't find a good example of replacing all characters between two specific ones. For example, the file name would be:
"TestExample_serialNumber_Version_1.0_.pdf"
All I want is the final product to be "serialNumber".
Is there a special character I can use to replace all characters up to and including the first underscore? Then I could run the replace method again to replace everything after and including the next underscore. I've heard of using regex, but I've done something similar in Java and it seemed much easier to accomplish. I must not be understanding the string formats in C#.
I would imagine it would look something like:
name.Replace("T?_", ""); // where ? stands for any characters in between
name.Replace("_?", "");

Rather than "replace", just use a regex to extract the part you want. Something like:
(?:TestExample_)(.*)(?:_Version)
Would give you the serial number part in a capture group.
Or if TestExample is variable (in which case, you need your question to be more specific about exactly what pattern you are matching) you could probably just do:
(?:_)(.*)(?:_Version)
Assuming the Version part is constant.
In C#, you could do something like:
var regex1 = new Regex("(?:TestExample_)(.*)(?:_Version)");
string testString = "TestExample_serialNumber_Version_1.0_.pdf";
string serialNum = regex1.Match(testString).Groups[1].Value;

As an alternative to regex, you could find the first instance of an underscore then find the next instance of an underscore and take the substring between those indices.
string myStr = "TestExample_serialNumber_Version_1.0_.pdf";
string splitStr = "_";
int startIndex = myStr.IndexOf(splitStr) + 1;
string serialNum = myStr.Substring(startIndex, myStr.IndexOf(splitStr, startIndex) - startIndex);

Regex.Split command in c#

I am trying to use Regex.Split to split a string while keeping all of its contents, including the delimiters I use. The string is a math problem, for example 5+9/2*1-1. I have it working if the string contains a + sign, but I don't know how to add more than one character to the delimiter list. I have looked at multiple pages online, but everything I try gives me errors. Here is the Regex.Split line I have (it works for the plus; now I need it to also handle -, * and /):
string[] everything = Regex.Split(inputBox.Text, @"(\+)");

Use a character class to match any of the math operations: [*/+-]
string input = "5+9/2*1-1";
string pattern = @"([*/+-])";
string[] result = Regex.Split(input, pattern);
Be aware that character classes allow ranges, such as [0-9], which matches any digit from 0 up to 9. Therefore, to avoid accidental ranges, you can escape the - or place it at either the beginning or end of the character class.
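
As a quick check of the point about the hyphen, here is a small sketch (the class name MathTokenizer is made up) that escapes the - inside the character class instead of relying on its position. Because the class is wrapped in a capturing group, Regex.Split keeps the operators in the result.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class MathTokenizer
{
    public static string[] Tokenize(string expression)
    {
        // \- is a literal hyphen, so no accidental range such as +-* can form.
        return Regex.Split(expression, @"([+\-*/])");
    }
}
```

For 5+9/2*1-1 this gives the nine elements 5, +, 9, /, 2, *, 1, -, 1. An input that starts with an operator, such as -5+3, produces an empty first element.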

C# Trouble with Regex.Replace

Been scratching my head all day about this one!
Ok, so I have a string which contains the following:
?\"width=\"1\"height=\"1\"border=\"0\"style=\"display:none;\">');
I want to convert that string to the following:
?\"width=1height=1border=0style=\"display:none;\">');
I could theoretically just do a String.Replace on "\"1\"" etc. But this isn't really a viable option as the string could theoretically have any number within the expression.
I also thought about removing the string "\"", however there are other occurrences of this which I don't want to be replaced.
I have been attempting to use the Regex.Replace method as I believe this exists to solve problems along my lines. Here's what I've got:
chunkContents = Regex.Replace(chunkContents, "\".\"", ".");
Now that really messes things up (it replaces the correct elements, but with a full stop), but I think you can see what I am attempting to do with it. I am also worried that this will only work for single digits (\"1\" rather than \"11\"). That led me to think about using "*" or "+" rather than ".", but I foresaw the problem of this picking up all of the text in between the desired characters (which are dotted all over the place), whereas I obviously only want to replace the ones with numeric characters between them.
Hope I've explained that clearly enough, will be happy to provide any extra info if needed :)

Try this:
var str = "?\"width=\"1\"height=\"1234\"border=\"0\"style=\"display:none;\">');";
str = Regex.Replace(str, "\"(\\d+)\"", "$1");
(\\d+) is a capturing group that looks for one or more digits and $1 references what the group captured.

This works:
String input = @"?\""width=\""1\""height=\""1\""border=\""0\""style=\""display:none;\"">');";
// replace the entire match of the regex with only what's captured (the number)
String result = Regex.Replace(input, @"\\""(\d+)\\""", match => match.Result("$1"));
// control string for the expected result
String shouldBe = @"?\""width=1height=1border=0style=\""display:none;\"">');";
//prints true
Console.WriteLine(result.Equals(shouldBe).ToString());

Splitting string into 3 parts

I'm trying to learn to split strings. I have a string, for example, Adams, John - 22.6.2001. What is the easiest way to split each of the following pieces of information into particular strings? I need the name, surname, and date.
This is the solution that I tried myself:
string st = "Adams, John - 22.6.2001";
st = st.Trim(); // To remove all possible white space?
But I don't know how to cut each of the details into its own string.

"What is the easiest way to trim each of these pieces of information into particular strings: name, surname, date?"
Looks like you want to split the string on , and space.
string[] splitArray = st.Split(new string[] { ",", " " }, StringSplitOptions.RemoveEmptyEntries);
EDIT:
As far as parsing names is concerned, you have to define rules for how names appear in your string. For example, your string could contain multiple names (first, last, middle) separated by commas, in which case the above statement will not give you the result you need. You have to define rules that make your input string consistent; based on those, you can use string.Split to get the values.

You can do it using String.Split() to break the parts of the string into a string array. Trim() is used to remove white-space from the start and end of a string, so this can be used to tidy up the resulting strings.
string st = "Adams, John - 22.6.2001";
// first split on dash, to separate name and date
string[] partsArray = st.Split('-');
// now split first part to get first and surname (trim surrounding whitespace)
string[] nameArray = partsArray[0].Split(',');
string firstName = nameArray[1].Trim();
string lastName = nameArray[0].Trim();
// get date from other part (again trim whitespace)
string dateAsString = partsArray[1].Trim();
Parsing text is a complex topic, but I think the question was just looking for an introduction. There are many edge cases and issues which you'd need to add to a parser to get close to 100% results for different name and date formats. If you were importing data like this in bulk, you would use a CSV file or similar to break up the parts before importing.
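
If the input reliably follows the Surname, Name - d.M.yyyy layout, the same split can also be sketched with a regular expression and named groups, with the date parsed into a DateTime. The class name, the group names and the d.M.yyyy format are assumptions based on the single example above; other layouts would need other rules.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper; assumes lines like "Adams, John - 22.6.2001".
public static class PersonLineParser
{
    private static readonly Regex LinePattern = new Regex(
        @"^\s*(?<last>[^,]+),\s*(?<first>.+?)\s*-\s*(?<date>\d{1,2}\.\d{1,2}\.\d{4})\s*$");

    public static bool TryParse(string line, out string firstName,
                                out string lastName, out DateTime date)
    {
        firstName = null;
        lastName = null;
        date = default(DateTime);

        Match match = LinePattern.Match(line);
        if (!match.Success)
            return false;

        firstName = match.Groups["first"].Value;
        lastName = match.Groups["last"].Value.Trim();

        // Parse day.month.year independently of the machine's culture settings.
        return DateTime.TryParseExact(match.Groups["date"].Value, "d.M.yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out date);
    }
}
```

Returning false instead of throwing keeps the caller in charge of what to do with lines that do not fit the assumed layout.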

Use the String.Split method with , as the separator.

Regex Word splitting in C#

I know similar questions have been asked before, but I can't find one that is like mine, or enough like mine to help me out :). So essentially I want to split up a string which contains a bunch of words, and I don't want to return any characters that are not words (this is the key problem I am struggling with, ignoring characters). This is how I define the problem:
What constitutes a word is a string of any character a-zA-Z only
(no numbers or anything else)
In between any word, there can be any number of random other characters
I want to get back a string[] containing only the words
eg: text: "apple^&**^orange1247pear"
I want to return: apple, orange, pear in an array.
The closest I have found I suppose is this:
Regex.Split("apple^orange7pear", @"([a-zA-Z]*)")
Which splits out the apple/orange/pear, but also returns a bunch of other junk and blank strings.
Anyone know how to stop the split function from returning certain parts of the string, or is that not possible?
Thanks in advance for any help you give me :)

Split should match the tokens between your words. In your regex you've added a group around the word, so it is included in the result, but that isn't desired in this case. Note that this regex matches anything besides valid words - anything that isn't an ASCII letter:
string[] words = Regex.Split(str, "[^a-zA-Z]+");
Another option is to match the words directly:
MatchCollection matches = Regex.Matches(str, "[a-zA-Z]+");
string[] words2 = matches.Cast<Match>().Select(m => m.Value).ToArray();
The second option is probably clearer, and will not include blank elements at the start or end of the array.
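
To make the difference concrete, here is a small sketch (class and method names are made up) that runs both options side by side. When the text starts or ends with non-letters, Split returns empty strings at the edges, while Matches returns only the words.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for comparing the two approaches.
public static class WordSplitter
{
    // Splits on runs of non-letters; may yield empty strings at the edges.
    public static string[] BySplit(string text)
    {
        return Regex.Split(text, "[^a-zA-Z]+");
    }

    // Collects runs of ASCII letters directly.
    public static string[] ByMatches(string text)
    {
        return Regex.Matches(text, "[a-zA-Z]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

For "1apple^&**^orange1247pear!", BySplit gives an empty string, apple, orange, pear and another empty string, while ByMatches gives just apple, orange and pear.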

var splits = Regex.Split("aaa $$$bbb ccc", @"[^A-Za-z]+");
But to include non-Latin letters, I would use this:
var splits = Regex.Split("aaa $$$bbb ccc", @"\P{L}+");

Try this:
Regex.Matches("kalle kula(/()&//()nisse8978971", @"[A-Za-z]+")
Using Matches() will collect only the words. Split() will divide the string, which is not what you want.

The second option Kobi listed is better and easier to control. I use the following regular expression to locate common entities such as words, numbers and email addresses in a string.
var regex = new Regex(@"[\p{L}\p{N}\p{M}]+(?:[-.'´_@][\p{L}\p{N}\p{M}]+)*", RegexOptions.Compiled);
