C# Regex: Capture everything up to - c#

I want to capture everything up to (not including) a # sign in a string. The # character may or may not be present (if it's not present, the whole string should be captured).
What would the RegEx and C# code for this be? I've tried ([^#]+)(?:#), but it doesn't seem to work.

Not a regex, but an alternative to try. A regex could be used, but for this particular situation I prefer this method:
string mystring = "DFASDFASFASFASFAF#322323";
int length = (mystring.IndexOf('#') == -1) ? mystring.Length : mystring.IndexOf('#');
string new_mystring = mystring.Substring(0, length);

Try:
.*(?=#)
I think that should work
EDIT:
^[^#]*
In code:
string match = Regex.Match(input,"^[^#]*").Value;

What's wrong with something as simple as:
[^#]*
Just take the first match?
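To compare the suggestions above side by side, here is a minimal sketch (the class and method names are made up for illustration). Note that both ([^#]+)(?:#) and .*(?=#) require a # to be present, so they find nothing in a string without one, while ^[^#]* succeeds either way.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashPrefix
{
    // Regex version: ^[^#]* matches zero or more non-# characters from the
    // start of the string, so it succeeds whether or not a # is present.
    public static string WithRegex(string input) =>
        Regex.Match(input, "^[^#]*").Value;

    // Non-regex version: look up the index once and reuse it.
    public static string WithIndexOf(string input)
    {
        int hash = input.IndexOf('#');
        return hash < 0 ? input : input.Substring(0, hash);
    }

    public static void Main()
    {
        Console.WriteLine(WithRegex("DFASDFASFASFASFAF#322323"));   // DFASDFASFASFASFAF
        Console.WriteLine(WithRegex("no hash here"));               // no hash here
        Console.WriteLine(WithIndexOf("DFASDFASFASFASFAF#322323")); // DFASDFASFASFASFAF
    }
}
```

Both versions return an empty string when the input starts with #.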

Related

Replacing Characters in String C#

I need to replace a series of characters in a file name in C#. After doing many searches, I can't find a good example of replacing all characters between two specific ones. For example, the file name would be:
"TestExample_serialNumber_Version_1.0_.pdf"
All I want is the final product to be "serialNumber".
Is there a special character I can use to replace all characters up to and including the first underscore? Then I can run the replace method again to replace everything after and including the next underscore? I've heard of using regex, but I've done something similar to this in Java and it seemed much easier to accomplish. I must not be understanding the string formats in C#.
I would imagine it would look something like:
name.Replace("T?_", "");//where ? equals any characters between
name.Replace("_?", "");
Rather than "replace", just use a regex to extract the part you want. Something like:
(?:TestExample_)(.*)(?:_Version)
Would give you the serialnumber part in a capture group.
Or if TestExample is variable (in which case, you need your question to be more specific about exactly what pattern you are matching) you could probably just do:
(?:_)(.*)(?:_Version)
Assuming the Version part is constant.
In C#, you could do something like:
var regex1 = new Regex("(?:TestExample_)(.*)(?:_Version)");
string testString = "TestExample_serialNumber_Version_1.0_.pdf";
string serialNum = regex1.Match(testString).Groups[1].Value;
As an alternative to regex, you could find the first instance of an underscore then find the next instance of an underscore and take the substring between those indices.
string myStr = "TestExample_serialNumber_Version_1.0_.pdf";
string splitStr = "_";
int startIndex = myStr.IndexOf(splitStr) + 1;
string serialNum = myStr.Substring(startIndex, myStr.IndexOf(splitStr, startIndex) - startIndex);

Why isn't this C# Regex working?

I have the following string (from a large HTML string):
href="/cgi-bin/pin.cgi?pin=94841&sid=9548.1386389012.v1"><
And here is my code:
var sids = Regex.Matches( htmlCode, "sid=(.)\">" );
I'm not pulling back any results. Is my Regex correct?
This is what it should be:
var str = @"href=""/cgi-bin/pin.cgi?pin=94841&sid=9548.1386389012.v1"">";
var sid = Regex.Match(str, @"sid=([^""]*)");
Console.WriteLine (sid.Groups[1].Value);
What you originally posted was wrong because "." acts as a wildcard, and without a quantifier it captures only one character. The problem with wildcards is that they are difficult to stop before the end of the line, so avoid them unless you have to.
. matches only a single character. To match multiple characters, use the * or + quantifier: (.+), or preferably the non-greedy version: (.+?)
Use an @"verbatim string literal" for regular expressions where possible.
var sids = Regex.Matches(htmlCode, @"sid=(.+?)""");
I think you are pretty close. Consider the following minor change to your regex...
sid=.*?\">
Good Luck!
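To make the difference between the suggested patterns concrete, here is a rough sketch that runs each one against a slightly longer sample. The extra `<a href="x">` tag in the sample and the SidExtractor helper are assumptions added for illustration; they are not from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SidExtractor
{
    // Returns capture group 1 of the first match, or "" when nothing matches.
    public static string Capture(string html, string pattern)
    {
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // Sample input with a second quoted attribute after the sid (assumed).
        string html = @"href=""/cgi-bin/pin.cgi?pin=94841&sid=9548.1386389012.v1""><a href=""x"">";

        // Original attempt: (.) is exactly one character, so nothing matches.
        Console.WriteLine(Capture(html, "sid=(.)\">"));       // prints an empty line

        // Greedy: .+ runs to the end and backtracks only to the LAST quote.
        Console.WriteLine(Capture(html, @"sid=(.+)"""));      // 9548.1386389012.v1"><a href="x

        // Lazy: .+? stops at the first quote.
        Console.WriteLine(Capture(html, @"sid=(.+?)"""));     // 9548.1386389012.v1

        // Negated class: cannot run past a quote in the first place.
        Console.WriteLine(Capture(html, @"sid=([^""]*)"));    // 9548.1386389012.v1
    }
}
```

The lazy and negated-class versions give the same result here; the negated class simply states the stopping condition directly instead of relying on backtracking.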

Replacing text with RegEx and C# isn't working the way I need it to

I’m looking for a way to go through a string and replace all instances where the second and third characters will always be different but the rest will be the same. For example, if I had:
"ú07ú" to be replaced with "ú07 ú"
"ú1Eú" to be replaced with "ú1E ú"
"ú12ú" to be replaced with "ú12 ú"
I know I should use Regular Expressions, but they baffle me. I’m pretty sure the syntax will be something like:
Content = Regex.Replace(Content, @"ú...", "ú.. ú");
But obviously this isn’t working. Can any RegEx gurus lend a hand please?
Thanks
Looks like you want:
Content = Regex.Replace(Content, @"ú([^ú]+)ú", @"ú$1 ú");
This regex:
ú([^ú]+)ú
Means: match ú, then at least one character that isn't ú (and capture this part), then another ú. If you want it to only match exactly two characters in the middle, then change [^ú]+ to [^ú]{2}
Then we replace the whole thing by:
ú$1 ú
Which is: ú, then the captured part of the string, then a space and ú again.
I'm totally unfamiliar with C#, but from a regex perspective you need capturing groups.
"ú..." needs to be "(ú...)(.)" and "ú.. ú" needs to be "$1 $2" assuming C# uses the standard regex notation for capturing groups.
[TestMethod]
public void regex_test()
{
    string expr = @"(?<firstThree>.{3})(?<lastOne>.{1})";
    string replace = "${firstThree} ${lastOne}";
    string first = "u84u";
    string firstResult = "u84 u";
    Assert.AreEqual<string>(firstResult, Regex.Replace(first, expr, replace));
}
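For reference, a small sketch of the capturing-group approach using the exactly-two-characters variant ([^ú]{2}) mentioned above. The MarkerSpacing class and InsertSpace method are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerSpacing
{
    // ú, then exactly two non-ú characters (captured as group 1), then ú.
    private static readonly Regex Pattern = new Regex(@"ú([^ú]{2})ú");

    // Rebuilds each match as: ú + captured pair + space + ú.
    public static string InsertSpace(string content) =>
        Pattern.Replace(content, "ú$1 ú");

    public static void Main()
    {
        Console.WriteLine(InsertSpace("ú07ú"));           // ú07 ú
        Console.WriteLine(InsertSpace("ú1Eú and ú12ú"));  // ú1E ú and ú12 ú
        Console.WriteLine(InsertSpace("ú123ú"));          // ú123ú (three characters, left alone)
    }
}
```

One caveat: each match consumes its closing ú, so in input such as "ú07ú1Eú", where two fields share a delimiter, only the first field is rewritten. If that case matters, a lookahead for the closing ú is probably the way to go.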

How to check if a string starts and ends with specific strings?

I have a string like:
string str = "https://abce/MyTest";
I want to check if the particular string starts with https:// and ends with /MyTest.
How can I achieve that?
This regular expression:
^https://.*/MyTest$
will do what you ask.
^ matches the beginning of the string.
https:// will match exactly that.
.* will match any number of characters (the * part) of any kind (the . part). If you want to make sure there is at least one character in the middle, use .+ instead.
/MyTest matches exactly that.
$ matches the end of the string.
To verify the match, use:
Regex.IsMatch(str, @"^https://.*/MyTest$");
More info at the MSDN Regex page.
Try the following:
var str = "https://abce/MyTest";
var match = Regex.IsMatch(str, "^https://.+/MyTest$");
The ^ identifier matches the start of the string, while the $ identifier matches the end of the string. The .+ bit simply means any sequence of chars (except a null sequence).
You need to import the System.Text.RegularExpressions namespace for this, of course.
I want to check if the particular string starts with "https://" and ends with "/MyTest".
Well, you could use regex for that. But it's clearer (and probably quicker) to just say what you mean:
str.StartsWith("https://") && str.EndsWith("/MyTest")
You then don't have to worry about whether any of the characters in your match strings need escaping in regex. (For this example, they don't.)
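A quick sketch of both approaches with arbitrary prefix and suffix strings, to show where escaping comes in. The UrlCheck class and its method names are hypothetical. The length check is an addition of mine: without it, StartsWith and EndsWith would both accept "https://MyTest", where the prefix and suffix overlap on the slash, while the regex would not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCheck
{
    // Plain string methods: no escaping concerns. Ordinal comparison avoids
    // culture-sensitive matching. The length check rejects inputs where the
    // prefix and suffix would have to overlap.
    public static bool WithStringMethods(string s, string prefix, string suffix) =>
        s.Length >= prefix.Length + suffix.Length
        && s.StartsWith(prefix, StringComparison.Ordinal)
        && s.EndsWith(suffix, StringComparison.Ordinal);

    // Regex version: Regex.Escape neutralises metacharacters such as '.'
    // in the prefix and suffix before they are placed into the pattern.
    public static bool WithRegex(string s, string prefix, string suffix) =>
        Regex.IsMatch(s, "^" + Regex.Escape(prefix) + ".*" + Regex.Escape(suffix) + "$");

    public static void Main()
    {
        string str = "https://abce/MyTest";
        Console.WriteLine(WithStringMethods(str, "https://", "/MyTest")); // True
        Console.WriteLine(WithRegex(str, "https://", "/MyTest"));         // True

        // Without Regex.Escape, the '.' in "a.b" would also match the 'x' here.
        Console.WriteLine(WithRegex("axb/end", "a.b", "/end"));           // False
    }
}
```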
In .NET:
^https://.*/MyTest$
Try Expresso, good for building .NET regexes and teaching you the syntax at the same time.
Handy tool for generating regular expressions:
http://txt2re.com/

Matching an (easy??) regular expression using C#'s regex

Ok sorry this might seem like a dumb question but I cannot figure this thing out :
I am trying to parse a string and simply want to check whether it only contains the following characters : '0123456789dD+ '
I have tried many things but just can't get to figure out the right regex to use!
Regex oReg = new Regex(@"[\d dD+]+");
oReg.IsMatch("e4");
will return true even though e is not allowed...
I've tried many strings, including Regex("[1234567890 dD+]+")...
It always works on Regex Pal but not in C#...
Please advise, and again I apologize if this seems like a very silly question.
Try this:
@"^[0-9dD+ ]+$"
The ^ and $ at the beginning and end signify the beginning and end of the input string respectively. Thus between the beginning and the end only the stated characters are allowed. In your example, the regex matches if the string contains one of the characters even if it contains other characters as well.
@comments: Thanks, I fixed the missing + and space.
Oops, you forgot the boundaries, try:
Regex oReg = new Regex(@"^[0-9dD +]+$");
oReg.IsMatch("e4");
^ matches the beginning of the text stream, $ matches the end.
It is matching the 4; you need ^ and $ to terminate the regex if you want a full match for the entire string - i.e.
Regex re = new Regex(@"^[\d dD+]+$");
Console.WriteLine(re.IsMatch("e4"));
Console.WriteLine(re.IsMatch("4"));
This is because regular expressions can also match parts of the input; in this case it matches just the "4" of "e4". If you want to match the whole input, you have to surround the regex with "^" (matches the start) and "$" (matches the end).
So to make your example work, you have to write it as follows:
Regex oReg = new Regex(@"^[\d dD+]+$");
oReg.IsMatch("e4");
I believe it's returning True because it's finding the 4. Nothing in the regex excludes the letter e from the results.
Another option is to invert everything, so it matches on characters you don't want to allow:
Regex oReg = new Regex(@"[^0-9dD+ ]");
bool isValid = !oReg.IsMatch("e4");
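Putting the answers together, here is a minimal sketch of the three behaviours: the unanchored pattern from the question, the anchored fix, and the inverted check. The AllowedChars class and its method names are invented for this example. The inverted version also rejects the empty string explicitly, which is an assumption on my part; otherwise it would accept "" while the anchored pattern with + would not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AllowedChars
{
    // Anchored: the whole input must consist of allowed characters.
    private static readonly Regex Allowed = new Regex(@"^[0-9dD+ ]+$");

    // Inverted: matches any single character that is NOT allowed.
    private static readonly Regex Forbidden = new Regex(@"[^0-9dD+ ]");

    // The question's original pattern: succeeds if ANY allowed run exists.
    public static bool UnanchoredMatch(string s) => Regex.IsMatch(s, @"[\d dD+]+");

    public static bool IsValidAnchored(string s) => Allowed.IsMatch(s);

    // Empty input is rejected explicitly to mirror the + quantifier above.
    public static bool IsValidInverted(string s) =>
        s.Length > 0 && !Forbidden.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(UnanchoredMatch("e4"));       // True  (finds the "4")
        Console.WriteLine(IsValidAnchored("e4"));       // False
        Console.WriteLine(IsValidAnchored("2d6 + 3"));  // True
        Console.WriteLine(IsValidInverted("e4"));       // False
        Console.WriteLine(IsValidInverted("2D6+3"));    // True
    }
}
```

The sketch spells out 0-9 instead of \d because, by default, \d in .NET also matches digits from other Unicode scripts, which is probably not wanted here.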
