Why isn't this C# Regex working?


I have the following string (from a large HTML string):
href="/cgi-bin/pin.cgi?pin=94841&sid=9548.1386389012.v1"><
And here is my code:
var sids = Regex.Matches( htmlCode, "sid=(.)\">" );
I'm not pulling back any results. Is my Regex correct?

This is what it should be:
var str = @"href=""/cgi-bin/pin.cgi?pin=94841&sid=9548.1386389012.v1"">";
var sid = Regex.Match(str, @"sid=([^""]*)");
Console.WriteLine(sid.Groups[1].Value);
Your original pattern failed because "." matches exactly one character, so (.) captures a single character and the pattern then requires "> to follow immediately. Adding a quantifier such as .* would fix that, but a greedy wildcard matches as far along the line as it can, so a negated character class like [^"]* that stops at the closing quote is the safer choice.

. matches only a single character. To match multiple characters, add a * or + quantifier: (.+), or preferably the non-greedy version: (.+?)
Use a @"verbatim string literal" for regular expressions where possible.
var sids = Regex.Matches(htmlCode, @"sid=(.+?)""");
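As a minimal sketch of the capture-group approach described above: the class and method names (SidExtractor, ExtractSids) are made up for illustration, and stopping at '&' as well as at the closing quote is an added assumption in case sid is not the last query parameter.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SidExtractor
{
    // Returns every sid value found in the input. The negated character class
    // stops at the closing quote, or at an '&' that starts another parameter.
    public static List<string> ExtractSids(string html)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(html, @"sid=([^""&]+)"))
        {
            // Groups[0] is the whole match; Groups[1] is the first capture group.
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        string html = @"href=""/cgi-bin/pin.cgi?pin=94841&sid=9548.1386389012.v1""><";
        foreach (string sid in ExtractSids(html))
        {
            Console.WriteLine(sid);
        }
    }
}
```

Reading Groups[1].Value rather than Value returns just the sid, without the "sid=" prefix.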

I think you are pretty close. Consider the following minor change to your regex...
sid=.*?\">
Good Luck!

Related

Replacing text with RegEx and C# isn't working the way I need it to

I’m looking for a way to go through a string and replace all instances where the second and third characters will always be different but the rest will be the same. For example, if I had:
"ú07ú" to be replaced with "ú07 ú"
"ú1Eú" to be replaced with "ú1E ú"
"ú12ú" to be replaced with "ú12 ú"
I know I should use Regular Expressions, but they baffle me. I’m pretty sure the syntax will be something like:
Content = Regex.Replace(Content, @"ú...", "ú.. ú");
But obviously this isn’t working. Can any RegEx gurus lend a hand please?
Thanks

Looks like you want:
Content = Regex.Replace(Content, @"ú([^ú]+)ú", @"ú$1 ú");
This regex:
ú([^ú]+)ú
Means: match ú, then at least one character that isn't ú (and capture this part), then another ú. If you want it to only match exactly two characters in the middle, then change [^ú]+ to [^ú]{2}
Then we replace the whole thing by:
ú$1 ú
Which is: ú, then the captured part of the string, then a space and ú again.
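A small sketch of that replacement, using the exactly-two-characters variant mentioned above. The names MarkerSpacer and InsertSpace are hypothetical, chosen only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkerSpacer
{
    // Matches ú, exactly two characters that are not ú (captured as group 1),
    // then ú again, and rebuilds the text with a space before the closing ú.
    public static string InsertSpace(string content)
    {
        return Regex.Replace(content, @"ú([^ú]{2})ú", "ú$1 ú");
    }

    public static void Main()
    {
        Console.WriteLine(InsertSpace("ú07ú"));
        Console.WriteLine(InsertSpace("ú1Eú"));
        Console.WriteLine(InsertSpace("ú12ú"));
    }
}
```

Text that does not fit the pattern is returned unchanged, so the call is safe to run over content that contains no such markers.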

I'm totally unfamiliar with C#, but from a regex perspective you need capturing groups.
"ú..." needs to be "(ú..)(ú)" and "ú.. ú" needs to be "$1 $2", assuming C# uses the standard regex notation for capturing groups.

[TestMethod]
public void regex_test()
{
string expr = @"(?<firstThree>.{3})(?<lastOne>.{1})";
string replace = "${firstThree} ${lastOne}";
string first = "u84u";
string firstResult = "u84 u";
Assert.AreEqual<string>(firstResult, Regex.Replace(first, expr, replace));
}

Why is "$1" ending up in my Regex.Replace() result?

I am trying to write a regular expression to rewrite URLs to point to a proxy server.
bodystring = Regex.Replace(bodystring, "(src='/+)", "$1" + proxyStr);
The idea of this expression is pretty simple, basically find instances of "src='/" or "src='//" and insert a PROXY url at that point. This works in general but occasionally I have found cases where a literal "$1" will end up in the result string.
This makes no sense to me because if there was no match, then why would it replace anything at all?
Unfortunately I can't give a simple example of this at it only happens with very large strings so far, but I'd like to know conceptually what could make this sort of thing happen.
As an aside, I tried rewriting this expression using a positive lookbehind as follows:
bodystring = Regex.Replace(bodystring, "(?<=src='/+)", proxyStr);
But this ends up with proxyStr TWICE in the output if the input string contains "src='//". This also doesn't make much sense to me because I thought that "src=" would have to be present in the input twice in order to get proxyStr to end up twice in the output.

When proxyStr = "10.15.15.15:8008/proxy?url=http://", the replacement string becomes "$110.15.15.15:8008/proxy?url=http://". It contains a reference to group number 110, which certainly does not exist.
You need to make sure that the text following $1 in the replacement string does not start with a digit. In your case you can do that by not capturing the last slash and changing the replacement string to "$1/" + proxyStr, like this:
bodystring = Regex.Replace(bodystring, "(src='/*)/", "$1/" + proxyStr);
Edit:
Rawling pointed out that .NET's regexp library addresses this issue: you can enclose 1 in curly braces to avoid false aliasing, like this:
bodystring = Regex.Replace(bodystring, "(src='/+)", "${1}" + proxyStr);
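The difference between the two substitution forms can be sketched as follows. ProxyRewriter and its method names are invented for this example, and the proxy string is the one quoted in the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProxyRewriter
{
    // "${1}" delimits the group number, so digits at the start of proxyStr
    // cannot be read as part of it.
    public static string Rewrite(string body, string proxyStr)
    {
        return Regex.Replace(body, "(src='/+)", "${1}" + proxyStr);
    }

    // "$1" followed directly by "10..." is parsed as a reference to group 110.
    // That group does not exist, so the text is emitted literally.
    public static string RewriteAmbiguous(string body, string proxyStr)
    {
        return Regex.Replace(body, "(src='/+)", "$1" + proxyStr);
    }

    public static void Main()
    {
        string proxy = "10.15.15.15:8008/proxy?url=http://";
        string body = "<img src='/a.png'>";
        Console.WriteLine(Rewrite(body, proxy));
        Console.WriteLine(RewriteAmbiguous(body, proxy));
    }
}
```

This also suggests why the problem appeared only occasionally: it depends on what proxyStr happens to start with, not on the size of the input.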

What you are doing can't work as written, because of how .NET parses the concatenated replacement string. Your problem is that your proxy string starts with a number: proxyStr = "10.15.15.15:8008/proxy?url=http://"
When you combine this with your $1, the regex engine looks for the group reference $110, which doesn't exist.
You can remedy this by matching something else, or by matching and constructing the replacement string manually. Use what suits you best.

Based on dasblinkenlight's answer (already +1), the solution is this:
bodystring = Regex.Replace(bodystring, "(src='/+)", "${1}" + proxyStr);
This ensures that group 1 is referenced, rather than the digits that follow being read as part of a larger group number.

In the second version, I guess proxyStr appears twice because you're inserting it once more. Try
string s2 = Regex.Replace(s, "((?<=src='/+))", proxyStr);
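Another possible explanation for the doubled insertion, offered as a sketch and not as a confirmed diagnosis of the asker's data: a lookbehind matches a zero-width position, and with /+ inside it the position after each slash qualifies on its own. The names LookbehindDemo, CountMatches and InsertOnce are hypothetical, and the added (?!/) lookahead is one assumed way to restrict the match to the position after the last slash.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Counts the zero-width positions that are preceded by src=' and one or more slashes.
    public static int CountMatches(string input)
    {
        return Regex.Matches(input, "(?<=src='/+)").Count;
    }

    // The negative lookahead rejects positions that are followed by another slash,
    // leaving only the position after the final slash of the run.
    public static string InsertOnce(string input, string proxyStr)
    {
        return Regex.Replace(input, "(?<=src='/+)(?!/)", proxyStr);
    }

    public static void Main()
    {
        Console.WriteLine(CountMatches("src='//x"));
        Console.WriteLine(InsertOnce("src='//x", "PROXY"));
    }
}
```

Because proxyStr is passed as the replacement pattern, any $ it contains would still be interpreted as a substitution.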

How to get numbers from http://www.example.com/images/business/113.jpg

Using regex I need to get the numbers between the last "/" and ".jpg" (this actually might be .png, .gif, etc) in this:
http://www.example.com/images/business/113.jpg
Any ideas?
Thank you

Easy enough using split:
var fileName = myUrl.Split('/')[myUrl.Split('/').Length - 1];
var justTheFileName = fileName.Split('.')[0];

Regular expressions are absolutely unnecessary here.
Just do:
using System.IO;
var fileName = Path.GetFileNameWithoutExtension("http://www.example.com/images/business/113.jpg");
Take a look at the documentation of the method GetFileNameWithoutExtension:
Returns the file name of the specified path string without the extension.
Edit:
If you still want to use regex for this purpose, the following one will work:
// Either regex will work here
var pattern = @"/([^/]*)\.jpg";
var pattern2 = @".*/(.*)\.jpg";
// Regex.Matches takes the input string first, then the pattern
var matches = Regex.Matches("http://www.example.com/images/business/113.jpg", pattern);
if (matches.Count > 0)
    Console.WriteLine(matches[0].Groups[1].Value);
Note:
I didn't compile the regex. This was a small & fast example.

I see that the solution you found matches each single digit in your URL separately (three matches) rather than the entire number. You may want to go with something more "readable" (heh) like this:
(?<=\/)\d+(?=\.\w+$)
If you're trying to capture the number and use it, throw it into a group:
(?<=\/)(\d+)(?=\.\w+$)

Got it!! (?=[\s\S]*?\\.)(?![\s\S]+?/)[0-9]
PS: The regular expression workbench by microsoft KICKS ASS

You could use the following regular expression:
/(?<number>\d+)\.jpg$
It will capture the number into the named group 'number'. The regular expression works as follows:
Search for /
Capture 1 or more times a digit (0-9) to the named group 'number'
Check for .jpg
$ matches the end of the string.
Matching the end makes stuff a lot easier. I don't believe look-ahead or look-behind is necessary.
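A sketch of the named-group approach, generalised with \w+ so that it also accepts .png, .gif and other extensions, as the question asks. The names ImageNumber and Extract are made up for this example, and returning an empty string when nothing matches is an arbitrary choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ImageNumber
{
    // Returns the digits between the last '/' and the file extension,
    // or an empty string when the URL does not end in <digits>.<extension>.
    public static string Extract(string url)
    {
        Match m = Regex.Match(url, @"/(?<number>\d+)\.\w+$");
        return m.Success ? m.Groups["number"].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Extract("http://www.example.com/images/business/113.jpg"));
    }
}
```

A URL with a query string after the extension would not match, because $ requires the extension to end the string.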

C# Regex: Capture everything up to

I want to capture everything up to (not including) a # sign in a string. The # character may or may not be present (if it's not present, the whole string should be captured).
What would the RegEx and C# code for this be? I've tried: ([^#]+)(?:#) but it doesn't seem to work.

Not a regex but an alternative to try. A regex can be used though, but for this particular situation I prefer this method.
string mystring = "DFASDFASFASFASFAF#322323";
int length = (mystring.IndexOf('#') == -1) ? mystring.Length : mystring.IndexOf('#');
string new_mystring = mystring.Substring(0, length);

Try:
.*(?=#)
I think that should work
EDIT:
^[^#]*
In code:
string match = Regex.Match(input,"^[^#]*").Value;
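Both approaches from the answers above can be sketched side by side. HashPrefix and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashPrefix
{
    // Regex version: anchored at the start, matches zero or more non-# characters.
    public static string BeforeHashRegex(string input)
    {
        return Regex.Match(input, "^[^#]*").Value;
    }

    // Non-regex version: look up the index once and take the prefix.
    public static string BeforeHashIndexOf(string input)
    {
        int index = input.IndexOf('#');
        return index < 0 ? input : input.Substring(0, index);
    }

    public static void Main()
    {
        Console.WriteLine(BeforeHashRegex("DFASDFASFASFASFAF#322323"));
        Console.WriteLine(BeforeHashIndexOf("no hash here"));
    }
}
```

Because [^#]* can match zero characters, a string that starts with # yields an empty string instead of a failed match.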

What's wrong with something as simple as:
[^#]*
Just take the first match?

How to check if a string starts and ends with specific strings?

I have a string like:
string str = "https://abce/MyTest";
I want to check if the particular string starts with https:// and ends with /MyTest.
How can I achieve that?

This regular expression:
^https://.*/MyTest$
will do what you ask.
^ matches the beginning of the string.
https:// will match exactly that.
.* will match any number of characters (the * part) of any kind (the . part). If you want to make sure there is at least one character in the middle, use .+ instead.
/MyTest matches exactly that.
$ matches the end of the string.
To verify the match, use:
Regex.IsMatch(str, @"^https://.*/MyTest$");
More info at the MSDN Regex page.

Try the following:
var str = "https://abce/MyTest";
var match = Regex.IsMatch(str, "^https://.+/MyTest$");
The ^ anchor matches the start of the string, while the $ anchor matches the end of the string. The .+ part simply means any sequence of one or more characters.
You need to import the System.Text.RegularExpressions namespace for this, of course.

I want to check if the particular string starts with "https://" and ends with "/MyTest".
Well, you could use regex for that. But it's clearer (and probably quicker) to just say what you mean:
str.StartsWith("https://") && str.EndsWith("/MyTest")
You then don't have to worry about whether any of the characters in your match strings need escaping in regex. (For this example, they don't.)
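A sketch comparing the two checks. UrlCheck and its method names are hypothetical, and passing StringComparison.Ordinal is an added assumption that a case-sensitive, culture-independent comparison is wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCheck
{
    // Plain string checks; Ordinal avoids culture-specific comparison rules.
    public static bool HasPrefixAndSuffix(string s)
    {
        return s.StartsWith("https://", StringComparison.Ordinal)
            && s.EndsWith("/MyTest", StringComparison.Ordinal);
    }

    // Regex version from the other answers, for comparison.
    public static bool HasPrefixAndSuffixRegex(string s)
    {
        return Regex.IsMatch(s, @"^https://.*/MyTest$");
    }

    public static void Main()
    {
        Console.WriteLine(HasPrefixAndSuffix("https://abce/MyTest"));
        Console.WriteLine(HasPrefixAndSuffixRegex("https://abce/MyTest"));
    }
}
```

The two are not strictly equivalent: for "https://MyTest" the string checks both succeed because the prefix and suffix overlap on the second slash, while the regex requires a separate slash before MyTest and so does not match.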

In .NET:
^https://.*/MyTest$

Try Expresso, good for building .NET regexes and teaching you the syntax at the same time.

Handy tool for generating regular expressions:
http://txt2re.com/
