How to replace entire URL using Regex? [duplicate] - c#

This question already has answers here:
C# regex pattern to extract urls from given string - not full html urls but bare links as well
(3 answers)
Closed 9 years ago.
So far I have
messageText1 = Regex.Replace(messageText1, "(www|http|https)*?(com|.co.uk|.org)", "[URL OMITTED]");
With only the www, and without the brackets or http or https, it works as intended.
For example, an input of Hey check out this site, www.google.com, it's really cool would output Hey check out this site, [URL OMITTED], it's really cool
But if I put back the or operators for the start of the URL, it only replaces the .com part of the input.
Why won't it work?
Thanks

(www|http|https)*?(com|.co.uk|.org)
means www, http or https repeated zero or more times (lazily), immediately followed by com, .co.uk or .org.
So it would match, for example, httphttphttp.co.uk
Your intention was probably just to have a . before the *, which means it looks for (www|http|https) once and then matches . (any character) zero or more times.
You are also missing the . in .com. However, if you want to match a literal . you need to use \., since a . on its own means 'any character'.
With that in mind, the regex I think you were going for is:
(www|http|https).*?(\.com|\.co\.uk|\.org)
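To see the difference between the two patterns, here is a minimal sketch that runs both through Regex.Replace. The class and method names (UrlPatternComparison, Redact) are made up for illustration and are not from the question.

```csharp
using System.Text.RegularExpressions;

public static class UrlPatternComparison
{
    // Pattern from the question: the prefix group may repeat zero times,
    // so a bare "com" is already a complete match.
    public const string Original = "(www|http|https)*?(com|.co.uk|.org)";

    // Corrected pattern: one prefix, a lazy run of any characters,
    // then a suffix with the dots escaped.
    public const string Corrected = @"(www|http|https).*?(\.com|\.co\.uk|\.org)";

    // Replaces every match of the given pattern with the placeholder text.
    public static string Redact(string text, string pattern)
    {
        return Regex.Replace(text, pattern, "[URL OMITTED]");
    }
}
```

With the sample sentence from the question, Redact(text, UrlPatternComparison.Original) leaves www.google. in place and only swaps out com, while Redact(text, UrlPatternComparison.Corrected) replaces the whole www.google.com.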

This should work better. It will also work for URLs whose TLD is something other than .com, .co.uk or .org:
messageText1 = Regex.Replace(messageText1, @"\b(?:http://|https://|www\.)\S+", "[URL OMITTED]");
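One thing to be aware of with \S+ is that it also consumes punctuation that directly follows the URL, such as the comma in the question's sample. The sketch below shows the answer's pattern next to a variant whose last character may not be sentence punctuation. The variant and the names UrlRedactor, RedactGreedy and RedactTrimmed are assumptions for illustration, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class UrlRedactor
{
    // Pattern from the answer: a scheme or "www." followed by everything
    // up to the next whitespace character.
    private static readonly Regex Greedy =
        new Regex(@"\b(?:http://|https://|www\.)\S+");

    // Hypothetical variant: the final matched character may not be
    // whitespace or common sentence punctuation.
    private static readonly Regex Trimmed =
        new Regex(@"\b(?:http://|https://|www\.)\S*[^\s.,;:!?]");

    public static string RedactGreedy(string text)
    {
        return Greedy.Replace(text, "[URL OMITTED]");
    }

    public static string RedactTrimmed(string text)
    {
        return Trimmed.Replace(text, "[URL OMITTED]");
    }
}
```

For the sample sentence, RedactGreedy removes the comma after www.google.com together with the URL, whereas RedactTrimmed keeps it.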

Your expression is missing a . somewhere or (possibly better) a \S+
(www|http|https)\S*(com|\.co\.uk|\.org)
In C#:
Regex.Replace(messageText1, @"(www|http|https)\S*(com|\.co\.uk|\.org)", "[URL OMITTED]");
Note: you probably want to escape the .'s as well.

A simple version which I tried is as follows.
messageText1 = Regex.Replace(messageText1, @"(www)?(.)?[a-z]*.(com)", "[URL OMITTED]");
I tried this with each of the following inputs:
string messageText1 = " Hey check this out, http://www.google.com,its cool";
string messageText2 = " Hey check this out, www.google.com,its cool";
string messageText3 = " Hey check this out, google.com,its cool";

Related

addslashes C# equivalent [duplicate]

This question already has answers here:
Regex escape with \ or \\?
(5 answers)
Convert C# string to JavaScript String
(3 answers)
Closed 4 years ago.
Salaam
I am looking for a proper version of a C# or Razor equivalent of PHP's addSlashes. That would add
\ to some\string => some\\string
Please provide help
Why I needed this
In my application a user entered sometext in a textbox and accidentally pressed \ at the end. The next time the page was loaded and the data was populated through Razor, it looked like this
...append('<span>'+'@Model.value'+'</span>')
=> after rendering it becomes like this
...append('<span>'+'sometext\'+'</span>')
so in this scenario my JavaScript code broke at '\' because a single quote has been opened but is never closed, due to the \' being read as an escaped quote. So I thought that, instead of restricting characters, I would rather add slashes through C# code.
Thank You
You don't show any code you've already written, but this can be done by using string.replace() (https://www.w3schools.com/jsref/jsref_replace.asp):
var str = "This is \\a test";
// A regex with the g flag is needed; a string pattern replaces only the first occurrence.
var replaced = str.replace(/\\/g, "\\\\");
Whoops - you want the answer in C#, I misread your "javascript" tag. It's mostly the same:
string str = "This is \\a test";
string replaced = str.Replace("\\", "\\\\");
Also see C# String Replace
After the update, https://stackoverflow.com/a/27574931/34092 is most likely a much better answer.
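For reference, a hand-rolled helper along the lines of PHP's addslashes could look like the sketch below. The names JsEscaper and AddSlashes are hypothetical, and the helper only covers backslashes and quotes; it does not handle newlines or other characters that can break a JavaScript string literal.

```csharp
public static class JsEscaper
{
    // Escapes backslashes, single quotes and double quotes with a backslash.
    // Backslashes are handled first so that the slashes added for the quotes
    // are not doubled again.
    public static string AddSlashes(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return value;
        }

        return value
            .Replace("\\", "\\\\")
            .Replace("'", "\\'")
            .Replace("\"", "\\\"");
    }
}
```

With this, the value sometext\ from the question becomes sometext\\, so the closing quote of the JavaScript literal is no longer escaped.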

Regex hangs trying to find match

I am trying to match an assignment string in VB code (as in I'm passing in text that is VB code into my program that's written in C#). The assignment string that I'm trying to match is something for example like
CustomClassInitializer(someParameter, anotherParameter, someOtherClassAsParameterWithInitialization()).SomeProperty = 7
and I realize that's rather complex, but it actually isn't far off from some of the real text I'm trying to match.
In order to do so I wrote this Regex:
@"[\w,.]+\(([\w,.]*\(*,* *\)*)+ = "
It matches correctly. The problem is that it becomes VERY slow (with timeouts), which I've researched and found is probably because of "backtracking". One of the suggested solutions to help with backtracking in general was to add "?>" to the regex, which I think would go in this position:
[\w,.]+\(?>([\w,.]*\(*,* *\)*)+ =
but this no longer matches properly.
I'm fairly new to Regex, so I imagine that there is a much better pattern. What is it please? Or how can I improve my times in general?
Helpful notes:
I'm only interested in position 0 of the string I'm searching for a match in. My code is "if (isMatch && match.index == 0) { ... }". Can I tell it to only check position 0 and, if it's not a match, move on?
The reason I use all the 0 or more things is the match could be as simple as CustomClass() = new CustomClass(), and as complicated as the above or perhaps a bit worse. I'm trying to get as many cases as possible.
This Regex is interested in "[\w,.]+(" and then "whatever may be inside the parentheses" (I tried to think of what all could be inside them based on the fact that it's valid VB code) until you get to the close parenthesis and then " = ". Perhaps I can use a wildcard for literally anything until it gets to ") = " in the string? - Like I said, fairly new to Regex.
Thanks in advance!
This seems to do what you want. Normally, I like to be more specific than .*, but it is working correctly. Note that I am using the Multi-line option.
^.*=\s*.+$
Here is a working example on RegExStorm.net.
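On the two side questions: an atomic group is written (?>...), so in the attempt above \(?> is read as an optional literal ( followed by a literal >, which is why it stopped matching. To check only position 0, the pattern can start with ^ (without the Multiline option), and the Regex constructor accepts a match timeout so a bad input fails fast instead of hanging. The sketch below combines these ideas; the simplified pattern and the names AssignmentMatcher and StartsWithAssignment are assumptions, and the pattern has only been reasoned through against the two sample lines in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AssignmentMatcher
{
    // Hypothetical simplified pattern: ^ pins the match to position 0, and a
    // single [^=]* run replaces the nested optional quantifiers that caused
    // the heavy backtracking.
    private static readonly Regex Assignment = new Regex(
        @"^[\w.]+\([^=]*\)[\w.]* = ",
        RegexOptions.None,
        TimeSpan.FromMilliseconds(250)); // give up instead of hanging

    public static bool StartsWithAssignment(string line)
    {
        try
        {
            return Assignment.IsMatch(line);
        }
        catch (RegexMatchTimeoutException)
        {
            // Treat a timeout as "no match" in this sketch.
            return false;
        }
    }
}
```

Because [^=]* cannot cross an equals sign, this sketch will not match calls whose arguments themselves contain =, so it trades some coverage for predictable run time.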

Is there a better way to check if an entire string was matched? [duplicate]

This question already has answers here:
Match exact string
(3 answers)
Closed 3 years ago.
I'm parsing a text file line by line and for each line I have a special regex. However, in one case a pattern is matching two lines: one that is a correct match, and another that is matched only partially because a couple of values are optional.
Invalid match:
BNE1010/1000 HKG1955/2005 7/PLD/CLD/YLD
matches partial string (shouldn't match this at all):
BNE1010/1000
Correct match (matches the entire string):
RG878A/21AUG15 GIG/BOG 1/RG/AV 3/AV 4/AV 5/RG 6/AV081C/22 7/CDC/YD 9/TP
The regex for this is quite long and contains several optional groups:
^(?<FlightDesignator>([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))(?<OperationalSuffix>[A-Z])?(?<FlightIdentifierDate>\/(\d{2})([A-Z]{3})?(\d{2})?)?(\s(?<FlightLegsChangeIdentifier>(\/?[A-Z]{3})+)(?=(\s|$)))?(\s1(?<JointOperationAirlineDesignators>(\/.{2}[A-Z]?)+))?(\s3\/(?<AircraftOwner>([A-Z]{2}|.)))?(\s4\/(?<CockpitCrewEmployer>(.+?)(?=(?: \d\/|$))))?(\s5\/(?<CabinCrewEmployer>([A-Z]{2}|.)))?(?<OnwardFlight>\s6\/(([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))([A-Z])?(\/(\d{2})([A-Z]{3})?(\d{2})?)?)?(\s7\/(?<MealServiceNote>(\/?[A-Z]{0,3})+))?(\s9\/(?<OperatingAirlineDisclosure>(.{2}[A-Z]?)))?
I think there is no need to study the entire regex because it's built dynamically from smaller patterns at runtime and all the parts work correctly. Also lots of combinations are tested with unit tests and they all work... as long as I try to parse only the line that should be matched by the pattern.
Currently I'm checking if the entire string is matched by
match.Groups[0].Value == line
but I find it's quite ugly. I know from JavaScript that the regex engine provides an Index property showing where the regex engine stopped. So my idea was to compare the index with the length of the string. Unfortunately I wasn't able to find such a property in C#.
Another idea would be to modify the regex so that it matches only one line and no partial lines.
Example: https://regex101.com/r/dM5wU4/1
The example contains only two cases because there aren't actually any combinations that would change its behavior. I could remove some parameters but it wouldn't change anything.
EDIT:
I've edited my question. Sorry to everyone for not providing all the information the first time. I won't ask any more questions when writing on the phone :) It wasn't a good idea. Hopefully it won't get closed now.
You asked whether I could simplify the regex. I would do it if I could and knew how. If it was easy I wouldn't have asked. The problem started as the regex and string became bigger during development. Now they are at the production length and I can't actually make them shorter even for the sake of the question, sorry.
EDIT-2:
I found the reason why I couldn't find the inherited Index and Length properties of the Match class.
For some strange reason, when selecting the Match class and pressing F1, Visual Studio opened the wrong help page (Match Properties) even though I'm not working with the Micro Framework. I didn't notice that, but I was indeed wondering why there was so little information. Thx to @Jamiec for the correct link. I won't trust Visual Studio anymore when hitting F1.
Disclaimer: I'm going to add this, but I doubt it's the solution. If it's not, this part will get deleted in short order.
You can add a $ at the end of your regular expression. This stops your first example matching but continues to match the second example.
As you've not provided any more than 2 examples, it's unclear if this actually solves all your cases or just that one specific false positive.
My question is whether it is possible to check if a regular expression matched the entire string without checking the first group against the original line?
If you're not averse to comparing the length of the match to the length of the string, you can do that too:
var regex = new Regex(@"^(?<FlightDesignator>([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))(?<OperationalSuffix>[A-Z])?(?<FlightIdentifierDate>\/(\d{2})([A-Z]{3})?(\d{2})?)?(\s(?<FlightLegsChangeIdentifier>(\/?[A-Z]{3})+)(?=(\s|$)))?(\s1(?<JointOperationAirlineDesignators>(\/.{2}[A-Z]?)+))?(\s3\/(?<AircraftOwner>([A-Z]{2}|.)))?(\s4\/(?<CockpitCrewEmployer>(.+?)(?=(?: \d\/|$))))?(\s5\/(?<CabinCrewEmployer>([A-Z]{2}|.)))?(?<OnwardFlight>\s6\/(([A-Z0-9]{2}[A-Z]?)([0-9]{3,4}))([A-Z])?(\/(\d{2})([A-Z]{3})?(\d{2})?)?)?(\s7\/(?<MealServiceNote>(\/?[A-Z]{0,3})+))?(\s9\/(?<OperatingAirlineDisclosure>(.{2}[A-Z]?)))?");
var input1 = @"BNE1010/1000 HKG1955/2005 7/PLD/CLD/YLD";
var input2 = @"RG878A/21AUG15 GIG/BOG 1/RG/AV 3/AV 4/AV 5/RG 6/AV081C/22 7/CDC/YD 9/TP";
var match1 = regex.Match(input1);
var match2 = regex.Match(input2);
Console.WriteLine(match1.Length == input1.Length); // False
Console.WriteLine(match2.Length == input2.Length); // True
Live example: http://rextester.com/NIBE6349
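The length comparison can be wrapped in a small helper so the call site stays readable. This is only a sketch: the names FullMatch and IsWholeString are made up here, and it looks at the first match that Regex.Match returns. An alternative is to end the pattern with \z, which in .NET matches only at the very end of the input (whereas $ also matches just before a final newline).

```csharp
using System.Text.RegularExpressions;

public static class FullMatch
{
    // True only when the first match starts at index 0 and spans the
    // whole input string.
    public static bool IsWholeString(Regex regex, string input)
    {
        Match match = regex.Match(input);
        return match.Success
            && match.Index == 0
            && match.Length == input.Length;
    }
}
```

Match.Index and Match.Length are inherited from Capture, which is why they are easy to miss when looking only at the members declared on Match itself.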

extract all URLs in a free text block using RegEx [duplicate]

This question already has answers here:
Extract Url using Regex
(2 answers)
Closed 8 years ago.
I'm attempting to detect all URLs listed in a free text block. I'm using .NET's Regex.Matches call with the following regex: (http|https)://[^\s "']{4,}
Now, I've put in the following text:
here is a link http://somelink.com
here is a link that I didn't space withhttp://nospacelink.com/something?something=&39358235
http://nospacelink.com/something?something=&12233454
here is a link I already handled.
Here is some secret t&cs you're not allowed to know https://somethingbad.com
Just to be a little annoying I've put in a new address thingy capture type of 'http://somethinginspeechmarks.com' and what are you going to do now?
here is a link http://postTextLink.com at then some post text
Here is a link with a full stop http://alinkwithafullstoplink.com. And then some more.
and I get the following output:
http://somelink.com
http://nospacelink.com?something=&39358235
http://nospacelink.com?something=&12233454
http://alreadyhandledlink.com
https://somethingbad.com
http://somethinginspeechmarks.com
http://postTextLink.com
http://alinkwithafullstoplink.com.
Please notice the full stop on the last entry. How can I update my regex to say "If there is a full stop at the end, please ignore it?"
Also, please note that "Getting parts of a URL (Regex)" has nothing to do with my question, as that question is about how to break down a particular URL. I want to extract multiple, complete urls. Please see my input and current outputs for clarification!
I have got a regex already that does most of what I want, but isn't quite right. Could you please explain where my approach might be improved?
I would add something like [^\.] to the end of the pattern.
This says that the last character can't be a full stop.
So (http|https)://[^\s "']{4,}[^\.] will try to match addresses without a trailing full stop.
Edit:
As pointed out in the comments, this final character class should be better: [^.\s"']
Updated:
Consider the following minor change to your pattern:
(http|https)://[^\s "']{4,}(?=\.)
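Note that a lookahead such as (?=\.) only succeeds when a full stop follows the match, so it cuts other URLs short at their last dot. A minimal sketch of the character-class approach from the edit above is shown here; the names UrlExtractor and Extract are hypothetical, and https? is used as shorthand for (http|https).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // The final character class excludes '.', whitespace and quotes, so a
    // sentence-ending full stop is left out of the match.
    private static readonly Regex Url =
        new Regex(@"https?://[^\s""']{4,}[^.\s""']");

    // Returns every URL found in the text, in order of appearance.
    public static List<string> Extract(string text)
    {
        var urls = new List<string>();
        foreach (Match match in Url.Matches(text))
        {
            urls.Add(match.Value);
        }
        return urls;
    }
}
```

The engine first grabs the full run including the trailing full stop, then backtracks one character at a time until the last matched character satisfies the final class.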

Regex c# to jquery implementation

I have this regex on my asp.net mvc3 application:
Regex pattern = new Regex(#"^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$");
I needed to implement this with jquery due to some requirements with something like this:
password.match(/(.*(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z])/)
This is working. It will detect whether at least 1 uppercase letter, 1 lowercase letter and 1 number are present in the password. However, I also need to detect whether the same character appears 3 times in a row (e.g. aaa, bbb).
With my regex on c# , it is working with the help of:
/(.)\1\1/
But I can't make it work on password.match(/(.)\1\1/)
Did I miss something here? Thanks in advance!
I've just copied your C# regex and tried in JavaScript console and it works great:
"waweEEad2".match(/^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$/)
returns ["waweEEad2", undefined] and
"waweEEEad2".match(/^(?!.*(.)\1\1)(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[0-9a-zA-Z]{8,20}$/)
returns null.
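For comparison, the same two sample passwords can be checked on the C# side. The sketch below splits the original pattern across several lines so that each lookahead can be commented; the names PasswordRules and IsValid are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PasswordRules
{
    private static readonly Regex Pattern = new Regex(
        @"^(?!.*(.)\1\1)" +       // no character repeated three times in a row
        @"(?=.*\d)" +             // at least one digit
        @"(?=.*[a-z])" +          // at least one lowercase letter
        @"(?=.*[A-Z])" +          // at least one uppercase letter
        @"[0-9a-zA-Z]{8,20}$");   // 8 to 20 letters or digits overall

    public static bool IsValid(string password)
    {
        return Pattern.IsMatch(password);
    }
}
```

IsValid("waweEEad2") is true and IsValid("waweEEEad2") is false, which lines up with the JavaScript results above.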
