Regex to remove urls from string c# - c#

I have the following code, but the regular expression I am using does not seem to match any URL entered in standard format (www.google.com): when the message is displayed in a listbox, the URL is still there. Does anyone know where I'm going wrong?
e1.MessageBody = txtMessage.Text;
Regex.Replace(e1.MessageBody, @"/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)/", ""string.Empty);

var msg = "ASD www.google.com EFIG";
msg = Regex.Replace(msg, @"((([A-Za-z]{3,9}:(?:\/\/)?)(?:[-;:&=\+\$,\w]+@)?[A-Za-z0-9.-]+|(?:www.|[-;:&=\+\$,\w]+@)[A-Za-z0-9.-]+)((?:\/[\+~%\/.\w-_]*)?\??(?:[-\+=&;%@.\w_]*)#?(?:[\w]*))?)", string.Empty);
C# doesn't use regex delimiters (the leading and trailing /), and you had extra quotes ("") before the string.Empty argument. Also note that Regex.Replace returns a new string rather than modifying its input, so the result has to be assigned, as in the line above.

Related

C# Regex URL Port username & password

I have a URL and need to extract the port, username and password from it and put them into an array. It looks like the following:
http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts
Can I use some other method without replaces or substring?
One way to do this in C#:
Parse the URL with Uri, then pass only its query string to HttpUtility.ParseQueryString (from System.Web):
Uri uri = new Uri("http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts");
var parsedQuery = HttpUtility.ParseQueryString(uri.Query);
The username is then
parsedQuery["username"]
and the password is
parsedQuery["password"]
The port comes from the Uri itself:
uri.Port
Put the values into an array, or whatever structure you need.
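Putting the steps above together, a minimal sketch might look like this. It assumes System.Web.HttpUtility is available in your target framework and that C# 9 top-level statements can be used; the helper name ExtractParts is hypothetical.

```csharp
using System;
using System.Web;

// Hypothetical helper: returns { port, username, password } for a URL
// shaped like the one in the question.
static string[] ExtractParts(string url)
{
    var uri = new Uri(url);
    // Parse only the query part ("?username=...&password=..."), not the whole URL.
    var query = HttpUtility.ParseQueryString(uri.Query);
    return new[] { uri.Port.ToString(), query["username"], query["password"] };
}

var parts = ExtractParts("http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts");
Console.WriteLine(string.Join(", ", parts));
// prints: 8080, 9zu7T54rt6, 1Tbliu49iH
```

A missing parameter comes back as null from the indexer, so check for that before using the values.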
I don't know C#, but here's a pattern that works in Python. It's pretty straightforward, so you should be able to convert it.
:(?P<port>[0-9]+).*username=(?P<username>[a-zA-Z0-9]+).*password=(?P<password>[a-zA-Z0-9]+)
The (?P<foo>bar) syntax is a named capture group: the text matching the pattern 'bar' is stored in a group called 'foo' that you can read back after matching. In .NET the equivalent syntax is (?<foo>bar).
Here is another possible solution with pure C# regex:
var url = "http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts";
var urlRegex = new Regex(@"(?<=(http(s)?://)?\w+(\.\w+)*:)\d+(?=/.*)?");
var usernameRegex = new Regex(@"(?<=(\?|&)username=).*?(?=&|$)", RegexOptions.IgnoreCase);
var passwordRegex = new Regex(@"(?<=(\?|&)password=).*?(?=&|$)", RegexOptions.IgnoreCase);
Console.WriteLine(urlRegex.Match(url));
Console.WriteLine(usernameRegex.Match(url));
Console.WriteLine(passwordRegex.Match(url));
If some parts never change, e.g. it's always the same host, you could simply remove them like this:
string str = "http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts";
str = str.Replace("http://myproject.ddns.net", "");
This would leave you with ":8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts"
There is nothing stopping you from repeating the process with another section.
As for regex, you could use Regex.Match (https://msdn.microsoft.com/en-us/library/twcw2f1c(v=vs.110).aspx) to get the parts you want.
You could use ":\d{4}/" to get the port, "username=\w*\&" to get the username, and "password=\w*\&" to get the password. In each case you would then have to strip the matched prefix and suffix: the leading ":" and trailing "/" for the port, and the leading "username=" or "password=" and trailing "&" for the other two.
If you'd like to experiment with regex, https://regex101.com/ is pretty good.

Storing Html into a string in C#

In my project, I need to read some URLs and store the starting tags in some variables, but the project won't compile. Maybe it's because I am not assigning the strings correctly. The following is what I tried, and it gives a compile error:
string startTag = "<span id="productLayoutForm:OurPrice" class="pdp_details_hs18Price" itemprop="price">";
string anotherStartTag = "<span class="price final-price our fksk-our" id="fk-mprod-our-id">Rs.<span class="small-font">"
Please tell me what the correct code for the above should be, and where I can learn how to store such HTML in a string or how to use strings for such purposes.
You need to "escape" the quotes in your strings, for example:
string startTag = "<span id=\"productLayoutForm:OurPrice\" class=\"pdp_details_hs18Price\" itemprop=\"price\">";
The \ before the quotes that are inside the string tells the C# compiler that the quotes are part of the string and not the beginning/end of the string.
The " character marks the start and end of a string, so to use it in the middle of a string you have to escape it by putting a backslash in front of it, like this: \"
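As a small sketch of the options, the snippet below builds the same tag with backslash escapes and, as an alternative not mentioned above, with a verbatim string literal, where each inner quote is written as two quotes. The variable names are illustrative, and C# 9 top-level statements are assumed.

```csharp
using System;

// Regular string literal: escape each inner quote with a backslash.
string escaped = "<span class=\"small-font\">";

// Verbatim string literal: prefix with @ and double each inner quote.
string verbatim = @"<span class=""small-font"">";

Console.WriteLine(escaped);
Console.WriteLine(escaped == verbatim); // both literals produce the same string
```

The verbatim form tends to be easier to read when the HTML contains many quotes, since no backslashes are needed.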

Getting NewLine character in FreeTextBox control in Asp.net

I am using the FreeTextBox control in ASP.NET. When I read its HtmlStrippedText in my code, I get the string without HTML tags. How can I find the newline characters in this string? I want to replace all newline characters with the special symbol "#".
I found a solution:
I read HtmlStrippedText into a string str and then replaced it like this:
char enter=(char)111;
temp= str.Replace(enter+"", "\n");
If it is anything like the base ASP:TextBox, you can just grab the string from the text property and do something like
var test = txtYourControl.Text.Replace(Environment.NewLine, "#");
The newline sequence will vary between browsers (perhaps even between the operating systems on which the browser is running).
I have found the most reliable option to be to search for and replace "\n" rather than Environment.NewLine, which is "\r\n" in a Windows environment.
string text = this.txtField.Text.Replace("\n", "#");
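Because the incoming text may contain "\r\n", "\r", or "\n", one hedged way to handle all three is to normalize the line endings first and then replace. This is a sketch rather than FreeTextBox-specific code; ReplaceNewLines is a hypothetical helper name, and C# 9 top-level statements are assumed.

```csharp
using System;

// Hypothetical helper: replaces every kind of line break with the given marker.
static string ReplaceNewLines(string text, string marker)
{
    // Collapse \r\n and lone \r to \n first, so that no stray \r
    // is left behind after the replacement.
    string normalized = text.Replace("\r\n", "\n").Replace("\r", "\n");
    return normalized.Replace("\n", marker);
}

Console.WriteLine(ReplaceNewLines("line one\r\nline two\nline three", "#"));
// prints: line one#line two#line three
```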
HtmlStrippedText is used to get the plain text.
You should use the Text property of the FreeTextBox control instead.
You can replace the line breaks using this code:
lblTitle.Text = txtFreetextbox.Text.Replace("<br>","#");
See the documentation:
http://freetextbox.com/docs/ftb3/html/P_FreeTextBoxControls_FreeTextBox_Text.htm

Regular Expression to replace unknown value in text file - c# and asp.net

I'd like to replace a line in a text file using a c# function in asp.net. The line is:
SQL-SERVER-VERSION="some unknown value"
I don't know what the value after = might be so I need to use a wildcard for this. I want the new line to read:
SQL-SERVER-VERSION="2008"
I'm trying to use Regex.Replace but no matter what regular expression I try, it doesn't work.
Can anybody help?
Thanks,
John
I don't know what you already tried so I can't tell you what you were doing wrong, but the following should work:
string s = "SQL-SERVER-VERSION=\"some unknown value\"";
s = Regex.Replace(s, "SQL\\-SERVER\\-VERSION=\".*\"", "SQL-SERVER-VERSION=\"2008\"");
Try this:
Regex rgx = new Regex(@"SQL-SERVER-VERSION="".*?""");
string result = rgx.Replace(input, replacement);
It looks a little messy in a .NET string but the pure regex looks like this:
SQL-SERVER-VERSION=".*?"
If you know that " won't appear in the value then you could locate the string using
SQL-SERVER-VERSION=".*"
and replace with SQL-SERVER-VERSION="2008"
e.g.
strInput = Regex.Replace(strInput, @"SQL-SERVER-VERSION="".*""", @"SQL-SERVER-VERSION=""2008""");
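To apply this to the whole contents of a file, a sketch along these lines could work. It assumes the value itself contains no double quote; SetSqlServerVersion is a hypothetical helper, and reading and writing the file (for example with File.ReadAllText and File.WriteAllText) is left to the caller. C# 9 top-level statements are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrites the SQL-SERVER-VERSION value in the given text.
static string SetSqlServerVersion(string content, string version)
{
    // [^""]* (that is, [^"]* in the actual pattern) stops at the closing quote,
    // so other quoted settings on the same line are left untouched.
    // A MatchEvaluator inserts the version literally, avoiding any
    // $-substitution in the replacement string.
    return Regex.Replace(
        content,
        @"SQL-SERVER-VERSION=""[^""]*""",
        m => "SQL-SERVER-VERSION=\"" + version + "\"");
}

string before = "NAME=\"db1\"\nSQL-SERVER-VERSION=\"some unknown value\"\nMODE=\"x\"";
Console.WriteLine(SetSqlServerVersion(before, "2008"));
```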

asp.net regex.replace()

I have the following code to first remove html tags and then highlight the search term within the resulting text:
protected void ListView1_ItemDataBound(object sender, ListViewItemEventArgs e)
{
    try
    {
        // get search value query string
        string searchText = Request.QueryString["search"].Trim();
        string encodedValue = Server.HtmlEncode(searchText);
        Literal Content = e.Item.FindControl("Content") as Literal;
        string contentText = Content.Text;
        Content.Text = Regex.Replace(contentText, @"<(.|\n)*?>", string.Empty).Replace(encodedValue, "<font class='highlight2'>" + encodedValue + "</font>");
    }
    catch
    {
        // do nothing
    }
}
This works to a degree but the second replace is not case insensitive. How can I do the second replace also with regex.replace() so case sensitivity is not an issue? Thank you!
Use the Regex.Replace overload that takes a RegexOptions argument. You'll want the IgnoreCase value.
First let's talk about the regex you're using to remove the tags, <(.|\n)*?>. If you want the dot to match anything including a newline, you should use Singleline mode. It's also known as DOTALL mode in some flavors, because that's what it does: allows the dot to match newlines. You can use the RegexOptions.Singleline flag for that, or embed it in the regex with an inline modifier:
`(?s)<.*?>`
This is still pretty fragile, but I'll leave it at that because there's no way to make it bulletproof; regexes and HTML are fundamentally incompatible.
As for the second replacement, the first thing you need to do is break up those chained method calls--in fact, I would say they never should have been chained. Feeding the result of a Regex.Replace directly to String.Replace is either an error or excessively clever. In either case, you have to split them up if you want to call Regex.Replace twice.
You also need to escape any regex metacharacters in the search expression, assuming you still want to do a literal search and not a regex search. You can use the Regex.Escape method for that.
string searchText = Request.QueryString["search"].Trim();
string encodedValue = Server.HtmlEncode(searchText);
string escapedValue = Regex.Escape(encodedValue);
string contentText = Content.Text;
contentText = Regex.Replace(contentText, @"(?s)<.*?>", string.Empty);
contentText = Regex.Replace(contentText, escapedValue,
    "<font class='highlight2'>$&</font>", RegexOptions.IgnoreCase);
Content.Text = contentText;
There are a few other things in your code that don't seem right to me (like why you seem to be permanently removing all the tags), but I'm trying to stay focused on your actual question. To that end, I've tried to make the minimum necessary changes in the code to illustrate my answer. But there's one more thing I just have to comment on:
catch
{
// do nothing
}
Don't do that. At the very least, send an error message to the console or rethrow the exception for the calling code to deal with, but never silently swallow them.
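For reference, the same two-step approach can be sketched outside the page as a plain function. This rests on a few assumptions: StripAndHighlight is a hypothetical name, WebUtility.HtmlEncode stands in for Server.HtmlEncode so the code runs without an ASP.NET context, and C# 9 top-level statements are used.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: strips tags, then highlights the search term case-insensitively.
static string StripAndHighlight(string html, string searchText)
{
    // Step 1: remove anything that looks like a tag (fragile, as noted above).
    string text = Regex.Replace(html, @"(?s)<.*?>", string.Empty);

    // Step 2: escape the search term so it is matched literally, then
    // wrap each case-insensitive match; $& is the matched text itself.
    string pattern = Regex.Escape(WebUtility.HtmlEncode(searchText.Trim()));
    if (pattern.Length == 0) return text; // an empty pattern would match at every position
    return Regex.Replace(text, pattern,
        "<font class='highlight2'>$&</font>", RegexOptions.IgnoreCase);
}

Console.WriteLine(StripAndHighlight("<p>Hello <b>World</b>, hello again</p>", "hello"));
// prints: <font class='highlight2'>Hello</font> World, <font class='highlight2'>hello</font> again
```

Using $& keeps the original casing of each match, which a plain string concatenation with the search term would lose.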
