string to parse out a URL - c#

I got this regex from "JavaScript: The Good Parts" (p. 66), but I can't get it to work. Can anyone see what is wrong with it?
/^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/
It's supposed to split up a string like this:
https://stackoverflow.com/questions/ask
into its constituents: scheme, slash, host, port, path, query, hash.
By the way, this regex needs to be generic, since it's going to be used on different schemes.

Maybe this isn't your goal, but why don't you use the System.Uri class?
It has what you want, and it parses raw URIs and URLs.
http://msdn.microsoft.com/en-us/library/system.uri.aspx

Your question is tagged with C#, so why don't you just use the System.Uri class?
For example:
string s = "http://stackoverflow.com/questions/ask";
Uri uri = new System.Uri(s);
string scheme = uri.Scheme;
string host = uri.DnsSafeHost;
// etc
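To fill in the "etc", here is a minimal sketch that reads the parts listed in the question from System.Uri. The class name UriParts and the method name Split are made up for illustration; the Uri properties themselves (Scheme, Host, Port, AbsolutePath, Query, Fragment) are standard. Note that Port falls back to the scheme's default (443 for https) when the URL does not state one.

```csharp
using System;

public static class UriParts
{
    // Hypothetical helper: splits an absolute URL into its main components.
    // Throws UriFormatException if the input is not a valid absolute URI.
    public static (string Scheme, string Host, int Port, string Path, string Query, string Fragment) Split(string url)
    {
        var uri = new Uri(url);
        return (uri.Scheme, uri.Host, uri.Port, uri.AbsolutePath, uri.Query, uri.Fragment);
    }
}
```

Calling UriParts.Split("https://stackoverflow.com/questions/ask") gives the scheme "https", the host "stackoverflow.com" and the path "/questions/ask"; the query and fragment are empty strings.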

If this is in JavaScript, try:
result = subject.match(/\b(https?|ftp):\/\/([\-A-Z0-9.]+)(\/[\-A-Z0-9+&@#\/%=~_|!:,.;]*)?(\?[A-Z0-9+&@#\/%=~_|!:,.;]*)?/ig);

I don't know what every part of the regex means, but the last # character should be escaped with a backslash:
/^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:\#(.*))?$/
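One thing worth checking if this is being used from C#: the leading and trailing / are JavaScript regex-literal delimiters, not part of the pattern. Passed to .NET as-is, they would be treated as literal slashes and the pattern could never match. The sketch below assumes that is the situation and simply drops the delimiters; the class name UrlRegex and the method name Split are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlRegex
{
    // The pattern from the question without the surrounding "/" delimiters.
    private static readonly Regex Pattern = new Regex(
        @"^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$");

    // Returns scheme, slash, host, port, path, query, hash.
    // Parts that are absent come back as empty strings; no match gives an empty array.
    public static string[] Split(string url)
    {
        Match m = Pattern.Match(url);
        if (!m.Success) return new string[0];

        var parts = new string[7];
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = m.Groups[i + 1].Value; // group 0 is the whole match
        }
        return parts;
    }
}
```

For "https://stackoverflow.com/questions/ask" this yields "https", "//", "stackoverflow.com", an empty port, "questions/ask", and empty query and hash.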

Related

C# Regex URL Port username & password

I have a URL and need to extract the port, username and password from it and put them into an array. It looks like the following:
http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts
Can I do this some other way, without Replace or Substring calls?

One way to do it in C#:
Parse the URL with Uri first:
Uri uri = new Uri("http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts");
Get the port with
uri.Port
Then parse only the query part. (If you pass the whole URL to ParseQueryString, the first key becomes "http://myproject.ddns.net:8080/get.php?username" instead of "username".)
var parsedQuery = HttpUtility.ParseQueryString(uri.Query);
The following gives the username:
parsedQuery["username"]
And the password:
parsedQuery["password"]
Put the values into an array, or whatever structure you need.

I don't know C#, but here's one that works in Python. It's pretty straightforward, so you should be able to convert it.
:(?P<port>[0-9]+).*username=(?P<username>[a-zA-Z0-9]+).*password=(?P<password>[a-zA-Z0-9]+)
The (?P<foo>bar) syntax is a named capture group: the text matching the pattern 'bar' is stored in a group called 'foo', which you can read back after matching.
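For reference, .NET writes named groups as (?<name>...) rather than (?P<name>...). A rough C# conversion of the pattern above might look like the following; the class name CredentialRegex and the method name Extract are hypothetical, and like the Python version it assumes username appears before password and that both are purely alphanumeric.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CredentialRegex
{
    // Same pattern as the Python version, using .NET named-group syntax.
    private static readonly Regex Pattern = new Regex(
        @":(?<port>[0-9]+).*username=(?<username>[a-zA-Z0-9]+).*password=(?<password>[a-zA-Z0-9]+)");

    // Returns { port, username, password }, or an empty array if the URL does not match.
    public static string[] Extract(string url)
    {
        Match m = Pattern.Match(url);
        if (!m.Success) return new string[0];

        return new[]
        {
            m.Groups["port"].Value,
            m.Groups["username"].Value,
            m.Groups["password"].Value
        };
    }
}
```

With the URL from the question, Extract returns "8080", "9zu7T54rt6" and "1Tbliu49iH".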

Here is another possible solution with pure C# regex:
var url = "http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts";
var urlRegex = new Regex(@"(?<=(http(s)?://)?\w+(\.\w+)*:)\d+(?=/.*)?");
var usernameRegex = new Regex(@"(?<=(\?|&)username=).*?(?=&|$)", RegexOptions.IgnoreCase);
var passwordRegex = new Regex(@"(?<=(\?|&)password=).*?(?=&|$)", RegexOptions.IgnoreCase);
Console.WriteLine(urlRegex.Match(url));
Console.WriteLine(usernameRegex.Match(url));
Console.WriteLine(passwordRegex.Match(url));

If some parts never change, e.g. if it's always the same URL, you could just strip them out like this:
string str = "http://myproject.ddns.net:8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts";
str = str.Replace("http://myproject.ddns.net", "");
This would leave you with ":8080/get.php?username=9zu7T54rt6&password=1Tbliu49iH&type=m3u_plus&output=ts"
There is nothing stopping you from repeating the process with another section.
As for regex, you could use Regex.Match (https://msdn.microsoft.com/en-us/library/twcw2f1c(v=vs.110).aspx) to get the parts you want.
You could use ":\d{4}/" to get the port (this assumes a four-digit port), then strip the leading ":" and trailing "/". Likewise, "username=\w*&" gets the username once you strip the leading "username=" and trailing "&", and "password=\w*&" gets the password once you strip the leading "password=" and trailing "&".
If you'd like to experiment with regex, https://regex101.com/ is pretty good.

RegEx to extract partial string

So simple, but I'm struggling. I only use RegExp every two years or so, so I'm rusty.
I have these two URL strings:
http://localhost:58876/Products/Product1
https://localhost:58876/Products/Product1
The result I want is:
localhost:58876
Basically, remove the http(s):// and everything from the first single / onward, so I end up with the domain with or without the port number.
P.S.: I'm working with C#.

This worked for me (tested in Notepad++):
(\w+:\d+)

You can use the following regex to split the URL:
((http[s]?|ftp):/)?/?([^:/\s]+)(:([^/]*))?((/\w+)*/)([\w\-.]+[^#?\s]+)(\?([^#]*))?(#(.*))?
Capture groups 3 and 5 (host and port) are the ones you are looking for.

(^[^h]|\/\/)([\w\d\:\#\.]+:?[\d]?+)
then in c#:
string address = ...
char[] MyChar = {'/'};
string NewString = address.TrimStart(MyChar);
EDIT: also worked with localhost:58876/Products/Product1!

Just match anything but a slash: /^https?:\/\/([^\/]+)\/.*$/
var url = 'http://localhost:58876/Products/Product1';
var match = url.match(/^https?:\/\/([^\/]+)\/.*$/);
if (match && match.length > 0) document.write(match[1]);
Even shorter: /\/\/([^\/]+)/. Note that there are much better ways to parse URLs. Depending on your platform, there's PHP's parse_url, NodeJS's url module, or libraries like uri.js that handle the many faces of valid URIs.
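Since the question mentions C#, the platform parser there is System.Uri. A minimal sketch, assuming the input is always an absolute URL (the class name HostAndPort and the method name Get are hypothetical):

```csharp
using System;

public static class HostAndPort
{
    // Returns "host:port" for an absolute URL.
    // Uri.Authority leaves the port out when it is the default for the scheme
    // (80 for http, 443 for https).
    public static string Get(string url)
    {
        return new Uri(url).Authority;
    }
}
```

Both http://localhost:58876/Products/Product1 and https://localhost:58876/Products/Product1 give "localhost:58876".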

C# Uri.EscapeDataString adds incorrect "%25" in the decoded string

I'm trying to UrlEncode a web address using Uri.EscapeDataString, but the result isn't correct. Here's an example:
string url = "https://mega.co.nz/#!GVZFwAbB!NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ";
string encodedUrl = Uri.EscapeDataString(url);
Expected result would be:
https%3a%2f%2fmega.co.nz%2f%23!GVZFwAbB!NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ
But the actual one is:
https%253a%252f%252fmega.co.nz%252f%2523%21GVZFwAbB%21NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ
As you can see, there's a bunch of extra %25s that don't belong there. Isn't %25 the encoding for "%"? There are no %s in my original string... what's going on?
EDIT: I can't use the System.Web assembly for this project, so unfortunately I can't use the HttpUtility.UrlEncode() method for this.

Well, after searching around a bit more, it seems that this does the job without relying on System.Web:
System.Net.WebUtility.UrlEncode(url);
The encoding is the correct one, without %25s.
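One way stray %25s can appear is when a string that is already percent-encoded gets encoded a second time, because each "%" then becomes "%25". The question does not show enough to confirm that this is what happened here, so treat the sketch below only as an illustration of that effect. The class name EncodingDemo and its method names are hypothetical.

```csharp
using System;
using System.Net;

public static class EncodingDemo
{
    // Escapes the value once.
    public static string EncodeOnce(string value)
    {
        return Uri.EscapeDataString(value);
    }

    // Escapes an already escaped value again; every "%" from the first pass becomes "%25".
    public static string EncodeTwice(string value)
    {
        return Uri.EscapeDataString(Uri.EscapeDataString(value));
    }

    // The System.Net alternative from the answer above, which does not need System.Web.
    public static string EncodeWithWebUtility(string value)
    {
        return WebUtility.UrlEncode(value);
    }
}
```

For the input "a/b", EncodeOnce returns "a%2Fb" while EncodeTwice returns "a%252Fb".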

Uri.EscapeDataString doesn't encode URLs. Use HttpUtility.UrlEncode instead.
string url = "https://mega.co.nz/#!GVZFwAbB!NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ";
string encodedUrl = HttpUtility.UrlEncode(url);
Result is:
https%3a%2f%2fmega.co.nz%2f%23!GVZFwAbB!NzdN2jp7A_WmQBLC4RJrCX8SzixFIEo7oZZARaMAmXQ

I need a regex expression which can return to me the relative URL + query string from an HTML content string

I have found useful regex expressions from the site, but this particular one eludes me.
Basically, I need to extract this:
/uploadedimages/space earth nasa hd wallpapers 62.jpg?n=6965
from this string using regex:
<p>test james lafferty joseph <strong>swami</strong> is a great guy.<img src=\"/uploadedimages/space earth nasa hd wallpapers 62.jpg?n=6965\" alt=\"nasa1\" title=\"nasa1\" style=\"width: 100px; height: 57px; \" width=\"100\" height=\"57\" /></p>\r\n<p><br /></p>\r\n<p><br /></p>
The regex expression I have extracts the URL without the query string. It is ok if the regex hard codes the string '/uploadedimages/'. However, other than this hard-coding, everything else needs to be generic. This could be anything - not just an image, could be an href linked to a pdf file. Query string could be anything valid as well.
Other regex expressions I have found work only with the absolute URLs starting with http, etc.

I am not sure why nobody was able to provide an acceptable answer to this question. This is a very real problem for any developer who needs to fully extract URLs of any kind from an HTML fragment that may or may not be valid HTML, so here is the answer, which I have verified as working in C#:
matches = Regex.Matches(target, "(?<=\")(http:|https:)?[/\\\\](?:[A-Za-z0-9-._~!$&'()*+,;=:@ ]|%[0-9a-fA-F]{2})*([/\\\\](?:([A-Za-z0-9-._~!$&'()*+,;=:@ ]|%[0-9a-fA-F]{2}))*)*(?:\\?[a-zA-Z0-9=/\\\\&]+)?(?=\")", RegexOptions.IgnoreCase);
This will extract any number of URLs, including their query strings, from the HTML fragment. I have also modified the regex so that it works with C# string escaping. The pure regex will not work as-is in a C# string literal, because the \ and " characters have to be escaped.

Assuming you want a regex like this?
<([^=<>]+)=\\?"([^\\"]+)
Otherwise, please be less ambiguous about what you are actually trying to parse out. Thanks!

I'd recommend doing this in stages, since it will be much simpler. You can do this more cleanly in .NET: regexes are not needed here, and neither is a full DOM parser, if you know the format the data will come in. Assuming for the moment that what you really want is the relative URL of the image source, and that there is only ever one image in the HTML, I would recommend something like the following.
string Parse(string html)
{
    // Assumes src= is followed directly by a double quote: skip those 5 characters,
    // then read up to the closing quote.
    var temp = html.Substring(html.IndexOf("src=") + 5);
    return temp.Substring(0, temp.IndexOf("\""));
}
To do it using regular expressions, based on kgoedtel's answer (modified slightly), you'll need to do something like:
// Requires: using System.Linq; using System.Text.RegularExpressions;
string Parse(string html)
{
    // Captures the first quoted attribute value inside an <img> tag.
    var r = new Regex("<img [^=<>]+=\\\\?\"([^\\\\\"]+)");
    return r.Match(html).Groups[1].Value;
}
IEnumerable<string> ParseMany(string html)
{
    // Captures every quoted attribute value in the fragment.
    var r = new Regex("[^=<>]+=\\\\?\"([^\\\\\"]+)");
    return r.Matches(html).OfType<Match>().Select(m => m.Groups[1].Value);
}

How can I replace "/" with "\/" in a string?

I would like to do the following:
if (string.Contains("/"))
{
string.Replace("/", "\/"); //this isn't valid
}
I've tried
string.Replace("/", "\\/");
but this gives me what I started with. How can I do this?
Thanks

Strings are immutable, which means that any modification you make to a string results in a new one. You should assign the result of the Replace method:
if (myString.Contains("/"))
{
myString = myString.Replace("/", "\\/");
}
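The same idea as a small self-contained helper, shown only as a sketch (the class name SlashEscaper and the method name Escape are hypothetical). The Contains check is not needed, since Replace returns the string unchanged when there is nothing to replace.

```csharp
using System;

public static class SlashEscaper
{
    // Returns a new string with every "/" replaced by "\/".
    // The input string itself is never modified, because strings are immutable.
    public static string Escape(string input)
    {
        return input.Replace("/", "\\/");
    }
}
```

For example, SlashEscaper.Escape("a/b/c") returns the six-character-plus-escapes string a\/b\/c, and the original variable still holds a/b/c.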

String.Replace returns the string with replacements made - it doesn't change the string itself. It can't; strings are immutable. You need something like:
text = text.Replace("/", "\\/");
(In future examples, it would be helpful if you could use valid variable names btw. It means that those wishing to respond with working code can use the same names as you've used.)

One way is to use a verbatim string literal:
myString = myString.Replace("/", @"\/");
