Extract a specific portion of a string in C# [duplicate] - c#

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How to parse a query string into a NameValueCollection in .NET
I have input as
https://localhost:8181/PortalSite/View/CommissionStatement.aspx?status=commission&quarter=1;
Output needed
status=commission
How can I do this in C# (preferably with a regular expression, or something else)?
My solution
var res = src.Split('?')[1].Split('=')[1].Split["&"][0];
but it fails at Split["&"]

If what you're going for is guaranteed to be a URL with a query string, I would recommend the HttpUtility.ParseQueryString method.
var result = HttpUtility.ParseQueryString(new Uri(src).Query);

Note that in cases like this, the common fallacy is to just put together some string handling function that cannot deal with the full possible spec of input. In your case, there are a lot of valid URLs that are actually rather hard to handle/parse correctly. So you should stick to the already implemented, proven classes.
Thus, I would use the System.Uri class to consume the URL string. The part of the URL you're actually trying to access is the so-called "query", which is also a property of the Uri instance. The query can easily and correctly be split into its individual key-value parts using the System.Web.HttpUtility.ParseQueryString() method (you need to add System.Web.dll to your project's references and make sure you're not using the .NET 4 client profile for your application, as that will not include this assembly).
Example:
Uri u = new Uri("https://localhost:8181/PortalSite/View/CommissionStatement.aspx?status=commission&quarter=1;");
Console.WriteLine(u.Query); // Prints "?status=commission&quarter=1;" (the leading '?' is part of Query)
var parameters = HttpUtility.ParseQueryString(u.Query);
Console.WriteLine(parameters["status"]); // Prints "commission"
Once you have the "parameters" you could also iterate over them, search them, etc. YMMV.
If you require exactly the output shown in your question, and you know that you always need the first parameter of the query string (so you cannot look it up by name as shown above), you could use the following:
string key = parameters.GetKey(0);
Console.WriteLine(key + "=" + parameters[key]); // Prints "status=commission"

You could use the following regex: status=(\w*)
But I think there are better alternatives, like using the HttpUtility.ParseQueryString method.
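A minimal sketch of the regex approach, assuming the input always carries a status parameter whose value consists of word characters. The class and method names are made up for illustration. The whole match is the "status=commission" pair, while group 1 would hold only the value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StatusRegexDemo
{
    // Returns the whole "status=value" pair, or null when the parameter is absent.
    // (Hypothetical helper, not part of any library.)
    public static string ExtractStatusPair(string url)
    {
        Match m = Regex.Match(url, @"status=(\w*)");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string src = "https://localhost:8181/PortalSite/View/CommissionStatement.aspx?status=commission&quarter=1;";
        Console.WriteLine(ExtractStatusPair(src)); // status=commission
    }
}
```

Note that this pattern would also match a parameter such as "mystatus=...", which is one reason the parsing approach is the safer choice.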

Related

How to use a UriBuilder and HttpUtility.ParseQueryString to store a URL and parse it

What I'm trying to do is use an UriBuilder and HttpUtility.ParseQueryString to get the last page the user visited and then parse the URL to get just the productID. (The product ID is different on each page if that matters)
example URL: website.com/stuff/?referrerPage=1&productID=1234567&tab=Tile
and what I want is just the 1234567
Page_Load is where I parse the URL:
protected void Page_Load(object sender, EventArgs e)
{
NameValueCollection query = HttpUtility.ParseQueryString(uriBuilder.Query);
//I want to take the parse string and get productID here, how?
}
grabURL is where I get the last URL visited:
public grabUrl(string Uri)
{
UriBuilder uriBuilder = new UriBuilder(Request.UrlReferrer);
return uriBuilder.Uri;
}
Am I on the right track with my code? How do I put the productID number in something so I can work with it? I'm very new to C# and this type of coding in general... when I say new I mean I've been doing it for about a week. So any detailed explanations or examples will be very much appreciated. Thanks everyone for being so helpful, I'm learning a lot from this site to get me on the right track.
From a NameValueCollection you can then do:
var id = query["ProductID"];
You can use int.TryParse to turn it into an integer proper.
int id = 0;
if (!string.IsNullOrEmpty(query["ProductID"]) &&
int.TryParse(query["ProductID"], out id)) {
// use id here..
}
Or you could just read the query string value of the current request directly from Request.QueryString:
protected void Page_Load(object sender, EventArgs e)
{
// Request.QueryString returns strings, so read the raw value first
string rawProdId = Request.QueryString["productID"];
// validate and convert in one step; TryParse returns false for null,
// empty or non-numeric input instead of throwing
int prodId;
if (int.TryParse(rawProdId, out prodId))
{
// do something with prodId
}
}
Could you use a regex to parse it instead?
string uri = "website.com/stuff/?referrerPage=1&productID=1234567&tab=Tile";
var rgx = new Regex("productID=(?<pid>[0-9]+)", RegexOptions.IgnoreCase);
string pid = rgx.Match(uri).Groups[1].Value;
Edit: Providing context as it has been suggested I should do:
The reason for mentioning this option is that while it doesn't use HttpUtility.ParseQueryString, your original question was very specific:
get just the productID
from
the last page the user visited
(which I understand not to be the uri of the current request). Additionally, you provided the Uri in a string format.
The approach in your question does this:
Takes Uri (a string variable)
Passes it to UriBuilder; UriBuilder in its constructor initialises a Uri, which itself does a ton of work to validate the uri, populate properties etc. by creating more strings, among other things
Then, from the Uri object generated, you now work with the Query property - this is one of the new strings that Uri has allocated
Passes that to HttpUtility.ParseQueryString. This creates a HttpValueCollection, which itself iterates character-by-character over the string you pass in to parse out the key-value pairs in the query string, and sticks them into a NameValueCollection, which ultimately stores your values in an ArrayList - one of the least efficient collections in .NET (see various references including this one - What's Wrong with an ArrayList?) as it stores everything in an object array, requiring casting every time you get things back out.
Finally, you search that collection by key to get your product id back out.
That's a whole lot of string and character allocations, casting to and from objects, putting things into indexed arrays which you then scan, etc. just to get:
a string which is identifiable by a pattern from another string.
While I admit that memory is cheap, it seems that there might be an equally good, or better, alternative. This is what regex was made for - find a pattern in a string, and allow you to get parts of that pattern back out.
So, your options:
If you just want to get productID out of a uri in an exact form, and the uri is originally in a string, then I maintain that a regex is a very good, efficient choice. This will also work if you want to extract other patterns from your uri.
If you want to know all the keys in your query string as well as values for specific keys, then use your HttpUtility.ParseQueryString approach; NameValueCollection allows you to get access to the keys through the AllKeys property.
If you want to get the value of a query string parameter for the uri of your current request then Marcianin's answer is the simplest choice, and you can forget the first 2 options.
In all cases, once you have the string you can parse it using the parse methods on int, but if you use 2. or 3. your extracted id may not be a number (in the case of a malicious request) so make sure you use int.TryParse not int.Parse to convert from a string, and be careful to catch exceptions. You should always take care when taking input from query strings so as not to fall foul of malicious data in the query string (which will hit your website frequently once it is online).
The choice is personal preference.
The actual code you write is about the same - each approach takes up very few lines.
Code should never prematurely optimise, but it should also not be deliberately wasteful if it can help it. You could performance test one against the other, but that would be a waste of your time at this stage.
However, code should also convey intent. The Regex approach, even I would argue, doesn't convey your intent as well as the ParseQueryString approach.
Footnote: I would change the regex slightly to "[?&]productID=(?<pid>[0-9]+)" to ensure you pick up only "productID" and not "fooProductID"
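A small sketch of the anchored pattern from the footnote, using the example URL from the question. The class and method names are illustrative only, and the second input is a made-up URL that shows a parameter which merely ends in "ProductID" being skipped.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProductIdRegexDemo
{
    // The leading [?&] ties the match to the start of a query parameter.
    private static readonly Regex Rgx =
        new Regex("[?&]productID=(?<pid>[0-9]+)", RegexOptions.IgnoreCase);

    // Returns the digits after productID=, or null when no such parameter exists.
    public static string ExtractProductId(string uri)
    {
        Match m = Rgx.Match(uri);
        return m.Success ? m.Groups["pid"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractProductId("website.com/stuff/?referrerPage=1&productID=1234567&tab=Tile")); // 1234567
        Console.WriteLine(ExtractProductId("website.com/stuff/?fooProductID=42") ?? "(no match)"); // (no match)
    }
}
```

The result is still a string, so int.TryParse remains the way to turn it into a number.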
Most importantly, you asked
Am I on the right track with my code?
I would say you are. Always weigh up different options as you proceed. Don't be afraid to try different things out. You say you are new to C#. The one thing you may have missed, in this case, is writing a test to help you on your way before you wrote your code: if the test passes, the code is correct, and the approach you chose is secondary to that. Visual Studio makes testing easy for you if you are using the latest version. If you get into good habits early on, it will pay dividends later on in your C# career.
How do I put the productID number in something so I can work with it?
Grant Thomas answered this perfectly - int.TryParse turns the string into a number.

Why does Context.RewritePath maintain the query string?

After reviewing the code below, I noticed that the call to Context.RewritePath is somehow not losing the query string, even though it is being called without a query string. Is there any documentation explaining why the query string is being maintained?
//URL relative path to ashx files is wrong due to path rewriting.
if (Request.Url.LocalPath.EndsWith(".ashx")) {
Context.RewritePath(Request.Url.LocalPath
.Substring(Request.Url.LocalPath.LastIndexOf("/") + 1));
}
Edit: I am not asking how to fix this; the code behaves correctly. I am just asking for documentation of this behavior.
You're essentially doing a Rewrite of the path, and in most cases, the query needs to be maintained to pass to the new Path.
Take for example a new document retrieval page named "getDocumentWithEnhancements.aspx" as opposed to the old "getDocument.aspx". Both need a parameter to be useful, but you want the new one to be used. A RewritePath would do the job, as it takes the query passed to the old, and passes to the new. If you want to show some sort of error page or something, then instead you'd either use a redirect, or whatever page you're rewriting to will just ignore the querystring.
Why would you not want the query to be passed through? What exactly are you using this for? Maybe it's not the right function for what you need.
edit: There's an overloaded function taking 3 parameters, one of which is a querystring, which you could pass in as null to not use the querystring.

C# ASP.NET HttpWebRequest automatically decodes ampersand (&) values from query string?

Assume the following Url:
"http://server/application1/TestFile.aspx?Library=Testing&Filename=Documents & Functions + Properties.docx&Save=true"
I use HttpUtility.UrlEncode() to encode the value of the Filename parameter and I create the following Url:
"http://server/application1/TestFile.aspx?Library=Testing&Filename=Documents%20%26%20Functions%20%2B%20Properties.docx&Save=true"
I send the following (encoded version) of request from a client to a C# Web Application. On the server when I process the request I have a problem. The HttpRequest variable contains the query string partially decoded. That is to say when I try to use or quick watch the following properties of HttpRequest they have the following values.
Property = Value
================
HttpRequest.QueryString = "{Library=Testing&Filename=Documents+&+Functions+++Properties.docx&Save=true}"
HttpRequest.Url = "{http://server/application1/TestFile.aspx?Library=Testing&Filename=Documents & Functions + Properties.docx&Save=true}"
HttpRequest.Url.AbsoluteUri = "http://server/application1/TestFile.aspx?Library=Testing&Filename=Documents%20&%20Functions%20+%20Properties.docx&Save=true"
I have also checked the following properties but all of them have the & value decoded. However all other values remain properly encoded (e.g. space is %20).
HttpRequest.Url.OriginalString
HttpRequest.Url.Query
HttpRequest.Url.PathAndQuery
HttpRequest.RawUrl
There is no way I can read the value of the parameter Filename properly. Am I missing something?
The QueryString property returns a NameValueCollection object that maps the querystring keys to fully-decoded values.
You need to write Request.QueryString["FileName"].
I'm answering this question many years later because I just had this problem and figured out the solution. The problem is that HttpRequest.Url isn't really the value that you gave. HttpRequest.Url is a Uri class and that value is the ToString() value of that class. ToString() for the Uri class decodes the Url. Instead, what you want to use is HttpRequest.Url.OriginalString. That is the encoded version of the URL that you are looking for. Hope this helps some future person having this problem.
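A rough sketch of the difference between the two members, run outside ASP.NET on a plain System.Uri. It only illustrates the Uri class itself; whether HttpRequest.Url was constructed from the still-encoded string depends on the hosting environment, which is an assumption here. The helper name Raw is made up.

```csharp
using System;

public static class OriginalStringDemo
{
    // Returns the string exactly as it was handed to the Uri constructor.
    public static string Raw(string url)
    {
        return new Uri(url).OriginalString;
    }

    public static void Main()
    {
        string encoded = "http://server/application1/TestFile.aspx?Library=Testing"
            + "&Filename=Documents%20%26%20Functions%20%2B%20Properties.docx&Save=true";
        Uri url = new Uri(encoded);

        // ToString() returns an unescaped display form, so escapes such as %26 may be decoded.
        Console.WriteLine(url.ToString());

        // OriginalString is the untouched constructor argument.
        Console.WriteLine(Raw(encoded));
    }
}
```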
What happens when you don't use UrlEncode? You didn't show how exactly you are using the url that you created using UrlEncode, so it is quite possible that things are just being double encoded (lots of the framework will encode the URLs for you automatically).
FWIW I ran into this same problem with RavenDB (version 960). They implement their own HttpRequest object that behaves just like this -- it first decodes just the ampersands (from %26 to &) and then decodes the entire value. I believe this is a bug.
A couple of workarounds to this problem:
Implement your own query string parsing on the server. It's not fun but it is effective.
Double-encode ampersands. First encode just the ampersands in the string, then encode the entire string. (It's an easy solution but not extensible because it puts the burden on the client.)
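The first workaround could be sketched roughly as follows, assuming the raw, still-encoded query string is available on the server. RawQueryParser is a hypothetical helper, not a framework class. The key point is to split on '&' before decoding anything, so an encoded %26 inside a value is never mistaken for a separator.

```csharp
using System;
using System.Collections.Generic;

public static class RawQueryParser
{
    // Splits a raw (still encoded) query string on '&' first and decodes each
    // key and value afterwards. Later duplicates of a key overwrite earlier ones.
    public static Dictionary<string, string> Parse(string rawQuery)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        if (string.IsNullOrEmpty(rawQuery)) return result;

        foreach (string pair in rawQuery.TrimStart('?').Split('&'))
        {
            if (pair.Length == 0) continue;
            int eq = pair.IndexOf('=');
            string key = eq < 0 ? pair : pair.Substring(0, eq);
            string value = eq < 0 ? "" : pair.Substring(eq + 1);
            result[Decode(key)] = Decode(value);
        }
        return result;
    }

    // '+' means space in form-encoded queries; an encoded %2B stays a literal plus.
    private static string Decode(string s)
    {
        return Uri.UnescapeDataString(s.Replace("+", " "));
    }

    public static void Main()
    {
        var q = Parse("?Library=Testing&Filename=Documents%20%26%20Functions%20%2B%20Properties.docx&Save=true");
        Console.WriteLine(q["Filename"]); // Documents & Functions + Properties.docx
    }
}
```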

Conditional Regex Replace in C# without MatchEvaluator

So, I'm trying to make a program to rename some files. For the most part, I want them to look like this:
[Testing]StupidName - 2[720p].mkv
But I would like to be able to change the format if desired. If I used a MatchEvaluator, the program would have to be recompiled every time, which is why I don't want to use one.
The problem I have is that I don't know how, or if it's possible, to tell Replace that if a group was found, include this string. The only syntax for this I have ever seen was something like (?<group>:data), but I can't get this to work. If anyone has an idea, I'm all for it.
EDIT:
Current Capture Regexes =
^(\[(?<FanSub>[^\]\)\}]+)\])?[. _]*(?<SeriesTitle>[\w. ]*?)[. _]*\-[. _]*(?<EpisodeNumber>\d+)[. _]*(\-[. _]*(?<EpisodeName>[\w. ]*?)[. _]*)?([\[\(\{](?<MiscInfo>[^\]\)\}]*)[\]\)\}][. _]*)*[\w. ]*(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*[Ss](?<SeasonNumber>\d+)[Ee](?<EpisodeNumber>\d+).*?(?<Extension>\.[a-zA-Z]+)$
^(?<SeriesTitle>[\w. ]*?)[. _]*(?<SeasonNumber>\d)(?<EpisodeNumber>\d{2}).*?(?<Extension>\.[a-zA-Z]+)$
Current Replace Regex = [${FanSub}]${SeriesTitle} - ${EpisodeNumber} [${MiscInfo}]${Extension}
Using Regex.Replace, the file TestFile 101.mkv, I get []TestFile - 1[].mkv. What I want to do is make it so that [] is only included if the group FanSub or MiscInfo was found.
I can solve this with a MatchEvaluator because I actually get to compile a function. But this would not be an easy solution for users of the program. The only other idea I have to solve this is to actually make my own Regex.Replace function that accepts special syntax.
It sounds like you want to be able to specify an arbitrary format dynamically rather than hard-code it into your code.
Perhaps one solution is to break your filename parts into specific groups then pass in a replacement pattern that takes advantage of those group names. This would give you the ability to pass in different replacement patterns which return the desired filename structure using the Regex.Replace method.
Since you didn't explain the categories of your filename I came up with some random groups to demonstrate. Here's a quick example:
string input = "Testing StupidName Number2 720p.mkv";
string pattern = @"^(?<Category>\w+)\s+(?<Name>.+?)\s+Number(?<Number>\d+)\s+(?<Resolution>\d+p)(?<Extension>\.mkv)$";
string[] replacePatterns =
{
"[${Category}]${Name} - ${Number}[${Resolution}]${Extension}",
"${Category} - ${Name} - ${Number} - ${Resolution}${Extension}",
"(${Number}) - [${Resolution}] ${Name} [${Category}]${Extension}"
};
foreach (string replacePattern in replacePatterns)
{
Console.WriteLine(Regex.Replace(input, pattern, replacePattern));
}
As shown in the sample, named groups in the pattern, specified as (?<Name>pattern), are referred to in the replacement pattern by ${Name}.
With this approach you would need to know the group names beforehand and pass these in to rearrange the pattern as needed.
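One possible way to handle the empty brackets from the question without a MatchEvaluator is a second Regex.Replace pass that removes bracket pairs left empty. This is a sketch under simplified assumptions, not something proposed in the thread: the pattern below is a cut-down stand-in for the question's capture regexes, and RenameDemo is a made-up name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RenameDemo
{
    public static string Rename(string fileName, string pattern, string replacement)
    {
        // Groups that did not take part in the match are substituted as empty strings.
        string renamed = Regex.Replace(fileName, pattern, replacement);
        // Second pass: drop bracket pairs that ended up empty.
        return Regex.Replace(renamed, @"\[\]", "");
    }

    public static void Main()
    {
        string pattern = @"^(\[(?<FanSub>[^\]]+)\])?(?<Title>\w+) (?<Episode>\d+)(?<Extension>\.\w+)$";
        string replacement = "[${FanSub}]${Title} - ${Episode}${Extension}";
        Console.WriteLine(Rename("TestFile 101.mkv", pattern, replacement));          // TestFile - 101.mkv
        Console.WriteLine(Rename("[Testing]TestFile 101.mkv", pattern, replacement)); // [Testing]TestFile - 101.mkv
    }
}
```

Because the cleanup step is itself just a pattern and a replacement string, it could be kept in user configuration alongside the format, so no recompilation is needed.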

Regex index in matching string where the match failed

I am wondering if it is possible to extract the index position in a given string where a Regex failed when trying to match it?
For example, if my regex was "abc" and I tried to match that with "abd" the match would fail at index 2.
Edit for clarification. The reason I need this is to allow me to simplify the parsing component of my application. The application is an Assembly language teaching tool which allows students to write, compile, and execute assembly-like programs.
Currently I have a tokenizer class which converts input strings into Tokens using regex's. This works very well. For example:
The tokenizer would produce the following tokens given the following input = "INP :x:":
Token.OPCODE, Token.WHITESPACE, Token.LABEL, Token.EOL
These tokens are then analysed to ensure they conform to a syntax for a given statement. Currently this is done using IF statements and is proving cumbersome. The upside of this approach is that I can provide detailed error messages. I.E
if(token[2] != Token.LABEL) { throw new SyntaxError("Expected label");}
I want to use a regular expression to define a syntax instead of the annoying IF statements. But in doing so I lose the ability to return detailed error reports. I therefore would at least like to inform the user of WHERE the error occurred.
I agree with Colin Younger, I don't think it is possible with the existing Regex class. However, I think it is doable if you are willing to sweat a little:
Get the Regex class source code (e.g. http://www.codeplex.com/NetMassDownloader to download the .Net source).
Change the code to have a readonly property with the failure index.
Make sure your code uses that Regex rather than Microsoft's.
I guess such an index would only have meaning in some simple cases, like in your example.
If you take a regex like "ab*c*z" (where by * I mean any character) and a string "abbbcbbcdd", which index should be reported?
It will depend on the algorithm used for matching...
It could fail on "abbbc..." or on "abbbcbbc..."
I don't believe it's possible, but I am intrigued why you would want it.
In order to do that you would need either callbacks embedded in the regex (which AFAIK C# doesn't support) or preferably hooks into the regex engine. Even then, it's not clear what result you would want if backtracking was involved.
It is not possible to tell where a regex fails, so you need to take a different approach: compare strings. Use a regex to remove all the parts that can vary, and compare the result with the string that you know does not change.
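For the literal case from the question ("abc" against "abd"), the comparison idea can be sketched as a first-mismatch search. This covers only plain strings, not general regex patterns, and FirstMismatch is a hypothetical helper name.

```csharp
using System;

public static class MismatchDemo
{
    // Returns the index of the first differing character, or -1 when the strings are equal.
    public static int FirstMismatch(string expected, string actual)
    {
        int shared = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < shared; i++)
        {
            if (expected[i] != actual[i]) return i;
        }
        // One string is a prefix of the other: the mismatch is where the shorter one ends.
        return expected.Length == actual.Length ? -1 : shared;
    }

    public static void Main()
    {
        Console.WriteLine(FirstMismatch("abc", "abd")); // 2
    }
}
```

The same loop works on a sequence of tokens instead of characters, which may fit the tokenizer described in the question better.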
I ran into the same problem, came across your question, and had to work out my own solution. Here it is:
https://stackoverflow.com/a/11730035/637142
hope it helps
