regex.replace Querystring parameters - c#

I don't know if this is even possible. I have the following regular expression, (?<=[\?|\&])(?<key>[^\?=\&\#]+)=?(?<value>[^\?=\&\#]*)&, which splits a URL into key/value pairs. I would like to use this in a replace function to build some markup:
string sTest = "test.aspx?width=100&height=200&";
ltTest.Text = Regex.Replace(sTest, @"(?<=[\?|\&])(?<key>[^\?=\&\#]+)=?(?<value>[^\?=\&\#]*)&",
"<div style='width:$2px; height:$2px; border:solid 1px red;'>asdf</div>");
This is generating:
test.aspx?<div style='width:100px; height:100px; border:solid 1px red;'>asdf</div><div style='width:200px; height:200px; border:solid 1px red;'>asdf</div>
Any ideas?
Thanks in advance!

First, .NET has better ways of dealing with your problem. Consider HttpUtility.ParseQueryString:
string urlParameters = "width=100&height=200";
NameValueCollection parameters = HttpUtility.ParseQueryString(urlParameters);
s = String.Format("<div style='width:{0}px; height:{1}px;'>asdf</div>",
parameters["width"], parameters["height"]);
That takes care of escaping for you, so it is a better option.
Next, to the question: your code fails because you're using it wrong. You're looking for pairs of key=value, and replacing every pair with <div width={value} height={value}>. So you end up with as many DIVs as values.
You should make a more surgical match, for example (with some added checks):
string width = Regex.Match(s, @"width=(\d+)").Groups[1].Value;
string height = Regex.Match(s, @"height=(\d+)").Groups[1].Value;
s = String.Format("<div style='width:{0}px; height:{1}px;'>asdf</div>",
width, height);

Is there a reason why you would want to use a regular expression to handle this instead of a Request.QueryString function to grab the data and put this into the string instead?
Right now you would have to write a much more specific regular expression to get the value of each key/value pair and put it into the replacement.
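If you do want to stay with a regex, one way to read this answer is to collect the pairs first and build the markup once afterwards. The following is only a sketch under assumptions: the class and method names (QueryStringSketch, ParsePairs, BuildDiv) are made up for illustration, the pattern is a simplified variant of the one in the question, and values are not URL-decoded.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original question or answers.
public static class QueryStringSketch
{
    // Collects key/value pairs that follow '?' or '&' in a URL.
    public static Dictionary<string, string> ParsePairs(string url)
    {
        var pairs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        // Each match is one "key=value" pair; named groups keep the code readable.
        foreach (Match m in Regex.Matches(url, @"(?<=[?&])(?<key>[^?=&#]+)=(?<value>[^?=&#]*)"))
        {
            pairs[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return pairs;
    }

    // Builds a single div after all pairs have been read.
    // Throws KeyNotFoundException if width or height is missing.
    public static string BuildDiv(string url)
    {
        Dictionary<string, string> p = ParsePairs(url);
        return string.Format(
            "<div style='width:{0}px; height:{1}px; border:solid 1px red;'>asdf</div>",
            p["width"], p["height"]);
    }
}
```

Calling QueryStringSketch.BuildDiv("test.aspx?width=100&height=200&") should then produce one div with width 100 and height 200 instead of one div per pair.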

Related

C# appending string from variable inside double quotes

Hi I have the following line:
var table = @"<table id=""table_id"" class=""display"">
which is building a table and continues on the next line but I'm just trying to append a string at the end of table_id :
var table = @"<table id=""table_id" + instance + """ class=""display"">
so the final output (if instance = 1234) should be:
<table id="table_id1234" class="display">
But I think the quotes are throwing it off. Any suggestions on how to achieve the last line?
Thanks
A string.Format method placeholder is enough to concatenate instance without cutting through quote signs ({0} is the placeholder):
var table = string.Format(@"<table id=""table_id{0}"" class=""display"">", instance);
Or you can use the escape sequence \" to escape quotes in a regular (non-verbatim) string literal:
var table = "<table id=\"table_id" + instance + "\" class=\"display\">";
Result:
<table id="table_id1234" class="display">
Try to use escape character for double quote(\") using this code:
var id = "1234";
var table = "<table id=\"table_id" + id + "\" class=\"display\">";
Here is an online tool for converting string to escape/unescape:
https://www.freeformatter.com/java-dotnet-escape.html
So you can copy the result and place your variables.
I think the best and most recent option for this situation is string interpolation: put a $ sign before the string literal and you no longer need to concatenate. Using single quotes for the HTML attributes also avoids escaping.
Example:
var table = $"<table id='table_id{instance}' class='display'>";
@ marks a verbatim string, in which double quotes are escaped by doubling them. It applies to one string literal only, and in your example you are actually concatenating three different strings, so you should prefix the third string with @ as well, like this:
var table = @"<table id=""table_id" + instance + @""" class=""display"">";
Alternatively, you could also use the StringBuilder class which is more memory efficient and might make your strings easier to read.
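A minimal sketch of the StringBuilder variant, assuming instance is already a string. The class and method names here (TableMarkupSketch, BuildTableTag) are invented for illustration. For only three pieces the memory benefit is negligible; it starts to matter when the table is assembled from many rows in a loop.

```csharp
using System.Text;

// Hypothetical helper, shown only to illustrate the StringBuilder suggestion.
public static class TableMarkupSketch
{
    public static string BuildTableTag(string instance)
    {
        var sb = new StringBuilder();
        // Regular string literals: quotes are escaped with a backslash.
        sb.Append("<table id=\"table_id");
        sb.Append(instance);
        sb.Append("\" class=\"display\">");
        return sb.ToString();
    }
}
```

With instance = "1234", BuildTableTag should return the tag from the question, with id table_id1234.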

c# read the value of a changing Progressbar in webbrowser

I am kinda new to c# (spent my time in delphi before) and I am having trouble finding this out:
the Html code of the website is this:
<div class="progress-bar progress-bar-danger" id="counter" style="width: 10.%; overflow: hidden;"></div>
I am trying to figure out sth like this:
var CheckValue = webBrowser1.Document.GetElementById("counter"); if (counter.style.width > 70%) { //code }
So basically what I'm trying to do is:
I want to check whether the progress bar on the website is filled by more than 70%. If it is, it should execute some code; if it isn't, it should try again after a few seconds.
If you need any more information just tell me!
Thanks
You can use CheckValue.Style, which will return a string containing the style. Then you can use Regex to find what you are looking for.
You want your regex to match the digits between the width: and the .%. You can use this for that:
width: ([0-9]+(\.[0-9]+)?)\.?%
This will match every string starting with width: and ending with % with the possibility of a . before the %, with at least 1 character between 0 and 9.
You can use this code to get this value:
var checkValue = webBrowser1.Document.GetElementById("counter");
Regex regex = new Regex("width: ([0-9]+(\\.[0-9]+)?)\\.?%");
Match match = regex.Match(checkValue.Style);
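// Check that the pattern actually matched
if (match.Success)
{
String s = match.Groups[1].ToString();
// Parse with the invariant culture so "10.5" is read the same way on every machine
int width = (int)double.Parse(s, System.Globalization.CultureInfo.InvariantCulture);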
}
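To get from the parsed value to the "more than 70%" check the question asks about, the parsing can be wrapped in a small helper. This is a sketch under assumptions: ProgressSketch, TryGetWidthPercent and IsAbove are hypothetical names, and the pattern would also match a max-width or min-width declaration if the style contained one.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper for reading a percentage width out of an inline style string.
public static class ProgressSketch
{
    // Digits with an optional decimal part, then an optional stray '.', then '%'.
    private static readonly Regex WidthPattern =
        new Regex(@"width:\s*([0-9]+(?:\.[0-9]+)?)\.?%", RegexOptions.IgnoreCase);

    public static bool TryGetWidthPercent(string style, out double percent)
    {
        percent = 0;
        if (string.IsNullOrEmpty(style)) return false;
        Match match = WidthPattern.Match(style);
        // Invariant culture: the style always uses '.' as the decimal separator.
        return match.Success &&
               double.TryParse(match.Groups[1].Value, NumberStyles.Float,
                               CultureInfo.InvariantCulture, out percent);
    }

    // True only when a width was found and it is strictly above the threshold.
    public static bool IsAbove(string style, double threshold)
    {
        double percent;
        return TryGetWidthPercent(style, out percent) && percent > threshold;
    }
}
```

For the retry part, one option is to call ProgressSketch.IsAbove(checkValue.Style, 70) from the Tick handler of a System.Windows.Forms.Timer and stop the timer once it returns true.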

AppendLine is not inserting new line?

I have the following:
StringBuilder errors = new StringBuilder();
if(IsNullOrEmpty(value))
{
errors.AppendLine("Enter value");
}
if(IsNullOrEmpty(value2))
{
errors.AppendLine("Enter value 2");
}
I would expect this to display:
Enter value
Enter value 2
But it is displaying:
Enter value Enter value 2
I have also tried: AppendFormat("Enter value{0}",Environment.NewLine);
as well as with the \n character.
The errors string is outputted to an asp:Label like:
lblErrors.Text = errors.ToString();
As mentioned in some of the comments, HTML does not respect the new line character \n. You need to use <br/> instead.
If you'd like to preserve all formatting (including tabs, consecutive white-space, etc), you can apply the white-space:pre style to your label or use an html pre element.
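A minimal sketch of the <br/> approach, assuming the messages are plain text that should be HTML-encoded before being assigned to the label. ErrorMarkupSketch and ToHtmlLines are hypothetical names, not part of the original code.

```csharp
using System.Net;
using System.Text;

// Hypothetical helper that turns StringBuilder output into HTML line breaks.
public static class ErrorMarkupSketch
{
    public static string ToHtmlLines(StringBuilder errors)
    {
        // Drop the trailing newline left by the last AppendLine, then encode
        // the text so it cannot inject markup into the page.
        string encoded = WebUtility.HtmlEncode(errors.ToString().TrimEnd());
        // Handle both Windows (\r\n) and Unix (\n) newlines.
        return encoded.Replace("\r\n", "<br/>").Replace("\n", "<br/>");
    }
}
```

The label assignment would then become lblErrors.Text = ErrorMarkupSketch.ToHtmlLines(errors);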
I would try something like
if (string.IsNullOrEmpty(value))
{
errors.AppendLine("Enter Value");
}
From the code you are showing, it seems like execution doesn't go into the if statement.
You can also append a new line explicitly with errors.Append(Environment.NewLine);
@Brian In C#, the Windows newline sequence is "\r\n" rather than just "\n"; Environment.NewLine returns the right one for the platform. :)

c# regular expression to match img src="*" type URLs

I have a regex in c# that i'm using to match image tags and pull out the URL. My code is working in most situations. The code below will "fix" all relative image URLs to Absolute URLs.
The issue is that the regex will not match the following:
<img height="150" width="202" alt="" src="../Image%20Files/Koala.jpg" style="border: 0px solid black; float: right;">
For example it matches this one just fine
<img height="147" width="197" alt="" src="../Handlers/SignatureImage.ashx?cid=5" style="border: 0px solid black;">
Any ideas on how to make it match would be great. I think the issue is the % but I could be wrong.
Regex rxImages = new Regex(" src=\"([^\"]*)\"", RegexOptions.IgnoreCase & RegexOptions.IgnorePatternWhitespace);
mc = rxImages.Matches(html);
if (mc.Count > 0)
{
Match m = mc[0];
string relitiveURL = html.Substring(m.Index + 6, m.Length - 7);
if (relitiveURL.Substring(0, 4) != "http")
{
Uri absoluteUri = new Uri(baseUri, relitiveURL);
ret += html.Substring(0, m.Index + 5);
ret += absoluteUri.ToString();
ret += html.Substring(m.Index + m.Length - 1, html.Length - (m.Index + m.Length - 1));
ret = convertToAbsolute(URL, ret);
}
}
Using RegEx to parse images in this way is a bad idea. See here for a good demonstration of why.
You can use an HTML parser such as the HTML Agility Pack to parse the HTML and query it using XPath syntax.
First, I would try to skip all the manual parsing and use linq to html
HDocument document = HDocument.Load("http://www.microsoft.com");
foreach (HElement element in document.Descendants("img"))
{
Console.WriteLine("src = " + element.Attribute("src"));
}
If that didn't work, only then would I go back to manual parsing and I'm sure one of the fine gentle-people here has already posted a working regex for your needs.
Regex is a bad idea; better to use an HTML parser. Here is a regex I used (in Java) for parsing links, though:
String body = "..."; //body of the page
Matcher m = Pattern.compile("(?im)(?:(?:(?:href)|(?:src))[ ]*?=[ ]*?[\"'])(((?:http|https)(?::\\/{2}[\\w]+)(?:[\\/|\\.]?)(?:[^\\s\"]*))|((?:\\/{0,1}[\\w\\.]+)+))[\"']").matcher(body);
while(m.find()){
String absolute = m.group(2);
String relative = m.group(3);
}
It's a lot easier with a parser though, and lighter on resources. Here is a link showing what I eventually wrote when I switched to a parser:
http://notetodogself.blogspot.com/2007/11/extract-links-using-htmlparser.html
Probably not as helpful, since that was Java and you need C#.
I don't know what your program does, but I'm guessing this is an example of something you would do in 5 minutes from the command line in linux. You can download windows versions of many of the same tools (sed, for instance) and save yourself the hassle of writing all that code.

.NET String parsing performance improvement - Possible Code Smell

The code below is designed to take a string in and remove any of a set of arbitrary words that are considered non-essential to a search phrase.
I didn't write the code, but need to incorporate it into something else. It works, and that's good, but it just feels wrong to me. However, I can't seem to get my head outside the box that this method has created to think of another approach.
Maybe I'm just making it more complicated than it needs to be, but I feel like this might be cleaner with a different technique, perhaps by using LINQ.
I would welcome any suggestions; including the suggestion that I'm over thinking it and that the existing code is perfectly clear, concise and performant.
So, here's the code:
private string RemoveNonEssentialWords(string phrase)
{
//This array is being created manually for demo purposes. In production code it's passed in from elsewhere.
string[] nonessentials = {"left", "right", "acute", "chronic", "excessive", "extensive",
"upper", "lower", "complete", "partial", "subacute", "severe",
"moderate", "total", "small", "large", "minor", "multiple", "early",
"major", "bilateral", "progressive"};
int index = -1;
for (int i = 0; i < nonessentials.Length; i++)
{
index = phrase.ToLower().IndexOf(nonessentials[i]);
while (index >= 0)
{
phrase = phrase.Remove(index, nonessentials[i].Length);
phrase = phrase.Trim().Replace("  ", " ");
index = phrase.IndexOf(nonessentials[i]);
}
}
return phrase;
}
Thanks in advance for your help.
Cheers,
Steve
This appears to be an algorithm for removing stop words from a search phrase.
Here's one thought: If this is in fact being used for a search, do you need the resulting phrase to be a perfect representation of the original (with all original whitespace intact), but with stop words removed, or can it be "close enough" so that the results are still effectively the same?
One approach would be to tokenize the phrase (using the approach of your choice - could be a regex, I'll use a simple split) and then reassemble it with the stop words removed. Example:
public static string RemoveStopWords(string phrase, IEnumerable<string> stop)
{
var tokens = Tokenize(phrase);
var filteredTokens = tokens.Where(s => !stop.Contains(s));
return string.Join(" ", filteredTokens.ToArray());
}
public static IEnumerable<string> Tokenize(string phrase)
{
return phrase.Split(' ');
// Or use a regex, such as:
// return Regex.Split(phrase, @"\W+");
}
This won't give you exactly the same result, but I'll bet that it's close enough and it will definitely run a lot more efficiently. Actual search engines use an approach similar to this, since everything is indexed and searched at the word level, not the character level.
I guess your code is not doing what you want it to do anyway: "moderated" would be converted to "d", if I'm right. To get a good solution you have to specify your requirements in a bit more detail. I would probably use Replace or regular expressions.
I would use a regular expression (created inside the function) for this task. I think it would be capable of doing all the processing at once without having to make multiple passes through the string or having to create multiple intermediate strings.
private string RemoveNonEssentialWords(string phrase)
{
return Regex.Replace(phrase, // input
@"\b(" + String.Join("|", nonessentials) + @")\b", // pattern
"", // replacement
RegexOptions.IgnoreCase)
.Replace("  ", " ");
}
The \b at the beginning and end of the pattern makes sure that the match is on a boundary between alphanumeric and non-alphanumeric characters. In other words, it will not match just part of the word, like your sample code does.
Yeah, that smells.
I like little state machines for parsing, they can be self-contained inside a method using lists of delegates, looping through the characters in the input and sending each one through the state functions (which I have return the next state function based on the examined character).
For performance I would flush out whole words to a string builder after I've hit a separating character and checked the word against the list (might use a hash set for that)
I would create a hash table of the words to remove, then parse the phrase word by word and drop each word that is found in the hash. That takes only one pass through the words, and I believe creating the hash table is O(n).
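The hash-based idea could look roughly like this. It is a sketch, assuming words are separated by spaces only and that matching should ignore case. StopWordSketch and RemoveStopWords are illustrative names, and punctuation attached to a word (for example "left,") is not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the hash set approach.
public static class StopWordSketch
{
    public static string RemoveStopWords(string phrase, IEnumerable<string> stopWords)
    {
        // Building the set is O(n) in the number of stop words; lookups are O(1) on average.
        var stop = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);

        // Single pass over the words; empty entries from repeated spaces are dropped.
        IEnumerable<string> kept = phrase
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !stop.Contains(word));

        return string.Join(" ", kept);
    }
}
```

Because whole words are compared, "moderated" is left alone when "moderate" is in the list. In production code the set would normally be built once and reused rather than rebuilt on every call.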
How does this look?
foreach (string nonEssent in nonessentials)
{
// Strings are immutable, so the result of Replace must be assigned back
phrase = phrase.Replace(nonEssent, String.Empty);
}
phrase = phrase.Replace("  ", " ");
If you want to go the Regex route, you could do it like this. If you're going for speed it's worth a try and you can compare/contrast with other methods:
Start by creating a Regex from the array input. Something like:
var regexString = "\\b(" + string.Join("|", nonessentials) + ")\\b";
That will result in something like:
\b(left|right|chronic)\b
Then create a Regex object to do the find/replace:
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(regexString, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Then you can just do a Replace like so:
string fixedPhrase = regex.Replace(phrase, "");
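Put together, the steps above might look like the sketch below. The names StopWordRegexSketch, BuildPattern and Clean are assumptions for illustration. Two things are added that the answer does not mention: Regex.Escape, in case a word ever contains a regex metacharacter, and a whitespace cleanup after the replace.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper combining the regex steps into two reusable methods.
public static class StopWordRegexSketch
{
    // Build the Regex once and reuse it for every phrase.
    public static Regex BuildPattern(string[] nonessentials)
    {
        string alternatives = string.Join("|", nonessentials.Select(w => Regex.Escape(w)));
        // \b on both sides restricts matches to whole words.
        return new Regex(@"\b(?:" + alternatives + @")\b", RegexOptions.IgnoreCase);
    }

    public static string Clean(string phrase, Regex pattern)
    {
        string stripped = pattern.Replace(phrase, "");
        // Collapse the gaps left behind by removed words.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Whether this is faster than the loop-based versions depends on the input, so it is worth measuring with realistic phrases before committing to it.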
