Replacing only a certain part of URL - c#

I am facing the following problem: I have to change a certain part of a URL if it contains a specific match ("siteassets") and transform it into a different word ("syssiteassets"). The word that needs to be replaced can occur at various positions in the URL, so one time it can be "example.com/siteassets/title/index" and another time it can be "example.com/de/items/siteassets/title/index".
I have tried my luck with pretty simple approach:
if (e.UrlBuilder.Path.Contains("siteassets") && (e.UrlBuilder.Path.Contains(".pdf") || e.UrlBuilder.Path.Contains(".dwg")))
{
e.UrlBuilder.Path = e.UrlBuilder.Path.Replace("siteassets", "syssiteassets");
}
...but since this if statement is in a middleware method through which requests run multiple times, the already changed string goes from "syssiteassets" to "syssyssiteassets".
What is the best way to deal with this? I cannot use REGEX (not my decision).

Is "siteassets" always surrounded by "/"? I hope so, otherwise you might get weird bugs.
If yes, why not just:
e.UrlBuilder.Path = e.UrlBuilder.Path.Replace("/siteassets/", "/syssiteassets/");
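If the path is always made of "/"-separated segments, the same idea can be sketched by comparing whole segments instead of substrings, which also needs no regex. This is only a minimal sketch under that assumption; `UrlRewriter` and `RewriteSiteAssets` are hypothetical names that do not appear in the original code.

```csharp
using System;

public static class UrlRewriter
{
    // Replaces every path segment that is exactly "siteassets" with "syssiteassets".
    // A segment that already reads "syssiteassets" is not equal to "siteassets",
    // so running the method again leaves the path unchanged.
    public static string RewriteSiteAssets(string path)
    {
        string[] segments = path.Split('/');
        for (int i = 0; i < segments.Length; i++)
        {
            if (string.Equals(segments[i], "siteassets", StringComparison.Ordinal))
            {
                segments[i] = "syssiteassets";
            }
        }
        return string.Join("/", segments);
    }
}
```

Because the comparison is on whole segments, a path such as "/mysiteassets/file.pdf" is left alone, and the middleware can call the method on every pass without the prefix piling up.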

Related

Regex hangs trying to find match

I am trying to match an assignment string in VB code (as in I'm passing in text that is VB code into my program that's written in C#). The assignment string that I'm trying to match is something for example like
CustomClassInitializer(someParameter, anotherParameter, someOtherClassAsParameterWithInitialization()).SomeProperty = 7
and I realize that's rather complex, but it actually isn't far off from some of the real text I'm trying to match.
In order to do so I wrote a Regex. This Regex:
@"[\w,.]+\(([\w,.]*\(*,* *\)*)+ = "
which correctly matches. The problem is it becomes VERY slow (with timeouts), which I've researched and found is probably because of "backtracking". One of the suggested solutions to help with backtracking in general was to add "?>" to the regex, which I think would go in this position:
[\w,.]+\(?>([\w,.]*\(*,* *\)*)+ =
but this no longer matches properly.
I'm fairly new to Regex, so I imagine that there is a much better pattern. What is it please? Or how can I improve my times in general?
Helpful notes:
I'm only interested in position 0 of the string I'm searching for a match in. My code is "if (isMatch && match.index == 0) { ... }". Can I tell it to only check position 0 and, if it's not a match, move on?
The reason I use all the 0 or more things is the match could be as simple as CustomClass() = new CustomClass(), and as complicated as the above or perhaps a bit worse. I'm trying to get as many cases as possible.
This Regex is interested in "[\w,.]+(" and then "whatever may be inside the parentheses" (I tried to think of everything that could be inside them, given that it's valid VB code) until you get to the close parenthesis and then " = ". Perhaps I can use a wildcard for literally anything until it gets to ") = " in the string? Like I said, I'm fairly new to Regex.
Thanks in advance!
This seems to do what you want. Normally, I like to be more specific than .*, but it is working correctly. Note that I am using the Multi-line option.
^.*=\s*.+$
A working example is available on RegExStorm.net.
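On the question of checking only position 0: in .NET, `\A` anchors a pattern to the very start of the input, and the `Regex` constructor accepts a match timeout. The sketch below combines both. The pattern is an illustrative assumption, deliberately looser than the original (it accepts anything between the first "(" and a later ")"), and `AssignmentMatcher` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AssignmentMatcher
{
    // \A anchors the match at the start of the input, so no later start positions are tried.
    // A single greedy ".*" replaces the nested quantifiers that caused heavy backtracking.
    private static readonly Regex AssignmentAtStart = new Regex(
        @"\A[\w.]+\(.*\)(\.\w+)* = ",
        RegexOptions.None,
        TimeSpan.FromSeconds(1)); // assumed limit; a RegexMatchTimeoutException is thrown if exceeded

    public static bool StartsWithAssignment(string line)
    {
        return AssignmentAtStart.IsMatch(line);
    }
}
```

With the anchor in place the `match.Index == 0` check becomes unnecessary. The pattern does not validate what sits inside the parentheses, so it is a pre-filter rather than a VB parser.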

How to check the equality of two identical URLs if one of them has "www"?

The following URLs are identical:
http://example.com
http://www.example.com
Can I get the expected result using a method provided by the .NET Framework, like the Compare() method in the Uri class, or should I handle this case manually?
Unfortunately I don't have enough rep to leave a simple comment for you to ponder about so I'll just leave an answer as food for thought.
I can see a few possible solutions which should cover most scenarios (I'm pretty certain I will probably miss quite a few but everyone makes mistakes):
Use a loop to move over both URIs and compare each part/component as you see fit with break early conditions to speed it up
Clean up the URIs as much as you can using arbitrarily defined rules (eg. remove protocol, remove www prefix, trim ending /) then use Equals() to compare
If you're feeling bold you could use something similar to lexical analysis to convert both URIs into a objects/tokens/parts and compare the subsequent result (harder to implement but probably the most accurate)
Just keep in mind that the guys in the comments are right. The URLs aren't technically the same and the logic you implement in your ultimate solution is purely defined around your definition of 'identical'.
Also, I wouldn't use the solution that simply replaces "www." with "", since someone could easily put a 'www.' somewhere else in their URL and break that implementation. Performing the replace on both URLs is also quite risky, since one could have more 'www.' instances than the other and the two would still be considered 'identical'.
As mentioned in the comments, these are actually not identical urls and cannot be treated as such.
That aside, it may help you to check the equality after removing the www part:
string urla = @"http://example.com";
string urlb = @"http://www.example.com";
if (urlb.Contains("www.")) urlb = urlb.Replace("www.", "");
if (urla == urlb) {
    // url matches
}
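The "clean up, then compare" idea can also be sketched with System.Uri, stripping only a leading "www." from the host so that a "www." elsewhere in the URL is not touched. The rule that a leading "www." is insignificant is an assumption about what 'identical' means here, and `UrlComparer` is a hypothetical helper.

```csharp
using System;

public static class UrlComparer
{
    // Removes one leading "www." label from a host name, if present.
    private static string StripWww(string host)
    {
        return host.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? host.Substring(4)
            : host;
    }

    // Treats two absolute URLs as equivalent when scheme, port, path and query match
    // and the hosts differ at most by a leading "www.".
    public static bool AreEquivalent(string a, string b)
    {
        if (!Uri.TryCreate(a, UriKind.Absolute, out var ua) ||
            !Uri.TryCreate(b, UriKind.Absolute, out var ub))
        {
            return false; // anything that does not parse is never considered equivalent
        }

        return string.Equals(ua.Scheme, ub.Scheme, StringComparison.OrdinalIgnoreCase)
            && ua.Port == ub.Port
            && string.Equals(StripWww(ua.Host), StripWww(ub.Host), StringComparison.OrdinalIgnoreCase)
            && string.Equals(ua.PathAndQuery, ub.PathAndQuery, StringComparison.Ordinal);
    }
}
```

Whether trailing slashes, fragments, or http versus https should also be ignored depends on your own definition of 'identical'; this sketch ignores none of them.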

Automatically escaping characters within strings

I am working on a C# project where I am exporting data from a database that is defined by the user so I have no idea what the data is going to contain or the format it is going to be in.
Some of the strings within the database might include apostrophes (') which I need to escape, but everything I've found on the internet shows that I would have to do string.replace("'", "\'"); which seems a bit odd, as it would mean a mass of replace statements for every possibility.
Isn't there a better way to do this?
Thanks for any help you can provide.
I recently had to make a code fix for this same problem. I had to put a ton of string.replace() statements everywhere. My recommendation would be to create a method that handles all escape character possibilities and have your query strings pass through this method before being executed. If you design your structure correctly you should only have to call this method once.
public string FixEscapeCharacterSequence(string query)
{
    // Note: "\'" is the same string as "'" in C#, so the backslash itself must be escaped.
    query = query.Replace("'", "\\'");
    //..Any other replace statements you need
    //....
    return query;
}
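A single-pass variant of such a method avoids chaining Replace calls, where a later call can re-escape the output of an earlier one. This is a minimal sketch; the set of escaped characters (backslash, apostrophe, double quote) and the name `EscapeHelper` are assumptions for illustration and would need to match whatever format the export targets.

```csharp
using System.Text;

public static class EscapeHelper
{
    // Walks the input once and prefixes each special character with a backslash.
    // Because the output is never scanned again, nothing gets escaped twice.
    public static string Escape(string input)
    {
        var sb = new StringBuilder(input.Length + 8);
        foreach (char c in input)
        {
            if (c == '\\' || c == '\'' || c == '"')
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

If the strings end up inside SQL statements, parameterized commands remove the need for manual escaping altogether.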

Which is the best way to Prevent Bad word entry in Discussion module in my site

In my web site some hackers are entering bad words. Which is the best way to prevent this?
I am using ASP.NET, C# and SQL Server as resources.
Check bad words in the form backend?
Check bad words in JavaScript?
Check bad words in a stored procedure before insert?
I think the first method is best.
Please suggest optimized code for this check.
This is the method I am using now:
var filterWords = ["fool", "dumb", "couch potato"];
// "i" is to ignore case and "g" for global
var rgx = new RegExp(filterWords.join("|"), "gi");
function wordFilter(str) {
return str.replace(rgx, "****");
}
// call the function
document.write("Original String - ");
document.writeln("You fool. Why are you so dumb <br/>");
document.write("Replaced String - ");
document.writeln(wordFilter("You fool. Why are you so dumb"));
You should check in the ASP.NET code, on the server side. JavaScript or any other client side check can be easily worked around. The code you posted works fine, except it is not particularly robust (a variety of simple misspellings will get around it).
make sure to check for permutations such as
Secure --> $3(ur3
And I would replace the word with something like
[REMOVED] or [CENSORED]
Having words like s***t still can be viewed as offensive to customers/others.
Edit: Seeing HevyLight's thoughts on JavaScript usage here... you might try a string filter in your C# layer (assuming that layer is already doing the heavy lifting and the database calls). Pass all posted strings through the filter before writing them to the database (and before others can see them).
Reality is that you can’t prevent 100% of bad words.
I’d go with a two-step verification on the server side (JS can be disabled and SQL is not really suitable for handling this)
Create a list of the most common bad words; this will probably catch around 80% of all inputs.
Create a list of patterns for suspect words that will flag posts for manual verification.
This could be patterns such as
a) Word contains two or more * characters
b) Word contains letters and one of the following characters: 0, 3, $, and others
In time you’ll just have to keep both lists updated. Again, this will not solve 100% of cases but it will probably catch and fix like 95% if implemented properly.
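The two-step server-side check described above can be sketched in C# roughly as follows. The word list, the replacement text and the suspect rules are illustrative assumptions taken from the examples in this thread, and `WordFilter` is a hypothetical helper, not an existing library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFilter
{
    // Example list only; a real list would come from configuration or a database.
    private static readonly string[] BlockedWords = { "fool", "dumb", "couch potato" };

    // \b...\b restricts matches to whole words; Regex.Escape guards against special characters.
    private static readonly Regex Blocked = new Regex(
        @"\b(" + string.Join("|", BlockedWords.Select(Regex.Escape)) + @")\b",
        RegexOptions.IgnoreCase);

    // Step 1: replace known bad words outright.
    public static string Censor(string text)
    {
        return Blocked.Replace(text, "[CENSORED]");
    }

    // Step 2: flag text for manual review when a word contains two or more '*'
    // or mixes letters with common substitute characters (0, 3, $).
    public static bool NeedsReview(string text)
    {
        char[] separators = { ' ', '\t', '\r', '\n' };
        foreach (string word in text.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            bool hasLetter = word.Any(char.IsLetter);
            bool hasSubstitute = word.IndexOfAny(new[] { '0', '3', '$' }) >= 0;
            int stars = word.Count(c => c == '*');
            if (stars >= 2 || (hasLetter && hasSubstitute))
            {
                return true;
            }
        }
        return false;
    }
}
```

Both methods would run in the ASP.NET code before the post is written to the database; the review flag will produce false positives (product codes, for example), which is why it routes to a person instead of rejecting the post.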

Need C# regexp for URL validation

How can I validate the following URLs with a single regular expression:
http://83.222.4.42:8880/listen.pls
http://www.my_site.com/listen.pls
http://www.my.site.com/listen.pls
to be true?
I see that I did not formulate the question precisely, sorry, my mistake. The idea is to validate URLs with a regexp, whether the host is an external IP address or a domain name. Other valid URLs can be considered too:
http://93.122.34.342/
http://193.122.34.342/abc/1.html
http://www.my_site.com/listen2.pls
http://www.my.site.com/listen.php
and so on.
The road to hell is paved with string parsing.
URL parsing in particular is the source of many, many exploited security issues. Don't do it.
For example, do you want this to match?
Note the uppercase scheme section. Remember that some parts of a URL are case sensitive, and some are not. Then there's encoding rules. Etc.
Start by using System.Uri to parse the URLs you provide:
var uri = new Uri("http://83.222.4.42:8880/listen.pls");
Then you can write things like:
if (uri.Scheme == "http" &&
uri.Host == "83.222.4.42" &&
uri.AbsolutePath == "/listen.pls"
)
{
// ...
}
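For input that may not be a URL at all, `Uri.TryCreate` avoids the exception that the constructor throws on malformed strings. The following is a minimal sketch of a general check built on it; restricting the scheme to http and https is an assumption, and `UrlValidator` is a hypothetical name.

```csharp
using System;

public static class UrlValidator
{
    // Returns true when the input parses as an absolute URI with an http or https scheme.
    public static bool IsHttpUrl(string candidate)
    {
        return Uri.TryCreate(candidate, UriKind.Absolute, out var uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```

Since `Uri` normalizes the scheme to lowercase, an input with an uppercase scheme is handled without extra code.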
^http://.+/listen\.pls$
If there are strictly only 3 of them don't bother with a regular expression because there is not necessarily a good pattern match when everything is already strictly known - in fact you might accidentally match more than these three urls - which becomes a problem if the urls are intended for security purposes or something equally important. Instead, test the three cases directly - maybe put them in a configuration file.
In the future if you want to add more URLs to the list you'll likely end up with an overly complicated regular expression that's increasingly hard to maintain and takes the place of a simpler check against a small list.
You won't necessarily get speed gains by running Regex to find these three strings - in fact it might be quite expensive.
Note: If you want URI regular expressions, also try websites hosting libraries like Regex Library. There are many to pick and choose from if your needs change.
/^http:\/\/[-_a-zA-Z0-9.]+(:\d+)?\/listen\.pls$/
Do you mean any URL ending with /listen.pls? In that case try this:
^http://[^/]+/listen\.pls$
or if the protocol identifier must be optional:
^(http://)?[^/]+/listen\.pls$
Anyway take a look here, maybe it is useful for you: Url and Email validation using Regex
A modified version based on Jay Bazuzi's solution above, posted as an answer since I can't put code in a comment. It checks the URL against a list of blacklisted extensions (for demonstration only; you should strongly consider building a whitelist rather than a blacklist):
string myurl = "http://www.my_site.com/listen.pls";
Uri myUri = new Uri(myurl);
string[] invalidExtensions = {
    ".pls",
    ".abc"
};
// Compare case-insensitively so that ".PLS" is caught as well.
string extension = System.IO.Path.GetExtension(myUri.AbsolutePath);
foreach (string invalidExtension in invalidExtensions) {
    if (string.Equals(invalidExtension, extension, StringComparison.OrdinalIgnoreCase)) {
        //Logic here
    }
}
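The whitelist variant recommended above could look roughly like this. The allowed extensions are made-up examples and `ExtensionPolicy` is a hypothetical name; the case-insensitive `HashSet` makes the lookup independent of how the URL spells the extension.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ExtensionPolicy
{
    // Hypothetical allowlist; the entries are examples only.
    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".pls", ".m3u" };

    // Returns true only when the URL parses and its path ends in an allowed extension.
    public static bool HasAllowedExtension(string url)
    {
        if (!Uri.TryCreate(url, UriKind.Absolute, out var uri))
        {
            return false;
        }
        string extension = Path.GetExtension(uri.AbsolutePath);
        return AllowedExtensions.Contains(extension);
    }
}
```

Anything not explicitly listed is rejected, including URLs with no extension at all, which is the main advantage over a blacklist.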
