Tweaking my search functionality - c#

I have tested the search functionality I implemented on a live website and came across some small issues. I can't put special characters in the search box or my application will crash.
I tried to solve this by replacing the characters it crashes on, but that doesn't cure the problem.
When I entered this sign: * into the search box it gave me the following error:
Cannot parse '<%%> echo;': '*' or '?' not allowed as first character in WildcardQuery. I have had this error before and then stripped the spaces between all words, and the error was gone. However, when I now replace this: * with this: "" I get the error described above.
Is there any standard way to solve the special character issue?
I'll write down some of my code here, so I can get better feedback.
Analyzer analyzer = new StandardAnalyzer();
QueryParser qpContent = new QueryParser(Index.ContentFieldName, analyzer);
keyword = keyword.Trim(); // Trim() returns a new string, so the result has to be assigned
keyword = keyword.Replace("\"", "");
keyword = keyword.Replace("^", "");
keyword = keyword.Replace("*", "");
Query queryContent = qpContent.Parse(keyword + "*");
QueryParser qpLanguage = new QueryParser("language", analyzer);
Query queryLanguage = qpLanguage.Parse(Sitecore.Context.Language.Name.ToString());
As you can see, I first replace * and then add it back when calling the query parser. I'm not 100% familiar with this kind of functionality and therefore have no clue what I'm doing wrong. All help is much appreciated, thanks!

You may have the ValidateRequest option set in your config; this helps to protect against injection attacks in ASP.NET.
Some details can be found here...
http://msdn.microsoft.com/en-us/library/bb355989.aspx
http://msdn.microsoft.com/en-us/library/system.web.configuration.pagessection.validaterequest.aspx
and...
http://en.wikipedia.org/wiki/Code_injection
http://en.wikipedia.org/wiki/SQL_injection
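On the parse error itself: a likely contributor is that an input consisting only of * is stripped to an empty string and then has * appended, which leaves a query that starts with a wildcard. Below is a minimal sketch of escaping the query-syntax characters instead of deleting them, written in Python for brevity. The function names and the character set are assumptions for illustration. Lucene.Net also has its own helper, QueryParser.Escape, which is probably the first thing to try in the C# code.

```python
# Characters that Lucene's classic query syntax treats as operators
# (assumed set for this sketch; check it against your Lucene version).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(term: str) -> str:
    """Backslash-escape every query-syntax character in user input."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def build_prefix_query(user_input: str) -> str:
    """Escape the input, then append the one wildcard we actually want."""
    cleaned = user_input.strip()
    if not cleaned:
        # Nothing left to search for: return an empty string so the caller
        # can skip the query instead of sending a bare "*".
        return ""
    return escape_lucene(cleaned) + "*"


print(build_prefix_query("  c# (beta)* "))  # c# \(beta\)\**
```

The empty-input check matters as much as the escaping: without it, blank input would still become the single-character query * that the parser rejects.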

Related

Replacing words in a Word document cause multiple times replacement with C#

I need to create a C#.NET program which will search for specific words in a Microsoft Word document and replace them with other words. For example, my Word file contains the text "LeadSoft IT", which should be replaced by "LeadSoft IT Limited". The problem is that the first run replaces LeadSoft IT with LeadSoft IT Limited, but if I run the program again it replaces LeadSoft IT again, and the text becomes LeadSoft IT Limited Limited. Can anyone suggest how to solve this with C# code that replaces words in a Word document?
If you already have some script for this, feel free to post it and I'll try and help more.
I'm not sure what functionality you're using to find the text instance, but I would suggest looking into regex, and using something like (LeadSoft IT(?! Limited)).
Regex: https://regexr.com/
A good regex tester: https://www.regextester.com/109925
Edit: I made a Python script that uses regex to replace the instances:
import re
word_doc = "We like working " \
"here at Leadsoft IT.\n" \
"We are not limited here at " \
"Leadsoft It Limited."
replace_str = "Leadsoft IT Limited"
reg_str = '(Leadsoft IT(?!.?Limited))'
fixed_str = re.sub(reg_str, replace_str, word_doc, flags=re.IGNORECASE)
print(fixed_str)
# Prints:
# We like working here at Leadsoft IT Limited.
# We are not limited here at Leadsoft It Limited.
Edit 2: Code re-created in C#: https://gist.github.com/Zylvian/47ecd6d1953b8d8c3900dc30645efe98
The regex checks the entire string for instances where Leadsoft IT is NOT followed by Limited, and for all those instances, replaces Leadsoft IT with Leadsoft IT Limited.
The regex uses what's called a negative lookahead, written (?!...), which makes sure that the text matched to its left is not followed by the pattern inside the lookahead. Feel free to edit the regex as you see fit, but be aware that the matching is very strict.
If you want to understand the regex string better, feel free to copy it into https://www.regextester.com/.
Let me know if that helps!
Simplistically, you can just run another replace to fix the problem you cause:
s = s.Replace("LeadSoft IT", "LeadSoft IT Limited").Replace("LeadSoft IT Limited Limited", "LeadSoft IT Limited");
If you're after a more generic fix that doesn't hard-code the problem string, check whether the string you find is contained in the string you replace it with; if it is, the problem will occur. In that case, run a second replacement on the document that finds the result of applying the replacement to the replacement string itself:
var find = "LeadSoft IT";
var repl = "LeadSoft IT Limited";
var result = document.Replace(find, repl);
var problemWillOccur = repl.Contains(find);
if (problemWillOccur)
{
    var fixProblemByFinding = repl.Replace(find, repl); // is "LeadSoft IT Limited Limited"
    result = result.Replace(fixProblemByFinding, repl);
}
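The same idea can be combined with the negative lookahead from the earlier answer so that only one pass is needed. This is a rough sketch in Python, like the script above. The function name replace_idempotent is made up for illustration, and it only covers the case where the replacement starts with the search text.

```python
import re


def replace_idempotent(text: str, find: str, repl: str) -> str:
    """Replace `find` with `repl`, skipping places that were already replaced."""
    if repl.startswith(find):
        # repl == find + suffix: skip matches that already carry the suffix.
        suffix = repl[len(find):]
        pattern = re.escape(find) + "(?!" + re.escape(suffix) + ")"
    else:
        pattern = re.escape(find)
    # A function replacement avoids backslash handling in `repl`.
    return re.sub(pattern, lambda m: repl, text)


once = replace_idempotent("Welcome to LeadSoft IT.", "LeadSoft IT", "LeadSoft IT Limited")
twice = replace_idempotent(once, "LeadSoft IT", "LeadSoft IT Limited")
print(once)           # Welcome to LeadSoft IT Limited.
print(twice == once)  # True
```

Running it a second time leaves the text unchanged, which is the property the question asks for.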
You may be interested in how I solved this problem.
At first I was using NPOI, but it was making a mess of the document, so I discovered that a DOCX file is simply a ZIP archive of XML files.
https://github.com/kubala156/DociFlow/blob/main/DociFlow.Lib/Word/SeekAndReplace.cs
Usage:
var vars = new Dictionary<string, string>
{
    { "testtag", "Test tag value" }
};
using (var doci = new DociFlow.Lib.Word.SeekAndReplace())
{
    // test.docx contains the tag "{{testtag}}"; it will be replaced with "Test tag value"
    doci.Open("test.docx");
    doci.FindAndReplace(vars, "{{", "}}");
}
NPOI 2.5.4 provides a ReplaceText method to help you replace placeholders in a Word file.
Here is an example.
https://github.com/nissl-lab/npoi-examples/blob/main/xwpf/ReplaceTexts/Program.cs

Regex hangs trying to find match

I am trying to match an assignment string in VB code (as in I'm passing in text that is VB code into my program that's written in C#). The assignment string that I'm trying to match is something for example like
CustomClassInitializer(someParameter, anotherParameter, someOtherClassAsParameterWithInitialization()).SomeProperty = 7
and I realize that's rather complex, but it actually isn't far off from some of the real text I'm trying to match.
In order to do so I wrote a Regex. This Regex:
@"[\w,.]+\(([\w,.]*\(*,* *\)*)+ = "
which correctly matches. The problem is it becomes VERY slow (with timeouts), which I've researched and found is probably because of "backtracking". One of the suggested solutions to help with backtracking in general was to add "?>" to the regex, which I think would go in this position:
[\w,.]+\(?>([\w,.]*\(*,* *\)*)+ =
but this no longer matches properly.
I'm fairly new to Regex, so I imagine that there is a much better pattern. What is it please? Or how can I improve my times in general?
Helpful notes:
I'm only interested in position 0 of the string I'm searching for a match in. My code is "if (isMatch && match.index == 0) { ... }". Can I tell it to only check position 0 and if it's not a match move on?
The reason I use all the 0 or more things is the match could be as simple as CustomClass() = new CustomClass(), and as complicated as the above or perhaps a bit worse. I'm trying to get as many cases as possible.
This Regex is interested in "[\w,.]+(" and then "whatever may be inside the parentheses" (I tried to think of what all could be inside them based on the fact that it's valid VB code) until you get to the close parenthesis and then " = ". Perhaps I can use a wildcard for literally anything until it gets to ") = " in the string? Like I said, I'm fairly new to Regex.
Thanks in advance!
This seems to do what you want. Normally, I like to be more specific than .*, but it is working correctly. Note that I am using the Multi-line option.
^.*=\s*.+$
Here is a working example on RegExStorm.net.
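On the two side questions (checking only position 0, and avoiding the backtracking), here is a small sketch in Python. The pattern is an assumption about what should count as a match, not a drop-in replacement. It swaps the nested quantifiers for a single greedy .* and anchors the attempt at the start of the string. In .NET, a leading ^ or \A anchor gives the same "position 0 only" behaviour.

```python
import re

# One greedy ".*" between the parentheses instead of a repeated group that
# itself contains "*" quantifiers; the nested form is the usual cause of
# runaway backtracking.
ASSIGNMENT = re.compile(r"[\w.]+\(.*\)[\w.]* = ")


def starts_with_call_assignment(line: str) -> bool:
    """True if `line` begins with something like Name(...).Prop = ."""
    # re.match only attempts a match at position 0; it does not scan ahead.
    return ASSIGNMENT.match(line) is not None


sample = ("CustomClassInitializer(someParameter, anotherParameter, "
          "someOtherClassAsParameterWithInitialization()).SomeProperty = 7")
print(starts_with_call_assignment(sample))                               # True
print(starts_with_call_assignment("CustomClass() = new CustomClass()"))  # True
print(starts_with_call_assignment("x = Foo(1)"))                         # False
```

The pattern is deliberately loose about what sits inside the parentheses, which is the trade the question itself suggests: less validation in exchange for predictable running time.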

Using translate.google from code

Is there a way to use https://translate.google.co.za/ in code?
Maybe make use of Encoding, WebClients and Uri's, but I'm not sure on the correct way to do this.
In code I can get the translate to language and translate from language as well as the content, but how can I incorporate those parameters into the url and then display the end result?
Please Help
Code attempt:
UnicodeEncoding tmpEncoding = new UnicodeEncoding();
string url = String.Format("http://translate.google.co.za/#{0}/{1}/{2}", languageFrom, languageTo, content);
WebClient tmpClient = new WebClient();
tmpClient.Encoding = System.Text.Encoding.ASCII;
string result = tmpEncoding.GetString(tmpClient.DownloadData(url));
The result it gives me is a list of Chinese or Japanese characters. I don't know what I'm doing wrong. Maybe the encoding?
Take a look at the following website click here
You can use the official Google Translate API for this.
Take note that it will cost money to translate. Also take a look at other translation APIs that can be used from .NET.
I did some searching for you: the Bing translator service is a free API for up to 2M characters a month; beyond that you have to pay for it. It also has a nice SDK to go with it.
I found an answer courtesy of Rick Strahl's Web Log (http://weblog.west-wind.com/posts/2011/Aug/06/Translating-with-Google-Translate-without-API-and-C-Code)
Although I didn't use the JavaScriptSerializer, it gave me what I wanted, in the form of (\"content\").
So just a bit of string manipulation and I'm golden.
EDIT:
I ended up using the serializer, as the other way didn't preserve the special characters that form some words; for example, French words wouldn't have their accented characters. Instead I would get a question mark inside a diamond.
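On the "Chinese or Japanese characters" in the original code attempt: two things probably combine there. Everything after # in a URL is a fragment that stays in the browser, so the server never sees the languages or the content. And UnicodeEncoding is UTF-16, so decoding a UTF-8 or ASCII response with it pairs up bytes into unrelated characters. A short Python sketch of both effects, with a made-up response body:

```python
from urllib.parse import urlsplit

# 1) The part after "#" is a fragment; it is not sent to the server.
parts = urlsplit("http://translate.google.co.za/#en/fr/hello")
print(parts.path)      # /
print(parts.fragment)  # en/fr/hello

# 2) Decoding UTF-8 bytes as UTF-16 pairs up the bytes into CJK characters.
body = "<html>Bonjour</html>".encode("utf-8")  # hypothetical response body
wrong = body.decode("utf-16-le")               # the effect of UnicodeEncoding
right = body.decode("utf-8")
print(len(wrong), right)  # 10 <html>Bonjour</html>
```

Decoding with the encoding the response actually uses (commonly UTF-8) avoids the garbled output, but the fragment issue means the page still would not contain a translation.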

Plus sign in query string?

I have a web app created using C# and ASP.NET. I placed a parameter value containing a plus (+) sign in the query string, but the plus sign disappears.
How can I include the plus sign (+) in the query string without it disappearing?
Here I found the same question and, following it, I used Server.UrlEncode(myqerystring) and, when decoding, Server.UrlDecode(myqerystring), but somehow it always resolves to a space. Here is the watch window:
1) Querystring after the Server.UrlEncode()
2) Querystring after the Server.UrlDecode()
Notice the space between S and R; it should be +. I am new to web development, and I read other answers which say to use UrlEncode and UrlDecode, but it gives the same issue as before. Am I doing something wrong? And yes, the query string is automatically generated; I have no control over it.
There is another hack: replace the " " or "%2b" with "+". I will resort to that if I don't find a better way. So is there a good way to do this? Thanks.
The answer you link to just mentions using Server.UrlEncode, not Server.UrlDecode. When you read from Request.QueryString it automatically decodes the string for you. Decoding it manually a second time corrupts it, which is why you're getting a space.
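The double decoding can be reproduced in a few lines. This sketch uses Python's urllib.parse, which follows the same form-encoding convention (+ means space, %2B means a literal plus). The value "S+R" is a stand-in for the real parameter.

```python
from urllib.parse import quote_plus, unquote_plus

value = "S+R"                               # hypothetical parameter value
encoded = quote_plus(value)                 # what goes into the URL
decoded_once = unquote_plus(encoded)        # what the framework hands back
decoded_twice = unquote_plus(decoded_once)  # the extra manual decode

print(encoded)        # S%2BR
print(decoded_once)   # S+R
print(decoded_twice)  # S R
```

So the encode step is needed when building the URL, and the decode step should be left to the framework.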
You can take a look at http://msdn.microsoft.com/en-us/library/zttxte6w(v=vs.110).aspx
Although this might help
string destinationURL = "http://www.contoso.com/default.aspx?user=test";
NextPage.NavigateUrl = "~/Finish?url=" + Server.UrlEncode(destinationURL);
Regarding the plus sign, you cannot pass it as-is, because the '+' sign has a semantic meaning in a query string.
Take a look at Plus sign in query string
EDIT
Have you used the '+' sign in a Google search? It gives different results.

Easiest way to convert a URL to a hyperlink in a C# string?

I am consuming the Twitter API and want to convert all URLs to hyperlinks.
What is the most effective way you've come up with to do this?
from
string myString = "This is my tweet check it out http://tinyurl.com/blah";
to
This is my tweet check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
Regular expressions are probably your friend for this kind of task:
Regex r = new Regex(@"(https?://[^\s]+)");
myString = r.Replace(myString, "<a href=\"$1\">$1</a>");
The regular expression for matching URLs might need a bit of work.
I did this exact same thing with jQuery, consuming the JSON API. Here is the linkify function:
String.prototype.linkify = function() {
    return this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/, function(m) {
        return m.link(m);
    });
};
This is actually an ugly problem. URLs can contain (and end with) punctuation, so it can be difficult to determine where a URL actually ends, when it's embedded in normal text. For example:
http://example.com/.
is a valid URL, but it could just as easily be the end of a sentence:
I buy all my witty T-shirts from http://example.com/.
You can't simply parse until a space is found, because then you'll keep the period as part of the URL. You also can't simply parse until a period or a space is found, because periods are extremely common in URLs.
Yes, regex is your friend here, but constructing the appropriate regex is the hard part.
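One common heuristic for the trailing-punctuation problem is to match greedily up to whitespace and then peel sentence punctuation off the end of each match. A minimal Python sketch follows. The character set in TRAILING and the function name are assumptions, and a URL that really ends in one of those characters will be cut short.

```python
import html
import re

URL_RE = re.compile(r"https?://[^\s<>\"]+")
TRAILING = ".,;:!?)"  # assumed to be sentence punctuation, not part of the URL


def linkify(text: str) -> str:
    """Wrap each http(s) URL in an anchor tag, leaving trailing punctuation outside."""
    def to_anchor(match: re.Match) -> str:
        url = match.group(0)
        trimmed = url.rstrip(TRAILING)
        tail = url[len(trimmed):]                 # punctuation put back after the link
        safe = html.escape(trimmed, quote=True)   # escape &, quotes, etc. for HTML
        return f'<a href="{safe}">{safe}</a>{tail}'

    return URL_RE.sub(to_anchor, text)


print(linkify("I buy all my witty T-shirts from http://example.com/."))
# I buy all my witty T-shirts from <a href="http://example.com/">http://example.com/</a>.
```

Only the URL is escaped here; if the surrounding tweet text can contain markup, it would need escaping as well before being written into a page.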
Check out this as well: Expanding URLs with Regex in .NET.
You can add some more control on this by using MatchEvaluator delegate function with regular expression:
Suppose I have this string:
find more on http://www.stackoverflow.com
Now try this code:
private void ModifyString()
{
    string input = "find more on http://www.authorcode.com ";
    Regex regx = new Regex(@"\b((http|https|ftp|mailto)://)?(www.)+[\w-]+(/[\w- ./?%&=]*)?");
    // ReplaceURl is called once per match and returns the replacement text
    string result = regx.Replace(input, new MatchEvaluator(ReplaceURl));
}
static string ReplaceURl(Match m)
{
    string x = m.ToString();
    x = "<a href=\"" + x + "\">" + x + "</a>";
    return x;
}
/cheer for RedWolves
from: this.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/, function(m){...
see: /[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/
There's the code for the addresses "anyprotocol"://"anysubdomain/domain"."anydomainextension and address",
and it's a perfect example for other uses of string manipulation. You can slice and dice at will with .replace and insert proper "a href"s where needed.
I used jQuery to change the attributes of these links to "target=_blank" easily in my content-loading logic even though the .link method doesn't let you customize them.
I personally love tacking on a custom method to the string object for on the fly string-filtering (the String.prototype.linkify declaration), but I'm not sure how that would play out in a large-scale environment where you'd have to organize 10+ custom linkify-like functions. I think you'd definitely have to do something else with your code structure at that point.
Maybe a vet will stumble along here and enlighten us.
