I want to search my data using a regular expression with the IsMatch() function.
I have a class:
public class MyClass
{
public string Name { get; set; }
public string Address { get; set; }
}
I want to search my data by Name or Address, returning items whose Name or Address is like the input.
The input, FuzzySearch, looks like a%b or a%b%c, or contains Japanese characters (e.g. 区%水).
In the main function I have a list, List<MyClass> data, and I use a regular expression with the IsMatch() function as below:
Regex regex = new Regex(FuzzySearch, RegexOptions.IgnoreCase);
var allInfoList1 = allInfoList.Where(x => regex.IsMatch(x.Name) ||
regex.IsMatch(x.Address)).ToList();
Sometimes the result is correct, but sometimes it is wrong (when the input contains Japanese characters).
(Do regular expressions not support Unicode?)
Are there any other solutions?
Rather than implementing your own document search engine, I would suggest considering tools like Apache Lucene or Apache Solr. I do not know your specific use case, and perhaps my suggestion is overkill, but I would give it some thought.
Hope I helped!
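For what it's worth, .NET regular expressions do support Unicode, so Japanese text on its own should not break IsMatch(). One possible cause, assuming the % in FuzzySearch is meant as a SQL LIKE-style wildcard, is that the input is passed to Regex unchanged: % is then an ordinary literal character, and any regex metacharacters in the input are interpreted instead of being matched literally. A minimal sketch of translating the input into a pattern first (LikeSearch and ToRegex are hypothetical names, not part of the question's code):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LikeSearch
{
    // Assumption: '%' means "any run of characters", as in SQL LIKE.
    // Every literal part is escaped so characters such as '.' or '(' match themselves.
    public static Regex ToRegex(string likePattern)
    {
        string body = string.Join(
            ".*",
            likePattern.Split('%').Select(part => Regex.Escape(part)));

        // Singleline lets ".*" also span line breaks inside a value.
        return new Regex(
            body,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Singleline);
    }
}
```

With this, LikeSearch.ToRegex(FuzzySearch) would take the place of new Regex(FuzzySearch, RegexOptions.IgnoreCase) in the Where clause.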
I'm not sure if it's okay to ask... But here goes.
I implemented a method that parses a string using regex. Each match is processed by a delegate, in order (I don't think the order is actually important, but I wrote it this way, and it's not fully tested):
Pattern Regex.Replace: @"(?<!\\)\$.+?\$" then String.Replace: @"\$", @"$"; Replaces a string enclosed by dollar signs. Ignores backslash-escaped ones, then erases the backslash. Ex: "$global name$" -> "motherofglobalvar", "Money \$9000" -> "Money $9000"
Pattern Regex.Replace @"(?<!\\)%.+?%" then String.Replace @"\%", @"%"; Replaces a string enclosed by percent signs. Ignores backslash-escaped ones, then erases the backslash. Same as the previous example: "%local var%" -> "lordoflocalvar", "It's over 9000\%" -> "It's over 9000%"
Pattern Regex.Replace @"(?<!\\)#" then String.Replace @"\#", @"#"; Replaces the char '#' with whitespace, ' '. Ignores backslash-escaped ones, then erases the backslash. Ex: "I#hit#the#ground#too#hard" -> "I hit the ground too hard", "qw\#op" -> "qw#op"
What I've done without much experience (I think):
// parse variables
public static string ParseVariable(string text)
{
    return Regex.Replace(Regex.Replace(Regex.Replace(text, @"(?<!\\)\$.+?\$", match =>
    {
        string trim = match.Value.Trim('$');
        string trimUpper = trim.ToUpper();
        return variableGlobal.ContainsKey(trim) ? variableGlobal[trim] : match.Value;
    }).Replace(@"\$", @"$"), @"(?<!\\)%.+?%", match =>
    {
        string trim = match.Value.Trim('%');
        string trimUpper = trim.ToUpper();
        return variableLocal.ContainsKey(trim) ? variableLocal[trim] : match.Value;
    }).Replace(@"\%", @"%"), @"(?<!\\)#", " ").Replace(@"\#", @"#");
}
In short, what I used is: Regex.Replace().Replace()
Since I need to parse 3 kinds of symbols, I chained it as following: Regex.Replace(Regex.Replace(Regex.Replace().Replace()).Replace()).Replace()
Is there a more efficient way than this, for example one that doesn't need to go through the text 6 times (3 Regex.Replace calls and 3 String.Replace calls, where each replace modifies the text used by the next one)?
Or is this the best that can be done?
Thanks.
Here's a unique take on the problem, I think. You can build a class that is used to construct the overall pattern piece by piece. This class is also responsible for generating the MatchEvaluator delegate that will be passed to Replace.
class RegexReplacer
{
public string Pattern { get; private set; }
public string Replacement { get; private set; }
public string GroupName { get; private set; }
public RegexReplacer NextReplacer { get; private set; }
public RegexReplacer(string pattern, string replacement, string groupName, RegexReplacer nextReplacer = null)
{
this.Pattern = pattern;
this.Replacement = replacement;
this.GroupName = groupName;
this.NextReplacer = nextReplacer;
}
public string GetAggregatedPattern()
{
string constructedPattern = this.Pattern;
string alternation = (this.NextReplacer == null ? string.Empty : "|" + this.NextReplacer.GetAggregatedPattern()); // If there isn't another replacer, there is no alternation; otherwise, we build an alternation between this pattern and the next replacer's "full" pattern
constructedPattern = string.Format("(?<{0}>{1}){2}", this.GroupName, this.Pattern, alternation); // The (?<XXX>) syntax builds a named capture group. This is used by our GetReplaceDelegate method.
return constructedPattern;
}
public MatchEvaluator GetReplaceDelegate()
{
return (match) =>
{
if (match.Groups[this.GroupName] != null && match.Groups[this.GroupName].Length > 0) // Did we get a hit on the group name?
{
return this.Replacement;
}
else if (this.NextReplacer != null) // No? Then is there another replacer to inspect?
{
MatchEvaluator next = this.NextReplacer.GetReplaceDelegate();
return next(match);
}
else
{
return match.Value; // No? Then simply return the value
}
};
}
}
It should be obvious what Pattern and Replacement represent. GroupName is kind of a hack to let the replacement evaluator know which RegexReplacer fragment resulted in the match. NextReplacer points to another replacer instance that holds a different pattern fragment (along with its own replacement and group name).
The idea here is to have a kind of linked list of objects that represents the overall pattern. You can call GetAggregatedPattern on the outermost replacer to get the full pattern: each replacer calls the next replacer's GetAggregatedPattern to get that replacer's pattern fragment, to which it concatenates its own fragment. GetReplaceDelegate generates a MatchEvaluator. This MatchEvaluator compares its own GroupName to the Match's captured groups. If the group name was captured, then we have a hit, and we return this replacer's Replacement value. Otherwise, we step into the next replacer (if there is one) and repeat the group name comparison. If there is no hit on any replacer, then we simply yield back the original value (i.e. what was matched by the pattern; this should be rare).
The usage of such might look like this:
string target = @"$global name$ Money \$9000 %local var% It's over 9000\% I#hit#the#ground#too#hard qw\#op";
RegexReplacer dollarWrapped = new RegexReplacer(@"(?<!\\)\$[^$]+\$", "motherofglobalvar", "dollarWrapped");
RegexReplacer slashDollar = new RegexReplacer(@"\\\$", "$", "slashDollar", dollarWrapped); // "\$" becomes "$"
RegexReplacer percentWrapped = new RegexReplacer(@"(?<!\\)%[^%]+%", "lordoflocalvar", "percentWrapped", slashDollar);
RegexReplacer slashPercent = new RegexReplacer(@"\\%", "%", "slashPercent", percentWrapped); // "\%" becomes "%"
RegexReplacer singleAt = new RegexReplacer(@"(?<!\\)#", " ", "singleAt", slashPercent);
RegexReplacer slashAt = new RegexReplacer(@"\\#", "#", "slashAt", singleAt);
RegexReplacer replacer = slashAt;
string pattern = replacer.GetAggregatedPattern();
MatchEvaluator evaluator = replacer.GetReplaceDelegate();
string result = Regex.Replace(target, pattern, evaluator);
Because you want each replacer to know if it got a hit, and because we are hacking this by using group names, you want to make sure that each group name is distinct. A simple way to ensure this would be to use a name that's identical to the variable name since you can't have two variables with the same name within the same scope.
You can see above that I am building each part of the pattern separately, but as I build, I pass the previous replacer as the 4th parameter to the current replacer. This builds the chain of replacers. Once built, I use the last replacer constructed to generate the overall pattern and evaluator. If you use any other replacer, you will only get part of the overall pattern. Finally, it's simply a matter of passing the generated pattern and evaluator to the Replace method.
Keep in mind that this approach was targeted more at the problem as described. It may work in more general scenarios, but I've only worked with what you've presented. Also, since this is more of a parsing question, a parser may be the proper route to take--although the learning curve is going to be higher.
Also keep in mind that I haven't profiled this code. It certainly doesn't loop over the target string multiple times, but it does involve additional method calls during replacement. You would certainly want to test it in your environment.
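For comparison, the same single-pass idea can be sketched without the replacer chain: one alternation with named groups and one MatchEvaluator. This is only an illustration under a few assumptions. SinglePassParser, Parse and the two dictionary parameters are hypothetical names standing in for the question's variableGlobal and variableLocal, and variable names are assumed not to contain their own delimiter. Because the escaped-symbol alternative is tried first, no lookbehind is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SinglePassParser
{
    // Alternatives, in order: escaped symbol | $name$ | %name% | bare '#'.
    private static readonly Regex Token = new Regex(
        @"\\(?<esc>[$%#])|\$(?<global>[^$]+)\$|%(?<local>[^%]+)%|(?<hash>#)");

    public static string Parse(
        string text,
        IDictionary<string, string> globals,
        IDictionary<string, string> locals)
    {
        return Token.Replace(text, m =>
        {
            string value;

            // "\$", "\%", "\#": drop the backslash, keep the symbol.
            if (m.Groups["esc"].Success)
                return m.Groups["esc"].Value;

            // "$name$": substitute a global, or leave the text alone if unknown.
            if (m.Groups["global"].Success)
                return globals.TryGetValue(m.Groups["global"].Value, out value) ? value : m.Value;

            // "%name%": substitute a local, or leave the text alone if unknown.
            if (m.Groups["local"].Success)
                return locals.TryGetValue(m.Groups["local"].Value, out value) ? value : m.Value;

            // Only the bare '#' alternative is left: it becomes a space.
            return " ";
        });
    }
}
```

Like the code above, this has not been profiled; it only avoids the repeated passes over the text.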
I want to include some special characters in a string variable name in C#.
Example:
string foo-bar = String.Empty;
As far as I understand, I can't declare a variable the way I did in the example above.
Is there any way to declare a variable name that includes "-"?
From MSDN:
You can't just choose any sequence of characters as a variable name. This isn't as worrying as it might sound, however, because you're still left with a very flexible naming system.
The basic variable naming rules are as follows:
The first character of a variable name must be either a letter, an underscore character (_), or the at symbol (@).
Subsequent characters may be letters, underscore characters, or numbers.
No, this is not possible to do in C#.
If you really, really, really want to do this, you could use a Dictionary<string, string>:
Dictionary<string, string> someVars = new Dictionary<string, string>()
{
{"foo-bar", String.Empty},
{"bar-foo", "bazinga"}
};
Using them would look like this:
string newstring = someVars["foo-bar"] + "Hello World!";
Instead of just using the variable name, you would look up the string in your dictionary. Note that this is very inefficient and just intended as a joke, so please do not really use this ;)
If you are trying to deserialize an object, you can use JsonProperty to achieve this.
Example:
public class SubscriptionListJsonData
{
[JsonProperty(PropertyName = "subscriptions")]
public string SubscriptionData { get; set; }
[JsonProperty(PropertyName = "#nextLink")]
public string nextLink { get; set; }
}
Follow this link for partially reading Json.
You can't do that in C# or in most other programming languages. I also advise you to follow the C# naming conventions, as they help you read your code in a way that, at least for me, has always felt comfortable.
I am using MVC3, C#, .net4.0
I have objects that contain a search string that I can use to search for the relevant objects, e.g. for 4 objects:
[car:vw:engine:1800]
[car:vw:engine:Diesel 1800]
[car:vw:engine:1600]
[car:ford:engine:1800]
I would like to search for objects that have a make of "vw" and "1800" engine.
I could try Contains():
SearchString.Contains("vw:engine:1800")
Which will return just one object.
I need something like:
SearchString.Contains("vw:engine:*1800")
Where * is a wildcard and would pick up :
[car:vw:engine:1800]
[car:vw:engine:Diesel 1800]
The only way around this, at present, would be:
SearchString.Contains("vw:engine:1800") or
SearchString.Contains("vw:engine:Diesel 1800")
Is there a simple way to do this using a mainstream .NET function like Contains(), if not Contains() itself?
There is a good reason for me using a search string like this, but this is not part of the question.
You can use regular expressions to check if SearchString is a match. .* means zero or more of any character and is used in place of your wildcard.
string pattern = @"^\[car:vw:engine:.*1800]$";
bool matches = Regex.IsMatch(SearchString, pattern);
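If the search term comes from user input, the pattern can be built from it rather than hard-coded. A minimal sketch, assuming * is the only wildcard and that there is no need to search for a literal *; WildcardSearch and ContainsWildcard are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardSearch
{
    // Contains()-like check where '*' in the term matches any run of characters.
    public static bool ContainsWildcard(string text, string term)
    {
        // Escape everything first so characters like '[' or '.' stay literal,
        // then turn the escaped wildcard "\*" back into ".*".
        string pattern = Regex.Escape(term).Replace(@"\*", ".*");

        // No ^ or $ anchors: the term may appear anywhere, as with Contains().
        return Regex.IsMatch(text, pattern);
    }
}
```

For the four objects in the question, WildcardSearch.ContainsWildcard(SearchString, "vw:engine:*1800") should pick up the two vw entries whose engine ends in 1800 and skip the other two.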
Generally I'd prefer the regular expressions.
In your particular case you could use something like this:
string car1 = "[car:vw:engine:Diesel 1800]";
string car2 = "[car:vw:engine:1800]";
var tokens1 = car1.Substring(1, car1.Length - 2).Split(':');
var tokens2 = car2.Substring(1, car2.Length - 2).Split(':');
bool IsMatch1 = tokens1[3].EndsWith("1800");
bool IsMatch2 = tokens2[3].EndsWith("1800");
So I have data like this:
((4886.03 12494.89 "LYR3_SIG2"))
It is always going to be SPACE delimited thus I want to use Regex to place each into a property.
Yes, I was playing around with some regex
string q = "4886.03 12494.89 \"LYR3_SIG2";
string clean = Regex.Replace(q, @"[^\w\s]", string.Empty);
but what I aim to do is to put each of the 3 values into a class like this
public class BowTies
{
public double XCoordinate { get; set; }
public double YCoordinate { get; set; }
public string Layer { get; set; }
}
Now I originally was parsing the data into a property
t = streamReader.ReadLine();
if ((t != null) && Regex.IsMatch(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))"))
currentVioType.Bowtie = new ParseType() { Formatted = Regex.Match(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))").Value.Trim('(', ')'), Original = t };
But now I really want to put that data into the doubles and string
thus this data is space delimited ((4886.03 12494.89 "LYR3_SIG2"))
I started down the path of refactoring, but for now I am not using the regex to get the doubles (which are ALWAYS going to be the first 2 values, followed by a string), so I started doing this:
currentAddPla.Bows.Add(new BowTies() { XCoordinate = 44.33, YCoordinate = 344.33, Layer = Regex.Match(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))").Value.Trim('(', ')')});
but I obviously need to parse this properly: the first value (a double) should go into XCoordinate and the 2nd value into YCoordinate. For the 3rd value, the regex currently captures ALL the data, but it needs to capture only "LYR3_SIG2". That should be doable with regex, right?
It is always going to be SPACE delimited thus
RegEx for this sounds like overkill. Have you considered using string.Split(' '), e.g.:
string s = "((4886.03 12494.89 \"LYR3_SIG2\"))";
s = s.Replace("(", string.Empty).Replace(")", string.Empty);
string[] arr = s.Split(' ');
currentAddPla.Bows.Add(new BowTies() {
XCoordinate = Convert.ToDouble(arr[0]),
YCoordinate = Convert.ToDouble(arr[1]),
Layer = arr[2].Trim('"')}); // arr has three elements, so the layer is at index 2; Trim removes the surrounding quotes
You should just use String.Split instead of RegEx. The data is formatted simply enough that RegEx would be overkill even if it worked well here. On top of that the language which defines your data is not regular ( http://en.wikipedia.org/wiki/Regular_language ) and thus cannot be reliably parsed with RegEx. It may be working right now because the data inside the parens is simply formatted but languages which have matching braces are context-free and in general are not able to be parsed with regular expressions.
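A slightly more defensive version of the Split approach might look like the sketch below. It assumes every line has exactly three space-delimited values and that the numbers always use '.' as the decimal separator, which is why it parses with the invariant culture. BowTieParser and TryParse are hypothetical names; BowTies is repeated from the question only to keep the example self-contained.

```csharp
using System;
using System.Globalization;

public class BowTies
{
    public double XCoordinate { get; set; }
    public double YCoordinate { get; set; }
    public string Layer { get; set; }
}

public static class BowTieParser
{
    // Parses a line such as ((4886.03 12494.89 "LYR3_SIG2")).
    // Returns false instead of throwing when the line does not have the expected shape.
    public static bool TryParse(string line, out BowTies result)
    {
        result = null;
        if (line == null) return false;

        // Strip surrounding whitespace and parentheses, then split on spaces.
        string[] parts = line.Trim().Trim('(', ')')
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        double x, y;
        if (parts.Length != 3
            || !double.TryParse(parts[0], NumberStyles.Float, CultureInfo.InvariantCulture, out x)
            || !double.TryParse(parts[1], NumberStyles.Float, CultureInfo.InvariantCulture, out y))
        {
            return false;
        }

        // Remove the quotes around the layer name.
        result = new BowTies { XCoordinate = x, YCoordinate = y, Layer = parts[2].Trim('"') };
        return true;
    }
}
```

Convert.ToDouble uses the current culture, so on a machine whose culture uses ',' as the decimal separator it may misread or reject "4886.03"; parsing with CultureInfo.InvariantCulture avoids that.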
I'm trying to match properties in a class. Example class:
public static string ComingSoonPage
{
get { return "/blog-coming-soon.aspx"; }
}
public static string EncodeBase64(string dataToEncode)
{
byte[] bytes = System.Text.ASCIIEncoding.UTF8.GetBytes(dataToEncode);
string returnValue = System.Convert.ToBase64String(bytes);
return returnValue;
}
I'm using this kind of regex:
(?:public|private|protected)([\s\w]*)\s+(\w+)[^(]
It matches not only properties but also methods, which is wrong. So I want to exclude any match that contains a (, so that everything is selected except methods (which contain a ( ). How can I achieve that?
Try matching the "{" and the "get {" instead
(public|private|protected|internal)[\s\w]*\s+(\w+)\s*\{\s*get\s*\{
UPDATE
Match only the name of the property
(?<=(public|private|protected|internal)[\s\w]*\s+)\w+(?=\s*\{\s*get\s*\{)
uses the general pattern
(?<=prefix)find(?=suffix)
EDIT
A property might have no modifier (public, private etc.) at all and the type might contain extra characters (e.g. for arrays: int[,]). Therefore it would probably be better to test only for the syntax elements following the property name (and the name itself). Also, a property could consist of only a setter and be abstract: abstract int[,] Matrix { set; }. I suggest retrieving the property names like this:
\w+(?=\s*\{\s*(get|set)\b)
where \b matches a word boundary (in this case, the end of the word).
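Applied to the example class from the question, that last pattern could be used roughly as follows. This is a sketch only; PropertyFinder and FindPropertyNames are hypothetical names. Since it is a purely textual match, accessors preceded by attributes or comments would be missed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PropertyFinder
{
    // A word followed by "{" and then a get or set accessor.
    private static readonly Regex PropertyName =
        new Regex(@"\w+(?=\s*\{\s*(get|set)\b)");

    public static List<string> FindPropertyNames(string source)
    {
        var names = new List<string>();
        foreach (Match m in PropertyName.Matches(source))
        {
            names.Add(m.Value);
        }
        return names;
    }
}
```

For the example class, this should return ComingSoonPage only: EncodeBase64 is followed by ( rather than {, and the { that opens the method body is not followed by get or set.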
This may be what you are looking for and this works perfectly! I deserve some treat though :)...
Regex r = new Regex(@"(public|private).*?(?=(public|private|$))", RegexOptions.Singleline);
Regex nr = new Regex(@"\(.*?\)\s+\{", RegexOptions.Singleline);
foreach (Match m in r.Matches(yourCodeFile)) // extracts all methods and properties
{
    if (!nr.IsMatch(m.Value)) // filters out methods
        Console.WriteLine(m.Value); // properties only
}
According to this answer, try using:
for Properties: type and name:
(?:public\s|private\s|protected\s|internal\s)\s*(?:readonly|static\s+)?(?<type>\w+)\s+(?<name>\w+)[\s\r\n]*{
for Fields: type and name:
(?:public\s|private\s|protected\s)\s*(?:readonly|static\s+)?(?<type>\w+)\s+(?<name>\w+);
for Methods: methodName and parameterType and parameter:
(?:public\s|private\s|protected\s|internal\s)?[\s\w]*\s+(?<methodName>\w+)\s*\(\s*(?:(ref\s|/in\s|out\s)?\s*(?<parameterType>\w+)\s+(?<parameter>\w+)\s*,?\s*)+\)
For C# code analysis, try Irony or the Roslyn project; see this sample:
C# and VB.NET Code Searcher - Using Roslyn (CodeProject)