Regex matches in C# but not in Java

I have the following regex (long, I know):
(?-mix:((?-mix:(?-mix:\{\%).*?(?-mix:\%\})|(?-mix:\{\{).*?(?-mix:\}\}?))
|(?-mix:\{\{|\{\%)))
that I'm using to split a string. It matches correctly in C#, but when I moved the code to Java, it doesn't match. Is there any particular feature of this regex that is C#-only?
The source is produced as:
String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}.");
While in C# it's:
string source = @"{% assign foo = values %}.{{ foo[0] }}.";
The C# version is like this:
string[] split = Regex.Split(source, regex);
In Java I tried both:
String[] split = source.split(regex);
and also
Pattern p = Pattern.compile(regex);
String[] split = p.split(source);

Here is a sample program with your code: http://ideone.com/hk3uy
There is a major difference here between Java and other languages: Java does not add captured groups as tokens in the result array. That means that all delimiters are removed from the result, though they would be included in .NET.
The only alternative I know is not to use split, but to get a list of matches and split manually.
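Splitting manually could look roughly like the sketch below. The class and method names (SplitKeepingDelimiters, splitKeepingDelimiters) are made up for illustration, and the pattern is the simplified one from another answer in this thread rather than the original regex. It walks the matches with Matcher.find() and keeps each delimiter as its own token, which is approximately what .NET does when the whole pattern sits in a capturing group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class SplitKeepingDelimiters {

    // Splits input on pattern but, unlike String.split, also returns
    // every matched delimiter as a separate token.
    public static List<String> splitKeepingDelimiters(Pattern pattern, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int last = 0;
        while (m.find()) {
            tokens.add(input.substring(last, m.start())); // text before the delimiter
            tokens.add(m.group());                        // the delimiter itself
            last = m.end();
        }
        tokens.add(input.substring(last));                // text after the last delimiter
        return tokens;
    }

    public static void main(String[] args) {
        // Simplified pattern (an assumption; substitute the full regex if needed).
        Pattern p = Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");
        String source = "{% assign foo = values %}.{{ foo[0] }}.";
        for (String token : splitKeepingDelimiters(p, source)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Note that the first token is an empty string when the input starts with a delimiter; filter empty tokens out if you don't want them.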

I think the problem is with how you're defining source. On my system, this:
String source = Pattern.quote("{% assign foo = values %}.{{ foo[0] }}.");
is equivalent to this:
String source = "\\Q{% assign foo = values %}.{{ foo[0] }}.\\E";
(that is, it adds a stray \Q and \E), but the way the method is defined, your Java implementation could treat it as equivalent to this:
String source = "\\{% assign foo = values %\\}\\.\\{\\{ foo\\[0\\] \\}\\}\\.";
(that is, inserting lots of backslashes).
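A quick way to check this on your own system is a sketch like the following. The class name QuotedSourceDemo and the helper splitOnTags are invented for the example, and it uses the simplified pattern from the end of this answer. Pattern.quote is meant to produce a regex that matches the text literally, so its result should be used as a pattern, not as the input to split.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class QuotedSourceDemo {
    // Simplified form of the question's pattern (an assumption for brevity).
    static final Pattern TAG = Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");

    // Splits the input on template tags.
    public static String[] splitOnTags(String input) {
        return TAG.split(input);
    }

    public static void main(String[] args) {
        String raw = "{% assign foo = values %}.{{ foo[0] }}.";
        String quoted = Pattern.quote(raw);

        // The quoted form is a regex that matches the raw text literally.
        System.out.println(Pattern.matches(quoted, raw));
        // Printing it shows what was added to the text on this JVM.
        System.out.println(quoted);
        // Splitting the quoted text leaves the added characters in the tokens.
        System.out.println(Arrays.toString(splitOnTags(quoted)));
        // Splitting the plain text gives the tokens you actually want.
        System.out.println(Arrays.toString(splitOnTags(raw)));
    }
}
```

If the last two lines differ, the extra characters come from Pattern.quote, and assigning the plain string literal to source should fix the split.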
Your regex itself seems fine. This program:
import java.util.regex.Pattern;

public class SplitTest
{
    public static void main(final String... args)
    {
        final Pattern p = Pattern.compile("(?-mix:((?-mix:(?-mix:\\{\\%).*?(?-mix:\\%\\})|(?-mix:\\{\\{).*?(?-mix:\\}\\}?))|(?-mix:\\{\\{|\\{\\%)))");
        // Every template tag, complete or dangling, acts as a delimiter.
        for (final String s : p.split("a{%b%}c{{d}}e{%f%}g{{h}}i{{j{%k"))
            System.out.println(s);
    }
}
prints
a
c
e
g
i
j
k
that is, it successfully treats {%b%}, {{d}}, {%f%}, {{h}}, {{, and {% as split-points, with all the non-greediness you'd expect. But for the record, it also works if I strip p down to just
Pattern.compile("\\{%.*?%\\}|\\{\\{.*?\\}\\}?|\\{\\{|\\{%");
;-)

Use \\{ instead of \{, and do the same for the other symbols.


Is it possible to store a regex match and use part of it as a list enumerator?

I have created a MadLibs-style game where the user enters responses to prompts, which in turn replace blanks, represented by %s0, %s1 etc., in a story. I have this working using a for loop, but someone else suggested I could do it using regex. What I have so far is below, which replaces all instances of %s+number with "wibble". What I was wondering is whether it is possible to store the number found by the regex in a temporary variable and in turn use that to return a value from the list Words. E.g. return Regex.Replace(story, pattern, Globals.Words[x]); where x is the number returned by the regex pattern as it goes over the string.
static void Main(string[] args)
{
Globals.Words = new List<string>();
Globals.Words.Add("nathan");
Globals.Words.Add("bob");
var text = "Once upon a time there was a %s0 and it was %s1";
Console.WriteLine(FindEscapeCharacters(text));
}
public static string FindEscapeCharacters(string story)
{
var pattern = @"%s([0-9]+)";
return Regex.Replace(story, pattern, "wibble");
}
Thanks in advance, Nathan.
Not a direct answer to your question about regexes, but if I understand you correctly, there is an easier way to do this:
string baseString = "I have a {0} {1} in my {0} {2}.";
List<string> words = new List<string>() { "red", "cat", "hat" };
string outputString = String.Format(baseString, words.ToArray());
outputString will be "I have a red cat in my red hat."
Is that not what you want, or is there more to your question that I'm missing?
Minor elaboration
String.Format uses the following signature:
string Format(string format, params object[] values)
The neat thing about params is that you can either list values separately:
var a = String.Format("...", valueA, valueB, valueC);
but you can also pass in an array directly:
var a = String.Format("...", valueArray);
Note that you can't mix and match the two approaches.
Yes, you are very close in your attempt with Regex.Replace; the last step is to change the constant "wibble" into a lambda, match => how_to_replace_the_match:
var text = "Once upon a time there was a %s0 and it was %s1";
// Once upon a time there was a nathan and it was bob
var result = Regex.Replace(
text,
"%s([0-9]+)",
match => Globals.Words[int.Parse(match.Groups[1].Value)]);
Edit: In case you don't want to work with capturing groups by their numbers, you can name them explicitly:
// Once upon a time there was a nathan and it was bob
var result = Regex.Replace(
text,
"%s(?<number>[0-9]+)",
match => Globals.Words[int.Parse(match.Groups["number"].Value)]);
There is an overload of Regex.Replace that, rather than taking a string for the last argument, takes a MatchEvaluator delegate - a function that takes a Match object and returns a string.
You could make that function parse the integer from the Match's Groups[1].Value property and then use that to index into your list, returning the string you find.

Split string with plus sign as a delimiter

I have an issue with a string containing the plus sign (+).
I want to split that string (or if there is some other way to solve my problem)
string ColumnPlusLevel = "+-J10+-J10+-J10+-J10+-J10";
string strpluslevel = "";
strpluslevel = ColumnPlusLevel;
string[] strpluslevel_lines = Regex.Split(strpluslevel, "+");
foreach (string line in strpluslevel_lines)
{
MessageBox.Show(line);
strpluslevel_summa = strpluslevel_summa + line;
}
MessageBox.Show(strpluslevel_summa, "summa sumarum");
The MessageBox is for my testing purpose.
Now... The ColumnPlusLevel string can have very varied entries, but it is always a repeated pattern starting with the plus sign,
e.g. "+MJ+MJ+MJ" or "+PPL14.1+PPL14.1+PPL14.1".
(It comes from another piece of software and I can't edit the output from that software.)
How can I find out what that pattern is that is being repeated?
In these examples that is +-J10, +MJ or +PPL14.1.
In my case above I have tested it by using only a MessageBox to show the result, but I want the repeated pattern stored in a string later on.
Maybe I'm doing it wrong by using Split; maybe there is another solution.
Maybe I use Split in the wrong way.
Hope you understand my problem and the result I want.
Thanks for any advice.
/Tomas
How can I find out what that pattern is that is being repeated?
Maybe I didn't understand the requirement fully, but isn't it as easy as:
string[] tokens = ColumnPlusLevel.Split(new[]{'+'}, StringSplitOptions.RemoveEmptyEntries);
string first = tokens[0];
bool repeatingPattern = tokens.Skip(1).All(s => s == first);
If repeatingPattern is true you know that the pattern itself is first.
Can you maybe explain how the logic works
The line which contains tokens.Skip(1) is a LINQ query, so you need to add using System.Linq at the top of your code file. Since tokens is a string[], which implements IEnumerable<string>, you can use any LINQ (extension) method. Enumerable.Skip(1) skips the first element, because I have already stored it in a variable and want to know whether all the others are the same. Therefore I use All, which returns false as soon as one item doesn't match the condition (that is, one string differs from the first). If all are the same, you know that there is a repeating pattern, which is already stored in the variable first.
You should use the String.Split function:
string pattern = ColumnPlusLevel.Split('+')[1]; // index 0 is the empty string before the leading '+'
...but it is always a repeated pattern starting with the plus sign.
Why do you even need String.Split() here if the pattern always only repeats itself?
string input = @"+MJ+MJ+MJ";
int indexOfSecondPlus = input.IndexOf('+', 1);
string pattern = input.Remove(indexOfSecondPlus, input.Length - indexOfSecondPlus);
//pattern is now "+MJ"
No need for String.Split, no need to use LINQ.
String has a method called Split which lets you split/divide the string based on a given character or set of characters:
string givenString = "+-J10+-J10+-J10+-J10+-J10";
string SplittedString = givenString.Split('+')[1]; // '+' is the separator; index 1 is the first non-empty piece (index 0 is the empty string before the leading '+')
string result = SplittedString.Replace("-", ""); // Replace swaps the given substring for the target string; here it removes the '-' so that only "J10" remains

How to do a wildcard search in C# on ASP.NET?

I am using MVC3, C#, .net4.0
I have objects that contain a search string which I can use to search for the relevant objects, e.g. for 4 objects:
[car:vw:engine:1800]
[car:vw:engine:Diesel 1800]
[car:vw:engine:1600]
[car:ford:engine:1800]
I would like to search for objects that have a make of "vw" and "1800" engine.
I could try Contains():
SearchString.Contains("vw:engine:1800")
Which will return just one object.
I need something like:
SearchString.Contains("vw:engine:*1800")
Where * is a wildcard and would pick up :
[car:vw:engine:1800]
[car:vw:engine:Diesel 1800]
The only way around this, at present, would be:
SearchString.Contains("vw:engine:1800") or
SearchString.Contains("vw:engine:Diesel 1800")
Is there a simple way to do this using a mainstream .NET function like Contains(), if not Contains() itself?
There is a good reason for me using a search string like this, but this is not part of the question.
You can use regular expressions to check if SearchString is a match. .* means zero or more of any characters and is used in place of your wildcard.
string pattern = @"^\[car:vw:engine:.*1800]$";
bool matches = Regex.IsMatch(SearchString, pattern);
Generally I'd prefer the regular expressions.
In your particular case you could use something like this:
string car1 = "[car:vw:engine:Diesel 1800]";
string car2 = "[car:vw:engine:1800]";
var tokens1 = car1.Substring(1, car1.Length - 2).Split(':');
var tokens2 = car2.Substring(1, car2.Length - 2).Split(':');
bool IsMatch1 = tokens1[3].EndsWith("1800");
bool IsMatch2 = tokens2[3].EndsWith("1800");

C# String compare not working

I'm having some issues with the comparison of a string that is received via Request.QueryString and a line from a .resx file.
The code reads Request.QueryString into a variable named q, then passes it to a function that checks whether a line contains the value of q:
while ((line = filehtml.ReadLine()) != null)
{
if (line.ToLower().Contains(q.ToLower().ToString()))
HttpContext.Current.Response.Write("<b>Content found!</b>");
else
HttpContext.Current.Response.Write("<b>Content not found!</b>");
}
As it's a search in static files, special characters must be considered, and searching for Iberê, for example, isn't returning true, because .Contains, .IndexOf or .LastIndexOf is comparing iber&ecirc;, which is coming from q, with iberê, which is coming from the line.
Consider that I already tried to use ResXResourceReader (which can't be found by Visual Studio), ResourceReader and ResourceManager (these I couldn't set a static file by the path to be read).
EDIT:
Problem solved. There was an instance of SpecialChars overwriting the value of q with its EntitiesEncode method.
The problem is that the ê character is escaped in both strings. So if you did something like this, it wouldn't work:
string line = "sample iberê text";
string q = "iberê";
if (line.Contains(q)) {
// do something
}
You need to unescape the strings. Use HttpUtility in the System.Web assembly. This will work:
line = System.Web.HttpUtility.HtmlDecode(line);
q = System.Web.HttpUtility.HtmlDecode(q);
if (line.Contains(q)) {
// do something
}
As suggested by @r3bel below, if you're using .NET 4 or above you can also use System.Net.WebUtility.HtmlDecode, so you don't need an extra assembly reference.

Does not match sentence that contains specified character

I'm trying to match properties in a class. Example class:
public static string ComingSoonPage
{
get { return "/blog-coming-soon.aspx"; }
}
public static string EncodeBase64(string dataToEncode)
{
byte[] bytes = System.Text.ASCIIEncoding.UTF8.GetBytes(dataToEncode);
string returnValue = System.Convert.ToBase64String(bytes);
return returnValue;
}
I'm using this kind of regex:
(?:public|private|protected)([\s\w]*)\s+(\w+)[^(]
It matches not only properties but also methods, which is wrong. So I want to remove from the matches anything that contains (, so that it selects everything except methods (which contain a parenthesis). How can I achieve that?
Try matching the "{" and the "get {" instead
(public|private|protected|internal)[\s\w]*\s+(\w+)\s*\{\s*get\s*\{
UPDATE
Match only the name of the property
(?<=(public|private|protected|internal)[\s\w]*\s+)\w+(?=\s*\{\s*get\s*\{)
uses the general pattern
(?<=prefix)find(?=suffix)
EDIT
A property might have no modifier (public, private etc.) at all, and the type might contain extra characters (e.g. int[,] for arrays). Therefore it would probably be better to test only for the syntax elements following the property name (and the name itself). Also, a property could consist of only a setter and be abstract: abstract int[,] Matrix { set; }. I suggest retrieving the property names like this:
\w+(?=\s*\{\s*(get|set)\b)
where \b matches a word beginning or (in this case) a word end.
This may be what you are looking for and this works perfectly! I deserve some treat though :)...
Regex r = new Regex(@"(public|private).*?(?=(public|private|$))", RegexOptions.Singleline);
Regex nr = new Regex(@"\(.*?\)\s+\{", RegexOptions.Singleline);
foreach (Match m in r.Matches(yourCodeFile)) // extracts all methods and properties
{
    if (!nr.IsMatch(m.Value)) // filters out methods
        Console.WriteLine(m.Value); // properties only
}
According to this answer, try using:
for Properties: type and name:
(?:public\s|private\s|protected\s|internal\s)\s*(?:readonly|static\s+)?(?<type>\w+)\s+(?<name>\w+)[\s\r\n]*{
for Fields: type and name:
(?:public\s|private\s|protected\s)\s*(?:readonly|static\s+)?(?<type>\w+)\s+(?<name>\w+);
for Methods: methodName and parameterType and parameter:
(?:public\s|private\s|protected\s|internal\s)?[\s\w]*\s+(?<methodName>\w+)\s*\(\s*(?:(ref\s|/in\s|out\s)?\s*(?<parameterType>\w+)\s+(?<parameter>\w+)\s*,?\s*)+\)
For C# code analysis, try Irony or the Roslyn project; see this sample:
C# and VB.NET Code Searcher - Using Roslyn codeproject
