Split data with regex into class of properties with C# and Regex

So I have data like this:
((4886.03 12494.89 "LYR3_SIG2"))
It is always going to be SPACE delimited, so I want to use Regex to place each value into a property.
I was playing around with some regex:
string q = "4886.03 12494.89 \"LYR3_SIG2";
string clean = Regex.Replace(q, @"[^\w\s]", string.Empty);
but what I aim to do is put each of the 3 values into a class like this:
public class BowTies
{
public double XCoordinate { get; set; }
public double YCoordinate { get; set; }
public string Layer { get; set; }
}
Now I originally was parsing the data into a property
t = streamReader.ReadLine();
if ((t != null) && Regex.IsMatch(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))"))
currentVioType.Bowtie = new ParseType() { Formatted = Regex.Match(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))").Value.Trim('(', ')'), Original = t };
But now I really want to put that data into the doubles and the string,
since this data is space delimited: ((4886.03 12494.89 "LYR3_SIG2"))
I started refactoring, but for now I am not using the regex to get the doubles (which are ALWAYS going to be the first 2 values, followed by a string), so I started doing this:
currentAddPla.Bows.Add(new BowTies() { XCoordinate = 44.33, YCoordinate = 344.33, Layer = Regex.Match(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))").Value.Trim('(', ')')});
but I obviously need to parse this properly: the first value (a double) goes into XCoordinate, the 2nd value into YCoordinate, and the 3rd value into Layer. The regex currently captures ALL of the data, when it should only capture the 3rd value, "LYR3_SIG2". That should be possible with regex, right?

It is always going to be SPACE delimited thus
Regex sounds like overkill for this. Have you considered using string.Split(' ')? For example:
string s = "((4886.03 12494.89 \"LYR3_SIG2\"))";
s = s.Replace("(", string.Empty).Replace(")", string.Empty);
string[] arr = s.Split(' ');
currentAddPla.Bows.Add(new BowTies() {
XCoordinate = Convert.ToDouble(arr[0]),
YCoordinate = Convert.ToDouble(arr[1]),
Layer = arr[2].Trim('"')}); // the third token is at index 2; Trim strips the surrounding quotes
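If a regular expression is still preferred, as the question asks, the three values can be pulled out with named capture groups. This is only a sketch: the BowTieParser class and its Parse method are hypothetical names, and it assumes every line has exactly the shape ((number number "name")) with '.' as the decimal separator.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public class BowTies
{
    public double XCoordinate { get; set; }
    public double YCoordinate { get; set; }
    public string Layer { get; set; }
}

// Hypothetical helper, not part of the original code.
public static class BowTieParser
{
    // Two numbers and a quoted name inside double parentheses.
    private static readonly Regex Pattern = new Regex(
        @"^\(\(\s*(?<x>-?\d+(?:\.\d+)?)\s+(?<y>-?\d+(?:\.\d+)?)\s+""(?<layer>[^""]*)""\s*\)\)$");

    // Returns null when the line does not have the expected shape.
    public static BowTies Parse(string line)
    {
        Match m = Pattern.Match(line.Trim());
        if (!m.Success) return null;

        return new BowTies
        {
            // InvariantCulture keeps '.' as the decimal separator on every machine.
            XCoordinate = double.Parse(m.Groups["x"].Value, CultureInfo.InvariantCulture),
            YCoordinate = double.Parse(m.Groups["y"].Value, CultureInfo.InvariantCulture),
            Layer = m.Groups["layer"].Value
        };
    }
}
```

With this in place, the line from the question could be handled with something like currentAddPla.Bows.Add(BowTieParser.Parse(t)) after a null check.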

You should just use String.Split instead of Regex. The data is formatted simply enough that Regex would be overkill even if it worked well here. On top of that, the language that defines your data is not regular (http://en.wikipedia.org/wiki/Regular_language) and thus cannot be reliably parsed with Regex. It may be working right now because the data inside the parentheses is simply formatted, but languages with arbitrarily nested matching parentheses are context-free and in general cannot be parsed with regular expressions.

Related

Split string with plus sign as a delimiter

I have an issue with a string containing the plus sign (+).
I want to split that string (or if there is some other way to solve my problem)
string ColumnPlusLevel = "+-J10+-J10+-J10+-J10+-J10";
string strpluslevel = "";
strpluslevel = ColumnPlusLevel;
string[] strpluslevel_lines = Regex.Split(strpluslevel, "+");
foreach (string line in strpluslevel_lines)
{
MessageBox.Show(line);
strpluslevel_summa = strpluslevel_summa + line;
}
MessageBox.Show(strpluslevel_summa, "summa sumarum");
The MessageBox is for my testing purposes.
The ColumnPlusLevel string can have very varied content, but it is always a repeated pattern starting with the plus sign,
e.g. "+MJ+MJ+MJ" or "+PPL14.1+PPL14.1+PPL14.1".
(It comes from another piece of software and I can't edit the output of that software.)
How can I find out what pattern is being repeated?
In these examples it is +-J10, +MJ or +PPL14.1.
In my case above I have tested it using only a MessageBox to show the result, but I want the repeated pattern stored in a string later on.
Maybe I'm doing it wrong by using Split; maybe there is another solution.
Maybe I use Split in the wrong way.
I hope you understand my problem and the result I want.
Thanks for any advice.
/Tomas

How can I find out what that pattern is that is being repeated?
Maybe I didn't understand the requirement fully, but isn't it as easy as:
string[] tokens = ColumnPlusLevel.Split(new[]{'+'}, StringSplitOptions.RemoveEmptyEntries);
string first = tokens[0];
bool repeatingPattern = tokens.Skip(1).All(s => s == first);
If repeatingPattern is true you know that the pattern itself is first.
Can you maybe explain how the logic works?
The line that contains tokens.Skip(1) is a LINQ query, so you need to add using System.Linq; at the top of your code file. Since tokens is a string[], which implements IEnumerable<string>, you can use any LINQ extension method on it. Skip(1) skips the first token, because I have already stored it in a variable and I want to know whether all the others are the same. For that I use All, which returns false as soon as one item doesn't match the condition (that is, one string differs from the first). If all are the same, you know that there is a repeating pattern, and it is already stored in the variable first.
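Put together, the Split/Skip/All logic described above might look like the following sketch. RepeatedPattern and Find are hypothetical names; the method returns the repeated unit with its leading '+', or null when the tokens are not all identical.

```csharp
using System;
using System.Linq;

// Hypothetical helper wrapping the Split/Skip/All approach.
public static class RepeatedPattern
{
    public static string Find(string input)
    {
        // RemoveEmptyEntries drops the empty string in front of the leading '+'.
        string[] tokens = input.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries);
        if (tokens.Length == 0) return null;

        string first = tokens[0];
        // All returns false as soon as one token differs from the first.
        bool repeating = tokens.Skip(1).All(t => t == first);
        return repeating ? "+" + first : null;
    }
}
```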

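You should use the String.Split function:
string pattern = ColumnPlusLevel.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries)[0]; // RemoveEmptyEntries skips the empty string before the leading '+'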

...but it is always a repeated pattern starting with the plus sign.
Why do you even need String.Split() here if the pattern always only repeats itself?
string input = "+MJ+MJ+MJ";
int indexOfSecondPlus = input.IndexOf('+', 1);
string pattern = input.Remove(indexOfSecondPlus, input.Length - indexOfSecondPlus);
//pattern is now "+MJ"
No need for String.Split, no need for LINQ.
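One caveat with the IndexOf approach: if the input holds the pattern only once, IndexOf returns -1 and Remove throws. A guarded variant could look like this sketch (FirstSegment and Extract are hypothetical names):

```csharp
// Hypothetical helper: returns everything before the second '+'.
public static class FirstSegment
{
    public static string Extract(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        // Start at index 1 so the leading '+' itself is skipped.
        int next = input.IndexOf('+', 1);

        // No second '+': the whole input is the pattern.
        return next < 0 ? input : input.Substring(0, next);
    }
}
```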

String has a method called Split which lets you split the string on a given character or set of characters:
string givenString = "+-J10+-J10+-J10+-J10+-J10";
string splitString = givenString.Split(new[] { '+' }, StringSplitOptions.RemoveEmptyEntries)[0]; // '+' is the separator; index 0 is the first non-empty piece, "-J10"
string result = splitString.Replace("-", ""); // Replace swaps one substring for another; here it removes the '-' so that only "J10" remains

Split spinner data when item selected

I am using spinner.Add("heading"+23) to add an item to a spinner, and it shows heading23 in the spinner list. How can I get the data back into two variables (one for the heading and one for the int value)?

If I understand correctly, you want to split strings like "heading23" back into "heading" and "23".
If so, you can use a regular expression. In C#, the Regex class does this:
string str = "test123";
Regex reg = new Regex(@"\d+");//"\d" means a digit, "+" means match one or more
Match match=reg.Match(str);
if (match.Success)
{
string numberStr =match.Value;//value="123"
int number = int.Parse(numberStr);//number=123
var title = str.Replace(numberStr, "");//title="test"
}
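A variation on the same idea captures the heading and the trailing number in one match, so digits that appear earlier in the heading are left alone. This is a sketch; SpinnerItem and TrySplit are hypothetical names, and it assumes the number is always at the end of the item text.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: splits "heading23" into "heading" and 23.
public static class SpinnerItem
{
    // Lazy prefix followed by a run of digits anchored at the end.
    private static readonly Regex Pattern = new Regex(@"^(?<title>.*?)(?<number>\d+)$");

    public static bool TrySplit(string item, out string title, out int number)
    {
        title = item;
        number = 0;

        Match m = Pattern.Match(item);
        if (!m.Success) return false;

        title = m.Groups["title"].Value;
        // TryParse guards against digit runs too large for an int.
        return int.TryParse(m.Groups["number"].Value, out number);
    }
}
```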

When using Java, a Spinner uses an adapter. You can pass custom objects to that adapter. If I recall correctly, the toString method is called on each item to get the text that is displayed.
class MySpinnerDataItem {
String title;
int number;
}

C# Regex, any more efficient way to parse string enclosed by symbol?

I'm not sure if it's okay to ask... But here goes.
I implemented a method that parses a string using regex; each match is passed through a delegate, in this order (actually, I don't think the order is important, but I wrote it this way and it's not fully tested):
Regex.Replace with pattern @"(?<!\\)\$.+?\$", then String.Replace(@"\$", @"$"): replaces a string enclosed by dollar signs. Backslash-escaped dollar signs are ignored, then the backslash is removed. Ex: "$global name$" -> "motherofglobalvar", "Money \$9000" -> "Money $9000"
Regex.Replace with pattern @"(?<!\\)%.+?%", then String.Replace(@"\%", @"%"): replaces a string enclosed by percent signs. Backslash-escaped percent signs are ignored, then the backslash is removed. Same as the previous example: "%local var%" -> "lordoflocalvar", "It's over 9000\%" -> "It's over 9000%"
Regex.Replace with pattern @"(?<!\\)#", then String.Replace(@"\#", @"#"): replaces the char '#' with a space, ' '. Backslash-escaped ones are ignored, then the backslash is removed. Ex: "I#hit#the#ground#too#hard" -> "I hit the ground too hard", "qw\#op" -> "qw#op"
What I've done without much experience (I think):
//parse variable
public static string ParseVariable(string text)
{
return Regex.Replace(Regex.Replace(Regex.Replace(text, @"(?<!\\)\$.+?\$", match =>
{
string trim = match.Value.Trim('$');
string trimUpper = trim.ToUpper();
return variableGlobal.ContainsKey(trim) ? variableGlobal[trim] : match.Value;
}).Replace(@"\$", @"$"), @"(?<!\\)%.+?%", match =>
{
string trim = match.Value.Trim('%');
string trimUpper = trim.ToUpper();
return variableLocal.ContainsKey(trim) ? variableLocal[trim] : match.Value;
}).Replace(@"\%", @"%"), @"(?<!\\)#", " ").Replace(@"\#", @"#");
}
In short, what I used is: Regex.Replace().Replace()
Since I need to parse 3 kinds of symbols, I chained it as following: Regex.Replace(Regex.Replace(Regex.Replace().Replace()).Replace()).Replace()
Is there a more efficient way than this, one that does not need to go through the text 6 times? (3 times Regex.Replace and 3 times String.Replace, where each replace modifies the text used by the next one.)
Or is this the best that can be done?
Thanks.

Here's a unique take on the problem, I think. You can build a class that will be used to construct the overall pattern piece-by-piece. This class will be responsible for the generating of the MatchEvaluator delegate that will be passed to Replace as well.
class RegexReplacer
{
public string Pattern { get; private set; }
public string Replacement { get; private set; }
public string GroupName { get; private set; }
public RegexReplacer NextReplacer { get; private set; }
public RegexReplacer(string pattern, string replacement, string groupName, RegexReplacer nextReplacer = null)
{
this.Pattern = pattern;
this.Replacement = replacement;
this.GroupName = groupName;
this.NextReplacer = nextReplacer;
}
public string GetAggregatedPattern()
{
string constructedPattern = this.Pattern;
string alternation = (this.NextReplacer == null ? string.Empty : "|" + this.NextReplacer.GetAggregatedPattern()); // If there isn't another replacer, then we won't have an alternation; otherwise, we build an alternation between this pattern and the next replacer's "full" pattern
constructedPattern = string.Format("(?<{0}>{1}){2}", this.GroupName, this.Pattern, alternation); // The (?<XXX>) syntax builds a named capture group. This is used by our GetReplaceDelegate method.
return constructedPattern;
}
public MatchEvaluator GetReplaceDelegate()
{
return (match) =>
{
if (match.Groups[this.GroupName] != null && match.Groups[this.GroupName].Length > 0) // Did we get a hit on the group name?
{
return this.Replacement;
}
else if (this.NextReplacer != null) // No? Then is there another replacer to inspect?
{
MatchEvaluator next = this.NextReplacer.GetReplaceDelegate();
return next(match);
}
else
{
return match.Value; // No? Then simply return the value
}
};
}
}
It should be obvious what Pattern and Replacement represent. GroupName is kind of a hack to let the replacement evaluator know which RegexReplacer fragment resulted in the match. NextReplacer points to another replacer instance that holds a different pattern fragment, and so on down the chain.
The idea here is to have a kind of linked list of objects that represents the overall pattern. You can call GetAggregatedPattern on the outermost replacer to get the full pattern: each replacer calls the next replacer's GetAggregatedPattern to get that replacer's pattern fragment, to which it concatenates its own fragment. GetReplaceDelegate generates a MatchEvaluator. This MatchEvaluator compares its own GroupName to the Match's captured groups. If the group name was captured, then we have a hit, and we return this replacer's Replacement value. Otherwise, we step into the next replacer (if there is one) and repeat the group name comparison. If there is no hit on any replacer, then we simply yield back the original value (i.e. what was matched by the pattern; this should be rare).
The usage of such might look like this:
string target = @"$global name$ Money \$9000 %local var% It's over 9000\% I#hit#the#ground#too#hard qw\#op";
RegexReplacer dollarWrapped = new RegexReplacer(@"(?<!\\)\$[^$]+\$", "motherofglobalvar", "dollarWrapped");
RegexReplacer slashDollar = new RegexReplacer(@"\\\$", "$", "slashDollar", dollarWrapped); // an escaped \$ becomes a plain $
RegexReplacer percentWrapped = new RegexReplacer(@"(?<!\\)%[^%]+%", "lordoflocalvar", "percentWrapped", slashDollar);
RegexReplacer slashPercent = new RegexReplacer(@"\\%", "%", "slashPercent", percentWrapped); // an escaped \% becomes a plain %
RegexReplacer singleAt = new RegexReplacer(@"(?<!\\)#", " ", "singleAt", slashPercent);
RegexReplacer slashAt = new RegexReplacer(@"\\#", "#", "slashAt", singleAt);
RegexReplacer replacer = slashAt;
string pattern = replacer.GetAggregatedPattern();
MatchEvaluator evaluator = replacer.GetReplaceDelegate();
string result = Regex.Replace(target, pattern, evaluator);
Because you want each replacer to know if it got a hit, and because we are hacking this by using group names, you want to make sure that each group name is distinct. A simple way to ensure this would be to use a name that's identical to the variable name since you can't have two variables with the same name within the same scope.
You can see above that I am building each part of the pattern separately, but as I build, I pass the previous replacer as the 4th parameter to the current replacer. This builds the chain of replacers. Once built, I use the last replacer constructed to generate the overall pattern and evaluator. If you use any other replacer, you will only have part of the overall pattern. Finally, it's simply a matter of passing the generated pattern and evaluator to the Replace method.
Keep in mind that this approach was targeted more at the problem as described. It may work in more general scenarios, but I've only worked with what you've presented. Also, since this is more of a parsing question, a parser may be the proper route to take--although the learning curve is going to be higher.
Also keep in mind that I haven't profiled this code. It certainly doesn't loop over the target string multiple times, but it does involve additional method calls during replacement. You would certainly want to test it in your environment.
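For comparison, the same single-pass idea can be sketched without the replacer chain: one alternation with named groups, and one evaluator that checks which group matched. VariableParser is a hypothetical name and the two dictionaries stand in for the question's variableGlobal and variableLocal. Because the escaped-symbol alternative is listed first and matching proceeds left to right, no lookbehind is needed. Unlike the chained version in the question, an unknown $name$ or %name% is returned untouched, including any '#' inside it.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical single-pass variant of ParseVariable.
public static class VariableParser
{
    // Alternatives: escaped symbol | $name$ | %name% | bare '#'.
    private static readonly Regex Token = new Regex(
        @"\\(?<escaped>[$%#])|\$(?<global>[^$]+)\$|%(?<local>[^%]+)%|(?<space>#)");

    public static string Parse(string text,
        IDictionary<string, string> globals, IDictionary<string, string> locals)
    {
        return Token.Replace(text, m =>
        {
            // \$, \% or \# : drop the backslash, keep the symbol.
            if (m.Groups["escaped"].Success) return m.Groups["escaped"].Value;

            // Unescaped '#' becomes a space.
            if (m.Groups["space"].Success) return " ";

            string value;
            if (m.Groups["global"].Success)
                return globals.TryGetValue(m.Groups["global"].Value, out value) ? value : m.Value;

            // Only the local alternative is left at this point.
            return locals.TryGetValue(m.Groups["local"].Value, out value) ? value : m.Value;
        });
    }
}
```

As with the answer above, this has not been profiled; it only avoids re-scanning the text once per symbol.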

Replace part of a string with new value

I've got a scenario in which I need to replace part of a string with new text.
For example, if my string is "01HW128120", I first check whether the text contains "01HW". If yes, I replace that with the string "MachineID-".
So eventually I want "01HW128120" to become "MachineID-128120". Sometimes I get the string as "1001HW128120"; in this case I also need to replace "1001HW" with "MachineID-".
I tried the code snippet below, but it does not work as expected.
string sampleText = "01HW128120";
if(sampleText.Contains("01HW"))
sampleText = sampleText.Replace("01HW","MachineID-");
Any suggestion would be of great help to me.

Few Possible Search Values
If there are only a few possible combinations, you can simply do multiple tests:
string value = "01HW128120";
string replacement = "MachineID-";
// Test the longer prefix first: "1001HW" also contains "01HW"
if( value.Contains( "1001HW" ) ) {
value = value.Replace( "1001HW", replacement );
}
else if( value.Contains( "01HW" ) ) {
value = value.Replace( "01HW", replacement );
}
Assert.AreEqual( "MachineID-128120", value );
Many Possible Search Values
Of course, this approach quickly becomes unwieldy if you have a large quantity of possibilities. Another approach is to keep all of the search strings in a list.
string value = "01HW128120";
string replacement = "MachineID-";
var tokens = new List<string> {
// longest first, because "1001HW" also contains "01HW"
"1001HW",
"01HW"
// n number of potential search strings here
};
foreach( string token in tokens ) {
if( value.Contains( token ) ) {
value = value.Replace( token, replacement );
break;
}
}
"Smarter" Matching
A regular expression is well-suited for string replacement if you have a manageable number of search strings but you perhaps need not-exact matches, case-insensitivity, lookaround, or capturing of values to insert into the replaced string.
An extremely simple regex which meets your stated requirements: 1001HW|01HW.
Demo: http://regexr.com?34djm
A slightly smarter regex: ^\d{2,4}HW
^ asserts the position at the start of the string
\d{2,4} matches 2-4 digits
HW matches the value "HW" literally
See also: Regex.Replace Method
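Applied with Regex.Replace, the second pattern could be used as in this sketch. MachineId and Normalize are hypothetical names, and the sketch assumes the prefix is always 2 to 4 digits followed by HW at the start of the string.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the ^\d{2,4}HW pattern.
public static class MachineId
{
    private static readonly Regex Prefix = new Regex(@"^\d{2,4}HW");

    public static string Normalize(string value)
    {
        // Strings without the prefix are returned unchanged.
        return Prefix.Replace(value, "MachineID-");
    }
}
```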

If you just want to replace everything up to "01HW" with "MachineID-", you could use a generic regex:
sampleText = Regex.Replace(sampleText, "^.*01HW", "MachineID-");

c# How to process the string?

I connect to a web service that gives me a response something like this (this is not the whole string, but you get the idea):
sResponse = "{\"Name\":\" Bod\u00f8\",\"homePage\":\"http:\/\/www.example.com\"}";
As you can see, the "Bod\u00f8" is not as it should be.
Therefore I tried to convert the Unicode escape (\u00f8) to a char by doing this with the string:
public string unicodeToChar(string sString)
{
StringBuilder sb = new StringBuilder();
foreach (char chars in sString)
{
if (chars >= 32 && chars <= 255)
{
sb.Append(chars);
}
else
{
// Replacement character
sb.Append((char)chars);
}
}
sString = sb.ToString();
return sString;
}
But it won't work, probably because the string actually contains the escaped text \\u00f8 (a literal backslash followed by u00f8), and not the character \u00f8.
Now it would not be a problem if \u00f8 were the only Unicode escape I had to convert, but I have many more of them.
That means that I can't just use the Replace function :(
I hope someone can help.

You're basically talking about converting from JSON (JavaScript Object Notation). Try this link--near the bottom you'll see a list of publicly available libraries, including some in C#, that might do what you need.

The excellent Json.NET library has no problems decoding Unicode escape sequences:
var sResponse = "{\"Name\":\"Bod\\u00f8\",\"homePage\":\"http://www.ex.com\"}"; // \\u00f8 keeps the escape sequence in the JSON text
var obj = (JObject)JsonConvert.DeserializeObject(sResponse);
var name = ((JValue)obj["Name"]).Value;
var homePage = ((JValue)obj["homePage"]).Value;
Debug.Assert(Equals(name, "Bodø"));
Debug.Assert(Equals(homePage, "http://www.ex.com"));
This also allows you to deserialize to real POCO objects, making the code even cleaner (although less dynamic).
var obj2 = JsonConvert.DeserializeObject<Response>(sResponse);
Debug.Assert(obj2.Name == "Bodø");
Debug.Assert(obj2.HomePage == "http://www.ex.com");
public class Response
{
public string Name { get; set; }
public string HomePage { get; set; }
}

Perhaps you want to try:
string character = Encoding.UTF8.GetString(chars);
sb.Append(character);

I know this question is getting quite old, but I crashed into this problem as of today, while trying to access the Facebook Graph API. I was getting these strange \u00f8 and other variations back.
First I tried a simple replace as the OP also said (with the help from an online table). But I thought "no way!" after adding 2 replaces.
So after looking a little more at the "codes" it suddenly hit me...
The "\u" is a prefix, and the 4 characters after it are a hexadecimal character code! So writing a simple regex to find every \u followed by 4 hexadecimal digits, then converting those 4 digits to an integer and then to a character, did the trick.
My source is in VB.NET
Private Function DecodeJsonString(ByVal Input As String) As String
For Each m As System.Text.RegularExpressions.Match In New System.Text.RegularExpressions.Regex("\\u([0-9a-fA-F]{4})").Matches(Input)
Input = Input.Replace(m.Value, ChrW(CInt("&H" & m.Value.Substring(2)))) ' ChrW handles code points above 255, Chr does not
Next
Return Input
End Function
I also have a C# version here
private string DecodeJsonString(string Input)
{
foreach (System.Text.RegularExpressions.Match m in new System.Text.RegularExpressions.Regex(@"\\u([0-9a-fA-F]{4})").Matches(Input))
{
Input = Input.Replace(m.Value, ((char)(System.Int32.Parse(m.Value.Substring(2), System.Globalization.NumberStyles.AllowHexSpecifier))).ToString());
}
return Input;
}
I hope it can help someone out... I hate to add libraries when I really only need a few functions from them!
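The loop above calls Replace once per match. A single-pass variant using a MatchEvaluator might look like the sketch below. JsonEscapes and Decode are hypothetical names, and the sketch only handles \uXXXX sequences; other JSON escapes such as \n, \/ or an escaped backslash in front of the u are not treated, which is one reason a real JSON parser is the safer choice.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: decodes \uXXXX sequences in one pass.
public static class JsonEscapes
{
    // A literal backslash, 'u', then exactly four hexadecimal digits.
    private static readonly Regex UnicodeEscape = new Regex(@"\\u([0-9a-fA-F]{4})");

    public static string Decode(string input)
    {
        return UnicodeEscape.Replace(input, m =>
        {
            // Parse the four hex digits and convert the code unit to a char.
            int code = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            return ((char)code).ToString();
        });
    }
}
```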
