Extract some values in formatted string - c#

I would like to retrieve values from a string formatted like this:
public var any:int = 0;
public var anyId:Number = 2;
public var theEnd:Vector.<uint>;
public var test:Boolean = false;
public var others1:Vector.<int>;
public var firstValue:CustomType;
public var field2:Boolean = false;
public var secondValue:String = "";
public var isWorks:Boolean = false;
I want to store the field name, type, and value in a custom class, Property:
public class Property
{
public string Name { get; set; }
public string Type { get; set; }
public string Value { get; set; }
}
I would like to get these values with a regular expression.
How can I do this?
Thanks
EDIT: I tried this, but I don't know how to go further with vectors, etc.
/public var ([a-zA-Z0-9]*):([a-zA-Z0-9]*)( = \"?([a-zA-Z0-9]*)\"?)?;/g

Ok, posting my regex-based answer.
Your regex, /public var ([a-zA-Z0-9]*):([a-zA-Z0-9]*)( = \"?([a-zA-Z0-9]*)\"?)?;/g, contains regex delimiters. These are not supported in C# and are therefore treated as literal symbols. You need to remove them along with the g modifier. To obtain multiple matches in C#, use Regex.Matches, or Regex.Match with a while loop and Match.Success/.NextMatch().
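As a minimal sketch of the Regex.Match / NextMatch() loop mentioned above, here is the asker's own pattern with the delimiters and the g flag removed. The class and method names (NextMatchDemo, Extract) are made up for illustration, and the pattern still skips the Vector.<...> lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NextMatchDemo
{
    // The asker's pattern with the /.../g delimiters and flag removed.
    private static readonly Regex Rx = new Regex(
        @"public var ([a-zA-Z0-9]*):([a-zA-Z0-9]*)( = ""?([a-zA-Z0-9]*)""?)?;");

    // Hypothetical helper: returns "name:type=value" for every match.
    public static List<string> Extract(string text)
    {
        var found = new List<string>();
        // Regex.Match returns the first match; NextMatch() advances
        // until Success becomes false.
        Match m = Rx.Match(text);
        while (m.Success)
        {
            found.Add(m.Groups[1].Value + ":" + m.Groups[2].Value + "=" + m.Groups[4].Value);
            m = m.NextMatch();
        }
        return found;
    }

    public static void Main()
    {
        var text = "public var any:int = 0;\npublic var test:Boolean = false;";
        foreach (var line in Extract(text))
            Console.WriteLine(line);
    }
}
```

Regex.Matches returns the same matches as a collection; the explicit loop is mainly useful when you want to stop early.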
The regex I am using is (?<=\s*var\s*)(?<name>[^=:\n]+):(?<type>[^;=\n]+)(?:=(?<value>[^;\n]+))?. The newline symbols are included because negated character classes can otherwise match a newline character.
var str = "public var any:int = 0;\r\npublic var anyId:Number = 2;\r\npublic var theEnd:Vector.<uint>;\r\npublic var test:Boolean = false;\r\npublic var others1:Vector.<int>;\r\npublic var firstValue:CustomType;\r\npublic var field2:Boolean = false;\r\npublic var secondValue:String = \"\";\r\npublic var isWorks:Boolean = false;";
var rx = new Regex(@"(?<=\s*var\s*)(?<name>[^=:\n]+):(?<type>[^;=\n]+)(?:=(?<value>[^;\n]+))?");
var coll = rx.Matches(str);
var props = new List<Property>();
foreach (Match m in coll)
props.Add(new Property(m.Groups["name"].Value,m.Groups["type"].Value, m.Groups["value"].Value));
foreach (var item in props)
Console.WriteLine("Name = " + item.Name + ", Type = " + item.Type + ", Value = " + item.Value);
Or with LINQ:
var props = rx.Matches(str)
.OfType<Match>()
.Select(m =>
new Property(m.Groups["name"].Value,
m.Groups["type"].Value,
m.Groups["value"].Value))
.ToList();
And the class example:
public class Property
{
public string Name { get; set; }
public string Type { get; set; }
public string Value { get; set; }
public Property()
{}
public Property(string n, string t, string v)
{
this.Name = n;
this.Type = t;
this.Value = v;
}
}
NOTE ON PERFORMANCE:
The regex is not the quickest, but it certainly beats the one in the other answer. Here is a test performed at regexhero.net:

It seems that you don't need regular expressions in a simple case like the one you've provided:
String text =
@"public var any:int = 0;
public var anyId:Number = 2;
public var theEnd:Vector.<uint>;
public var test:Boolean = false;
public var others1:Vector.<int>;
public var firstValue:CustomType;
public var field2:Boolean = false;";
List<Property> result = text
.Split(new Char[] {'\r','\n'}, StringSplitOptions.RemoveEmptyEntries)
.Select(line => {
int varIndex = line.IndexOf("var") + "var".Length;
int colonIndex = line.IndexOf(":");
int equalsIndex = line.IndexOf("=");
// '=' can be absent; the type then runs up to the terminating ';'
int typeEnd = equalsIndex < 0 ? line.Length : equalsIndex;
return new Property() {
Name = line.Substring(varIndex, colonIndex - varIndex).Trim(),
Type = line.Substring(colonIndex + 1, typeEnd - colonIndex - 1).Trim(' ', ';'),
Value = equalsIndex < 0 ? "" : line.Substring(equalsIndex + 1).Trim(' ', ';')
};
})
.ToList();
If the text can contain comments and other stuff, e.g.
"public (*var is commented out*) var sample: int = 123;;;; // another comment"
you have to implement a parser.

You can use the following pattern:
\s*(?<vis>\w+?)\s+var\s+(?<name>\w+?)\s*:\s*(?<type>\S+?)(\s*=\s*(?<value>\S+?))?\s*;
to match each element in a line. Appending ? after a quantifier results in a non-greedy match which makes the pattern a lot simpler - no need to negate all unwanted classes.
Values are optional, so the value group is wrapped in another, optional group (\s*=\s*(?<value>\S+?))?
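The RegexOptions.Multiline option only changes how ^ and $ behave, so it has no effect on this pattern, which uses neither anchor. The name, type, and value groups cannot run across a newline because \w and \S never match one.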
The C# 6 syntax in the following example isn't required, but multiline string literals and interpolated strings make for much cleaner code.
var input = @"public var any:int = 0;
public var anyId:Number = 2;
public var theEnd:Vector.<uint>;
public var test:Boolean = false;
public var others1:Vector.<int>;
public var firstValue:CustomType;
public var field2:Boolean = false;
public var secondValue:String = """";
public var isWorks:Boolean = false;";
var pattern = @"\s*(?<vis>\w+?)\s+var\s+(?<name>\w+?)\s*:\s*(?<type>\S+?)(\s*=\s*(?<value>\S+?))?\s*;";
var regex = new Regex(pattern, RegexOptions.Multiline);
var results = regex.Matches(input);
foreach (Match m in results)
{
var g = m.Groups;
Console.WriteLine($"{g["name"],-15} {g["type"],-10} {g["value"],-10}");
}
var properties = (from m in results.OfType<Match>()
let g = m.Groups
select new Property
{
Name = g["name"].Value,
Type = g["type"].Value,
Value = g["value"].Value
})
.ToList();
I would consider using a parser generator like ANTLR though, if I had to parse more complex input or if there were multiple patterns to match. Learning how to write the grammar takes some time, but once you learn it, it's easy to create parsers that can match input that would require very complicated regular expressions. Whitespace management also becomes a lot easier.
In this case, the grammar could be something like:
property : visibility var name COLON type (EQUALS value)? SEMICOLON;
visibility : ALPHA+;
var : ALPHA ALPHA ALPHA;
name : ALPHANUM+;
type : (ALPHANUM|DOT|LEFT|RIGHT)+;
value : ALPHANUM+
| literal;
literal : DOUBLE_QUOTE ALPHANUM* DOUBLE_QUOTE;
ALPHANUM : ALPHA
| DIGIT;
ALPHA : [A-Za-z];
DIGIT : [0-9];
...
WS : [ \t\r\n]+ -> skip;
With a parser, adding comments, for example, would be as simple as adding comment before SEMICOLON in the property rule, plus a new comment rule that matches the pattern of a comment.

Related

How to parse this single line Console input most efficiently

I am trying to get input from the user on a single line, with [, ,] separators, like this:
[Q,W,1] [R,T,3] [Y,U,9]
And then I will use these inputs in a function like this:
f.MyFunction('Q','W',1); // Third parameter will be taken as integer
f.MyFunction('R','T',3);
f.MyFunction('Y','U',9);
I thought I could do something like:
string input = Console.ReadLine();
string input1 = input.Split(' ')[0];
char input2 = input.Trim(',') [0];
But it seems to repeat a lot.
What would be the most logical way to do this?
Sometimes a regular expression really is the best tool for the job. Use a pattern that matches the input pattern and use Regex.Matches to extract all the possible inputs:
var funcArgRE = new Regex(@"\[(.),(.),(\d+)\]", RegexOptions.Compiled);
foreach (Match match in funcArgRE.Matches(input)) {
var g = match.Groups;
f.MyFunction(g[1].Value[0], g[2].Value[0], Int32.Parse(g[3].Value));
}
Well, you could use LINQ to Objects and do something like this:
var inputs = input.Split(' ')
.Select(x =>
x.Replace("[", "")
.Replace("]", ""))
.Select(x => new UserInput(x))
.ToList();
foreach(var userInput in inputs)
{
f.MyFunction(userInput.A, userInput.B, userInput.Number);
}
// Somewhere else
public record UserInput
{
public UserInput(string input)
{
//Do some kind of validation here and throw exception accordingly
var parts = input.Split(',');
A = parts[0][0];
B = parts[1][0];
Number = Convert.ToInt32(parts[2]);
}
public char A { get; init; }
public char B { get; init; }
public int Number { get; init; }
};
Or you could go further and define an implicit conversion operator on the UserInput record, making it possible to convert implicitly from string to UserInput.
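A minimal sketch of that idea follows, assuming each token has already been stripped of its brackets (for example "Q,W,1"). The positional record form and the ImplicitDemo class are illustrative choices, not part of the answer above. Since this conversion can throw, an explicit operator may be the safer design in real code.

```csharp
using System;

public record UserInput(char A, char B, int Number)
{
    // Hypothetical conversion from a string such as "Q,W,1".
    public static implicit operator UserInput(string input)
    {
        var parts = input.Split(',');
        if (parts.Length != 3 || parts[0].Length != 1 || parts[1].Length != 1)
            throw new FormatException($"Expected 'X,Y,n' but got '{input}'.");
        return new UserInput(parts[0][0], parts[1][0], int.Parse(parts[2]));
    }
}

public static class ImplicitDemo
{
    public static void Main()
    {
        UserInput u = "Q,W,1";   // the implicit operator runs here
        Console.WriteLine($"{u.A} {u.B} {u.Number}");
    }
}
```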

How to Split and Sum Members of a String Value

I have a database column that is a text field, and this text field contains values that look like
I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109
and can vary sometimes to look like:
I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300
Where 'I' represents the invoice Id, 'A' the invoice amount, 'D' the date in YYYYMMDD format and 'F' the original foreign currency value if the invoice was from a foreign supplier.
I am fetching that column and binding it to a datagrid which has a button labelled "Show Amount". On button click, it fetches the selected row and splits the string to extract "A".
I need to fetch all the sections with A= within the column result, i.e.
A=97920
A=77360
A=43975
Then sum them all together and display the result on a label.
I have tried splitting using '|' first, extracting the substring 'A=' then splitting it using ';' to get the amount after "=".
string cAlloc;
string[] amount;
string InvoiceTotal;
string SupplierAmount;
string BalanceUnpaid;
DataRowView dv = invoicesDataGrid.SelectedItem as DataRowView;
if (dv != null)
{
cAlloc = dv.Row.ItemArray[7].ToString();
InvoiceTotal = dv.Row.ItemArray[6].ToString();
if (invoicesDataGrid.Columns[3].ToString() == "0")
{
lblAmount.Foreground = Brushes.Red;
lblAmount.Content = "No Amount Has Been Paid Out to the Supplier";
}
else
{
amount = cAlloc.Split('|');
foreach (string i in amount)
{
string toBeSearched = "A=";
string code = i.Substring(i.IndexOf(toBeSearched) + toBeSearched.Length);
string[] res = code.Split(';');
SupplierAmount = res[0];
float InvTotIncl = float.Parse(InvoiceTotal, CultureInfo.InvariantCulture.NumberFormat);
float AmountPaid = float.Parse(SupplierAmount, CultureInfo.InvariantCulture.NumberFormat);
float BalUnpaid = InvTotIncl - AmountPaid;
BalanceUnpaid = Convert.ToString(BalUnpaid);
if (BalUnpaid == 0)
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " No Balance Remaining, Supplier Invoice Paid in Full";
}
else if (BalUnpaid < 0)
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " Supplier Paid an Excess of " + BalanceUnpaid;
}
else
{
lblAmount.Content = "Amount Paid = " + SupplierAmount + " You Still Owe the Supplier a Total of " + BalanceUnpaid; ;
}
}
}
But I am only able to extract A=43975, the very last "A=", instead of all three. I also have not figured out how to sum the strings. Somebody help... please.
Regex is the preferred solution. Alternatively: split, split, and split.
var cAlloc = "I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300";
var amount = cAlloc.Split('|');
decimal sum = 0;
foreach (string i in amount)
{
foreach (var t in i.Split(';'))
{
var p = t.Split('=');
if (p[0] == "A")
{
var s = decimal.Parse(p[1], CultureInfo.InvariantCulture);
sum += s;
break;
}
}
}
You're doing too much work for something like this. Here's a simpler solution using Regex.
var in1 = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var in2 = "I=29;A=20009.34;D=20190712;F=300|I=29;A=2259.34;D=20190714;F=300";
var reg = @"A=(\d+(\.\d+)?)";
// Parse with the invariant culture so "." is always the decimal separator.
var sum1 = Regex.Matches(in1, reg).OfType<Match>().Sum(m => double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));
var sum2 = Regex.Matches(in2, reg).OfType<Match>().Sum(m => double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));
If the invoice amount is always located as a second value in the set you can access it directly by index after split:
var str = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var invoices = str.Trim().Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
var totalSum = 0M;
foreach (var invoice in invoices)
{
var invoiceParts = invoice.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var invoiceAmount = decimal.Parse(invoiceParts[1].Trim().Substring(2));
totalSum += invoiceAmount;
}
Otherwise, you can use a little more "flexible" solution like this:
var str = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
var invoices = str.Trim().Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);
var totalSum = 0M;
foreach (var invoice in invoices)
{
var invoiceParts = invoice.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
var invoiceAmount = decimal.Parse(invoiceParts.First(ip => ip.Trim().ToLower().StartsWith("a=")).Substring(2));
totalSum += invoiceAmount;
}
Import the input: "Deserialisation"
Given the following input, we have a list of objects with properties named I, A, and D.
var input = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
Given this simple class:
public class inputClass
{
public decimal I { get; set; }
public decimal A { get; set; }
public decimal D { get; set; }
}
Parsing it will look like:
var inputItems =
input.Split('|')
.Select(
x =>
x.Split(';')
.ToDictionary(
y => y.Split('=')[0],
y => y.Split('=')[1]
)
)
.Select(
x => // Manual parsing from dictionary to inputClass.
// If the dictionary keys match the object's properties, we could use something more generic.
new inputClass
{
I = decimal.Parse(x["I"], CultureInfo.InvariantCulture.NumberFormat),
A = decimal.Parse(x["A"], CultureInfo.InvariantCulture.NumberFormat),
D = decimal.Parse(x["D"], CultureInfo.InvariantCulture.NumberFormat),
}
)
.ToList();
Does it look complex? Let's give inputClass the responsibility of initialising itself from a string of the form
PropertyName=Value[; PropertyName=Value]:
public inputClass(string input, NumberFormatInfo numberFormat)
{
var dict = input
.Split(';')
.ToDictionary(
y => y.Split('=')[0],
y => y.Split('=')[1]
);
I = decimal.Parse(dict["I"], numberFormat);
A = decimal.Parse(dict["A"], numberFormat);
D = decimal.Parse(dict["D"], numberFormat);
}
Then the parsing is simple:
var inputItems = input.Split('|').Select(x => new inputClass(x, CultureInfo.InvariantCulture.NumberFormat));
Once we have a more usable structure, a list of objects, we can easily compute Sum, Avg, Max, Min:
var sumA = inputItems.Sum(x => x.A);
Producing the output: "Serialisation"
To produce the output, we will define an object similar to the input:
public class outputClass
{
public decimal I { get; set; }
public decimal A { get; set; }
public decimal D { get; set; }
public decimal F { get; set; }
The class should be able to produce the string PropertyName=Value[; PropertyName=Value]:
public override string ToString()
{
return $"I={I};A={A};D={D};F={F}";
}
Then produce the string "serialisation" after computing the output list from the input list:
// Process the input into the output.
var outputItems = new List<outputClass>();
foreach (var item in inputItems)
{
// compute things to be able to create the next output item
item.A++;
outputItems.Add(
new outputClass { A = item.A, D = item.D, I = item.I, F = 42 }
);
}
// "Serialisation"
var outputString = String.Join("|", outputItems);
Online Demo. https://dotnetfiddle.net/VcEQmf
Long story short:
Define a class with the properties you will use/display.
Add a constructor that takes a string like "I=5212;A=97920;D=20181121".
NB: the string may contain properties that will not be mapped to the object.
Override ToString() so the object can easily produce its own serialisation.
NB: properties and values that are not stored in the object will not appear in the serialisation result.
Now you simply have to split on your line/object separator "|", and you are ready to work with real objects without having to care about that weird string anymore.
PS:
There was a small misunderstanding about your two types of input: I mentally saw them as input and output. Don't mind those names. It can be the same class; it doesn't change anything in this answer.
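The question describes D as a date in YYYYMMDD format, whereas the classes above store it as a decimal. As a hedged sketch of a variation, the same split-into-dictionary approach can sum the A values and parse D with DateTime.ParseExact. The names InvoiceDemo and Summarize are hypothetical, and the code assumes every invoice carries both an A and a D entry.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class InvoiceDemo
{
    // Hypothetical helper: sums every A= value and returns the latest D= date.
    public static (decimal Total, DateTime Latest) Summarize(string column)
    {
        var invoices = column
            .Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(inv => inv.Split(';')
                .Select(p => p.Split(new[] { '=' }, 2))   // key and value
                .Where(kv => kv.Length == 2)
                .ToDictionary(kv => kv[0].Trim(), kv => kv[1].Trim()))
            .ToList();

        // Invariant culture keeps "." as the decimal separator on any machine.
        var total = invoices.Sum(d => decimal.Parse(d["A"], CultureInfo.InvariantCulture));
        var latest = invoices.Max(d =>
            DateTime.ParseExact(d["D"], "yyyyMMdd", CultureInfo.InvariantCulture));
        return (total, latest);
    }

    public static void Main()
    {
        var input = "I=5212;A=97920;D=20181121|I=5176;A=77360;D=20181117|I=5087;A=43975;D=20181109";
        var (total, latest) = Summarize(input);
        Console.WriteLine($"{total} {latest:yyyy-MM-dd}");
    }
}
```

Extra keys such as F are simply carried in the dictionary and ignored, so both input shapes from the question go through the same code path.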

Split a string into three separate parts

I have a URL string coming into an API e.g. c1:1=25.
*http://mysite/api/controllername?serial=123&c1:=25*
I want to split it into the channel name (c1), the channel reading number (1) after the colon and the value (25).
There are also occasions, where there is no colon as it is a fixed value such as a serial number (serial=123).
I have created a class:
public class UriDataModel
{
public string ChannelName { get; set; }
public string ChannelNumber { get; set; }
public string ChannelValue { get; set; }
}
I am trying to use an IEnumerable with some LINQ and not getting very far.
var querystring = HttpContext.Current.Request.Url.Query;
querystring = querystring.Substring(1);
var urldata = new UrlDataList
{
UrlData = querystring.Split('&').ToList()
};
IEnumerable<UriDataModel> uriData =
from x in urldata.UrlData
let channelname = x.Split(':')
from y in urldata.UrlData
let channelreading = y.Split(':', '=')
from z in urldata.UrlData
let channelvalue = z.Split('=')
select new UriDataModel()
{
ChannelName = channelname[0],
ChannelNumber = channelreading[1],
ChannelValue = channelvalue[2]
};
List<UriDataModel> udm = uriData.ToList();
I feel as if I am overcomplicating things here.
In summary, I want to split the string into three parts and where there is no colon split it into two.
Any pointers will be great. TIA
You can use regex. I think you switched the channel number and the colon in your example, so my code reflects this assumption.
public static (string channelName, string channelNumber, string channelValue) ParseUrlData(string urlData)
{
var regex = new Regex(@"serial=(\d+)(&c(:\d+)?=(\d+))?");
var matches = regex.Match(urlData);
string name = null;
string number = null;
string value = null;
if (matches.Success)
{
name = matches.Groups[1].Value;
if (matches.Groups.Count == 5) number = matches.Groups[3].Value.TrimStart(':');
if (matches.Groups.Count >= 4) value = matches.Groups[matches.Groups.Count - 1].Value;
}
Console.WriteLine($"[{name}] [{number}] [{value}]");
return (name, number, value);
}
Then you can call it like this
(var channelName, var channelNumber, var channelValue) = ParseUrlData("serial=123&c:1=25");
(var channelName, var channelNumber, var channelValue) = ParseUrlData("serial=123&c=25");
(var channelName, var channelNumber, var channelValue) = ParseUrlData("serial=123");
and it'll return (and print)
[123] [1] [25]
[123] [] [25]
[123] [] []
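The regex above is tied to the serial and c parameters. As a sketch of a more general variant, assuming every parameter has the shape name[:number]=value, each pair can be split on & and matched individually. QueryParser and its tuple element names are made up for illustration; the tuples could just as well be mapped onto the UriDataModel class from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QueryParser
{
    // name, then an optional ":number", then "=value".
    private static readonly Regex Pair =
        new Regex(@"^(?<name>[^:=]+)(?::(?<number>[^=]*))?=(?<value>.*)$");

    public static List<(string Name, string Number, string Value)> Parse(string query)
    {
        return query.TrimStart('?')
            .Split('&')
            .Select(p => Pair.Match(p))
            .Where(m => m.Success)
            .Select(m => (m.Groups["name"].Value,
                          // Number stays null when the parameter has no colon.
                          m.Groups["number"].Success ? m.Groups["number"].Value : null,
                          m.Groups["value"].Value))
            .ToList();
    }

    public static void Main()
    {
        foreach (var (name, number, value) in Parse("?serial=123&c1:1=25"))
            Console.WriteLine($"[{name}] [{number}] [{value}]");
    }
}
```

Values are not URL-decoded here; if the query string can contain escaped characters, decoding each part first would be a sensible addition.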

How to get capture groups in a regular expression match used for replace

I have the following program that demonstrates replacing matches found in a regular expression search:
using System;
public class Test {
public static void Main() {
var regexSearch = @"\{(\w+)\}";
var format = "{Level}:{Name}:{Message}";
var regex = new System.Text.RegularExpressions.Regex(regexSearch);
var result = regex.Replace(format, Test.Replace);
Console.WriteLine($"result = {result}");
}
public static string Replace(System.Text.RegularExpressions.Match match) {
Console.WriteLine($"match = {match}");
return "<replacement>";
}
}
This prints the following to standard out:
match = {Level}
match = {Name}
match = {Message}
result = <replacement>:<replacement>:<replacement>
How would I get the code to print the following instead if only the Replace method may be changed?
match = Level
match = Name
match = Message
result = Level:Name:Message
I am aware of Match.Groups and Match.Captures but keep finding strings that include the curly braces.
The following example is an even better illustration of my true goal:
using System;
public class Test {
public static void Main() {
var regexSearch = @"\{(\w+)\}";
var format = "{Level}:{Name}:{Message}";
var regex = new System.Text.RegularExpressions.Regex(regexSearch);
var record = new Information(Importance.Normal, "John Doe", "Hello, world!");
var result = regex.Replace(format, x => Test.Replace(x, record));
Console.WriteLine($"result = {result}");
}
public static string Replace(System.Text.RegularExpressions.Match match, Information record) {
Console.WriteLine($"match = {match}");
var name = "Level";
var property = record.GetType().GetProperty(name);
if (property == null) {
throw new InvalidOperationException($"{name} is not available");
}
var value = property.GetValue(record);
if (value is DateTime) {
return ((DateTime)value).ToString("yyyy-MM-ddTHH:mm:ss");
}
return value.ToString();
}
}
public class Information {
public Importance Level { get; }
public string Name { get; }
public string Message { get; }
public DateTime Created { get; }
public Information(Importance level, string name, string message) {
this.Level = level;
this.Name = name;
this.Message = message;
this.Created = DateTime.Now;
}
}
public enum Importance {
Low,
Normal,
Hight
}
The program works almost exactly as expected but writes this to standard output:
match = {Level}
match = {Name}
match = {Message}
result = Normal:Normal:Normal
Line 15 of the program says var name = "Level"; and needs to get the name in the capture group of the match. The output should say this instead:
match = {Level}
match = {Name}
match = {Message}
result = Normal:John Doe:Hello, world!
Does anyone know how to get the contents of the regular expression capture group so line 15 can be replaced with the result?
Your problem is that the code within your Replace method only ever looks up the name "Level".
If you run in debug mode, put a breakpoint in your Replace method, and step through it with F11, you'll see that on every call of the method the property is always "Level".
You could do several things to solve this, such as including a counter that increases by one every time the method is called, followed by a switch statement to determine what name should be equal to, among other things.
The problem that you are having is that your expectation of Groups is incorrect. It contains not only each capture group but also the entire match, at index 0. Its length is therefore one more than the number of capture groups, and the first capture group is at index 1.
In your 1st example, change line 14 so that it reads:
return match.Groups[1].Value;
In your 2nd example, change line 15 so that it reads:
var name = match.Groups[1].Value;
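A small self-contained sketch of the difference between Groups[0] and Groups[1], using the same pattern as the question (GroupDemo and StripBraces are illustrative names):

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    public static string StripBraces(string format)
    {
        var regex = new Regex(@"\{(\w+)\}");
        // Groups[0] is the whole match ("{Level}");
        // Groups[1] is the first capture group ("Level").
        return regex.Replace(format, m => m.Groups[1].Value);
    }

    public static void Main()
    {
        Console.WriteLine(StripBraces("{Level}:{Name}:{Message}"));
    }
}
```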

order by culture is not working as expected

Why does "Ū" come first instead of "U"?
CultureInfo ci = CultureInfo.GetCultureInfo("lt-LT");
bool ignoreCase = true; // whether the comparison should ignore case
StringComparer comp = StringComparer.Create(ci, ignoreCase);
string[] unordered = { "Za", "Žb", "Ūa", "Ub" };
var ordered = unordered.OrderBy(s => s, comp);
Results of ordered:
Ūa
Ub
Za
Žb
Should be: Ub Ūa Za Žb
Here are the Lithuanian letters in order: https://www.assorti.lt/userfiles/uploader/no/norvegiska-lietuviska-delione-abecele-maxi-3-7-m-vaikams-larsen.jpg
I just made what could be a (limited) solution to your problem.
This is not optimized, but it can give an idea of how to solve it.
I created a LithuanianString class which is only used to encapsulate your string.
This class implements IComparable so that a list of LithuanianString can be sorted.
Here is what the class could look like:
public class LithuanianString : IComparable<LithuanianString>
{
const string UpperAlphabet = "AĄBCČDEĘĖFGHIĮYJKLMNOPRSŠTUŲŪVZŽ";
const string LowerAlphabet = "aąbcčdeęėfghiįyjklmnoprsštuųūvzž";
public string String;
public LithuanianString(string inputString)
{
this.String = inputString;
}
public int CompareTo(LithuanianString other)
{
var maxIndex = this.String.Length <= other.String.Length ? this.String.Length : other.String.Length;
for (var i = 0; i < maxIndex; i++)
{
// Make the comparison case-insensitive
var indexOfThis = LowerAlphabet.Contains(this.String[i])
? LowerAlphabet.IndexOf(this.String[i])
: UpperAlphabet.IndexOf(this.String[i]);
var indexOfOther = LowerAlphabet.Contains(other.String[i])
? LowerAlphabet.IndexOf(other.String[i])
: UpperAlphabet.IndexOf(other.String[i]);
if (indexOfOther != indexOfThis)
return indexOfThis - indexOfOther;
}
return this.String.Length - other.String.Length;
}
}
And here is a sample I made to try it:
static void Main(string[] args)
{
string[] unordered = { "Za", "Žb", "Ūa", "Ub" };
//Create a list of lithuanian string from your array
var lithuanianStringList = (from unorderedString in unordered
select new LithuanianString(unorderedString)).ToList();
//Sort it
lithuanianStringList.Sort();
//Display it
Console.WriteLine(Environment.NewLine + "My Comparison");
lithuanianStringList.ForEach(c => Console.WriteLine(c.String));
}
The output is the expected one:
Ub Ūa Za Žb
This class makes it possible to define custom alphabets just by replacing the characters in the two constants defined at the beginning.
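The same idea can also be packaged as an IComparer<string>, which plugs straight into the OrderBy call from the question without wrapping every string. This is only a sketch: AlphabetComparer is a made-up name, and placing unknown characters after the alphabet is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class AlphabetComparer : IComparer<string>
{
    private readonly Dictionary<char, int> rank = new Dictionary<char, int>();

    public AlphabetComparer(string lowerAlphabet)
    {
        for (var i = 0; i < lowerAlphabet.Length; i++)
            rank[lowerAlphabet[i]] = i;
    }

    // Case-insensitive rank; characters outside the alphabet sort after it.
    private int Rank(char c)
    {
        return rank.TryGetValue(char.ToLowerInvariant(c), out var r) ? r : rank.Count + c;
    }

    public int Compare(string x, string y)
    {
        if (ReferenceEquals(x, y)) return 0;
        if (x == null) return -1;
        if (y == null) return 1;
        var n = Math.Min(x.Length, y.Length);
        for (var i = 0; i < n; i++)
        {
            var d = Rank(x[i]) - Rank(y[i]);
            if (d != 0) return d;
        }
        return x.Length - y.Length;   // the shorter string sorts first
    }
}

public static class SortDemo
{
    public static void Main()
    {
        var comparer = new AlphabetComparer("aąbcčdeęėfghiįyjklmnoprsštuųūvzž");
        string[] unordered = { "Za", "Žb", "Ūa", "Ub" };
        Console.WriteLine(string.Join(" ", unordered.OrderBy(s => s, comparer)));
    }
}
```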
