C# Regex match case - split string and write to file output

Basically I have a text file of records in this format:
(1909, 'Ford', 'Model T'),
(1926, 'Chrysler', 'Imperial'),
(1948, 'Citroën', '2CV'),
That I want to output to a text file in the following format
new Vehicle() { Id = 1, Year = 1909, Make = "Ford", Model = "Model T" },
new Vehicle() { Id = 2, Year = 1926, Make = "Chrysler", Model = "Imperial" },
new Vehicle() { Id = 3, Year = 1948, Make = "Citroën", Model = "2CV" },
I know I need to split each line into the relevant text sections, e.g. by following something like this SO question, but I have hit a mental block on how to get the matching string sections for Year, Make and Model.
So far I have found this, which finds everything between the parentheses:
\(([^()]+)\)
But I am not sure how to then group the values and split them by the commas.
Any help greatly appreciated.

Regex to get them in groups:
\((\d+),\s+[']([\w\së]+)['],\s+[']([\w\s]+)[']\)[,]*
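A note about Citroën: in .NET, \w matches Unicode letters by default, so ë is already covered and listing it in the character class is redundant but harmless. Only with RegexOptions.ECMAScript, where \w is limited to [a-zA-Z_0-9], would you have to list such characters (ë, ü, ÿ, etc.) yourself.
To use it in code, first get the groups: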
string cars = @"(1909, 'Ford', 'Model T'),";
string pattern = @"\((\d+),\s+[']([\w\së]+)['],\s+[']([\w\s]+)[']\)[,]*";
var lResult = Regex.Match(cars, pattern);
if (lResult.Success)
    foreach (Group iGroup in lResult.Groups)
        Console.WriteLine(iGroup.Value);
lResult.Groups now holds the car's details; you only have to write them to the file in the format you need.
C# 6.0:
Console.WriteLine($"new Vehicle() {{ Id = 1, Year = {lResult.Groups[1]}, Make = \"{lResult.Groups[2]}\", Model = \"{lResult.Groups[3]}\" }},");
Old syntax:
Console.WriteLine("new Vehicle() { Id = 1, Year = " + lResult.Groups[1] + ", Make = \"" + lResult.Groups[2] + "\", Model = \"" + lResult.Groups[3] + "\" },");
Once you wrap this in a loop over the lines, you can add an incrementing Id easily.
Groups[0] holds the whole match, which is why my indexing runs from 1 to 3.
As @Toto said, \w already includes \d, so there is no need to write \d separately.
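The loop mentioned above could look roughly like the following. This is only a sketch under a few assumptions: the class and method names (VehicleConverter, ToInitializers) and the output file name are made up for illustration, the pattern is a simplified variant that accepts anything between the single quotes, and lines that do not match are silently skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class VehicleConverter
{
    // Captures year, make and model from a line such as (1909, 'Ford', 'Model T'),
    private static readonly Regex Record =
        new Regex(@"\((\d+),\s*'([^']*)',\s*'([^']*)'\)");

    // Turns each matching line into one initializer line; Id counts matched lines from 1.
    public static List<string> ToInitializers(IEnumerable<string> lines)
    {
        var output = new List<string>();
        int id = 1;
        foreach (string line in lines)
        {
            Match m = Record.Match(line);
            if (!m.Success) continue; // skip blank or malformed lines
            output.Add($"new Vehicle() {{ Id = {id++}, Year = {m.Groups[1].Value}, " +
                       $"Make = \"{m.Groups[2].Value}\", Model = \"{m.Groups[3].Value}\" }},");
        }
        return output;
    }

    public static void Main()
    {
        // Sample data; for a real file use File.ReadLines(inputPath) instead.
        var lines = new[]
        {
            "(1909, 'Ford', 'Model T'),",
            "(1926, 'Chrysler', 'Imperial'),",
            "(1948, 'Citroën', '2CV'),"
        };
        // Hypothetical output file name, written to the current directory.
        File.WriteAllLines("vehicles_out.txt", ToInitializers(lines));
    }
}
```

Keeping the conversion in its own method means the same code works whether the lines come from a file or from memory.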

Why not use string.Split(',')? It would be faster than a regex and suits your data (first remove the trailing ',' from each line, of course).
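A minimal sketch of that Split-based idea, assuming no field ever contains a comma (a make like 'Smith, Jones' would break it). The names SplitParser and ParseRecord are invented for this example.

```csharp
using System;

public static class SplitParser
{
    // Parses "(1909, 'Ford', 'Model T')," into its three fields.
    // Assumes no field contains a comma.
    public static string[] ParseRecord(string line)
    {
        // Drop the trailing comma, then the surrounding parentheses.
        string inner = line.Trim().TrimEnd(',').Trim('(', ')');
        string[] parts = inner.Split(',');
        for (int i = 0; i < parts.Length; i++)
            parts[i] = parts[i].Trim().Trim('\''); // remove spaces, then quotes
        return parts;
    }

    public static void Main()
    {
        string[] f = ParseRecord("(1909, 'Ford', 'Model T'),");
        Console.WriteLine($"Year={f[0]}, Make={f[1]}, Model={f[2]}");
        // prints: Year=1909, Make=Ford, Model=Model T
    }
}
```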

If you are willing to use a parser framework (which is maybe a bit of overkill), you could use Sprache, for example. Here is an example without proper error handling:
Parser<string> stringContent =
from open in Parse.Char('\'').Once()
from content in Parse.CharExcept('\'').Many().Text()
from close in Parse.Char('\'').Once()
select content;
Parser<string> numberContent = Parse.Digit.AtLeastOnce().Text();
Parser<string> element = stringContent.XOr(numberContent);
Parser<List<string>> elements =
from e in element.DelimitedBy(Parse.Char(',').Token())
select e.ToList();
Parser<List<string>> parser =
from open in Parse.Char('(').Once()
from content in elements
from close in Parse.Char(')').Once()
select content;
var input = new List<string> { "(1909, 'Ford', 'Model T')", "(1926, 'Chrysler', 'Imperial')", "(1948, 'Citroën', '2CV')" };
foreach (var line in input)
{
var parsed = parser.Parse(line);
var year = Int32.Parse(parsed[0]);
var make = parsed[1];
var model = parsed[2];
Console.WriteLine(">> " + year + " " + make + " " + model);
}

You can use this snippet based on named capture groups:
var cars = new List<string>() {
"(1909, 'Ford', 'Model T')",
"(1926, 'Chrysler', 'Imperial')",
"(1948, 'Citroën', '2CV')",
};
var regex = @"(?<Year>\d+).*?'(?<Brand>.*?)'.*?'(?<Model>.*?)'";
foreach (var car in cars)
{
var match = Regex.Match(car, regex);
if (match.Success)
{
Console.WriteLine($"{match.Groups["Brand"]} make {match.Groups["Model"]} in {match.Groups["Year"]}");
}
}
Which will print:
Ford make Model T in 1909
Chrysler make Imperial in 1926
Citroën make 2CV in 1948

Related

Check if a string starts with a possible number of strings in C#?

How can I most efficiently check to see if an input string starts with a string that belongs in a list of strings?
For example possiblePrefixes = "1234", "1235", "1236". If input = "1235av2425" should return true. If input = "1237352ko" should return false.
You can use Any for this. The idea is to check whether any item in the list is a prefix of the given string.
List<string> list = new List<string>() { "1234", "1235", "1236" };
string input = "1237352ko";
var exists = list.Any(x => input.StartsWith(x)); // returns false
With string input = "1235av2425"; it returns true.
An efficient data structure for this type of search would be a prefix tree (aka "trie").
For your example data such a tree might look something like this:
123
|-4
|-5
|-6
This could allow a lookup time that is independent of the number of prefixes you want to check against.
But as far as I know there are no builtin types for this, so you would either need to find a library, or implement it yourself.
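If you go the implement-it-yourself route, a small trie could look like the sketch below. It is a hand-rolled illustration, not a library API: the class name PrefixTrie and the method StartsWithAny are invented here, and no attention is paid to memory use or case-insensitive matching.

```csharp
using System;
using System.Collections.Generic;

// Minimal prefix tree: answers "does the input start with any stored prefix?"
public sealed class PrefixTrie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsEnd; // true if a stored prefix ends at this node
    }

    private readonly Node _root = new Node();

    public PrefixTrie(IEnumerable<string> prefixes)
    {
        foreach (string prefix in prefixes)
        {
            Node node = _root;
            foreach (char c in prefix)
            {
                if (!node.Children.TryGetValue(c, out Node next))
                {
                    next = new Node();
                    node.Children[c] = next;
                }
                node = next;
            }
            node.IsEnd = true;
        }
    }

    // Walks the input one character at a time. The work is bounded by the
    // length of the longest prefix, not by how many prefixes are stored.
    public bool StartsWithAny(string input)
    {
        Node node = _root;
        if (node.IsEnd) return true; // an empty prefix was stored
        foreach (char c in input)
        {
            if (!node.Children.TryGetValue(c, out node)) return false;
            if (node.IsEnd) return true;
        }
        return false;
    }

    public static void Main()
    {
        var trie = new PrefixTrie(new[] { "1234", "1235", "1236" });
        Console.WriteLine(trie.StartsWithAny("1235av2425")); // True
        Console.WriteLine(trie.StartsWithAny("1237352ko"));  // False
    }
}
```

Build the trie once and reuse it; the construction cost only pays off when the same prefix set is checked many times.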
The solution using Any and StartsWith will be the best in most cases. Looking for an optimized solution will only be necessary if you have a long list of possible prefixes and/or a long list of texts to check against the same prefixes.
In that case, using a pre-compiled regular expression built once from the list of possible prefixes and then re-used for multiple checks might be a little faster.
// Build regular expression once
string[] possiblePrefixes = new string[] { "1234", "1235", "1236" };
var escaped = possiblePrefixes.Select(p => Regex.Escape(p));
string pattern = "^(" + string.Join("|", escaped) + ").*";
Regex regEx = new Regex(pattern, RegexOptions.Compiled);
// Now use it multiple times
string input = "1235av2425";
bool result = regEx.IsMatch(input);
Here are two solutions.
Solution #1 (using a lambda expression)
List<string> possiblePrefixes = new List<string>() { "1234", "1235", "1236" };
string input = "1235av2425";
var result = possiblePrefixes.Any(x => input.StartsWith(x));
Console.WriteLine(result); //returns True
Solution #2 (using LINQ query syntax)
List<string> possiblePrefixes = new List<string>() { "1234", "1235", "1236" };
string input = "1235av2425";
var result = (from val in possiblePrefixes
where input.StartsWith(val)
select val).Any();
Console.WriteLine(result); //returns True

Interpolated strings stored in a variable [duplicate]

Can one store the template of a string in a variable and use interpolation on it?
var name = "Joe";
var template = "Hi {name}";
I then want to do something like:
var result = $template;
The reason is my templates will come from a database.
I guess that these strings will always have the same number of parameters, even if the wording changes. For example, today the template is "Hi {name}", and tomorrow it could be "Hello {name}".
Short answer: No, you cannot do what you have proposed.
Alternative 1: use the string.Format method.
You can store in your database something like this:
"Hi {0}"
Then, when you retrieve the string template from the db, you can write:
var template = "Hi {0}"; //retrieved from db
var name = "Joe";
var result = string.Format(template, name);
//now result is "Hi Joe"
With 2 parameters:
var name2a = "Mike";
var name2b = "John";
var template2 = "Hi {0} and {1}!"; //retrieved from db
var result2 = string.Format(template2, name2a, name2b);
//now result2 is "Hi Mike and John!"
Alternative 2: use a placeholder.
You can store in your database something like this:
"Hi {name}"
Then, when you retrieve the string template from the db, you can write:
var template = "Hi {name}"; //retrieved from db
var name = "Joe";
var result = template.Replace("{name}", name);
//now result is "Hi Joe"
With 2 parameters:
var name2a = "Mike";
var name2b = "John";
var template2 = "Hi {name2a} and {name2b}!"; //retrieved from db
var result2 = template2
.Replace("{name2a}", name2a)
.Replace("{name2b}", name2b);
//now result2 is "Hi Mike and John!"
Pay attention to which token you choose for your placeholders. Here I used surrounding curly brackets {}. You should pick something that is unlikely to collide with the rest of your text, and that depends entirely on your context.
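One way to lower the collision risk is to fill every placeholder in a single pass, so that a substituted value which happens to contain {something} is never replaced a second time. The sketch below assumes placeholders have the form {identifier}; the names TemplateFiller and Fill are made up for illustration, and unknown placeholders are left untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TemplateFiller
{
    // Matches {identifier} placeholders.
    private static readonly Regex Placeholder = new Regex(@"\{(\w+)\}");

    // Replaces every known placeholder in one pass; unknown ones stay as they are.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Placeholder.Replace(template, m =>
            values.TryGetValue(m.Groups[1].Value, out string v) ? v : m.Value);
    }

    public static void Main()
    {
        var values = new Dictionary<string, string>
        {
            { "name2a", "Mike" },
            { "name2b", "John" }
        };
        Console.WriteLine(Fill("Hi {name2a} and {name2b}!", values)); // Hi Mike and John!
    }
}
```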
This can be done as requested using dynamic compilation, such as through the Microsoft.CodeAnalysis.CSharp.Scripting package. For example:
var name = "Joe";
var template = "Hi {name}";
var result = await CSharpScript.EvaluateAsync<string>(
"var name = \"" + name + "\"; " +
"return $\"" + template + "\";");
Note that this approach is slow, and you'd need to add more logic to handle escaping of quotes (and injection attacks) within strings, but the above serves as a proof-of-concept.
No, you can't do that: an interpolated string is expanded by the compiler, so the name variable must be in scope at the point where the $"..." literal appears in the source code. Consider using String.Format or String.Replace instead.
I just had the same need in my app so will share my solution using String.Replace(). If you're able to use LINQ then you can use the Aggregate method (which is a reducing function, if you're familiar with functional programming) combined with a Dictionary that provides the substitutions you want.
string template = "Hi, {name} {surname}";
Dictionary<string, string> substitutions = new Dictionary<string, string>() {
{ "name", "Joe" },
{ "surname", "Bloggs" },
};
string result = substitutions.Aggregate(template, (args, pair) =>
args.Replace($"{{{pair.Key}}}", pair.Value)
);
// result == "Hi, Joe Bloggs"
This works by starting with the template and then iterating over each item in the substitution dictionary, replacing the occurrences of each one. The result of one Replace() call is fed into the input to the next, until all substitutions are performed.
The {{{pair.Key}}} bit is just to escape the { and } used to find a placeholder.
This is pretty old now, but as I've just come across it it's new to me!
It's a bit overkill for what you need, but I have used Handlebars.NET for this sort of thing.
You can create quite complex templates and merge in hierarchical data structures for the context. There are rules for looping and conditional sections, partial template composition and even helper function extension points. It also handles many data types gracefully.
There's way too much to go into here, but a short example to illustrate...
var source = @"Hello {{Guest.FirstName}}{{#if Guest.Surname}} {{Guest.Surname}}{{/if}}!";
var template = Handlebars.Compile(source);
var rec = new {
Guest = new { FirstName = "Bob", Surname = (string)null } // cast needed: a bare null cannot be assigned to an anonymous type property
};
var resultString = template(rec);
In this case the surname will only be included in the output if the value is not null or empty.
Now admittedly this is more complicated for users than simple string interpolation, but remember that you can still just use {{fieldName}} if you want to, just that you can do lots more as well.
This particular NuGet package is a port of HandlebarsJs, so it has a high degree of compatibility. HandlebarsJs is itself based on Mustache. There are direct .NET ports of Mustache, but IMHO Handlebars.NET is the business.

How to search based on field in azure search?

I am using Azure Search, and I have a console app with the code below, which works fine.
DocumentSearchResult<Hotel> results;
Console.WriteLine("Search started\n");
results = indexClient.Documents.Search<Hotel>("smart", new SearchParameters { Top=5 });
WriteDocuments(results);
Currently it searches for text containing the word "smart". That is straightforward, but my index has several fields and I want to search based on a specific field.
For example, say I have two fields:
1) Title
2) SoldDate
I have to write code that finds items whose title is 'john' and whose sold date is earlier than the current date.
What should I do to achieve this?
You can achieve what you want with search and a filter:
// Approach #1
string currentDate = DateTime.UtcNow.ToString("O");
var parameters = new SearchParameters()
{
Filter = "soldDate lt " + currentDate,
Top = 5
};
results = indexClient.Documents.Search<Hotel>("john", parameters);
This will filter the documents to only those with a soldDate before currentDate, and then searches the filtered documents such that documents match if any of the searchable fields contain "john". You can narrow this down to just the title field like this:
// Approach #2
string currentDate = DateTime.UtcNow.ToString("O");
var parameters = new SearchParameters()
{
Filter = "soldDate lt " + currentDate,
SearchFields = new[] { "title" },
Top = 5
};
results = indexClient.Documents.Search<Hotel>("john", parameters);
Or like this:
// Approach #3
string currentDate = DateTime.UtcNow.ToString("O");
var parameters = new SearchParameters()
{
Filter = "soldDate lt " + currentDate,
QueryType = QueryType.Full,
Top = 5
};
results = indexClient.Documents.Search<Hotel>("title:john", parameters);
Which way you use depends on whether you want all search terms to be limited to a specific set of fields (Approach #2), or if you want specific terms to match specific fields (Approach #3).
The reference for SearchParameters is on learn.microsoft.com.

C# Format string in a way I can get different values from it

What is the best way to format the below string in a way so that I can separate out and find the value of PractitionerId, PhysicianNPI, PhysicianName etc.
"PractitionerId:4343343434 , PhysicianNPI: 43434343434, PhysicianName:
John, Doe, PhysicianPhone:2222222222 , PhysicianFax:3333333333 "
So finally I want something like this:
var practitionerId = "4343343434 ";
var physNPI = "43434343434";
var phyName = "John, Doe";
I was thinking of splitting with the names and finding the values assigned to each field but I am not sure if that is the best solution to it.
You could probably generalise this with a regular expression, then use it to build a dictionary/lookup of the terms.
So:
var input= "PractitionerId:4343343434 , PhysicianNPI: 43434343434,"
+ " PhysicianName: John, Doe, PhysicianPhone:2222222222 ,"
+ " PhysicianFax:3333333333";
var pattern = @"(?<=(?<n>\w+)\:)\s*(?<v>.*?)\s*((,\s*\w+\:)|$)";
var dic = Regex
.Matches(input, pattern)
.Cast<Match>()
.ToDictionary(m => m.Groups["n"].Value,
m => m.Groups["v"].Value);
So now you can:
var practitionerId = dic["PractitionerId"];
or
var physicianName = dic["PhysicianName"];
You could get the exact information, doing something like:
var str = "PractitionerId:4343343434 , PhysicianNPI: 43434343434, PhysicianName: John, Doe, PhysicianPhone:2222222222 , PhysicianFax:3333333333 ";
var newStr = str.Split(',');
var practitionerID = newStr[0].Split(':')[1].Trim(); // "4343343434"
var physicianNPI = newStr[1].Split(':')[1].Trim(); // "43434343434"
var phyName = newStr[2].Split(':')[1].Trim() + "," + newStr[3]; // "John, Doe"
There are cleaner solutions using Regex patterns though.
Also, you need to parse the resulting values into whatever data types you want. Everything here is treated as a string.
Since you separate the fields with ",", splitting on it is a starting point. Note that each element still carries its label (e.g. "PractitionerId:4343343434 "), and the name is spread over two elements:
string[] information = yourWholeString.Split(',');
string practitionerId = information[0];
string physNPI = information[1];
string phyName = information[2] + "," + information[3];

Querying a list of strings with a query string?

I have a dictionary:
<string,List<string>>
The key is the product code say "product1" then the list is a list of properties:
"Brand","10.40","64","red","S"
Then I 'can' have a list of rules/filters e.g.
var tmpFilter = new customfilters();
tmpFilter.Field = "2";
tmpFilter.Expression = ">";
tmpFilter.Filter = "10";
So for the above example this would pass because at index 2 (tmpFilter.Field) it is more than 10; then I have another object which defines which fields within the list I want to write to file. For that dictionary item I just want to write the product brand and price where the filters match.
At the moment without the filter I have:
var tmp = new custom();
tmp.Columns = "0,1";
tmp.Delimiter = ",";
tmp.Extention = ".csv";
tmp.CustomFilters = new List<customfilters>() {new customfilters(){ Field = "2", Expression = ">", Filter = "10"} };
public static void Custom(custom custom)
{
foreach (var x in Settings.Prods)
{
//Get Current Product Code
var curprod = Settings.ProductInformation[x];// the dictionary value
foreach (var column in custom.Columns.Split(',')) // Columns is a string like "0,1"; iterating it directly would yield single chars
{
var curVal = curprod[Convert.ToInt32(column)];
tsw.Write(curVal + custom.Delimiter);
}
Settings.Lines++;
tsw.WriteLine();
}
tsw.Close();
}
I only want to write the curprod if all the filters pass for that list of strings.
How I can do this?
There's a really nice Nuget package based on an example published by Microsoft, that they have decided to make really hard to find for some reason, that allows dynamic linq queries:
https://www.nuget.org/packages/System.Linq.Dynamic/1.0.2
Source:
https://github.com/kahanu/System.Linq.Dynamic
Using that you can do stuff like this very easily (note: I used strings here because the OP states they have a List<string>):
List<string> stuff = new List<string> { "10.40", "64", "5", "56", "99", "2" };
var selected = stuff.Select(s => new { d = double.Parse(s) }).Where("d > 10");
Console.WriteLine(string.Join(", ", selected.Select(s => s.d.ToString()).ToArray()));
Outputs:
10.4, 64, 56, 99
That may give you a place to start. One thing you are going to have to tackle is identifying which of your fields are numeric and should be converted to a numeric type before you apply your filter. Otherwise you are going to be comparing them as strings.
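If you would rather not pull in a dynamic LINQ package, the filters can also be evaluated by hand. The sketch below is an assumption-heavy illustration: CustomFilter is a hypothetical stand-in for the question's customfilters class (same three string properties), only the operators ">", "<" and "=" are supported, and a value is compared numerically when both sides parse as numbers and as ordinal strings otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical stand-in for the question's customfilters class.
public class CustomFilter
{
    public string Field { get; set; }      // index into the property list, e.g. "2"
    public string Expression { get; set; } // ">", "<" or "="
    public string Filter { get; set; }     // value to compare against, e.g. "10"
}

public static class FilterEvaluator
{
    // True only if every filter passes for this product's property list.
    public static bool PassesAll(IList<string> props, IEnumerable<CustomFilter> filters)
    {
        return filters.All(f => Passes(props[int.Parse(f.Field)], f));
    }

    private static bool Passes(string value, CustomFilter f)
    {
        int cmp;
        // Compare numerically when both sides parse as numbers, otherwise as strings.
        if (double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out double a) &&
            double.TryParse(f.Filter, NumberStyles.Float, CultureInfo.InvariantCulture, out double b))
            cmp = a.CompareTo(b);
        else
            cmp = string.CompareOrdinal(value, f.Filter);

        switch (f.Expression)
        {
            case ">": return cmp > 0;
            case "<": return cmp < 0;
            case "=": return cmp == 0;
            default: throw new ArgumentException("Unsupported expression: " + f.Expression);
        }
    }

    public static void Main()
    {
        var props = new List<string> { "Brand", "10.40", "64", "red", "S" };
        var filters = new List<CustomFilter>
        {
            new CustomFilter { Field = "2", Expression = ">", Filter = "10" }
        };
        Console.WriteLine(PassesAll(props, filters)); // True
    }
}
```

In the question's loop, the column-writing code would then sit inside a check along the lines of if (FilterEvaluator.PassesAll(curprod, filters)), with the filters mapped from custom.CustomFilters.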
