Using C# Lambda to split string and search value

Using C# Lambda to split string and search value - c#

I have a string with the following value:
0:12211,90:33221,23:09011
In each pair, the first value (before the : (colon)) is an employee id, the second value after is a payroll id.
So If I want to get the payroll id for employee id 23 right now I have to do:
var arrayValues=mystring.split(',');
and then for each arrayValues do the same:
var employeeData = arrayValue.split(':');
That way I will get the key and the value.
Is there a way to get the Payroll ID by a given employee id using lambda?
If the employeeId is not in the string then by default it should return the payrollid for employeeid 0 zero.

Using a Linq pipeline and anonymous objects:
"0:12211,90:33221,23:09011"
.Split(',')
.Select(x => x.Split(':'))
.Select(x => new { employeeId = x[0], payrollId = x[1] })
.Where(x=> x.employeeId == "23")
Results in this:
{
employeeId = "23",
payrollId = "09011"
}
These three lines represent your data processing and projection logic:
.Split(',')
.Select(x => x.Split(':'))
.Select(x => new { employeeId = x[0], payrollId = x[1] })
Then you can add any filtering logic with Where after this the second Select

You can try something like that
"0:12211,90:33221,23:09011"
.Split(new char[] { ',' })
.Select(c => {
var pair = c.Split(new char[] { ':' });
return new KeyValuePair<string, string>(pair[0], pair[1]);
})
.ToList();
You have to be aware of validations of data

If I were you, I'd use a dictionary. Especially if you're going to do more than one lookup.
Dictionary<int, int> employeeIDToPayrollID = "0:12211,90:33221,23:09011"
.Split(',') //Split on comma into ["0:12211", "90:33221", "23:09011"]
.Select(x => x.Split(':')) //Split each string on colon into [ ["0", "12211"]... ]
.ToDictionary(int.Parse(x => x[0]), int.Parse(x => x[1]))
and now, you just have to write employeeIDtoPayrollID[0] to get 12211 back. Notice that int.Parse will throw an exception if your IDs aren't integers. You can remove those calls if you want to have a Dictionary<string, string>.

You can use string.Split along with string.Substring.
var result =
str.Split(',')
.Where(s => s.Substring(0,s.IndexOf(":",StringComparison.Ordinal)) == "23")
.Select(s => s.Substring(s.IndexOf(":",StringComparison.Ordinal) + 1))
.FirstOrDefault();
if this logic will be used more than once then I'd put it to a method:
public string GetPayrollIdByEmployeeId(string source, string employeeId){
return source.Split(',')
.Where(s => s.Substring(0, s.IndexOf(":", StringComparison.Ordinal)) == employeeId)
.Select(s => s.Substring(s.IndexOf(":", StringComparison.Ordinal) + 1))
.FirstOrDefault();
}

Assuming you have more than three pairs in the string (how long is that string, anyway?) you can convert it to a Dictionary and use that going forward.
First, split on the comma and then on the colon and put in a Dictionary:
var empInfo = src.Split(',').Select(p => p.Split(':')).ToDictionary(pa => pa[0], pa => pa[1]);
Now, you can write a function to lookup payroll IDs from employee IDs:
string LookupPayrollID(Dictionary<string, string> empInfo, string empID) => empInfo.TryGetValue(empID, out var prID) ? prID : empInfo["0"];
And you can call it to get the answer:
var emp23prid = LookupPayrollID(empInfo, "23");
var emp32prid = LookupPayrollID(empInfo, "32");
If you just have three employees in the string, creating a Dictionary is probably overkill and a simpler answer may be appropriate, such as searching the string.

Related

Select list elements contained in another list in linq

I have a string with "|" seperators:
string s = "item1|item2|item3|item4";
a list of objects that each have a name and value:
//object
List<ItemObject> itemList = new List<ItemObject>();
itemList.Add(new ItemObject{Name="item0",Value=0});
itemList.Add(new ItemObject{Name="item1",Value=1});
//class
public class ItemObject(){
public string Name {get;set;}
public int Value {get;set;}
}
How could the following code be done in one line in linq?
var newList = new List<object>();
foreach (var item in s.Split("|"))
{
newList.Add(itemList.FirstOrDefault(x => x.Name == item));
}
// Result: newList
// {Name="item1",Value=1}

I would suggest to start from splitting the string in the beginning. By doing so we won't split it during each iteration:
List<ItemObject> newList = s
.Split("|")
.SelectMany(x => itemList.Where(i => i.Name == x))
.ToList();
Or even better:
List<ItemObject> newList = s
.Split("|") // we can also pass second argument: StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries
.Distinct() // remove possible duplicates, we can also specify comparer f.e. StringComparer.CurrentCulture
.SelectMany(x => itemList
.Where(i => string.Equals(i.Name, x))) // it is better to use string.Equals, we can pass comparison as third argument f.e. StringComparison.CurrentCulture
.ToList();

Try this:
var newList = itemList.Where(item => s.Split('|').Contains(item.Name));
The proposed solution also prevents from populating newList with nulls from nonpresent items. You may also consider a more strict string equality check.

string s = "item1|item2|item3|item4";
I don't see a need for splitting this string s. So you could simply do
var newList = itemList.Where(i => s.Contains(i.Name));
For different buggy input you can also do
s = "|" + s + "|";
var newList = itemList.Where(o => s.Contains("|" + o.Name + '|')).ToList();

List<object> newList = itemList.Where(item => s.Split("|").Contains(item.Name)).ToList<object>();

How to concat string in LinQ group

var list = Table
.GroupBy(t => t.GroupId, (key, g) => new {key, g})
.Select(t => new Transaction
{
Date = t.g.First().DateCreate,
Reference = $"{t.g.First().AccounttName} {t.g.Select(z => z.DocumentNo)}",
TotalAmount = t.g.Sum(x => x.y.Amount.Value),
})
When grouping with linQ I know how to get a single value with First(), sum with Sum() but what should I do to concact a string value?
In my example how can I merge all my DocumentNo?

Use string.Join:
Reference = $"{t.g.First().AccounttName} {string.Join(",",t.g.Select(z => z.DocumentNo))}"

Finding the most specific matching item

User input will be like 'BY1 2PX', which will split and stored into list like below
var items = new List<string> {'BY1 2PX', 'BY12', 'BY1', 'BY'};
I have source list of Products
public class Product
{
public string Name {get;set;}
public string Id {get;set;}
}
Below is a sample product list. There is no guarentee on ordering, it could be in any order.
var products = new List<Product>{
new Product("1", "BY1 2PX"),
new Product("2", "BY12"),
new Product("3", "BY1"),
new Product("4", "BY"),
new Product("5", "AA2 B2X"),
//...etc
}
my output should fetch 1, because its most specific match. If Id = 1 is not there then it should have fetched Id =2 like that...etc Could anyone help me in writing a linq query. I have tried something like below, is this fine?
var result = items.Select(x => products.FirstOrDefault(p =>
string.Equals(p.Name.Trim(), x, StringComparison.OrdinalIgnoreCase)))
.FirstOrDefault();

Well, you can use dictionary with its fast lookups :
var productsDict = products.ToDictionary(p => p.Name, p => p);
var key = items.FirstOrDefault(i => productsDict.ContainsKey(i));
Product result = key != null ? productsDict[key] : null;
Or as Tim suggested, if you have multiple elements with same names you can use Lookup :
var productsDict = products.ToLookup(p => p.Name, p => p);
var key = items.FirstOrDefault(i => productsDict.Contains(i));
Product result = key != null ? productsDict[key] : null;

If you want to select the best-matching product you need to select from the product- not the string-list. You could use following LINQ approach that uses List.FindIndex:
Product bestProduct = products
.Select(p => new {
Product = p,
Index = items.FindIndex(s => String.Equals(p.Name, s, StringComparison.OrdinalIgnoreCase))
})
.Where(x => x.Index != -1)
.OrderBy(x => x.Index) // ensures the best-match logic
.Select(x => x.Product)
.FirstOrDefault();
The Where ensures that you won't get an arbitrary product if there is no matching one.
Update:
A more efficient solution is this query:
Product bestProduct = items
.Select(item => products.FirstOrDefault(p => String.Equals(p.Name, item, StringComparison.OrdinalIgnoreCase)))
.FirstOrDefault(p != null); // ensures the best-match logic

You can try to find resemblance of words by using a specific algorythm called Levenshtein's distance algorythm, which is mostly used on "Did you mean 'word'" on most search websites.
This solution can be found here:
https://stackoverflow.com/a/9453762/1372750
Once you find the distance difference, you can measure which word or phrase is more "like" the searched one.

This will find for each product what is the "most specific" (the longest) match in items and will return the product with the longest match (regardless to order of either of the collections)
var result = products
.Select(p => new
{
Product = p,
MostSpecific = items.Where(item => p.Name.Contains(item))
.OrderByDescending(match => match.Length
.FirstOrDefault()
})
.Where(x => x.MostSpecific != null)
.OrderByDescending(x => x.MostSpecific.Length)
.Select(x => x.Product)
.FirstOrDefault();

LINQ Query to find string of multidimensional array with most duplicates

I have written a function that gives me an multidimensional array of an Match with multiple regex strings. (FileCheck[][])
FileCheck[0] // This string[] contains all the filenames
FileCheck[1] // This string[] is 0 or 1 depending on a Regex match is found.
FileCheck[2] // This string[] contains the Index of the first found Regex.
foreach (string File in InputFolder)
{
int j = 0;
FileCheck[0][k] = Path.GetFileName(File);
Console.WriteLine(FileCheck[0][k]);
foreach (Regex Filemask in Filemasks)
{
if (string.IsNullOrEmpty(FileCheck[1][k]) || FileCheck[1][k] == "0")
{
if (Filemask.IsMatch(FileCheck[0][k]))
{
FileCheck[1][k] = "1";
FileCheck[2][k] = j.ToString(); // This is the Index of the Regex thats Valid
}
else
{
FileCheck[1][k] = "0";
}
j++;
}
Console.WriteLine(FileCheck[1][k]);
}
k++;
}
Console.ReadLine();
// I need the Index of the Regex with the most valid hits
I'm trying to write a function that gives me the string of the RegexIndex that has the most duplicates.
This is what I tried but did not work :( (I only get the count of the string the the most duplicates but not the string itself)
// I need the Index of the Regex with the most valid hits
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
.Where(x => FileCheck[1][x] == "1")
.GroupBy(x => FileCheck[2][x])
.OrderByDescending(x => x.Count())
.First().ToList();
Console.WriteLine(LINQ[1]);
Example Data
string[][] FileCheck = new string[3][];
FileCheck[0] = new string[]{ "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt"};
FileCheck[1] = new string[]{ "0","1","1","0","1","1"};
FileCheck[2] = new string[]{ null, "3", "3", null,"1","2"};
In this example I need as result of the Linq query:
string result = "3";

With your current code, substituting 'ToList()' with 'Key' would do the trick.
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
.Where(x => FileCheck[1][x] == "1")
.GroupBy(x => FileCheck[2][x])
.OrderByDescending(x => x.Count())
.First().Key;
Since the index is null for values that are not found, you could also filter out null values and skip looking at the FileCheck[1] array. For example:
var maxOccurringIndex = FileCheck[2].Where(ind => ind != null)
.GroupBy(ind=>ind)
.OrderByDescending(x => x.Count())
.First().Key;
However, just a suggestion, you can use classes instead of a nested array, e.g.:
class FileCheckInfo
{
public string File{get;set;}
public bool Match => Index.HasValue;
public int? Index{get;set;}
public override string ToString() => $"{File} [{(Match ? Index.ToString() : "no match")}]";
}
Assuming InputFolder is an enumerable of string and Filemasks an enumerable of 'Regex', an array can be filled with:
FileCheckInfo[] FileCheck = InputFolder.Select(f=>
new FileCheckInfo{
File = f,
Index = Filemasks.Select((rx,ind) => new {ind, IsMatch = rx.IsMatch(f)}).FirstOrDefault(r=>r.IsMatch)?.ind
}).ToArray();
Getting the max occurring would be much the same:
var maxOccurringIndex = FileCheck.Where(f=>f.Match).GroupBy(f=>f.Index).OrderByDescending(gr=>gr.Count()).First().Key;
edit PS, the above is all assuming you need to reuse the results, if you only have to find the maximum occurrence you're much better of with an approach such as Martin suggested!
If the goal is only to get the max occurrence, you can use:
var maxOccurringIndex = Filemasks.Select((rx,ind) => new {ind, Count = InputFolder.Count(f=>rx.IsMatch(f))})
.OrderByDescending(m=>m.Count).FirstOrDefault()?.ind;

Your question and code seems very convoluted. I am guessing that you have a list of file names and another list of file masks (regular expressions) and you want to find the file mask that matches most file names. Here is a way to do that:
var fileNames = new[] { "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt" };
var fileMasks = new[] { #"\.txt$", #"\.xml$", "valid" };
var fileMaskWithMostMatches = fileMasks
.Select(
fileMask => new {
FileMask = fileMask,
FileNamesMatched = fileNames.Count(
fileName => Regex.Match(
fileName,
fileMask,
RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
)
.Success
)
}
)
.OrderByDescending(x => x.FileNamesMatched)
.First()
.FileMask;
With the sample data the value of fileMaskWithMostMatches is valid.
Note that the Regex class will do some caching of regular expressions but if you have many regular expressions it will be more effecient to create the regular expressions outside the implied fileNames.Count for-each loop to avoid recreating the same regular expression again and again (creating a regular expression may take a non-trivial amount of time depending on the complexity).

As an alternative to Martin's answer, here's a simpler version to your existing Linq query that gives the desired result;
var LINQ = FileCheck[2]
.ToLookup(x => x) // Makes a lookup table
.OrderByDescending(x => x.Count()) // Sorts by count, descending
.Select(x => x.Key) // Extract the key
.FirstOrDefault(x => x != null); // Return the first non null key
// or null if none found.

Isn't this much more easier?
string result = FileCheck[2]
.Where(x => x != null)
.GroupBy(x => x)
.OrderByDescending(x => x.Count())
.FirstOrDefault().Key;

LINQ approach to parse lines with keys/values

I have the following string
MyKey1=MyVal1
MyKey2=MyVal2
MyKey3=MyVal3
MyKey3=MyVal3
So first, in need to split into lines, then I need to split each line by '=' char to get key and value from that line. What I want, as a result, is a List<KeyValuePair<string, string>> (why not a Dictionary? => there may be duplicate keys inside the list), so I can't use the .ToDictionary() extension.
I'm pretty stuck with the following:
List<KeyValuePair<string, string>> fields =
(from lines in Regex.Split(input, #"\r?\n|\r", RegexOptions.None)
where !String.IsNullOrWhiteSpace(lines)
.Select(x => x.Split(new [] { '='}, 2, StringSplitOptions.RemoveEmptyEntries))
.ToList()
--> select new KeyValuePair? Or with 'let' for splitting by '='?
what about exception handling (e.g. ignoring empty values)

If you're concerned about duplicate keys, you could use an ILookup instead:
var fields =
(from line in Regex.Split(input, #"\r?\n|\r", RegexOptions.None)
select line.Split(new [] { '=' }, 2))
.ToLookup(x => x[0], x => x[1]);
var items = fields["MyKey3"]; // [ "MyVal3", "MyVal3" ]

You could use a Lookup<TKey, TValue> instead of a dictionary:
var keyValLookup = text.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select(l =>
{
var keyVal = l.Split('=');
return new { Key = keyVal[0].Trim(), Value = keyVal.ElementAtOrDefault(1) };
})
.Where(x => x.Key.Length > 0) // not required, just to show how to handle invalid data
.ToLookup(x => x.Key, x => x.Value);
IEnumerable<string> values = keyValLookup["MyKey3"];
Console.Write(string.Join(", ",values)); // MyVal3, MyVal3
A lookup always returns a value even if the key is not present. Then it's an empty sequence. The key must not be unique, so you don't need to group by or remove duplicates before you use ToLookup.

You're pretty close (I changed your example to all method syntax for consistency):
List<KeyValuePair<string, string>> fields =
Regex.Split(input, #"\r?\n|\r", RegexOptions.None)
.Where(s => !String.IsNullOrWhiteSpace(s))
.Select(x => x.Split(new [] {'='}, 2, StringSplitOptions.RemoveEmptyEntries)
.Where(p => p.Length == 2) // to avoid IndexOutOfRangeException
.Select(p => new KeyValuePair(p[0], p[1]));
Although I agree with Jon's comment that a grouping would be cleaner if you have duplicate keys:
IEnumerable<IGrouping<string, string>> fields =
Regex.Split(input, #"\r?\n|\r", RegexOptions.None)
.Where(s => !String.IsNullOrWhiteSpace(s))
.Select(x => x.Split(new [] {'='}, 2, StringSplitOptions.RemoveEmptyEntries))
.GroupBy(p => p[0]);

I suggest you try matching the Key/Value instead of splitting. If you want a dictionary with multiple values for a key, you could use ToLookup (an ILookup):
var result = Regex.Matches(input, #"(?<key>[^=\r\n]+)=(?<value>[^=\r\n]+)")
.OfType<Match>()
.ToLookup(m => m.Groups["key"].Value,
m => m.Groups["value"].Value);
If you need to add to that list later on or want to keep using a list:
var result = Regex.Matches(input, #"(?<key>[^=\r\n]+)=(?<value>[^=\r\n]+)")
.OfType<Match>()
.Select(m => new KeyValuePair<string, string>(m.Groups["key"].Value, m.Groups["value"].Value))
.ToList();
Note: the Regex used might not be suited for your uses as we don't know the inputs you might have.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Using C# Lambda to split string and search value - c#

You can try something like that "0:12211,90:33221,23:09011" .Split(new char[] { ',' }) .Select(c => { var pair = c.Split(new char[] { ':' }); return new KeyValuePair<string, string>(pair[0], pair[1]); }) .ToList(); You have to be aware of validations of data

Related

Select list elements contained in another list in linq

How to concat string in LinQ group

Finding the most specific matching item

LINQ Query to find string of multidimensional array with most duplicates

LINQ approach to parse lines with keys/values

Categories

Resources