string grouping algorithm c# - c#

I need something like a grouping algorithm for strings in C#.
I've tried for days, and before I go mad I should maybe ask someone :)
(no adjacency matrix ^^)
What I have is data in a Dictionary,
something like this:
key|value
"bla","AAA;BBB;CCC" // ';' is split sign
"whatever","BBB;DDD;EEE;FF"
"hmm","ZZZ;YYY;XXX"
"foo","CCC;JJJ;VVV"
....
value1 and value2 both contain "BBB", so group them into a new string (in a new dictionary; the key can be whatever... a counter?):
"AAA;BBB;CCC;DDD;EEE;FF" (or, without distinct, "AAA;BBB;CCC;BBB;DDD;EEE;FF")
value3 is its own group
value4 contains "CCC", so group it with the others:
"AAA;BBB;CCC;DDD;EEE;FF;JJJ;VVV" (or, without distinct, "AAA;BBB;CCC;BBB;DDD;EEE;FF;CCC;JJJ;VVV")
I need that string for SQL update
update item set group = bar
where group in ('','',... )
I do it with split and join, this part works :-P
thanks

So first organize the data. Have a map keys["bla"] = some_set("AAA", "BBB", "CCC"); and so on. Then build a reverse map that looks like reverse["BBB"] = ["bla", "whatever"]. Both maps should be about the same size as the original data.
Next you can do a DFS over the implicit graph (pseudocode):
merge = some_set()
DFS(string key) {
if (key in merge) return; // Been here already.
merge.insert(key);
for (string edge : keys[key])
for (string other_key : reverse[edge])
DFS(other_key)
}
So you can now call DFS("bla"). When it returns, merge should contain "bla", "whatever", and any other keys that belong to the group, and you can concatenate their strings from keys to get the result you wanted.
You can call DFS for every key to get the group each key belongs to (complexity O(N^2*set_op)). Or, better, keep track of which keys you have already processed to avoid working on them again (complexity O(N*set_op)).
If you use hash-based sets/maps, set_op is O(average string length). If you use tree-based structures, set_op is O(log N) string comparisons. This shouldn't matter unless you have very long strings or lots of keys.
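The pseudocode above could be sketched in C# roughly as follows. This is only one possible reading of the answer: the class and method names (StringGrouping, Group) are made up for illustration, the DFS is written iteratively with a stack, every value is assumed to use ';' as separator, and the merged tokens are sorted so the output is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StringGrouping
{
    // Returns one distinct, ';'-joined string per connected group of entries.
    public static List<string> Group(Dictionary<string, string> data)
    {
        // keys[key] = set of tokens stored under that key.
        var keys = data.ToDictionary(
            kv => kv.Key,
            kv => new HashSet<string>(kv.Value.Split(';')));

        // reverse[token] = all keys whose value contains that token.
        var reverse = new Dictionary<string, List<string>>();
        foreach (var kv in keys)
        {
            foreach (var token in kv.Value)
            {
                if (!reverse.TryGetValue(token, out var owners))
                    reverse[token] = owners = new List<string>();
                owners.Add(kv.Key);
            }
        }

        var visited = new HashSet<string>();   // keys already assigned to a group
        var groups = new List<string>();
        foreach (var start in keys.Keys)
        {
            if (!visited.Add(start)) continue; // been here already

            var tokens = new SortedSet<string>(StringComparer.Ordinal);
            var stack = new Stack<string>();
            stack.Push(start);
            while (stack.Count > 0)
            {
                var key = stack.Pop();
                foreach (var token in keys[key])
                {
                    tokens.Add(token);
                    // Follow the implicit edge key -> token -> other key.
                    foreach (var other in reverse[token])
                        if (visited.Add(other)) stack.Push(other);
                }
            }
            groups.Add(string.Join(";", tokens));
        }
        return groups;
    }
}
```

Because every key is pushed at most once, this corresponds to the cheaper O(N*set_op) variant mentioned above.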

Related

More efficient way of using LINQ to compare two items?

I am updating records on a SharePoint list based on data from a SQL database. Lets say my table looks something like this:
VendorNumber | ItemNumber | Description
1001 | 1 | abc
1001 | 2 | def
1002 | 1 | ghi
1002 | 3 | jkl
There can be multiple keys in each table. I am trying to make a generic solution that will work for multiple different table structures. In the above example, VendorNumber and ItemNumber would be considered keys.
I am able to retrieve the SharePoint lists as c# List<Microsoft.SharePoint.Client.ListItem>
I need to search through the List to determine which individual ListItem corresponds to the current SQL datarow I am on. Since both ListItem and DataRow allow bracket notation to specify column names, this is pretty easy to do using LINQ if you only have one key column. What I need is a way to do this if I have anywhere from 1 key to N keys. I have found this solution but realize it is very inefficient. Is there a more efficient way of doing this?
List<string> keyFieldNames = new List<string>() { "VendorNumber", "ItemNumber" };
List<ListItem> itemList = MyFunction_GetSharePointItemList();
DataRow row = MyFunction_GetOneRow();
//this is the part I would like to make more efficient:
foreach (string key in keyFieldNames)
{
//this filters the list with each successive pass.
itemList = itemList.FindAll(item => item[key].ToString().Trim() == row[key].ToString().Trim());
}
Edited to Add: Here is a link to the ListItem class documentation:
Microsoft.SharePoint.Client.ListItem
While ListItem is not a DataTable object, its structure is very similar. I have intentionally designed it so that both the ListItem and my DataRow object will have the same number of columns and the same column names. This was done to make comparing them easier.
A quick optimization tip first:
Create a Dictionary<string, string> to use instead of row
List<string> keyFieldNames = new List<string>() { "VendorNumber", "ItemNumber" };
DataRow row = MyFunction_GetOneRow();
// key = column name, value = the row's trimmed value for that column
var rowData = keyFieldNames.ToDictionary(name => name, name => row[name].ToString().Trim());
foreach (string key in keyFieldNames)
{
itemList = itemList.FindAll(item => item[key].ToString().Trim() == rowData[key]);
}
This will avoid doing the ToString & Trim on the same records over & over. That's probably taking 1/3rd to 1/2 the time of the loop. (The comparison is fast compared to the string manipulation)
Beyond that, all I can think of is to use reflection to build a specific function, on the fly to handle the comparison. BUT, that would be a big effort, and I don't see it saving that much time. Basically, whatever you do, will still have to do the same basics: Lookup the values by key, and compare them. That's what's taking the majority of the time.
After I stopped looking for an answer, I stumbled across one. I have now realized that .Where is implemented using deferred execution. This means that even though the foreach loop iterates several times, it only chains filters; the combined query is evaluated once, in a single pass over the list, when the result is finally enumerated. This was the part I was struggling to wrap my head around.
My new pseudocode:
List<string> keyFieldNames = new List<string>() { "VendorNumber", "ItemNumber" };
List<ListItem> itemList = MyFunction_GetSharePointItemList();
DataRow row = MyFunction_GetOneRow();
// Where returns IEnumerable<ListItem>, not List<ListItem>, so the query gets its own variable.
IEnumerable<ListItem> matches = itemList;
foreach (string key in keyFieldNames)
{
// each pass only chains another deferred filter; nothing is evaluated yet.
matches = matches.Where(item => item[key].ToString().Trim() == row[key].ToString().Trim());
}
I know that the .ToString().Trim() is still inefficient, I will address this at some point. But for now at least my mind can rest knowing that the LINQ executes all at once.
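For the remaining .ToString().Trim() cost, one option is to normalize the row's key values once and then test every key in a single predicate. The sketch below is an illustration only: KeyMatcher and FindMatches are hypothetical names, and the two accessor delegates stand in for row[name] and item[name] so the code does not depend on the SharePoint or System.Data types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeyMatcher
{
    public static List<TItem> FindMatches<TItem>(
        IEnumerable<TItem> items,
        IReadOnlyList<string> keyFieldNames,
        Func<string, object> rowValue,              // e.g. name => row[name]
        Func<TItem, string, object> itemValue)      // e.g. (item, name) => item[name]
    {
        // Normalize the row side once instead of once per item and key.
        var expected = keyFieldNames
            .Select(name => (Name: name, Value: Normalize(rowValue(name))))
            .ToList();

        // One pass over the items; All stops at the first key that differs.
        return items
            .Where(item => expected.All(e => Normalize(itemValue(item, e.Name)) == e.Value))
            .ToList();
    }

    private static string Normalize(object value) =>
        (value?.ToString() ?? string.Empty).Trim();
}
```

With the types from the question, a call might look like KeyMatcher.FindMatches(itemList, keyFieldNames, name => row[name], (item, name) => item[name]).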

Is there a way in LINQ to insert a row (from a dictionary) into a DataTable using a list of column names in C#?

I have a List<Dictionary<string,string>> something like this:
[0] key1 val,key2 val,key3 val
[1] key1 val,key2 val,key3 val
[2] key1 val,key2 val,key3 val
And i have a list of column names in the same order as columns in the datatable.
I want to filter only those keys which are there inside the list from the dictionary and also insert it in the proper order.
I'm able to filter the required keys to be inserted but then how do i insert it in the proper order in linq.
var colList = new List<string>() { "key3", "key1"};
dict.ForEach(p => jsonDataTable.Rows.Add(p.Where(q => colList.Contains(q.Key)).Select(r => r.Value).ToArray()));
I cannot do like this because number of columns will vary and also the method must work when we pass any list of column names:
foreach(var item in dict)
jsonDatatable.Rows.Add(item[colList[0]], item[colList[1]]);
Please suggest some ways.
LINQ will never ever change the input sources. You can only extract data from it.
Divide problems in subproblems
The only way to change the input sources is by using the extracted data to update your sources. Make sure that before you update the source you have materialized your query (= ToList() etc)
You can divide your problem into subproblems:
Convert the table into a sequence of columns in the correct order
convert the sequence of columns into a sequence of column names (still in the correct order)
use the column names and the dictionary to fetch the requested data.
By separating your problem into these steps, you prepare your solution for reusability. If in future you change your table to a DataGridView, or a table in an entity framework database, or a CSV file, or maybe even JSON, you can reuse the latter steps. If in future you need to use the column names for something else, you can still use the earlier steps.
To be able to use the code in a LINQ-like way, my advice would be to create extension method. If you are unfamiliar with extension methods, read Extension Methods Demystified
You will be more familiar with the layout of your table (System.Data.DataTable? Windows.Forms.DataGridView? DataGrid in Windows.Controls?) and your columns, so you'll have to create the first ones yourself. In the example I use MyTable and MyColumn; replace them with your own Table and Column classes.
public static IEnumerable<MyColumn> ToColumns(this MyTable table)
{
// TODO: return the columns of the table
}
public static IEnumerable<string> ToColumnNames(this IEnumerable<MyColumn> columns)
{
return columns.Select(column => ...);
}
If the column name is just a property of the column, I wouldn't bother creating the second procedure. However, the nice thing is that it hides where you get the name from. So to be future-changes-proof, maybe create the method anyway.
You said these columns were sorted. If you want to be able to use ThenBy(...) consider returning an IOrderedEnumerable<MyColumn>. If you won't sort the sorted result, I wouldn't bother.
Usage:
MyTable table = ...
IEnumerable<string> columnNames = table.ToColumns().ToColumnNames();
or:
IEnumerable<string> columnNames = table.ToColumns()
.Select(column => column.Name);
The third subproblem is the interesting one.
Join and GroupJoin
In LINQ, whenever you have two tables and you want to use a property of the elements in one table to match them with a property of the elements in another table, consider using (Group-)Join.
If you only want items of the first table that match exactly one item of the other table, use Join: "Get Customer with his Address", "Get Product with its Supplier". "Book with its Author"
On the other hand, if you expect that one item of the first table matches zero or more items from the other table, use GroupJoin: "Schools, each with their Students", "Customers, each with their Orders", "Authors, each with their Books"
Some people still think in database terms. They tend to use some kind of Left Outer Join to fetch "Schools with their Students". The disadvantage of this is that if a School has 2000 Students, then the same data of the School is transferred 2000 times, once for every Student. GroupJoin will transfer the data of the School only once, and the data of every Student only once.
Back to your question
In your problem: every column name is the key of exactly one item in the Dictionary.
What do you want to do with column names without keys? If you want to discard them, use Join. If you still want to use the column names that have nothing in the Dictionary, use GroupJoin.
IEnumerable<string> columnNames = ...
var result = columnNames.Join(myDictionary,
columnName => columnName, // from every columnName take the columnName,
dictionaryItem => dictionaryItem.Key, // from every dictionary keyValuePair take the key
// parameter resultSelector: from every columnName and its matching dictionary keyValuePair
// make one new object:
(columnName, keyValuePair) => new
{
// Select the properties that you want:
Name = columnName,
// take the whole dictionary value:
Value = keyValuePair.Value,
// or select only the properties that you plan to use:
Address = new
{
Street = keyValuePair.Value.Street,
City = keyValuePair.Value.City,
PostCode = keyValuePair.Value.PostCode
...
},
});
If you use this more often: consider to create an extension method for this.
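Note: for in-memory sequences, Enumerable.Join preserves the order of the outer sequence, so the result is already in column order and must not be re-sorted by name. A query provider that translates the join (for example to SQL) makes no such promise; there you would need an explicit OrderBy on the column position.
Usage:
Table myTable = ...
var result = myTable.ToColumns()
.Select(column => column.Name)
.Join(...)
.ToList();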
Instead of filtering the List<Dictionary<string, string>>, filter colList, so that the values come out in the same order as the columns and only for column names that actually exist as keys in the dictionaries.
This is as per my understanding; please comment if you need the result in any other way.
var dictAllKeys = dict.SelectMany(x => x.Keys).Distinct().ToList();
// Now you can filter the colList using the above keys
var filteredList = colList.Where(x => dictAllKeys.Contains(x)).ToList();
// then add one row per dictionary, reading the values in filteredList order
// (DataRowCollection has Add but no AddRange; this assumes every dictionary has the same keys)
foreach (var item in dict)
jsonDataTable.Rows.Add(filteredList.Select(x => item[x]).ToArray());
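Put together, the idea of walking the column list rather than the dictionary could look like the sketch below. DictionaryTableLoader and AddRows are hypothetical names; the sketch assumes, as the question states, that columnNames is in the same order as the table's columns, and it writes DBNull.Value when a dictionary has no entry for a column.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DictionaryTableLoader
{
    public static void AddRows(
        DataTable table,
        IEnumerable<Dictionary<string, string>> rows,
        IReadOnlyList<string> columnNames)
    {
        foreach (var row in rows)
        {
            // Iterate the column names, not the dictionary, so the values
            // come out in column order regardless of the dictionary's order.
            object[] values = columnNames
                .Select(name => row.TryGetValue(name, out var value) ? (object)value : DBNull.Value)
                .ToArray();

            // DataRowCollection.Add(params object[]) fills the columns positionally.
            table.Rows.Add(values);
        }
    }
}
```

Because the values array is built at run time, the number of columns can vary without changing the method.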

Dictionary look up where we want the keys contained in a string

I have a dictionary containing keys, e.g.
"Car"
"Card Payment"
I have a string description, e.g. "Card payment to tesco" and I want to find the item in the dictionary that corresponds to the string.
I have tried this:
var category = dictionary.SingleOrDefault(p => description.ToLowerInvariant().Contains(p.Key)).Value;
This currently results in both "Car" and "Card Payment" being returned from the dictionary and my code blows up as I have SingleOrDefault.
How can I achieve what I want? I thought about prefixing and suffixing the keys in spaces, but I'd have to do the same to the descriptions - I think this would work but it is a bit dirty. Are there any better ways? I have no objections of changing the Dictionary to some other type as long as performance is not impacted too much.
Required Result for above example: only get "Card Payment"
You can use LINQ's OrderByDescending and Take after your Where condition to pick the longest matching key, which is the most specific match.
var category = dictionary
.Where(p => description.ToLowerInvariant().Contains(p.Key.ToLowerInvariant()))
.OrderByDescending(x => x.Key.Length)
.Take(1);
I would use a List<string> to hold your keys, because there is no reason to use a key/value collection here.
List<string> keys = new List<string>();
keys.Add("Car");
keys.Add("Card Payment");
string description = "Card payment to tesco";
var category = keys
.Where(p => description.ToLowerInvariant().Contains(p.ToLowerInvariant()))
.OrderByDescending(x => x.Length)
.Take(1)
.FirstOrDefault();
NOTE
Ordering by key length in descending order ensures that the longest, i.e. most specific, matching key comes first.
Here I'm using a List<string> of keys and System.Text.RegularExpressions to find the desired key. Try it.
string description = "Card payment to tesco";
List<string> keys = new List<string> {
{"Car" }, {"Card Payment" }
};
string desc = description.ToLowerInvariant( );
string pattern = @"([{0}]+) (\S+)";
var resp = keys.FirstOrDefault( a => {
var regx = new Regex( string.Format( pattern, a.ToLowerInvariant( ) ) );
return regx.Match( desc ).Success;
} );
You are abusing dictionaries. You will get no performance gain from dictionaries by scanning the keys. Even worse, a simple list would be faster in this case. Dictionaries approach a constant time access (O(1)) if you look up a value by the key.
if (dictionary.TryGetValue(key, out var value)) { ...
To be able to use this advantage you will need a more subtle approach. The main difficulty is that sometimes keys might consist of more than a single word. Therefore I would suggest a two level approach where at the first level you store single word keys and at the second level you store the composed keys and values.
Example: Key value pairs to be stored:
["car"]: categoryA
["card payment"]: categoryB
["payment"]: categoryC
We build a dictionary as
var dictionary = new Dictionary<string, List<KeyValuePair<string, TValue>>> {
["car"] = new List<KeyValuePair<string, TValue>> {
new KeyValuePair<string, TValue>("car", categoryA)
},
["card"] = new List<KeyValuePair<string, TValue>> {
new KeyValuePair<string, TValue>("card payment", categoryB)
},
["payment"] = new List<KeyValuePair<string, TValue>> {
new KeyValuePair<string, TValue>("card payment", categoryB),
new KeyValuePair<string, TValue>("payment", categoryC)
}
};
Of course, in reality, we would do this using an algorithm. But the point here is to show the structure. As you can see, the third entry for the main key "payment" contains two entries: One for "card payment" and one for "payment".
The algorithm for adding values goes like this:
1. Split the key to be entered into single words.
2. For each word, create a dictionary entry using this word as main key and store a key value pair in a list as dictionary value. This second key is the original key possibly consisting of several words.
As you can imagine, step 2 requires you to test whether an entry with the same main key is already there. If yes, then add the new entry to the existing list. Otherwise create a new list with a single entry and insert it into the dictionary.
Retrieve an entry like this:
1. Split the description to be looked up into single words.
2. For each word, retrieve the existing dictionary entries using a true and therefore fast dictionary lookup(!) into a List<List<KeyValuePair<string, TValue>>>.
3. Flatten this list of lists using SelectMany into a single List<KeyValuePair<string, TValue>>.
4. Sort them by key length in descending order and test whether the description contains the key. The first entry found is the result.
You can also combine steps 2 and 3 and directly add the list entries of the single dictionary entries into a main list.
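The add and retrieve steps described above might be sketched like this. PhraseIndex, Add and TryFind are hypothetical names, splitting on single spaces and lower-casing everything are assumptions made here for the example, and the containment test is the plain substring check the answer describes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class PhraseIndex<TValue>
{
    private static readonly char[] Separators = { ' ' };

    // First level: single word -> all (full key, value) pairs whose key contains that word.
    private readonly Dictionary<string, List<KeyValuePair<string, TValue>>> _byWord =
        new Dictionary<string, List<KeyValuePair<string, TValue>>>();

    private static string[] Words(string text) =>
        text.ToLowerInvariant().Split(Separators, StringSplitOptions.RemoveEmptyEntries);

    public void Add(string key, TValue value)
    {
        var words = Words(key);
        // Second-level key: the full, normalized (possibly multi-word) key.
        var entry = new KeyValuePair<string, TValue>(string.Join(" ", words), value);
        foreach (var word in words.Distinct())
        {
            if (!_byWord.TryGetValue(word, out var list))
                _byWord[word] = list = new List<KeyValuePair<string, TValue>>();
            list.Add(entry);
        }
    }

    public bool TryFind(string description, out TValue value)
    {
        var words = Words(description);
        var normalized = string.Join(" ", words);

        var best = words
            .Where(word => _byWord.ContainsKey(word))        // true dictionary lookups
            .SelectMany(word => _byWord[word])               // flatten the lists
            .OrderByDescending(entry => entry.Key.Length)    // longest key first
            .Where(entry => normalized.Contains(entry.Key))
            .Take(1)
            .ToList();

        value = best.Count > 0 ? best[0].Value : default(TValue);
        return best.Count > 0;
    }
}
```

With the keys "Car", "Card Payment" and "Payment" added, looking up "Card payment to tesco" never touches the "car" entry at all, because "car" is not one of the description's words.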

Getting the selected datalist by an array

I am going to ask a very basic question and probably a repeated one but I have a bit different situation.
I want to use the "in" operator in LINQ.
I have to get all the rows from a table whose Id is contained in my array.
How can I do it?
My array has
var aa="1091","1092","1093" and so on.
My table uses these Ids as primary keys.
I have to get all the rows whose Id is contained in the array, and I do not want to use a stored procedure.
You can use Enumerable.Contains,
var aa = new string[3] { "1091", "1092", "1093" };
var res = yourDataSource.Where(c => aa.Contains(c.ID));
IN statements are created by using Contains in your Where call. Assuming you use integers as IDs, you could write something like this:
var myArray=new[]{1091,1092,1094};
var myEntities=from entity in myTable
where myArray.Contains(entity.ID)
select entity;

Linq question about grouping something that can change?

I have a list of strings and I need to perform operations on them according to the suffix they have. The only thing that does not change is the beginning of the string (it will always be ManifestXXX.txt, FileNameItems1XXX...). The suffix at the end of the string is different every time. Here is what I have so far (LINQPad):
var filesName = new[] { "ManifestSUFFIX.txt",
"FileNameItems1SUFFIX.txt",
"FileNameItems2SUFFIX.txt",
"FileNameItems3SUFFIX.txt",
"FileNameItems4SUFFIX.txt",
"ManifestWOOT.txt",
"FileNameItems1WOOT.txt",
"FileNameItems2WOOT.txt",
"FileNameItems3WOOT.txt",
"FileNameItems4WOOT.txt",
}.AsQueryable();
var query =
from n in filesName
group n by n.EndsWith("SUFFIX.txt") into ere
select new{ere} ;
query.Dump();
The condition in the GROUP is not good. I am thinking of getting all possible suffixes with a nested SELECT in the group, but I can't find a way to do it.
How can I get different groups, one per suffix, with LINQ? Is it possible?
*Jimmy's answer is great but still doesn't work the way I want. Any fix?
group by the suffix rather than whether it matches any particular one.
...
group by GetSuffix(n) into ere
...
string GetSuffix(string n) {
return Regex.Replace(n,"^Manifest|^FileNameItems[0-9]+", "");
}
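A complete version of this idea might look like the sketch below. SuffixGrouping and GroupBySuffix are names invented for the example; the regular expression is the one from the answer, with the shared ^ anchor factored out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class SuffixGrouping
{
    // Strips the fixed prefix ("Manifest" or "FileNameItems<number>") from the start.
    private static readonly Regex Prefix = new Regex(@"^(Manifest|FileNameItems[0-9]+)");

    public static string GetSuffix(string fileName) => Prefix.Replace(fileName, "");

    // Groups the file names by whatever remains after the prefix is removed.
    public static Dictionary<string, List<string>> GroupBySuffix(IEnumerable<string> fileNames) =>
        fileNames
            .GroupBy(name => GetSuffix(name))
            .ToDictionary(g => g.Key, g => g.ToList());
}
```

The number of groups then follows the number of distinct suffixes in the input rather than a fixed condition.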
