More efficient way of using LINQ to compare two items? - c#

I am updating records on a SharePoint list based on data from a SQL database. Let's say my table looks something like this:
VendorNumber  ItemNumber  Description
1001          1           abc
1001          2           def
1002          1           ghi
1002          3           jkl
There can be multiple keys in each table. I am trying to make a generic solution that will work for multiple different table structures. In the above example, VendorNumber and ItemNumber would be considered keys.
I am able to retrieve the SharePoint lists as c# List<Microsoft.SharePoint.Client.ListItem>
I need to search through the List to determine which individual ListItem corresponds to the current SQL datarow I am on. Since both ListItem and DataRow allow bracket notation to specify column names, this is pretty easy to do using LINQ if you only have one key column. What I need is a way to do this if I have anywhere from 1 key to N keys. I have found this solution but realize it is very inefficient. Is there a more efficient way of doing this?
List<string> keyFieldNames = new List<string>() { "VendorNumber", "ItemNumber" };
List<ListItem> itemList = MyFunction_GetSharePointItemList();
DataRow row = MyFunction_GetOneRow();
// this is the part I would like to make more efficient:
foreach (string key in keyFieldNames)
{
    // this filters the list with each successive pass.
    itemList = itemList.FindAll(item => item[key].ToString().Trim() == row[key].ToString().Trim());
}
Edited to Add: Here is a link to the ListItem class documentation:
Microsoft.SharePoint.Client.ListItem
While ListItem is not a DataTable object, its structure is very similar. I have intentionally designed it so that both the ListItem and my DataRow object will have the same number of columns and the same column names. This was done to make comparing them easier.

A quick optimization tip first:
Create a Dictionary<string, string> to use instead of row:
List<string> keyFieldNames = new List<string>() { "VendorNumber", "ItemNumber" };
DataRow row = MyFunction_GetOneRow();
// Map each key column name to the row's trimmed value, computed once.
var rowData = keyFieldNames.ToDictionary(name => name, name => row[name].ToString().Trim());
foreach (string key in keyFieldNames)
{
    itemList = itemList.FindAll(item => item[key].ToString().Trim() == rowData[key]);
}
This will avoid doing the ToString & Trim on the same records over & over. That's probably taking 1/3rd to 1/2 the time of the loop. (The comparison is fast compared to the string manipulation)
Beyond that, all I can think of is to use reflection to build a specific function, on the fly to handle the comparison. BUT, that would be a big effort, and I don't see it saving that much time. Basically, whatever you do, will still have to do the same basics: Lookup the values by key, and compare them. That's what's taking the majority of the time.

After I stopped looking for an answer, I stumbled across one. I have now realized that using a .Where is implemented using deferred execution. This means that even though the foreach loop iterates several times, the LINQ query executes all at once. This was the part I was struggling to wrap my head around.
My new pseudocode:
List<string> keyFieldNames = new List<string>() { "VendorNumber", "ItemNumber" };
List<ListItem> itemList = MyFunction_GetSharePointItemList();
DataRow row = MyFunction_GetOneRow();
// Where returns IEnumerable<ListItem>, so the query needs its own variable.
IEnumerable<ListItem> matches = itemList;
foreach (string key in keyFieldNames)
{
    // Each pass only adds a filter; nothing runs until matches is enumerated.
    matches = matches.Where(item => item[key].ToString().Trim() == row[key].ToString().Trim());
}
I know that the .ToString().Trim() is still inefficient, I will address this at some point. But for now at least my mind can rest knowing that the LINQ executes all at once.
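A minimal sketch of how the two ideas in this thread could be combined: the row's key values are normalized once, and a single Where with All checks every key in one pass. Dictionary<string, object> is only a stand-in for ListItem here (an assumption, so the example runs without the SharePoint client library), and the sample data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

var keyFieldNames = new List<string> { "VendorNumber", "ItemNumber" };

// Stand-in for List<ListItem>: both types expose values through item["ColumnName"].
var itemList = new List<Dictionary<string, object>>
{
    new Dictionary<string, object> { ["VendorNumber"] = "1001", ["ItemNumber"] = 1, ["Description"] = "abc" },
    new Dictionary<string, object> { ["VendorNumber"] = "1002 ", ["ItemNumber"] = 1, ["Description"] = "ghi" },
    new Dictionary<string, object> { ["VendorNumber"] = "1002 ", ["ItemNumber"] = 3, ["Description"] = "jkl" },
};

// A one-row DataTable standing in for the current SQL row.
var table = new DataTable();
table.Columns.Add("VendorNumber", typeof(string));
table.Columns.Add("ItemNumber", typeof(int));
DataRow row = table.Rows.Add("1002", 3);

// Normalize the row's key values once instead of once per list item.
Dictionary<string, string> rowKeys = keyFieldNames.ToDictionary(
    name => name,
    name => row[name].ToString().Trim());

// One deferred query; All stops at the first key that does not match.
var matches = itemList
    .Where(item => keyFieldNames.All(key => item[key].ToString().Trim() == rowKeys[key]))
    .ToList();

Console.WriteLine(matches.Count);
```

If many rows are matched against the same list, building a lookup keyed on the combined key values once would probably pay off more than tuning the per-row filter.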

Related

Is there a way in LINQ to insert a row (from a dictionary) into a DataTable using a list of column names in C#?

I have a List<Dictionary<string,string>> something like this:
[0] key1 val,key2 val,key3 val
[1] key1 val,key2 val,key3 val
[2] key1 val,key2 val,key3 val
And I have a list of column names in the same order as the columns in the DataTable.
I want to keep only those dictionary keys that appear in the list, and insert their values in the proper order.
I'm able to filter the required keys, but how do I insert the values in the proper order using LINQ?
var colList = new List<string>() { "key3", "key1" };
dict.ForEach(p => jsonDataTable.Rows.Add(p.Where(q => colList.Contains(q.Key)).Select(r => r.Value).ToArray()));
I cannot do it like this, because the number of columns will vary and the method must work for any list of column names:
foreach (var item in dict)
    jsonDataTable.Rows.Add(item[colList[0]], item[colList[1]]);
Please suggest some ways.
LINQ will never ever change the input sources. You can only extract data from it.
Divide problems in subproblems
The only way to change the input sources is by using the extracted data to update your sources. Make sure that before you update the source you have materialized your query (= ToList() etc)
You can divide your problem into subproblems:
Convert the table into a sequence of columns in the correct order
convert the sequence of columns into a sequence of column names (still in the correct order)
use the column names and the dictionary to fetch the requested data.
By separating your problem into these steps, you prepare your solution for reusability. If in future you change your table to a DataGridView, or a table in an entity framework database, or a CSV file, or maybe even JSON, you can reuse the latter steps. If in future you need to use the column names for something else, you can still use the earlier steps.
To be able to use the code in a LINQ-like way, my advice would be to create extension methods. If you are unfamiliar with extension methods, read Extension Methods Demystified.
You will be more familiar with the layout of your table (System.Data.DataTable? Windows.Forms.DataGridView? DataGrid in Windows.Controls?) and your columns, so you'll have to create the first ones yourself. In the example I use MyTable and MyColumn; replace them with your own Table and Column classes.
public static IEnumerable<MyColumn> ToColumns(this MyTable table)
{
    // TODO: return the columns of the table
}
public static IEnumerable<string> ToColumnNames(this IEnumerable<MyColumn> columns)
{
    return columns.Select(column => ...);
}
If the column name is just a property of the column, I wouldn't bother creating the second procedure. However, the nice thing is that it hides where you get the name from. So to be future-changes-proof, maybe create the method anyway.
You said these columns were sorted. If you want to be able to use ThenBy(...) consider returning an IOrderedEnumerable<MyColumn>. If you won't sort the sorted result, I wouldn't bother.
Usage:
MyTable table = ...
IEnumerable<string> columnNames = table.ToColumns().ToColumnNames();
or:
IEnumerable<string> columnNames = table.ToColumns()
.Select(column => column.Name);
The third subproblem is the interesting one.
Join and GroupJoin
In LINQ, whenever you have two tables and you want to use a property of the elements in one table to match them with the properties of another table, consider using (Group-)Join.
If you only want items of the first table that match exactly one item of the other table, use Join: "Get Customer with his Address", "Get Product with its Supplier". "Book with its Author"
On the other hand, if you expect that one item of the first table matches zero or more items from the other table, use GroupJoin: "Schools, each with their Students", "Customers, each with their Orders", "Authors, each with their Books"
Some people still think in database terms. They tend to use some kind of Left Outer Join to fetch "Schools with their Students". The disadvantage of this is that if a School has 2000 Students, then the same data of the School is transferred 2000 times, once for every Student. GroupJoin will transfer the data of the School only once, and the data of every Student only once.
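To make the difference concrete, here is a small LINQ to Objects sketch of both operators. The schools and students are invented sample data, and tuples are used only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var schools = new[]
{
    (Id: 1, Name: "North High"),
    (Id: 2, Name: "South High"),   // has no students in this sample
};
var students = new[]
{
    (Name: "Ann", SchoolId: 1),
    (Name: "Bob", SchoolId: 1),
};

// Join: one result per matching (school, student) pair; schools without students disappear.
var pairs = schools.Join(students,
        school => school.Id,
        student => student.SchoolId,
        (school, student) => (School: school.Name, Student: student.Name))
    .ToList();

// GroupJoin: one result per school, each with zero or more students.
var groups = schools.GroupJoin(students,
        school => school.Id,
        student => student.SchoolId,
        (school, schoolStudents) => (School: school.Name,
                                     Students: schoolStudents.Select(s => s.Name).ToList()))
    .ToList();

Console.WriteLine($"{pairs.Count} pairs, {groups.Count} groups");
```

In the Join result the school name is repeated for every student, while the GroupJoin result carries each school once and still includes the school that has no students.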
Back to your question
In your problem: every column name is the key of exactly one item in the Dictionary.
What do you want to do with column names without keys? If you want to discard them, use Join. If you still want to use the column names that have nothing in the Dictionary, use GroupJoin.
IEnumerable<string> columnNames = ...
var result = columnNames.Join(myDictionary,
    columnName => columnName,               // from every columnName take the columnName
    dictionaryItem => dictionaryItem.Key,   // from every dictionary keyValuePair take the key
    // parameter resultSelector: from every columnName and its matching dictionary keyValuePair
    // make one new object:
    (columnName, keyValuePair) => new
    {
        // Select the properties that you want:
        Name = columnName,
        // take the whole dictionary value:
        Value = keyValuePair.Value,
        // or select only the properties that you plan to use:
        Address = new
        {
            Street = keyValuePair.Value.Street,
            City = keyValuePair.Value.City,
            PostCode = keyValuePair.Value.PostCode
            ...
        },
    });
If you use this more often, consider creating an extension method for it.
Note: in LINQ to Objects, Enumerable.Join preserves the order of the outer sequence (here the column names), so the result is already in column order. Other LINQ providers do not guarantee this; with those you have to order the result explicitly (LINQ has no Sort method; use OrderBy).
Usage:
MyTable myTable = ...
var result = myTable.ToColumns()
    .Select(column => column.Name)
    .Join(...)
    .ToList();
Instead of filtering on the List<Dictionary<string, string>>, filter on the colList, so that you get the values in the same order as the columns and only for the columns in colList that are present in the dictionary.
This is as per my understanding, please comment if you need the result in any other way.
foreach (var item in dict)
{
    // Walk colList, not the dictionary, so the values come out in column order.
    object[] values = colList
        .Where(col => item.ContainsKey(col))
        .Select(col => (object)item[col])
        .ToArray();
    jsonDataTable.Rows.Add(values);
}

sql nhibernate performance for loop

I have the following logic:
loop through a list of ids, get the associated entity, and for that entity, loop through another list of ids and get another entity. Code is below:
foreach (var docId in docIds)
{
    var doc = new EntityManager<Document>().GetById(docId);
    foreach (var tradeId in tradeIds)
    {
        var trade = new EntityManager<Trade>().GetById(tradeId);
        if (doc.Trade.TradeId != trade.TradeId)
        {
            Document newDoc = new Document(doc, trade, 0);
            new EntityManager<Document>().Add(newDoc);
        }
    }
}
My question is mainly about SQL performance. Obviously there will be a bunch of selects happening, as well as some adds. Is this a bad way to go about doing something like this?
Should I, instead, use a session and get a list of all entities that match the list of ids (with 1 select statement) and then loop after?
This is based only on my experience, but you can test it yourself.
If the Trade entity isn't very big and the number of entities is not over 1,000, reading all the entities first and looping afterwards is much preferable.
If the count is more than 1,000, it's better to call a stored procedure that joins against a temp table containing your ids.
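A rough sketch of the "read first, loop after" idea, independent of NHibernate. LoadTrade is a hypothetical stand-in for new EntityManager<Trade>().GetById(...) and merely counts how often it is called; the ids are sample values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int loads = 0;

// Hypothetical stand-in for a database round trip that fetches one Trade.
string LoadTrade(int tradeId)
{
    loads++;
    return "trade-" + tradeId;
}

var docIds = new[] { 1, 2, 3 };
var tradeIds = new[] { 10, 20 };

// Load every trade once, before the nested loops start.
Dictionary<int, string> trades = tradeIds
    .Distinct()
    .ToDictionary(id => id, id => LoadTrade(id));

var combinations = new List<string>();
foreach (var docId in docIds)
{
    foreach (var tradeId in tradeIds)
    {
        // Dictionary lookup instead of another round trip.
        combinations.Add(docId + ":" + trades[tradeId]);
    }
}

Console.WriteLine($"{combinations.Count} combinations, {loads} loads");
```

With the original nesting the trade would be loaded once per (document, trade) combination; here it is loaded once per distinct trade id. Fetching all trades in one query would reduce the round trips further, but how to express that depends on the data access layer.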

Linq: Select data into multiple Lists

How can I select the result of a query into multiple Lists? For example,
class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
void test()
{
    var query =
        from i in Persons
        select i;
    // now i want to select two lists - list of first names, and list of last names
    // approach 1 - run query twice?
    List<string> f = query.Select(i => i.FirstName).ToList();
    List<string> l = query.Select(i => i.LastName).ToList();
    // approach 2 - turn it into a list first, and break it up
    List<Person> p = query.ToList();
    List<string> f = p.Select(i => i.FirstName).ToList();
    List<string> l = p.Select(i => i.LastName).ToList();
}
Problem with approach 1 is I need to run the query twice.
Problem with approach 2 is I use twice the memory. When the data set is huge, it may become an issue.
"Problem with approach 1 is I need to run the query twice. Problem with approach 2 is I use twice the memory. When the data set is huge, it may become an issue."
Either of these tradeoffs may be adequate, but it depends on the resulting dataset and use case.
If you want to avoid this tradeoff entirely, however, you can. The way around this is to not use Linq:
var firstNames = new List<string>();
var lastNames = new List<string>();
foreach (var person in query)
{
    firstNames.Add(person.FirstName);
    lastNames.Add(person.LastName);
}
This avoids two queries as well as the "copy" of the items, as you only enumerate the query results once, and don't store any extra information.
"Problem with approach 2 is I use twice the memory."
Wrong. Measure it. The string instances are reused: the new lists hold references to the same strings, so the string data itself is not copied.
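The claim that the strings are shared can be checked with ReferenceEquals. This is only a sketch with made-up data; tuples stand in for the Person class, and the strings are built at run time so that literal interning cannot explain the result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Strings created at run time, so each one is a distinct heap object.
var people = new List<(string FirstName, string LastName)>
{
    (new string('a', 3), new string('b', 3)),
    (new string('c', 3), new string('d', 3)),
};

// "Approach 2": materialize once, then project twice.
var materialized = people.Select(person => person).ToList();
List<string> firstNames = materialized.Select(person => person.FirstName).ToList();
List<string> lastNames = materialized.Select(person => person.LastName).ToList();

// The projected lists point at the very same string objects.
bool sameFirst = ReferenceEquals(materialized[0].FirstName, firstNames[0]);
bool sameLast = ReferenceEquals(materialized[1].LastName, lastNames[1]);

Console.WriteLine($"{sameFirst} {sameLast}");
```

The extra cost of approach 2 is therefore the intermediate list of items plus one reference per entry in each projected list, not a second copy of the string data.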

How can I check if a string in sql server contains at least one of the strings in a local list using linq-to-sql?

In my database table I have a Positions field, which contains a space-separated list of position codes. I need to add criteria to my query that check whether any of the locally specified position codes matches at least one of the position codes in the field.
For example, I have a local list that contains "RB" and "LB". I want a record that has a Positions value of OL LB to be found, as well as records with a position value of RB OT but not records with a position value of OT OL.
With AND clauses I can do this easily via
foreach (var str in localPositionList)
    query = query.Where(x => x.Positions.Contains(str));
However, I need these conditions chained together as OR clauses. If I weren't dealing with LINQ to SQL (just normal collections) I could do this with
query = query.Where(x => x.Positions.Split(' ').Any(y => localPositionList.Contains(y)));
However, this does not work with LINQ to SQL: an exception occurs because it cannot translate Split into SQL.
Is there any way to accomplish this?
I am trying to resist splitting this data out of this table and into other tables, as the sole purpose of this table is to give an optimized "cache" of data that requires the minimum amount of tables in order to get search results (eventually we will be moving this part to Solr, but that's not feasible at the moment due to the schedule).
I was able to get a test version working by using separate queries and running a Union on the result. My code is rough, since I was just hacking, but here it is...
List<string> db = new List<string>() {
    "RB OL",
    "OT LB",
    "OT OL"
};
List<string> tests = new List<string> {
    "RB", "LB", "OT"
};
IEnumerable<string> result = db.Where(d => d.Contains("RB"));
for (int i = 1; i < tests.Count(); i++) {
    string val = tests[i];
    result = result.Union(db.Where(d => d.Contains(val)));
}
result.ToList().ForEach(r => Console.WriteLine(r));
Console.ReadLine();
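An alternative to the Union approach is to build one predicate whose body is an OR chain of Contains calls, using System.Linq.Expressions. The sketch below runs against an in-memory IQueryable<string>; BuildContainsAny is a hypothetical helper name, and whether a particular provider translates the resulting tree has not been verified here. For an entity, the parameter would be the entity type and the Contains call would target Expression.Property(parameter, "Positions") instead of the parameter itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Builds: positions => positions.Contains(code1) || positions.Contains(code2) || ...
Expression<Func<string, bool>> BuildContainsAny(IEnumerable<string> codes)
{
    ParameterExpression parameter = Expression.Parameter(typeof(string), "positions");
    var containsMethod = typeof(string).GetMethod("Contains", new[] { typeof(string) });

    Expression body = null;
    foreach (string code in codes)
    {
        Expression call = Expression.Call(parameter, containsMethod, Expression.Constant(code));
        body = body == null ? call : Expression.OrElse(body, call);
    }

    // An empty code list matches nothing.
    return Expression.Lambda<Func<string, bool>>(body ?? Expression.Constant(false), parameter);
}

var db = new List<string> { "RB OL", "OT LB", "OT OL" };
var localPositionList = new List<string> { "RB", "LB" };

List<string> found = db.AsQueryable()
    .Where(BuildContainsAny(localPositionList))
    .ToList();

found.ForEach(Console.WriteLine);
```

Like the Contains calls in the question, this is a substring test, so a code that is part of a longer code would also match.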

is there a way to get the Order number of a Dictionary Item?

Dictionary<string,string> items = new Dictionary<string,string>();
IEnumerable<string> textvalues = from c in......//using linQ to query
string s = textvalues[items["book"]];
In this case the textvalues array will accept an integer value to return the string value. How can I get the item number? Say "items" has 5 item names and "book" is at the first position; then I must get 0, so textvalues[items["book"]] would be translated as textvalues[item[0]].
OK, I was trying to use OpenXML 2.0 to read Excel. The point here is that there is no way I could specify a field name and get the value.
So I was trying to iterate over the first row of a worksheet and add the values to a dictionary, Dictionary<string, int> fieldItems, so that when I say fieldItems["Status"] it would retrieve the cell value based on the column number, looked up in my case by the column header name. OK, here's the code for it.
Dictionary<string, int> headers = new Dictionary<string, int>();
IEnumerable<string> ColumnHeaders =
    from cell in (from row in worksheet.Descendants<Row>()
                  where row.RowIndex == 1
                  select row).First().Descendants<Cell>()
    where cell.CellValue != null
    select (cell.DataType != null
            && cell.DataType.HasValue
            && cell.DataType == CellValues.SharedString
        ? sharedString.ChildElements[int.Parse(cell.CellValue.InnerText)].InnerText
        : cell.CellValue.InnerText);
int i = 0;
// A plain foreach keeps the indexes in column order; Parallel.ForEach would race on both i++ and Dictionary.Add.
foreach (string header in ColumnHeaders)
{
    headers.Add(header, i++);
}
order.Number = textValues[headers["Number"]];
(Darn, I didn't misread it after all, and there's no edit history in the first five minutes.)
Your question is somewhat confusingly presented... but it seems like you're basically trying to find the "position" of an item within a dictionary. There's no such concept in Dictionary<TKey, TValue>. You should regard it as a set of mappings from keys to values.
Obviously when you iterate over the entries in that set of mappings they will come out in some order - but there's no guarantee that it will bear any relation to the order in which the entries were added. If you use SortedDictionary<,> or SortedList<,> (both of which are really still dictionaries), you can get the entries in sorted key order... but it's not clear whether that would be good enough for you.
It's also not clear what you're really trying to achieve - you call textvalues an array in the text, but declare it as an IEnumerable<string> - and it looks like you're then trying to use an indexer with a string parameter...
EDIT: Okay, now the question has been edited, you've got a Dictionary<string, int> rather than a Dictionary<string, string>... so the whole thing makes more sense. Now it's easy:
order.Number = textValues.ElementAt(headers["Number"]);
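For illustration, the header-to-index map can also be built without a separate counter by using the Select overload that supplies the element index. The header names and cell values below are invented sample data, not taken from a real worksheet.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<string> columnHeaders = new[] { "Number", "Status", "Customer" };
IEnumerable<string> textValues = new[] { "42", "Open", "Contoso" };

// Map each header name to its zero-based column position.
Dictionary<string, int> headers = columnHeaders
    .Select((name, index) => new { name, index })
    .ToDictionary(h => h.name, h => h.index);

// ElementAt works on any IEnumerable<string> ...
string status = textValues.ElementAt(headers["Status"]);

// ... but when several columns are read, materializing once gives a real indexer.
List<string> values = textValues.ToList();
string number = values[headers["Number"]];

Console.WriteLine($"{number} {status}");
```

ElementAt on a lazily evaluated sequence walks it from the start on each call, so the ToList variant is likely the better fit when many fields are read from the same row.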
