I have some code that changes a value of some data within my database while within a loop. I'm just wondering what is the most efficient way of filtering my data first? I'll give an example:-
With the class:-
public class myObj
{
int id {get;set;}
string product {get; set;}
string parent{get;set;}
bool received {get;set;}
}
And the DbContext:-
public class myCont:DbContext
{
public DbSet<myObj> myObjs {get;set;}
}
Is it better to do this:-
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
myObj ob = data.myObjs.Where(o => o.parent == "number1");
foreach(int i in list)
{
ob.First(o => o.id == i && o.received != true).received = true;
}
Or:-
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
foreach(int i in list)
{
data.myObjs.First(o => o.parent == "number1" && o.id == i && o.received != true).received = true;
}
Or is there no difference?
Not sure how you get to compile your code example above.
In your myObj object, the received property is an int, yet you are comparing it against a bool, so the expression o.received != true should fail with the error: Cannot apply operator '!=' to operands of type 'int' and 'bool'.
To Check the SQL
Once the code compiles, use SQL Profiler to see the SQL that each version generates.
Benchmarking
The below is a very crude description of only one possible way you can benchmark your code execution.
Wrap your code into a method, for example:
public void TestingOperationOneWay()
{
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
myObj ob = data.myObjs.Where(o => o.parent == "number1");
foreach(int i in list)
{
ob.First(o => o.id == i && o.received != true).received = true;
}
}
And:
public void TestingOperationAnotherWay()
{
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
foreach(int i in list)
{
data.myObjs.First(o => o.parent == "number1" && o.id == i && o.received != true).received = true;
}
}
Create a method that runs each test method a given number of times and times the whole run with a Stopwatch, similar to this:
private static TimeSpan ExecuteOneWayTest(int iterations)
{
var stopwatch = Stopwatch.StartNew();
for (var i = 0; i < iterations; i++)
{
TestingOperationOneWay();
}
stopwatch.Stop();
return stopwatch.Elapsed;
}
Evaluate the results similar to this:
static void RunTests()
{
const int iterations = 100000000;
var timespanRun1 = ExecuteOneWayTest(iterations);
var timespanRun2 = ExecuteAnotherWayTest(iterations);
// Evaluate Results....
}
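A minimal sketch of the same timing idea, with the repetition factored into one helper so that both variants are measured by identical code. The names `Bench` and `Measure` are hypothetical and not part of the answer above; a serious benchmark would also need warm-up runs and a database reset between runs.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs `operation` exactly `iterations` times and returns the total elapsed time.
    public static TimeSpan Measure(Action operation, int iterations)
    {
        if (operation == null) throw new ArgumentNullException(nameof(operation));

        var stopwatch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            operation();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

It could then be called as `Bench.Measure(TestingOperationOneWay, 1000)` and `Bench.Measure(TestingOperationAnotherWay, 1000)`, assuming both test methods are reachable from the calling code.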
For the choice between your two queries, I agree that they would probably execute similarly, and benchmarking is an appropriate response. However, there are some things you can do to optimize. For example, you could call 'AsEnumerable' so that later operators use the in-memory IEnumerable versions instead of the IQueryable versions that are translated into SQL (the difference between executing against the data source and handling the filter within the loaded objects). Since you appear to be manipulating only properties (and not entity relationships), you could do this:
int[] list;
/* Populate list with a bunch of id numbers found in myOBjs */
myCont data = new myCont();
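// ToList loads the filtered rows once; without it, every First call below would re-run the query.
IEnumerable<myObj> ob = data.myObjs.Where(o => o.parent == "number1").AsEnumerable().ToList();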
foreach(int i in list)
{
ob.First(o => o.id == i && o.received != true).received = true;
}
Provided the filtered rows are loaded once (for example with ToList), doing so avoids the penalty of hitting the database for each record (possibly avoiding network latency), but increases your memory footprint. Here's an associated LINQ further explaining this idea. It really depends on where you can absorb the performance cost.
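As a rough sketch of the "load once, then work in memory" idea: the ids can be put into a `HashSet<int>` so that each row is checked once, instead of searching the rows once per id. `Item`, `ReceivedUpdater` and `MarkReceived` are hypothetical stand-ins for `myObj` and the loop in the question, and the rows are assumed to be in memory already. With a real context the changes would still have to be saved afterwards.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the myObj entity in the question.
public class Item
{
    public int Id { get; set; }
    public string Parent { get; set; }
    public bool Received { get; set; }
}

public static class ReceivedUpdater
{
    // Marks every not-yet-received row of `parent` whose id is in `ids`.
    // `rows` is assumed to be loaded already (no database access here).
    // Returns the number of rows that were changed.
    public static int MarkReceived(IEnumerable<Item> rows, IEnumerable<int> ids, string parent)
    {
        var wanted = new HashSet<int>(ids);   // constant-time membership test per row
        var changed = 0;
        foreach (var row in rows.Where(r => r.Parent == parent && !r.Received && wanted.Contains(r.Id)))
        {
            row.Received = true;
            changed++;
        }
        return changed;
    }
}
```

Unlike the `First` calls in the question, this sketch silently skips ids that have no matching row instead of throwing.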
Related
Currently I loop through arrays and check whether any object has a specific id. These objects have an Id property.
public class MyObj
{
public int Id {get; set;}
}
So when checking the locked state I go for this code
bool IsUnlocked(int targetId) {
bool isUnlocked = false;
for (int i = 0; i < myObjs.Length; i++) // loop through the objects
{
MyObj current = myObjs[i];
if (current.Id == targetId) // a match
{
isUnlocked = true;
break;
}
}
return isUnlocked;
}
I think this can be done smarter with Linq. I tried
bool isUnlocked = myObjs.Contains(current => current.Id == targetId);
but this is a wrong syntax. Do I have to setup something like
myObjs.First(current => current.Id == targetId);
Contains doesn't take a delegate type so passing the behaviour of current => current.Id == targetId into it would not compile.
As for myObjs.First(current => current.Id == targetId);, this will return the first object that satisfies the provided predicate, as opposed to returning a bool indicating whether any item satisfies it.
The solution is to use the Any extension method.
bool isUnlocked = myObjs.Any(current => current.Id == targetId);
There is also a dedicated method in the Array class - Array.Exists:
isUnlocked = Array.Exists(myObjs, elem => elem.Id == targetId);
I am writing a small program that takes in a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table on a database (SQL Server through dynamics CRM using Xrm.Sdk if it makes a difference).
In my current program (which takes about 25 minutes to compare - the file and database are the exact same here both 45k rows with no differences), I have all existing records on the database in a DataCollection<Entity> which inherits Collection<T> and IEnumerable<T>
In my code below I am filtering using the Where method and then applying logic based on the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age);
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.
Himbrombeere is right, you should execute the query first and put the result into a collection before you use Any, Count, AddRange or whatever method will execute the query again. In your code it's possible that the query is executed 5 times in every loop iteration.
Watch out for the term deferred execution in the documentation. A method implemented that way only builds up a LINQ query (so you can chain it with other methods, and at the end you have a query). Only methods that don't use deferred execution, like Count, Any or ToList (or a plain foreach), actually execute it. If you don't want the whole query to be executed every time and you need its result more than once, it's better to store the result in a collection (for example with ToList).
However, you could use a different approach which should be much more efficient: a Lookup<TKey, TElement>, which is similar to a dictionary and can be used with an anonymous type as key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
fund = r["field_1"].ToString(),
bps = Convert.ToDecimal(r["field_2"]),
withdrawalPct = Convert.ToDecimal(r["field_3"]),
// percentile and age are int so that the key type matches the key built in the loop below
percentile = Convert.ToInt32(r["field_4"]),
age = Convert.ToInt32(r["field_5"])
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = lookup[new {fund, bps, withdrawalPct, percentile, age}].ToList();
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
Note that this will work even if the key does not exist (an empty list is returned).
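A small self-contained sketch of why an anonymous type works as a lookup key: anonymous types compare by value, property by property, so two separately created keys with equal values find the same group, and a missing key yields an empty sequence rather than an exception. The data and the names `LookupDemo` and `TermsFor` are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupDemo
{
    // Groups sample rows by a composite (Fund, Age) key and returns
    // the terms stored under the requested key.
    public static List<decimal> TermsFor(string fund, int age)
    {
        var rows = new[]
        {
            new { Fund = "A", Age = 30, Term = 1.5m },
            new { Fund = "A", Age = 30, Term = 2.5m },
            new { Fund = "B", Age = 40, Term = 3.0m },
        };

        // Built once, with a single pass over the rows.
        var lookup = rows.ToLookup(r => new { r.Fund, r.Age }, r => r.Term);

        // The probing key must use the same property names, types and order.
        return lookup[new { Fund = fund, Age = age }].ToList();
    }
}
```

Here `LookupDemo.TermsFor("A", 30)` returns both terms stored for that key, while an unknown key such as `("Z", 1)` returns an empty list.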
Add a ToList call after the Where, i.e. after the line ending in Convert.ToDecimal(r["field_5"]) == age), to force immediate execution of the query.
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age)
.ToList();
The Where doesn't actually execute your query, it just prepares it. The actual execution is deferred until the results are needed. In your case that happens when calling Count, which iterates the entire collection of items. If the first condition fails, the second one is checked, and its Count call iterates the complete collection a second time. In this case you actually execute the query a third time when calling matchingRows.First().
By forcing immediate execution you execute the query only once, and thus iterate the entire collection only once, which will decrease your overall time.
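The effect described above can be made visible by counting predicate calls. This is a toy sketch on an in-memory array, not the original code; `DeferredDemo` and `PredicateCalls` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how often the Where predicate ran when the filtered
    // sequence is consumed twice, once by Count() and once by First().
    public static int PredicateCalls(bool materialize)
    {
        var source = new[] { 1, 2, 3, 4, 5 };
        var calls = 0;

        IEnumerable<int> matches = source.Where(n =>
        {
            calls++;            // count every predicate evaluation
            return n == 5;      // only the last element matches
        });

        if (materialize)
        {
            matches = matches.ToList();   // the query runs once, right here
        }

        _ = matches.Count();   // deferred: a full pass; list: no predicate calls
        _ = matches.First();   // deferred: another pass up to the first match
        return calls;
    }
}
```

Without `ToList` the five-element source is filtered twice (10 predicate calls); with it, only once (5 calls).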
Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
inputDataLines
.Select(record =>
{
var fields = record.Split(',');
return new
{
fund = fields[0],
bps = Convert.ToDecimal(fields[1]),
withdrawalPct = Convert.ToDecimal(fields[2]),
percentile = Convert.ToInt32(fields[3]),
age = Convert.ToInt32(fields[4]),
bombOutTerm = Convert.ToDecimal(fields[5]),
record
};
})
.ToArray();
var entities =
existingRecords
.Entities
.Select(entity => new
{
fund = entity["field_1"].ToString(),
bps = Convert.ToDecimal(entity["field_2"]),
withdrawalPct = Convert.ToDecimal(entity["field_3"]),
percentile = Convert.ToInt32(entity["field_4"]),
age = Convert.ToInt32(entity["field_5"]),
bombOutTerm = Convert.ToDecimal(entity["field_6"]),
entity
})
.ToArray()
.GroupBy(x => new
{
x.fund,
x.bps,
x.withdrawalPct,
x.percentile,
x.age
}, x => new
{
x.bombOutTerm,
x.entity,
});
(2)
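var query =
from i in inputs
// Group join (left outer): inputs without any match must still be yielded so they reach the "add" branch in (3).
join e in entities on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key into groups
select new { input = i, matchingRows = groups.SelectMany(g => g).ToList() };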
(3)
foreach (var x in query)
{
entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));
if (x.matchingRows.Count() == 0)
{
rowsToAdd.Add(x.input.record);
}
else if (x.matchingRows.Count() == 1)
{
if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
{
rowsToUpdate.Add(x.input.record);
entitiesToUpdate.Add(x.matchingRows.First().entity);
}
}
else
{
entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
rowsToAdd.Add(x.input.record);
}
}
I would suspect that this will be among the fastest approaches presented.
Hello, I have a method that compares the objects of 2 lists for differences. Right now this works, but only for one property at a time.
Here is the Method:
public SPpowerPlantList compareTwoLists(string sqlServer, string database, DateTime timestampCurrent, string noteCurrent, DateTime timestampOld, string noteOld)
{
int count = 0;
SPpowerPlantList powerPlantListCurrent = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampCurrent, noteCurrent);
SPpowerPlantList powerPlantListOld = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampOld, noteOld);
SPpowerPlantList powerPlantListDifferences = new SPpowerPlantList();
count = powerPlantListOld.Count - powerPlantListCurrent.Count;
var differentObjects = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.mwWeb == l.mwWeb)).ToList();
foreach (var differentObject in differentObjects)
{
powerPlantListDifferences.Add(differentObject);
}
return powerPlantListDifferences;
}
This works and I get 4 objects in the new list. The problem is that I have a few other properties that I need to compare, for example name instead of mwWeb. When I try to change it, I need a new list and a new foreach loop for every new property.
e.g.
int count = 0;
SPpowerPlantList powerPlantListCurrent = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampCurrent, noteCurrent);
SPpowerPlantList powerPlantListOld = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampOld, noteOld);
SPpowerPlantList powerPlantListDifferences = new SPpowerPlantList();
SPpowerPlantList powerPlantListDifferences2 = new SPpowerPlantList();
count = powerPlantListOld.Count - powerPlantListCurrent.Count;
var differentObjects = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.mwWeb == l.mwWeb)).ToList();
var differentObjects2 = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.shortName == l.shortName)).ToList();
foreach (var differentObject in differentObjects)
{
powerPlantListDifferences.Add(differentObject);
}
foreach (var differentObject in differentObjects2)
{
powerPlantListDifferences2.Add(differentObject);
}
return powerPlantListDifferences;
Is there a way to prevent this, or to run more queries and get back only one list with all the different objects?
I tried it with Except and Intersect but that didn't work.
So any help or advice would be great, and thanks for your time.
PS: If there is something wrong with my question style please tell me, because I am trying to learn to ask better questions.
You may be able to simply chain the properties that you wanted to compare within your Where() clause using OR statements :
// This should get you any elements that have different A properties, B properties, etc.
var different = current.Where(p => !old.Any(l => p.A == l.A || p.B == l.B))
.ToList();
If that doesn't work and you really want to use the Except() or Intersect() methods to properly compare the objects, you could write your own custom IEqualityComparer<YourPowerPlant> to use to properly compare them :
class PowerPlantComparer : IEqualityComparer<YourPowerPlant>
{
// Power plants are equal if the specific properties below are equal.
public bool Equals(YourPowerPlant x, YourPowerPlant y)
{
// Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
// Checks the other properties to compare (examples using mwWeb and shortName)
return x.mwWeb == y.mwWeb && x.shortName == y.shortName;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(YourPowerPlant powerPlant)
{
// Check whether the object is null
if (Object.ReferenceEquals(powerPlant, null)) return 0;
// Get hash code for the mwWeb field if it is not null.
int hashA = powerPlant.mwWeb == null ? 0 : powerPlant.mwWeb.GetHashCode();
// Get hash code for the shortName field if it is not null.
int hashB = powerPlant.shortName == null ? 0 : powerPlant.shortName.GetHashCode();
// Combine the two property hash codes with XOR.
return hashA ^ hashB;
}
}
and then you could likely use something like one of the following depending on your needs :
var different = current.Except(old,new PowerPlantComparer());
or :
var different = current.Intersect(old,new PowerPlantComparer());
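To make the "depending on your needs" part concrete: with such a comparer, Except returns the elements of the first list that have no equal element in the second (new or changed ones), while Intersect returns those that do (unchanged ones). The sketch below is self-contained; `Plant`, `PlantComparer` and `PlantDiff` are hypothetical stand-ins for the classes above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the power plant class in the question.
public class Plant
{
    public string MwWeb { get; set; }
    public string ShortName { get; set; }
}

// Two plants are equal when both compared properties match.
public class PlantComparer : IEqualityComparer<Plant>
{
    public bool Equals(Plant x, Plant y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.MwWeb == y.MwWeb && x.ShortName == y.ShortName;
    }

    public int GetHashCode(Plant p)
    {
        if (p is null) return 0;
        return (p.MwWeb ?? "").GetHashCode() ^ (p.ShortName ?? "").GetHashCode();
    }
}

public static class PlantDiff
{
    // Plants in `current` with no equal plant in `old` (new or changed).
    public static List<Plant> Changed(IEnumerable<Plant> current, IEnumerable<Plant> old)
        => current.Except(old, new PlantComparer()).ToList();

    // Plants in `current` that also exist, unchanged, in `old`.
    public static List<Plant> Unchanged(IEnumerable<Plant> current, IEnumerable<Plant> old)
        => current.Intersect(old, new PlantComparer()).ToList();
}
```

Both methods have set semantics, so duplicates within `current` are collapsed in the result.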
One way is to use IEqualityComparer as Rion Williams suggested. If you'd like a more flexible solution, you can split the logic into two parts. First create a helper method that accepts two lists and a function in which you define which properties you wish to compare. For example:
public static class Helper
{
public static SPpowerPlantList GetDifference(this SPpowerPlantList current, SPpowerPlantList old, Func<PowerPlant, PowerPlant, bool> func)
{
var diff = current.Where(p => old.All(l => func(p, l))).ToList();
var result = new SPpowerPlantList();
foreach (var item in diff) result.Add(item);
return result;
}
}
And use it :
public SPpowerPlantList compareTwoLists(string sqlServer, string database,
DateTime timestampCurrent, string noteCurrent,
DateTime timestampOld, string noteOld)
{
var powerPlantListCurrent = ...;
var powerPlantListOld = ...;
var diff = powerPlantListCurrent.GetDifference(
powerPlantListOld,
(x, y) => x.mwWeb != y.mwWeb ||
x.shortName != y.shortName);
return diff;
}
P.S. If it better suits your needs, you could move the method inside an existing class:
public class MyClass
{
public SPpowerPlantList GetDifference(SPpowerPlantList current, SPpowerPlantList old, Func<PowerPlant, PowerPlant, bool> func)
{
...
}
}
And call it (inside of class) :
var result = GetDifference(currentValues, oldValues, (x, y) => x.mwWeb != y.mwWeb);
The easiest way to do this would be to compare some unique identifier (ID)
var differentObjects = powerPlantListCurrent
.Where(p => !powerPlantListOld.Any(l => p.Id == l.Id))
.ToList();
If the other properties might have been updated and you want to check that too, you'll have to compare all of them to detect changes made to existing elements:
Implement a comparison method (IComparable, IEquatable, IEqualityComparer, or override Equals) or, if that's not possible because you didn't write the class yourself (generated code or an external assembly), write a method that compares two of those SPpowerPlantList elements and use that instead of comparing every single property in LINQ. For example:
public bool AreThoseTheSame(SPpowerPlantList a,SPpowerPlantList b)
{
if(a.mwWeb != b.mwWeb) return false;
if(a.shortName != b.shortName) return false;
//etc.
return true;
}
Then replace your difference call with this:
var differentObjects = powerPlantListCurrent
.Where(p => !powerPlantListOld.Any(l => AreThoseTheSame(p,l)))
.ToList();
Ok, this one has me stumped.
I have a collection of objects called Interviews. An Interview has a collection of Notes in it. A Note has string (nvarchar(max) on the database) property called NoteText.
I have a List called keywords.
What I need to do is find all interviews that have a Note that has any of the keywords within its NoteText property.
I have this so far:
var interviewQuery =
from i in dbContext.Interviews //dbContext was created with Telerik OpenAccess
.Include(n => n.Notes)
.Where(i => i.Notes.Any(n => keywords.Contains(n.NoteText) ))
orderby i.WhenCreated descending
select i;
I don't get an error, I just don't get any results either.
I'm pretty poor at linq, but this can be easily done with a loop instead.
var matchinginterviews = new List<Interview>();
foreach (var inter in MyInterviewEnumerable)
{
foreach (var note in inter.NoteCollection)
{
foreach (string keyword in keywordList)
{
if (note.NoteText.IndexOf(keyword) != -1)
{
matchinginterviews.Add(inter);
}
}
}
}
What's causing the empty results is that keywords.Contains(n.NoteText) looks for a keyword that equals the entire text of a note, rather than for a note whose text contains a keyword.
We made an extension method ContainsAny:
public static bool ContainsAny(this string s, IEnumerable<string> possibleContained)
{
foreach (string p in possibleContained)
{
if (s == p) return true;
if (s == null || p == null) continue; // string.Contains throws on a null argument
if (s.Contains(p)) return true;
}
return false;
}
Then you could do something similar to where you started:
var results = dbContext.Interviews.Where(i => i.Notes.Any(n => n.NoteText.ContainsAny(keywords)));
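The two meanings of "contains" in this thread can be put side by side. This is an in-memory sketch only, with the hypothetical names `KeywordMatch`, `NoteIsAKeyword` and `NoteMentionsKeyword`; whether a particular LINQ provider can translate either form to SQL is a separate question that it does not address.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordMatch
{
    // What the original query asked: is the whole note text equal to
    // one of the keywords?
    public static bool NoteIsAKeyword(string noteText, IEnumerable<string> keywords)
        => keywords.Contains(noteText);

    // What was intended: does the note text contain at least one keyword
    // as a substring? Empty keywords are skipped because every string
    // contains the empty string.
    public static bool NoteMentionsKeyword(string noteText, IEnumerable<string> keywords)
        => noteText != null
           && keywords.Any(k => !string.IsNullOrEmpty(k) && noteText.Contains(k));
}
```

Both checks are case-sensitive, since `string.Contains(string)` performs an ordinal comparison.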
I have an extension method which takes in two lists, compares them for modifications, and then outputs a new list. Here is the code:
public static List<Member> GetModifiedRecords(this List<Member> LocalMemberData, List<Member> RemoteMemberData)
{
var result = (from localdata in LocalMemberData
from remotedata in RemoteMemberData
where
((
localdata.Card != remotedata.Card ||
localdata.DateJoined != remotedata.DateJoined ||
localdata.DatePaidUpTo != remotedata.DatePaidUpTo ||
localdata.Forename != remotedata.Forename ||
localdata.Postcode != remotedata.Postcode ||
localdata.State != remotedata.State ||
localdata.Street != remotedata.Street ||
localdata.Surname != remotedata.Surname ||
localdata.Title != remotedata.Title ||
localdata.Town != remotedata.Town
)
&& (localdata.MemberNumber == remotedata.MemberNumber
))
select localdata).Distinct();
List<Member> modifiedMembers = new List<Member>(result);
return modifiedMembers;
}
What's strange is when running it fails on the line
List<Member> modifiedMembers = new List<Member>(result);
With the error
"The CLR has been unable to transition from COM context 0x3b4668 to COM context 0x3b44f8 for 60 seconds. The thread that owns the destination context/apartment is most likely either doing a non pumping wait or processing a very long running operation without pumping Windows messages. This situation generally has a negative performance impact and may even lead to the application becoming non responsive or memory usage accumulating continually over time. To avoid this problem, all single threaded apartment (STA) threads should use pumping wait primitives (such as CoWaitForMultipleHandles) and routinely pump messages during long running operations."
FYI both Lists that are being compared have over 100,000 records. Am I thinking about this wrong?
If I understand correctly, your LINQ query compares each element of one list with each element of the other list. That means 100,000 x 100,000 comparisons. That's 10 billion comparisons, which is a lot.
Consider a join, instead of filtering the cartesian product. For instance:
var r = new Random();
var list1 = Enumerable.Range(0,10000).OrderBy(_=>r.Next()).ToList();
var list2 = Enumerable.Range(0,10000).OrderBy(_=>r.Next()).ToList();
var sw = Stopwatch.StartNew();
var c1 = from x1 in list1 from x2 in list2 where x1==x2 select x1;
var j1 = c1.ToList();
sw.ElapsedMilliseconds.Dump();
sw=Stopwatch.StartNew();
var c2 = from x1 in list1 join x2 in list2 on x1 equals x2 select x1;
var j2 = c2.ToList();
sw.ElapsedMilliseconds.Dump();
Gives timings of
3584
1
i.e. joins are super fast (complexity O(n)), your pseudo-join isn't (complexity O(n²)).
At 100000 items, my machine choked (I quit after 5 minutes) on the cartesian product filtering approach, but completed the join in 21ms.
So, rewriting your query as follows should really turbo things up:
(from localdata in LocalMemberData
join remotedata in RemoteMemberData
on localdata.MemberNumber equals remotedata.MemberNumber
where
(
localdata.Card != remotedata.Card ||
localdata.DateJoined != remotedata.DateJoined ||
localdata.DatePaidUpTo != remotedata.DatePaidUpTo ||
localdata.Forename != remotedata.Forename ||
localdata.Postcode != remotedata.Postcode ||
localdata.State != remotedata.State ||
localdata.Street != remotedata.Street ||
localdata.Surname != remotedata.Surname ||
localdata.Title != remotedata.Title ||
localdata.Town != remotedata.Town
)
select localdata).Distinct()
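To see where the roughly linear behaviour of a join comes from, here is a sketch of the underlying idea: index one list by key once, then probe that index for each element of the other list. `Member` is a cut-down, hypothetical version of the class in the question with only two compared properties, and `MemberDiff.GetModified` is an invented name.

```csharp
using System;
using System.Collections.Generic;

// Cut-down, hypothetical version of the Member class from the question.
public class Member
{
    public int MemberNumber { get; set; }
    public string Surname { get; set; }
    public string Town { get; set; }
}

public static class MemberDiff
{
    // Returns local members that exist remotely under the same
    // MemberNumber but differ in at least one compared property.
    public static List<Member> GetModified(List<Member> local, List<Member> remote)
    {
        // One pass over `remote` builds the index...
        var remoteByNumber = new Dictionary<int, Member>();
        foreach (var remoteMember in remote)
        {
            // Assumption: MemberNumber is unique; on duplicates the last one wins.
            remoteByNumber[remoteMember.MemberNumber] = remoteMember;
        }

        // ...and one pass over `local` probes it.
        var modified = new List<Member>();
        foreach (var localMember in local)
        {
            if (remoteByNumber.TryGetValue(localMember.MemberNumber, out var match)
                && (localMember.Surname != match.Surname || localMember.Town != match.Town))
            {
                modified.Add(localMember);
            }
        }
        return modified;
    }
}
```

Each list is traversed once and every dictionary probe takes constant time on average, which is why the work grows with the sum of the list sizes rather than their product.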
I don't think the content of the error message is what you should be focusing on. I think you need to rethink your compare operation to streamline and improve the code from the standpoint of readability and execution.
Here is how I read your code: you have 2 sets, List A and List B. You want to return the items from List A that exist in List B but do not exactly match on every property.
That said, I have written an algorithm that uses a slightly simpler 'Member' model, but demonstrates the same functionality you are attempting.
public static List<Member> GetModifiedRecords(List<Member> localMemberData, List<Member> remoteMemberData )
{
var list = new List<Member>();
foreach (var item in localMemberData)
{
var remoteItems = remoteMemberData.Where(q => q.Id == item.Id);
if (remoteItems.Any())
{
var remoteItem = remoteItems.First();
if (item.CompareTo(remoteItem) != 0) list.Add(item);
}
}
return list;
}
public class Member : IComparable<Member>
{
public int Id { get; set; }
public string Card { get; set; }
public DateTime DateJoined { get; set; }
public string PostalCode { get; set; }
// TODO: add other properties
public int CompareTo(Member other)
{
if (this.Card != other.Card) return 1;
if (this.DateJoined != other.DateJoined) return 1;
if (this.PostalCode != other.PostalCode) return 1;
// TODO: add other properties
return 0;
}
}