I have run a profiler on my .NET winforms app (compiled with .NET 4.7.1) and it is pointing at the following function as consuming 73% of my application's CPU time, which seems like far too much for a simple utility function:
public static bool DoesRecordExist(string keyColumn1, string keyColumn2, string keyColumn3,
string keyValue1, string keyValue2, string keyValue3, DataTable dt)
{
if (dt != null && dt.Rows.Count > 0) {
bool exists = dt.AsEnumerable()
.Where(r =>
string.Equals(SafeTrim(r[keyColumn1]), keyValue1, StringComparison.CurrentCultureIgnoreCase) &&
string.Equals(SafeTrim(r[keyColumn2]), keyValue2, StringComparison.CurrentCultureIgnoreCase) &&
string.Equals(SafeTrim(r[keyColumn3]), keyValue3, StringComparison.CurrentCultureIgnoreCase)
)
.Any();
return exists;
} else {
return false;
}
}
The purpose of this function is to take some key column names and matching key values and check whether any matching record exists in the in-memory C# DataTable.
My app is processing hundreds of thousands of records and for each record, this function must be called multiple times. The app is doing a lot of inserts, and before any insert, it must check whether that record already exists in the database. I figured that an in-memory check against the DataTable would be much faster than going back to the physical database each time, so that's why I'm doing this in-memory check. Each time I do a database insert, I do a corresponding insert into the DataTable, so that subsequent checks as to whether the record exists will be accurate.
So to my question: Is there a faster approach? (I don't think I can avoid checking for record existence each and every time, else I'll end up with duplicate inserts and key violations.)
EDIT #1
In addition to trying the suggestions that have been coming in, which I'm trying now, it occurred to me that I should also maybe do the .AsEnumerable() only once and pass in the EnumerableRowCollection<DataRow> instead of the DataTable. Do you think this will help?
EDIT #2
I just did a controlled test and found that querying the database directly to see if a record already exists is dramatically slower than doing an in-memory lookup.
You should try parallel execution. This should be a very good case for it, since you mentioned you are working with a huge set, and no ordering is needed if you just want to check whether a record already exists.
bool exists = dt.AsEnumerable().AsParallel().Any(r =>
string.Equals(SafeTrim(r[keyColumn1]), keyValue1, StringComparison.CurrentCultureIgnoreCase) &&
string.Equals(SafeTrim(r[keyColumn2]), keyValue2, StringComparison.CurrentCultureIgnoreCase) &&
string.Equals(SafeTrim(r[keyColumn3]), keyValue3, StringComparison.CurrentCultureIgnoreCase)
);
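Your solution filters the rows with Where and then asks whether there is any result. Use Any with the condition directly instead: replace Where with Any. It stops processing as soon as the condition evaluates to true for the first time. (Where is lazily evaluated, so Where(...).Any() also stops at the first match; the direct form is mainly simpler and avoids one extra iterator.)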
bool exists = dt.AsEnumerable().Any(r => condition);
I suggest keeping the key columns of the existing records in a HashSet. I'm using tuples here, but you could also create your own Key struct or class by overriding GetHashCode and Equals.
private HashSet<(string, string, string)> _existingKeys =
new HashSet<(string, string, string)>();
Then you can test the existence of a key very quickly with
if (_existingKeys.Contains((keyValue1, keyValue2, keyValue3))) {
...
}
Don't forget to keep this HashSet in sync with your additions and deletions. Note that tuples cannot be compared with CurrentCultureIgnoreCase. Therefore either convert all the keys to lower case, or use the custom struct approach where you can use the desired comparison method.
public readonly struct Key
{
public Key(string key1, string key2, string key3) : this()
{
Key1 = key1?.Trim() ?? "";
Key2 = key2?.Trim() ?? "";
Key3 = key3?.Trim() ?? "";
}
public string Key1 { get; }
public string Key2 { get; }
public string Key3 { get; }
public override bool Equals(object obj)
{
if (!(obj is Key)) {
return false;
}
var key = (Key)obj;
return
String.Equals(Key1, key.Key1, StringComparison.CurrentCultureIgnoreCase) &&
String.Equals(Key2, key.Key2, StringComparison.CurrentCultureIgnoreCase) &&
String.Equals(Key3, key.Key3, StringComparison.CurrentCultureIgnoreCase);
}
public override int GetHashCode()
{
int hashCode = -2131266610;
unchecked {
hashCode = hashCode * -1521134295 + StringComparer.CurrentCultureIgnoreCase.GetHashCode(Key1);
hashCode = hashCode * -1521134295 + StringComparer.CurrentCultureIgnoreCase.GetHashCode(Key2);
hashCode = hashCode * -1521134295 + StringComparer.CurrentCultureIgnoreCase.GetHashCode(Key3);
}
return hashCode;
}
}
Another question is whether it is a good idea to use the current culture when comparing db keys. Users with different cultures might get different results. It is better to explicitly specify the same culture that the db uses.
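As a complement to the struct approach, here is a minimal sketch of keeping the tuples and handing the HashSet a custom comparer instead. The name KeyTupleComparer is hypothetical, and OrdinalIgnoreCase is only an assumption standing in for whatever comparison matches the database collation; null parts are hashed like empty strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: compares three-part keys with one fixed StringComparer.
public sealed class KeyTupleComparer : IEqualityComparer<(string, string, string)>
{
    private readonly StringComparer _cmp;

    public KeyTupleComparer(StringComparer cmp)
    {
        _cmp = cmp;
    }

    public bool Equals((string, string, string) x, (string, string, string) y)
    {
        return _cmp.Equals(x.Item1, y.Item1)
            && _cmp.Equals(x.Item2, y.Item2)
            && _cmp.Equals(x.Item3, y.Item3);
    }

    public int GetHashCode((string, string, string) key)
    {
        unchecked
        {
            // Null parts are hashed as empty strings (assumption for this sketch).
            int hash = 17;
            hash = hash * 23 + _cmp.GetHashCode(key.Item1 ?? string.Empty);
            hash = hash * 23 + _cmp.GetHashCode(key.Item2 ?? string.Empty);
            hash = hash * 23 + _cmp.GetHashCode(key.Item3 ?? string.Empty);
            return hash;
        }
    }
}
```

Because HashSet.Add returns false when the key is already present, the existence check and the bookkeeping for a new insert can be a single call: create the set with new KeyTupleComparer(StringComparer.OrdinalIgnoreCase) and only perform the database insert when Add returns true.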
It might be that you want to transpose your data structure. Instead of having a DataTable where each row has keyColumn1, keyColumn2 and keyColumn3, have 3 HashSet<string>, where the first contains all of the keyColumn1 values, etc.
Doing this should be a lot faster than iterating through each of the rows:
var hashSetColumn1 = new HashSet<string>(
dt.AsEnumerable().Select(x => SafeTrim(x[keyColumn1])),
StringComparer.CurrentCultureIgnoreCase);
var hashSetColumn2 = new HashSet<string>(
dt.AsEnumerable().Select(x => SafeTrim(x[keyColumn2])),
StringComparer.CurrentCultureIgnoreCase);
var hashSetColumn3 = new HashSet<string>(
dt.AsEnumerable().Select(x => SafeTrim(x[keyColumn3])),
StringComparer.CurrentCultureIgnoreCase);
Obviously, create these once, and then maintain them (as you're currently maintaining your DataTable). They're expensive to create, but cheap to query.
Then:
bool exists = hashSetColumn1.Contains(keyValue1) &&
hashSetColumn2.Contains(keyValue2) &&
hashSetColumn3.Contains(keyValue3);
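Note that three separate sets only tell you that each value occurs somewhere in its column, not that all three occur in the same row, so this check can report a match that does not exist. Alternatively (and more cleanly), you can define your own struct which contains values from the 3 columns, and use a single HashSet: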
public struct Row : IEquatable<Row>
{
// Convenience
private static readonly IEqualityComparer<string> comparer = StringComparer.CurrentCultureIgnoreCase;
public string Value1 { get; }
public string Value2 { get; }
public string Value3 { get; }
public Row(string value1, string value2, string value3)
{
Value1 = value1;
Value2 = value2;
Value3 = value3;
}
public override bool Equals(object obj) => obj is Row row && Equals(row);
public bool Equals(Row other)
{
return comparer.Equals(Value1, other.Value1) &&
comparer.Equals(Value2, other.Value2) &&
comparer.Equals(Value3, other.Value3);
}
public override int GetHashCode()
{
unchecked
{
int hash = 17;
hash = hash * 23 + comparer.GetHashCode(Value1);
hash = hash * 23 + comparer.GetHashCode(Value2);
hash = hash * 23 + comparer.GetHashCode(Value3);
return hash;
}
}
public static bool operator ==(Row left, Row right) => left.Equals(right);
public static bool operator !=(Row left, Row right) => !(left == right);
}
Then you can make a:
var hashSet = new HashSet<Row>(dt.AsEnumerable().Select(x => new Row(SafeTrim(x[keyColumn1]), SafeTrim(x[keyColumn2]), SafeTrim(x[keyColumn3]))));
And cache that. Query it like:
hashSet.Contains(new Row(keyValue1, keyValue2, keyValue3));
In some cases LINQ won't be optimized as well as a plain loop, so you might be better off writing the query the old-fashioned way:
public static bool DoesRecordExist(string keyColumn1, string keyColumn2, string keyColumn3,
string keyValue1, string keyValue2, string keyValue3, DataTable dt)
{
if (dt != null)
{
foreach (DataRow r in dt.Rows)
{
if (string.Equals(SafeTrim(r[keyColumn1]), keyValue1, StringComparison.CurrentCultureIgnoreCase) &&
string.Equals(SafeTrim(r[keyColumn2]), keyValue2, StringComparison.CurrentCultureIgnoreCase) &&
string.Equals(SafeTrim(r[keyColumn3]), keyValue3, StringComparison.CurrentCultureIgnoreCase))
{
return true;
}
}
}
return false;
}
There might be more structural improvements, but whether you can use them depends on your situation.
Option 1: Making the selection already in the database
You are using a DataTable, so there is a chance that you fetch the data from the database. If you have a lot of records, then it might make more sense to move this check to the database. With the proper indexes it might be way faster than an in-memory table scan.
Option 2: Replace string.Equals+SafeTrim with a custom method
You are using SafeTrim up to three times per row, which creates a lot of new strings. When you create your own method that compares both strings (string.Equals) with respect to leading/trailing whitespaces (SafeTrim), but without creating a new string then this could be way faster, reduce memory load and reduce garbage collection. If the implementation is good enough to inline, then you'll gain a lot of performance.
Option 3: Check the columns in the proper order
Make sure you use the proper order and specify the column that has the least probability to match as keyColumn1. This will make the if-statement result to false sooner. If keyColumn1 matches in 80% of the cases, then you need to perform a lot more comparisons.
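To make Option 2 concrete, here is a minimal sketch of a comparison that ignores leading and trailing whitespace without allocating trimmed copies. The names TrimmedComparer and TrimmedEquals are hypothetical. Two assumptions: null is treated like an empty string (the behaviour of SafeTrim is not shown in the question), and the comparison is OrdinalIgnoreCase rather than culture-sensitive, which allows the quick length check.

```csharp
using System;

public static class TrimmedComparer
{
    // Compares a and b as if both had been trimmed, without creating new strings.
    // Assumptions: null counts as empty, and OrdinalIgnoreCase is acceptable.
    public static bool TrimmedEquals(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;

        // Find the non-whitespace window [startA, endA) in a.
        int startA = 0, endA = a.Length;
        while (startA < endA && char.IsWhiteSpace(a[startA])) startA++;
        while (endA > startA && char.IsWhiteSpace(a[endA - 1])) endA--;

        // Find the non-whitespace window [startB, endB) in b.
        int startB = 0, endB = b.Length;
        while (startB < endB && char.IsWhiteSpace(b[startB])) startB++;
        while (endB > startB && char.IsWhiteSpace(b[endB - 1])) endB--;

        // With an ordinal comparison, equal strings must have equal length.
        int length = endA - startA;
        if (length != endB - startB) return false;

        return string.Compare(a, startA, b, startB, length,
                              StringComparison.OrdinalIgnoreCase) == 0;
    }
}
```

This still scans every row, so it only reduces the cost per comparison; the number of comparisons stays the same.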
I have a list of this model
public class Contact
{
public string MobileNumber {get;set;}
public string PhoneNumber2 {get;set;}
}
I have a method that compares this list against a list of phone numbers and returns the non-matching values
private List<ContactDto> GetNewContactsNotFoundInCrm(ContactPostModel model)
{
var duplicates = GetAllNumbers(); // Returns a List<string> of 5 million numbers
var mobile = duplicates.Select(x => x.MobilePhone).ToList();
var telephone2 = duplicates.Select(x => x.Telephone2).ToList();
// I'm trying to compare Telephone2 and MobilePhone against the
// duplicates list of 5 million numbers. It works, but it's slow
// and can take over a minute searching for around 5000 numbers.
return model.Contacts
.Where(y =>
!mobile.Contains(y.Phonenumber.ToPhoneNumber()) &&
!telephone2.Contains(y.Phonenumber.ToPhoneNumber()) &&
!mobile.Contains(y.Phonenumber2.ToPhoneNumber()) &&
!telephone2.Contains(y.Phonenumber2.ToPhoneNumber()))
.ToList();
}
// Extension method used
public static string ToPhoneNumber(this string phoneNumber)
{
if (phoneNumber == null || phoneNumber == string.Empty)
return string.Empty;
return phoneNumber.Replace("(", "").Replace(")", "")
.Replace(" ", "").Replace("-", "");
}
What data structure can I use to compare the Mobile and Telephone2 to the list of 5 million numbers for better performance?
Creating a HashSet will probably solve your problems:
var mobile = new HashSet<string>(duplicates.Select(x => x.MobilePhone));
var telephone2 = new HashSet<string>(duplicates.Select(x => x.Telephone2));
There are other performance improvements you can make, but they'll be micro-optimizations compared to avoiding iterating over 5 million items for each number you check.
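A minimal self-contained sketch of that idea, working on plain strings. PhoneLookup, Normalize and FindUnknown are hypothetical names, and Normalize only mirrors the ToPhoneNumber extension from the question. The known numbers are normalized once while the set is built, so each later check is a single hash lookup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhoneLookup
{
    // Same normalization idea as the ToPhoneNumber extension in the question.
    public static string Normalize(string phoneNumber)
    {
        if (string.IsNullOrEmpty(phoneNumber)) return string.Empty;
        return phoneNumber.Replace("(", "").Replace(")", "")
                          .Replace(" ", "").Replace("-", "");
    }

    // Returns the candidates whose normalized form is not among the known numbers.
    public static List<string> FindUnknown(IEnumerable<string> candidates,
                                           IEnumerable<string> knownNumbers)
    {
        // Built once: one pass over the known numbers.
        var known = new HashSet<string>(
            knownNumbers.Select(n => Normalize(n)), StringComparer.Ordinal);

        // Each Contains call is a hash lookup rather than a list scan.
        return candidates.Where(c => !known.Contains(Normalize(c))).ToList();
    }
}
```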
You can use Enumerable.Except.
Enumerable.Except uses HashSet internally to improve lookup performance. You want to use the overload which allows to pass in a custom IEqualityComparer as an argument.
Also note that ToList() is a LINQ query finalizer: it executes the LINQ expression immediately, which results in a complete iteration over the collection. LINQ queries are otherwise executed deferred, which can improve performance. Chained sub-queries (whether written as one expression or split up into separate statements) are merged into one single iteration using yield return internally:
Good performance:
// Instead of two iterations, LINQ will defer both
// iterations and merge them into a single iteration
var filtered = collection.Where(); // Deferred iteration #1
var projected = filtered.Select(); // Deferred iteration #2
var results = projected.ToList(); // Results in one single iteration
Bad performance:
// LINQ will execute each iteration immediately
// resulting in two complete iterations
var filtered = collection.Where().ToList(); // Executed iteration #1
var projected = filtered.Select().ToList(); // Executed iteration #2
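Avoid calling a finalizer before you actually want to execute the query, so that chained operators run in a single pass. (If the result is enumerated more than once, as with repeated Contains calls, materializing it once is the cheaper option.)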
// Executes deferred
var mobile = duplicates.Select(x => x.MobilePhone);
Instead of:
// Executes immediately
var mobile = duplicates.Select(x => x.MobilePhone).ToList();
Also note that each Enumerable.Contains executes a separate iteration. Contains is a finalizer and will execute immediately:
return model.Contacts
.Where(y =>
!mobile.Contains(y.Phonenumber.ToPhoneNumber()) // Iteration #1
&& !telephone2.Contains(y.Phonenumber.ToPhoneNumber()) // Iteration #2
&& !mobile.Contains(y.Phonenumber2.ToPhoneNumber()) // Iteration #3
&& !telephone2.Contains(y.Phonenumber2.ToPhoneNumber())) // Iteration #4
.ToList(); // Iteration #5
Worst case iterates over n elements * 4 Enumerable.Contains * 5*10^6 reference elements in mobile and telephone2 - only for comparison!
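One caveat about deferred execution is worth illustrating: a deferred query is re-executed every time it is enumerated. The sketch below (DeferredDemo and CountProjections are hypothetical names) counts how often the projection runs when the same sequence is searched twice, once deferred and once materialized with ToList().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Returns how many times the projection ran after two unsuccessful searches.
    public static int CountProjections(bool materialize)
    {
        int calls = 0;
        var source = new[] { "a", "b", "c" };

        IEnumerable<string> projected =
            source.Select(s => { calls++; return s.ToUpperInvariant(); });

        if (materialize)
        {
            projected = projected.ToList();   // the projection runs once, here
        }

        projected.Contains("Z");   // no match, so every element is visited
        projected.Contains("Z");   // deferred: the whole projection runs again
        return calls;
    }
}
```

With three source elements the deferred version runs the projection six times and the materialized version three times. For a lookup that is repeated thousands of times, materializing once (ideally into a HashSet) avoids repeating that work.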
Enumerable.Except
ContactEqualityComparer.cs
class ContactEqualityComparer : IEqualityComparer<Contact>
{
public bool Equals(Contact contact1, Contact contact2)
{
if (ReferenceEquals(contact1, contact2))
return true;
else if (ReferenceEquals(contact1, null) || ReferenceEquals(contact2, null))
return false;
else if (string.Equals(contact1.MobileNumber, contact2.MobileNumber, StringComparison.OrdinalIgnoreCase)
&& string.Equals(contact1.PhoneNumber2, contact2.PhoneNumber2, StringComparison.OrdinalIgnoreCase))
return true;
else
return false;
}
// Will be used by Enumerable.Except to generate item keys
// for the lookup table
public int GetHashCode(Contact contact)
{
unchecked
{
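// Hash with the same case-insensitive rule as Equals, so equal contacts get equal hash codes
return ((contact.MobileNumber != null
? StringComparer.OrdinalIgnoreCase.GetHashCode(contact.MobileNumber)
: 0) * 397) ^ (contact.PhoneNumber2 != null
? StringComparer.OrdinalIgnoreCase.GetHashCode(contact.PhoneNumber2)
: 0);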
}
}
}
Contact.cs
Consider using two properties for each piece of data: one for computations and one for display, e.g. MobileNumber and MobileNumberDisplay. The computation properties should be numeric.
public class Contact
{
private string mobileNumber;
public string MobileNumber
{
get => this.mobileNumber;
set => this.mobileNumber = ToPhoneNumber(value);
}
private string phoneNumber2;
public string PhoneNumber2
{
get => this.phoneNumber2;
set => this.phoneNumber2 = ToPhoneNumber(value);
}
public string ToPhoneNumber(string phoneNumber)
{
if (phoneNumber == null || phoneNumber == string.Empty)
return string.Empty;
return phoneNumber.Replace("(", "").Replace(")", "")
.Replace(" ", "").Replace("-", "");
}
}
Example
private List<Contact> GetNewContactsNotFoundInCrm(ContactPostModel model)
{
List<Contact> duplicates = GetAllNumbers();
return model.Contacts
.Except(duplicates, new ContactEqualityComparer())
.ToList();
}
One good option is to remove the need for the phone number conversion (the call to ToPhoneNumber) at each iteration step, by storing the numbers you check and the mobile and telephone2 numbers in the same format so that they can be compared directly.
Another improvement is to cache the mobile and telephone2 collections. You can move their calculation outside of the GetNewContactsNotFoundInCrm method and rebuild them only when the data changes.
Finally, consider using a HashSet to eliminate duplicates and make lookups fast.
Side note:
If you are dealing with the database elements, consider moving this logic to SQL Stored Procedure.
I have a list that sometimes contains duplicate rows, and I want to remove the duplicates. For that I used the code below:
num = numDetailsTemp.Distinct().ToList();
var query = num.GroupBy(o => new { o.Number })
.Select(group =>
new
{
Name = group.Key,
Numbers = group.OrderByDescending(x => x.Date)
})
.OrderBy(group => group.Numbers.First().Date);
List<NumberDetails> numTemp = new List<NumberDetails>();
foreach (var group in query)
{
foreach (var numb in group.Numbers)
{
numTemp.Add(numb);
break;
}
}
num = numTemp;
The below image shows the duplicate value from the list.
And when I apply remove duplicate it give me an output
But I want to remove the row that does not contain AlterNo, ID Proof and Date. As shown in the first image, the first row does not contain AlterNo, ID Proof and Date and the second row does, so I want to remove the first row and display only the second row. The date must be checked first, and after that AlterNo and ID Proof.
You can try the following:
var group =
list
.GroupBy(r => r.Number)
.SelectMany(g => g) //flatten your grouping and filter where you have alterno and id
.Where(r => !string.IsNullOrEmpty(r.AlterNo) && !string.IsNullOrEmpty(r.Id))
.OrderByDescending(r=>r.Date)
.ToList();
You may eliminate duplicates using Distinct operator. First you need to define a comparer class which implements IEqualityComparer interface, and then pass it to the distinct operator in your method.
internal class NumberDetailsComparer : IEqualityComparer<NumberDetails>
{
public bool Equals(NumberDetails x, NumberDetails y)
{
if (/* Set of conditions for equality matching */)
{
return true;
}
return false;
}
public int GetHashCode(NumberDetails obj)
{
return obj.Name.GetHashCode(); // Name or whatever unique property
}
}
And here is how to use it:
var distinctRecords = source.Distinct(new NumberDetailsComparer());
All you need to do is define the criteria for comparer class.
Hope this solves your problem.
This link could be useful for a fully working example:
http://dotnetpattern.com/linq-distinct-operator
So you have a sequence of NumberDetails, and a definition of when you would consider two NumberDetails equal.
Once you have found which NumberDetails are equal, you want to eliminate the duplicates, except one: a duplicate that has values for AlterNo and IdProof.
Alas you didn't specify what you want if there are no duplicates with values for AlterNo and IdProof. Nor what you want if there are several duplicates with values for AlterNo and IdProof.
But let's assume that if there are several of these items, you don't care: just pick one, because they are duplicates anyway.
In your requirement you speak about duplicates. So let's write a class that implements your requirements of equality:
class NumberDetailEqualityComparer : IEqualityComparer<NumberDetail>
{
public static IEqualityComparer<NumberDetail> Default {get;} = new NumberDetailEqualityComparer();
public bool Equals(NumberDetail x, NumberDetail y)
{
if (x == null) return y == null; // true if both null
if (y == null) return false; // because x not null and y null
if (Object.ReferenceEquals(x, y)) return true; // because same object
if (x.GetType() != y.GetType()) return false; // because not same type
// by now we are out of quick checks, we need a value check
return x.Number == y.Number
&& x.FullName == y.FullName
&& ...
// etc, such that this returns true if according your definition
// x and y are equal
}
You also need to implement GetHashCode. You can return anything you want, as long as you
are certain that if x and y are equal, they return the same hash code.
Furthermore, it is more efficient if, when x and y are not equal,
there is a high probability of different hash codes.
Something like:
public int GetHashCode(NumberDetail numberDetail)
{
const int prime1 = 12654365;
const int prime2 = 54655549;
if (numberDetail == null) return prime1;
int hash = prime1;
unchecked
{
hash = prime2 * hash + numberDetail.Number.GetHashCode();
hash = prime2 * hash + numberDetail.FullName.GetHashCode();
hash = prime2 * hash + numberDetail.Date.GetHashCode();
...
}
return hash;
}
Of course you have to check if any of the properties equal NULL before asking the HashCode.
Obviously in your equality (and thus in GetHashCode) you don't look at AlterNo nor IdProof.
Once that you've defined precisely when you consider two NumberDetails equal, you can make groups of equal NumberDetails
var groupsEqualNumberDetails = numberDetails.GroupBy(
// keySelector: make groups with equal NumberDetails:
numberDetail => numberDetail,
// ResultSelector: take the key and all NumberDetails that equal this key
// and keep the first one that has values for AlterNo and IdProof
(key, numberDetailsEqualToKey) => numberDetailsEqualToKey
.Where(numberDetail => numberDetail.AlterNo != null
&& numberDetail.IdProof != null)
.FirstOrDefault(),
// KeyComparer: when do you consider two NumberDetails equal?
NumberDetailEqualityComparer.Default);
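If duplicates are identified by Number alone, the selection rule from the question (a date is compulsory, then AlterNo, then ID Proof) can also be expressed as an ordering inside each group. This is only a sketch under assumptions: the NumberDetails shape below, including property names and types, is hypothetical, and KeepBestPerNumber is an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of the list items; names and types are assumptions.
public class NumberDetails
{
    public string Number { get; set; }
    public string AlterNo { get; set; }
    public string IdProof { get; set; }
    public DateTime? Date { get; set; }
}

public static class Deduplicator
{
    // Keeps one row per Number, preferring rows that have a date,
    // then an AlterNo, then an IdProof, then the most recent date.
    public static List<NumberDetails> KeepBestPerNumber(IEnumerable<NumberDetails> items)
    {
        return items
            .GroupBy(d => d.Number)
            .Select(g => g
                .OrderByDescending(d => d.Date.HasValue)
                .ThenByDescending(d => !string.IsNullOrEmpty(d.AlterNo))
                .ThenByDescending(d => !string.IsNullOrEmpty(d.IdProof))
                .ThenByDescending(d => d.Date)
                .First())
            .ToList();
    }
}
```

Unlike the Where/FirstOrDefault version, a group in which no row has AlterNo and IdProof still yields its best available row instead of null.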
Hello, I have a method that compares the objects of two lists for differences. Right now this works, but only for one property at a time.
Here is the Method:
public SPpowerPlantList compareTwoLists(string sqlServer, string database, DateTime timestampCurrent, string noteCurrent, DateTime timestampOld, string noteOld)
{
int count = 0;
SPpowerPlantList powerPlantListCurrent = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampCurrent, noteCurrent);
SPpowerPlantList powerPlantListOld = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampOld, noteOld);
SPpowerPlantList powerPlantListDifferences = new SPpowerPlantList();
count = powerPlantListOld.Count - powerPlantListCurrent.Count;
var differentObjects = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.mwWeb == l.mwWeb)).ToList();
foreach (var differentObject in differentObjects)
{
powerPlantListDifferences.Add(differentObject);
}
return powerPlantListDifferences;
}
This works and I get 4 objects in the new list. The problem is that I have a few other properties that I need to compare, for example name instead of mwWeb. When I try to change it, I need a new list and a new foreach loop for every new property.
e.g.
int count = 0;
SPpowerPlantList powerPlantListCurrent = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampCurrent, noteCurrent);
SPpowerPlantList powerPlantListOld = loadProjectsAndComponentsFromSqlServer(sqlServer, database, timestampOld, noteOld);
SPpowerPlantList powerPlantListDifferences = new SPpowerPlantList();
SPpowerPlantList powerPlantListDifferences2 = new SPpowerPlantList();
count = powerPlantListOld.Count - powerPlantListCurrent.Count;
var differentObjects = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.mwWeb == l.mwWeb)).ToList();
var differentObjects2 = powerPlantListCurrent.Where(p => !powerPlantListOld.Any(l => p.shortName == l.shortName)).ToList();
foreach (var differentObject in differentObjects)
{
powerPlantListDifferences.Add(differentObject);
}
foreach (var differentObject in differentObjects2)
{
powerPlantListDifferences2.Add(differentObject);
}
return powerPlantListDifferences;
Is there a way to prevent this, or to run several queries and get back only one list with all the different objects?
I tried it with Except and Intersect, but that didn't work.
Any help or advice would be great, and thanks for your time.
PS: If there is something wrong with my question style, please tell me, because I am trying to learn to ask better questions.
You may be able to simply chain the properties that you wanted to compare within your Where() clause using OR statements :
// Elements of current for which no element of old matches on A or on B.
var different = current.Where(p => !old.Any(l => p.A == l.A || p.B == l.B))
.ToList();
If that doesn't work and you really want to use the Except() or Intersect() methods to properly compare the objects, you could write your own custom IEqualityComparer<YourPowerPlant> to use to properly compare them :
class PowerPlantComparer : IEqualityComparer<YourPowerPlant>
{
// Power plants are equal if specific properties are equal.
public bool Equals(YourPowerPlant x, YourPowerPlant y)
{
// Check whether the compared objects reference the same data.
if (Object.ReferenceEquals(x, y)) return true;
//Check whether any of the compared objects is null.
if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null))
return false;
// Checks the other properties to compare (examples using mwWeb and shortName)
return x.mwWeb == y.mwWeb && x.shortName == y.shortName;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(YourPowerPlant powerPlant)
{
// Check whether the object is null
if (Object.ReferenceEquals(powerPlant, null)) return 0;
// Get hash code for the mwWeb field if it is not null.
int hashA = powerPlant.mwWeb == null ? 0 : powerPlant.mwWeb.GetHashCode();
// Get hash code for the shortName field if it is not null.
int hashB = powerPlant.shortName == null ? 0 : powerPlant.shortName.GetHashCode();
// Calculate the hash code for the product.
return hashA ^ hashB;
}
}
and then you could likely use something like one of the following depending on your needs :
var different = current.Except(old,new PowerPlantComparer());
or :
var different = current.Intersect(old,new PowerPlantComparer());
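For larger lists, a possible alternative to writing a comparer class is to collect the keys of the old list in a HashSet and filter the current list against it. The sketch below is generic so it makes no assumptions about the power plant type; ListDiff and NotIn are hypothetical names. A value tuple works as a composite key because its equality compares its components.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListDiff
{
    // Returns the items of current whose key does not occur in old.
    // Builds the key set once, then does one hash lookup per current item.
    public static List<T> NotIn<T, TKey>(IEnumerable<T> current,
                                         IEnumerable<T> old,
                                         Func<T, TKey> keySelector)
    {
        var oldKeys = new HashSet<TKey>(old.Select(keySelector));
        return current.Where(item => !oldKeys.Contains(keySelector(item))).ToList();
    }
}
```

Assuming the property names from the question, a call could look like ListDiff.NotIn(powerPlantListCurrent, powerPlantListOld, p => (p.mwWeb, p.shortName)), and adding another property to the comparison only means adding it to the tuple.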
One way is to use IEqualityComparer as Rion Williams suggested. If you'd like a more flexible solution, you can split the logic into two parts. First create a helper method that accepts two lists and a function in which you define what properties you wish to compare. For example:
public static class Helper
{
public static SPpowerPlantList GetDifference(this SPpowerPlantList current, SPpowerPlantList old, Func<PowerPlant, PowerPlant, bool> func)
{
var diff = current.Where(p => old.All(l => func(p, l))).ToList();
var result = new SPpowerPlantList();
foreach (var item in diff) result.Add(item);
return result;
}
}
And use it :
public SPpowerPlantList compareTwoLists(string sqlServer, string database,
DateTime timestampCurrent, string noteCurrent,
DateTime timestampOld, string noteOld)
{
var powerPlantListCurrent = ...;
var powerPlantListOld = ...;
var diff = powerPlantListCurrent.GetDifference(
powerPlantListOld,
(x, y) => x.mwWeb != y.mwWeb ||
x.shortName != y.shortName);
return diff;
}
P.S. if it better suits your needs, you could move method inside of existing class :
public class MyClass
{
public SPpowerPlantList GetDifference(SPpowerPlantList current, SPpowerPlantList old, Func<PowerPlant, PowerPlant, bool> func)
{
...
}
}
And call it (inside of class) :
var result = GetDifference(currentValues, oldValues, (x, y) => x.mwWeb != y.mwWeb);
The easiest way to do this would be to compare some unique identifier (ID)
var differentObjects = powerPlantListCurrent
.Where(p => !powerPlantListOld.Any(l => p.Id == l.Id))
.ToList();
If the other properties might have been updated and you want to check that too, you'll have to compare all of them to detect changes made to existing elements:
Implement a comparison method (IComparable, IEquatable, IEqualityComparer, or override Equals) or, if that's not possible because you didn't write the class yourself (code generated or external assembly), write a method to compare two of those SPpowerPlantList elements and use that instead of comparing every single property in LINQ. For example:
public bool AreThoseTheSame(SPpowerPlantList a,SPpowerPlantList b)
{
if(a.mwWeb != b.mwWeb) return false;
if(a.shortName != b.shortName) return false;
//etc.
return true;
}
Then replace your difference call with this:
var differentObjects = powerPlantListCurrent
.Where(p => !powerPlantListOld.Any(l => AreThoseTheSame(p,l)))
.ToList();
I have some massive searches happening for my AutoComplete and was wondering if someone could give any ideas to improve the performance.
What happens:
1) At application launch I am saving all database entries on the memory.
2) User types in the search box to initiate AutoComplete:
$("#MatterCode").width(110).kendoAutoComplete({
minLength: 3,
delay: 10,
dataTextField: "MatterCode",
template: '<div class="autoCompleteResultsCode"> ${ data.ClientCode } - ${ data.MatterCode } - ${ data.ClientName } - ${ data.MatterName }</div>',
dataSource: {
serverFiltering: true,
transport: {
read: "/api/matter/AutoCompleteByCode",
parameterMap: function() {
var matterCode = $("#MatterCode").val();
return { searchText: matterCode };
}
}
}, //More Stuff here
3) It goes to my controller class:
public JsonResult AutoCompleteByCode(string searchText)
{
if (string.IsNullOrEmpty(searchText))
{
Response.StatusCode = 500;
return Json(new
{
Error = "search string can't be empty"
});
}
var results = _publishedData.GetMattersForAutoCompleteByCode(searchText).Select(
matter => new
{
MatterCode = matter.Code,
MatterName = matter.Name,
ClientCode = matter.Client.Code,
ClientName = matter.Client.Name
});
return Json(results);
}
4) Which goes into the DAL (objects starting with '_' are Memory Objects)
public virtual IEnumerable<Matter> GetMattersForAutoCompleteByCode(string input)
{
InvalidateCache();
IEnumerable<Matter> results;
//Searching Matter Object on all 4 given parameters by input.
if (_lastMatters != null && input.StartsWith(_lastSearch) && _lastMatters.Any())
{
results = _lastMatters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
_lastMatters = results;
}
else
{
results = _matters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
_lastMatters = results;
}
_lastSearch = input;
return results.Take(10).ToList();
}
5) IsInputLike is an internal bool method
internal bool IsInputLike(string input)
{
//Check to see if the input statement exists in any of the 4 fields
bool check = (Code.ToLower().Contains(input.Trim().ToLower())
|| Name.ToLower().Contains(input.Trim().ToLower())
|| ClientCode.ToLower().Contains(input.Trim().ToLower())
|| ClientName.ToLower().Contains(input.Trim().ToLower()));
return check;
}
Now the result set that I have to work with can range over 100,000. Now the first Autocomplete of any new query has to search through 400,000 records and I can't think of a way to improve the performance without sacrificing the feature.
Any ideas?
Are SQL stored proc calls faster than LINQ?
I think the main issue here is you placing the 400k objects in memory to start with.
SQL is not all that slow, it's better to start with a limited set of data in the first place.
One obvious optimisation is:
internal bool IsInputLike(string input)
{
input = input.Trim().ToLower(); // normalize once instead of once per field
//Check to see if the input statement exists in any of the 4 fields
bool check = (Code.ToLower().Contains(input)
|| Name.ToLower().Contains(input)
|| ClientCode.ToLower().Contains(input)
|| ClientName.ToLower().Contains(input));
return check;
}
but personally, I would keep the data where it belongs, in the SQL server (if that's what you are using).
Some indexing and the proper queries could make this faster.
When I see this code I start wondering:
public virtual IEnumerable<Matter> GetMattersForAutoCompleteByCode(string input)
{
InvalidateCache();
IEnumerable<Matter> results;
//Searching Matter Object on all 4 given parameters by input.
if (_lastMatters != null && input.StartsWith(_lastSearch) && _lastMatters.Any())
{
results = _lastMatters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
_lastMatters = results;
}
else
{
results = _matters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
_lastMatters = results;
}
_lastSearch = input;
return results.Take(10).ToList();
}
Why do you need to order? Why does a dropdown autocomplete need to filter on 4 items? If you only take 10 anyway, can't you just not order? See if removing the OrderBy gives you any better results, especially in the else branch, where you'll have many results.
Personally, I'd go all in for LINQ to SQL and let the SQL server do the searching. Optimize the indexing on this table and it'll be much faster.
I'm not much of an asp/http guy but when I see this:
internal bool IsInputLike(string input)
{
//Check to see if the input statement exists in any of the 4 fields
bool check = (Code.ToLower().Contains(input.Trim().ToLower())
|| Name.ToLower().Contains(input.Trim().ToLower())
|| ClientCode.ToLower().Contains(input.Trim().ToLower())
|| ClientName.ToLower().Contains(input.Trim().ToLower()));
return check;
}
I think you are creating a lot of new strings, and that has to take some time. Try this and see if it improves your performance:
var inp = input.Trim();
bool chk = (Code.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
|| (Name.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
|| (ClientCode.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
|| (ClientName.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1);
The first line (which creates inp) is mostly there because I think it reads better, although it also means Trim is called once instead of once per field.
The IndexOf method will not create new strings and with the StringComparison parameter you can avoid creating all the ToLower strings.
I recommend creating a view that concatenates all of the names (Code, Name, ClientCode, ClientName) into a single column, say FullName, and replacing your IsInputLike(..) as below:
internal bool IsInputLike(string input)
{
//Check to see if the input statement exists in any of the 4 fields
return FullName.Contains(input);
}
I am trying to use Linq2Sql to return all rows that contain values from a list of strings. The linq2sql class object has a string property that contains words separated by spaces.
public class MyObject
{
public string MyProperty { get; set; }
}
Example MyProperty values are:
MyObject1.MyProperty = "text1 text2 text3 text4"
MyObject2.MyProperty = "text2"
For example, using a string collection, I pass the below list
var list = new List<string>() { "text2", "text4" };
This would return both items in my example above, as they both contain the "text2" value.
I attempted the following code; however, because of my extension method, the query cannot be evaluated by Linq2Sql.
public static IQueryable<MyObject> WithProperty(this IQueryable<MyProperty> qry,
IList<string> p)
{
return from t in qry
where t.MyProperty.Contains(p, ' ')
select t;
}
I also wrote an extension method
public static bool Contains(this string str, IList<string> list, char seperator)
{
if (str == null) return false;
if (list == null) return true;
var splitStr = str.Split(new char[] { seperator },
StringSplitOptions.RemoveEmptyEntries);
bool retval = false;
int matches = 0;
foreach (string s in splitStr)
{
foreach (string l in list)
{
if (String.Compare(s, l, true) == 0)
{
retval = true;
matches++;
}
}
}
return retval && (splitStr.Length > 0) && (list.Count == matches);
}
Any help or ideas on how I could achieve this?
You're on the right track. The first parameter of your extension method WithProperty has to be of type IQueryable<MyObject>, not IQueryable<MyProperty>.
Anyway, you don't need an extension method on IQueryable. Just use your Contains method in a lambda for filtering. This should work:
List<string> searchStrs = new List<string>() { "text2", "text4" };
IEnumerable<MyObject> myFilteredObjects = dataContext.MyObjects
.Where(myObj => myObj.MyProperty.Contains(searchStrs, ' '));
Update:
The above code snippet does not work, because the Contains method cannot be converted into a SQL statement. I thought about the problem for a while and arrived at a solution by asking "how would I do that in SQL?": you could query for each single keyword and union all the results together. Sadly, I could not get Linq-to-SQL's deferred execution to do all of that in one query, so I came up with this compromise of a compromise. It runs one query per keyword. In a matching row, the keyword can be one of the following:
equal to the string
in between two separators
at the start of the string and followed by a separator
or at the end of the string and preceded by a separator
This forms a valid expression tree that Linq-to-SQL can translate into SQL. After each query I do not defer execution; I fetch the data immediately and store it in a list. All the lists are unioned afterwards.
public static IEnumerable<MyObject> ContainsOneOfTheseKeywords(
this IQueryable<MyObject> qry, List<string> keywords, char sep)
{
List<List<MyObject>> parts = new List<List<MyObject>>();
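foreach (string keyw in keywords)
{
// Build the patterns up front so the query only sees plain string values.
string inner = sep + keyw + sep;
string atStart = keyw + sep;
string atEnd = sep + keyw;
parts.Add((
from obj in qry
where obj.MyProperty == keyw ||           // equal to the string
obj.MyProperty.Contains(inner) ||         // in between two separators
obj.MyProperty.StartsWith(atStart) ||     // at the start, followed by a separator
obj.MyProperty.EndsWith(atEnd)            // at the end, preceded by a separator
select obj).ToList());
}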
// Merge the per-keyword results; Distinct drops rows matched by several keywords.
// An empty keyword list now yields an empty result instead of a null reference.
return parts.SelectMany(part => part).Distinct().ToList();
}
And use it:
List<string> searchStrs = new List<string>() { "text2", "text4" };
IEnumerable<MyObject> myFilteredObjects = dataContext.MyObjects
.ContainsOneOfTheseKeywords(searchStrs, ' ');
That solution is anything but elegant. For 10 keywords, I have to query the database 10 times and each time fetch the data and store it in memory. This wastes memory and performs badly. I just wanted to demonstrate that it is possible in Linq (maybe it can be optimized here or there, but I don't think it will ever be perfect).
I would strongly recommend moving the logic of that function into a stored procedure on your database server: one single query, optimized by the database server, and no wasted memory.
Another alternative would be to rethink your database design. If you want to query the contents of one field (you are treating this field like an array of keywords, separated by spaces), you may simply have chosen an inappropriate database design. You would rather create a new table with a foreign key to your table, where each row of the new table holds exactly one keyword. The queries would be much simpler, faster and more understandable.
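As a rough sketch of what the separate keyword table could look like from the C# side: the MyObjectKeyword class, its property names and the helper methods below are all hypothetical, and the example runs against in-memory data rather than a real DataContext. The filter uses the list.Contains(column) form, which is the local-collection pattern Linq-to-SQL is generally able to translate (into an IN clause), but that translation is not exercised here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row of the separate keyword table: one keyword per row,
// with MyObjectId acting as the foreign key to the parent table.
public class MyObjectKeyword
{
    public int MyObjectId { get; set; }
    public string Keyword { get; set; }
}

public static class KeywordTableDemo
{
    // Ids of parent rows that have at least one of the keywords.
    public static List<int> IdsWithAnyKeyword(IEnumerable<MyObjectKeyword> rows, IList<string> keywords)
    {
        return rows
            .Where(k => keywords.Contains(k.Keyword))
            .Select(k => k.MyObjectId)
            .Distinct()
            .ToList();
    }

    // Ids of parent rows that have every one of the keywords.
    public static List<int> IdsWithAllKeywords(IEnumerable<MyObjectKeyword> rows, IList<string> keywords)
    {
        List<string> wanted = keywords.Distinct().ToList();
        return rows
            .Where(k => wanted.Contains(k.Keyword))
            .GroupBy(k => k.MyObjectId)
            .Where(g => g.Select(k => k.Keyword).Distinct().Count() == wanted.Count)
            .Select(g => g.Key)
            .ToList();
    }

    // Sample data mirroring the question: object 1 has text1..text4, object 2 has text2.
    public static List<MyObjectKeyword> SampleRows()
    {
        return new List<MyObjectKeyword>
        {
            new MyObjectKeyword { MyObjectId = 1, Keyword = "text1" },
            new MyObjectKeyword { MyObjectId = 1, Keyword = "text2" },
            new MyObjectKeyword { MyObjectId = 1, Keyword = "text3" },
            new MyObjectKeyword { MyObjectId = 1, Keyword = "text4" },
            new MyObjectKeyword { MyObjectId = 2, Keyword = "text2" },
        };
    }

    public static void Main()
    {
        var keywords = new List<string> { "text2", "text4" };
        Console.WriteLine(string.Join(",", IdsWithAnyKeyword(SampleRows(), keywords)));  // 1,2
        Console.WriteLine(string.Join(",", IdsWithAllKeywords(SampleRows(), keywords))); // 1
    }
}
```

Because each keyword is its own row, matching is an exact comparison on an indexable column, with no separator handling.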
I haven't tried, but if I remember correctly, this should work:
from t in ctx.Table
where list.Any(x => t.MyProperty.Contains(x))
select t
You can replace Any() with All() if you want all strings in the list to match.
EDIT:
To clarify what I was trying to do with this, here is a similar query written without linq, to explain the use of All and Any
where list.Any(x => t.MyProperty.Contains(x))
Translates to:
where t.MyProperty.Contains(list[0]) || t.MyProperty.Contains(list[1]) ||
t.MyProperty.Contains(list[n])
And
where list.All(x => t.MyProperty.Contains(x))
Translates to:
where t.MyProperty.Contains(list[0]) && t.MyProperty.Contains(list[1]) &&
t.MyProperty.Contains(list[n])
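The difference between Any and All can be checked in memory with the sample values from the question. This is only a LINQ to Objects sketch (the AnyAllDemo class and its methods are hypothetical), so it says nothing about whether a given provider can translate the same expression to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyAllDemo
{
    // Sample data mirroring the MyProperty values from the question.
    public static readonly string[] Rows = { "text1 text2 text3 text4", "text2" };

    // Keeps rows containing at least one of the search strings (logical OR).
    public static List<string> MatchAny(IEnumerable<string> rows, IList<string> list)
    {
        return rows.Where(r => list.Any(x => r.Contains(x))).ToList();
    }

    // Keeps rows containing every search string (logical AND).
    public static List<string> MatchAll(IEnumerable<string> rows, IList<string> list)
    {
        return rows.Where(r => list.All(x => r.Contains(x))).ToList();
    }

    public static void Main()
    {
        var list = new List<string> { "text2", "text4" };
        Console.WriteLine(MatchAny(Rows, list).Count); // 2: both rows contain "text2"
        Console.WriteLine(MatchAll(Rows, list).Count); // 1: only the first row contains both
    }
}
```

Note that string.Contains is a plain substring test, so a value such as "text22" would also count as containing "text2".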