Improve the performance of an AutoComplete LINQ Query - c#

I have some massive searches happening for my AutoComplete and was wondering if someone could give any ideas to improve the performance.
What happens:
1) At application launch I load all database entries into memory.
2) User types in the search box to initiate AutoComplete:
$("#MatterCode").width(110).kendoAutoComplete({
minLength: 3,
delay: 10,
dataTextField: "MatterCode",
template: '<div class="autoCompleteResultsCode"> ${ data.ClientCode } - ${ data.MatterCode } - ${ data.ClientName } - ${ data.MatterName }</div>',
dataSource: {
serverFiltering: true,
transport: {
read: "/api/matter/AutoCompleteByCode",
parameterMap: function() {
var matterCode = $("#MatterCode").val();
return { searchText: matterCode };
}
}
}, //More Stuff here
3) It goes to my controller class:
public JsonResult AutoCompleteByCode(string searchText)
{
    if (string.IsNullOrEmpty(searchText))
    {
        Response.StatusCode = 500;
        return Json(new
        {
            Error = "search string can't be empty"
        });
    }
    var results = _publishedData.GetMattersForAutoCompleteByCode(searchText).Select(
        matter => new
        {
            MatterCode = matter.Code,
            MatterName = matter.Name,
            ClientCode = matter.Client.Code,
            ClientName = matter.Client.Name
        });
    return Json(results);
}
4) Which goes into the DAL (objects starting with '_' are in-memory objects):
public virtual IEnumerable<Matter> GetMattersForAutoCompleteByCode(string input)
{
    InvalidateCache();
    IEnumerable<Matter> results;
    //Searching Matter Object on all 4 given parameters by input.
    if (_lastMatters != null && input.StartsWith(_lastSearch) && _lastMatters.Any())
    {
        results = _lastMatters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
        _lastMatters = results;
    }
    else
    {
        results = _matters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
        _lastMatters = results;
    }
    _lastSearch = input;
    return results.Take(10).ToList();
}
5) IsInputLike is an internal bool method:
internal bool IsInputLike(string input)
{
    //Check to see if the input statement exists in any of the 4 fields
    bool check = (Code.ToLower().Contains(input.Trim().ToLower())
        || Name.ToLower().Contains(input.Trim().ToLower())
        || ClientCode.ToLower().Contains(input.Trim().ToLower())
        || ClientName.ToLower().Contains(input.Trim().ToLower()));
    return check;
}
The result set I have to work with can exceed 100,000 rows, and the first autocomplete pass of any new query has to search through 400,000 records. I can't think of a way to improve the performance without sacrificing the feature.
Any ideas?
Are SQL stored procedure calls faster than LINQ?

I think the main issue here is that you place the 400k objects in memory to start with.
SQL is not all that slow; it's better to start with a limited set of data in the first place.
One obvious optimisation is:
internal bool IsInputLike(string input)
{
    // Trim and lower-case once instead of on every comparison
    input = input.Trim().ToLower();
    //Check to see if the input statement exists in any of the 4 fields
    bool check = (Code.ToLower().Contains(input)
        || Name.ToLower().Contains(input)
        || ClientCode.ToLower().Contains(input)
        || ClientName.ToLower().Contains(input));
    return check;
}
but personally, I would keep the data where it belongs, in the SQL server (if that's what you are using).
Some indexing and the proper queries could make this faster.
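For example, a minimal sketch of letting the server do the filtering, assuming an EF/LINQ-to-SQL style context (_db.Matters and the Client navigation property are illustrative names, not the poster's actual model):
// Sketch only: String.Contains on an IQueryable translates to LIKE '%...%',
// so the search runs in SQL and only the top 10 matches cross the wire.
public IList<Matter> GetMattersForAutoCompleteByCode(string input)
{
    input = input.Trim();
    return _db.Matters
        .Where(m => m.Code.Contains(input)
                 || m.Name.Contains(input)
                 || m.Client.Code.Contains(input)
                 || m.Client.Name.Contains(input))
        .OrderBy(m => m.Code)
        .Take(10)
        .ToList();
}
Note that a leading wildcard (LIKE '%...%') can't use an index seek, so for real speed you'd pair this with a full-text index or restrict the search to prefixes.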
When I see this code I start wondering:
public virtual IEnumerable<Matter> GetMattersForAutoCompleteByCode(string input)
{
    InvalidateCache();
    IEnumerable<Matter> results;
    //Searching Matter Object on all 4 given parameters by input.
    if (_lastMatters != null && input.StartsWith(_lastSearch) && _lastMatters.Any())
    {
        results = _lastMatters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
        _lastMatters = results;
    }
    else
    {
        results = _matters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
        _lastMatters = results;
    }
    _lastSearch = input;
    return results.Take(10).ToList();
}
Why do you need to order? Why does a dropdown autocomplete need to filter on 4 fields? If you only take 10 anyway, can't you skip the ordering? See if removing the OrderBy gives you better results, especially in the else branch where you'll have many results.
Personally, I'd go all in for LINQ to SQL and let the SQL server do the searching. Optimize the indexing on this table and it'll be much faster.

I'm not much of an asp/http guy but when I see this:
internal bool IsInputLike(string input)
{
    //Check to see if the input statement exists in any of the 4 fields
    bool check = (Code.ToLower().Contains(input.Trim().ToLower())
        || Name.ToLower().Contains(input.Trim().ToLower())
        || ClientCode.ToLower().Contains(input.Trim().ToLower())
        || ClientName.ToLower().Contains(input.Trim().ToLower()));
    return check;
}
I think you are creating a lot of new strings, and that has to take some time. Try this and see if it improves your performance:
var inp = input.Trim();
bool chk = (Code.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
    || (Name.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
    || (ClientCode.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
    || (ClientName.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1);
This first line (that creates inp) matters more than it looks: hoisting the Trim means it runs once instead of four times (the compiler does not optimize away the repeated calls, each of which allocates a new string), and I think it reads better too.
The IndexOf method will not create new strings, and with the StringComparison parameter you avoid creating all the ToLower copies.

Well, I recommend creating a view that concatenates all of the name fields (Code, Name, ClientCode, ClientName) into a single column, say FullName, and replacing your IsInputLike(..) as below:
internal bool IsInputLike(string input)
{
    //Check to see if the input statement exists in the concatenated field
    //(note: unlike the original, String.Contains is case-sensitive)
    return FullName.Contains(input);
}
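If the data has to stay in memory, the same idea can be sketched as a precomputed property on the Matter class from the question (FullName and the lazy field here are assumptions, not existing code); the caller would lower-case the input once per search to keep the comparison case-insensitive:
public class Matter
{
    public string Code { get; set; }
    public string Name { get; set; }
    public string ClientCode { get; set; }
    public string ClientName { get; set; }

    // Concatenate and lower-case once per object, so each keystroke's search
    // does a single Contains per record instead of four ToLower allocations.
    private string _fullName;
    public string FullName =>
        _fullName ?? (_fullName = (Code + " " + Name + " " + ClientCode + " " + ClientName).ToLower());
}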


How to speed up LINQ WHERE?

I have run a profiler on my .NET WinForms app (compiled against .NET 4.7.1), and it points at the following function as consuming 73% of my application's CPU time, which seems like far too much for a simple utility function:
public static bool DoesRecordExist(string keyColumn1, string keyColumn2, string keyColumn3,
    string keyValue1, string keyValue2, string keyValue3, DataTable dt)
{
    if (dt != null && dt.Rows.Count > 0) {
        bool exists = dt.AsEnumerable()
            .Where(r =>
                string.Equals(SafeTrim(r[keyColumn1]), keyValue1, StringComparison.CurrentCultureIgnoreCase) &&
                string.Equals(SafeTrim(r[keyColumn2]), keyValue2, StringComparison.CurrentCultureIgnoreCase) &&
                string.Equals(SafeTrim(r[keyColumn3]), keyValue3, StringComparison.CurrentCultureIgnoreCase)
            )
            .Any();
        return exists;
    } else {
        return false;
    }
}
The purpose of this function is to pass in some key column names and matching key values, and checking whether any matching record exists in the in-memory c# DataTable.
My app is processing hundreds of thousands of records and for each record, this function must be called multiple times. The app is doing a lot of inserts, and before any insert, it must check whether that record already exists in the database. I figured that an in-memory check against the DataTable would be much faster than going back to the physical database each time, so that's why I'm doing this in-memory check. Each time I do a database insert, I do a corresponding insert into the DataTable, so that subsequent checks as to whether the record exists will be accurate.
So to my question: Is there a faster approach? (I don't think I can avoid checking for record existence each and every time, else I'll end up with duplicate inserts and key violations.)
EDIT #1
In addition to the suggestions that have been coming in, which I'm trying now, it occurred to me that I should maybe do the .AsEnumerable() only once and pass in the EnumerableRowCollection<DataRow> instead of the DataTable. Do you think this will help?
EDIT #2
I just did a controlled test and found that querying the database directly to see if a record already exists is dramatically slower than doing an in-memory lookup.
You should try parallel execution; this should be a very good case for it, since you mentioned you are working with a huge set, and no ordering is needed if you just want to check whether a record already exists.
bool exists = dt.AsEnumerable().AsParallel().Any(r =>
    string.Equals(SafeTrim(r[keyColumn1]), keyValue1, StringComparison.CurrentCultureIgnoreCase) &&
    string.Equals(SafeTrim(r[keyColumn2]), keyValue2, StringComparison.CurrentCultureIgnoreCase) &&
    string.Equals(SafeTrim(r[keyColumn3]), keyValue3, StringComparison.CurrentCultureIgnoreCase));
Your solution finds all occurrences for which the condition evaluates to true and then asks whether there are any. Instead, use Any directly: replace Where with Any. It stops processing as soon as the first row satisfies the condition.
bool exists = dt.AsEnumerable().Any(r => condition);
I suggest keeping the key columns of the existing records in a HashSet. I'm using tuples here, but you could also create your own Key struct or class by overriding GetHashCode and Equals.
private HashSet<(string, string, string)> _existingKeys =
    new HashSet<(string, string, string)>();
Then you can test the existence of a key very quickly with
if (_existingKeys.Contains((keyValue1, keyValue2, keyValue3))) {
    ...
}
Don't forget to keep this HashSet in sync with your additions and deletions. Note that tuples cannot be compared with CurrentCultureIgnoreCase. Therefore either convert all the keys to lower case, or use the custom struct approach where you can use the desired comparison method.
public readonly struct Key
{
    public Key(string key1, string key2, string key3) : this()
    {
        Key1 = key1?.Trim() ?? "";
        Key2 = key2?.Trim() ?? "";
        Key3 = key3?.Trim() ?? "";
    }

    public string Key1 { get; }
    public string Key2 { get; }
    public string Key3 { get; }

    public override bool Equals(object obj)
    {
        if (!(obj is Key)) {
            return false;
        }
        var key = (Key)obj;
        return
            String.Equals(Key1, key.Key1, StringComparison.CurrentCultureIgnoreCase) &&
            String.Equals(Key2, key.Key2, StringComparison.CurrentCultureIgnoreCase) &&
            String.Equals(Key3, key.Key3, StringComparison.CurrentCultureIgnoreCase);
    }

    public override int GetHashCode()
    {
        int hashCode = -2131266610;
        unchecked {
            hashCode = hashCode * -1521134295 + StringComparer.CurrentCultureIgnoreCase.GetHashCode(Key1);
            hashCode = hashCode * -1521134295 + StringComparer.CurrentCultureIgnoreCase.GetHashCode(Key2);
            hashCode = hashCode * -1521134295 + StringComparer.CurrentCultureIgnoreCase.GetHashCode(Key3);
        }
        return hashCode;
    }
}
Another question is whether it is a good idea to use the current culture when comparing db keys. Users with different cultures might get different results. Better to explicitly specify the same culture/collation behaviour the db uses.
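For completeness, a minimal sketch of wiring the struct in (it reuses dt and the key columns from the question; where exactly you hook the Add call is an assumption about your insert path):
// Build the set once from the existing DataTable:
var existingKeys = new HashSet<Key>(
    dt.AsEnumerable().Select(r => new Key(
        r[keyColumn1]?.ToString(),
        r[keyColumn2]?.ToString(),
        r[keyColumn3]?.ToString())));

// The existence check is now an O(1) hash probe:
bool exists = existingKeys.Contains(new Key(keyValue1, keyValue2, keyValue3));

// After every successful insert, record the new key too:
existingKeys.Add(new Key(keyValue1, keyValue2, keyValue3));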
It might be that you want to transpose your data structure. Instead of having a DataTable where each row has keyColumn1, keyColumn2 and keyColumn3, have 3 HashSet<string>, where the first contains all of the keyColumn1 values, etc.
Doing this should be a lot faster than iterating through each of the rows:
var hashSetColumn1 = new HashSet<string>(
    dt.AsEnumerable().Select(x => x.Field<string>(keyColumn1)),
    StringComparer.CurrentCultureIgnoreCase);
var hashSetColumn2 = new HashSet<string>(
    dt.AsEnumerable().Select(x => x.Field<string>(keyColumn2)),
    StringComparer.CurrentCultureIgnoreCase);
var hashSetColumn3 = new HashSet<string>(
    dt.AsEnumerable().Select(x => x.Field<string>(keyColumn3)),
    StringComparer.CurrentCultureIgnoreCase);
Obviously, create these once, and then maintain them (as you're currently maintaining your DataTable). They're expensive to create, but cheap to query. Note, though, that three independent sets can report a match even when no single row contains all three values together; the single-HashSet struct approach below doesn't have that problem.
Then:
bool exists = hashSetColumn1.Contains(keyValue1) &&
              hashSetColumn2.Contains(keyValue2) &&
              hashSetColumn3.Contains(keyValue3);
Alternatively (and more cleanly), you can define your own struct which contains values from the 3 columns, and use a single HashSet:
public struct Row : IEquatable<Row>
{
    // Convenience
    private static readonly IEqualityComparer<string> comparer = StringComparer.CurrentCultureIgnoreCase;

    public string Value1 { get; }
    public string Value2 { get; }
    public string Value3 { get; }

    public Row(string value1, string value2, string value3)
    {
        Value1 = value1;
        Value2 = value2;
        Value3 = value3;
    }

    public override bool Equals(object obj) => obj is Row row && Equals(row);

    public bool Equals(Row other)
    {
        return comparer.Equals(Value1, other.Value1) &&
               comparer.Equals(Value2, other.Value2) &&
               comparer.Equals(Value3, other.Value3);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + comparer.GetHashCode(Value1);
            hash = hash * 23 + comparer.GetHashCode(Value2);
            hash = hash * 23 + comparer.GetHashCode(Value3);
            return hash;
        }
    }

    public static bool operator ==(Row left, Row right) => left.Equals(right);
    public static bool operator !=(Row left, Row right) => !(left == right);
}
Then you can make a:
var hashSet = new HashSet<Row>(
    dt.AsEnumerable().Select(x => new Row(
        x.Field<string>(keyColumn1),
        x.Field<string>(keyColumn2),
        x.Field<string>(keyColumn3))));
And cache that. Query it like:
hashSet.Contains(new Row(keyValue1, keyValue2, keyValue3));
In some cases LINQ won't optimize as well as a sequential loop, so you might be better off writing the query the old-fashioned way:
public static bool DoesRecordExist(string keyColumn1, string keyColumn2, string keyColumn3,
    string keyValue1, string keyValue2, string keyValue3, DataTable dt)
{
    if (dt != null)
    {
        foreach (DataRow r in dt.Rows)
        {
            if (string.Equals(SafeTrim(r[keyColumn1]), keyValue1, StringComparison.CurrentCultureIgnoreCase) &&
                string.Equals(SafeTrim(r[keyColumn2]), keyValue2, StringComparison.CurrentCultureIgnoreCase) &&
                string.Equals(SafeTrim(r[keyColumn3]), keyValue3, StringComparison.CurrentCultureIgnoreCase))
            {
                return true;
            }
        }
    }
    return false;
}
There might also be more structural improvements, but whether you can use them depends on your situation.
Option 1: Making the selection already in the database
You are using a DataTable, so there is a chance that you fetch the data from the database. If you have a lot of records, it might make more sense to move this check to the database. With the proper indexes it might be way faster than an in-memory table scan.
Option 2: Replace string.Equals+SafeTrim with a custom method
You are using SafeTrim up to three times per row, which creates a lot of new strings. If you create your own method that compares both strings (string.Equals) with respect to leading/trailing whitespace (SafeTrim), but without creating a new string, it could be way faster, reduce memory load and reduce garbage collection (see the sketch after these options). If the implementation is good enough to inline, you'll gain a lot of performance.
Option 3: Check the columns in the proper order
Make sure you use the proper order and specify the column with the least probability of matching as keyColumn1. This will make the if-statement evaluate to false sooner. If keyColumn1 matches in 80% of the cases, you have to perform a lot more comparisons.
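Here is a minimal sketch of Option 2, assuming SafeTrim only strips leading/trailing whitespace (TrimmedEqualsIgnoreCase is an illustrative name, not an existing API):
// Compare ignoring leading/trailing whitespace and case without
// allocating trimmed copies of either string.
private static bool TrimmedEqualsIgnoreCase(string a, string b)
{
    if (a == null || b == null) return ReferenceEquals(a, b);

    int aStart = 0, aEnd = a.Length - 1;
    while (aStart <= aEnd && char.IsWhiteSpace(a[aStart])) aStart++;
    while (aEnd >= aStart && char.IsWhiteSpace(a[aEnd])) aEnd--;

    int bStart = 0, bEnd = b.Length - 1;
    while (bStart <= bEnd && char.IsWhiteSpace(b[bStart])) bStart++;
    while (bEnd >= bStart && char.IsWhiteSpace(b[bEnd])) bEnd--;

    int len = aEnd - aStart + 1;
    if (len != bEnd - bStart + 1) return false;

    // Compares the trimmed windows in place: no Substring/Trim allocations.
    return string.Compare(a, aStart, b, bStart, len, StringComparison.CurrentCultureIgnoreCase) == 0;
}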

Most efficient way to search enumerable

I am writing a small program that takes in a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table in a database (SQL Server through Dynamics CRM using Xrm.Sdk, if it makes a difference).
In my current program (which takes about 25 minutes to do the comparison; the file and database are exactly the same here, both 45k rows with no differences), I have all existing records from the database in a DataCollection<Entity>, which inherits Collection<T> and implements IEnumerable<T>.
In my code below I am filtering using the Where method and then applying logic based on the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
    var fields = record.Split(',');
    var fund = fields[0];
    var bps = Convert.ToDecimal(fields[1]);
    var withdrawalPct = Convert.ToDecimal(fields[2]);
    var percentile = Convert.ToInt32(fields[3]);
    var age = Convert.ToInt32(fields[4]);
    var bombOutTerm = Convert.ToDecimal(fields[5]);
    var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
        && Convert.ToDecimal(r["field_2"]) == bps
        && Convert.ToDecimal(r["field_3"]) == withdrawalPct
        && Convert.ToDecimal(r["field_4"]) == percentile
        && Convert.ToDecimal(r["field_5"]) == age);
    entitiesFound.AddRange(matchingRows);
    if (matchingRows.Count() == 0)
    {
        rowsToAdd.Add(record);
    }
    else if (matchingRows.Count() == 1)
    {
        if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
        {
            rowsToUpdate.Add(record);
            entitiesToUpdate.Add(matchingRows.First());
        }
    }
    else
    {
        entitiesToDelete.AddRange(matchingRows);
        rowsToAdd.Add(record);
    }
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.
Himbrombeere is right: you should execute the query once and put the result into a collection before you use Any, Count, AddRange or whatever other method would execute the query again. In your code it's possible that the query is executed 5 times in every loop iteration.
Watch out for the term deferred execution in the documentation. If a method is implemented that way, it means the method can be used to construct a LINQ query (you can chain it with other methods, and at the end you have a query). Only methods that don't use deferred execution, like Count, Any, ToList (or a plain foreach), actually execute it. If you don't want the whole query executed every time you access it, store the result in a collection (e.g. with ToList).
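A compressed illustration using the question's names, showing where the repeated execution hides and the one-line fix:
// Each Count()/First() below would re-run the Where over all entities.
// Materializing once turns them into cheap in-memory operations:
var matchingRows = existingRecords.Entities
    .Where(r => r["field_1"].ToString() == fund /* ...other conditions... */)
    .ToList();                     // query executed exactly once, here

if (matchingRows.Count == 0) { }   // List<T>.Count is a property: no re-execution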
However, you could use a different approach which should be much more efficient: a Lookup<TKey, TElement>, which is similar to a dictionary and can be used with an anonymous type as key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
    fund = r["field_1"].ToString(),
    bps = Convert.ToDecimal(r["field_2"]),
    withdrawalPct = Convert.ToDecimal(r["field_3"]),
    // percentile and age are parsed as Int32 so this anonymous key type
    // matches the probe key built from the csv fields in the loop below
    percentile = Convert.ToInt32(r["field_4"]),
    age = Convert.ToInt32(r["field_5"])
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
    var fields = record.Split(',');
    var fund = fields[0];
    var bps = Convert.ToDecimal(fields[1]);
    var withdrawalPct = Convert.ToDecimal(fields[2]);
    var percentile = Convert.ToInt32(fields[3]);
    var age = Convert.ToInt32(fields[4]);
    var bombOutTerm = Convert.ToDecimal(fields[5]);
    var matchingRows = lookup[new { fund, bps, withdrawalPct, percentile, age }].ToList();
    entitiesFound.AddRange(matchingRows);
    if (matchingRows.Count == 0)
    {
        rowsToAdd.Add(record);
    }
    else if (matchingRows.Count == 1)
    {
        if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
        {
            rowsToUpdate.Add(record);
            entitiesToUpdate.Add(matchingRows.First());
        }
    }
    else
    {
        entitiesToDelete.AddRange(matchingRows);
        rowsToAdd.Add(record);
    }
}
Note that this will work even if the key does not exist (an empty list is returned).
Add a ToList after your Convert.ToDecimal(r["field_5"]) == age) line to force immediate execution of the query.
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
        && Convert.ToDecimal(r["field_2"]) == bps
        && Convert.ToDecimal(r["field_3"]) == withdrawalPct
        && Convert.ToDecimal(r["field_4"]) == percentile
        && Convert.ToDecimal(r["field_5"]) == age)
    .ToList();
The Where doesn't actually execute your query, it just prepares it. The actual execution happens later, in a deferred way. In your case it happens when calling Count, which itself iterates the entire collection of items. And if the first check (Count() == 0) is false, the second one (Count() == 1) runs, leading to a second iteration of the complete collection. You then execute the query a third time when calling matchingRows.First().
When you force immediate execution, you execute the query only once, and thus iterate the entire collection only once, which will decrease your overall time.
Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
    inputDataLines
    .Select(record =>
    {
        var fields = record.Split(',');
        return new
        {
            fund = fields[0],
            bps = Convert.ToDecimal(fields[1]),
            withdrawalPct = Convert.ToDecimal(fields[2]),
            percentile = Convert.ToInt32(fields[3]),
            age = Convert.ToInt32(fields[4]),
            bombOutTerm = Convert.ToDecimal(fields[5]),
            record
        };
    })
    .ToArray();
var entities =
    existingRecords
    .Entities
    .Select(entity => new
    {
        fund = entity["field_1"].ToString(),
        bps = Convert.ToDecimal(entity["field_2"]),
        withdrawalPct = Convert.ToDecimal(entity["field_3"]),
        percentile = Convert.ToInt32(entity["field_4"]),
        age = Convert.ToInt32(entity["field_5"]),
        bombOutTerm = Convert.ToDecimal(entity["field_6"]),
        entity
    })
    .ToArray()
    .GroupBy(x => new
    {
        x.fund,
        x.bps,
        x.withdrawalPct,
        x.percentile,
        x.age
    }, x => new
    {
        x.bombOutTerm,
        x.entity,
    });
(2)
var query =
    from i in inputs
    join e in entities on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key into gj
    // group join: inputs with no matching entities keep an empty gj instead of
    // being dropped (a plain join would make the Count() == 0 branch unreachable)
    select new { input = i, matchingRows = gj.SelectMany(g => g) };
(3)
foreach (var x in query)
{
    entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));
    if (x.matchingRows.Count() == 0)
    {
        rowsToAdd.Add(x.input.record);
    }
    else if (x.matchingRows.Count() == 1)
    {
        if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
        {
            rowsToUpdate.Add(x.input.record);
            entitiesToUpdate.Add(x.matchingRows.First().entity);
        }
    }
    else
    {
        entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
        rowsToAdd.Add(x.input.record);
    }
}
I would suspect that this will be among the fastest approaches presented.

Slowness when chaining LINQ queries

I am chaining LINQ queries as shown below. I am trying to find the cause of the slowness of query.ToList();. The SQL queries themselves are fast (milliseconds), but the code takes a minute. The reason for chaining is to reuse the repository function.
Is there any obvious reason for the slowness here?
How could I optimize this?
How can I check the actual SQL query executed when running query.ToList();?
//Client
var query = _service.GetResultsByStatus(status, bType, tType);
var result = query.ToList(); //takes a long time to execute

//Service function
public IEnumerable<CustomResult> GetResultsByStatus(string status, string bType, string tType) {
    IEnumerable<CustomResult> result = null;
    result = repo.GetResults(bType).Where(item => item.tStatus == status && (tType == null || item.tType == tType))
        .Select(item => new CustomResult {
            A = item.A,
            B = item.B,
        });
    return result;
}

// Repository Function (reused in many places)
public IEnumerable<my_model> GetResults(string bType) {
    return from p in dbContext.my_model()
           where p.bType.Equals(bType)
           select p;
}
Your .Where(item => item.tStatus == status && (tType == null || item.tType == tType)) and the .Select are being done "locally" on your PC... Tons of useless rows and columns are returned by the SQL server and then "filtered" on your PC.
public IEnumerable<my_model> GetResults(string bType) {
    return from p in dbContext.my_model()
           where p.bType.Equals(bType)
           select p;
}
Change it to
public IQueryable<my_model> GetResults(string bType) {
Normally IEnumerable<> means "downstream LINQ will be executed locally", while IQueryable<> means "downstream LINQ will be executed on a server". In this case the Where and the Select are "downstream" from the conversion of the query into an IEnumerable<>. Note that while it is possible (and easy) to convert an IQueryable<> to an IEnumerable<>, the opposite normally isn't possible. AsQueryable<> creates a "fake" IQueryable<> that is executed locally and is mainly useful in unit tests.
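A sketch of the reworked pair under that advice (keeping everything IQueryable until the final ToList); the logging line is an assumption that this is Entity Framework 6, where Database.Log is the built-in hook for inspecting the generated SQL:
// Repository: return IQueryable so downstream Where/Select compose into SQL.
public IQueryable<my_model> GetResults(string bType) {
    return dbContext.my_model().Where(p => p.bType.Equals(bType));
}

// Service: still composing; nothing has hit the database yet.
public IQueryable<CustomResult> GetResultsByStatus(string status, string bType, string tType) {
    return GetResults(bType)
        .Where(item => item.tStatus == status && (tType == null || item.tType == tType))
        .Select(item => new CustomResult { A = item.A, B = item.B });
}

// EF6 only (assumption): log the SQL that query.ToList() actually executes.
// dbContext.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);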

filter IQueryable in a loop with multiple Where statements

I have this code:
private static IQueryable<Persoon> Filter(IQueryable<Persoon> qF, IDictionary<string, string> filter)
{
    IQueryable<Persoon> temp;
    temp = qF;
    foreach (var key in filter)
    {
        if (key.Key == "naam")
        {
            temp = temp.Where(f => f.Naam == key.Value);
        }
        else if (key.Key == "leeftijd")
        {
            temp = temp.Where(af => af.Leeftijd != null && af.Leeftijd.AantalJaarOud.ToString() == key.Value);
        }
    }
    return temp;
}
What it does (it's a simplified version, made to test the behaviour) is: you give this function an IQueryable of Persoon (from the database) and a list of filters.
So if you give it the filters naam=john and leeftijd=30, you get all Persoon objects named John with age 30.
When I first enter the loop, right after the first Where (the leeftijd one), I see that temp has 3 objects. Then the code goes through the loop a second time, enters the first if (where the filter key equals naam), and right there, when I look at temp, it has 0 objects.
My first sign that it wasn't working was that the function returns no results (it should return 2: there are 3 thirty-year-olds, and 2 of those are named John). So I concluded that the multiple .Where calls were the problem.
But now I see that temp is empty even BEFORE I do the second Where.
What am I doing wrong?
LINQ's lambda expressions use late binding: they capture the variable itself, not its current value, so when the expression is finally processed the variable "key" no longer points at the right values.
Try changing your code to store key.Value in a local variable and use it instead:
private static IQueryable<Persoon> Filter(IQueryable<Persoon> qF, IDictionary<string, string> filter)
{
    IQueryable<Persoon> temp;
    temp = qF;
    foreach (var key in filter)
    {
        var currentKeyValue = key.Value;
        if (key.Key == "naam")
        {
            temp = temp.Where(f => f.Naam == currentKeyValue);
        }
        else if (key.Key == "leeftijd")
        {
            temp = temp.Where(af => af.Leeftijd != null && af.Leeftijd.AantalJaarOud == Int32.Parse(currentKeyValue));
        }
    }
    return temp;
}
Another thing I changed is parsing the filter value to a number instead of converting the age field to a string; that way the database compares numbers rather than strings.
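To see why the local copy matters, here is a self-contained sketch of variable capture (a for loop is used because its loop variable really is shared across iterations):
using System;
using System.Collections.Generic;

class CaptureDemo
{
    static void Main()
    {
        // All three lambdas capture the SAME variable i; by the time they
        // run, the loop has finished and i is 3, so each prints 3.
        var shared = new List<Action>();
        for (int i = 0; i < 3; i++)
            shared.Add(() => Console.WriteLine(i));
        shared.ForEach(a => a());   // 3, 3, 3

        // Copying into a loop-local gives every lambda its own variable.
        var copied = new List<Action>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            copied.Add(() => Console.WriteLine(copy));
        }
        copied.ForEach(a => a());   // 0, 1, 2
    }
}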

c# finding matching words in table column using Linq2Sql

I am trying to use Linq2Sql to return all rows that contain values from a list of strings. The linq2sql class object has a string property that contains words separated by spaces.
public class MyObject
{
    public string MyProperty { get; set; }
}
Example MyProperty values are:
MyObject1.MyProperty = "text1 text2 text3 text4"
MyObject2.MyProperty = "text2"
For example, using a string collection, I pass the below list
var list = new List<string> { "text2", "text4" };
This would return both items in my example above, as they both contain the value "text2".
I attempted the following code; however, because of my extension method, Linq2Sql cannot evaluate it.
public static IQueryable<MyObject> WithProperty(this IQueryable<MyProperty> qry,
    IList<string> p)
{
    return from t in qry
           where t.MyProperty.Contains(p, ' ')
           select t;
}
I also wrote an extension method
public static bool Contains(this string str, IList<string> list, char seperator)
{
    if (str == null) return false;
    if (list == null) return true;
    var splitStr = str.Split(new char[] { seperator },
        StringSplitOptions.RemoveEmptyEntries);
    bool retval = false;
    int matches = 0;
    foreach (string s in splitStr)
    {
        foreach (string l in list)
        {
            if (String.Compare(s, l, true) == 0)
            {
                retval = true;
                matches++;
            }
        }
    }
    return retval && (splitStr.Length > 0) && (list.Count == matches);
}
Any help or ideas on how I could achieve this?
You're on the right track. The first parameter of your extension method WithProperty has to be of type IQueryable<MyObject>, not IQueryable<MyProperty>.
Anyway, you don't need an extension method for the IQueryable. Just use your Contains method in a lambda for filtering. This should work:
List<string> searchStrs = new List<string>() { "text2", "text4" };
IEnumerable<MyObject> myFilteredObjects = dataContext.MyObjects
    .Where(myObj => myObj.MyProperty.Contains(searchStrs, ' '));
Update:
The above code snippet does not work, because the Contains method cannot be converted into a SQL statement. I thought a while about the problem and came to a solution by thinking about 'how would I do that in SQL?': you could query for each single keyword and union all the results together. Sadly, the deferred execution of Linq-to-SQL prevents doing that all in one query. So I came up with this compromise of a compromise. It queries for every single keyword, which can match in one of the following ways:
equal to the string
in between two separators
at the start of the string and followed by a separator
or at the end of the string and headed by a separator
This yields a valid expression tree that is translatable into SQL via Linq-to-SQL. After each query I don't defer execution; I immediately fetch the data and store it in a list. All the lists are unioned afterwards.
public static IEnumerable<MyObject> ContainsOneOfTheseKeywords(
    this IQueryable<MyObject> qry, List<string> keywords, char sep)
{
    List<List<MyObject>> parts = new List<List<MyObject>>();
    foreach (string keyw in keywords)
        parts.Add((
            from obj in qry
            where obj.MyProperty == keyw ||
                  obj.MyProperty.IndexOf(sep + keyw + sep) != -1 ||
                  obj.MyProperty.IndexOf(keyw + sep) == 0 ||
                  obj.MyProperty.IndexOf(sep + keyw) ==
                      obj.MyProperty.Length - keyw.Length - 1
            select obj).ToList());
    IEnumerable<MyObject> union = null;
    bool first = true;
    foreach (List<MyObject> part in parts)
    {
        if (first)
        {
            union = part;
            first = false;
        }
        else
            union = union.Union(part);
    }
    return union.ToList();
}
And use it:
List<string> searchStrs = new List<string>() { "text2", "text4" };
IEnumerable<MyObject> myFilteredObjects = dataContext.MyObjects
    .ContainsOneOfTheseKeywords(searchStrs, ' ');
That solution is really anything but elegant. For 10 keywords, I have to query the db 10 times, and every time fetch the data and store it in memory. This wastes memory and performs badly. I just wanted to demonstrate that it is possible in Linq (maybe it can be optimized here or there, but I think it won't get perfect).
I would strongly recommend moving the logic of that function into a stored procedure on your database server. One single query, optimized by the database server, and no waste of memory.
Another alternative would be to rethink your database design. If you want to query the contents of one field (you are treating this field like an array of keywords separated by spaces), you may simply have chosen an inappropriate database design. You would rather create a new table with a foreign key to your table, where each row holds exactly one keyword. The queries would be much simpler, faster, and more understandable; see the sketch below.
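A sketch of what querying the normalized design could look like (the Keywords table, its Word column, and the MyObject association are all hypothetical):
var searchStrs = new List<string> { "text2", "text4" };

// A local list's Contains translates to SQL IN (...), one set-based query.
var matches = (from k in dataContext.Keywords      // hypothetical keyword table
               where searchStrs.Contains(k.Word)   // becomes WHERE Word IN ('text2','text4')
               select k.MyObject)                  // follow the FK back to the parent row
              .Distinct()
              .ToList();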
I haven't tried, but if I remember correctly, this should work:
from t in ctx.Table
where list.Any(x => t.MyProperty.Contains(x))
select t
You can replace Any() with All() if you want all strings in the list to match.
EDIT:
To clarify what I was trying to do with this, here is a similar query written without LINQ, to explain the use of All and Any:
where list.Any(x => t.MyProperty.Contains(x))
Translates to:
where t.MyProperty.Contains(list[0]) || t.MyProperty.Contains(list[1]) ||
t.MyProperty.Contains(list[n])
And
where list.All(x => t.MyProperty.Contains(x))
Translates to:
where t.MyProperty.Contains(list[0]) && t.MyProperty.Contains(list[1]) &&
      t.MyProperty.Contains(list[n])
