C#: finding matching words in a table column using Linq2Sql

I am trying to use Linq2Sql to return all rows that contain values from a list of strings. The linq2sql class object has a string property that contains words separated by spaces.
public class MyObject
{
public string MyProperty { get; set; }
}
Example MyProperty values are:
MyObject1.MyProperty = "text1 text2 text3 text4"
MyObject2.MyProperty = "text2"
For example, using a string collection, I pass the list below:
var list = new List<string> { "text2", "text4" };
This would return both items in my example above, as they both contain the "text2" value.
I attempted this with the code below; however, because of my extension method, the Linq2Sql query cannot be evaluated.
public static IQueryable<MyObject> WithProperty(this IQueryable<MyProperty> qry,
IList<string> p)
{
return from t in qry
where t.MyProperty.Contains(p, ' ')
select t;
}
I also wrote an extension method
public static bool Contains(this string str, IList<string> list, char seperator)
{
if (str == null) return false;
if (list == null) return true;
var splitStr = str.Split(new char[] { seperator },
StringSplitOptions.RemoveEmptyEntries);
bool retval = false;
int matches = 0;
foreach (string s in splitStr)
{
foreach (string l in list)
{
if (String.Compare(s, l, true) == 0)
{
retval = true;
matches++;
}
}
}
return retval && (splitStr.Length > 0) && (list.Count == matches);
}
Any help or ideas on how I could achieve this?

You're on the right track. The first parameter of your extension method WithProperty has to be of type IQueryable<MyObject>, not IQueryable<MyProperty>.
Anyway, you don't need an extension method on IQueryable. Just use your Contains method in a lambda for filtering. This should work:
List<string> searchStrs = new List<string>() { "text2", "text4" };
IEnumerable<MyObject> myFilteredObjects = dataContext.MyObjects
.Where(myObj => myObj.MyProperty.Contains(searchStrs, ' '));
Update:
The above code snippet does not work, because the Contains method cannot be converted into a SQL statement. I thought about the problem for a while and arrived at a solution by asking "how would I do that in SQL?": you could query for each single keyword and union all the results together. Sadly, the deferred execution of Linq-to-SQL prevents doing that all in one query, so I came up with this compromise of a compromise. It runs one query per keyword. A row matches when the keyword is one of the following:
equal to the whole string
in between two separators
at the start of the string and followed by a separator
or at the end of the string and preceded by a separator
Each of these conditions forms a valid expression tree that Linq-to-SQL can translate into SQL. After each query I do not defer execution; I fetch the data immediately and store it in a list. All lists are unioned afterwards.
public static IEnumerable<MyObject> ContainsOneOfTheseKeywords(
this IQueryable<MyObject> qry, List<string> keywords, char sep)
{
List<List<MyObject>> parts = new List<List<MyObject>>();
foreach (string keyw in keywords)
parts.Add((
from obj in qry
where obj.MyProperty == keyw ||
obj.MyProperty.IndexOf(sep + keyw + sep) != -1 ||
obj.MyProperty.IndexOf(keyw + sep) == 0 || // keyword at the very start of the string
obj.MyProperty.IndexOf(sep + keyw) ==
obj.MyProperty.Length - keyw.Length - 1
select obj).ToList());
IEnumerable<MyObject> union = null;
bool first = true;
foreach (List<MyObject> part in parts)
{
if (first)
{
union = part;
first = false;
}
else
union = union.Union(part);
}
return union.ToList();
}
And use it:
List<string> searchStrs = new List<string>() { "text2", "text4" };
IEnumerable<MyObject> myFilteredObjects = dataContext.MyObjects
.ContainsOneOfTheseKeywords(searchStrs, ' ');
That solution is anything but elegant. For 10 keywords I have to query the database 10 times, and each time fetch the data and hold it in memory. This wastes memory and performs badly. I just wanted to demonstrate that it is possible in Linq (maybe it can be optimized here or there, but I don't think it will ever be perfect).
I would strongly recommend moving the logic of that function into a stored procedure on your database server: one single query, optimized by the database server, and no wasted memory.
Another alternative would be to rethink your database design. If you want to query the contents of one field (you are treating this field like an array of keywords separated by spaces), you may simply have chosen an inappropriate database design. You would rather create a new table with a foreign key to your table, where each row holds exactly one keyword. The queries would be much simpler, faster and easier to understand.
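As a side note on the four match cases listed in this answer: if the property value is padded with the separator on both sides, they collapse into a single "contains separator + keyword + separator" check. The following is only an in-memory sketch of that idea (LINQ to Objects); the names KeywordMatch, ContainsWord and WithAnyKeyword are made up for illustration, and whether Linq-to-SQL translates the same padded expression has not been verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeywordMatch
{
    // True when 'keyword' occurs in 'text' as a whole, separator-delimited word.
    // Padding both sides with the separator folds the four cases
    // (whole string, start, middle, end) into one substring check.
    public static bool ContainsWord(string text, string keyword, char sep)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(keyword)) return false;
        string padded = sep + text + sep;
        return padded.IndexOf(sep + keyword + sep, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Returns the texts that contain at least one of the keywords as a whole word.
    public static List<string> WithAnyKeyword(
        IEnumerable<string> texts, IList<string> keywords, char sep)
    {
        return texts.Where(t => keywords.Any(k => ContainsWord(t, k, sep))).ToList();
    }
}
```

With the example values from the question, WithAnyKeyword(values, new List<string> { "text2", "text4" }, ' ') keeps both "text1 text2 text3 text4" and "text2", while a value such as "text22" is rejected because "text2" is not a whole word in it.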

I haven't tried, but if I remember correctly, this should work:
from t in ctx.Table
where list.Any(x => t.MyProperty.Contains(x))
select t
You can replace Any() with All() if you want all strings in the list to match.
EDIT:
To clarify what I was trying to do, here is a similar query written without the list operators, to explain the use of All and Any:
where list.Any(x => t.MyProperty.Contains(x))
Translates to:
where t.MyProperty.Contains(list[0]) || t.MyProperty.Contains(list[1]) ||
t.MyProperty.Contains(list[n])
And
where list.All(x => t.MyProperty.Contains(x))
Translates to:
where t.MyProperty.Contains(list[0]) && t.MyProperty.Contains(list[1]) &&
t.MyProperty.Contains(list[n])
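The difference between the two operators can be checked in memory with LINQ to Objects. This is a minimal sketch under that assumption only; the class and method names are hypothetical, and it says nothing about whether Linq-to-SQL can translate Any or All over a local list (the answer itself was untested).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyAllDemo
{
    // Rows containing at least one of the strings (the chained || form).
    public static List<string> MatchAny(IEnumerable<string> rows, IList<string> list)
    {
        return rows.Where(r => list.Any(x => r.Contains(x))).ToList();
    }

    // Rows containing every one of the strings (the chained && form).
    public static List<string> MatchAll(IEnumerable<string> rows, IList<string> list)
    {
        return rows.Where(r => list.All(x => r.Contains(x))).ToList();
    }
}
```

Note that string.Contains is a plain substring test, so a row holding "text22" would also count as containing "text2".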

Related

How to replace multiple characters with one character in linq c#

We are trying to replace multiple characters in a string property that we query from the database.
var list = _context.Users.Where(t => t.Enable).AsQueryable();
list = list.Where(t => t.Name.ToLower().Contains(searchValue));
The Name property should be compared without the characters (.,-').
We have tried:
list = list.Where(t => t.Name.ToLower().Replace(".","").Replace(",","").Replace("-","").Contains(searchValue));
and it works, but we don't want to call Replace multiple times.
Is there any other way that works with IQueryable? Thanks.
We have decided to do it in the database, in SQL, by creating a view like this:
CREATE OR ALTER VIEW Users_View
AS
SELECT Id,CreationDate, UserName = REPLACE(TRANSLATE(Users.UserName, '_'',.-', '#####'), '#', '')
FROM Users;
and then we just query the view, where UserName is already stripped of special characters.
Well, the first thing you want to do is add a column to your table called "CanonicalName"
Then your query becomes just:
var list = (from t in _context.Users
where t.Enable && t.CanonicalName.Contains(searchValue)
select t).ToList();
Now, you need to populate CanonicalName. Since you only have to do this once ever for each record, it doesn't have to be that efficient, but here goes:
public string Canonicalize(string str)
{
var sb = new StringBuilder(str.Length);
foreach(var c in str)
{
if (c == '.' ||
c == ',' ||
c == '-') // you may wish to add others
continue;
sb.Append(Char.ToLower(c)); // a foreach variable is read-only, so lower-case while appending
}
return sb.ToString();
}
UPDATE: Since people need everything spelled out...
foreach(var u in _context.Users)
u.CanonicalName = Canonicalize(u.Name);
_context.SaveChanges();

Improve the performance of an AutoComplete LINQ Query

I have some massive searches happening for my AutoComplete and was wondering if someone could give any ideas to improve the performance.
What happens:
1) At application launch I load all database entries into memory.
2) User types in the search box to initiate AutoComplete:
$("#MatterCode").width(110).kendoAutoComplete({
minLength: 3,
delay: 10,
dataTextField: "MatterCode",
template: '<div class="autoCompleteResultsCode"> ${ data.ClientCode } - ${ data.MatterCode } - ${ data.ClientName } - ${ data.MatterName }</div>',
dataSource: {
serverFiltering: true,
transport: {
read: "/api/matter/AutoCompleteByCode",
parameterMap: function() {
var matterCode = $("#MatterCode").val();
return { searchText: matterCode };
}
}
}, //More Stuff here
3) It goes to my controller class:
public JsonResult AutoCompleteByCode(string searchText)
{
if (string.IsNullOrEmpty(searchText))
{
Response.StatusCode = 500;
return Json(new
{
Error = "search string can't be empty"
});
}
var results = _publishedData.GetMattersForAutoCompleteByCode(searchText).Select(
matter => new
{
MatterCode = matter.Code,
MatterName = matter.Name,
ClientCode = matter.Client.Code,
ClientName = matter.Client.Name
});
return Json(results);
}
4) Which goes into the DAL (objects starting with '_' are Memory Objects)
public virtual IEnumerable<Matter> GetMattersForAutoCompleteByCode(string input)
{
InvalidateCache();
IEnumerable<Matter> results;
//Searching Matter Object on all 4 given parameters by input.
if (_lastMatters != null && input.StartsWith(_lastSearch) && _lastMatters.Any())
{
results = _lastMatters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
_lastMatters = results;
}
else
{
results = _matters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
_lastMatters = results;
}
_lastSearch = input;
return results.Take(10).ToList();
}
5) IsInputLike is an internal bool method
internal bool IsInputLike(string input)
{
//Check to see if the input statement exists in any of the 4 fields
bool check = (Code.ToLower().Contains(input.Trim().ToLower())
|| Name.ToLower().Contains(input.Trim().ToLower())
|| ClientCode.ToLower().Contains(input.Trim().ToLower())
|| ClientName.ToLower().Contains(input.Trim().ToLower()));
return check;
}
Now the result set that I have to work with can exceed 100,000 entries. The first autocomplete of any new query has to search through 400,000 records, and I can't think of a way to improve the performance without sacrificing the feature.
Any ideas?
Are SQL stored procedure calls faster than LINQ?
I think the main issue here is that you are placing the 400k objects in memory to start with.
SQL is not all that slow; it's better to start with a limited set of data in the first place.
One obvious optimisation is:
internal bool IsInputLike(string input)
{
input = input.Trim().ToLower(); // normalise once instead of once per field
//Check to see if the input statement exists in any of the 4 fields
bool check = (Code.ToLower().Contains(input)
|| Name.ToLower().Contains(input)
|| ClientCode.ToLower().Contains(input)
|| ClientName.ToLower().Contains(input));
return check;
}
but personally, I would keep the data where it belongs, in the SQL server (if that's what you are using).
Some indexing and the proper queries could make this faster.
When I see this code I start wondering:
public virtual IEnumerable<Matter> GetMattersForAutoCompleteByCode(string input)
{
InvalidateCache();
IEnumerable<Matter> results;
//Searching Matter Object on all 4 given parameters by input.
if (_lastMatters != null && input.StartsWith(_lastSearch) && _lastMatters.Any())
{
results = _lastMatters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
_lastMatters = results;
}
else
{
results = _matters.Where(m => m.IsInputLike(input)).OrderBy(m => m.Code);
_lastMatters = results;
}
_lastSearch = input;
return results.Take(10).ToList();
}
Why do you need to order? Why does a dropdown autocomplete need to filter on 4 items? If you only take 10 anyway, can't you just not order? See if removing the OrderBy gives you any better results, especially in the else branch, where you'll have many results.
Personally I'd go all in for LINQ to SQL and let the SQL server do the searching. Optimize the indexing on this table and it'll be much faster.
I'm not much of an asp/http guy but when I see this:
internal bool IsInputLike(string input)
{
//Check to see if the input statement exists in any of the 4 fields
bool check = (Code.ToLower().Contains(input.Trim().ToLower())
|| Name.ToLower().Contains(input.Trim().ToLower())
|| ClientCode.ToLower().Contains(input.Trim().ToLower())
|| ClientName.ToLower().Contains(input.Trim().ToLower()));
return check;
}
I think you are creating a lot of new strings, and that has to take some time. Try this and see if it improves your performance:
var inp = input.Trim();
bool chk = (Code.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
|| (Name.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
|| (ClientCode.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1)
|| (ClientName.IndexOf(inp, StringComparison.CurrentCultureIgnoreCase) > -1);
The first line (which creates inp) trims the input once instead of once per field, and I think it reads better.
The IndexOf method will not create new strings, and with the StringComparison parameter you can avoid creating all the ToLower strings.
Well, I recommend you create a view that concatenates all of the names (Code, Name, ClientCode, ClientName) into a single column, say FullName, and replace your IsInputLike(..) as below:
internal bool IsInputLike(string input)
{
//Check to see if the input statement exists in any of the 4 fields
return FullName.Contains(input);
}
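If the data stays in memory, the same single-column idea can be approximated by building one lower-cased search key per record up front, so that each keystroke costs one Contains call per record instead of four ToLower calls plus four searches. This is only a sketch under assumptions: MatterIndexEntry and SearchKey are hypothetical names, and the newline used as a field separator is assumed never to appear in the user's input.

```csharp
using System;

public class MatterIndexEntry
{
    public string Code { get; private set; }
    public string Name { get; private set; }
    public string ClientCode { get; private set; }
    public string ClientName { get; private set; }

    // All four fields, lower-cased and joined once at construction time.
    public string SearchKey { get; private set; }

    public MatterIndexEntry(string code, string name, string clientCode, string clientName)
    {
        Code = code;
        Name = name;
        ClientCode = clientCode;
        ClientName = clientName;
        // The "\n" separator keeps a match from spanning two adjacent fields.
        SearchKey = string.Join("\n", new[] { code, name, clientCode, clientName })
                          .ToLowerInvariant();
    }

    public bool IsInputLike(string input)
    {
        string needle = (input ?? string.Empty).Trim().ToLowerInvariant();
        return needle.Length > 0 && SearchKey.Contains(needle);
    }
}
```

The trade-off is extra memory for one additional string per record in exchange for less work per keystroke; whether that pays off would have to be measured against the real data.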

Check Linq Result for value

How can I check the results of LINQ query for a specific string value?
I have the following linq query:
IEnumerable<DataRow> rows = searchParamsTable.AsEnumerable()
.Where(r => r.Field<String>("TABLE") == tableNumbers[i].ToString()
&& r.Field<String>("FIELD ") == fieldName[i]);
I want to see if the result of that query contains a string (passed in from a text box) "wildcardSearchString".
var searchRows =
rows.Where(tr => tr.ItemArray
.Any(ti => ti.ToString().IndexOf("wildcardSearchString", StringComparison.CurrentCultureIgnoreCase) >= 0));
This will go through each of the returned rows and check whether "wildcardSearchString" appears in the string representation of any of the row's items (ignoring case). The catch is that this won't give you wildcard search support, so you'll have to handle that yourself. You can try Regex, which requires a slight modification:
string searchPattern = "some*string".Replace("*", ".*");
var searchRows =
rows.Where(tr => tr.ItemArray
.Any(ti => Regex.IsMatch(ti.ToString(), searchPattern)));
Hope that helps. Just be warned that if users supply a Regex pattern of their own, it might really mess up whatever they were searching for, so be careful with the input.
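One way to handle that warning is to escape the user's text before turning the wildcard into a regex, so that characters such as '.' or '(' are matched literally and only '*' keeps a special meaning. A minimal sketch, assuming '*' is the only wildcard you want to support; WildcardSearch and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardSearch
{
    // Converts a user pattern such as "some*string" into a case-insensitive regex.
    public static Regex ToRegex(string wildcardPattern)
    {
        // Regex.Escape turns '*' into "\*" (and escapes other metacharacters),
        // after which only the escaped star is widened to ".*".
        string pattern = Regex.Escape(wildcardPattern).Replace(@"\*", ".*");
        return new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }

    // True when 'value' contains a match for the wildcard pattern anywhere.
    public static bool IsMatch(string value, string wildcardPattern)
    {
        return ToRegex(wildcardPattern).IsMatch(value);
    }
}
```

In the query above, the Regex.IsMatch call would then become WildcardSearch.IsMatch(ti.ToString(), userInput). The match is unanchored, as in the original snippet, so the pattern may occur anywhere in the value.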
Try this code:
DataRow[] array = rows.ToArray();
bool found = array.Contains(yourTextBox.Text, yourIndex);
Add this extension:
public static bool Contains(this DataRow[] dataRows, string value, int index)
{
foreach(var row in dataRows)
{
if(row[index].ToString().Contains(value))
{
return true;
}
}
return false;
}
bool found = false;
foreach (DataRow d in rows)
{
foreach (object o in d.ItemArray)
{
if (o.ToString().Contains("test"))
{
found = true;
break;
}
}
if (found) break; // stop scanning rows once a match has been found
}
Do you mean something like this?
I don't know if you're aware of the built-in search capabilities of a DataTable? You could use its Select method:
DataRow[] rows = searchParamsTable
.Select("TABLE = 'Table1' AND FIELD like '%wildcardSearchString%'");
Linq is OK but not always required :).

Compose LINQ-to-SQL predicates into a single predicate

(An earlier question, Recursively (?) compose LINQ predicates into a single predicate, is similar to this but I actually asked the wrong question... the solution there satisfied the question as posed, but isn't actually what I need. They are different, though. Honest.)
Given the following search text:
"keyword1 keyword2 ... keywordN"
I want to end up with the following SQL:
SELECT [columns] FROM Customer
WHERE (
Customer.Forenames LIKE '%keyword1%'
OR
Customer.Forenames LIKE '%keyword2%'
OR
...
OR
Customer.Forenames LIKE '%keywordN%'
) AND (
Customer.Surname LIKE '%keyword1%'
OR
Customer.Surname LIKE '%keyword2%'
OR
....
OR
Customer.Surname LIKE '%keywordN%'
)
Effectively, we're splitting the search text on spaces, trimming each token, constructing a multi-part OR clause based on each token, and then AND'ing the clauses together.
I'm doing this in Linq-to-SQL, and I have no idea how to dynamically compose a predicate based on an arbitrarily-long list of subpredicates. For a known number of clauses, it's easy to compose the predicates manually:
dataContext.Customers.Where(c =>
(
c.Forenames.Contains("keyword1")
||
c.Forenames.Contains("keyword2")
) && (
c.Surname.Contains("keyword1")
||
c.Surname.Contains("keyword2")
)
);
In short, I need a technique that, given two predicates, will return a single predicate composing the two source predicates with a supplied operator, but restricted to the operators explicitly supported by Linq-to-SQL. Any ideas?
You can use the PredicateBuilder class
IQueryable<Customer> SearchCustomers (params string[] keywords)
{
var predicate = PredicateBuilder.False<Customer>();
foreach (string keyword in keywords)
{
// Note that you *must* declare a variable inside the loop
// otherwise all your lambdas end up referencing whatever
// the value of "keyword" is when they're finally executed.
string temp = keyword;
predicate = predicate.Or (p => p.Forenames.Contains (temp));
}
return dataContext.Customers.Where (predicate);
}
(that's actually the example from the PredicateBuilder page, I just adapted it to your case...)
EDIT:
Actually I misread your question, and my example above only covers a part of the solution... The following method should do what you want :
IQueryable<Customer> SearchCustomers (string[] forenameKeywords, string[] surnameKeywords)
{
var predicate = PredicateBuilder.True<Customer>();
var forenamePredicate = PredicateBuilder.False<Customer>();
foreach (string keyword in forenameKeywords)
{
string temp = keyword;
forenamePredicate = forenamePredicate.Or (p => p.Forenames.Contains (temp));
}
// And is an extension method called on the predicate being extended
predicate = predicate.And (forenamePredicate);
var surnamePredicate = PredicateBuilder.False<Customer>();
foreach (string keyword in surnameKeywords)
{
string temp = keyword;
surnamePredicate = surnamePredicate.Or (p => p.Surname.Contains (temp));
}
predicate = predicate.And (surnamePredicate);
return dataContext.Customers.Where (predicate);
}
You can use it like that:
var query = SearchCustomers(
new[] { "keyword1", "keyword2" },
new[] { "keyword3", "keyword4" });
foreach (var Customer in query)
{
...
}
Normally you would chain invocations of .Where(...). E.g.:
IQueryable<Customer> a = dataContext.Customers;
if (kwd1 != null)
a = a.Where(t => t.Forenames.Contains(kwd1));
if (kwd2 != null)
a = a.Where(t => t.Forenames.Contains(kwd2));
// ...
return a;
LINQ-to-SQL would weld it all back together into a single WHERE clause.
This doesn't work with OR, however. You could use unions and intersections, but I'm not sure whether LINQ-to-SQL (or SQL Server) is clever enough to fold it back to a single WHERE clause. OTOH, it won't matter if performance doesn't suffer. Anyway, it would look something like this:
IQueryable<Customer> ff = null, ss = null;
foreach (var k in keywords) {
if (k != null) {
string kw = k; // copy, so each lambda captures its own keyword
var f = dataContext.Customers.Where(t => t.Forenames.Contains(kw));
ff = ff == null ? f : ff.Union(f);
var s = dataContext.Customers.Where(t => t.Surname.Contains(kw));
ss = ss == null ? s : ss.Union(s);
}
}
return ff.Intersect(ss);
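PredicateBuilder is not part of the .NET base class library, so here is a rough sketch of the same OR-composition written directly against System.Linq.Expressions. It builds one predicate per column from an arbitrary keyword list and combines the columns with chained Where calls, which act as AND. The Customer class and the KeywordPredicates helper are stand-ins invented for this example, and it has only been reasoned through against an in-memory IQueryable; how Linq-to-SQL translates the resulting tree is an assumption, not something verified here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public class Customer
{
    public string Forenames { get; set; }
    public string Surname { get; set; }
}

public static class KeywordPredicates
{
    private static readonly MethodInfo StringContains =
        typeof(string).GetMethod("Contains", new[] { typeof(string) });

    // Builds c => field(c).Contains(k1) || field(c).Contains(k2) || ...
    public static Expression<Func<Customer, bool>> AnyKeywordIn(
        Expression<Func<Customer, string>> field, IEnumerable<string> keywords)
    {
        Expression body = null;
        foreach (string keyword in keywords)
        {
            Expression test = Expression.Call(
                field.Body, StringContains, Expression.Constant(keyword, typeof(string)));
            body = body == null ? test : Expression.OrElse(body, test);
        }
        if (body == null) body = Expression.Constant(false); // no keywords: match nothing
        // Reuse the selector's own parameter so the lambda body stays consistent.
        return Expression.Lambda<Func<Customer, bool>>(body, field.Parameters);
    }

    // (any keyword in Forenames) AND (any keyword in Surname)
    public static IQueryable<Customer> Search(IQueryable<Customer> customers, string searchText)
    {
        string[] keywords = searchText.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return customers
            .Where(AnyKeywordIn(c => c.Forenames, keywords))
            .Where(AnyKeywordIn(c => c.Surname, keywords));
    }
}
```

Because each keyword becomes a constant in the tree, no closure is created inside the loop and the captured-variable pitfall does not arise.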

How can I make this LINQ search method handle more than two terms?

The following search method works fine for up to two terms.
How can I make it dynamic so that it is able to handle any number of search terms?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace TestContains82343
{
class Program
{
static void Main(string[] args)
{
List<string> tasks = new List<string>();
tasks.Add("Add contract to Customer.");
tasks.Add("New contract for customer.");
tasks.Add("Create new contract.");
tasks.Add("Go through the old contracts.");
tasks.Add("Attach files to customers.");
var filteredTasks = SearchListWithSearchPhrase(tasks, "contract customer");
filteredTasks.ForEach(t => Console.WriteLine(t));
Console.ReadLine();
}
public static List<string> SearchListWithSearchPhrase(List<string> tasks, string searchPhrase)
{
string[] parts = searchPhrase.Split(new char[] { ' ' });
List<string> searchTerms = new List<string>();
foreach (string part in parts)
{
searchTerms.Add(part.Trim());
}
switch (searchTerms.Count())
{
case 1:
return (from t in tasks
where t.ToUpper().Contains(searchTerms[0].ToUpper())
select t).ToList();
case 2:
return (from t in tasks
where t.ToUpper().Contains(searchTerms[0].ToUpper()) && t.ToUpper().Contains(searchTerms[1].ToUpper())
select t).ToList();
default:
return null;
}
}
}
}
How about replacing
switch (searchTerms.Count())
{
case 1:
return (from t in tasks
where t.ToUpper().Contains(searchTerms[0].ToUpper())
select t).ToList();
case 2:
return (from t in tasks
where t.ToUpper().Contains(searchTerms[0].ToUpper()) && t.ToUpper().Contains(searchTerms[1].ToUpper())
select t).ToList();
default:
return null;
}
By
(from t in tasks
where searchTerms.All(term => t.ToUpper().Contains(term.ToUpper()))
select t).ToList();
Just call Where repeatedly... I've changed the handling of searchTerms as well to make this slightly more LINQ-y :)
public static List<string> SearchListWithSearchPhrase
(List<string> tasks, string searchPhrase)
{
IEnumerable<string> searchTerms = searchPhrase.Split(' ')
.Select(x => x.Trim().ToUpper()); // upper-case the terms once, to match t.ToUpper() below
IEnumerable<string> query = tasks;
foreach (string term in searchTerms)
{
// See edit below
String captured = term;
query = query.Where(t => t.ToUpper().Contains(captured));
}
return query.ToList();
}
You should note that by default, ToUpper() will be culture-sensitive - there are various caveats about case-insensitive matching :( Have a look at this guidance on MSDN for more details. I'm not sure how much support there is for case-insensitive Contains though :(
EDIT: I like konamiman's answer, although it looks like it's splitting somewhat differently to your original code. All is definitely a useful LINQ operator to know about...
Here's how I would write it though:
return tasks.Where(t => searchTerms.All(term => t.ToUpper().Contains(term)))
.ToList();
(I don't generally use a query expression when it's a single operator applied to the outer query.)
EDIT: Aargh, I can't believe I fell into the captured variable issue :( You need to create a copy of the foreach loop variable as otherwise the closure will always refer to the "current" value of the variable... which will always be the last value by the time ToList is executed :(
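The captured-variable issue is easy to reproduce in isolation. The sketch below uses a for loop on purpose: since C# 5 a foreach loop gives every iteration its own variable, so the copy is only strictly needed with older compilers or with for loops, whose single loop variable is still shared by all lambdas. ClosureDemo and its method names are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClosureDemo
{
    // Every lambda captures the same variable 'i', which is 3 once the loop ends.
    public static List<int> SharedVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            funcs.Add(() => i);
        return funcs.Select(f => f()).ToList(); // 3, 3, 3
    }

    // Each iteration declares a fresh 'copy', so every lambda keeps its own value.
    public static List<int> CopiedVariable()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs.Select(f => f()).ToList(); // 0, 1, 2
    }
}
```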
EDIT: Note that everything so far is inefficient in terms of uppercasing each task several times. That's probably fine in reality, but you could avoid it by using something like this:
var query = tasks.Select
(t => new { Original = t, Upper = t.ToUpper() });
return query.Where(task => searchTerms.All(term => task.Upper.Contains(term)))
.Select(task => task.Original)
.ToList();
Can't test code right now, but you could do something similar to this:
from t in tasks
let taskWords = t.ToUpper().Split(new char[] { ' ' })
where searchTerms.All(term => taskWords.Contains(term.ToUpper()))
select t
Replace the switch statement with a for loop :)
[TestMethod]
public void TestSearch()
{
List<string> tasks = new List<string>
{
"Add contract to Customer.",
"New contract for customer.",
"Create new contract.",
"Go through the old contracts.",
"Attach files to customers."
};
var filteredTasks = SearchListWithSearchPhrase(tasks, "contract customer new");
filteredTasks.ForEach(Console.WriteLine);
}
public static List<string> SearchListWithSearchPhrase(List<string> tasks, string searchPhrase)
{
var query = tasks.AsEnumerable();
foreach (var term in searchPhrase.Split(new[] { ' ' }))
{
string s = term.Trim();
query = query.Where(x => x.IndexOf(s, StringComparison.InvariantCultureIgnoreCase) != -1);
}
return query.ToList();
}
Why not use a foreach and AddRange() after splitting the terms and saving them into a list?
List<ItemsImLookingFor> items = new List<ItemsImLookingFor>();
// search for terms
foreach(string term in searchTerms)
{
// add the items matching this term to the list
// (note: this collects items matching ANY term and may add an item more than once)
items.AddRange(dbOrList.Where(
item => item.Name.ToLower().Contains(term.ToLower())).ToList()
);
}
That should work for any number of terms.
