I have some data coming in from a webpage and I need to filter it based on what comes back.
I have a predefined set of keywords that I want to search for, around 30.
What is the most efficient way to match them up? Since I can have ~2000 records coming in, I don't think searching through a list/array/switch-case for every record is very efficient, right?
Besides list/array/switch-case, the only thing I can think of is Linq.
List<string> found = (from str in listOfStringsToSearch
where listOfKeywords.Any(keyword => str.Contains(keyword))
select str).ToList<string>();
If you just want to know which search terms exactly match one of the strings, you can use Enumerable.Intersect:
var both = records.Intersect(searchTerms);
It uses deferred execution, so it does not create a new collection and is not executed until you consume it in some way (e.g. ToList, foreach or string.Join).
It uses a set internally, so it is very efficient.
Here is more information on set operations in LINQ:
http://msdn.microsoft.com/en-us/library/bb546153.aspx
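A minimal sketch of how the two approaches above differ, written as a C# top-level program. The sample data and the names keywords and records are made up for illustration. Intersect compares whole strings for equality, so it only helps when a record is itself a keyword; finding keywords inside longer records still needs the Any/Contains form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data, not taken from the question.
var keywords = new List<string> { "error", "timeout", "retry" };
var records = new List<string> { "timeout", "disk error on node 3", "all good", "retry" };

// Exact matches: Intersect builds a set internally, so each lookup is cheap.
List<string> exactMatches = records.Intersect(keywords).ToList();

// Substring matches: every record is checked against the keywords until one hits.
List<string> substringMatches = records
    .Where(record => keywords.Any(keyword => record.Contains(keyword)))
    .ToList();

Console.WriteLine(string.Join(", ", exactMatches));     // timeout, retry
Console.WriteLine(string.Join(", ", substringMatches)); // timeout, disk error on node 3, retry
```

With about 30 keywords and 2000 records, the substring version does at most 60,000 Contains calls, which is usually fast enough in memory.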
I will start working on Xamarin shortly and will be porting a lot of code from Android Studio's Java to C#.
In Java I am using custom classes which are given arguments, conditions etc., convert them to SQL statements and then load the results into the objects in the project's model.
What I am unsure of is whether LINQ is a better option for filtering such data.
For example, what happens currently is something along these lines:
List<Customer> customers = (new CustomerDAO()).get_all();
Or if I have a condition
List<Customer> customers = (new CustomerDAO()).get(new Condition(CustomerDAO.Code, equals, "code1"));
Now let us assume I have transferred the classes to C# and I wish to do something similar to the second case.
So I will probably write something along the lines of:
var customers = from customer
in (new CustomerDAO()).get_all()
where customer.code.equals("code1")
select customer
I know that the query will only be executed when I actually try to access customers, but if I have multiple accesses to customers ( let us say that I use 4 foreach loops later on) will the get_all method be called 4 times? or are the results stored at the first execution?
Also is it more efficient (time wise because memory wise it is probably not) to just keep the get_all() method and use linq to filter the results? Or use my existing setup which in effect executes
Select * from Customers where code = 'code1'
And loads the results to an object?
Thanks in advance for any help you can provide
Edit: yes I do know there is sqlite.net which pretty much does what my daos do but probably better, and at some point I will probably convert all my objects to use it, I just need to know for the sake of knowing
if I have multiple accesses to customers ( let
us say that I use 4 foreach loops later on) will the get_all method be
called 4 times? or are the results stored at the first execution?
Each time you enumerate the query (using foreach in your example), it will re-execute, unless you store the materialized result somewhere. For example, if the first time you did:
var customerSource = new CustomerDAO();
List<Customer> customers = customerSource.Where(customer => customer.Code.Equals("code1")).ToList();
then you would be working with an in-memory List<Customer> without executing the query again.
On the contrary, if each time you did:
var filteredCustomers = customerSource.Where(customer => customer.Code.Equals("code1"));
foreach (var customer in filteredCustomers)
{
// Do stuff
}
then for each enumeration you would be executing the query again.
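The difference can be sketched with a counter, as a C# top-level program. GetAllCodes and loadCount are hypothetical stand-ins for a lazy data source, not part of the asker's DAO. The sketch assumes the source itself is lazy (an iterator here, or an IQueryable in the database case); if get_all() returns an already filled list, it is only called once when the query is built, and just the Where filter reruns on each enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int loadCount = 0;

// Hypothetical lazy source: the body runs each time enumeration starts.
IEnumerable<string> GetAllCodes()
{
    loadCount++;
    yield return "code1";
    yield return "code2";
    yield return "code1";
}

// Deferred: building the query does not read the source yet.
IEnumerable<string> query = GetAllCodes().Where(code => code == "code1");
foreach (var code in query) { /* use code */ }
foreach (var code in query) { /* use code */ }
int loadsWhenDeferred = loadCount;

// Materialized: ToList reads the source once, later loops use the list.
loadCount = 0;
List<string> materialized = GetAllCodes().Where(code => code == "code1").ToList();
foreach (var code in materialized) { /* use code */ }
foreach (var code in materialized) { /* use code */ }
int loadsWhenMaterialized = loadCount;

Console.WriteLine(loadsWhenDeferred);     // 2
Console.WriteLine(loadsWhenMaterialized); // 1
```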
Also is it more efficient (time wise because memory wise it is
probably not) to just keep the get_all() method and use linq to filter
the results? Or use my existing setup which in effect executes
That really depends on your use case. Let's imagine you were using LINQ to EF and the customer table has a million rows: do you really want to bring all of them into memory and only then filter them down to the subset you need? It would usually be better to run the fully filtered query against the database.
In Kentico the standard way to get documents is below (which I believe is based on ObjectQuery and has LINQ commands). I'm trying to filter it by one more field, "newsCategory", which contains data like "1|2|3". So I can't add .Search("newsCategory", 1) etc. because I need to split the list before I can search it. What direction should I be looking in? A select sub-query? (I'm new to LINQ.)
// Get documents
var news = DocumentHelper.GetDocuments("CMS.News")
.OnSite("CorporateSite")
.Path("/News", PathTypeEnum.Children)
.Culture("en-us")
.CombineWithDefaultCulture(false);
Since this is a field from the coupled table, you can't access it through a property, but have to use GetValue() instead. Once you've got it, you can work with it like a regular string:
var news = DocumentHelper.GetDocuments("CMS.News")
.OnSite("CorporateSite")
.Path("/News", PathTypeEnum.Children)
.Culture("en-us")
.CombineWithDefaultCulture(false)
.Where(d => d.GetStringValue("newsCategory","").Split('|').Contains("1"));
Are you sure your data is 1|2|3 and not 1|2|3| or |1|2|3 ?
If it is, you could do .Where("NewsCategory", QueryOperator.Like, "%" + id + "|%")
Otherwise you may have to get back more results, and then loop through them and split the values to find the exact one you want.
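A minimal sketch of that fallback, in plain LINQ to Objects rather than the Kentico API. It assumes the field values have already been loaded into memory; the sample values and variable names are made up. It shows why splitting matters: a plain substring check for "1" also hits "11" and "12".

```csharp
using System;
using System.Linq;

// Hypothetical values of a pipe-delimited category field.
var categoryFields = new[] { "1|2|3", "11|12", "2|11", "1" };
string wanted = "1";

// Substring check: matches any field containing the character "1".
var substringHits = categoryFields.Where(field => field.Contains(wanted)).ToList();

// Split check: compares whole tokens only.
var exactHits = categoryFields
    .Where(field => field.Split('|').Contains(wanted))
    .ToList();

Console.WriteLine(substringHits.Count);          // 4
Console.WriteLine(string.Join(", ", exactHits)); // 1|2|3, 1
```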
EDIT: Check out this article that shows some more advanced where commands you can use with the Data Query API. You should be able to MacGyver a proper filter with those options.
I believe you're looking for:
.WhereLike("DocumentCategoryID", "CategoryID");
//OR
.WhereLike("DocumentCategory","CategoryName");
I don't have v8 installed to double check which exact key/value pair to filter by, but according to this Document Query API article you filter document sets with the WhereLike() method.
According to the API documentation, GetDocuments() returns a MultiDocumentQuery object. I'm not 100% certain if that implements IEnumerable, so you may not even be able to use LINQ with it.
I believe something like this would work. There is a WhereIn method that should be able to pull the value out. I'm not exactly sure how it would handle the scenario of having a 1 and then an 11, but it may be worth looking into.
// Get documents
var news = DocumentHelper.GetDocuments("CMS.News")
.OnSite("CorporateSite")
.Path("/News", PathTypeEnum.Children)
.Culture("en-us")
.CombineWithDefaultCulture(false)
.WhereIn("NewsCategory",1);
I have a database of strings that contain IDs. I need to pass that list into a LINQ query so I can pull the correct records.
model.SelectedDealers = db.dealers.Any(a => a.sdealer_name.Contains(UserToEdit.UserViewAccesses.Select(s => s.ViewReferenceNumber)));
SelectedDealers is of type dealers
ViewReferenceNumber should be a list of strings which should match sdealer_name
So essentially I am trying to find all of the dealers whose sdealer_name matches the list of sdealer_names I have in my UserToEdit.UserViewAccesses
I've tried moving parts around and switching them in different spots and can't seem to figure this out.
Any() just returns a boolean indicating whether there are any results. It doesn't actually return the results.
If I understand what you are after correctly, then this might work:
var dealerNames = UserToEdit.UserViewAccesses.Select(s => s.ViewReferenceNumber).ToList();
model.SelectedDealers = db.dealers.Where(a => dealerNames.Contains(a.sdealer_name));
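The difference between Any() and Where() can be sketched in memory as a C# top-level program. The tuples and the allowedNames set are hypothetical stand-ins for db.dealers and the user's reference numbers, not the real entities.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the dealers table and the allowed names.
var dealers = new[]
{
    (Id: 1, Name: "North Motors"),
    (Id: 2, Name: "South Autos"),
    (Id: 3, Name: "East Cars"),
};
var allowedNames = new HashSet<string> { "North Motors", "East Cars" };

// Any answers "is there at least one match?" with a bool.
bool anyMatch = dealers.Any(dealer => allowedNames.Contains(dealer.Name));

// Where returns the matching dealers themselves.
var matchingDealers = dealers
    .Where(dealer => allowedNames.Contains(dealer.Name))
    .ToList();

Console.WriteLine(anyMatch);                                           // True
Console.WriteLine(string.Join(", ", matchingDealers.Select(d => d.Id))); // 1, 3
```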
So essentially I am trying to find all of the dealers whose
sdealer_name matches the list of sdealer_names I have in my
UserToEdit.UserViewAccesses
var dealersthatmatched = (from d in UserToEdit.UserViewAccesses.sdealer_names
where d == sdealer_name
select d).ToList();
Wish I could have made a comment instead, but as I don't have enough rep I can't. I wish I understood the requirement better, but you seem ready and able to try stuff so perhaps you find this useful.
I have two IList<CustomObject>, where CustomObject has a Name property that's a string. Call the first one set, and the second one subset. set contains a list of things that I just displayed to the user in a multiselect list box. The ones the user selected have been placed in subset (so subset is guaranteed to be a subset of set, hence the clever names ;) )
What is the most straightforward way to generate a third IList<CustomObject>, inverseSubset, containing all the CustomObjects the user DIDN'T select, from these two sets?
I've been trying LINQ things like this
IEnumerable<CustomObject> inverseSubset = set.Select<CustomObject,CustomObject>(
sp => !subset.ConvertAll<string>(p => p.Name).Contains(sp.Name));
...based on answers to vaguely similar questions, but so far nothing is even compiling, much less working :P
Use the LINQ Except for this:
Produces the set difference of two sequences.
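A minimal sketch of Except, as a C# top-level program. It uses tuples as a hypothetical stand-in for CustomObject. Tuples compare by value; for a class that does not override Equals, Except compares by reference, which still works when subset holds the very same instances as set. Otherwise you would pass an IEqualityComparer that compares Name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CustomObject: a tuple with a Name.
var set = new List<(int Id, string Name)>
{
    (1, "Alpha"), (2, "Beta"), (3, "Gamma"), (4, "Delta"),
};

// The user's selection reuses items from set.
var subset = new List<(int Id, string Name)> { set[1], set[3] };

// Everything in set that is not in subset, in the original order.
IList<(int Id, string Name)> inverseSubset = set.Except(subset).ToList();

Console.WriteLine(string.Join(", ", inverseSubset.Select(item => item.Name))); // Alpha, Gamma
```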
Aha, too much SQL recently - I didn't want Select, I wanted Where:
List<string> subsetNames = subset.ConvertAll<string>(p => p.Name);
IEnumerable<CustomObject> inverseSubset =
set.Where<CustomObject>(p => !subsetNames.Contains(p.Name));
Let's say we have an expression:
var prices = from p in PriceDB.Prices
where p.TypeID == 12
orderby p.PriceType.Title
select p;
Is it possible to modify the select list?
I imagine it looking something like this:
var newPriceList = prices.Select( p => p.ExchangeRate );
This may be an odd request, but in my code (which is too long and complex to post here) I want to conditionally add fields to be output depending on a CheckBoxList.
I assume, of course, that I'm trying to go about this the wrong way...
I imagine it looking something like this:
Actually it would look exactly like that. First, build a query selecting the entire record. Then add a select (using the Select() method seems the easiest way) to limit the selection. LINQ to SQL will sort out the two selects and use the proper result, so there's just one select in the final SQL.
There's no really good way to choose between multiple selects. I would probably use a switch/case.
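One way the switch could look, sketched in memory as a C# top-level program. The tuple rows, the Amount field and the Project helper are invented for illustration; only TypeID and ExchangeRate appear in the question. It assumes every choice yields the same type (decimal here). With LINQ to SQL the source would be an IQueryable and the helper would return IQueryable<decimal> instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical rows standing in for PriceDB.Prices.
var prices = new[]
{
    (TypeID: 12, Amount: 10.0m, ExchangeRate: 1.1m),
    (TypeID: 12, Amount: 20.0m, ExchangeRate: 0.9m),
    (TypeID: 7, Amount: 30.0m, ExchangeRate: 1.0m),
};

// Build the query selecting whole records first.
var query = prices.Where(p => p.TypeID == 12);

// Choose the projection with a switch; every branch returns decimals.
IEnumerable<decimal> Project(string field)
{
    switch (field)
    {
        case "ExchangeRate": return query.Select(p => p.ExchangeRate);
        case "Amount": return query.Select(p => p.Amount);
        default: throw new ArgumentException("Unknown field: " + field);
    }
}

List<decimal> rates = Project("ExchangeRate").ToList();
List<decimal> amounts = Project("Amount").ToList();

Console.WriteLine(rates.Count);   // 2
Console.WriteLine(amounts.Sum()); // 30.0
```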
While you could go down the dynamic route, I would strongly consider not doing so. What is the cost of fetching the extra values if you don't need them, in your particular case? Is the problem that they're being displayed dynamically and you only want them displayed in certain cases? If so, I'd suggest modifying the display code somehow.
It's hard to stay strongly typed (which has various advantages) while being dynamic in terms of what you fetch. Of course, if you always want to fetch the same "shape" of data (e.g. always just a decimal value from each row) then that's reasonably easy - let me know if that's something you'd like to see demonstrated.
If you could tell us more about your problem, we may be able to suggest alternative solutions.
If I understood you correctly, this explains how to build dynamic queries:
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
You might want to look at this Dynamic LINQ and Dynamic Lambda expressions?
Or the Dynamic Expression API (System.Linq.Dynamic).