Adding a search to a LINQ query - C#

I have a basic data table; it can be completely generic for this example, except that it contains a Username column.
I'd like to have a simple textbox and a button that performs a similarity search on the Username field. I know I can use the .Contains() method and it will translate to LIKE in SQL, but is this the correct way of doing this?
Secondly, suppose that I have another item in a many-to-many relationship that I also want to search, in this case Label.
Data
{
    ID,
    Name,
    ...
}
Many
{
    DataID,
    OtherID
}
Other
{
    ID,
    Label
}
I'd eventually like to find all of the Data items with a Label similar to some search clause. Do I again just use .Contains?
I'd then like to sort to get the best matches for Username and Label in the same query; how can the combined likeness of {Username and Label} be sorted?
Edit: How are a LIKE query's results sorted? Is it simply based on the index, a binary "it matches" vs. "it does not match"? I guess I'm not that concerned with a similarity score per se; I was more or less just wondering about the mechanism. It seems as though it's pretty easy to return LIKE queries, but I've always thought that LIKE was a poor choice because it doesn't use indexes in the db. Is this true, and if so, does it matter?

String similarity isn't something SQL can do well. Your best bet may be to find all the matches with the same first two (or three if necessary) characters and then, assuming this is a manageable number, compute the similarity score client-side using the Levenshtein distance or similar (see http://en.wikipedia.org/wiki/Levenshtein_distance).
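A minimal sketch of that approach, assuming a LINQ to SQL or EF context called db with a Users table exposing the Username column (those names, and the search variable, are placeholders):
// Standard Levenshtein edit distance (see the Wikipedia link above).
static int Levenshtein(string a, string b)
{
    var d = new int[a.Length + 1, b.Length + 1];
    for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
    for (int j = 0; j <= b.Length; j++) d[0, j] = j;
    for (int i = 1; i <= a.Length; i++)
        for (int j = 1; j <= b.Length; j++)
        {
            int cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                               d[i - 1, j - 1] + cost);
        }
    return d[a.Length, b.Length];
}

// Narrow the candidates in SQL (StartsWith becomes LIKE 'ab%'), then rank client-side.
var prefix = search.Substring(0, Math.Min(2, search.Length));
var best = db.Users
    .Where(u => u.Username.StartsWith(prefix))
    .ToList()                                      // switch to LINQ to Objects here
    .OrderBy(u => Levenshtein(u.Username, search))
    .Take(10)
    .ToList();
The same pattern should work for the Label column: navigate (or join) through the many-to-many relationship, filter on a prefix in SQL, then score the results in memory.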
Or, if you are feeling brave, you could try something like this: http://anastasiosyal.com/archive/2009/01/11/18.aspx

Related

Return Values That Are In Lowercase

We recently discovered a bug in our system whereby any serial numbers that have been entered in lowercase have not been processed correctly.
To correct this, we need to add a one-off function that will run through the database and re-process all items with lowercase serial numbers.
In LINQ, is there a query I can run that will return a list of such items?
Note: I am not asking how to convert lowercase to uppercase or the reverse, which is all Google will return. I need to generate a list of all database entries where the serial number has been entered in lowercase.
EDIT: I am using LINQ to SQL against MS SQL Server, which appears to be case-insensitive.
Yes, there is. For a single serial number you can check it like this:
var result = serialnumber.Any(c => char.IsLower(c)); // true if any character is lowercase
[EDIT]
Well, in the case of LINQ to Entities...
As stated here: Regex in Linq (EntityFramework), String processing in database, there are a few ways to work around it.
Change the database table structure, e.g. create a table Foo_Filter which links your entities to filters, and then create a table Filters
which contains the filter data.
Execute the query in memory and use LINQ to Objects. This option will be slow, because you have to fetch all the data from the database into memory.
Note: the link to the MSDN documentation has been added by me.
For example:
var result = context.Serials.ToList().Where(sn => sn.Any(c => char.IsLower(c))); // ToList() pulls every row into memory before filtering
Another way is to use the SqlMethods.Like method (LINQ to SQL only).
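For example (SqlMethods lives in System.Data.Linq.SqlClient; db.Items and SerialNumber are placeholder names). Note that LIKE only distinguishes case if the column's collation is case-sensitive, so on a case-insensitive database this alone won't isolate lowercase entries:
var withLowercase = db.Items
    .Where(i => SqlMethods.Like(i.SerialNumber, "%[a-z]%"))  // translated to a SQL LIKE with a character-range pattern
    .ToList();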
Finally, I'd strongly recommend reading this: Case sensitive search using Entity Framework and Custom Annotation

Most efficient way to sum data in C#

I am trying to create a friendly report summing enrollment for the number of students by time of day. I initially started with loops for campus name, then time, then day, but it was extremely inefficient and slow. I decided to take another approach, select all the data I need in one query, and organize it using C#.
Raw Data View (screenshot)
My problem is that I am not sure whether to put this into arrays, lists, a dictionary, or a DataTable to sum the enrollment and organize it as seen below (mockup, not calculated). Any guidance would be appreciated.
Friendly View (screenshot)
Well, if you only need to show the user some data (and not edit it), you may want to create a report.
Otherwise, if you only need sums, you could get all the data into an IEnumerable and call .Sum(). And as pointed out by colinsmith, you can use LINQ in parallel (see the sketch below).
But one thing is definite: if you have a lot of data, you don't want to do many queries. You could either use a sum query in SQL (if the data is stored in a database) or do the sum over a collection you've fetched.
You don't want to fetch the data in a loop. Processing data in memory is far faster than querying the database multiple times and then processing the results.
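A rough sketch of the Sum()/parallel idea, assuming the data has already been fetched into a collection (the collection and property names are hypothetical):
// Plain LINQ to Objects sum over an already-fetched collection.
int total = enrollments.Sum(e => e.StudentCount);

// PLINQ version; only worth it for large in-memory collections.
int totalParallel = enrollments.AsParallel().Sum(e => e.StudentCount);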
Normally I would advise you to do this in the database, i.e. a SELECT using GROUP BY etc., but I'm having a bit of trouble figuring out how your first picture relates to the second with regard to the days, so I can't offer an example.
You could of course do this in C# as well using LINQ to Objects, but I would first try to solve it in the DB; you are better off performance- and bandwidth-wise that way.
I am not quite sure exactly what you are after, but from my understanding I would suggest you create a class to represent an enrollment:
public class Enrollment
{
    public string CampusName { set; get; }
    public DateTime DateEnrolled { set; get; }
}
and get all the enrollment details from the database into a collection of this class:
List<Enrollment> enrollments = db.GetEnrollments();
Now you can run many operations on this collection to get the data you want.
For example, if you want all enrollments that happened on Fridays:
var fridaysEnrollment = enrollments
    .Where(x => x.DateEnrolled.DayOfWeek == DayOfWeek.Friday).ToList();
If you want the count of those Friday enrollments at the AA campus:
var fridayCount = fridaysEnrollment.Count(d => d.CampusName == "AA");
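To get from that collection to something like the friendly view, a grouping along these lines could work; the grouping keys and the per-day counts are my guess at what the summary needs:
var summary = enrollments
    .GroupBy(e => new { e.CampusName, e.DateEnrolled.Hour })
    .Select(g => new
    {
        Campus  = g.Key.CampusName,
        Hour    = g.Key.Hour,
        Monday  = g.Count(e => e.DateEnrolled.DayOfWeek == DayOfWeek.Monday),
        Tuesday = g.Count(e => e.DateEnrolled.DayOfWeek == DayOfWeek.Tuesday),
        Friday  = g.Count(e => e.DateEnrolled.DayOfWeek == DayOfWeek.Friday)
    })
    .OrderBy(x => x.Campus).ThenBy(x => x.Hour)
    .ToList();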
something like
select campusname, ssrmeet_begin_time, count(ssrmeet_monday), count(ssrmeet_tue_day) ..
from the_table
group by campusname, ssrmeet_begin_time
order by campusname, ssrmeet_begin_time
/
should be close to what you want. The count only counts the values, not the NULLs. It is also far faster than first fetching all the data to the client; let the database do the analysis for you, it already has all the data.
BTW: instead of those pics, it is smarter to give some DDL and INSERT statements with data to work with. That would invite more people to help answer the question.

How scalable is using .Contains for searching and auto-complete search in asp.net MVc web applications

I have found a lot of tutorials and books that implement auto-complete search in MVC web applications like this:
public ActionResult ArtistSearch(string q)
{
    var artists = GetArtists(q);
    return PartialView(artists);
}

private List<Artist> GetArtists(string searchString)
{
    return storeDB.Artists
        .Where(a => a.Name.Contains(searchString))
        .ToList();
}
But this raises the question of how scalable this approach is in real applications that might have thousands of records. Will using Contains() scale well, or is there a much better approach?
If I remember correctly, string.Contains() is translated into a LIKE query with a wildcard on each side of the search string. This makes it very difficult/impossible to use an index, so you can expect performance to be O(n) on your dataset, since SQL Server does a full table scan (see Does SQL Server optimize LIKE ('%%') query?).
To optimize your query, you might want to take a look at SQL Server's full-text indexing capabilities; more info here: SQL Server: how to optimize "like" queries?
If you can use .StartsWith instead of .Contains, you will get a LIKE query with a wildcard only at the end, and an index on the queried column can be used for fast lookups (be sure to check the query execution plan!).
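For example, reusing the names from the question's code:
private List<Artist> GetArtists(string searchString)
{
    // Name.StartsWith(...) becomes LIKE 'abc%', which can seek on an index over Name.
    return storeDB.Artists
        .Where(a => a.Name.StartsWith(searchString))
        .ToList();
}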
I guess you will get much better perceived performance if you focus on the UX of your auto-complete feature: start the auto-complete search after a short delay (when the user stops typing), and make sure it doesn't block the UI (runs in the background).
It depends on the data you're serving. In a "real world" application, you need to think about noise/stop words ("and", "the"), abbreviations ("st" for "street"), etc., and about handling different languages.
In these scenarios .Contains won't fit the bill, and you'll need to employ a full-text search engine such as Lucene.NET or SQL Server Full-Text Search.
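Purely as an illustration, if storeDB is an EF DbContext and a SQL Server full-text index already exists on Artists.Name (both assumptions), a full-text query could be issued as raw SQL:
// Requires: using System.Data.SqlClient; and a full-text index on Artists(Name).
var artists = storeDB.Database
    .SqlQuery<Artist>(
        "SELECT * FROM Artists WHERE CONTAINS(Name, @p0)",
        new SqlParameter("@p0", "\"" + searchString + "*\""))   // prefix term, e.g. "beat*"
    .ToList();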
Substring queries cannot use index seeks, but they can still use indexes (as scans). Scanning a few thousand records is really nothing if you don't do it too often.
So I recommend that you create an index on Name and you'll be fine for an awful lot of data.
It looks like you are using LINQ with the Entity Framework. LINQ gets converted into SQL, and the call to Contains gets converted into a LIKE clause in the WHERE, so you can simply run SELECT * FROM Artists WHERE Name LIKE '%whatever%' to get an idea of the performance.
Note there are a couple of things you can do to reduce the impact. First, you can limit the number of results with .Take(20). Also, you can wait until the user has typed at least a couple of characters before triggering auto-complete. Finally, you can 'throttle' the call to auto-complete so that you don't call it every time they type a character; instead, wait until they go, say, half a second without typing an additional character.
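Putting those suggestions together, a sketch (the 2-character minimum and the page size of 20 are arbitrary; the throttling itself lives in the client-side script that calls this action):
public ActionResult ArtistSearch(string q)
{
    // Don't hit the database for empty or very short inputs.
    if (string.IsNullOrWhiteSpace(q) || q.Length < 2)
        return PartialView(new List<Artist>());

    var artists = storeDB.Artists
        .Where(a => a.Name.Contains(q))
        .OrderBy(a => a.Name)
        .Take(20)               // cap the size of the auto-complete list
        .ToList();

    return PartialView(artists);
}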

Dynamically adding select fields using LINQ lambda

Let's say we have an expression:
var prices = from p in PriceDB.Prices
             where p.TypeID == 12
             orderby p.PriceType.Title
             select p;
Is it possible to modify the select list?
I imagine it looking something like this:
var newPriceList = prices.Select( p => p.ExchangeRate );
This may be an odd request, but in my code (which is too long and complex to post here) I want to conditionally add fields to be output depending on a CheckBoxList.
I assume, of course, that I'm trying to go about this the wrong way...
I imagine it looking something like this:
Actually it would look exactly like that. First, build a query selecting the entire record. Then add a projection (using the Select() method seems the easiest way) to limit the selection. LINQ to SQL will sort out the two selects and use the proper result, so there's just one SELECT in the final SQL.
There's no really good way to choose between multiple selects. I would probably use a switch/case.
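A sketch of that, assuming each branch projects to the same result type so the variable stays strongly typed; Price, BasePrice and selectedField are hypothetical names, while ExchangeRate comes from the question:
IQueryable<Price> prices = PriceDB.Prices
    .Where(p => p.TypeID == 12)
    .OrderBy(p => p.PriceType.Title);

IQueryable<decimal> values;
switch (selectedField)                    // e.g. the value ticked in the CheckBoxList
{
    case "ExchangeRate":
        values = prices.Select(p => p.ExchangeRate);
        break;
    case "BasePrice":                     // hypothetical column
        values = prices.Select(p => p.BasePrice);
        break;
    default:
        throw new ArgumentException("Unknown field: " + selectedField);
}

var result = values.ToList();             // one SELECT containing only the chosen column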
While you could go down the dynamic route, I would strongly consider not doing so. What is the cost of fetching the extra values if you don't need them, in your particular case? Is the problem that they're being displayed dynamically and you only want them displayed in certain cases? If so, I'd suggest modifying the display code somehow.
It's hard to stay strongly typed (which has various advantages) while being dynamic in terms of what you fetch. Of course, if you always want to fetch the same "shape" of data (e.g. always just a decimal value from each row) then that's reasonably easy - let me know if that's something you'd like to see demonstrated.
If you could tell us more about your problem, we may be able to suggest alternative solutions.
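For the "same shape" case, a hedged sketch that stays strongly typed by mapping field names to selector expressions (requires System.Collections.Generic and System.Linq.Expressions; the dictionary contents and chosenField are hypothetical):
var selectors = new Dictionary<string, Expression<Func<Price, decimal>>>
{
    { "ExchangeRate", p => p.ExchangeRate },
    { "BasePrice",    p => p.BasePrice }    // hypothetical column
};

// prices is the IQueryable<Price> built earlier; the chosen expression is
// translated into the final SQL just like an inline lambda would be.
var newPriceList = prices.Select(selectors[chosenField]).ToList();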
If I understood you correctly, this explains how to build dynamic queries:
http://weblogs.asp.net/scottgu/archive/2008/01/07/dynamic-linq-part-1-using-the-linq-dynamic-query-library.aspx
You might want to look at this: Dynamic LINQ and Dynamic Lambda expressions?
Or the Dynamic Expression API (System.Linq.Dynamic).
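With the Dynamic LINQ sample library (System.Linq.Dynamic) referenced, the projection itself can be built from strings; a rough sketch, where the contents of checkedFields are an assumption about what the CheckBoxList yields:
// checkedFields might be { "ExchangeRate", "TypeID" }, taken from the CheckBoxList.
string projection = "new (" + string.Join(", ", checkedFields) + ")";

IQueryable results = PriceDB.Prices
    .Where("TypeID == 12")
    .OrderBy("PriceType.Title")
    .Select(projection);          // returns an IQueryable of a dynamically generated class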

Is this an efficient way of doing a linq query with dynamic order by fields?

I have the following method to apply a sort to a list of objects (simplified for the example):
private IEnumerable<Something> SetupOrderSort(IEnumerable<Something> input,
                                              SORT_TYPE sort)
{
    IOrderedEnumerable<Something> output = input.OrderBy(s => s.FieldA)
                                                .ThenBy(s => s.FieldB);
    switch (sort)
    {
        case SORT_TYPE.FIELD1:
            output = output.ThenBy(s => s.Field1);
            break;
        case SORT_TYPE.FIELD2:
            output = output.ThenBy(s => s.Field2);
            break;
        case SORT_TYPE.UNDEFINED:
            break;
    }
    return output.ThenBy(s => s.FieldC)
                 .ThenBy(s => s.FieldD)
                 .AsEnumerable();
}
What I need is to be able to insert a specific field in the middle of the order-by chain. By default the ordering is: FieldA, FieldB, FieldC, FieldD.
When a sort field is specified, though, I need to insert it between FieldB and FieldC in the sort order.
Currently there are only two possible fields to sort by, but there could be up to eight. Performance-wise, is this a good approach? Is there a more efficient way of doing this?
EDIT: I saw the following thread as well: Dynamic LINQ OrderBy on IEnumerable<T>, but I thought it was overkill for what I needed. This is a snippet of code that executes a lot, so I just want to make sure I am not missing something that could easily be done better.
Don't try to "optimize" code you haven't proved slow with a profiler.
It's highly unlikely that this will be slow enough to notice. I strongly suspect the overhead of actually sorting the list is far higher than that of a single switch statement.
The important question is: is this code maintainable? Will you forget to add another case the next time you add a property to Something? If that will be a problem, consider using the MS Dynamic Query sample from the VS 2008 C# samples page.
Otherwise, you're fine.
There's nothing inefficient about your method, but there is something unintuitive about it, which is that you can't sort by multiple columns - something that end users are almost sure to want to do.
I might hand-wave this concern away on the chance that both columns are unique, but the fact that you subsequently hard-code in another sort at the end leads me to believe that Field1 and Field2 are neither related nor unique, in which case you really should consider the possibility of having an arbitrary number of levels of sorting, perhaps by accepting an IEnumerable<SORT_TYPE> or params SORT_TYPE[] argument instead of a single SORT_TYPE.
Anyway, as far as performance goes, the OrderBy and ThenBy extensions have deferred execution, so each successive ThenBy in your code is probably no more than a few CPU instructions, it's just wrapping one function in another. It will be fine; the actual sorting will be far more expensive.
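A sketch of that suggestion: map each SORT_TYPE to a key selector once and accept any number of sort levels. The SortSelectors dictionary is the only new piece; boxing the keys as object is a simplification that works as long as each field type is comparable:
private static readonly Dictionary<SORT_TYPE, Func<Something, object>> SortSelectors =
    new Dictionary<SORT_TYPE, Func<Something, object>>
    {
        { SORT_TYPE.FIELD1, s => s.Field1 },
        { SORT_TYPE.FIELD2, s => s.Field2 }
        // add a line here when Something grows a new sortable property
    };

private IEnumerable<Something> SetupOrderSort(IEnumerable<Something> input,
                                              params SORT_TYPE[] sorts)
{
    IOrderedEnumerable<Something> output = input.OrderBy(s => s.FieldA)
                                                .ThenBy(s => s.FieldB);
    foreach (var sort in sorts)
    {
        Func<Something, object> selector;
        if (SortSelectors.TryGetValue(sort, out selector))
            output = output.ThenBy(selector);
    }
    return output.ThenBy(s => s.FieldC).ThenBy(s => s.FieldD);
}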
