MVC: Efficient dynamic EF database calls - C#

I'm working on an MVC app. A lot of the views are search-engine-like, allowing the user to select parameters to get the data they want.
I am looking for an efficient way to make dynamic calls to the database so that I retrieve only the data that was asked for, instead of pulling a large result set and filtering it afterwards.
So far I have been using Dynamic LINQ, but I have had a lot of trouble working with it and I was wondering whether there is anything better and less troublesome. The only constraints are that I know the fields of the table I'm querying, and I need to be able to use operators like >, < or =.
EDIT
Here's a table example based on my app:
CREATE TABLE CARD
(
CARD_IDE INT NOT NULL IDENTITY,
CARD_NAME VARCHAR(50) NOT NULL,
CARD_NUMBER NUMERIC(4) NOT NULL,
CARD_COLOR VARCHAR(10),
CARD_MANA_COST VARCHAR(30),
CARD_MANA_CONVT VARCHAR(3),
CARD_TYPE VARCHAR(50),
CARD_POWER VARCHAR(2),
CARD_TOUGH VARCHAR(2),
CARD_RARTY VARCHAR(1) NOT NULL,
CARD_TEXT_ABILT VARCHAR(800),
CARD_TEXT_FLAVR VARCHAR(800),
CARD_ARTST_NAME VARCHAR(100),
CARD_SET_IDE INT NOT NULL,
CARD_FLAG_FACE INT NOT NULL DEFAULT 0,
CARD_CHILD_IDE INT,
CARD_MASTER_IDE INT,
CARD_UNIT_COST NUMERIC(5,2) NOT NULL DEFAULT 0
)
And a few examples:
A user looks for any card whose type is "Creature" (string), whose number is 3 and whose card set IDE is 6;
Any card which contains the word "Rat";
All cards of color Blue and White whose unit cost is higher than 3.00;
Any card whose power is less than 3 but higher than 1.
EDIT 2
After much research (and thanks to Chris Pratt below), I've managed to dig a bit deeper and try something.
Based on this:
First I create my object context like this:
var objectContext = ((IObjectContextAdapter) mDb).ObjectContext;
Then I create the ObjectSet:
ObjectSet<CARD> priceList = objectContext.CreateObjectSet<CARD>();
Then I check whether any values have been chosen by the user:
if (keyValuePair.Key == CARDNAME)
{
    queryToLoad = TextBuilder.BuildQueryStringForTextboxValue(keyValuePair.Value);
    //valuesToUse.Add("CARD_NAME.Contains(\"" + queryToLoad + "\")");
    priceList = priceList.Where(_item => _item.CARD_NAME.Contains(queryToLoad)) as ObjectSet<CARD>;
}
Here queryToLoad is the value to look for. For example, if my user searches for an Angel, queryToLoad will be "Angel". I'm trying to get to the result without having to rewrite my whole code.
And then I gather the result in a List like this:
listToReturn.AddRange(priceList.ToList());
HOWEVER: I have a problem using this approach. As soon as the line priceList = priceList.Where(_item => _item.CARD_NAME.Contains(queryToLoad)) as ObjectSet<CARD>; is executed, the value is always null and I don't know why.

There is no way to optimize something that is inherently dynamic in nature. The only thing you can do in your app is feed whatever filters the end-user chooses into a Where clause and let Entity Framework fetch the result from your database in the most efficient way it deems fit.
However, there are some things you can potentially do if there are certain known constraints. At the very least, if you know which fields will be searched on, you can add proper indexes to your database so that searches on those particular fields are optimized. You can reach a point of over-optimization, though (if you index every field, you might as well not have an index). It's usually better to monitor the queries the database handles and add indexes for the most-used fields based on real-world user behavior.
You can also investigate using stored procedures for your queries. Depending on the complexity and number of filters being applied, creating a stored procedure to handle the queries could be difficult, but if you can condense the logic enough to make it possible, a stored procedure can make the queries significantly more efficient.
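For illustration only, calling such a stored procedure from EF could look roughly like the sketch below. It assumes your DbContext is the mDb from your edit; the procedure name dbo.SearchCards and its nullable parameters are hypothetical placeholders for whatever filters you expose, with cardType, cardNumber and cardSetIde standing in for the user-supplied values:
// Requires: using System; using System.Linq;
//           using System.Collections.Generic; using System.Data.SqlClient;
List<CARD> cards = mDb.Database.SqlQuery<CARD>(
    "EXEC dbo.SearchCards @cardType, @cardNumber, @cardSetIde",
    new SqlParameter("@cardType", (object)cardType ?? DBNull.Value),
    new SqlParameter("@cardNumber", (object)cardNumber ?? DBNull.Value),
    new SqlParameter("@cardSetIde", (object)cardSetIde ?? DBNull.Value))
    .ToList();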
UPDATE
Here's what I mean by building a query. I'll take your first use case scenario above. You have three filters: type, number, and IDE. The user may specify any one, any two, all three, or none. (We'll assume that cardType is a string and cardNumber and cardIde are nullable ints.)
var cards = db.Cards;

if (!string.IsNullOrEmpty(cardType))
{
    cards = cards.Where(m => m.Type == cardType);
}

if (cardNumber.HasValue)
{
    cards = cards.Where(m => m.Number == cardNumber);
}

if (cardIde.HasValue)
{
    cards = cards.Where(m => m.IDE == cardIde);
}

return View(cards);
Entity Framework will not actually issue the query until you do something that requires the data from the query (iterate over the list, count the items, etc.). Up until that point, anything additional you do to the DbSet is just tacked on to the query EF will eventually send. You can therefore build your query by conditionally handling each potential filter one at a time before finally utilizing the end result.
UPDATE #2
Sorry, I apparently hadn't consumed enough coffee yet when I did my last update. There's obviously a type issue there. If you store db.Cards in an implicitly typed variable, cards will be a DbSet<Card>, while the result of any call to Where is an IQueryable<Card>, which can't be assigned back to a DbSet<Card> variable.
So you just need to declare the variable with a type that works for both and keeps the filtering composable in the database: IQueryable<Card>:
IQueryable<Card> cards = db.Cards;
Then you shouldn't get any type errors.
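Applied to the CARD table from the question, the same pattern might look like the sketch below. It assumes the DbContext (mDb) exposes a DbSet<CARD> named Cards and that the filter variables use types matching the columns; the variable names and sample values are illustrative, taken from the first example scenario.
// Requires: using System.Linq;
string cardType = "Creature";   // user-supplied filters
decimal? cardNumber = 3;
int? cardSetIde = 6;

IQueryable<CARD> cards = mDb.Cards;

if (!string.IsNullOrEmpty(cardType))
{
    cards = cards.Where(c => c.CARD_TYPE == cardType);
}
if (cardNumber.HasValue)
{
    cards = cards.Where(c => c.CARD_NUMBER == cardNumber.Value);
}
if (cardSetIde.HasValue)
{
    cards = cards.Where(c => c.CARD_SET_IDE == cardSetIde.Value);
}

// Nothing is sent to the database until the query is enumerated here.
var result = cards.ToList();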

Related

Do I need to index on a id in EF Core if I'm searching for an id in 2 different columns?

If I do a query like the one below, where I'm searching for the same ID in two different columns, should I have an index like this? Or should I create two separate indexes, one for each column?
modelBuilder.Entity<Transfer>()
.HasIndex(p => new { p.SenderId, p.ReceiverId });
Query:
var transfersCount = await _dbContext.Transfers
.Where(p => p.ReceiverId == user.Id || p.SenderId == user.Id)
.CountAsync();
What if I have a query like the one below? Would I need a multi-column index on all 4 columns?
var transfersCount = await _dbContext.Transfers
.Where(p => (p.SenderId == user.Id || p.ReceiverId == user.Id) &&
(!transferParams.Status.HasValue || p.TransferStatus == (TransferStatus)transferParams.Status) &&
(!transferParams.Type.HasValue || p.TransferType == (TransferType)transferParams.Type))
.CountAsync();
I recommend two single-column indices.
The two single-column indices will perform better in this query because both columns would be in a fully ordered index. By contrast, in a multi-column index, only the first column is fully ordered in the index.
If you were using an AND condition for the sender and receiver, then you would benefit from a multi-column index. The multi-column index is ideal for situations where multiple columns have conditional statements that must all be evaluated to build the result set (e.g., WHERE receiver = 1 AND sender = 2). In an OR condition, a multi-column index would be leveraged as though it were a single-column index only for the first column; the second column would be unindexed.
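In EF Core fluent configuration, those two single-column indexes would look roughly like this (a sketch based on the Transfer entity from the question):
modelBuilder.Entity<Transfer>()
    .HasIndex(p => p.SenderId);

modelBuilder.Entity<Transfer>()
    .HasIndex(p => p.ReceiverId);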
The full intricacies of index design would take well more than an SO answer to explain; there are probably books about it, and it makes up a reasonable proportion of a database administrator's job.
Indexes have a cost to maintain so you generally strive to have the fewest possible that offer you the most flexibility with what you want to do. Generally an index will have some columns that define its key and a reference to rows in the table that have those keys. When using an index the database engine can quickly look up the key, and discover which rows it needs to read from. It then looks up those rows as a secondary operation.
Indexes can also store table data that isn't part of the lookup key, so you might find yourself creating indexes that also track other columns from the row. That way, by the time the database has found the key it's looking for in the index, it also has access to the row data the query wants and doesn't need to launch a second lookup operation to find the row. If a query wants too many rows from a table, the database might decide to skip using the index at all; there's some threshold beyond which it's faster to just read all the rows directly from the table and search them rather than suffer the indirection of using the index to find which rows need to be read.
The columns that an index covers can serve more than one query, and order is important. If you always query a person by name and sometimes also query by age, but you never query by age alone, it is better to index (name, age) than (age, name). An index on (name, age) can serve a query for just WHERE name = ..., and also WHERE name = ... AND age = .... If you use an OR keyword in a WHERE clause, you can consider that a separate query entirely that would need its own index. Indeed, the database might decide to run "name or age" as two parallel queries and combine the results to remove duplicates. If your app's needs later change so that instead of querying a mix of (name), (name and age) it now frequently queries (name), (name and age), (name or age), (age), (age and height), then it might make sense to have two indexes: (name, age) plus (age, height). The database can use part or all of both of these to serve the common queries. Remember that using part of an index only works from left to right: an index on (name, age) wouldn't typically serve a query for age alone.
If you're using SQL Server and SSMS, you might find that showing the query plan also reveals a missing index recommendation; it's worth considering carefully whether that index needs to be added. Apps deployed to Microsoft Azure also automatically flag common queries whose performance suffers because of a missing index, and that can be the impetus to take a look at the query being run and see how existing or new indexes might be extended or rearranged to cover it. As first noted, this isn't really something a single SO answer of a few lines can prep you for with an "always do this and it will be fine". Companies operating at large scale hire people whose sole mission is to make sure the database runs well; they usually grumble a lot about the devs, and even more about things like Entity Framework, because an EF LINQ query is a layer disconnected from the actual SQL being run and may not be the most optimal approach to getting the data. All of these things you have to contend with.
In this particular case it seems like an index on SenderId+TransferStatus+TransferType and another on ReceiverId+TransferStatus+TransferType could help the two queries shown, but I wouldn't go as far as to say "definitely do that" without taking a holistic view of everything this table contains, how many different values there are in those columns, and what the table is used for in the context of the app. If Sender/Receiver are unique, there may be no point in adding more columns to the index as keys. If TransferStatus and TransferType vary such that some combination of them helps uniquely identify a particular row out of hundreds, then it may make sense; but then, if this query only runs once a day compared to another that is used 10 times a second... There are too many variables and unknowns to provide a concrete answer to the question as presented. Don't optimize prematurely; indexing columns just because they're used in some WHERE clause somewhere would be premature.
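Purely for illustration, if that holistic review did support the composite indexes named above, the EF Core configuration would be along these lines (the leading column is the one used for the equality lookup, as discussed above):
modelBuilder.Entity<Transfer>()
    .HasIndex(p => new { p.SenderId, p.TransferStatus, p.TransferType });

modelBuilder.Entity<Transfer>()
    .HasIndex(p => new { p.ReceiverId, p.TransferStatus, p.TransferType });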

Compare very large lists of database objects in c#

I have inherited a poorly designed database table (no primary key or indexes, oversized nvarchar fields, dates stored as nvarchar, etc.). This table has roughly 350,000 records. I get handed a list of around 2,000 potentially new records at predefined intervals, and I have to insert any of the potentially new records if the database does not already have a matching record.
I initially tried making comparisons in a foreach loop, but it quickly became obvious that there was probably a much more efficient way. After doing some research, I then tried the .Any(), .Contains(), and .Exclude() methods.
My research leads me to believe that the .Exclude() method would be the most efficient, but I get out of memory errors when trying that. The .Any() and .Contains() methods seem to both take roughly the same time to complete (which is faster than the foreach loop).
The structure of the two lists is identical, and each contains multiple strings. I have a few questions that I have not found satisfying answers to, if you don't mind.
When comparing two lists of objects (made up of several strings), is the .Exclude() method considered to be the most efficient?
Is there a way to use projection when using the .Exclude() method? What I would like to accomplish is something like:
List<Data> storedData = db.Data;
List<Data> incomingData = someDataPreviouslyParsed;
// No Projection that runs out of memory
var newData = incomingData.Exclude(storedData).ToList();
// Pseudocode that I would like to figure out if it is possible
// First use projection on db so as to not get a bunch of irrelevant data
List<Data> storedData = db.Data.Select(x => new { x.field1, x.field2, x.field3 });
var newData = incomingData.Select(x => new { x.field1, x.field2, x.field3 }).Exclude(storedData).ToList();
Using a raw SQL statement in SQL Server Management Studio, the query takes slightly longer than 10 seconds. Using EF, it seems to take in excess of a minute. Is that poorly optimized SQL generated by EF, or is it overhead from EF that makes such a difference?
Would raw SQL in EF be a better practice in a situation like this?
Semi-Off-Topic:
When grabbing the data from the database and storing it in the variable storedData, does that eliminate the usefulness of any indexes (should there be any) stored in the table?
I hate to ask so many questions, and I'm sure that many (if not all) of them are quite noobish. However, I have nowhere else to turn, and I have been looking for clear answers all day. Any help is very much so appreciated.
UPDATE
After further research, I have found what seems to be a very good solution to this problem. Using EF, I grab the 350,000 records from the database, keeping only the columns I need to create a unique record. I then take that data and convert it to a dictionary, grouping the kept columns as the key (as can be seen here). This solves the problem of duplicates already existing in the returned data, and gives me something fast to work with when comparing my newly parsed data. The performance increase was very noticeable!
I'm still not sure whether this approaches best practice, but I can certainly live with its performance. I have also seen some references to ToLookup() that I may try, to see if there is a performance gain there as well. Nevertheless, here is some code to show what I did:
var storedDataDictionary = storedData
    .GroupBy(k => (k.Field1 + k.Field2 + k.Field3 + k.Field4))
    .ToDictionary(g => g.Key, g => g.First());

foreach (var item in parsedData)
{
    if (storedDataDictionary.ContainsKey(item.Field1 + item.Field2 + item.Field3 + item.Field4))
    {
        // duplicateData is a previously defined list
        duplicateData.Add(item);
    }
    else
    {
        // newData is a previously defined list
        newData.Add(item);
    }
}
No reason to use EF for that.
Grab only the columns that are required for you to decide whether you should update or insert a record (i.e., those that represent the missing "primary key"). Don't waste memory on the other columns.
Build a HashSet of the existing primary keys (i.e., if the primary key is a number, a HashSet of int; if it is composed of multiple fields, combine them into a string).
Check your 2,000 items against the HashSet; that is very fast.
Update or insert the items with raw SQL, as sketched below.
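A minimal sketch of those steps, assuming the key is made up of the four fields used elsewhere in the question, that those fields are non-null strings, and that the final insert is done with parameterized SQL or SqlBulkCopy (a separator is added so that concatenated keys cannot collide):
// Requires: using System.Collections.Generic; using System.Linq;
// 1. Pull only the key columns and build a HashSet of existing keys.
var existingKeys = new HashSet<string>(
    db.Data.Select(x => x.Field1 + "|" + x.Field2 + "|" + x.Field3 + "|" + x.Field4));

// 2. Keep only the incoming records whose key is not already present.
var recordsToInsert = incomingData
    .Where(x => !existingKeys.Contains(x.Field1 + "|" + x.Field2 + "|" + x.Field3 + "|" + x.Field4))
    .ToList();

// 3. Insert recordsToInsert with raw SQL (parameterized INSERTs or SqlBulkCopy).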
I suggest you consider doing it in SQL, not C#. You don't say what RDBMS you are using, but you could look at the MERGE statement, e.g. (for SQL Server 2008):
https://technet.microsoft.com/en-us/library/bb522522%28v=sql.105%29.aspx
Broadly, the statement checks whether a record is 'new': if so, you can INSERT it; if not, there are UPDATE and DELETE capabilities, or you can just ignore it.

LINQ to Entities does not recognize the method 'Int32 IndexOf(System.String, System.StringComparison)' method

I have executed a LINQ query using Entity Framework, like below:
GroupMaster getGroup = null;
getGroup = DataContext.Groups.FirstOrDefault(item => keyword.IndexOf(item.Keywords, StringComparison.OrdinalIgnoreCase) >= 0 && item.IsEnabled);
When executing this method I got an exception like the one below:
LINQ to Entities does not recognize the method 'Int32 IndexOf(System.String, System.StringComparison)' method, and this
method cannot be translated into a store expression.
The Contains() method is case-sensitive by default, so again I would need to convert to lower case. Is there any method for checking a string match other than the Contains method, and is there any way to solve the IndexOf issue?
The IndexOf method of the string class is not recognized by Entity Framework. Please replace this function with a SQL function or a canonical function.
You can also take help from here or maybe here
You can use below code sample:
DataContext.Groups.FirstOrDefault(item =>
    System.Data.Objects.SqlClient.SqlFunctions.CharIndex(item.Keywords, keyword).Value > 0 && item.IsEnabled);
You really only have four options here.
1. Change the collation of the database globally. This can be done in several ways; a simple Google search should reveal them.
2. Change the collation of individual tables or columns.
3. Use a stored procedure and specify the COLLATE statement in your query.
4. Perform a query that returns a large result set, then filter in memory using LINQ to Objects.
Number 4 is not a good option unless your result set is pretty small. Number 3 is good if you can't change the database (but you can't use LINQ with it).
Numbers 1 and 2 are choices you need to make about your data model as a whole, or only about specific fields.
Changing the Servers collation:
http://technet.microsoft.com/en-us/library/ms179254.aspx
Changing the Database Collation:
http://technet.microsoft.com/en-us/library/ms179254.aspx
Changing the Columns Collation:
http://technet.microsoft.com/en-us/library/ms190920(v=sql.105).aspx
Using the Collate statement in a stored proc:
http://technet.microsoft.com/en-us/library/ms184391.aspx
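For completeness, option 4 from the list above (pull the candidate rows and filter in memory with LINQ to Objects) might look like the sketch below. It is only sensible when the candidate set is small, because AsEnumerable() stops the string filter from being translated to SQL:
// Requires: using System; using System.Linq;
var getGroup = DataContext.Groups
    .Where(item => item.IsEnabled)      // still translated to SQL
    .AsEnumerable()                     // switch to LINQ to Objects from here on
    .FirstOrDefault(item =>
        keyword.IndexOf(item.Keywords, StringComparison.OrdinalIgnoreCase) >= 0);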
Instead, you can use the method below to lower the case:
var lowerCaseItem = item.ToLower();
If your item is of type string, this might get you past that exception.
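Applied to the original query, that might look like the sketch below. Contains stands in for IndexOf here, since only a match/no-match result is needed, and both ToLower and Contains are on EF's list of translatable string methods:
var keywordLower = keyword.ToLower();
var getGroup = DataContext.Groups.FirstOrDefault(item =>
    keywordLower.Contains(item.Keywords.ToLower()) && item.IsEnabled);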
Erik Funkenbush's answer is perfectly valid when looking at this as a database problem. But I get the feeling that you need a better structure for keeping keyword data if you want to traverse it efficiently.
Note that this answer isn't intended to be better; it is intended to fix the problem in your data model rather than making the environment adapt to the current (apparently flawed, since there is an issue) data model you have.
My main suggestion, regardless of time constraint (I realize this isn't the easiest fix) would be to add a separate table for the keywords (with a many-to-many relationship with its related classes).
[GROUPS] * ------- * [KEYWORD]
This should allow for you to search for the keyword, and only then retrieve the items that have that keyword related to it (based on ID rather than a compound string).
int? keywordID = DataContext.Keywords
    .Where(x => x.Name == keywordFilter)
    .Select(x => (int?)x.Id)
    .FirstOrDefault();

if (keywordID != null)
{
    getGroup = DataContext.Groups.FirstOrDefault(group => group.Keywords.Any(kw => kw.Id == keywordID));
}
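For reference, the code-first entities behind that kind of query might look roughly like this; the names are illustrative and the many-to-many junction table is created by EF convention:
// Requires: using System.Collections.Generic;
public class Keyword
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<GroupMaster> Groups { get; set; }
}

public class GroupMaster
{
    public int Id { get; set; }
    public bool IsEnabled { get; set; }
    public virtual ICollection<Keyword> Keywords { get; set; }
}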
But I can understand completely if this type of fix is not possible anymore in the current project. I wanted to mention it though, in case anyone in the future stumbles on this question and still has the option for improving the data structure.

determining differences between generic lists

There are probably 10 duplicates of this question, but I would like to know if there is a better way than what I am currently doing. This is a small example that shows how I'm determining the differences:
//let t1 be a representation of the ID's in the database.
List<int> t1 = new List<int>() { 5, 6, 7, 8 };
//let t2 be the list of ID's that are in memory.
//these changes need to be reflected to the database.
List<int> t2 = new List<int>() { 6, 8, 9, 10 };
var hash = new HashSet<int>(t1);
var hash2 = new HashSet<int>(t2);
//determines which ID's need to be removed from the database
hash.ExceptWith(t2);
//determines which ID's need to be added to the database.
hash2.ExceptWith(t1);
//remove contents of hash from database
//add contents of hash2 to database
I want to know if I can determine what to add and remove in ONE operation instead of the two that I currently need. Is there any way to increase the performance of this operation? Keep in mind that in the actual database situation there are hundreds of thousands of IDs.
EDIT Or, as a second question: is there a LINQ query that I can run directly against the database, so that I can just supply the new list of IDs and have it add/remove the differences automatically? (I'm using MySQL.)
CLARIFICATION I know I need two SQL queries (or a stored procedure). The question is whether I can determine the differences between the lists in one action, and whether it can be done faster than this.
EDIT2
This operation from SPFiredrake appears to be faster than my HashSet version; however, I have no idea how to determine which IDs to add and which to remove from the database. Is there a way to include that information in the operation?
t1.Union(t2).Except(t1.Intersect(t2))
EDIT3
Never mind. I forgot that this statement in fact suffers from deferred execution. In case anyone is wondering, I solved my prior problem with it by using a custom comparer and an added variable indicating which list each element came from.
Ultimately, you're going to use a full outer join (which in the LINQ world is two GroupJoins). However, we ONLY care about values that don't have a matching record in the other list. A null right value (left outer join) indicates a removal; a null left value (right outer join) indicates an addition. So to get it to work this way, we just perform two left outer joins (switching the inputs for the second case to emulate the right outer join) and concatenate them (you could use Union, but that's unnecessary since we'll be getting rid of any duplicates anyway).
List<int> t1 = new List<int>() { 5, 6, 7, 8 };
List<int> t2 = new List<int>() { 6, 8, 9, 10 };
var operations =
    t1.GroupJoin(
        t2,
        t1i => t1i,
        t2i => t2i,
        (t1i, t2join) => new { Id = t1i, Action = !t2join.Any() ? "Remove" : null })
    .Concat(
        t2.GroupJoin(
            t1,
            t2i => t2i,
            t1i => t1i,
            (t2i, t1join) => new { Id = t2i, Action = !t1join.Any() ? "Insert" : null }))
    .Where(tr => tr.Action != null);
This will give you the select statement. Then, you can feed this data into a stored procedure that removes values that already exist in the table and add the rest (or two lists to run removals and additions against). Either way, still not the cleanest way to do it, but at least this gets you thinking.
Edit: My original solution was to separate out the two lists based on what action was needed, which is why it's so ghastly. The same can be done using a one-liner (not caring about which action to take, however), although I think you'll still suffer from the same issues (using LINQ [enumeration] as opposed to Hashsets [hash collection]).
// XOR of sets = (A | B) - (A & B), - being set difference (Except)
t1.Union(t2).Except(t1.Intersect(t2))
I'm sure it'll still be slower than using the Hashsets, but give it a shot anyway.
Edit: Yes, it is faster, because it doesn't actually do anything with the collection until you enumerate over it (either in a foreach or by materializing it into a concrete data type, e.g. a List<> or array). It's still going to take extra time to sort out which items to add or remove, and that's ultimately the problem. I was able to get comparable speed by breaking it down into two queries, but pulling the results into memory (via ToList()) made it slower than the HashSet version:
t1.Except(t2); // .ToList() slows these down
t2.Except(t1);
Honestly, I would handle it on the SQL side. In the stored proc, store all the values in a table variable with another column indicating addition or removal (based on whether the value already exists in the table). Then you can just do a bulk deletion/insertion by joining back to this table variable.
Edit: Thought I'd expand on what I meant by sending the full list to the database and having it handled in the sproc:
var toModify = t1.Union(t2).Except(t1.Intersect(t2));
var mods = string.Join(",", toModify.ToArray());
// Pass mods (comma-separated list) to your sproc.
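One way to actually pass that string from C#, assuming an EF DbContext and following this answer's SQL Server approach (the procedure name dbo.SyncIds is hypothetical):
// Requires: using System.Data.SqlClient;
db.Database.ExecuteSqlCommand(
    "EXEC dbo.SyncIds @delimitedIDs",
    new SqlParameter("@delimitedIDs", mods));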
Then, in the stored procedure, you would do this:
-- @delimitedIDs is some unbounded text type, in case you have a LOT of records
-- I use XQuery to build the table (found it's faster than some other methods)
DECLARE @idTable TABLE (ID int, AddRecord bit)
DECLARE @xmlString XML
SET @xmlString = CAST('<NODES><NODE>' + REPLACE(@delimitedIDs, ',', '</NODE><NODE>') + '</NODE></NODES>' as XML)

INSERT INTO @idTable (ID)
SELECT node.value('.', 'int')
FROM @xmlString.nodes('//NODE') as xs(node)

UPDATE id
SET AddRecord = CASE WHEN someTable.ID IS NULL THEN 1 ELSE 0 END
FROM @idTable id LEFT OUTER JOIN [SomeTable] someTable ON someTable.ID = id.ID

DELETE a
FROM [SomeTable] a JOIN @idTable b ON b.ID = a.ID AND b.AddRecord = 0

INSERT INTO [SomeTable] (ID)
SELECT ID FROM @idTable WHERE AddRecord = 1
Admittedly, this just inserts the IDs; it doesn't actually add any other information. However, you can still pass XML data to the sproc and use XQuery in a similar fashion to get the information you'd need for the insert.
Even if you replace it with a LINQ version, you still need two operations.
Let's assume you are doing this using pure SQL.
You would probably need two queries:
one for removing the records,
another one for adding them.
Using LINQ, the code would be much more complicated and less readable than your solution.

Accessing foreign keys through LINQ

I have a setup on SQL Server 2008. I've got three tables. One has a string identifier as a primary key. The second table holds indices into an attribute table. The third simply holds foreign keys into both tables- so that the attributes themselves aren't held in the first table but are instead referred to. Apparently this is common in database normalization, although it is still insane because I know that, since the key is a string, it would take a maximum of 1 attribute per 30 first table room entries to yield a space benefit, let alone the time and complexity problems.
How can I write a LINQ to SQL query to only return values from the first table, such that they hold only specific attributes, as defined in the list in the second table? I attempted to use a Join or GroupJoin, but apparently SQL Server 2008 cannot use a Tuple as the return value.
"I attempted to use a Join or
GroupJoin, but apparently SQL Server
2008 cannot use a Tuple as the return
value".
You can use anonymous types instead of Tuples; anonymous types are supported by LINQ to SQL.
For example:
from x in source group x by new { x.Field1, x.Field2 }
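Since the original problem was a join, the same idea applies there: project the joined rows into an anonymous type instead of a Tuple. A rough sketch with illustrative table and property names:
var q = from first in ctx.FirstTable
        join link in ctx.LinkTable on first.Id equals link.FirstTableId
        join attr in ctx.Attributes on link.AttributeId equals attr.Id
        select new { first.Id, first.Name, AttributeName = attr.Name };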
I'm not quite clear what you're asking for. Some code might help. Are you looking for something like this?
var q = from i in ctx.Items
        select new
        {
            i.ItemId,
            i.ItemTitle,
            Attributes = from map in i.AttributeMaps
                         select map.Attribute
        };
I use this page all the time for figuring out complex LINQ queries when I know the SQL approach I want to use.
VB http://msdn.microsoft.com/en-us/vbasic/bb688085
C# http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx
If you know how to write the SQL query to get the data you want, this will show you how to get the same result by translating it into LINQ syntax.
