LINQ query retrieves whole record instead of one field - C#

I have this C# LINQ to SQL (L2S) code:
Table<RawFile> linqRawFile = db.GetTable<RawFile>();
var linqNameList =
    from row in linqRawFile.AsEnumerable()
    select row.fileName;
currentFileNameList = linqNameList.ToArray();
It's supposed to read only the fileName field, but when I check in SQL Server Profiler, I see that this query triggers full record loading. The records contain binary file data, so loading the full table takes considerable time, whereas retrieving just the fileName field takes only a few milliseconds.
What would be the right way to retrieve only the fileName field of RawFile as an array of strings? I assume the LINQ framework currently loads each RawFile record in full, because it doesn't see that I will only retrieve the fileName property from the list.
Perhaps I have to construct the query without referencing linqRawFile at all? Wouldn't that defeat the reason LINQ was introduced in the project in the first place, namely to abstract the database layer away?

It's supposed to read only the fileName field
No, it does not. Here is what is going on: your query executes in two places - in the RDBMS and in memory. The db.GetTable<RawFile>() part happens in the RDBMS; everything after it happens in memory, because you use AsEnumerable().
The portion of the query where the projection happens (i.e. where the row.fileName column is extracted from the whole row) runs in memory. The RDBMS part of the query knows nothing about this projection: db.GetTable<RawFile>() is all the SQL-generating LINQ provider sees, so naturally it returns the entire row.
If you write a combined query against your SQL source, the projection will happen in SQL:
var linqNameList =
    from row in db.GetTable<RawFile>()
    select row.fileName;

You should be able to replace the whole thing with this:
var currentFileNameList = db.GetTable<RawFile>().Select(r => r.fileName).ToArray();

Related

C# - Concatenate an in-memory IList and IQueryable?

Suppose I have a List containing one string value. Suppose I also have an IQueryable that contains several strings from a database. I want to be able to concatenate these two containers into one list and then be able to call methods such as .Skip or .Take on the list. I want to be able to do this in such a way that when I combine the two containers I don't load all of the DB data into memory (only after I call .Skip and .Take). Basically, I want to do something like this (pseudocode):
IQueryable<string> someQuery = myEntities.GetDBQuery(); // Gets "test2", "test3"
IList<string> inMemoryList = new List<string>();
inMemoryList.Add("test");
// Can I do something like this without loading DB data into memory? finalList should contain all 3 strings.
IEnumerable<string> finalList = inMemoryList.Union(someQuery);
// At this point it is fine to load the filtered query into memory.
foreach (string myString in finalList.Skip(100).Take(200))
{
    // Do work...
}
How can I achieve this?
If I haven't misunderstood, you are trying to query data, part of which comes from memory and the rest from the database, like this:
// The following code will not compile; it is just to illustrate the intent.
var dbQuery = BuildDbQuery();
var list = BuildListInMemory();
var myQuery = (dbQuery + list).OrderBy(aa).Skip(bb).Take(cc).Select(dd);
// You don't want dbQuery to load all records into memory,
// because you only need some of them.
The short answer is NO, you can't. Consider the .OrderBy method: all the data has to be in the same "place", otherwise the code can't sort it. So the code loads every record targeted by dbQuery from the database into memory (now they are in the same place) and then sorts all of them, including those in list. That will probably cause a memory issue when dbQuery returns thousands of rows.
HOW TO RESOLVE
Pass the data in list to the database (as parameters of dbQuery) so that the whole query happens in the database. This is easy if your list has only a few items.
If list also has a lot of records, which would make dbQuery too complex, you can query twice: once for dbQuery and once for list. For example, say you have 10,000 users in the database and 1,000 users in your in-memory list, and you want the 10 youngest users. You don't need to load all 10,000 users into memory and then find the youngest 10. Instead, find the 10 youngest in dbQuery (ResultA) and load them into memory, find the 10 youngest in the in-memory list (ResultB), and then compare ResultA with ResultB.
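To make the second option concrete, here is a minimal sketch. The User type (with Id and Age properties) and the dbUsers/memoryUsers/memoryUserIds names are assumptions for illustration only; they do not come from the question.
// Approach 1 in brief: push the in-memory values into the database query as parameters,
// e.g. dbUsers.Where(u => memoryUserIds.Contains(u.Id)) is translated into an IN (...) clause.

// Approach 2: query each side separately, then merge the two small result sets.
var youngestFromDb = dbUsers              // IQueryable<User>: OrderBy/Take run in the database
    .OrderBy(u => u.Age)
    .Take(10)
    .ToList();                            // only 10 rows are materialized

var youngestFromMemory = memoryUsers      // List<User>: evaluated purely in memory
    .OrderBy(u => u.Age)
    .Take(10);

var tenYoungest = youngestFromDb          // merge the two candidate sets and take the overall top 10
    .Concat(youngestFromMemory)
    .OrderBy(u => u.Age)
    .Take(10)
    .ToList();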
I entirely agree with Danny's answer when he says you need to somehow find a way to include the in-memory user list in the database so that you achieve what you want. As for the example you asked for in your comment, it is difficult without knowing the data structure of your User object; however, assuming you will be able to connect the dots, here is my suggested approach:
Create a temporary table with a structure identical to your regular user table in your database and insert all your in-memory users into it.
Write a query to UNION the temporary and regular tables; both are identical in structure, so that should be easy.
Return the result to your application and work with it using standard LINQ operations.
If you want exact code that you can use as-is, you will have to provide your User object structure - field types etc. in the database - to enable me to write the code.
You specify that your query and your list are both sequences of strings. someQuery can be performed completely on the database side (not in-memory).
Let's make your sequences less generic:
IQueryable<string> someQuery = ...
IList<string> myList = ...
You also specify that myList contains only one element.
string myOneAndOnlyString = myList.Single();
As your list is in-memory, this has to be performed in-memory. But because the list has only one element, this won't take any time.
The query that you request:
IQueryable<string> correctQuery = someQuery
    .Where(item => item.Equals(myOneAndOnlyString))
    .Skip(skipCount)
    .Take(takeCount);
Use SQL Server Profiler to check the generated SQL and see that the request is performed completely in one SQL statement.

How does Dapper.NET work internally with .Count() and SingleOrDefault()?

I am new to Dapper, though I am aware of ORMs and DALs and have implemented a DAL with NHibernate earlier.
Example query:
string sql = "SELECT * FROM MyTable";

public int GetCount()
{
    var result = Connection.Query<MyTablePoco>(sql).Count();
    return result;
}
Will Dapper convert this query (internally) to SELECT COUNT(*) FROM MyTable looking at .Count() at the end?
Similarly, will it convert to SELECT TOP 1 * FROM MyTable in case of SingleOrDefault()?
I come from the NHibernate world, where the query is generated accordingly. I am not sure about Dapper, though. As I am working with MS Access, I do not see a way to check the generated query.
No, Dapper will not adjust your query. The immediate way to tell is simply: does the method return IEnumerable<T> or IQueryable<T>? If it is the former, then it can only use local in-memory mechanisms.
Specifically, by default, Query<T> will actually return a fully populated List<T>. LINQ's Count() method recognises that and just accesses the .Count property of the list. So all the data is fetched from the database.
If you want to ask the database for the count, ask the database for the count.
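For instance, a minimal sketch using Dapper's ExecuteScalar (the GetCount method name and the SQL text are illustrative, reusing the table from the question):
public int GetCount()
{
    // The database does the counting; only a single integer comes back over the wire.
    return Connection.ExecuteScalar<int>("SELECT COUNT(*) FROM MyTable");
}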
As for mechanisms to view what is actually sent to the database: we use mini-profiler for this. It works great.
Note: when you are querying exactly one row, QueryFirstOrDefault (and the other variants you would expect) exists and has internal optimizations (including hints to ADO.NET, although not all providers can act on those) to do things as efficiently as possible, but it does not adjust your query. In some cases the provider itself (not Dapper) can help, but ultimately: if you only want the first row, ask the database for the first row (TOP or similar).
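As a sketch of that last point, the SingleOrDefault case from the question could be handled like this (the TOP 1 query is my assumption; it works in both SQL Server and Access SQL):
public MyTablePoco GetFirstOrDefault()
{
    // Ask the database for at most one row instead of materializing the whole table.
    return Connection.QueryFirstOrDefault<MyTablePoco>("SELECT TOP 1 * FROM MyTable");
}
Dapper also has QuerySingleOrDefault<MyTablePoco> if you want an exception when more than one row matches.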

Efficiently paging large data sets with LINQ

When looking into the best ways to implement paging in C# (using LINQ), most suggestions are something along these lines:
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
// Get the total num records
var total = query.Count();
// Page the results
var paged = query.Skip((pageNum - 1) * pageSize).Take(pageSize);
This seems to be the commonly suggested strategy (simplified).
For me, the main purpose of paging is efficiency. If my table contains 1.2 million records where Something == something, I don't want to retrieve all of them at once. Instead, I want to page the data, grabbing as few records as possible. But with this method, it seems that this is a moot point.
If I understand it correctly, the first statement still retrieves the 1.2 million records, and then they are paged as necessary.
Does paging in this way actually improve performance? If the 1.2 million records are going to be retrieved every time, what's the point (besides the obvious UI benefits)?
Am I misunderstanding this? Any .NET gurus out there that can give me a lesson on LINQ, paging, and performance (when dealing with large data sets)?
The first statement does not execute the actual SQL query; it only builds part of the query you intend to run.
It is when you call query.Count() that the first query will be executed:
SELECT COUNT(*) FROM Table WHERE Something = something
query.Skip().Take() won't execute the query either; it is only when you try to enumerate the results (doing a foreach over paged or calling .ToList() on it) that the appropriate SQL statement is executed, retrieving only the rows for the page (using ROW_NUMBER).
If you watch this in SQL Profiler, you will see that exactly two queries are executed and at no point does it try to retrieve the full table.
Be careful when using the debugger: if you step past the first statement and try to look at the contents of query, that will execute the SQL query. Maybe that is the source of your misunderstanding.
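Putting it all together, a minimal sketch of the pattern with comments marking where SQL is actually sent (the names reuse those from the question):
// No SQL yet: this only builds up an expression tree.
var query = db.Entity.Where(e => e.Something == something);

// First round trip: SELECT COUNT(*) ... WHERE ...
var total = query.Count();

// Still no SQL: Skip/Take just extend the expression tree.
var paged = query.Skip((pageNum - 1) * pageSize).Take(pageSize);

// Second round trip: only pageSize rows come back, paged in SQL (e.g. via ROW_NUMBER).
var pageResults = paged.ToList();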
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
For your information, no database call is made by the first statement.
// Get the total num records
var total = query.Count();
This count query will be translated to SQL, and it will make a call to the database.
This call will not get all the records, because the generated SQL is something like this:
SELECT COUNT(*) FROM Entity WHERE Something LIKE 'something'
The last query doesn't get all the records either. It will be translated into SQL, and the paging runs in the database.
Maybe you'll find this question useful: efficient way to implement paging
I believe Entity Framework might structure the SQL query with the appropriate conditions based on the LINQ statements (e.g. using ROW_NUMBER() OVER ...).
I could be wrong about that, however. I'd run SQL Profiler and see what the generated query looks like.

LightSwitch LINQ PreprocessQuery

I use the PreprocessQuery method to extend a query in LightSwitch.
Something like this:
query = (from item in query
         where validIDs.Contains(item.tableIDs.myID) &&
               elementCount[item.ID] <= maxEleCount
         select item);
Here validIDs is a HashSet<int> and elementCount is a Dictionary<int, int>.
The first where condition works fine, but the second one (elementCount[item.ID] <= maxEleCount) does not.
What I want to do is filter a table by some IDs (validIDs) and also check that, in another table, the number of entries for each of these IDs does not exceed a limit.
Any ideas?
EDIT
I found a solution: instead of a Dictionary, I used a HashSet for the second where clause as well. It seems it is not possible to do the Dictionary lookup inside the LINQ statement for some reason(?).
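For anyone wondering what that workaround might look like, here is a minimal sketch under the assumption that the Dictionary is pre-filtered in memory first (the validElementIDs name is mine, not from the question); the query provider then only has to translate a Contains over a set of IDs, which it can turn into an IN clause:
// Evaluate the Dictionary in memory: keep only the IDs whose entry count is within the limit.
var validElementIDs = new HashSet<int>(
    elementCount.Where(kv => kv.Value <= maxEleCount)
                .Select(kv => kv.Key));

query = (from item in query
         where validIDs.Contains(item.tableIDs.myID) &&
               validElementIDs.Contains(item.ID)
         select item);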
First, at the risk of being a bit pedantic, what you're doing in a PreprocessQuery method is "restricting" the records in the query, not "extending" the query.
Whatever you put in a LINQ query has to be able to be processed by the Entity Framework data provider (in the case of LightSwitch, the SQL Server data provider).
Sometimes you'll find that while your LINQ query compiles, it fails at runtime. This is because the data provider is unable to express it to the data store (again, in this case SQL Server).
You're normally restricted to "primitive" values, so if you hadn't said that using a HashSet actually worked, I would have said that it wouldn't.
Any time you have a static (as in non-changing) value, I'd suggest that you create a variable outside of your LINQ query, then use the variable in the LINQ query. By doing this you're simply passing a value, and the data provider doesn't have to figure out how to translate it for the data store.
Reading your code again, this might not be what you're doing, but hopefully this explanation will still be helpful.

C# Entity Framework LINQ query not recognizing new changes

I'm using C#, .NET (4.0) and Entity Framework to connect to SQL CE 4.0.
I query for objects with specific properties, but the query matches objects against the search criteria only as the data was last saved to the database. That alone is not too problematic; the bigger problem is that if the data has been changed in code but not yet saved to the database, the object will still match the search criteria.
Example:
var query = from location in mainDBContext.Locations
            where location.InUse == true
            select location;
This query also returns objects where location.InUse == false, if InUse was true when they were loaded from the DB and was changed later on in code.
(The question included a screen capture of one of the query result objects here.)
I really don't understand why it does this. I would understand if this query always queried the database and I got the older version of the object (with InUse still true).
Thank you for your time and answers.
That is how EF works internally.
Every entity, uniquely identified by its key, can be tracked by the context only once - that is called the identity map. So it doesn't matter how many times you execute the query: if the query returns tracked entities and is repeatedly executed on the same context instance, it will always return the same instances.
If an instance was modified in the application but not saved to the database, your query will still be executed against the database, where the persisted state is evaluated, but the materialization process will by default use the current data from the application instead of the data retrieved from the database. You can force the query to return the state from the database (by setting mainDBContext.Locations.MergeOption = MergeOption.OverwriteChanges), but because of the identity map, your current modifications will be lost.
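A sketch of what that looks like with the EF 4 ObjectContext API, assuming Locations is an ObjectSet<Location> as the MergeOption suggestion implies (MergeOption lives in System.Data.Objects):
// Requires: using System.Data.Objects;

// Materialization will now overwrite tracked entities with the values read from the database.
// Warning: any unsaved changes on already-tracked Location instances are lost.
mainDBContext.Locations.MergeOption = MergeOption.OverwriteChanges;

var locationsFromDb = (from location in mainDBContext.Locations
                       where location.InUse == true
                       select location).ToList();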
I'm not really sure what exactly your problem is, but I think you have to know this:
That kind of query always returns the data as submitted to the DB. When you change some entities in your code but have not yet submitted them to the database, the LINQ query will query the data from the database, without your in-code changes.
LINQ queries use deferred execution, so your query variable is not a list of results; it is just a query definition that is evaluated each time results are needed. You should add .ToList() to evaluate the query and get a list of results at that specific line of code.
An example with .ToList():
var query = (from location in mainDBContext.Locations
             where location.InUse == true
             select location).ToList();
I just ran into the same thing myself. It's a bit messy, but another option is to examine the Local cache. You can do this, for example:
var query = from location in mainDBContext.Locations.Local
            where location.InUse == true
            select location;
This will only query the local cache of tracked entities, without going to the database. A combination of local and database queries should enable you to get what you want.
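For instance, one way to combine them (a sketch of my own, relying on the fact that EF's identity map hands back the same tracked instance for a given key, so Distinct() can de-duplicate by reference):
// Rows that match according to the database; their tracked copies may carry unsaved changes.
var fromDatabase = (from location in mainDBContext.Locations
                    where location.InUse == true
                    select location).ToList();

// Merge in the locally tracked instances and re-apply the filter in memory,
// so unsaved changes (and added-but-not-yet-saved locations) are honoured.
var inUseNow = fromDatabase
    .Concat(mainDBContext.Locations.Local)
    .Distinct()                              // the same tracked instance appears only once
    .Where(location => location.InUse)
    .ToList();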
