Collecting metadata into table - C#

I have tabular data that passes through a C# program, and I need to collect some metadata about it before finishing. The metadata is always counts based on fields of the data, and all of the counts need to be grouped by one field. Periodically, I need to add new counts to this collection of metadata.
I've been researching it for a little while, and I think what makes sense is to rework my program to store the data in a DataTable and then run LINQ queries against the table. The problem I'm having is putting the different counts into one table-like structure and then writing that out.
I might run a query like this:
var query01 =
    from record in records.AsEnumerable()
    group record by record.Field<String>("Association Key") into associationsGroup
    select new { AssociationKey = associationsGroup.Key, Count = associationsGroup.Count<DataRow>() };
to get a count of all of the records, grouped by the field Association Key. Then I'll want another count, grouped in the same way:
var query02 =
    from record in records.AsEnumerable()
    where record.Field<String>("Number 9") == "yes"
    group record by record.Field<String>("Association Key") into associationsGroup
    select new { AssociationKey = associationsGroup.Key, Number9Count = associationsGroup.Count<DataRow>() };
And so on.
I thought about chaining the queries together with Union, but I was having trouble getting them to union since I'm projecting into anonymous types, and I couldn't figure out how to restructure the queries to make a union work.
So, how can I collect my metadata into one table-like structure?

They're not going to union because you have different types. Add both Number9Count and Count to both anonymous types and try the union again.
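A minimal sketch of that fix, assuming the same DataTable as in the question: once both projections have the same property names, types, and order, the anonymous types are identical and Union compiles.

var query01 =
    from record in records.AsEnumerable()
    group record by record.Field<String>("Association Key") into associationsGroup
    select new
    {
        AssociationKey = associationsGroup.Key,
        Count = associationsGroup.Count(),
        Number9Count = 0
    };

var query02 =
    from record in records.AsEnumerable()
    where record.Field<String>("Number 9") == "yes"
    group record by record.Field<String>("Association Key") into associationsGroup
    select new
    {
        AssociationKey = associationsGroup.Key,
        Count = 0,
        Number9Count = associationsGroup.Count()
    };

// Compiles because the anonymous types now match member-for-member.
var combined = query01.Union(query02);

Note that the union yields separate rows per query rather than one merged row per key; merging them is what the accepted solution below does with a DataTable.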

I ended up solving the problem by creating a class that holds the set of records I need as a DataTable. A user can add queries through a method that takes a Func<DataRow, bool> argument. The method constructs the query, supplying that argument as the where clause and keeping the same grouping and the same properties in the resulting anonymous-typed object.
When retrieving the results, the class iterates over each stored query and enters the results into a new DataTable.
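A rough sketch of what such a class might look like (the class and method names here are my own, not from the original solution):

using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

class MetadataCollector
{
    private readonly DataTable records;
    private readonly Dictionary<string, Func<DataRow, bool>> counts =
        new Dictionary<string, Func<DataRow, bool>>();

    public MetadataCollector(DataTable records)
    {
        this.records = records;
    }

    // Registers a named count; the where clause varies, the grouping never does.
    public void AddCount(string name, Func<DataRow, bool> predicate)
    {
        counts[name] = predicate;
    }

    // Runs every registered count, grouped by "Association Key",
    // and pivots the results into one row per key.
    public DataTable GetResults()
    {
        var result = new DataTable();
        result.Columns.Add("AssociationKey", typeof(string));
        foreach (var name in counts.Keys)
            result.Columns.Add(name, typeof(int));

        var groups = records.AsEnumerable()
            .GroupBy(r => r.Field<string>("Association Key"));

        foreach (var group in groups)
        {
            var row = result.NewRow();
            row["AssociationKey"] = group.Key;
            foreach (var count in counts)
                row[count.Key] = group.Count(count.Value);
            result.Rows.Add(row);
        }
        return result;
    }
}

Adding the Number 9 count from the question would then look like collector.AddCount("Number9Count", r => r.Field<string>("Number 9") == "yes").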

Related

Query ODataV4 connected service with LINQ - Get last record from table

I'm trying to query my OData web service from a C# application.
When I do the following:
var SecurityDefs = from SD in nav.ICESecurityDefinition.Take(1)
                   orderby SD.Entry_No descending
                   select SD;
I get an exception because .top() and .orderby are not supposed to be used together.
I need to get the last record in the dataset, and only the last.
The purpose is to get the last used entry number in a ledger and then continue creating new entries, incrementing from the found entry number.
I can't seem to find anything online that explains how to do this.
It's very important that the service only returns the last record from the feed, since speed is paramount in this solution.
I get an exception because .top() and .orderby are not supposed to be used together.
Where did you read that? In general, .top() or .Take() should ONLY be used in conjunction WITH .orderby(); otherwise the record being retrieved is not guaranteed to be repeatable or predictable.
Probably the compounding issue here is mixing query and fluent expression syntax, which is valid, but you have to understand the order of precedence.
Your syntax is taking 1 record, then applying a sort order... you might find it easier to start with a query like this:
// Build your query.
var SecurityDefsQuery = from SD in nav.ICESecurityDefinition
                        orderby SD.Entry_No descending
                        select SD;

// Take the first item, if it exists; this is a single record.
var SecurityDefs = SecurityDefsQuery.FirstOrDefault();

// Or take a deferred sequence of just the first record, if it exists.
var SecurityDefsDeferred = SecurityDefsQuery.Take(1);
This can be written as a single statement using parentheses, but you can see how the query is the same in both cases. SecurityDefs here is a single ICESecurityDefinition-typed record, whereas SecurityDefsDeferred is an IQueryable<ICESecurityDefinition> that yields at most one record.
If you only need the record itself, you can use this one-liner:
var SecurityDefs = (from SD in nav.ICESecurityDefinition
                    orderby SD.Entry_No descending
                    select SD).FirstOrDefault();
You can execute the same query using fluent notation as well:
var SecurityDefs = nav.ICESecurityDefinition
                      .OrderByDescending(sd => sd.Entry_No)
                      .FirstOrDefault();
In both cases, .Take(1) or .top() is effectively implemented through .FirstOrDefault(). You have indicated that speed is important, so use .First() or .FirstOrDefault() instead of .Single() or .SingleOrDefault(): the Single variants actually request .Take(2) and will throw an exception if more than one result comes back (and .Single() also throws when there are none).
The OrDefault variants on both of these queries will not impact the performance of the query itself and should have a negligible effect on your code; use whichever is appropriate for the logic that consumes the returned record, depending on whether you need to handle the case when there is no existing record.
If the record being returned has many columns, and you are only interested in the Entry_No column value, then perhaps you should simply query for that specific value itself:
Query expression:
var lastEntryNo = (from SD in nav.ICESecurityDefinition
                   orderby SD.Entry_No descending
                   select SD.Entry_No).FirstOrDefault();
Fluent expression:
var lastEntryNo = nav.ICESecurityDefinition
                     .OrderByDescending(sd => sd.Entry_No)
                     .Select(sd => sd.Entry_No)
                     .FirstOrDefault();
If speed is paramount, then look at providing a specific custom endpoint on the service to serve the record, or do not process the Entry_No in the client at all: make that the job of the code that receives data from the client, and compute it at the time the entries are inserted.
Making the query faster is not the silver bullet you might be looking for, though. Even if this is highly optimised, your current pattern means that X number of clients could all call the service to get the current value of Entry_No, meaning all of them would start incrementing from the same value.
If you MUST increment the Entry_No from the client, then you should look at putting a custom endpoint on the service that simply returns the next Entry_No to use. This should be optimistic, meaning that you don't care whether the Entry_No actually gets used in the end, but you can implement the endpoint such that every call increments the field in the database and returns the next value.
It's getting a bit beyond the scope of your initial post, but SQL Server now has support for Sequences that formalise this type of logic from a database and schema point of view. Using a Sequence simplifies how we manage these types of incrementations from the client, because we no longer rely on data updates being committed to the table before the client can increment the next record (which is what your TOP/ORDER BY DESC solution is trying to do).
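As a minimal sketch of that idea, assuming SQL Server 2012 or later and a sequence created as, say, dbo.EntryNoSeq (a hypothetical name), the custom endpoint's implementation could fetch the next value like this:

using System;
using System.Data.SqlClient;

static long GetNextEntryNo(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = conn.CreateCommand())
    {
        conn.Open();
        // Atomic on the server, so concurrent clients never see the same value.
        cmd.CommandText = "SELECT NEXT VALUE FOR dbo.EntryNoSeq;";
        return Convert.ToInt64(cmd.ExecuteScalar());
    }
}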

SQL generated from LINQ not consistent

I am using the Telerik OpenAccess/Data Access ORM against an Oracle database.
Why do these two statements result in different SQL commands?
Statement #1
IQueryable<WITransmits> query = from wiTransmits in uow.DbContext.StatusMessages
                                select wiTransmits;
query = query.Where(e => e.MessageID == id);
Results in the following SQL
SELECT
a."MESSAGE_ID" COL1,
-- additional fields
FROM "XFE_REP"."WI_TRANSMITS" a
WHERE
a."MESSAGE_ID" = :p0
Statement #2
IQueryable<WITransmits> query = from wiTransmits in uow.DbContext.StatusMessages
                                select new WITransmits
                                {
                                    MessageID = wiTransmits.MessageID,
                                    Name = wiTransmits.Name
                                };
query = query.Where(e => e.MessageID == id);
Results in the following SQL
SELECT
a."MESSAGE_ID" COL1,
-- additional fields
FROM "XFE_REP"."WI_TRANSMITS" a
The query generated by statement #2 obviously returns EVERY record in the table, when I only want the one. Millions of records make this prohibitive.
Telerik Data Access will try to split each query into a database-side part and a client-side part (or in-memory LINQ, if you prefer).
Having a projection with select new is a sure trigger that makes everything in your LINQ expression tree after the projection go to the client side.
This means that in your second case you have an inefficient LINQ query: the filtering is applied in memory, after you have already transported a lot of unnecessary data.
If you want to compose LINQ expressions in the way done in case #2, you can append the Select clause last, or explicitly convert the result to IEnumerable<T> to make it obvious that any further processing will be done in memory.
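A minimal sketch of the first option, reusing the names from the question; because the Where precedes the projection, the filter still translates to SQL:

IQueryable<WITransmits> query = uow.DbContext.StatusMessages;
query = query.Where(e => e.MessageID == id);    // still database-side

var projected = query.Select(e => new WITransmits   // projection appended last
{
    MessageID = e.MessageID,
    Name = e.Name
});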
The first query returns the full object defined, so any additional limitations (like Where) can be appended to it before it is actually being run. Therefore the query can be combined as you showed.
The second one returns a new object, which can be whatever type and contain whatever information. Therefore the query is sent to the database as "return everything" and after the objects have been created all but the ones that match the Where clause are discarded.
Even if the types were the same in both queries, think of this situation:
var query = from wiTransmits in uow.DbContext.StatusMessages
            select new WITransmits
            {
                MessageID = wiTransmits.MessageID * 4 - 2,
                Name = wiTransmits.Name
            };
How would you combine the Where query now? Sure, you could inspect the code inside the object initializer and try to move the computation outside, but since it can be anything, that is not feasible in general. What if the computation is some lookup function? What if it's not deterministic?
Therefore, if you create new objects based on the database objects, there will be a border: up to it the objects are retrieved from the database, and past it further queries are done in memory.
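You can make that border explicit with AsEnumerable(), as in this sketch reusing the question's names:

var query = uow.DbContext.StatusMessages
    .Where(e => e.MessageID == id)   // database side
    .AsEnumerable()                  // the border: everything below runs in memory
    .Select(e => new WITransmits
    {
        MessageID = e.MessageID * 4 - 2,
        Name = e.Name
    });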

C# - Concatenate an in memory IList and IQueryable?

Suppose I have a List containing one string value, and an IQueryable that contains several strings from a database. I want to concatenate these two containers into one list and then be able to call methods such as .Skip or .Take on it. I want to do this in such a way that combining the two containers doesn't load all of the DB data into memory (only after I call .Skip and .Take). Basically, I want to do something like this (pseudocode):
IQueryable<string> someQuery = myEntities.GetDBQuery(); // Gets "test2", "test3"
IList<string> inMemoryList = new List<string>();
inMemoryList.Add("test");

// Can I do something like this without loading DB data into memory?
// finalList should contain all 3 strings.
var finalList = inMemoryList.Union(someQuery);

// At this point it is fine to load the filtered query into memory.
foreach (string myString in finalList.Skip(100).Take(200))
{
    // Do work...
}
How can I achieve this?
If I haven't misunderstood, you are trying to query data, part of which comes from memory and part from the database, like this:
// The following code will not compile; it's just for illustration.
var dbQuery = BuildDbQuery();
var list = BuildListInMemory();
var myQuery = (dbQuery + list).OrderBy(aa).Skip(bb).Take(cc).Select(dd);
// ...and you don't want dbQuery to load all records into memory,
// because you only need some of them.
The short answer is NO, you can't. Consider the .OrderBy method: all the data has to be in the same "place", otherwise the code can't sort it. So the code loads all the records produced by dbQuery into memory (now everything is in one place) and then sorts all of them, including those in list. That probably causes a memory issue when dbQuery yields thousands of rows.
HOW TO RESOLVE
Pass the data in list into the database (as parameters of dbQuery) so that the query happens in the database. This is easy if your list has only a few items.
If list also has lots of records, which would make dbQuery too complex, you can query twice: once for dbQuery and once for list. For example, say you have 10,000 users in the database and 1,000 users in your in-memory list, and you want the 10 youngest overall. You don't need to load 10,000 users into memory and then find the youngest 10. Instead, find the 10 youngest in dbQuery (ResultA) and load those into memory, find the 10 youngest in the memory list (ResultB), and then compare ResultA and ResultB, as in the sketch below.
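Here is a sketch of that two-query approach; the User type with its Age property and the dbContext/memoryUsers names are hypothetical stand-ins:

// 10 youngest from the database; only 10 rows cross the wire.
var youngestFromDb = dbContext.Users
    .OrderBy(u => u.Age)
    .Take(10)
    .ToList();

// 10 youngest from the in-memory list.
var youngestInMemory = memoryUsers
    .OrderBy(u => u.Age)
    .Take(10);

// Merge the two candidate sets and keep the overall top 10.
var topTen = youngestFromDb
    .Concat(youngestInMemory)
    .OrderBy(u => u.Age)
    .Take(10)
    .ToList();

This works because the overall top 10 must come from the union of each source's own top 10.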
I entirely agree with Danny's answer when he says you need to find some way to include the in-memory user list in the db to achieve what you want. As for the example you asked for in your comment: without knowing the data structure of your User object it is difficult, but assuming you can connect the dots, here is my suggested approach:
Create a temporary table with a structure identical to your regular user table in your db, and insert all your in-memory users into it.
Write a query that Unions the temporary and regular tables; both being identical in structure, that should be easy.
Return the result to your application and use it with standard LINQ operations.
If you want exact code that you can use as-is, then you will have to provide your User object structure (fields, types, etc. in the db) to enable me to write it.
You specify that your query and your list are both sequences of strings, so someQuery can be performed completely on the database side (not in-memory).
Let's make your sequences less generic:
IQueryable<string> someQuery = ...
IList<string> myList = ...
You also specify that myList contains only one element.
string myOneAndOnlyString = myList.Single();
As your list is in-memory, this has to be performed in-memory. But because the list has only one element, this won't take any time.
The query that you request:
IQueryable<string> correctQuery = someQuery
    .Where(item => item.Equals(myOneAndOnlyString))
    .Skip(skipCount)
    .Take(takeCount);
Use SQL Server Profiler to check the generated SQL and see that the request is performed completely in one SQL statement.

determining differences between generic lists

There are probably 10 duplicates of this question, but I would like to know if there is a better way than what I am currently doing. This is a small example showing how I determine the differences:
// Let t1 be a representation of the IDs in the database.
List<int> t1 = new List<int>() { 5, 6, 7, 8 };
// Let t2 be the list of IDs that are in memory.
// These changes need to be reflected to the database.
List<int> t2 = new List<int>() { 6, 8, 9, 10 };

var hash = new HashSet<int>(t1);
var hash2 = new HashSet<int>(t2);

// Determines which IDs need to be removed from the database.
hash.ExceptWith(t2);
// Determines which IDs need to be added to the database.
hash2.ExceptWith(t1);

// Remove contents of hash from the database.
// Add contents of hash2 to the database.
I want to know if I can determine what to add and remove in ONE operation instead of the two I currently do. Is there any way to increase the performance of this operation? Keep in mind that in the actual database situation there are hundreds of thousands of IDs.
EDIT Or, as a second question: is there a LINQ query that I can run directly against the database, so I can just supply the new list of IDs and have it automatically remove/add as needed? (using MySQL)
CLARIFICATION I know I need two SQL queries (or a stored procedure). The question is whether I can determine the differences between the lists in one action, and whether it can be done faster than this.
EDIT2
This operation from SPFiredrake appears to be faster than my HashSet version; however, I have no idea how to determine which IDs to add and which to remove from the database. Is there a way to include that information in the operation?
t1.Union(t2).Except(t1.Intersect(t2))
EDIT3
Never mind, I forgot that this statement in fact has the problem of deferred execution, although in case anyone is wondering, I solved my prior problem with it by using a custom comparer and an added variable determining which list each value came from.
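For anyone curious, that comparer approach could look something like this (TaggedId and ByIdComparer are my own names, not from the original code):

using System.Collections.Generic;
using System.Linq;

// Wrap each ID with a flag recording which list it came from.
class TaggedId
{
    public int Id;
    public bool FromDatabase;
}

// Compare wrapped IDs by Id alone, ignoring the source flag.
class ByIdComparer : IEqualityComparer<TaggedId>
{
    public bool Equals(TaggedId x, TaggedId y) { return x.Id == y.Id; }
    public int GetHashCode(TaggedId o) { return o.Id; }
}

var cmp = new ByIdComparer();
var db = t1.Select(i => new TaggedId { Id = i, FromDatabase = true });
var mem = t2.Select(i => new TaggedId { Id = i, FromDatabase = false });

// XOR of the two sets, compared by Id only; each survivor's flag says
// whether it needs a delete (true) or an insert (false).
var toModify = db.Union(mem, cmp).Except(db.Intersect(mem, cmp), cmp);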
Ultimately, you're going to use a full outer join (which, in LINQ world, is two GroupJoins). However, we ONLY care about values that don't have a matching record in the other table: a null right value (left outer join) indicates a removal, and a null left value (right outer join) indicates an addition. So we just perform two left outer joins (switching the inputs for the second one to emulate the right outer join) and concat them together (you could use Union, but that's unnecessary since we're getting rid of any duplicates anyway).
List<int> t1 = new List<int>() { 5, 6, 7, 8 };
List<int> t2 = new List<int>() { 6, 8, 9, 10 };

var operations =
    t1.GroupJoin(
        t2,
        t1i => t1i,
        t2i => t2i,
        (t1i, t2join) => new { Id = t1i, Action = !t2join.Any() ? "Remove" : null })
    .Concat(
        t2.GroupJoin(
            t1,
            t2i => t2i,
            t1i => t1i,
            (t2i, t1join) => new { Id = t2i, Action = !t1join.Any() ? "Insert" : null }))
    .Where(tr => tr.Action != null);
This will give you the select statement. Then you can feed this data into a stored procedure that removes the values that already exist in the table and adds the rest (or into two lists to run removals and additions against). Either way, it's still not the cleanest way to do it, but at least it gets you thinking.
Edit: My original solution was to separate out the two lists based on which action was needed, which is why it's so ghastly. The same can be done using a one-liner (not caring about which action to take, however), although I think you'll still suffer from the same issues (LINQ enumeration as opposed to HashSet hash lookups).
// XOR of sets = (A | B) - (A & B), - being set difference (Except)
t1.Union(t2).Except(t1.Intersect(t2))
I'm sure it'll still be slower than using the HashSets, but give it a shot anyway.
Edit: Yes, it is faster, because it doesn't actually do anything with the collection until you enumerate over it (either in a foreach or by materializing it into a concrete type: List<>, array, etc.). It's still going to take extra time to sort out which IDs to add/remove, and that's ultimately the problem. I was able to get comparable speed by breaking it into two queries, but pulling the results into memory (via ToList()) made it slower than the HashSet version:
t1.Except(t2); // .ToList() slows these down
t2.Except(t1);
Honestly, I would handle it on the SQL side. In the stored proc, store all the values in a table variable with another column indicating addition or removal (based on whether the value already exists in the table). Then you can just do a bulk deletion/insertion by joining back to this table variable.
Edit: Thought I'd expand on what I meant by sending the full list to the database and having it handled in the sproc:
var toModify = t1.Union(t2).Except(t1.Intersect(t2));
var mods = string.Join(",", toModify);
// Pass mods (a comma-separated list) to your sproc.
Then, in the stored procedure, you would do this:
-- @delimitedIDs is some unbounded text type, in case you have a LOT of records.
-- I use XQuery to build the table (I've found it faster than some other methods).
DECLARE @idTable TABLE (ID int, AddRecord bit)
DECLARE @xmlString XML

SET @xmlString = CAST('<NODES><NODE>' + REPLACE(@delimitedIDs, ',', '</NODE><NODE>')
                 + '</NODE></NODES>' AS XML)

INSERT INTO @idTable (ID)
SELECT node.value('.', 'int')
FROM @xmlString.nodes('//NODE') AS xs(node)

UPDATE id
SET AddRecord = CASE WHEN someTable.ID IS NULL THEN 1 ELSE 0 END
FROM @idTable id LEFT OUTER JOIN [SomeTable] someTable ON someTable.ID = id.ID

DELETE a
FROM [SomeTable] a JOIN @idTable b ON b.ID = a.ID AND b.AddRecord = 0

INSERT INTO [SomeTable] (ID)
SELECT ID FROM @idTable WHERE AddRecord = 1
Admittedly, this just inserts IDs; it doesn't actually add any other information. However, you can still pass XML data to the sproc and use XQuery in a similar fashion to get the information you'd need to add.
Even if you replace it with a LINQ version, you still need two operations.
Let's assume you are doing this using pure SQL.
You would probably need two queries:
one for removing the records,
another one for adding them.
Using LINQ, the code would be much more complicated and less readable than your solution.

Accessing foreign keys through LINQ

I have a setup on SQL Server 2008 with three tables. One has a string identifier as its primary key. The second table holds indices into an attribute table. The third simply holds foreign keys into both tables, so that the attributes themselves aren't held in the first table but are instead referred to. Apparently this is common in database normalization, although it still seems insane to me: since the key is a string, it would take a maximum of one attribute per thirty first-table (room) entries to yield a space benefit, let alone the time and complexity problems.
How can I write a LINQ to SQL query to only return values from the first table, such that they hold only specific attributes, as defined in the list in the second table? I attempted to use a Join or GroupJoin, but apparently SQL Server 2008 cannot use a Tuple as the return value.
"I attempted to use a Join or
GroupJoin, but apparently SQL Server
2008 cannot use a Tuple as the return
value".
You can use anonymous types, which are supported by LINQ to SQL, instead of Tuples. For example:
from x in source group x by new {x.Field1, x.Field2}
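The same trick extends to composite join keys; a sketch with hypothetical table and column names:

var q = from a in ctx.TableA
        join b in ctx.TableB
            on new { a.Key1, a.Key2 } equals new { b.Key1, b.Key2 }
        select new { a, b };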
I'm not quite clear what you're asking for. Some code might help. Are you looking for something like this?
var q = from i in ctx.Items
        select new
        {
            i.ItemId,
            i.ItemTitle,
            Attributes = from map in i.AttributeMaps
                         select map.Attribute
        };
I use this page all the time for figuring out complex LINQ queries when I know the SQL approach I want to use:
VB: http://msdn.microsoft.com/en-us/vbasic/bb688085
C#: http://msdn.microsoft.com/en-us/vcsharp/aa336746.aspx
If you know how to write the SQL query to get the data you want, this will show you how to get the same result by translating it into LINQ syntax.
