How can I write Join statements on dataset..
I have data in xml format..I can load that data into a dataset..
but how do I fetch data from two datatables using a join query?
Well, it partly depends on how you want to express that join. If you know the query beforehand, I would personally use LINQ to Objects via LINQ to DataSet - that's particularly handy if you're working with strongly typed datasets, but it can work even without that.
The sample code for C# in Depth has some examples in LINQ to DataSet you could have a look at.
Now, if you want to read the query dynamically as well, that makes it a lot harder.
Is this XML actually an XML-serialized dataset? Do you definitely need to get datasets involved at all? If it's just plain XML, have you tried using LINQ to XML with LINQ to Objects? It may be less efficient, but how vital is that for your application? How large is the data likely to be?
Related
I've recently studied some LINQ sources and decided to use it in the project I'm working on. Everything is almost clear except for one thing.
I'm making complicated reports that make up of several tables. Earlier I used stored procedures for the purpose. I formed several temporary pieces of data that I stored in temporary tables and then joined them together using a series of 2-table joins.
Trouble is: LINQ doesn't allow the creation of temporary tables. I know that complicated queries are built in LINQ in a "cascade" way, but if I do it this way,
Question is: what am I going to receive in DataContext.Log in the end? I assume it's going to be a really huge query that is impossible to understand and use for debugging. Am I right? If I am, how to find a workaround for this? DataLoadOptions and LoadWith won't do, because I am processing all the data at once and using it will lead to an avalanche of queries.
Thanks in advance
LINQ definitely allows the creation of temporary tables, it's just any data type that implements IQueryable (Lists being a good example) So if you need a temp table, you just do this...
var tempTable1 = [LINQ Query goes here].ToList()
and voila, you have a temp table. If you don't like generic variables, then you can create classes and use a List<tableClass> instead of a generic.
From this point, you can use the new Var as something to select from while running LINQ queries. If you need it to persist longer, you can store it in Session or pass the variable around.
I was reading this SO question but still I am not clear about one specific thing.
If I use NHibernate, why do I need LINQ?
The question in my mind becomes more aggravated when I knew that NHibernate also included LINQ support.
LINQ to NHibernate?
WTF!
LINQ is a query language. It allows you to express queries in a way that is not tied in to your persistence layer.
You may be thinking about the LINQ 2 SQL ORM.
Using LINQ in naming the two is causes unfortunate confusions like yours.
nHibernate, EF, LINQ2XML are all LINQ providers - they all allow you to query a data source using the LINQ syntax.
Well, you don't need Linq, you can always do without it, but you might want it.
Linq provides a way to express operations that behave on sets of data that can be queried and where we can then perform other operations based on the state of that data. It's deliberately written so as to be as agnostic as possible whether that data is in-memory collections, XML, database, etc. Ultimately it's always operating on some sort of in-memory object, with some means of converting between in-memory and the ultimate source, though some bindings go further than others in pushing some of the operations down to a different layer. E.g. calling .Count() can end up looking at a Count property, spinning through a collection and keeping a tally, sending a Count(*) query to a database or maybe something else.
ORMs provide a way to have in-memory objects and database rows reflect each other, with changes to one being reflected by changes to the other.
That fits nicely into the "some means of converting" bit above. Hence Linq2SQL, EF and Linq2NHibernate all fulfil both the ORM role and the Linq provider role.
Considering that Linq can work on collections you'd have to be pretty perverse to create an ORM that couldn't support Linq at all (you'd have to design your collections to not implement IEnumerable<T> and hence not work with foreach). More directly supporting it though means you can offer better support. At the very least it should make for more efficient queries. For example if an ORM gave us a means to get a Users object that reflected all rows in a users table, then we would always be able to do:
int uID = (from u in Users where u.Username == "Alice" select u.ID).FirstOrDefault();
Without direct support for Linq by making Users implement IQueryable<User>, then this would become:
SELECT * FROM Users
Followed by:
while(dataReader.Read())
yield return ConstructUser(dataReader);
Followed by:
foreach(var user in Users)
if(user.Username == "Alice")
return user.ID;
return 0;
Actually, it'd be just slightly worse than that. With direct support the SQL query produced would be:
SELECT TOP 1 id FROM Users WHERE username = 'Alice'
Then the C# becomes equivalent to
return dataReader.Read() ? dataReader.GetInt32(0) : 0;
It should be pretty clear how the greater built-in Linq support of a Linq provider should lead to better operation.
Linq is an in-language feature of C# and VB.NET and can also be used by any .NET language though not always with that same in-language syntax. As such, ever .NET developer should know it, and every C# and VB.NET developer should particularly know it (or they don't know C# or VB.NET) and that's the group NHibernate is designed to be used by, so they can depend on not needing to explain a whole bunch of operations by just implementing them the Linq way. Not supporting it in a .NET library that represents queryable data should be considered a lack of completeness at best; the whole point of an ORM is to make manipulating a database as close to non-DB related operations in the programming language in use, as possible. In .NET that means Linq supprt.
First of all LINQ alone is not an ORM. It is a DSL to query the objects irrespective of the source it came from.
So it makes perfect sense that you can use LINQ with Nhibernate too
I believe you misunderstood the LINQ to SQL with plain LINQ.
Common sense?
There is a difference between an ORM like NHibernate and compiler integrated way to express queries which is use full in many more scenarios.
Or: Usage of LINQ (not LINQ to SQL etc. - the langauge, which is what you talk about though I am not sure you meant what you said) means you dont have to deal with Nhibernate special query syntax.
Or: Anyone NOT using LINQ - regardless of NHibernate or not - de qualifies without good explanation.
You don't need it, but you might find it useful. Bear in mind that Linq, as others have said, is not the same thing as Linq to SQL. Where I work, we write our own SQL queries to retrieve data, but it's quite common for us to use Linq to manipulate that data in order to serve a specific need. For instance, you might have a data access method that allows you to retrieve all dogs owned by Dave:
new DogOwnerDal().GetForOwner(id);
If you're only interested in Dave's daschunds for one specific need, and performance isn't that much of an issue, you can use Linq to filter the response for all of Dave's dogs down to the specific data that you need:
new DogOwnerDal().GetForOwner(id).Where(d => d.Breed == DogBreeds.Daschund);
If performance was crucial, you might want to write a specific data access method to retrieve dogs by owner and breed, but in many cases the effort expended to create the new data access method doesn't increase efficiency enough to be worth doing.
In your example, you might want to use NHibernate to retrieve a lump of data, and then use Linq to break that data into lots of individual subsets for some form of processing. It might well be cheaper to get the data once and use Linq to split it up, instead of repeatedly interrogating the database for different mixtures of the same data.
I have quite a few datatables that are transported between processes by converting them to XML.
My question is, is it quicker to read the XML into a dataset / datatable on the other side and querying that (with LINQ) or should I just leave the XML and query with Linq?
Does the overhead in converting from XML to data table justify any search performance increases a data table may have?
The queries are mainly just finding a primary key.
If the number of queries per table is small, then my guess is that it is faster to query the XML. Reverse that advice if there are numerous queries.
What is the benefit of writing a custom LINQ provider over writing a simple class which implements IEnumerable?
For example this quesiton shows Linq2Excel:
var book = new ExcelQueryFactory(#"C:\Users.xls");
var administrators = from x in book.Worksheet<User>()
where x.Role == "Administrator"
select x;
But what is the benefit over the "naive" implementation as IEnumerable?
A Linq provider's purpose is to basically "translate" Linq expression trees (which are built behind the scenes of a query) into the native query language of the data source. In cases where the data is already in memory, you don't need a Linq provider; Linq 2 Objects is fine. However, if you're using Linq to talk to an external data store like a DBMS or a cloud, it's absolutely essential.
The basic premise of any querying structure is that the data source's engine should do as much of the work as possible, and return only the data that is needed by the client. This is because the data source is assumed to know best how to manage the data it stores, and because network transport of data is relatively expensive time-wise, and so should be minimized. Now, in reality, that second part is "return only the data asked for by the client"; the server can't read your program's mind and know what it really needs; it can only give what it's asked for. Here's where an intelligent Linq provider absolutely blows away a "naive" implementation. Using the IQueryable side of Linq, which generates expression trees, a Linq provider can translate the expression tree into, say, a SQL statement that the DBMS will use to return the records the client is asking for in the Linq statement. A naive implementation would require retrieving ALL the records using some broad SQL statement, in order to provide a list of in-memory objects to the client, and then all the work of filtering, grouping, sorting, etc is done by the client.
For example, let's say you were using Linq to get a record from a table in the DB by its primary key. A Linq provider could translate dataSource.Query<MyObject>().Where(x=>x.Id == 1234).FirstOrDefault() into "SELECT TOP 1 * from MyObjectTable WHERE Id = 1234". That returns zero or one records. A "naive" implementation would probably send the server the query "SELECT * FROM MyObjectTable", then use the IEnumerable side of Linq (which works on in-memory classes) to do the filtering. In a statement you expect to produce 0-1 results out of a table with 10 million records, which of these do you think would do the job faster (or even work at all, without running out of memory)?
You don't need to write a LINQ provider if you only want to use the LINQ-to-Objects (i.e. foreach-like) functionality for your purpose, which mostly works against in-memory lists.
You do need to write a LINQ provider if you want to analyse the expression tree of a query in order to translate it to something else, like SQL. The ExcelQueryFactory you mentioned seems to work with an OLEDB-Connection for example. This possibly means that it doesn't need to load the whole excel file into memory when querying its data.
In general performance. If you have some kind of index you can do a query much faster than what is possible on a simple IEnumerable<T>.
Linq-To-Sql is a good example for that. Here you transform the linq statement into another for understood by the SQL server. So the server will do the filtering, ordering,... using the indexes and doesn't need to send the whole table to the client which then does it with linq-to-objects.
But there are simpler cases where it can be useful too:
If you have a tree index over the propery Time then a range query like .Where(x=>(x.Time>=now)&&(x.Time<=tomorrow)) can be optimized a lot, and doesn't need to iterate over every item in the enumerable.
LINQ will provide deferred execution as much as maximum possible to improve the performance.
IEnumurable<> and IQueryable<> will totally provide different program implementations. IQueryable will give native query by building expression tree dynamically which provides good performance indeed then IEnumurable.
http://msdn.microsoft.com/en-us/vcsharp/ff963710.aspx
if we are not sure we may use var keyword and dynamically it will initialize a most suitable type.
Just a quick question.
Other than the way you manipulate them, are XMLDocuments and DataSets basically the same thing? I'm just wondering for speed issues.
I have come across some code that calls dataSet.getXML() and then traverses the new XMLDocument.
I'm just curious what's the performance difference and which is the best one to use!
Thanks,
Adam
Very different.
A DataSet is a collection of related tabular records (with a strong focus on databases), including change tracking.
An XmlDocument is a tree structure of arbitrary data. You can convert between the two.
For "which is best".... what are you trying to do? Personally I very rarely (if ever) use DataSet / DataTable, but some people like them. I prefer an object (class) representation (perhaps via deserialization), but xml processing is fine in many cases.
It does, however, seem odd to write a DataSet to xml and then query the xml. In that scenario I would just access the original data directly.
No they are not. A DataSet does not store its internal data in XML and than a XMLDocument does not use a table/row structure to store XML Elements. You can convert from one to the other within severe limits but that's it. One of the biggest limitations is that a DataSet requires data to fit in a strict Table/Column format where a XmlDocument can have a wildly different structure from one XmlElement to the next. Moreover, the hierarchical structure of a XmlDocument usually doesn't map well to the tabular structure of a DataSet.
.NET provides XmlDataDocument as a way handle XML data in a tabular way. You have to remember though that XmlDataDocument is an XmlDocument first. The generated DataSet is just an alternative and limited way to look at the underlying XML data.
Depending on the size of your tables linq to xml or xquery might be faster to query your data than looking through the table. Im not positive on this, it is something you are going to have to test against your own data.