I'm dynamically importing data into a database, where I create new tables on the fly and store the metadata so that I can access those tables later via dynamically constructed SQL. My question: is there a C# library that can abstract away some of the details of the SQL itself? The situation I'm running into is with sequences (although there are others). In Oracle, accessing a sequence looks like this:
select foo.nextVal from dual;
In Postgres...
select nextval('foo_id_seq');
For my project I don't know what the final database will be and I don't like the idea of running through the project fixing a bunch of errors due to bad SQL.
I looked at NHibernate, and it seems that tools like it (and LINQ to SQL) require an existing object model in place. I don't have an object model, because all of my data is provided dynamically and I don't know the number of columns, the data types, etc. ahead of time.
Any suggested approach to this problem is appreciated.
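For illustration, here is roughly the shape of abstraction I'm imagining; the ISqlDialect interface and the class names are hypothetical, just to show the idea:

// Hypothetical sketch: hide the dialect-specific SQL behind a small interface.
public interface ISqlDialect
{
    string NextSequenceValueSql(string sequenceName);
}

public class OracleDialect : ISqlDialect
{
    public string NextSequenceValueSql(string sequenceName)
    {
        // Oracle: select foo.nextval from dual
        return string.Format("select {0}.nextval from dual", sequenceName);
    }
}

public class PostgresDialect : ISqlDialect
{
    public string NextSequenceValueSql(string sequenceName)
    {
        // Postgres: select nextval('foo_id_seq')
        return string.Format("select nextval('{0}')", sequenceName);
    }
}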
If the data you're trying to store has a dynamic structure, then it really sounds like a relational database may not be the best choice. Its strengths lie in data being statically structured and well defined. You might be better served by a document-oriented store like MongoDB, which is designed for dynamic schemas. If you used something like MongoDB, I think your question about abstracting query generation for dynamically changing schemas goes away.
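For example, with the official MongoDB C# driver each document can have its own shape, so "new columns" never require schema changes. A minimal sketch; the database and collection names are made up:

using MongoDB.Bson;
using MongoDB.Driver;

// Minimal sketch with the official C# driver; database and collection
// names are made up. Each document can have a different shape, so no
// schema migration is needed when the structure changes.
var client = new MongoClient("mongodb://localhost");
var db = client.GetDatabase("imports");
var collection = db.GetCollection<BsonDocument>("dynamic_data");

collection.InsertOne(new BsonDocument { { "Name", "foo" }, { "Price", 9.99 } });
collection.InsertOne(new BsonDocument { { "Name", "bar" }, { "Color", "red" } });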
That said, some relational databases, like SQL Server, have good support for XML data types, which allow you to store an arbitrary structure within your static schema. SQL Server also lets you query directly into XML data types and even index them, which means you can query on the server side without transferring the XML back to the client, deserializing it, etc. To decide whether this will perform well enough for your needs, you'll have to test with data representative of your production load.
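For instance, here is a rough sketch of querying into an xml column from C#; the table, column, and XML structure are all invented, and connectionString is assumed to exist:

using System;
using System.Data.SqlClient;

// Sketch: query inside an xml column server-side. The table, column, and
// XML structure are invented; connectionString is assumed to exist.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    @"select ImportId,
             Payload.value('(/record/price)[1]', 'decimal(10,2)') as Price
      from ImportedDocuments
      where Payload.exist('/record[@type=""order""]') = 1", conn))
{
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Only the extracted values cross the wire, not the whole XML blob.
            Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetDecimal(1));
        }
    }
}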
I usually work with MySQL, but also with SQL Server, Oracle, and Access; the database structure is almost the same across all of them. My database stores the configuration and recorded data of a SCADA ("Supervisory Control And Data Acquisition") application.
Most of the tables usually stay the same, but sometimes my teammates add fields or tables, or change the type of some fields.
I'm writing an application that needs to load some config parameters from the db, then load data, process it, and store the new values back in the db. It also needs to add new records.
I have a class that, independently of the db type, returns an IDbConnection object given the correct connection parameters. With some methods I can specify a SQL query and get back an IDataReader or a DataSet.
Now, how should I query data from the db, analyze and recalculate it, and finally store it again?
I'm a bit scared of building a detailed object mapping because of the possibility of changed fields. A simple DataSet/DataTable/DataRow should be OK, but I'd like to use LINQ to query the extracted data in a simpler way.
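For example, I'd like to be able to query whatever table comes back roughly like this (the column names are just examples, LoadTable is my existing helper, and this needs a reference to System.Data.DataSetExtensions):

using System;
using System.Data;
using System.Linq;

// LoadTable is my existing helper that fills a DataTable from a query;
// the column names below are just examples.
DataTable table = LoadTable("select * from Measurements");
var recent = from row in table.AsEnumerable()
             where row.Field<DateTime>("Timestamp") > DateTime.Today.AddDays(-1)
             select new
             {
                 Tag = row.Field<string>("TagName"),
                 Value = row.Field<double>("Value")
             };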
Finally, my db has about 60 tables, but in this application I work with only a dozen of them. I have only a little time to build this application, so I need a fast approach, even if it's not "very beautiful".
Thanks.
You should try an ORM that configures itself automatically according to the schema.
I have found this one. I haven't used similar things in C#, but the approach works nicely in other (dynamic) languages.
http://www.codeproject.com/Articles/117666/Kerosene-ORM
Using an ORM would most probably be the fastest. You could use NHibernate, which supports multiple databases. NHibernate does have a learning curve, so something like a micro ORM might be easier to use. PetaPoco is a great micro ORM and supports SQL Server, SQL Server CE, MySQL, PostgreSQL, and Oracle.
These ORMs create a mapping file for each DB you use, which needs to be updated or regenerated when changes are made in the DB.
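For a changing schema you can also skip the mapping classes entirely and let PetaPoco hand you dynamic rows. A quick sketch; the connection string name and table are made up:

using System;

// Connection-string name and table are made up; PetaPoco resolves the
// provider (SQL Server, MySQL, PostgreSQL, Oracle...) from the config entry.
var db = new PetaPoco.Database("MyConnectionStringName");
foreach (var row in db.Query<dynamic>("select * from ConfigParameters"))
{
    Console.WriteLine("{0} = {1}", row.Name, row.Value);
}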
Currently, I'm sitting on an ugly business application written in Access that takes a spreadsheet on a bi-daily basis and imports it into an MDB. I am currently converting a major project that includes this into SQL Server and .NET, specifically C#.
To house this information there are two tables (alias names here) that I will call Master_Prod and Master_Sheet, joined on ProdID, an identity key on the parent Master_Prod table. There are also two more tables to store history, History_Prod and History_Sheet. More tables extend off of Master_Prod, but I'm keeping this limited to two tables for explanation purposes.
Since this was written in Access, the subroutine that handles this file is littered with manually coded "triggers" to deal with history, which have been a constant pain to keep up with; that's one reason I'm glad this is moving to a database server rather than a RAD tool. I am writing triggers to handle history tracking.
My plan is/was to create an object modeling the spreadsheet, parse the data into it, and use LINQ to do some checks client-side before sending the data to the server... Basically, I need to compare the data in the sheet to a matching record (unless none exists, in which case it's new). If any of the fields have been altered, I want to send the update.
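Roughly what I had in mind on the client; SheetRow and ProdRecord are simple POCOs standing in for my spreadsheet and Master_Prod objects, and the compared fields are placeholders:

using System.Collections.Generic;
using System.Linq;

// SheetRow and ProdRecord are simple POCOs mirroring the sheet and
// Master_Prod; the compared fields below are placeholders.
static void Classify(IEnumerable<SheetRow> sheetRows, IList<ProdRecord> dbRecords,
                     IList<SheetRow> toInsert, IList<SheetRow> toUpdate)
{
    foreach (var sheetRow in sheetRows)
    {
        var existing = dbRecords.SingleOrDefault(r => r.ProdID == sheetRow.ProdID);
        if (existing == null)
            toInsert.Add(sheetRow);        // no matching record -> it's new
        else if (existing.Quantity != sheetRow.Quantity ||
                 existing.Description != sheetRow.Description)
            toUpdate.Add(sheetRow);        // a field changed -> send the update
    }
}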
Originally I was hoping to put this procedure into some sort of CLR assembly that accepts an IEnumerable list, since I'll have the spreadsheet in this form already, but I've recently learned this is going to be paired with a rather important database server that I am very concerned about bogging down.
Is this worth putting a CLR stored procedure in for? There are other points of entry where data enters, and if I could build a procedure to handle them given the objects passed in, then I could take a lot of business rules out of the application at the expense of potential database performance.
Basically I want to take the update checking away from the client and put it on the database, so that the data system manages whether the table should be updated and the history trigger can fire off.
Thoughts on a better way to implement this along the same lines?
Use SSIS. Use an Excel Source to read the spreadsheets, perhaps a Lookup Transformation to detect new items, and finally a SQL Server Destination to insert the stream of missing items into SQL.
SSIS is a far better fit for these kinds of jobs than writing something from scratch, no matter how much fun LINQ is. SSIS packages are easier to debug, maintain, and refactor than some DLL with forgotten sources. Besides, you will not be able to match the refinements SSIS has in managing its buffers for high-throughput data flows.
"Originally I was hoping to put this procedure into some sort of CLR assembly that accepts an IEnumerable list, since I'll have the spreadsheet in this form already, but I've recently learned this is going to be paired with a rather important database server that I am very concerned about bogging down."
That does not work. Any input into a C#-written CLR procedure STILL has to follow normal SQL semantics. All that can change is the internal setup. Any communication with the client has to be done in SQL, which means executions / method calls. There is no way to directly pass in an enumerable of objects.
"My plan is/was to create an object modeling the spreadsheet, parse the data into it, and use LINQ to do some checks client-side before sending the data to the server... Basically, I need to compare the data in the sheet to a matching record (unless none exists, in which case it's new). If any of the fields have been altered, I want to send the update."
You probably need to pick a "centricity" for your approach - i.e. data-centric or object-centric.
I would probably model the data appropriately first. This is because relational databases (or even non-normalized models represented in relational databases) will often outlive client tools/libraries/applications. I would probably start by trying to model in a normal form, and also think during this time about the triggers to maintain audit/history that you mention.
I would typically then think about the data coming in (not an object model or an entity, really). So I focus on the format and semantics of the inputs and see if there is a misfit with my data model; perhaps there were assumptions in my data model which were incorrect. And no, I'm not thinking of making an object model which validates the spreadsheet, even though spreadsheets are notoriously fickle input sources. Like Remus, I would simply use SSIS to bring it in, perhaps to a staging table, and then do some more validation before applying it to the production tables with some T-SQL.
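For that last step, if you're on SQL Server 2008 or later, a MERGE from the staging table keeps the insert-vs-update decision on the server, which also lets the history triggers fire there. A sketch; the table and column names are invented and connectionString is assumed:

using System.Data.SqlClient;

// Invented table/column names; connectionString is assumed to exist.
// MERGE decides insert vs. update server-side, so triggers fire there too.
const string mergeSql = @"
    merge Master_Prod as target
    using Staging_Prod as source
       on target.ProdCode = source.ProdCode
    when matched and (target.Quantity    <> source.Quantity or
                      target.Description <> source.Description) then
        update set Quantity = source.Quantity, Description = source.Description
    when not matched by target then
        insert (ProdCode, Quantity, Description)
        values (source.ProdCode, source.Quantity, source.Description);";

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(mergeSql, conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();
}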
Then I would think about a client tool which had an object model based on my good solid data model.
Alternatively, the object approach would mean modeling the spreadsheet, but also an object model which needs to be persisted to the database. Perhaps you then have two object models (spreadsheet and full business domain) plus a database model (storage persistence), if the spreadsheet object model is not as complete as the system's business domain object model.
I can think of an example where I had a throwaway external object model like this. It read a "master file", which was a layout file describing an input file. This object model allowed the program to build SSIS packages (and BCP and SQL scripts) to import/export/do other operations on these files. Effectively it was a throwaway object model: it was not used as the actual model for the data in the rows or for any kind of navigation between parent and child rows, etc., but simply as an internal representation for internal purposes. It didn't necessarily correspond to a "domain" entity.
I'm looking for a good solution to make my life easier with regard to writing/reading to a SQL Server DB in a dynamic manner. I started with Entity Framework to make my life easier to begin with, but as the software becomes more general and config-driven, I'm finding that Entity Framework becomes less and less appropriate because it relies on specific objects defined at design time.
What I'd like to do:
Generate Tables/Fields at runtime.
Select rows from tables by table name, with unknown schema, into a generic data type (e.g. a Dictionary).
Insert rows into tables by table name using generic data types (e.g. a dictionary where the string key maps to the field name), where the mapping between typeof(object) and the field type is taken care of.
I've started implementing this myself, but I imagine someone has already done it before.
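To give the idea, here is a simplified version of the insert half of what I've started; it's SQL Server specific, and it assumes the table and column names have already been validated against my own metadata (they are concatenated into the SQL, so they must never come straight from user input):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

// Simplified insert helper: a table name plus a column -> value dictionary.
// tableName and the dictionary keys are assumed to be validated against
// my own metadata, because they are concatenated into the SQL text.
static void InsertRow(string connectionString, string tableName,
                      IDictionary<string, object> row)
{
    string columns = string.Join(", ", row.Keys.Select(k => "[" + k + "]").ToArray());
    string parameters = string.Join(", ", row.Keys.Select(k => "@" + k).ToArray());
    string sql = string.Format("insert into [{0}] ({1}) values ({2})",
                               tableName, columns, parameters);

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        foreach (var pair in row)
            cmd.Parameters.AddWithValue("@" + pair.Key, pair.Value ?? DBNull.Value);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}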
Any suggestions?
Thanks.
I'm having trouble understanding how what you are describing is any different from plain old ADO.NET. DataTables are dynamically constructed based on a SQL query, and a DataRow is just a special case of an indexed dictionary (sometimes called an OrderedDictionary), where you can access values via a string name or an integer index like a list. I make no judgment as to whether choosing ADO.NET is actually right or wrong for your needs, but I'm trying to understand why you seem to have ruled it out.
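For instance, plain ADO.NET already gives you "unknown schema into a generic container" out of the box; the table name here is arbitrary and connectionString is assumed to exist:

using System;
using System.Data;
using System.Data.SqlClient;

// Plain ADO.NET: the DataTable takes whatever shape the query returns.
// The table name is arbitrary and connectionString is assumed to exist.
var table = new DataTable();
using (var conn = new SqlConnection(connectionString))
using (var adapter = new SqlDataAdapter("select * from SomeDynamicTable", conn))
{
    adapter.Fill(table); // columns and types are inferred from the result set
}

foreach (DataRow row in table.Rows)
{
    foreach (DataColumn col in table.Columns)
        Console.WriteLine("{0} = {1}", col.ColumnName, row[col]); // by name or index
}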
You can use Sql.Net ( http://sqlom.sourceforge.net ) to easily generate dynamic SQL statements in C#.
The iBATIS.NET (now MyBatis.NET) Data Mapper framework doesn't automatically generate tables or fields at runtime, but it does allow you to select and commit data via Dictionary objects.
It's probably not going to suit your needs completely (it's kind of tedious to set up, but pretty easy to maintain once it is), but it might be worth a look. Here's a link to the online documentation.
Other popular frameworks might do the same or similar, such as NHibernate.
So far in my .Net coding adventures I've only had a need to save information to files. So I've used XmlSerializer and DataContractSerializer to serialize attributed classes to XML files. My next project, however, requires that I save and retrieve information from a SQL server database. I'm wondering what my options are for doing this.
The current version of the app, which was not created by me, uses a lot of hard-coded SQL commands. But now I'm trying to avoid doing anything where I have to read or write individual fields to or from the database or objects; I especially want to avoid a lot of hard-coded SQL in my code. I like how the serializer classes just figure out how to read and write XML files based on the attributes and/or public properties of the class. Is there something similar for a database rather than XML?
Object Relational Mapping
There are a bunch of products out there, the most notorious being NHibernate; there are also a couple of competing products offered by Microsoft in LINQ to SQL and Entity Framework (you're supposed to use the latter, but everyone uses the former, as it's waaaay simpler).
You can see a nice (although I suspect biased) comparison of ORM offerings at http://ormbattle.net/
I believe you're referring to Object Relational Mappers. These provide a wealth of functionality, including simple object CRUD plumbing.
Check out:
NHibernate
Entity Framework
Linq to SQL
There are many others, but that'll get you going.
There is no generic object type when you deal with databases. Only tables and fields.
The combination of these could make an object, though. Your best bet is to use stored procedures if you are concerned about hard-coded SQL in the client code.
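For example, with the SQL text moved into a stored procedure, the client code only knows the procedure and parameter names (both invented here):

using System.Data;
using System.Data.SqlClient;

// Invented procedure and parameter names, just to show the pattern;
// connectionString is assumed to exist.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("usp_SaveCustomer", conn))
{
    cmd.CommandType = CommandType.StoredProcedure; // no SQL text in the client
    cmd.Parameters.AddWithValue("@CustomerId", 42);
    cmd.Parameters.AddWithValue("@Name", "Test1234");
    conn.Open();
    cmd.ExecuteNonQuery();
}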
I'm also mainly referring to the actual field types in a database; ORMs are a different story. Look into NHibernate if you want an object-relational mapper that can help with INSERTs, SELECTs, etc.
Depending on the project an ORM like NHibernate might be what you're looking for. Something where you map your database information to classes and the ORM takes care of the inserts, deletes, and selects for you without hand-written SQL. This also allows for migration to a different database system without a ton of rewrite.
I say it depends on the project because other things come into play here like performance and how the data is actually structured.
I think you should read up on LINQ to SQL. It will allow you to work "primarily" with classes that are representations of your database tables and their relations.
// MyDataContext stands in for the designer-generated LINQ to SQL context,
// which exposes Table1 and reads its connection string from config.
var context = new MyDataContext();
var obj = context.Table1.Single(row => row.Id == 1234);
obj.Name = "Test1234";
context.SubmitChanges();
This could be a good place to start to learn about Linq to SQL
Hope this is what you are looking for.
I agree with (and prefer) the previous suggestions to use an ORM. Just to make sure you have a full menu of options, here is another one: if you're comfortable with the XML representation, (de)serialization, etc., you could also look into SQLXML. With that said, you should not use this to avoid doing proper database design, although it can be totally reasonable for some solutions.
I am trying to leverage ORM given the following requirements:
1) Using .NET Framework (latest Framework is okay)
2) Must be able to use Sybase, Oracle, MSSQL interchangeably
3) The schema is mostly static, BUT there are dynamic parts.
I am somewhat familiar with SubSonic and NHibernate, but not deeply.
I get the nagging feeling that the ORM can do what I want, but I don't know how to leverage it at the moment.
SubSonic probably isn't optimal, since it doesn't currently support Sybase, and writing my own provider for it is beyond my resources and ability right now.
For #3 (above), there are a couple of metadata tables, which describe tables which the vendors can "staple on" to the existing database.
Let's call these MetaTables, and MetaFields.
There is a base static schema, which the ORM (NHibernate ATM) handles nicely.
However, a vendor can add a table to the database (physically) as long as they also add the data to the metadata tables to describe their structure.
What I'd really like is for me to be able to somehow "feed" the ORM with that metadata (in a way that it understands) and have it at that point allow me to manipulate the data.
My primary goal is to reduce the amount of generic SQL statement building I have to do on these dynamic tables.
I'd also like to avoid having to worry about the differences in SQL being sent to Sybase,Oracle, or MSSQL.
My primary problem is that I don't have a way to let the ORM know about the dynamic tables until runtime, when I'll have access to the metadata.
Edit: An example of the usage might be like the one outlined here:
IDataReader rdr = new Query("DynamicTable1").WHERE("ArbitraryId", 2).ExecuteReader();
(However, it doesn't look like SubSonic will work, as there is no Sybase provider; see above.)
According to this blog you can in fact use NHibernate with dynamic mapping. It takes a bit of tweaking, though...
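The gist of the trick is that NHibernate will accept mapping XML built as a string at runtime, so you can generate it from the MetaTables/MetaFields rows. A rough sketch; MetaTable/MetaField are invented stand-ins for your metadata, and the class-less "dynamic entity" session details are left out:

using System.Text;
using NHibernate.Cfg;

// Rough sketch: build hbm.xml at runtime from the metadata tables and hand
// it to NHibernate. MetaTable/MetaField are invented stand-ins for rows
// read out of MetaTables/MetaFields; the entity-name mapping gives a
// class-less ("dynamic") entity NHibernate can work with.
static void AddDynamicMapping(Configuration cfg, MetaTable table)
{
    var xml = new StringBuilder();
    xml.Append("<hibernate-mapping xmlns='urn:nhibernate-mapping-2.2'>");
    xml.AppendFormat("<class entity-name='{0}' table='{0}'>", table.Name);
    xml.AppendFormat("<id name='Id' column='{0}' type='Int32'>" +
                     "<generator class='native'/></id>", table.KeyColumn);
    foreach (MetaField field in table.Fields)
        xml.AppendFormat("<property name='{0}' column='{0}' type='{1}'/>",
                         field.Name, field.NHibernateType);
    xml.Append("</class></hibernate-mapping>");

    cfg.AddXml(xml.ToString()); // parsed exactly like a normal .hbm.xml file
}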
We did some of this using NHibernate, but we stopped the project since it didn't provide us with the ROI we wanted. We ended up writing our own ORM/SQL layer, which worked very well (I say "worked" since I no longer work there; I'm guessing it still works).
Our system used an open source project to generate the SQL (I don't remember the name any more), and we built all our queries in our own XML-based language (Query Markup Language, QML). We could then build an XmlDocument with selects, wheres, groups, etc. and send it to the SqlEngine, which would turn it into a SQL statement and execute it. We discussed, but never implemented, a cache in all of this; that would've allowed us to cache the QML for frequently used queries.
I am a little confused as to how the ORM would be used at runtime. If the ORM dynamically builds something at runtime, how does the runtime code know what the ORM did dynamically?
"have it at that point allow me to manipulate the data" - What is manipulating the data?
I may be missing something here, and I apologize if that's the case. (I have only really used the bottom-up approach with ORMs.)
IDataReader doesn't map anything to an object, you know, so your example would have to be written using a classic query builder.
Have you looked into using the ADO.NET Entity Framework?
MSDN: LINQ to Entities
It allows you to map database tables to an object model in such a manner that you can code without thinking about which database vendor is being used, and without worrying about minor variations made by a DBA to the actual tables. The mapping is kept in configuration files that can be modified when the db tables are modified without requiring a recompile.
Also, using LINQ to Entities, you can build queries in an OO manner, so you aren't writing actual SQL query strings.
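A small example of what that looks like; the context and entity names are invented:

using System;
using System.Linq;

// MyCompanyEntities and Products are invented; the query below is
// translated into the configured provider's SQL dialect at runtime.
using (var context = new MyCompanyEntities())
{
    var expensive = from p in context.Products
                    where p.Price > 100m
                    orderby p.Name
                    select new { p.Name, p.Price };

    foreach (var item in expensive)
        Console.WriteLine("{0}: {1}", item.Name, item.Price);
}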