I have multiple relational tables, with parent profile records and several child tables keyed to them.
I'm trying to query that data efficiently in .NET so that I can transform it (array to object) and insert it into DocumentDB. Essentially this is ETL work, but because the data has to be transformed in a particular way and is going into DocumentDB, we are using .NET.
We are inserting data from all the relational tables into one document collection, so the code will contain a lot of blocks like:
// if still in same profile record, insert more relational data for each relational table.
We are trying to avoid producing a Cartesian product, so that one profile doesn't turn into 100 rows or more. We were thinking about using the Oracle features that convert child records to a JSON array, but we can't upgrade our Oracle system to the release that supports them. Another thought was to create an XML document, but that feels pretty wrong.
Any ideas on best practices for handling ETL work within .NET? Most of the web sites I've worked on involve pulling from only a few tables at most, and a lot of those are 1:1 relationships.
You could use Entity Framework for this. Create a DbContext and POCO classes that represent your tables and their relationships. Then execute the necessary query against your DbSets and you'll have all the data mapped to your objects, which you can then serialize any way you want.
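For example, a minimal sketch of that approach (entity, context, and property names are all made up; assumes EF 6 and Json.NET). Each profile is loaded with its children eagerly, then shaped so the child rows become a nested array, which sidesteps the Cartesian-product problem from the question:

using System.Collections.Generic;
using System.Data.Entity;   // EF 6; the Include() lambda extension lives here
using System.Linq;
using Newtonsoft.Json;

// Hypothetical POCOs for a profile and one of its child tables.
public class Profile
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Address> Addresses { get; set; }
}

public class Address
{
    public int Id { get; set; }
    public int ProfileId { get; set; }
    public string City { get; set; }
}

public class ProfileContext : DbContext
{
    public DbSet<Profile> Profiles { get; set; }
    public DbSet<Address> Addresses { get; set; }
}

public static class Exporter
{
    public static void Run()
    {
        using (var ctx = new ProfileContext())
        {
            var docs = ctx.Profiles
                .Include(p => p.Addresses)   // eager-load the child rows
                .ToList()
                .Select(p => new
                {
                    id = p.Id,
                    name = p.Name,
                    addresses = p.Addresses.Select(a => new { a.City }).ToList()
                });

            foreach (var doc in docs)
            {
                string json = JsonConvert.SerializeObject(doc);
                // hand json (or the object itself) to the DocumentDB client here
            }
        }
    }
}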
I have been trying to add a Vertica data source the way I do with MS SQL Server, but Vertica never appears in the list of types to select from.
This is probably not the answer you expect, and you may already have read about it, but even if there is a way to use Vertica with a C# Entity provider, you probably don't want to go down that road.
We tried to achieve the same thing in Java, using Hibernate to store objects in Vertica. This was for the sake of convenience: storing all of our data in a single place seemed like a good idea. We created a couple of tables with a few thousand rows at most, holding objects loaded by Hibernate. The content of the tables was updated several times per hour.
It quickly brought Vertica to its knees. The update process kept filling the Vertica Write Optimized Store (WOS), causing WOS overflows. Read performance was bad, since all columns of each table were retrieved (given the column-oriented structure of Vertica tables, this is roughly equivalent to creating one table per column in a row-oriented database and then joining all of those tables to retrieve a single row). We ended up storing the Hibernate objects in a MySQL database, keeping only the large volume of data intended to be analyzed and aggregated in Vertica.
In short, before looking for a Vertica C# Entity provider, make sure you know exactly how the provider will be used, the structure and complexity of your entities, and how Vertica will handle it.
Good luck :)
I'm dynamically importing data into a database, creating new tables on the fly and storing the metadata so that I can access those tables later via dynamically constructed SQL. My question is: for C#, is there a library that can abstract away some of the details of the SQL itself? The situation I'm running into is with sequences (although there are others). In Oracle, accessing a sequence looks like this:
select foo.nextVal from dual;
In Postgres...
select nextval('foo_id_seq');
For my project I don't know what the final database will be, and I don't like the idea of combing through the project fixing a bunch of errors caused by SQL that one database accepts and another doesn't.
I looked at NHibernate, and it seems that tools like that (and LINQ to SQL) require an existing object model. I don't have an object model, because all of my data is provided dynamically and I don't know the number of columns, data types, etc.
Any suggested approach to this problem is appreciated.
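To make the problem concrete, this is roughly the kind of per-database abstraction I'm hoping a library already provides, so I don't have to write and maintain it myself for every dialect quirk (all names here are hypothetical):

// Hypothetical sketch of what I'd rather not hand-roll.
public interface ISqlDialect
{
    string NextSequenceValueSql(string sequenceName);
}

public class OracleDialect : ISqlDialect
{
    public string NextSequenceValueSql(string sequenceName)
    {
        return "select " + sequenceName + ".nextval from dual";
    }
}

public class PostgresDialect : ISqlDialect
{
    public string NextSequenceValueSql(string sequenceName)
    {
        return "select nextval('" + sequenceName + "')";
    }
}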
If the data you're trying to store has a dynamic structure, then a relational database may not be the best choice. Its strengths lie in data that is statically structured and well defined. You might be better served by a document-oriented store like MongoDB, which is designed for dynamic schemas. If you used something like MongoDB, I think your question about abstracting query generation for dynamically changing schemas goes away.
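For illustration, a rough sketch with the current MongoDB C# driver, where the shape of each record is decided entirely at runtime (database, collection, and field names are made up):

using System;
using MongoDB.Bson;
using MongoDB.Driver;

public static class DynamicStore
{
    public static void Demo()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var collection = client.GetDatabase("import")
                               .GetCollection<BsonDocument>("records");

        // The document's fields can be decided entirely at runtime.
        var doc = new BsonDocument
        {
            { "name", "example" },
            { "importedAt", DateTime.UtcNow }
        };
        collection.InsertOne(doc);

        // Query on any field without a static mapping.
        var filter = Builders<BsonDocument>.Filter.Eq("name", "example");
        var found = collection.Find(filter).FirstOrDefault();
    }
}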
That said, some relational databases like SQL Server have good support for XML data types, which allow you to store an arbitrary structure within your static schema. SQL Server also lets you query directly into XML data types and even index them, which means you can query on the server side without transferring the XML back to the client, deserializing it, and so on. To decide whether this will perform well enough for your needs, you'll have to test with data that represents your production load.
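And a sketch of querying into such an XML column from C# (table, column, and element names are all hypothetical):

using System;
using System.Data.SqlClient;

public static class XmlQueryDemo
{
    public static void Run(string connectionString)
    {
        // Hypothetical table dbo.DynamicRecords with an XML column "Payload".
        const string sql =
            @"SELECT Payload.value('(/record/name)[1]', 'nvarchar(100)')
              FROM dbo.DynamicRecords
              WHERE Payload.exist('/record[name=""widget""]') = 1";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine(reader.GetString(0));
            }
        }
    }
}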
I usually work with MySQL, but also with SQL Server, Oracle, and Access; the database structure is almost the same across them. My database stores the configuration and recorded data of a SCADA application (Supervisory Control and Data Acquisition).
Most of the tables are usually the same, but sometimes my teammates add fields or tables, or change the type of some fields.
I'm writing an application that needs to load some config parameters from the db, then load data, process it, and store the new values back to the db. It also needs to add new records.
I have a class that, independently of the db type, returns an IDbConnection object given the correct connection parameters. Through its methods I can specify a SQL query and get back an IDataReader or a DataSet.
Now, how should I query the data from the db, analyze it, recalculate values, and finally store them again?
I'm a bit wary of building a detailed object mapping because fields may change. A simple DataSet/DataTable/DataRow should be fine, but I'd like to use LINQ to query the extracted data in a simpler way, something like the sketch below.
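Something along these lines is what I have in mind: LINQ to DataSet over whatever tables come back, with no per-table classes (table and column names are only examples):

using System;
using System.Data;
using System.Linq;   // plus a reference to System.Data.DataSetExtensions

public static class Recalculator
{
    // "dataSet" comes from my existing connection-helper class.
    public static void Process(DataSet dataSet)
    {
        DataTable table = dataSet.Tables["Measurements"];

        var recent = table.AsEnumerable()
            .Where(row => row.Field<DateTime>("Timestamp") > DateTime.Now.AddHours(-1))
            .Select(row => new
            {
                Tag = row.Field<string>("TagName"),
                Value = row.Field<double>("Value")
            });

        foreach (var r in recent)
        {
            // recalculate / process r.Tag and r.Value here
        }
    }
}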
Finally, my db has about 60 tables, but in this application I work with only a dozen of them. I have very little time to build this application, so I need a fast approach, even if it's not very elegant.
Thanks.
You should try an ORM that configures itself automatically according to the schema.
I have found the one below. I haven't used similar things in C#, but they work nicely in other (dynamic) languages.
http://www.codeproject.com/Articles/117666/Kerosene-ORM
Using an ORM would most probably be the fastest approach. You could use NHibernate, which supports multiple databases. NHibernate does have a learning curve, though, so a micro ORM might be easier to pick up. PetaPoco is a great micro ORM and supports SQL Server, SQL Server CE, MySQL, PostgreSQL, and Oracle.
These ORMs work from a mapping for each DB you use, which needs to be updated or recreated whenever the DB changes.
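For instance, a minimal PetaPoco sketch (table, class, and connection-string names are made up):

using PetaPoco;

// "Scada" is the name of a connection string in app.config.
[TableName("Config")]
[PrimaryKey("Id")]
public class ConfigRow
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Value { get; set; }
}

public static class PetaPocoDemo
{
    public static void Run()
    {
        var db = new Database("Scada");

        // Plain SQL in, POCOs out; @0 is a positional parameter.
        var rows = db.Fetch<ConfigRow>(
            "SELECT * FROM Config WHERE Name LIKE @0", "limit%");

        rows[0].Value = "42";
        db.Update(rows[0]);                                           // UPDATE by PK
        db.Insert(new ConfigRow { Name = "newParam", Value = "1" });  // INSERT
    }
}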
What is the standard way of copying data from one Oracle database to another? The requirements:
1) Read data from the source table and copy it to a temp table on the destination, driven by configuration (there is more than one table, and each table has its own temp table).
2) There is no CLOB data right now, but CLOB data might be used in the future.
3) Read everything into memory (or, for large data, read it in chunks).
Should not use Oracle database links.
Should not use intermediate files.
Should use only C# code, with no database procedures.
One way that I've used to do this is to run a DataReader over the source database and simply perform inserts on the target database (definitely using bind parameters).
Note that a DataReader is excellent at keeping memory usage low as it moves through a table (I believe that by default it uses a fast-forward, read-only cursor). This means only a small amount of data is held in memory at any given time.
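A bare-bones sketch of that pattern with the managed ODP.NET driver (table, column, and connection details are hypothetical; in a real ETL job the SQL would be generated from your configuration metadata):

using Oracle.ManagedDataAccess.Client;

public static class TableCopier
{
    public static void Copy(string sourceConnString, string targetConnString)
    {
        using (var source = new OracleConnection(sourceConnString))
        using (var target = new OracleConnection(targetConnString))
        {
            source.Open();
            target.Open();

            var read = new OracleCommand(
                "SELECT id, name FROM customers", source);
            var write = new OracleCommand(
                "INSERT INTO customers_tmp (id, name) VALUES (:id, :name)",
                target);
            write.BindByName = true;   // match parameters by name, not position
            var pId = write.Parameters.Add("id", OracleDbType.Int32);
            var pName = write.Parameters.Add("name", OracleDbType.Varchar2);

            using (var reader = read.ExecuteReader())
            {
                var tx = target.BeginTransaction();
                int count = 0;
                while (reader.Read())
                {
                    pId.Value = reader.GetInt32(0);
                    pName.Value = reader.GetString(1);
                    write.ExecuteNonQuery();

                    // Commit in batches so a single huge transaction
                    // doesn't eat the target's undo space.
                    if (++count % 1000 == 0)
                    {
                        tx.Commit();
                        tx = target.BeginTransaction();
                    }
                }
                tx.Commit();
            }
        }
    }
}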
Here are the things to watch out for:
Relationships
If you're working with data that has relationships, you'll need to account for them. I've seen two ways to handle this:
Temporarily drop the relationships in the target database before doing the copy, then recreate them after.
Copy the data in the correct order for the relationships to work correctly (this is usually pretty difficult / inefficient)
Auto Generated Id Values
These columns are usually handled by disabling the auto-increment functionality for the given table and allowing identity inserts (I'm using SQL Server terms here; I can't remember exactly how it works in Oracle, but sequences play the equivalent role there).
Transactions
If you're moving a lot of data, transactions get expensive: committing every row is slow, while one giant transaction puts pressure on the target's undo space. Committing in batches (as in the sketch above) is a reasonable middle ground.
Repeatability / Deleting Target Data
Unless you're way more awesome than the rest of us, you'll probably have to run this thing more than once (at least during development). That means you might want a way to delete the target data.
Platform Specific Methods
In SQL Server, there are ways to perform bulk inserts that are blazingly fast (by giving up little things like referential integrity checking). The Oracle toolset has similar features, for example array binding and the OracleBulkCopy class in ODP.NET.
Table / Column Metadata
I haven't had to do this in Oracle yet, but it looks like you can get metadata on tables and columns using Oracle's data dictionary views (for example ALL_TABLES and ALL_TAB_COLUMNS).
I want to write a library that can store arbitrary objects in a database. I have already done the mapping for known objects using code-first with the DbContext and DbSet classes, but in this case I don't know the structure of the objects I have to map. Is there a way to do this with those classes?
The only way I can see is to create the mapping classes dynamically and load them into the ORM, but I'm not sure whether that's possible with EF4, since I'm an NHibernate guy.
By the way, I can't see the motive behind this. Why would you need such a thing?
If you need to store objects with different (and unknown) schemas, you can use the Serialized LOB pattern (http://martinfowler.com/eaaCatalog/serializedLOB.html) with a TEXT field in a relational database, or go schemaless with a NoSQL document database such as MongoDB. A sketch of the first option follows.
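A minimal sketch of the Serialized LOB approach with EF code-first and Json.NET (entity and member names are made up): the relational schema stays fixed while the payload varies.

using System;
using System.Data.Entity;
using Newtonsoft.Json;

// One fixed entity that can hold any object as serialized text.
public class StoredObject
{
    public int Id { get; set; }
    public string TypeName { get; set; }  // what to deserialize into
    public string Payload { get; set; }   // maps to a TEXT/CLOB column
}

public class ObjectStoreContext : DbContext
{
    public DbSet<StoredObject> Objects { get; set; }
}

public static class ObjectStore
{
    public static void Save(object item)
    {
        using (var ctx = new ObjectStoreContext())
        {
            ctx.Objects.Add(new StoredObject
            {
                TypeName = item.GetType().AssemblyQualifiedName,
                Payload = JsonConvert.SerializeObject(item)
            });
            ctx.SaveChanges();
        }
    }

    public static object Load(int id)
    {
        using (var ctx = new ObjectStoreContext())
        {
            var row = ctx.Objects.Find(id);
            return JsonConvert.DeserializeObject(
                row.Payload, Type.GetType(row.TypeName));
        }
    }
}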