Can HP Vertica be used as an C# Entity Framework provider?

Can HP Vertica be used as an C# Entity Framework provider? - c#

I have been trying to add as I do with MS SQL Server but its never appears in the list as a type to select from

That's not the answer you expect, and you probably already read about it, but even if there's a way to use Vertica with a C# Entity provider, you probably don't want to go down this way.
We tried to achieve the same thing in Java, using hibernate to store objects in Vertica. This was for the sake of convenience : storing all of our data in a single place seemed like a good idea. We created a couple of tables with a few thousand lines max, storing objects loaded by Hibernate. The content of the tables was updated several times per hour.
It quickly brought Vertica on its knees. The update process was filling the Vertica Write Optimized Store (WOS), causing a WOS overflow. Reading performance was bad, since all columns of each table were retrieved (given the column oriented structure of Vertica tables, this would be equivalent to creating a table for each column in a row oriented database, and then joining all these tables to retrieve a single row). We ended up storing hibernate objects in a MySQL database, in order to only keep in Vertica the large volume of data which was intended to be analyzed/aggregated.
In short, before looking for a Vertica C# Entity provider, just make sure you do know exactly how your provider will be used, the structure and complexity of your entities, and how Vertica will handle it.
Good luck :)

Related

Potential conflict between transferring data and '.ValueGeneratedOnAdd()'

I apologize if this is duplicative; I could find nothing directly pertaining.
The difficulty involves EF Core (v 3.1.8, if it matters), but is not specific or restricted thereto. I am doing code first, creating a number of entities, but the key point is that I am getting my initial data set from an app that I am trying to replace. My new app has a number of structural differences in every corresponding entity, but the data in the old app is still critical, so I will be transferring it to my new database. (Old db is hosted by MS SQL 2008; new db is hosted by MS SQL 2019, if it matters).
Most of the key fields are GUIDs, and the problem is that in EF Core, at the point in the future when I want to use the new app to do more data entry, I will also want the database to choose the GUID. In EF Core Fluent API parlance, that would be, for example:
modelBuilder.Entity("ReplaceOldApp.Models.Address", b =>
{
b.Property<Guid>("AddressID")
.ValueGeneratedOnAdd()
.HasColumnType("uniqueidentifier");
}
However, if I inform EF Core that I want the database to create the key, then it will create the tables such that when I try to transfer the data from the old database (whether using EF or some other means), the new database will ignore the old GUID and create a new, unrelated one. (Or at least, that's what I think will happen. I'm not ready to try it yet.) If that happens, then all of the data from, say, the old Person entity (such as the above-implied Address entity), will no longer be related between their corresponding entities in the new database, because all records will have shiny new GUIDs. I will have all the information, and no way to actually use it.
Obviously I can tell EF Core to inform the database that it will not be creating the GUIDs, and I can then read, unmunge and transfer the data from the old database to the new without fear of data loss (God willing). But then going forward, for any new data entry, the GUIDs will not be automatically genned. I can of course then mod my IEntityTypeConfiguration Fluent API classes for the various entities and do a second migration, re-genning the affected tables, but I'm worried that EF Core will decide that it needs to DROP the tables to accommodate such a change. (Again, I do not know for sure because I have not tried it: sorry.)
So my question is: How would you approach such a situation? Should I ignore EF and do something clever with MS SQL Studio? Should I do two migrations with a transfer in-between? Should I tell the database, even though it has been told to gen the keys, somehow to accept the old keys without changing things, perhaps via LINQ?
============== Edit:
I'm sure SSIS would work to transfer the data from old to new databases, but the learning curve appears daunting, and I am only trying to solve one problem, not gain a new career. Powershell ditto, although it may be a bit more of a hacker's tool, and as such knowledge of it might assist tweaking or help to solve a diverse set of one-time SQL Server headaches. However, again, as would you, I prefer to use what I know, or failing that, learn or learn more about a tool which promises to serve me consistently into the future.
With the very welcome new (to me) information about IDENTITY_INSERT, and information gained from Linq To Sql and identity_insert, I believe I should not use LINQ to SQL because it may assume that IDENTITY_INSERT is OFF and simply filter out the crucial GUID, failing therefore to provide it to the target server. Rather, it seems I can use C# to produce a series of generated SQL statements, and then run each one on the target server inside a TransactionScope(). Because each such insert will thereby run 'in the same connection', the state of IDENTITY_INSERT will be preserved for that entire insert transaction, and (creek don't rise) it should work.
Again, I appreciate your answer, Randy in Marin. It has, it seems, led me to an approach that will work within the potential constraints of my context (EF Core), while allowing me to preserve the crucial existing IDENTITY information. Peace.

Not being an EF programmer, I don't know if there is an option for identity insert that you can enable for a migration. You might search the term to see if it comes up.
Our team support database migrations. We can do it a number of ways. I would not even consider EF because it's not designed for data migrations - or for database design. (And because we tend to use what we know.)
This is not the way I would do it, but it might be better than SSIS if you have not used SSIS. If the tables are in the same database or in databases on the same server, you can use T-SQL to load each table one at a time. Even if not on the same server, a linked server would allow a distributed transaction. (I avoid linked servers like the plague, but for a one time thing like a migration I would tolerate it. I would rather restore a copy of the source database to the destination server to use as a source. Distributed transactions gone wrong have forced me to reboot critical servers.)
Each table can have a 4 part name. If the server part (e.g., using a linked server name) is not present, the local instance is used. If the database part is not present, the current database is used. This is the format I assume for the "src_table" and "dst_table".
[myserver\myinstance].[mydatabase].[myschema].[mytable]
Each table is loaded with T-SQL as follows:
TRUNCATE TABLE dst_table
SET IDENTITY_INSERT dst_table ON
INSERT dst_table (...) SELECT ... FROM src_table
SET IDENTITY_INSERT dst_table OFF -- must be turned off - only 1 table can have this ON
If there are foreign keys, some tables (e.g., def tables) would need to be loaded first.
If the table does not have an IDENTITY column (EF code creates all values), you don't use the IDENTITY_INSERT stuff. It will fail if you use it and there is not an identity column. It will fail if you don't use it and try to insert into an identity column.
If there is a lot of data in a table, the transaction might be too big or slow. Inserting in batches might be called for.
If it was something to run on a schedule, I would likely create a SSIS package to do the load.
If I wanted to try something new, I would use powershell and the DBATools module cmdlets to see if extracting to csv and importing the csv would be efficient. The import cmdlet has a column mapping parameter, among many others. PowerShell could be used to do transformation, but I think this crosses over into SSIS territory.
I have dealt with migrations where the GUIDs and IDs no longer related after the move. Using queries joining the new data to the old data, we were able to fix the related values. It's likely more work to fix it after than to plan for it to be correct from the start.

Forced usage of stored procedures with ADO.NET Entity Framework

I would like to have your advice.
I'm now developing a small WPF client application using C#, bindings, ADO.Net Entity Framework, ODP.net and an Oracle database.
The application is a small one, two XAML screens, about 15 tables. I was developing using entities by filling my entities through the application and using the SaveChanges method.
However our DBA said me that I don't have the right to make direct access to the but only using stored procedures. I asked him why and he said me that it is a security reason because using stored procedures forces to provide the row identifier when deleting a record in one table.
According him the risk is that the application will maybe delete all the rows in one table instead of only one row if the id is provided througe the stored procedure.
I find that is a lot of overkill for only 15 table.
What do you think about that?

Have you suggested to your DBA that you use Linq to SQL? That way you can extract objects, representing individual rows and it would make it far less likely you would accidentally delete multiple rows.
Personally I think EDM might be overkill for the size of DB.
I should say I'm a big proponent of LINQ to SQL and not a big fan of SPs however....

LINQ2SQL on top of ODP.NET is a great stack. And I agree with Andrew, because you would have to write code to load the records, delete all of them, and commit the changes, it's not exactly something that can happen "easily".
Forgetting a where clause in a LINQ statement is no easier or harder then forgetting a where clause in a stored procedure.

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part that this question will be concerned about is the database it uses and the way they retrieve data from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns that contains data calculated from the given measurements. And it also filters some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy that is leading the development only knows some SQL, so he likes adding new columns that is calculated by some columns in the main table for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may have new columns in the future is I think a problem. As far as I understand, I should map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.

You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic without any strong-typing. You could just simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).

You may have a look to NoSQL system that are a lot more flexible on the schema. Or have a look to document database like RavenDB. All these systems allow the schema to change dynamically. You need to check the Pro's and Con's to see if it can fill you requirements.
(This answer is a bit out of subject as it's about replacing the SQL server and not really creating a DAL, but other answers cover the subject well and I would like to propose another way that may help.)

If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.

You can use Entity Framework - Database First if the DB is changing. Of course, you will have to regenerate your classes when you want to be able to access new columns, when the DB schema changes.
If you need to accomodate different database servers, then you should take a look into implementing a repository pattern and abstract all your data access that way.

Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of you application and a different route to display data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, contrary to displaying dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that like
MainEntity DataRowToEntity(DataRow row)
{
var entity = new MainEntity();
entity.PropertyA = row["PropertyA"];
....
}
The MainEntity can be attached to a context, its status changed to Modified, and saved.

Auditing record changes in sql server databases

Using only microsoft based technologies (MS SQL Server, C#, EAB, etc) if you needed keep the track of changes done on a record in a database which strategy would you will use? Triggers, AOP on the DAL, Other? And how you will display the collected data? Is there a pattern about it? Is there a tool or a framework that help to implement this kind of solution?

The problem with Change Data capture is that it isn't flexible enough for real auditing. You can't add the columns you need. Also it dumps the records every three days by default (you can change this, but I don't think you can store forever) so you have to have a job dunping the records to a real audit table if you need to keep the data for a long time which is typical of the need to audit records (we never dump our audit records).
I prefer the trigger approach. You have to be careful when you write the triggers to ensure that they will capture the data if multiple records are changed. We have two tables for each table audited, one to store the datetime and id of the user or process that took the action and one to store the old and new data. Since we do a lot of multiple record processes this is critical for us. If someone reports one bad record, we want to be able to see if it was a process that made the change and if so, what other records might have been affected as well.
At the time you create the audit process, create the scripts to restore a set of audited data to the old values. It's a lot easier to do this when under the gun to fix things, if you already have this set up.

Sql Server 2008 R2 has this built-in - lookup Change Data Capture in books online

This is probably not a popular opinion, but I'm going to throw it out there anyhow.
I prefer stored procedures for all database writes. If auditing is required, it's right there in the stored procedure. There's no magic happening outside the code, everything that happens is documented right at the point where writes occur.
If, in the future, a table needs to change, one has to go to the stored procedure to make the change. The need to update the audit is documented right there. And because we used a stored procedure, it's simpler to "version" both the table and its audit table.

Copying data from one oracle database to another oracle database using C#

What is the standard way of copying data from one oracle database to another.
1) Read data from source table and copy to temp table on destination using configuration( i.e. there are more than 1 table and each table has separate temp table)
2) Right now there is no clob data, but in future clob data might be used.
3) Read everything to memory(if large data read in chunks)
Should not use Oracle links
Should not use files
Code should be only using C# but not any database procedures.

One way that I've used to do this is to use a DataReader on the source database and just perform inserts on the target database (using Bind Parameters for sure).
Note that the DataReader is excellent at not using much memory as it moves through a table (I believe that by default it uses a Fast Forward, Read Only cursor). This means that only a small amount of data is held in memory at a given time.
Here are the things to watch out for:
Relationships
If you're working with data that has relationships, you're going to need to deal with that. There are two ways that I've seen to deal with this:
Temporarily drop the relationships in the target database before doing the copy, then recreate them after.
Copy the data in the correct order for the relationships to work correctly (this is usually pretty difficult / inefficient)
Auto Generated Id Values
These columns are usually handled by disabling the auto increment functionality for the given table and allowing identity insert (I'm using some SQL Server terms, I can't remember how it works on Oracle).
Transactions
If you're moving a lot of data, transactions will be expensive.
Repeatability / Deleting Target Data
Unless you're way more awesome than the rest of us, you'll probably have to run this thing more than once (at least during development). That means you might want a way to delete the target data.
Platform Specific Methods
In SQL Server, there are ways to perform bulk inserts that are blazingly fast (by giving up little things like referential integrity checking). There might be a similar feature within the Oracle toolset.
Table / Column Metadata
I haven't had to do this in Oracle yet, but it looks like you can get metadata on tables and columns using the views mentioned here.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.