Migrating an Access multi-valued field column to C# - c#

I am attempting to use the Microsoft.ACE.OLEDB.12.0 driver to read data from an Access database, and came upon an odd situation. One of the columns in the Access database shows as a comma-delimited list of IDs.
Wells
________
345,456,7
6,387
When I looked at the column definition in Access I thought it would say string, but it does not; it says number. So I guess it is storing an array of integers in a single column?
I'm having a tough time getting a data reader to pick this up.
Using
var w = DB_Reader.GetValue(DB_Reader.GetOrdinal("Wells"));
results in the error:
The provider could not determine the Object value. For example, the
row was just created, the default for the Object column was not
available, and the consumer had not yet set a new Object value.

Well, at the end of the day, you can think of the multi-value column as, in fact, a child table.
So, if you are looking to migrate a master and child table, then in YOUR database you need a relational set of tables to re-create what Access is doing behind the scenes.
So, let's take a multi-value example and query.
Say we have this SQL query in Access:
SELECT ID, Person_Name, FavorateColors FROM tPerson;
But "favorite colors" is one of those MV columns. (I should point out that, with the HUGE movement towards NoSQL databases, those often work this way too; the same goes for XML or JSON data, for that matter.) However, be it XML, JSON, or the Access multi-value feature, you need that child table if you are going to adopt a relational data model to represent this data.
Ok, so we run the above query, and each row shows the MV column as a single comma-delimited list, just like the Wells column in the question.
In fact, when I used the lookup wizard, I picked a child table called tblColors.
But how can we explode the above query to dig out the data?
Change the above query to this:
SELECT ID, Person_Name, FavorateColors.Value FROM tPerson
Note how we added ".Value" after the MV column name. Now, when you run the query, you get the SAME result as if you had two tables and did a left join: the parent table rows, as in any relational database, simply repeat for each child table value.
Note how the PK value and the rest of the parent row now repeat for each child MV value.
So, you are pretty much free to query as per above: you get what amounts to a left-joined table, and of course the parent record repeats.
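On the C# side, those repeating rows can be folded back into one list of child values per parent. This is only a minimal sketch under some assumptions: the exploded column is aliased in the query (for example SELECT ID, FavorateColors.Value AS Color FROM tPerson, where the alias Color is made up), the parent key is an integer, and the names MultiValueReader and GroupChildValues are illustrative. It is written against IDataReader so it is not tied to a particular provider.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class MultiValueReader
{
    // Groups the repeated parent rows of a ".Value" query into one entry
    // per parent key, each holding the list of its child values.
    // keyColumn and valueColumn are the column names as they appear in the
    // reader (assumed here to be "ID" and an alias such as "Color").
    public static Dictionary<int, List<string>> GroupChildValues(
        IDataReader reader, string keyColumn, string valueColumn)
    {
        var result = new Dictionary<int, List<string>>();
        int keyOrdinal = reader.GetOrdinal(keyColumn);
        int valueOrdinal = reader.GetOrdinal(valueColumn);

        while (reader.Read())
        {
            int key = Convert.ToInt32(reader.GetValue(keyOrdinal));
            if (!result.TryGetValue(key, out var values))
            {
                values = new List<string>();
                result[key] = values;
            }

            // With left-join behaviour, a parent without child values
            // shows up once with a NULL in the value column.
            if (!reader.IsDBNull(valueOrdinal))
            {
                values.Add(Convert.ToString(reader.GetValue(valueOrdinal)));
            }
        }
        return result;
    }
}
```

In the question's setup the reader would presumably come from an OleDbCommand.ExecuteReader() call over the ACE connection; the grouped dictionary can then be written out as rows of a proper child table.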
So, just as with XML, JSON, or in fact any query or table of data with repeating parent rows and child rows, you are pretty much forced to write code to split out this data, or re-normalize it. This of course is far more common when receiving, say, JSON/XML data, or often data from an Excel sheet.
So, you have to process out the child record data and create a relation for that data.
And thus our question now becomes: how can we import JSON/XML/Excel data that really should have used two relational database tables?
So, assuming we want to process this data, you process it the same as any data you have that should have been two related tables in the first place.
It really depends on whether this is a one-time import or something you have to do all the time.
If it was a one-time deal, then I would use Access, with a make-table query based on the above query. You would in fact have to pluck out the PK ID from the child table. In the above there is a child table called tblColors; we are just missing the "junction" table in between that Access automatically created. The hidden tables are not exposed, and thus I would simply use a make-table query in Access, and then add an FK column that holds the PK value from tblColors.

Related

Insert bulk data into tables that are in a one to many relationship

I have a .NET App connected to a Postgres DB using Npgsql and I am trying to import data into two tables, say Users and Todos. A user has many todos. The User table has an id column that is automatically set by the DB, and the Todos table has a foreign key to the Users table called user_id.
Now, I know how to insert Users, and I know how to insert Todos, but I do not know how to set the user_id for those Todos since the id column from User is only known after the users are inserted into the DB. Any idea?
This depends on how you are importing and which tool you are using. If you are using raw INSERT statements, PostgreSQL has a RETURNING clause which will send you back the IDs of the inserted rows (see the docs).
If you are using binary COPY (which is the most efficient way to bulk-import data), there's no such option. In this case, one good way is to "allocate" all the IDs in one go, by incrementing the sequence backing the ID column, and then sending the IDs when you're importing. This means the database is no longer generating those IDs - you're sending them explicitly like any other field.
In practical terms, say you have 100 users (and any number of todos). You can do one call to setval to increment the sequence by 100, and then you can import your users, explicitly setting their IDs to those 100 values. This allows you to also specify the user IDs on the todos. However, if you do this, be mindful of concurrency issues if someone else modifies the sequence at the same time.
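Once the block of IDs has been reserved, handing them out is plain in-memory bookkeeping. The following is a rough sketch, not Npgsql code: UserRow, TodoRow and IdAllocator are hypothetical names, and firstReservedId is assumed to be the first value of a block that was already reserved on the sequence before the import started.

```csharp
using System;
using System.Collections.Generic;

public sealed class UserRow
{
    public long Id;
    public string Name = "";
    public List<TodoRow> Todos = new List<TodoRow>();
}

public sealed class TodoRow
{
    public long UserId;
    public string Title = "";
}

public static class IdAllocator
{
    // Assigns consecutive IDs, starting at firstReservedId, to every user
    // and copies each user's ID into the UserId of its todos.
    // Reserving the block on the database sequence is NOT done here; the
    // caller is assumed to have done that already.
    public static void AssignIds(IList<UserRow> users, long firstReservedId)
    {
        long next = firstReservedId;
        foreach (var user in users)
        {
            user.Id = next++;
            foreach (var todo in user.Todos)
            {
                todo.UserId = user.Id;
            }
        }
    }
}
```

After this step both lists carry explicit keys, so the users and the todos can each be sent in their own bulk import with the id and user_id columns filled in like any other field.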

How To can Select**DB Table

If I have a database in each table where the ID field and its appropriate function in any field do not take the administrator behavior so that tables and field contents can be fetched until the serial number is unified without duplicate values
Appropriate in this context using except.
Is there code that can fetch tables either in SQL or in the Entity Framework?
Eexcept_Admin_except_List
List<int> tempIdList = answeripE.Select(q => q.ID).ToList();
var quslist = db.Qustion.Where(q => !tempIdList.Contains(q.ID));
Thanks to "daryal", the author of the answer to "Get All Except from SQL database using Entity Framework".
I need to do this without asking for each table and querying it. And also request SQL from the database as a whole without exception such as
select*
IDfield
FROM
MSDB_Table T
WHERE
T.id == MaxBy(T.OrderBy(x => x.id);
can replace "where TABLE1.id 'OR' Table2.id" decode all the tables and give a result.
All I'm looking forward to is that I can query one database on a whole, get it on a list without the use of tables or a composite key because it serves me in analyzing a set of data converted to other data formats, for example when representing a database in the form of JSON There are a lot of them on more than one platform and in a single database and to avoid the repetition of the data I need to do this or a comprehensive query may be compared or to investigate or like Solver Tool in Excel, so far did not get the answer to show me the first step is because it does not exist originally or because it is not possible?
If you want Entity Framework to retrieve all columns except a subset of them, the best way to do that is either via a stored procedure or a view. With a view you can query it using LINQ and add your predicates in code, but in a stored procedure you will have to write it and feed your predicate conditions into it...so it sounds like a view would be better for you.
Old example, but should guide you through the process:
https://www.mssqltips.com/sqlservertip/1990/how-to-use-sql-server-views-with-the-entity-framework/

Enum Vs Inner Join / Where

I have defined various text values by int. I store the int value in the data table for better and faster search. I have three options to display the text value:
1. I declare an enum in my code and display the text value according to the int value. It is static, and I have to change code if new values are to be added.
2. To make it dynamic, I can store the int and text values in a table which is in another database and owned by the admin. New values can be updated by the admin in this table. I use an inner join to display the text value whenever a record is fetched.
3. I store the actual text in the respective data table. This will make search slow.
My question is: which option is best under the following conditions?
Each data table has between 1 and 10 million records.
There are more than 5000 users doing fetch, search, and update operations on the table.
There are at most 12 text values, each at most 50 characters long.
There are 30 data tables with the above conditions and functions.
I like a combination of option #2 and option #1: use ints, but keep a dictionary table in another database.
Let me explain:
store the int and text in a table which is in another database;
store only the int in the origin table;
do not join the table from the other database to get the text, but cache the dictionary on the client and resolve the text from that dictionary.
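The client-side cache from the last point could look roughly like this. It is a sketch under assumptions: LookupCache is a made-up name, and loader stands in for whatever single query reads the whole dictionary table from the other database.

```csharp
using System;
using System.Collections.Generic;

public sealed class LookupCache
{
    private readonly Lazy<IReadOnlyDictionary<int, string>> _entries;

    // loader is a hypothetical callback that reads the whole dictionary
    // table (int code, text) from the other database in one query.
    public LookupCache(Func<IReadOnlyDictionary<int, string>> loader)
    {
        // Lazy<T> runs the loader once, on first use, and is thread-safe
        // with this constructor.
        _entries = new Lazy<IReadOnlyDictionary<int, string>>(loader);
    }

    // Resolves a stored int code to its display text without a join.
    public string Resolve(int code)
    {
        return _entries.Value.TryGetValue(code, out var text)
            ? text
            : "(unknown " + code + ")";
    }
}
```

With at most 12 values per table the cache is tiny. The sketch never reloads, though, so a real version would need some way to refresh after the admin changes the dictionary table.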
I would not go for option 1 for the reason given. Enums are not there as lookups. You could replace option 1 with a dictionary in code, but again the code would need to be recompiled each time a change is made, which is bad.
Storing text in a table (i.e. option 3) is bad if it is guaranteed to be duplicated a lot, as here. This is exactly where you should use a lookup table, as you suggest in option 2.
So yes, store them in a database table and administer them through that.
The join shouldn't take long at all if it is just to a small table. If you are worried, though, an alternative might be to load the lookup table into a dictionary in the code the first time you need it and resolve the values in code from that dictionary. I doubt you'll have problems with just doing it by the join, though.
And I'd do this approach no matter what the conditions are (ie number of records, etc.). The conditions do make it more sensible though. :)
If you have literally millions of records, there's almost certainly no point in trying to spin up such a structure in server code or on the client in any form. It needs to be kept in a database, IMHO.
The query that creates the list needs to be smart enough to constrain the count of returned records to a manageable number. Perhaps partitioned views or stored procedures might help in this regard.
If this is primarily a read-only list, with updates only done in the context of management activities, it should be possible to make queries against the table very rapid with proper indexes and queries on the client side.

Importing CSV data into application DB maintaining foreign key consistency

In my ASP.NET web app I'm trying to implement an import/export procedure to save or insert data in the application DB. My procedure generates some CSV files: one for each table.
Obviously there are relations between some of these tables and when I import CSV in my DB I'd like to maintain association between rows.
Say I have Table1 and Table2 with Table2 that has a foreign key to Table1. So I could have a row in Table1 with ID = 100 and a row in Table2 with Table1_ID = 100.
When I import CSV with Table1 data, new IDs are generated for Table1 rows, how can I maintain consistency of the foreign keys in Table2 when I import the corresponding CSV file?
I'm using Linq-to-SQL to retrieve data from the DB... can using DataSet and DataTable help me?
NOTE I'd like to permit cumulative import, so when I import a CSV file there may already be data in the DB. So I cannot use 'Set Identity OFF'.
Add the items of Table1 first, so that when you add the items of Table2 the corresponding records of Table1 are already in the database. For more tables you will have to figure out the order. If you are building a system for an arbitrary database schema, you will want to create a table graph in memory (where each node is a table and each arc is a foreign key; there are no types for that in the base library) and then convert it to a tree such that you get the correct order by traversing the tree (breadth-first).
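One way to get such an order without building an explicit tree is a topological sort of the table graph. The sketch below uses Kahn's algorithm, which is a swapped-in technique rather than the exact tree traversal described above; ImportOrder and the dependsOn shape (table name mapped to the tables its foreign keys point at) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ImportOrder
{
    // dependsOn maps each table to the tables its foreign keys point at.
    // Returns the tables ordered so that every referenced table comes
    // before the tables that reference it (Kahn's algorithm).
    public static List<string> Sort(IDictionary<string, List<string>> dependsOn)
    {
        var unmet = new Dictionary<string, int>();                 // parents not yet imported
        var referencedBy = new Dictionary<string, List<string>>(); // parent -> referencing tables

        // Register every table, including ones that only appear as a parent.
        foreach (var pair in dependsOn)
        {
            Register(pair.Key, unmet, referencedBy);
            foreach (var parent in pair.Value) Register(parent, unmet, referencedBy);
        }

        foreach (var pair in dependsOn)
        {
            foreach (var parent in pair.Value)
            {
                if (parent == pair.Key) continue; // a self-reference does not affect the order
                referencedBy[parent].Add(pair.Key);
                unmet[pair.Key]++;
            }
        }

        // Tables without parents can be imported right away.
        var ready = new Queue<string>();
        foreach (var pair in unmet)
        {
            if (pair.Value == 0) ready.Enqueue(pair.Key);
        }

        var order = new List<string>();
        while (ready.Count > 0)
        {
            var table = ready.Dequeue();
            order.Add(table);
            foreach (var child in referencedBy[table])
            {
                if (--unmet[child] == 0) ready.Enqueue(child);
            }
        }

        if (order.Count != unmet.Count)
            throw new InvalidOperationException("Foreign keys form a cycle.");
        return order;
    }

    private static void Register(string table,
        Dictionary<string, int> unmet, Dictionary<string, List<string>> referencedBy)
    {
        if (unmet.ContainsKey(table)) return;
        unmet[table] = 0;
        referencedBy[table] = new List<string>();
    }
}
```

For the two-table case in the question this simply yields Table1 before Table2. Tables whose foreign keys form a cycle cannot be ordered this way and would need separate handling.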
You can let the database handle the cases where a foreign key is violated because the referenced record does not exist. You will have to decide whether to wrap the whole import operation in one transaction, or use one per item.
Analysing the CSVs beforehand is also possible. To do that, store the primary key values of each table in a set (again, iterating over the tables in the correct order). Then, when you read a table that has a foreign key to a table you have already read, you can check whether the key is there; this will also help you detect any possible duplicates. [If you have things already in the database to take into account, you would have to query too... although take care if the database is in an active system, where records could be deleted while you are still deciding whether you can add the CSVs without problems.]
To address the fact that new IDs are generated when you add...
The simplest solution that I can think of is: don't. In particular if it is an active system, where other requests are being processed, because then there is no way to predict the new IDs beforehand. Your best bet would be to add them one by one; in that case, you will have to plan your transaction strategy accordingly... it may be the case that you will not be able to roll back.
Although, I think your question is a bit deeper: if the ID of a Table1 record did change, how can I update the corresponding records in Table2 so they point to the correct record in Table1?
To do that, I suggest doing the analysis I described above; you will then have a group of sets that work as indexes. These will help you locate the records that you need to update in Table2 for each ID in Table1. [It is also important to keep track of whether you have already updated a record, and not do it twice, because a generated ID may match an ID that is yet to be sent to the database.]
To roll back, you can also use those sets, as they will end up holding the new IDs that identify the records you will have to pull out of the database if you want to abort the operation.
Edit: those sets (I recommend HashSet) are only half the story, because they only hold the primary keys (for instance, ID in Table1). You will need bags to keep the foreign keys (in this case Table1_ID in Table2).
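A simple way to do that bookkeeping is a dictionary from the ID in the CSV to the ID the database generated. This is a minimal sketch, assuming integer keys; ChildRecord, ForeignKeyRemapper and the insertParent callback (which stands in for the real insert and returns the generated ID) are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class ChildRecord
{
    public int Table1Id;          // foreign key as read from the CSV
    public string Payload = "";
}

public static class ForeignKeyRemapper
{
    // insertParent is a hypothetical callback that inserts one Table1 row
    // and returns the ID the database generated for it.
    public static Dictionary<int, int> ImportParents(
        IEnumerable<int> csvParentIds, Func<int, int> insertParent)
    {
        var oldToNew = new Dictionary<int, int>();
        foreach (var oldId in csvParentIds)
        {
            // Add throws if the CSV contains the same ID twice.
            oldToNew.Add(oldId, insertParent(oldId));
        }
        return oldToNew;
    }

    // Returns new child records with translated keys. Writing into copies
    // means a key that was already translated is never translated again,
    // even when a new ID equals some other old ID.
    public static List<ChildRecord> RemapChildren(
        IEnumerable<ChildRecord> children, IReadOnlyDictionary<int, int> oldToNew)
    {
        var result = new List<ChildRecord>();
        foreach (var child in children)
        {
            if (!oldToNew.TryGetValue(child.Table1Id, out var newId))
                throw new InvalidOperationException(
                    "No Table1 row with CSV ID " + child.Table1Id);
            result.Add(new ChildRecord { Table1Id = newId, Payload = child.Payload });
        }
        return result;
    }
}
```

The values of the dictionary are also the new IDs you would need to delete if the import has to be undone by hand.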

Best way of acquiring information from several database tables

I have a medical database that keeps different types of data on patients: examinations, lab results, x-rays... each type of record exists in a separate table. I need to present this data on one table to show the patient's history with a particular clinic.
My question: what is the best way to do it? Should I do a SELECT from each table where the patient ID matches, order them by date, and then keep them in some artificial list-like structure (ordered by date)? Or is there a better way of doing this?
I'm using WPF and SQL Server 2008 for this app.
As others have said, JOIN is the way you'd normally do this. However, if there are multiple rows in one table for a patient then there's a chance you'll get data in some columns repeated across multiple rows, which often you don't want. In that case it's sometimes easier to use UNION or UNION ALL.
Let's say you have two tables, examinations and xrays, each with a PatientID, a Date and some extra details. You could combine them like this:
SELECT PatientID, ExamDate [Date], ExamResults [Details]
FROM examinations
WHERE PatientID = @patient
UNION ALL
SELECT PatientID, XrayDate [Date], XrayComments [Details]
FROM xrays
WHERE PatientID = @patient
Now you have one big result set with PatientID, Date and Details columns. I've found this handy for "merging" multiple tables with similar, but not identical, data.
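If the rows have already been fetched per table, the same merge can be done on the client. A small sketch, assuming each table's rows were mapped to a common shape first; HistoryEntry, its Source field and PatientHistory are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class HistoryEntry
{
    public int PatientId;
    public DateTime Date;
    public string Source = "";   // which table the row came from
    public string Details = "";
}

public static class PatientHistory
{
    // Concat keeps every row from both inputs, like UNION ALL;
    // OrderBy then sorts the combined rows by date.
    public static List<HistoryEntry> Merge(
        IEnumerable<HistoryEntry> examinations, IEnumerable<HistoryEntry> xrays)
    {
        return examinations.Concat(xrays).OrderBy(e => e.Date).ToList();
    }
}
```

The resulting list can be bound directly to a grid in the UI. Doing the UNION ALL in SQL with an ORDER BY is usually the simpler route when the data is not already in memory.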
If this is something you're going to be doing often, I'd be tempted to create a denormalized view on all of patient data (join the appropriate tables) and index the appropriate column(s) in the view. Then use the appropriate method (stored procedure, etc) to retrieve the data for a passed-in patientID.
Use a JOIN to get data from several tables.
You can use a join (can't remember which type exactly) to get all the records from each table for a specific patient. The way this works depends on your database design.
I'd do it with separate SELECT statements, since a simple JOIN probably won't do due to the fact that some tables might have more than 1 row for the patient.
So I would retrieve multiple result sets into a simple DataSet, add a DataRelation, cache the object and query it down the line (by date, by exam type, subsets, ...).
The main point is that you have all the data handy, even cached if needed, in a structure which is easily queried and filtered.
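As a rough illustration of that DataSet approach, here is a sketch with one parent and one child table. The table, column and relation names (Patients, Examinations, PatientExams) are illustrative; in practice the tables would be filled from the separate SELECT statements rather than by hand.

```csharp
using System;
using System.Data;

public static class PatientDataSet
{
    // Builds a DataSet with a Patients table and an Examinations table
    // linked by a DataRelation on PatientID.
    public static DataSet Build()
    {
        var ds = new DataSet("Clinic");

        var patients = ds.Tables.Add("Patients");
        patients.Columns.Add("PatientID", typeof(int));
        patients.Columns.Add("Name", typeof(string));

        var exams = ds.Tables.Add("Examinations");
        exams.Columns.Add("PatientID", typeof(int));
        exams.Columns.Add("ExamDate", typeof(DateTime));
        exams.Columns.Add("ExamResults", typeof(string));

        // By default the relation also adds a unique constraint on the
        // parent column and a foreign key constraint on the child column.
        ds.Relations.Add("PatientExams",
            patients.Columns["PatientID"], exams.Columns["PatientID"]);
        return ds;
    }

    // Returns the examination rows of one patient by following the relation.
    public static DataRow[] ExamsFor(DataSet ds, int patientId)
    {
        DataRow[] match = ds.Tables["Patients"].Select("PatientID = " + patientId);
        return match.Length == 0
            ? new DataRow[0]
            : match[0].GetChildRows("PatientExams");
    }
}
```

Further child tables (lab results, x-rays, ...) would each get their own relation to Patients, and the cached DataSet can then be filtered by date or type without going back to the server.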
