I have a database with many tables and constraints (but not much data). The database contains a few separate entities that are bound together by an ID, directly or indirectly: for example, a Companies table whose Id is referenced by an Employees table, which is in turn referenced by further tables.
My goal is to move one entire slice of data (including data from all tables in the database) to another physical database in an easy and safe way. It's OK if it doesn't perform very well. In that example, I would want to move the company with a certain Id, as well as all employees of that company and all data related to those employees, and so on through all the tables.
I want to do it with a safe compile-checked method, as I want to catch errors whenever I change my database.
The IDs in the database are mostly GUIDs, but a few tables use auto-incremented IDs.
Note:
The "Companies" table contains perhaps 5 rows, one for each company. I need to move ONE row from that table, along with all data directly or indirectly related to that row.
Suppose you want to copy data from one table (tableName = Jobs) to another table (tableName = Company):
string apply = "INSERT INTO Company (JobTitle, CompanyName) SELECT JobTitle, CompanyName FROM Jobs";
This is just an idea; hope it helps.
UPDATE:
This should help you:
MSDN - Multiple Bulk Copy Operations (ADO.NET)
It comes with an example.
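Very roughly, the bulk-copy approach could look like the sketch below. The connection strings, table names, columns and the order of the per-table copies are assumptions about a schema like the one in the question; parents are copied before children so that foreign keys in the target database are satisfied.

using System;
using System.Data.SqlClient;

class SliceCopy
{
    // Copies one company and its dependent rows to another database.
    static void CopyCompanySlice(string sourceCs, string targetCs, Guid companyId)
    {
        var steps = new[]
        {
            new { Table = "Companies",
                  Select = "SELECT * FROM Companies WHERE Id = @CompanyId" },
            new { Table = "Employees",
                  Select = "SELECT * FROM Employees WHERE CompanyId = @CompanyId" },
            new { Table = "EmployeeDetails",
                  Select = "SELECT d.* FROM EmployeeDetails d " +
                           "JOIN Employees e ON e.Id = d.EmployeeId " +
                           "WHERE e.CompanyId = @CompanyId" },
        };

        using (var source = new SqlConnection(sourceCs))
        {
            source.Open();
            foreach (var step in steps)
            {
                using (var cmd = new SqlCommand(step.Select, source))
                {
                    cmd.Parameters.AddWithValue("@CompanyId", companyId);
                    using (var reader = cmd.ExecuteReader())
                    // KeepIdentity preserves the auto-incremented IDs mentioned in the question.
                    using (var bulk = new SqlBulkCopy(targetCs, SqlBulkCopyOptions.KeepIdentity))
                    {
                        bulk.DestinationTableName = step.Table;
                        bulk.WriteToServer(reader); // streams the rows straight across
                    }
                }
            }
        }
    }
}

If you need all-or-nothing behaviour, open the target connection yourself and pass a SqlTransaction to the SqlBulkCopy constructor.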
I have a view created by joining several tables whose records can change, so the contents of the view's columns can change as well.
The view's columns contain data like addresses, numbers, dates, arbitrary strings, etc.
I accept search text from the user and return the rows in which any column contains the entered text.
My view has millions of records, so a plain LIKE query won't work (it takes too long).
What is the most efficient way to search this view, given that it changes whenever its underlying tables change?
I'm using an Oracle database, C#, and Entity Framework.
For better performance you should add proper indexes to the original tables. These indexes are maintained automatically by the RDBMS engine on every change, so you cannot get stale results from an index: the index and the table always contain the same values.
You don't need to reindex every time; occasionally (say, monthly) you can update the related statistics.
So indexes can improve your performance a lot, and that applies to the view as well.
A view is built on top of the original tables on the fly and is not a stored copy of them, so indexes on the base tables help the view return the expected result faster.
Indexes, when properly designed, serve several important purposes in a database server.
They let the RDBMS:
find groups of adjacent rows instead of single rows;
avoid sorting by reading the rows in a desired order;
satisfy (sometimes) entire queries from the index alone, avoiding the need to access the table at all.
From the MySQL documentation: https://dev.mysql.com/doc/refman/5.5/en/mysql-indexes.html
https://dev.mysql.com/doc/refman/5.5/en/column-indexes.html
https://dev.mysql.com/doc/refman/5.5/en/multiple-column-indexes.html
http://code.tutsplus.com/tutorials/top-20-mysql-best-practices--net-7855
http://use-the-index-luke.com
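As a rough illustration of the above, adding the index is a one-off piece of DDL. This sketch assumes the Oracle managed data provider (Oracle.ManagedDataAccess); the table, column and index names are placeholders, not your real schema.

using Oracle.ManagedDataAccess.Client;

class IndexSetup
{
    // One-off DDL: index the base-table column that the view's search hits.
    static void CreateSearchIndex(string connectionString)
    {
        using (var conn = new OracleConnection(connectionString))
        {
            conn.Open();
            using (var cmd = conn.CreateCommand())
            {
                // A view is evaluated against its base tables, so it benefits
                // from this index automatically.
                cmd.CommandText = "CREATE INDEX IX_EMP_LAST_NAME ON EMPLOYEES (LAST_NAME)";
                cmd.ExecuteNonQuery();
            }
        }
    }
}

Keep in mind that a plain b-tree index only helps prefix matches (LIKE 'abc%'); for arbitrary contains-style searches over text, an Oracle Text (CONTAINS) index is the usual tool.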
Looking for examples/tutorials for custom user fields, not via EAV.
EAV is going to be problematic for various reasons, such as performance.
There are many base entities/tables, with over 100,000 records each.
There will likely be over a dozen attributes.
The records are to be displayed in a flat UI grid, including the custom fields, so flattening them while maintaining performance would be an issue.
I'm looking at enabling this via DDL, where all custom fields would go into a matching table such as
<tablename>_custom_<userid>
with each user attribute mapping to its own column and all their metadata stored in a metadata table.
Retrieval would be simple; the query would just be:
select *
from <tablename> A, <tablename>_custom_<userid> B
where B.KeyField = A.KeyField -- (perhaps using an outer join, haven't gone that far yet)
Wondering if there are any gotchas down the road that I need to be aware of?
Of course, any samples/pointers would be helpful to kickstart the effort.
Specifically, I would appreciate any advice on using DDL with SQL Server Compact 4.
One technique I have seen used is a sort of 'hard-coded' EAV pattern. Don't hang up! It worked well with the dataset sizes you're talking about and didn't actually use EAV; it was only EAV-esque.
The idea is to have a set of tables to store these custom attributes, with some triggers (described below) on them. The custom attribute tables store metadata about each attribute (what table it goes with, data type, constraints, etc.). You can get very fancy with this, but I did not have the need.
The triggers on your meta-tables are there to re-generate views that roll up base + extension into first-class objects within the DB. So instead of a Person table plus an Employee extension table, you have an Employee view that includes both. When you drop a new value into the custom attribute tables, the triggers re-roll the views and include the new stuff. If you wanted to go nuts, you could also have the triggers re-write stored procedures. Depending on how your mid-tier code is structured, you would still be forced to re-code some of it, but this would be the case anyway if you are applying rules that read the data.
In testing, I found that for the relatively small # of records you're talking about, performance was somewhat slower but followed roughly the same pattern of degradation (2x the number of records, ~2x as slow).
-- edits --
How I saw it done, you had a table that represented your first-class objects, so a row for 'Person' and a row for 'Employee', etc. We'll call that FCO. Then you had a secondary table that stored which tables made up each FCO. We'll call that Srcs. For Person, there would be one row, the Person table. For Employee, there would be two rows, the Person table and the Employee extension. There is a third table, called Attribs, which stores the columns from the tables that constitute the FCO. For simplicity, we'll say Person has ID, Name and Address, and Employee has Hire Date and Department, and obviously a PersonID referring back to the Person table. So: 2 rows in the FCO table (Person and Employee), 3 rows in the Srcs table, 8 rows in Attribs.
The view, we'll call it vw_Employee, selects PersonID, Name, Address, Hire Date, Department from the two tables. It is built by a SQL stored procedure we'll call OnMetadataChange.
This SP is fired (by trigger or batch process), and its purpose is to generate the CREATE VIEW statements. It iterates through every First Class Object, collects which fields from which tables constitute the view, and issues a CREATE statement based on that. So OnMetadataChange produces a DROP and a CREATE for each view; it generates a dynamic SQL statement that is executed once per entry in the FCO table. It is preferable to do this with triggers, but it is not necessary. Hopefully your FCO definitions won't change too often, and when they do, there will probably be a code release as well; you can run your OnMetadataChange SP at that time.
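For illustration, here is a rough C# sketch of what such an OnMetadataChange routine could do. It assumes metadata tables shaped like FCO(Id, Name) and Attribs(FcoId, TableName, ColumnName), with the join keys hard-coded; those names are invented for the sketch, and in practice this would more likely live in a stored procedure fired by the triggers.

using System.Collections.Generic;
using System.Data.SqlClient;
using System.Linq;

class ViewGenerator
{
    // Re-generates one view per First Class Object from the metadata tables.
    static void OnMetadataChange(SqlConnection conn)
    {
        var fcos = new Dictionary<int, string>();
        using (var cmd = new SqlCommand("SELECT Id, Name FROM FCO", conn))
        using (var r = cmd.ExecuteReader())
            while (r.Read())
                fcos[r.GetInt32(0)] = r.GetString(1);

        foreach (var fco in fcos)
        {
            var tables = new List<string>();
            var columns = new List<string>();
            using (var cmd = new SqlCommand(
                "SELECT TableName, ColumnName FROM Attribs WHERE FcoId = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", fco.Key);
                using (var r = cmd.ExecuteReader())
                    while (r.Read())
                    {
                        if (!tables.Contains(r.GetString(0))) tables.Add(r.GetString(0));
                        columns.Add(r.GetString(0) + "." + r.GetString(1));
                    }
            }

            // Naive join clause: every extension table joins back to the base table
            // on a PersonID/ID pair. A real version would keep the join keys in metadata.
            string baseTable = tables.First();
            string from = baseTable;
            foreach (string t in tables.Skip(1))
                from += " JOIN " + t + " ON " + t + ".PersonID = " + baseTable + ".ID";

            string viewName = "vw_" + fco.Value;
            using (var drop = new SqlCommand(
                "IF OBJECT_ID('" + viewName + "', 'V') IS NOT NULL DROP VIEW " + viewName, conn))
                drop.ExecuteNonQuery();

            using (var create = new SqlCommand(
                "CREATE VIEW " + viewName + " AS SELECT " +
                string.Join(", ", columns.ToArray()) + " FROM " + from, conn))
                create.ExecuteNonQuery();
        }
    }
}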
The end result is a 2-layer database. The views constitute the First Class Object layer, which is meaningful to the application. The application only uses views. The tables constitute the 'physical' layer, which the application shouldn't care about. The meta-tables are essentially your mapping between the FCO layer and the physical layer. It takes some time to set it up, but it's quite effective, and gives you many of the benefits of EAV, while at the same time giving you the concrete benefits of 3nf tables (indexability, etc).
If you'd like I can throw some sample SQL out there.
Part of the problem you are having is that you are trying to store schema-less data in a SQL database, which is not its strength. There are three approaches that would make your life far easier:
1) Have a column which stores the serialized custom fields, in whatever format is most convenient. For example, this column could store XML. Upsides are that you can use SQL Server Compact and pulling back a record is trivial. Downsides are that you always have to pull/push the entire XML blob to do an update, and it is difficult to impossible to query on any custom fields. (A small sketch of this option appears after this list.)
2) Upgrade to SQL Server Express, and use XML columns. This is nearly the same as the first suggestion, except that any server ready version of SQL Server has native support for XML data. These columns can have indexes added and fields within the data can be used in queries.
3) Use a Schema-less Database, like MongoDB or CouchDB. These databases are all about storing schemaless data, so your custom fields will be no different than any other field. As such, you can index and query custom fields. Upsides are that custom data is incredibly easy to work with, downsides are that you would have to spend some time rethinking how you store data to fit within their model.
If you do not need to query based on custom fields, or if you can query custom fields within business logic, then the first option can work for you. In any other case, I would err towards something with more capabilities than compact. If cost is the deciding factor, both SQL Server Express and MongoDB are free.
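To make option 1 concrete, here is a minimal sketch of the serialize-to-XML idea; the CustomFields column and the element names are made up for illustration, and the trade-off above still applies (the whole blob travels on every read and write).

using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class CustomFieldXml
{
    // Packs a user's custom fields into one XML string for, say, a CustomFields
    // column on the base table, and reads them back.
    public static string Serialize(IDictionary<string, string> fields)
    {
        return new XElement("fields",
            fields.Select(f => new XElement("field",
                new XAttribute("name", f.Key), f.Value))).ToString();
    }

    public static Dictionary<string, string> Deserialize(string xml)
    {
        return XElement.Parse(xml).Elements("field")
            .ToDictionary(e => (string)e.Attribute("name"), e => e.Value);
    }
}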
I am developing an HRM application that imports and exports XML data from a database. The application receives exported XML data for the employee entries. I imported the XML file using LINQ to XML and converted the XML into the corresponding objects. Now I want to attach (update) those employee objects.
I tried to use:
// linqoper is a class that imports the XML data and converts it into an IEnumerable of Employee objects.
var emp = linqoper.importxml("filename.xml");
using (EmployeeDataContext db = new EmployeeDataContext())
{
    db.Employees.AttachAll(emp, true); // attach all as modified
    db.SubmitChanges();
}
But I got this error:
"An entity can only be attached as modified without original state if it declares a version member or does not have an update check policy."
I also have the option of retrieving each employee and assigning the values from the XML data, like this:
// import IEnumerable of Employee objects
var employees = linqoper.importxml("filename.xml");
using (EmployeeDataContext db = new EmployeeDataContext())
{
    foreach (var empobj in employees)
    {
        Employee emp = db.Employees.Single(m => m.Id == empobj.Id);
        emp.FirstName = empobj.FirstName;
        emp.BirthDate = empobj.BirthDate;
        // ... and so on for the remaining properties
    }
    db.SubmitChanges();
}
But the problem with the above is that I have to iterate through all the employee objects, which is tedious.
So is there any other way I could attach (update) the employee entities in the database using LINQ to SQL?
I have seen some similar links on SO, but none of them seems to help.
https://stackoverflow.com/questions/898267/linq-to-sql-attach-refresh-entity-object
When LINQ to SQL saves changes to the database, it has to know which properties of the object have been changed. It also checks whether a potentially conflicting update has been made to the database in the meantime (optimistic concurrency).
To handle those cases, LINQ to SQL needs two copies of the object when attaching: one with the original values (as present in the DB) and one with the new, changed values. There is also a more advanced mechanism involving a version member, which is mapped to a rowversion column.
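For completeness, a minimal sketch of the version-member option with attribute-based mapping (the Employee shape is assumed from the question); with such a member, and a rowversion column added to the table, db.Employees.Attach(emp, true) or AttachAll(employees, true) no longer needs the original state:

using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;

[Table(Name = "Employee")]
public class Employee
{
    [Column(IsPrimaryKey = true)]
    public Guid Id { get; set; }

    [Column]
    public string FirstName { get; set; }

    [Column]
    public DateTime BirthDate { get; set; }

    // Maps to a rowversion (timestamp) column. With a version member, LINQ to SQL
    // can attach the entity as modified without the original values and still
    // detect conflicting updates.
    [Column(IsVersion = true, IsDbGenerated = true)]
    public Binary Version { get; set; }
}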
The LINQ to SQL way to update a set of data is to first read all the data from the database, then update the objects retrieved from the database, and finally call SubmitChanges(). That would be my first approach in your situation.
If you run into performance problems, then it's time to go outside LINQ to SQL's toolbox. A solution with better performance is to load the new data into a separate staging table (for best performance, use bulk insert) and then run a SQL command or stored procedure that does the actual merging of data. The SQL MERGE statement is excellent for this kind of update.
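A rough sketch of that staging-table route, assuming a staging table created up front with the same shape as Employee and that the imported employees can be put into a DataTable (all names here are assumptions based on the question):

using System.Data;
using System.Data.SqlClient;

class EmployeeImport
{
    // Bulk-loads the imported employees into a staging table, then merges the
    // staging rows into the real Employee table in a single statement.
    static void MergeEmployees(string connectionString, DataTable importedEmployees)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = "EmployeeStaging";
                bulk.WriteToServer(importedEmployees);
            }

            const string sql = @"
                MERGE Employee AS target
                USING EmployeeStaging AS source ON target.Id = source.Id
                WHEN MATCHED THEN
                    UPDATE SET target.FirstName = source.FirstName,
                               target.BirthDate = source.BirthDate
                WHEN NOT MATCHED THEN
                    INSERT (Id, FirstName, BirthDate)
                    VALUES (source.Id, source.FirstName, source.BirthDate);
                TRUNCATE TABLE EmployeeStaging;";

            using (var cmd = new SqlCommand(sql, conn))
                cmd.ExecuteNonQuery();
        }
    }
}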
LINQ to SQL is a proper ORM, but if you want to take create/update/delete into your own hands, you can try one of the simpler ORMs that just provide ways to do CRUD operations. I can recommend one: http://crystalmapper.codeplex.com. It is simple yet powerful.
Why CrystalMapper?
I built this for a large financial transaction system with lots of insert and update operations. What I needed was speed and control over inserts/updates serving complex business scenarios, hitting multiple tables in a single transaction.
When I put it to use in a social text-processing platform, it served very well there too.
I have a list of countries. Each user can check multiple countries. Once saved, this "user country list" will be used to determine whether other users fit into the countries a certain user chose.
The question is: what would be the most efficient approach to this problem?
I have one idea: save the user's selection as a delimited list like Canada,USA,France ... in a single varchar(max) field. The problem with that is what happens when, say, a user from Germany enters a page I perform this check on. To search for Germany I would need to fetch all the rows and split each field to check against the value, or use SQL LIKE, which again is pretty slow.
If you have a better solution or some tips, I would be glad to hear them.
Just to be clear: many users will have their own selections of countries from which (and only from which) they want visitors to land on their page, while millions of users will reach those pages. So the faster the approach, the better.
Technology: MSSQL and ASP.NET.
Thanks.
You should not store a list of values in one cell. Consider having a separate table that stores each of the selected countries with a foreign key reference to the user table. This is standard Database Normalization.
PLEASE don't go down the route you're thinking of, storing multiple entries in one field. I've had to re-write more applications because of bad database design than for any other reason, and that is a bad design.
Added
I have this poster on my wall at work: http://www.informationqualitysolutions.com/FreeStuff/rettigNormalizationPoster.pdf
One of my predecessors was a newbie to DB Design, and this helped her a lot. I keep it for any new hires that may need it. It explains normalization very nicely, with examples.
Do not save delimited fields into your database. Your database will not be normalized.
You need a many-to-many table for users and countries:
UserId
CountryId
If you do start using a delimited field, you end up needing to parse it (either in SQL or in your code), and it is more difficult to query and optimize.
In this case, you will want to create a table called UserCountries (or some such) which stores the UserID and CountryID. This is a standard relational construct. To beginners it seems strange and too involved, but this structure makes it very easy and very fast to write flexible queries against this type of data. No delimiting required!
I think it would be better to use a UserCountry table, which links to the User and Country tables. This creates a lot more possibilities for querying the database. Example queries that are much simpler this way:
Number of countries per user
All users who selected a particular country
Countries sorted by popularity
Do not store multiple countries in a single field. Add 2 additional tables - Countries (ID, Name) and UserCountries (UserID, CountryID)
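A minimal sketch of how the check then looks against that normalized structure (the schema and parameter names are just for illustration):

using System.Data.SqlClient;

class CountryCheck
{
    // Does the visiting user's country appear in the page owner's selected countries?
    static bool CountryAllowed(string connectionString, int pageOwnerId, int visitorCountryId)
    {
        const string sql =
            "SELECT COUNT(*) FROM UserCountries " +
            "WHERE UserId = @ownerId AND CountryId = @countryId";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@ownerId", pageOwnerId);
            cmd.Parameters.AddWithValue("@countryId", visitorCountryId);
            conn.Open();
            return (int)cmd.ExecuteScalar() > 0;
        }
    }
}

With a composite index (or primary key) on (UserId, CountryId) this is a single seek; no string splitting or LIKE needed.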
I have a project that requires user-defined attributes for a particular object at runtime (let's say a Person object in this example). The project will have many different users (1000+), each defining their own unique attributes for their own sets of Person objects.
(E.g., user #1 will have a set of defined attributes that apply to all Person objects 'owned' by this user. Multiply this by 1000 users, and that's the bottom-line minimum number of users the app will work with.) These attributes will be used to query the Person objects and return results.
I think these are the possible approaches I could use. I will be using C# (and any version of .NET 3.5 or 4), and have free rein regarding what to use for a datastore. (I have MySQL and MSSQL available, but I have the freedom to use any software, as long as it fits the bill.)
Have I missed anything, or made any incorrect assumptions in my assessment?
Out of these choices - what solution would you go for?
Hybrid EAV object model. (Define the database using a normal relational model, and have a 'property bag' table alongside the Person table.)
Downsides: many joins per query. Poor performance. Can hit the limit on the number of joins/tables used in a query.
I've knocked up a quick sample that has a SubSonic 2.x-esque interface:
Select().From().Where ... etc.
It generates the correct joins, then filters and pivots the returned data in C#, to return a DataTable configured with the correctly typed data set.
I have yet to load-test this solution. It's based on the EAV advice in this Microsoft whitepaper:
SQL Server 2008 RTM Documents Best Practices for Semantic Data Modeling for Performance and Scalability
Allow the user to dynamically create/alter the object's table at runtime. This solution is what I believe NHibernate does in the background when using dynamic properties, as discussed here:
http://bartreyserhove.blogspot.com/2008/02/dynamic-domain-mode-using-nhibernate.html
Downsides:
As the system grows, the number of columns defined will get very large, and may hit the max number of columns. If there are 1000 users, each with 10 distinct attributes for their 'Person' objects, then we'd need a table holding 10k columns. Not scalable in this scenario.
I guess I could allow a person attribute table per user, but if there are 1000 users to start, that's 1000 tables plus the other 10 odd in the app.
I'm unsure whether this would be scalable, but it doesn't seem so. Someone please correct me if I am incorrect!
Use a NoSQL datastore, such as CouchDb / MongoDb
From what I have read, these aren't yet proven in large-scale apps, are based on strings, and are very early in their development phase. If I am incorrect in this assessment, can someone let me know?
http://www.eflorenzano.com/blog/post/why-couchdb-sucks/
Using XML column in the people table to store attributes
Drawbacks - no indexing on querying, so every column would need to be retrieved and queried to return a resultset, resulting in poor query performance.
Serializing an object graph to the database.
Drawbacks - no indexing on querying, so every column would need to be retrieved and queried to return a resultset, resulting in poor query performance.
C# bindings for BerkeleyDB
From what I read here: http://www.dinosaurtech.com/2009/berkeley-db-c-bindings/
Berkeley DB has definitely proven to be useful, but as Robert pointed out, there is no easy interface. Your entire OO wrapper has to be hand-coded, and all of your indices are hand-maintained. It is much more difficult than SQL / LINQ-to-SQL, but that's the price you pay for ridiculous speed.
Seems like a large overhead; however, if anyone can provide a link to a tutorial on how to maintain the indices in C#, it could be a goer.
SQL / RDF hybrid.
Odd that I didn't think of this before. Similar to option 1, but instead of a 'property bag' table, just cross-reference to an RDF store?
Querying would then involve two steps: query the RDF store for the people matching the correct attributes to get the person IDs, then use those IDs in the SQL query to return the relational data. Extra overhead, but it could be a goer.
The ESENT database engine on Windows is used heavily for this kind of semi-structured data. One example is Microsoft Exchange which, like your application, has thousands of users where each user can define their own set of properties (MAPI named properties). Exchange uses a slightly modified version of ESENT.
ESENT has a lot of features that enable applications with large meta-data requirements: each ESENT table can have about ~32K columns defined; tables, indexes and columns can be added at runtime; sparse columns don't take up any record space when not set; and template tables can reduce the space used by the meta-data itself. It is common for large applications to have thousands of tables/indexes.
In this case you can have one table per user and create the per-user columns in the table, creating indexes on any columns that you want to query. That would be similar to the way that some versions of Exchange store their data. The downside of this approach is that ESENT doesn't have a query engine so you will have to hand-craft your queries as MakeKey/Seek/MoveNext calls.
A managed wrapper for ESENT is here:
http://managedesent.codeplex.com/
In an EAV model you don't have to have many joins, as you only need the joins required for the query's filtering. For the resultset, return the property entries as a separate rowset.
That is what we are doing in our EAV implementation.
For example, a query might return persons with extended property 'Age' > 18:
Properties table:

PropertyID  Name
1           Age
2           NickName

First resultset:

PersonID  Name
1         John
2         Mary

Second resultset:

PersonID  PropertyID  Value
1         1           24
1         2           'Neo'
2         1           32
2         2           'Pocahontas'
For the first resultset, you need an inner join for the 'age' extended property
to query the basic Person object entity part:
select p.ID, p.Name from Persons p
join PersonExtendedProperties pp
on p.ID = pp.PersonID
where pp.PropertyName = 'Age'
and pp.PropertyValue > 18 -- probably need to convert to integer here
For the second resultset, we make an outer join of the first resultset with the PersonExtendedProperties table to get the rest of the extended properties. It's a 'narrow' resultset: we do not pivot the properties in SQL, so we don't need multiple joins here.
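To make the two-resultset shape concrete, here is a rough C# sketch that runs both queries in one batch and reads them with NextResult. Table and column names follow the example above; the second query simply re-applies the same filter to pick out the matching persons' property rows, which is one way to produce the 'narrow' resultset.

using System.Data.SqlClient;

class EavQuery
{
    // First resultset: the matching Person rows.
    // Second resultset: their extended properties, one row per property.
    static void LoadAdultsWithProperties(string connectionString)
    {
        const string sql = @"
            SELECT p.ID, p.Name
            FROM Persons p
            JOIN PersonExtendedProperties pp ON p.ID = pp.PersonID
            WHERE pp.PropertyName = 'Age' AND CAST(pp.PropertyValue AS int) > 18;

            SELECT pp.PersonID, pp.PropertyName, pp.PropertyValue
            FROM PersonExtendedProperties pp
            WHERE EXISTS (SELECT 1 FROM PersonExtendedProperties f
                          WHERE f.PersonID = pp.PersonID
                            AND f.PropertyName = 'Age'
                            AND CAST(f.PropertyValue AS int) > 18);";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // hydrate a Person object from reader["ID"] and reader["Name"]
                }
                reader.NextResult();
                while (reader.Read())
                {
                    // attach each (PersonID, PropertyName, PropertyValue) row
                    // to the matching Person object
                }
            }
        }
    }
}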
Actually, we use separate tables for different value types to avoid data type conversion and to have the extended properties indexed and easily queryable.
My recommendation:
Allow properties to be marked as indexable. Have a smallish hard limit on number of indexable properties, and on columns per object. Have a large hard limit on total column types in all objects.
Implement indexes as separate tables (one per index) joined with main table of data (main table has large unique key for object). (Index tables can then be created/dropped as required).
Serialize the data, including the index columns, and additionally put the index properties into first-class relational columns in their dedicated index tables. Use JSON instead of XML to save space in the table. Enforce a short-column-name policy (or a long display name plus short stored name policy) to save space and increase performance.
Use quarks for field identifiers (but only in the main engine to save RAM and speed some read operations -- don't rely on quark pointer comparison in all cases).
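As a very rough sketch of the serialize-plus-index-tables idea (this uses Json.NET for the blob, leaves the quark machinery out, and every table, column and property name is invented):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using Newtonsoft.Json;

class SerializedStore
{
    // Stores the whole object as a JSON blob in a main table and copies the
    // indexable properties into dedicated per-property index tables so they
    // remain queryable. Assumes conn is an open SqlConnection.
    static void SavePerson(SqlConnection conn, Guid id,
                           IDictionary<string, object> fields,
                           IEnumerable<string> indexableProperties)
    {
        string json = JsonConvert.SerializeObject(fields);

        using (var cmd = new SqlCommand(
            "UPDATE PersonData SET Data = @data WHERE Id = @id " +
            "IF @@ROWCOUNT = 0 INSERT INTO PersonData (Id, Data) VALUES (@id, @data)", conn))
        {
            cmd.Parameters.AddWithValue("@id", id);
            cmd.Parameters.AddWithValue("@data", json);
            cmd.ExecuteNonQuery();
        }

        foreach (string name in indexableProperties)
        {
            object value;
            if (!fields.TryGetValue(name, out value)) continue;

            // One narrow table per indexable property, e.g. PersonIndex_Age(PersonId, Value).
            // The property name comes from trusted metadata, not from user input.
            string table = "PersonIndex_" + name;
            using (var cmd = new SqlCommand(
                "DELETE FROM " + table + " WHERE PersonId = @id; " +
                "INSERT INTO " + table + " (PersonId, Value) VALUES (@id, @value)", conn))
            {
                cmd.Parameters.AddWithValue("@id", id);
                cmd.Parameters.AddWithValue("@value", value ?? DBNull.Value);
                cmd.ExecuteNonQuery();
            }
        }
    }
}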
My thought on your options:
1 is possible. Performance will clearly be lower than if the field ID columns were not stored.
2 is a no in general; DB engines are not all happy about dynamic schema changes. But a possible yes if your DB engine handles it well.
3 is possible.
4 is a yes, though I'd use JSON.
5 seems like 4, only less optimized.
6 sounds good; I would go with it if you're happy to try something new and are comfortable with its reliability and performance, but usually you would want to go with more mainstream technology. I'd also like to reduce the number of engines involved in coordinating a transaction to fewer than would be involved here.
Edit: But of course, though I've recommended something, there can be no general right answer here; profile various data models and approaches with your data to see what runs best for your application.
Edit: Changed last edit wording.
Assuming you can place a limit, N, on how many custom attributes each user can define, just add N extra columns to the Person table. Then have a separate table where you store per-user metadata describing how to interpret the contents of those columns for each user. Similar to #1 once you've read the data in, but no joins are needed to pull in the custom attributes.
For a problem similar to yours, we used the "XML column" approach (the fourth one in your survey of methods). But you should note that many databases (DBMSs) support indexes on XML values.
I recommend using one Person table that contains a single XML column along with the other common columns. In other words, design the Person table with the columns that are common to all person records, and add one XML column for the dynamic, differing attributes.
We are using Oracle. It supports indexes on its XML type. Two kinds of index are supported: (1) XMLIndex, for indexing elements and attributes within an XML value, and (2) Oracle Text indexes, for enabling full-text search over the text fields of the XML.
For example, in Oracle you can create an index such as:
CREATE INDEX index1 ON table_name (XMLCast(XMLQuery ('$p/PurchaseOrder/Reference'
PASSING XML_Column AS "p" RETURNING CONTENT) AS VARCHAR2(128)));
and XMLQuery is supported in SELECT queries:
SELECT count(*) FROM purchaseorder
WHERE XMLCast(XMLQuery('$p/PurchaseOrder/Reference'
PASSING OBJECT_VALUE AS "p" RETURNING CONTENT)
AS INTEGER) = 25;
As far as I know, other databases such as PostgreSQL and MS SQL Server (but not MySQL) support similar indexing for XML values.
See also:
http://docs.oracle.com/cd/E11882_01/appdev.112/e23094/xdb_indexing.htm#CHDEADIH