Don't ask why, but there are four databases, and I have rights to modify the schema of only one of them; let's call it external. Again, it's a legacy deal, but there are about 60 tables in one of the other three databases, called main. Each record in those tables has a field that links it to a record in a corresponding table in external.
PetaPoco will make quick work of a lot of the trouble. Tentatively, I've tried multiple Database.tt files to manipulate all four databases. Is there a better way?
Should I create synonyms or views in external that refer to the goods in the other databases? And then only use one Database.tt on external?
Is a combined POCO for the linked tables reasonable?
The Database.tt template is only used to pre-generate POCOs from your schema; I can hardly believe you would leave its output unmodified. Normally I would start there and then adjust the generated classes into something more reasonable, for example adding complex properties for the linked tables.
As for queries over linked tables: since they must be executed as a single query, you can only hold a connection to one database, so a linked table (or synonym/view) is necessary. But beware of poor performance: cross-database joins can sometimes be ten times slower than local joins, depending on the SQL. If you have nested selects across tables in multiple databases, it is better to materialize intermediate results into a temp table to avoid performance issues.
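For example, here is a rough sketch of that combination (the class names, the synonym, and the connection-string name are all invented): create a synonym in external for the linked table in main, generate POCOs from external only, and then hand-edit a combined class with a complex property that a PetaPoco multi-poco query fills in a single round trip.
// Assumes this was run once in external:
// CREATE SYNONYM dbo.WidgetDetail FOR [main].[dbo].[WidgetDetail];
public class Widget                    // table in external
{
    public int WidgetId { get; set; }
    public string Name { get; set; }
    public WidgetDetail Detail { get; set; }   // complex property for the linked record
}

public class WidgetDetail              // table in main, reached through the synonym
{
    public int WidgetId { get; set; }
    public decimal Price { get; set; }
}

// One connection (to external), one SQL statement:
var db = new PetaPoco.Database("externalConnection");
var widgets = db.Fetch<Widget, WidgetDetail, Widget>(
    (w, d) => { w.Detail = d; return w; },
    @"SELECT w.*, d.*
      FROM dbo.Widget w
      JOIN dbo.WidgetDetail d ON d.WidgetId = w.WidgetId");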
I have a webservice which tries to connect to a database of a desktop accounting application.
It has tables with the same name but different schema names, such as:
[DatabaseName].[202001].[CustomerCredit]
[DatabaseName].[202002].[CustomerCredit]
...
[DatabaseName].[202014].[CustomerCredit]
[DatabaseName].[202015].[CustomerCredit]
[DatabaseName].[202016].[CustomerCredit]
...
[DatabaseName].[2020xx].[CustomerCredit]
The schema name is in the format [Year + IncrementalNumber], such as [202014], [202015], [202016], etc.
Whenever I want to query customer credit information in the database, I should fetch it from the schema with the biggest number, such as [DatabaseName].[202016].[CustomerCredit] if 202016 is the latest schema in my db.
Note:
Creation of new schemas in the accounting application's database follows no rules and is completely decided by the user of the accounting application, and every instance of the application installed in a different place may have a different number of schemas.
So when I'm developing my webservice, I have no idea which schema to connect to prior to development. At run time I can find the correct schema to query, but I don't know how to fetch table information with the correct schema name in my queries.
I usually create a LINQ-to-SQL DBML class and use its definitions to read information from the db, but I don't know how to handle a schema change this way.
The DBML designer manages schema names like this:
[global::System.Data.Linq.Mapping.TableAttribute(Name="[202001].CustomerCredit")]
However, since my app can only retrieve the schema name at run time, I don't know how to fix the table declaration in my special case.
It is so easy to handle in ADO.NET but I don't know its equivalent in Linq2SQL:
var sql = "select count(*) from [" + Variables.FinancialYearSchemaName + "].CustomerCredit where SFC_Status = 100";
Ultimately, no: most ORMs do not expect the schema to vary at runtime, so most (including EF and LINQ-to-SQL) do not support this scenario. One possible option would be to have different connection strings, each using a different user account that has a different default schema configured at the database, and initialize your DB context with the connection string or connection that matches the required account. Then when EF asks the RDBMS for [CustomerCredit], the server will look first in that account's default schema (e.g. [202014].[CustomerCredit]). You should probably avoid having a [dbo].[CustomerCredit] in that scenario, to prevent confusion. This is, however, a pretty hacky and ugly solution. But... it should work.
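A minimal sketch of that approach (AccountingDataContext, its CustomerCredits property, and the connection-string names are assumptions; Variables.FinancialYearSchemaName is from your ADO.NET snippet): keep one connection string per financial-year login, each login configured on the server with the matching default schema, and pick the right one when constructing the context.
// Each SQL login is assumed to have DEFAULT_SCHEMA set to its year (e.g. [202016]),
// so an unqualified [CustomerCredit] resolves to that year's schema.
string connectionString = System.Configuration.ConfigurationManager
    .ConnectionStrings["Credit_" + Variables.FinancialYearSchemaName]
    .ConnectionString;

using (var context = new AccountingDataContext(connectionString))
{
    // The DBML mapping must not hard-code a schema name for this to work.
    int count = context.CustomerCredits.Count(c => c.SFC_Status == 100);
}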
Alternatively, you would have to take more control over the data access, essentially writing your own SQL (presumably with a token replacement for the schema, which has problems of its own).
That schema layout is essentially a manual partitioning of the CustomerCredit table. The best solution would be one that makes the partitioning transparent to all users: the code shouldn't know how the data is partitioned.
Database Solutions
The benefit of database solutions is that they are transparent, or almost transparent, to users and require minimal maintenance.
Table Partitioning
The clean solution would be to use table partitioning, making the different partitions transparent to all users. Table partitioning used to be an Enterprise-only feature but it became available in all editions since SQL Server 2016 SP1, even Express. This means it's free in all versions still in mainstream support.
The table is partitioned based on a partition function (e.g. a date-based one) and the partitions can be stored in different filegroups. Whenever possible, the query optimizer checks the partition boundaries against the query conditions and reads only the partitions that contain relevant data (partition elimination). E.g. in a date-partitioned table, queries that contain a date filter will touch only the relevant partitions.
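As a rough sketch only (every object name below is invented), a yearly partition function and scheme could be created once from a deployment script, e.g. via plain ADO.NET:
// RANGE RIGHT partition function on a date column, a scheme mapping all partitions
// to PRIMARY, and a table partitioned on that column (the partitioning column must
// be part of the primary key).
using (var cmd = connection.CreateCommand())
{
    cmd.CommandText = @"
CREATE PARTITION FUNCTION pf_CreditByYear (date)
    AS RANGE RIGHT FOR VALUES ('2020-01-01', '2021-01-01', '2022-01-01');

CREATE PARTITION SCHEME ps_CreditByYear
    AS PARTITION pf_CreditByYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.CustomerCredit (
    CustomerCreditID int IDENTITY NOT NULL,
    CustomerID       int NOT NULL,
    CreditDate       date NOT NULL,
    SFC_Status       int NOT NULL,
    CONSTRAINT PK_CustomerCredit PRIMARY KEY (CustomerCreditID, CreditDate)
) ON ps_CreditByYear (CreditDate);";
    cmd.ExecuteNonQuery();
}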
Partitioned views
Another option, available since SQL Server 2000 at least, is to use partitioned views, essentially a UNION ALL view that combines all table partitions, e.g.:
SELECT <select_list1>
FROM [202001].[CustomerCredit]
UNION ALL
SELECT <select_list2>
FROM [202002].[CustomerCredit]
UNION ALL
...
SELECT <select_listn>
FROM [2020xx].[CustomerCredit];
EF can map entities to views instead of tables. If the criteria for updatable views are met, the partitioned view itself will be updatable and any modifications will be made to the correct table.
The query optimizer can take advantage of CHECK constraints on the tables to search only one table at a time, similar to how partitioned tables work.
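For example (a sketch; the view name CustomerCreditAll and the entity shape are assumptions), in EF Core the entity can be pointed at the partitioned view instead of a table:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<CustomerCredit>(entity =>
    {
        // Reads go through the UNION ALL view; writes also work only if the
        // view meets SQL Server's updatable-view rules.
        entity.ToView("CustomerCreditAll", "dbo");
        entity.HasKey(c => c.CustomerCreditID);
    });
}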
Code solutions
This requires raw SQL queries and a way to identify the correct table/schema each time. It also requires modifications to the application each time the table partitioning changes, whether those are code modifications or changes in a configuration file.
In all cases, one query can only read from one table at a time.
Keep ADO.NET
One possibility is to keep using ADO.NET, replacing the table/schema name in a query template. The code will have to map the results to objects if needed, the same way it already does.
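For instance (a sketch reusing the variable names from the question's ADO.NET snippet), the schema name is substituted into the SQL text while the actual filter values stay parameterized:
string sql = string.Format(
    "select count(*) from [{0}].[CustomerCredit] where SFC_Status = @status",
    Variables.FinancialYearSchemaName);

using (var cmd = new SqlCommand(sql, connection))
{
    cmd.Parameters.AddWithValue("@status", 100);
    int count = (int)cmd.ExecuteScalar();
}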
EF Raw SQL
Another is to use EF's raw SQL features, e.g. EF Core's FromSqlRaw, to query a specific table the same way ADO.NET would. The benefit is that EF will map the query results to objects. In EF Core, the raw query can be combined with LINQ operators:
var query=$"select * from [DatabaseName].[{schemaName}].[CustomerCredit]"
var credits = context.CustomerCredits
.FromSqlRaw(query)
.Where(...)
.ToList();
Dapper
Another option is to use Dapper or another micro-ORM with an ad-hoc query, similar to ADO.NET, and map the results to objects:
var query=$"select * from [DatabaseName].[{schemaName}].[CustomerCredit] where customerID=#ID";
var credits=connection.Query<CustomerCredit>(query,new {ID=someID});
I am working on an ASP.NET MVC (AngularJS) project which is targeted at 500 users. We are not using Entity Framework because we thought it was not an option for such a large number of users and would cost performance, so we are creating a Data Access Layer separately. I have about 40 master tables, like inventory, category, error type, etc.
My question is about retrieving transaction records: for each one we may need to refer to 10 to 15 master tables. The SQL joins will make the query complex and will definitely hit at least 10 master tables; won't that cost performance?
Is there any way to avoid this?
SQL JOINs should not add that much overhead if the foreign keys are indexed correctly. I have queries with more than 20 joins that take less than a second to run.
If the queries are getting too complex you may want to re-consider your database design. If this is not an option, it would be worth looking at indexing the right fields.
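As an illustration (the table and column names are invented), indexing the foreign key columns used in those joins is usually the first thing to check:
// A nonclustered index on a foreign key column of the transaction table lets the
// join to its master table seek instead of scan.
using (var cmd = connection.CreateCommand())
{
    cmd.CommandText = @"
CREATE NONCLUSTERED INDEX IX_TransactionRecord_CategoryID
    ON dbo.TransactionRecord (CategoryID);";
    cmd.ExecuteNonQuery();
}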
I would suggest going with de-normalization. You could create indexed views; they perform well.
I use SQL Server and Entity Framework as ORM.
Currently I have a table Product which contains all products of any kind. The different kinds of products possess different attributes.
For example:
All products of kind TV have attributes title, resolution and contrast
Where as all products of kind Car have attributes like model and horsepower
Based on this scenario I created a table called Attribute which contains all the attributes of a product.
Now to fetch a product from database I always have to join all the attributes.
To insert a product I have to insert all the attributes one by one as single rows.
The application is not just a shop or anything like that. It should be possible to add/remove an attribute to/from a kind of product on the fly, without changing the db.
But my questions to you is still:
Is this a bad design?
Is there another way of doing it?
Will my solution slow things down significantly? (E.g. will an insert take several seconds, assuming the product has hundreds of attributes?)
Update
The problem is that my application is really complex. There are a lot of huge algorithms. The software is used for statistical purposes.
One problem, for example, is the following: in an algorithm table I'm storing which attributes are used for filters. Say an administrator wants to filter all cars that have fewer than 100 horsepower. The filters are dynamic, which means I have a filter table that stores the filter type (lessThan) and the attribute (horsepower). How can I keep this flexibility with the suggested approaches (with "hardcoded" columns)?
There is a thing about EF that I don't think everybody is aware of when designing the relations.
When you query something, EF (at least <= 4) wants to create a single SELECT for that query.
What that implies is that if you have entity A with a one-to-many relationship to entity B (say Item to Attributes), then EF joins the two together such that a row is returned for every dependent B of each A. If A has many properties, multiple dependencies, or, even worse, if B has many sub-dependencies, then the returned table will be quite massive, since all of A's properties are copied into each row of dependent B. Over time, as your entity models grow in complexity, this can turn into a real performance problem.
EF only includes the Bs if you explicitly tell it to eager-load the dependencies with Includes. If the Includes are omitted, your query will initially load faster, but once you access the attributes they will be lazy-loaded by EF, one query per entity. This is known as the SELECT N+1 problem (loading N As triggers N additional lazy queries for their Bs, which can be a huge overhead).
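To make the trade-off concrete, here is a sketch (Product and Attributes follow the question's model; the context name is assumed):
// Eager loading: one big SELECT with a JOIN; the product columns are repeated
// for every attribute row returned.
var eager = context.Products
    .Include(p => p.Attributes)
    .ToList();

// Lazy loading: one SELECT for the products, then one extra SELECT per product
// the first time its Attributes collection is touched (SELECT N+1).
var lazy = context.Products.ToList();
foreach (var product in lazy)
{
    var attributeCount = product.Attributes.Count;  // triggers a query per product
}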
While this is not a straight answer to your question, it is something to consider when designing your tables.
Also note that EF supports several alternatives for base-classing (inheritance mapping). One strategy (table-per-type) is to have a common table that is automatically joined with the sub-entity tables. The alternative (table-per-hierarchy), which typically performs better but is harder to upgrade, is to have one table with a super-set of all properties of all sub-classes.
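A sketch of the two mappings in EF6 fluent configuration (Product, Tv and Car come from the question; everything else is assumed):
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Table-per-type: a common Products table, each kind joined in from its own table.
    modelBuilder.Entity<Product>().ToTable("Products");
    modelBuilder.Entity<Tv>().ToTable("Tvs");
    modelBuilder.Entity<Car>().ToTable("Cars");

    // Table-per-hierarchy (usually faster, harder to upgrade): one wide table with a
    // discriminator column; use this instead of the ToTable calls above.
    // modelBuilder.Entity<Product>()
    //     .Map<Tv>(m => m.Requires("Kind").HasValue("TV"))
    //     .Map<Car>(m => m.Requires("Kind").HasValue("Car"));
}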
More (over) generalized database design considerations:
The devil is in the details. You can make a whole career out of making good database design choices. There are no silver-bullet database patterns.
EF comes with a lot of limitations. This is the price for the convenience. If the model suits EF well, then EF is quite good, but do consider more flexible alternatives like NHibernate. Sometimes even plain old data tables with views and stored procedures are to be preferred.
EF is not efficient if your model has a lot of small dependents (like a ton of attributes on an item table). It will result in either a monster query with a huge returned table, or the SELECT N+1 problem. You can write multi-part LINQ queries to somewhat compensate, but it gets tricky.
SQL's strength is in integrity and reporting which works best for rather rigid data models.
Depending on the details, your model looks like a great candidate for a NoSQL backend, like RavenDB or MongoDB. NoSQL is much better for dynamic data models and scales really well.
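For instance, a sketch with the official MongoDB C# driver (database and collection names are invented): each product document simply carries its own attribute set, and filters like "horsepower lessThan 100" can be built at run time.
using MongoDB.Bson;
using MongoDB.Driver;

var client = new MongoClient("mongodb://localhost:27017");
var products = client.GetDatabase("catalog").GetCollection<BsonDocument>("products");

// A TV and a car can live in the same collection with different attributes.
products.InsertOne(new BsonDocument
{
    { "kind", "TV" }, { "title", "Example TV" }, { "resolution", "4K" }, { "contrast", "1000:1" }
});

// Dynamic filter built at run time from the filter table (lessThan + horsepower):
var filter = Builders<BsonDocument>.Filter.Lt("horsepower", 100);
var weakCars = products.Find(filter).ToList();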
I am developing a C# application working with millions of records retrieved from a relational database (SQL Server). My main table "Positions" contains the following columns:
PositionID, PortfolioCode, SecurityAccount, Custodian, Quantity
Users must be able to retrieve Quantities consolidated by some predefined sets of columns, e.g. {PortfolioCode, SecurityAccount} or {PortfolioCode, Custodian}.
First, I simply used dynamic queries in my application code but, as the database grew, the queries became slower.
I wonder if it would be a good idea to add another table that will contain the consolidated quantities. I guess it depends on the distribution of those groups?
Besides, how to synchronize the source table with the consolidated one?
In SQL Server you could use indexed views to do this; they keep the aggregates synchronised with the underlying table, but slow down inserts to it:
http://technet.microsoft.com/en-us/library/ms191432.aspx
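A sketch of such an indexed view for the {PortfolioCode, SecurityAccount} consolidation (the view and index names are invented; Quantity is assumed NOT NULL):
// A schema-bound aggregate view plus a unique clustered index; SQL Server then
// maintains the aggregates automatically as Positions changes.
using (var cmd = connection.CreateCommand())
{
    cmd.CommandText = @"
CREATE VIEW dbo.PositionsByPortfolioAccount
WITH SCHEMABINDING
AS
SELECT PortfolioCode,
       SecurityAccount,
       SUM(Quantity) AS TotalQuantity,
       COUNT_BIG(*)  AS RowCnt
FROM dbo.Positions
GROUP BY PortfolioCode, SecurityAccount;";
    cmd.ExecuteNonQuery();

    cmd.CommandText = @"
CREATE UNIQUE CLUSTERED INDEX IX_PositionsByPortfolioAccount
    ON dbo.PositionsByPortfolioAccount (PortfolioCode, SecurityAccount);";
    cmd.ExecuteNonQuery();
}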
If it's purely a count of grouped rows in a single table, would standard indexing not suffice here? More info on your structure would be useful.
Edit: Also, it sounds a little like you're using your OLTP server as a reporting server? If so, have you considered whether a data warehouse and an ETL process might be appropriate?
I have a SQL Server 2008 database with > 300 tables. The application I have to design is a Windows Forms app, .NET 3.5, C#.
What is the best way to work with LINQ-to-SQL?
I intend to make a DataContext for each business entity.
Is there any problem with that?
I need to know if this way of working with LINQ-to-SQL has any disadvantages or can create performance issues.
Thanks.
You should typically have 1 single DBML file (=data context) per database. You should certainly not create a DataContext per business entity, because doing this would make you lose most of the useful capabilities of LINQ to SQL, like memory transactions (unit of work), lazy loading, and doing LINQ queries over multiple entities.
You have a pretty big model (300+ tables), which means a lot of entities. A lot of entities is not a big problem, except for the LINQ to SQL designer. Using the designer with such big models can be pretty annoying. This can be a reason to split the domain into multiple sub-domains (each with its own DBML file), but certainly not one per entity. However, keep in mind that you lose the L2S capabilities at the boundaries of the domains.
In the past I advised a team, who had split their 150+ entity domain across 5 DBML files, to merge them back into a single DBML. The pain of editing the model went up, but the pain of using multiple DataContexts went away, which lowered the overall pain drastically for them.
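To illustrate what a single context keeps (a sketch; the entity and column names are made up): one unit of work can query across entities and submit all related changes together.
using (var db = new ShopDataContext())
{
    var germanOrders = from order in db.Orders
                       join customer in db.Customers
                           on order.CustomerID equals customer.CustomerID
                       where customer.Country == "Germany"
                       select order;

    foreach (var order in germanOrders)
    {
        order.Status = "Reviewed";   // hypothetical column
    }

    db.SubmitChanges();              // one transaction for all the changes above
}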
There is no point in making a data context for each business entity, you only need one datacontext per database.
Well, it depends on how many users will use your database simultaneously, not how many tables there are. So it's all about the typical database issues: number of connections, locking and other such things.
I now use 1 for the entire database, but there are legitimate uses for having more. For example, I run a script when installing my site that connects to a remote DB and imports and converts data to the new format for deployment. The process uses some temporary tables.
By putting the temporary tables in a separate context, I can simply delete that context and its code once the site is deployed, since they are independent of everything else.