Keeping historical data in the database [duplicate] - c#

This question already has answers here: How to best handle the storage of historical data? (2 answers)
What is the best practice for keeping past data in the database? For example, let's say we have transaction tables like AttendanceTrans and SalaryTrans in a payroll solution. Every month we have to insert hundreds or thousands of new records into these tables, so all the past and current data end up in the same table.
Another approach would be to keep AttendanceHistory and SalaryHistory tables, so that at the end of every period (month) we empty the Trans tables after copying the data to the respective History tables.
When considering factors like performance, ease of report generation, ease of coding and maintenance, what would be the optimum solution?
Note: the RDBMS is SQL Server 2008 and the programming environment is .NET (C#).

In general you should keep all the data in the same table. SQL Server is great at handling large volumes of data and it will make your life a lot easier (reporting, querying, coding, maintenance) if it's all in one place. If your data is indexed appropriately then you'll be just fine with thousands of new records per month.
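As a rough illustration, here is a minimal sketch of what "indexed appropriately" might look like for this scenario. The table and column names (dbo.SalaryTrans, PeriodMonth, EmployeeId) and the connection string are hypothetical; the index simply lets month-by-month reports seek instead of scanning the whole history:

    using System;
    using System.Data.SqlClient;

    class IndexSetup
    {
        static void Main()
        {
            // Hypothetical table/column names; adjust to your schema.
            const string createIndexSql = @"
                IF NOT EXISTS (SELECT 1 FROM sys.indexes
                               WHERE name = 'IX_SalaryTrans_Period_Employee')
                    CREATE NONCLUSTERED INDEX IX_SalaryTrans_Period_Employee
                    ON dbo.SalaryTrans (PeriodMonth, EmployeeId);";

            using (var connection = new SqlConnection("Server=.;Database=Payroll;Integrated Security=true"))
            using (var command = new SqlCommand(createIndexSql, connection))
            {
                connection.Open();
                command.ExecuteNonQuery();
                Console.WriteLine("Index in place; monthly queries can seek instead of scan.");
            }
        }
    }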

In my opinion, the best solution in SQL Server is CDC (Change Data Capture). It is very simple to use, and you can control how much historical data is kept by changing the schedule of the cleanup job.
I think this is the best option for performance because CDC reads changes from the transaction log (it does not use triggers on the table), but you need to use the Full Recovery Model for your database.
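If you go the CDC route, here is a minimal sketch of enabling it from C#. The database and table names and the connection string are hypothetical, this is normally a one-off DBA task run in SSMS rather than application code, and SQL Server Agent must be running for the capture and cleanup jobs:

    using System.Data.SqlClient;

    class CdcSetup
    {
        static void Main()
        {
            using (var connection = new SqlConnection("Server=.;Database=Payroll;Integrated Security=true"))
            {
                connection.Open();

                // Enable CDC at the database level (once per database; requires sysadmin).
                using (var enableDb = new SqlCommand("EXEC sys.sp_cdc_enable_db", connection))
                {
                    enableDb.ExecuteNonQuery();
                }

                // Start capturing changes for the hypothetical SalaryTrans table.
                const string enableTableSql = @"
                    EXEC sys.sp_cdc_enable_table
                         @source_schema = N'dbo',
                         @source_name   = N'SalaryTrans',
                         @role_name     = NULL;";
                using (var enableTable = new SqlCommand(enableTableSql, connection))
                {
                    enableTable.ExecuteNonQuery();
                }
            }
        }
    }

Changed rows then show up in the generated cdc.dbo_SalaryTrans_CT change table, and the cleanup job's retention setting controls how much history is kept.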


How to load data from a MySQL DB on startup of a .NET Core Web API service application? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
I am new to .Net Core and MySql.
I am trying to develop a service application which provides multiple types of data to other applications via REST API calls, using .NET Core and MySQL.
I am not able to figure out how to load all the data at application startup, so that when API calls come in, the response can be generated from the data already loaded in the application instead of fetching it from the database for each request.
Please suggest an efficient way to achieve this.
I would solve it with a class that handles the connection to the database and exposes a method for each way you want to access the data. On the first fetch you collect the data you need from the database and cache it for subsequent calls. That way you don't need to keep everything in memory if you don't need it.
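A minimal sketch of that idea in .NET Core, using the built-in IMemoryCache so the first request loads from MySQL and later requests are served from memory. Person, IPersonRepository, and the cache key are hypothetical names used only for illustration:

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Extensions.Caching.Memory;

    // Hypothetical types used only for illustration.
    public class Person
    {
        public int Id { get; set; }
        public string Email { get; set; }
        public string City { get; set; }
        public string FirstName { get; set; }
    }

    public interface IPersonRepository
    {
        Task<IReadOnlyList<Person>> LoadAllAsync(); // e.g. a SELECT against MySQL
    }

    public class PersonService
    {
        private readonly IMemoryCache _cache;
        private readonly IPersonRepository _repository;

        public PersonService(IMemoryCache cache, IPersonRepository repository)
        {
            _cache = cache;
            _repository = repository;
        }

        public Task<IReadOnlyList<Person>> GetAllPersonsAsync()
        {
            // First call hits the database; later calls are served from memory
            // until the entry expires, so unused data is not kept around forever.
            return _cache.GetOrCreateAsync("persons:all", async entry =>
            {
                entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10);
                return await _repository.LoadAllAsync();
            });
        }
    }

Register the pieces with services.AddMemoryCache() plus your own repository registration, then inject PersonService into the controllers that serve the REST calls.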
But if you really need full speed, preloading (if possible) is the way to go. Then it depends a lot on how your data is structured. I would probably load all the data into classes, so it is already in the correct format, and then use a dictionary with the appropriate keys to get the data out. The problem arises if the data can be requested in many different ways. If you have a database of persons, you could create a Person class with all the information about the person and use the email as the unique identifier; when another application wants all the info for a Person by email, a dictionary lookup will be superfast. But if you also want to get all persons in a city, or with a specific first name, the dictionary will be slow, since you will have to loop through all items to look for the city or name.
I would spend my time working out what all the searches will look like. If you have an id as the unique identifier of a person, you could use that in the main dictionary and then use a separate dictionary for each search: one dictionary where the key is the email and the value points to the unique id in the dictionary with all users, and another dictionary for cities where the value is a list of the ids of all persons in that city, and so forth. In this way you are creating a kind of index, much like the ones the database uses to fetch data fast.
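Reusing the hypothetical Person class from the sketch above, those "index dictionaries" could look roughly like this:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    // In-memory "indexes" built once from the loaded data.
    public class PersonIndex
    {
        private readonly Dictionary<int, Person> _byId;
        private readonly Dictionary<string, int> _idByEmail;
        private readonly Dictionary<string, List<int>> _idsByCity;

        public PersonIndex(IEnumerable<Person> persons)
        {
            _byId = persons.ToDictionary(p => p.Id);
            _idByEmail = _byId.Values.ToDictionary(
                p => p.Email, p => p.Id, StringComparer.OrdinalIgnoreCase);
            _idsByCity = _byId.Values
                .GroupBy(p => p.City, StringComparer.OrdinalIgnoreCase)
                .ToDictionary(g => g.Key,
                              g => g.Select(p => p.Id).ToList(),
                              StringComparer.OrdinalIgnoreCase);
        }

        // Fast point lookup by email (returns null if unknown).
        public Person FindByEmail(string email) =>
            _idByEmail.TryGetValue(email, out var id) ? _byId[id] : null;

        // Fast lookup of everyone in a city without scanning all persons.
        public IReadOnlyList<Person> FindByCity(string city) =>
            _idsByCity.TryGetValue(city, out var ids)
                ? ids.Select(id => _byId[id]).ToList()
                : new List<Person>();
    }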
But it really depends on your data. Is it very complex? Are there a lot of items in the database? Are there a lot of tables which will have their own searches? Can you search for the items in many ways? Will it be allowed to search using wildcards?
The problem is that you are trying to create a temporary in-memory database using the real database as a starting point, and it's not really possible to say what the most efficient way to build such a database is. To be able to answer that, more information is needed.
If you need even more speed you could also pre-serialize all responses and keep them as strings, so you can send JSON (or whatever format you will use) straight away. The problem is that the more speed you need, the uglier the code will get, and you will pay for it in memory consumption.
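A sketch of that last idea, again with the hypothetical Person class: serialize once, keep the JSON strings, and hand them back directly (for example via Content(json, "application/json") in a controller):

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Text.Json;

    // Cache of ready-to-send JSON payloads, keyed by email.
    public class SerializedResponseCache
    {
        private readonly ConcurrentDictionary<string, string> _jsonByEmail =
            new ConcurrentDictionary<string, string>();

        public void Populate(IEnumerable<Person> persons)
        {
            // Pay the serialization cost once, up front.
            foreach (var person in persons)
                _jsonByEmail[person.Email] = JsonSerializer.Serialize(person);
        }

        // Returns the cached JSON string, or null if the email is unknown.
        public string GetPersonJson(string email) =>
            _jsonByEmail.TryGetValue(email, out var json) ? json : null;
    }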

Best practice for populating primary key values in SQL Server using Visual Studio [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
I am working on a project for college and stumbled upon a problem which I am not sure how to handle in the best way.
I have a SQL Server database that is manually created, and a WinForms project in Visual Studio written in C#. The application should do CRUD operations on the database.
My question is: what is the best way to handle the primary key columns in the tables? Should I define them in the database as auto-increment integers and let the database management system handle the primary keys, or should I define them just as int and populate them programmatically within the Visual Studio project, and if so, how?
I am not looking for a complete solution, just a hint about the best way of doing this.
I am very much a beginner, so please be gentle...
In general, auto-incremented (or identity or serial) primary keys are the way to go. You don't generally want your application to be worrying about things like whether the values have been used already.
If your application is multi-threaded -- that is, multiple users at the same time -- then the database will take care of any conflicts. That is quite convenient.
I am a fan of surrogate keys created by the database. In databases that cluster (sort) the rows by the primary key, it is much more efficient to have an automatically incremented value. And the database can take care of that.
There are some cases where you want a natural key. Of course, that is also permissible. But if you are going to invent a primary key for a table, let the database do the work.
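A minimal sketch of the identity approach from C#: the table and connection string are hypothetical, the database assigns the key, and the application just reads it back after the insert:

    using System;
    using System.Data.SqlClient;

    class IdentityInsertDemo
    {
        static void Main()
        {
            // Hypothetical table:
            //   CREATE TABLE dbo.Students (
            //       StudentId INT IDENTITY(1,1) PRIMARY KEY,
            //       Name      NVARCHAR(100) NOT NULL);
            const string insertSql =
                "INSERT INTO dbo.Students (Name) OUTPUT INSERTED.StudentId VALUES (@name);";

            using (var connection = new SqlConnection("Server=.;Database=College;Integrated Security=true"))
            using (var command = new SqlCommand(insertSql, connection))
            {
                command.Parameters.AddWithValue("@name", "Ada Lovelace");
                connection.Open();

                // The database generates the key; no key logic in the application.
                int newId = (int)command.ExecuteScalar();
                Console.WriteLine("Inserted student with database-generated id " + newId);
            }
        }
    }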
When defining structures for DB backing for CRUD operations, you need to ask yourself:
Does it matter that my primary key is highly predictable?
By that I mean: if I am sending a user to a screen such as "whatever.com/something/edit/1", then aside from the obvious security concerns, does it help or harm the business process that a user can manipulate the URL and inject 2 or 3 or 4 into the path?
If it doesn't matter, then absolutely set it as an auto-incrementing int on the DB side and offload that area of responsibility to the database. You then no longer have to concern yourself with key generation.
If it does matter, then set the primary key as a uniqueidentifier. In code, when adding a new record, you generate a new GUID and set that as the primary key (Guid.NewGuid()). This prevents the user from traversing your data in an uncontrolled manner, since randomly guessing GUIDs would be problematic for them. For example:
New path: "whatever.com/something/edit/0f8fad5b-d9cb-469f-a165-70867728950e"
I'm not saying it is impossible to stumble upon things, but the regular person using your application would not be inclined to go exploring with URL manipulation, since they would waste 99.99% of their time on invalid requests trying to guess a GUID that is actually registered in your DB.
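A small sketch of the GUID variant, with a hypothetical Documents table: the application generates the key with Guid.NewGuid() before the insert, so the value is known up front and is not guessable from the URL:

    using System;
    using System.Data.SqlClient;

    class GuidKeyDemo
    {
        static void Main()
        {
            // Hypothetical table with a UNIQUEIDENTIFIER primary key.
            const string insertSql =
                "INSERT INTO dbo.Documents (DocumentId, Title) VALUES (@id, @title);";

            Guid newId = Guid.NewGuid(); // key generated in code, not by the database

            using (var connection = new SqlConnection("Server=.;Database=College;Integrated Security=true"))
            using (var command = new SqlCommand(insertSql, connection))
            {
                command.Parameters.AddWithValue("@id", newId);
                command.Parameters.AddWithValue("@title", "Term paper");
                connection.Open();
                command.ExecuteNonQuery();
            }

            Console.WriteLine("Edit URL would be /something/edit/" + newId);
        }
    }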
As an added comment, if you decide to keep the primary key as an int and not use auto-increment, you are just setting yourself up for a ton of unnecessary work; I have personally never seen any real return on investment for the logic you would write to check whether a key value is already in use. And think about tracking history: you would be setting yourself up for a world of pain if you ever decided to remove records from the table and then reuse their keys. That is a whole other set of concerns you would have to manage on top of what you are already doing.

multiple resultsets vs multiple calls for performance [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
I'm working on a fairly high performance application, and I know database connections are usually one of the more expensive operations. I have a task that runs pretty frequently, and in the course of business it has to select data from Table1 and Table2. I have two options:
Keep making two Entity Framework queries like I am right now: select from Table1 and select from Table2 in separate LINQ queries (what I'm currently doing).
Create a stored procedure that returns both result sets in one call, using multiple result sets.
I'd imagine the cost to SQL Server is the same: the same IO is being performed. I'm curious if anyone can speak to the performance bump that may exist in a "hot" codepath where milliseconds matter.
and I know database connections are usually one of the more expensive operations
Unless you turn off connection pooling, obtaining a connection is pretty cheap as long as there are connections already established in the pool and available to use. It also really shouldn't matter much here anyway.
When it comes to two queries (whether EF or not) versus one query with two result sets (using NextResult on the data reader), you will gain a little, but really not much. Since there's no need to re-establish a connection either way, there's only a very small reduction in overhead of one over the other, which will be dwarfed by the amount of actual data if the results are large enough for you to care about this at all. (There's less overhead again if you could UNION the two result sets, but then you could do that with EF too anyway.)
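For reference, a minimal ADO.NET sketch of the two-result-set variant (table and column names are hypothetical); NextResult moves the reader from the first SELECT to the second within the same round trip:

    using System;
    using System.Data.SqlClient;

    class MultipleResultSetsDemo
    {
        static void Main()
        {
            // One command, one round trip, two result sets.
            const string sql =
                "SELECT Id, Name FROM dbo.Table1; SELECT Id, Amount FROM dbo.Table2;";

            using (var connection = new SqlConnection("Server=.;Database=AppDb;Integrated Security=true"))
            using (var command = new SqlCommand(sql, connection))
            {
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine("Table1: " + reader.GetInt32(0) + " " + reader.GetString(1));

                    // Advance to the second result set returned by the same command.
                    reader.NextResult();
                    while (reader.Read())
                        Console.WriteLine("Table2: " + reader.GetInt32(0) + " " + reader.GetDecimal(1));
                }
            }
        }
    }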
If you mean the bytes going to and fro over the connection after it's been established, then you should be able to send slightly less to the database (but we're talking a handful of bytes) and about the same coming back, assuming that your query is only obtaining what is actually needed. That is, you do something like from t in Table1Repository select new {t.ID, t.Name} if you only need IDs and names, rather than pulling back complete entities for each row.
EntityFramework does a whole bunch of things, and doing anything costs, so taking on more of the work yourself should mean you can be tighter. However, as well as introducing new scope for error over the tried and tested, you also introduce new scope for doing things less efficiently than EF does.
Any seeking of commonality between different pieces of database-handling code gets you further and further along the sort of path that ends up with you producing your own version of EntityFramework, but with the efficiency of all of it being up to you. Any attempt to streamline a particular query brings you in the opposite direction of having masses of similar, but not identical, code with slightly different bugs and performance hits.
In all, you are likely better off taking the EF approach first. If a particular query proves troublesome when it comes to performance, first see if you can improve it while staying with EF (optimise the LINQ, use AsNoTracking when appropriate, and so on); if it is still a hotspot, then try to hand-roll with ADO.NET for just that part and measure. Until then, saying "yes, it would be slightly faster to use two result sets with ADO.NET" isn't terribly useful, because just how big that "slightly" is depends.
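To make the "improve it while staying with EF" step concrete, here is a small sketch of the two optimisations mentioned above (projection and AsNoTracking). The entity, context, and column names are hypothetical, and it is written against EF6 (use Microsoft.EntityFrameworkCore for EF Core):

    using System.Collections.Generic;
    using System.Data.Entity; // EF6
    using System.Linq;

    // Hypothetical entity and context, only for illustration.
    public class Table1Entity
    {
        public int ID { get; set; }
        public string Name { get; set; }
        public string LargePayload { get; set; } // a column we do not want to pull back
    }

    public class AppDbContext : DbContext
    {
        public DbSet<Table1Entity> Table1 { get; set; }
    }

    public static class ReportQueries
    {
        // Read-only query returning full entities: AsNoTracking skips change-tracker overhead.
        public static List<Table1Entity> LoadAllReadOnly(AppDbContext db) =>
            db.Table1.AsNoTracking().ToList();

        // Narrow projection: only the columns the caller actually needs cross the wire.
        public static List<NameRow> LoadNames(AppDbContext db) =>
            db.Table1
              .Select(t => new NameRow { ID = t.ID, Name = t.Name })
              .ToList();

        public class NameRow
        {
            public int ID { get; set; }
            public string Name { get; set; }
        }
    }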
If the query is a simple read from Table1 and Table2, then LINQ queries should give similar performance to executing a stored procedure (plain SQL). But if the query runs across different databases, plain SQL is always better, since you can UNION the result sets and get the data from all databases.
In MySQL "Explain" statement can be used to know the performance of query. See this link:
http://www.sitepoint.com/using-explain-to-write-better-mysql-queries/
Another useful technique is to check the SQL generated for your LINQ query in the output window of Microsoft Visual Studio. You can execute this query directly in a SQL editor and check its performance.

Azure pricing for sorting or filtering table entities [duplicate]

This question already has an answer here: How would Azure storage be billed? (1 answer)
I am very new to Windows Azure. I was just curious how the calculation of the number of transactions for sorting or filtering Azure storage tables works.
Sorting is not supported on the server, so it certainly isn't billed for. Filtering is not billed in itself, but depending on what you're doing it can influence the number of transactions needed for any given query.
A given transaction can return at most 1000 entities, and that only if they're all on the same partition server. So a very selective filter will result in fewer entities returned, which might require fewer transactions. A non-selective filter (or no filter) may return many entities that will require multiple transactions to retrieve. The number of transactions can also be influenced by the number and size of the partitions, which is controlled by your choice of PartitionKey for your entities.
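To illustrate the selectivity point (using the current Azure.Data.Tables SDK purely as an example, with hypothetical table and key names): a filter pinned to one partition can be answered in very few transactions, while an unfiltered query is paged back in chunks, each page being its own billed transaction:

    using System;
    using Azure.Data.Tables;

    class TableQueryDemo
    {
        static void Main()
        {
            // Hypothetical table and connection string.
            var table = new TableClient("UseDevelopmentStorage=true", "Attendance");

            // Selective: confined to one partition, so few transactions are needed.
            var oneDepartment = table.Query<TableEntity>(filter: "PartitionKey eq 'Dept-42'");

            // Non-selective: scans all partitions and is returned page by page,
            // each continuation being another transaction.
            var everything = table.Query<TableEntity>();

            foreach (var entity in oneDepartment)
                Console.WriteLine(entity.RowKey);
        }
    }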
That said, transactions are dirt cheap at one penny per hundred thousand transactions. It's unlikely that they will be a significant portion of your costs. Don't spend time worrying about transaction costs unless you're very sure you need to.

Insert large-volume data into SQL Server via C# in a single database call [duplicate]

Possible Duplicate:
Speed up update of 185k rows in SQL Server 2008?
I have more than 700,000 contact emails and I want to insert them into SQL Server using C# in a few minutes, not hours, in a single database call.
I know that may not be possible, but I know there's a way to do it quickly.
How can I achieve this?
Have a look at using the SqlBulkCopy class:
"Lets you efficiently bulk load a SQL Server table with data from another source. Microsoft SQL Server includes a popular command-prompt utility named bcp for moving data from one table to another, whether on a single server or between servers. The SqlBulkCopy class lets you write managed code solutions that provide similar functionality. There are other ways to load data into a SQL Server table (INSERT statements, for example), but SqlBulkCopy offers a significant performance advantage over them."
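A minimal SqlBulkCopy sketch for this case; the destination table, column names, and connection string are hypothetical, and in practice you would fill the DataTable (or stream via an IDataReader) from your 700,000 contacts:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    class BulkInsertContacts
    {
        static void Main()
        {
            // Build the rows in memory first (two sample contacts shown).
            var contacts = new DataTable();
            contacts.Columns.Add("Email", typeof(string));
            contacts.Columns.Add("Name", typeof(string));
            contacts.Rows.Add("a@example.com", "Alice");
            contacts.Rows.Add("b@example.com", "Bob");

            using (var connection = new SqlConnection("Server=.;Database=Crm;Integrated Security=true"))
            {
                connection.Open();

                // One bulk operation instead of hundreds of thousands of INSERT statements.
                using (var bulkCopy = new SqlBulkCopy(connection))
                {
                    bulkCopy.DestinationTableName = "dbo.ContactMails";
                    bulkCopy.BatchSize = 10000; // stream to the server in batches
                    bulkCopy.ColumnMappings.Add("Email", "Email");
                    bulkCopy.ColumnMappings.Add("Name", "Name");
                    bulkCopy.WriteToServer(contacts);
                }
            }
        }
    }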
