I am currently facing an issue where it takes quite a while to process information from a server, and I would like to provide active feedback for the user to know what is going on while the application appears to be just sitting around.
A little about the application: it allows the user to pull all databases from a specified server along with all content within those databases. This can take quite a while sometimes, since some of our databases can reach 2TB in size. Each server can contain hundreds of databases and so as a result, if I try to load a server with 100 databases, and 30% of those databases are over 100GB in size, it takes a good couple of minutes before the application is able to run effectively again.
Currently I just have a simple loading message that says: "Please wait, this could take a while...". However, in my opinion, this is not really sufficient for something that can take a few minutes.
So as a result, I am wondering if there is a way to track the progress of the SqlConnection object as it is executing the specified query? If so, what kind of details would I be able to provide and are there any readily available resources to look over and better understand the solution?
I am hopeful that there is a way to do this without having to recreate the SqlConnection object altogether.
Thank you all for your help!
EDIT
Also, as another note; I am NOT looking for handouts of code here. I am looking for resources that will help me in this situation if any are available. I have been looking into this issue for a few days already, I figure the best place to ask for help is here at this point.
Extra
A more thorough explanation: I am wondering if there is a way to provide the user with names of databases, tables, views, etc that are currently being received.
For example:
Loading Table X From Database Y On Server Z
Loading Table A From Database B On Server Z
The SqlConnection has an InfoMessage-Event to which you can assign a method.
Furthermore you have to set the FireInfoMessageEventOnUserErrors-Property to true.
No you can do something like
private void OnInfoMessage(object sender, SqlInfoMessageEventArgs e)
{
for (var index = 0; index < e.Errors.Count; index++)
{
var message = e.Errors[index];
//use the message object
}
}
Note that you should only evaluate messages with an errorcode lower then 11 (everything above is a 'real' error).
Depending on what command you are using sometime the server already generates such info messages (for example at VERIFY BACKUP).
Anyways you can also use this to report progress form a stored procedure or a query.
Pseudocode (im not an sqlguy):
RAISEERROR('START', 1,1) WITH NOWAIT; -- Will raise your OnInfoMessage-method
WHILE
-- Execute something
RAISEERROR('Done something', 1,1) WITH NOWAIT; -- Will raise your OnInfoMessage-method
-- etc.
END
Have fun with parsing, cause remember: such messages can also be generated by the server itself (so it is probably a good idea to start your own, relevant "errors" with a satic sequence which cannot occur under nomal circumstances).
Related
I'm a bit newbie still and I have been assigned with the task of maintaining previosuly done code.
I have a web that simulates SQL Management Studio, limitating deleting options for example, so basic users don't screw our servers.
Well, we have a function that expects a query or queries, it works fine, but our server RAM gets blown up with complex queries, maybe it's not that much data, but its casting xml and all that stuff that I still don't even understand in SQL.
This is the actual function:
public DataSet ExecuteMultipleQueries(string queries)
{
var results = new DataSet();
using (var myConnection = new SqlConnection(_connectionString))
{
myConnection.Open();
var sqlCommand = myConnection.CreateCommand();
sqlCommand.Transaction = myConnection.BeginTransaction(IsolationLevel.ReadUncommitted);
sqlCommand.CommandTimeout = AppSettings.SqlTimeout;
sqlCommand.CommandText = queries.Trim();
var dataAdapter = new SqlDataAdapter { SelectCommand = sqlCommand };
dataAdapter.Fill(results);
return results;
}
}
I'm a bit lost, I've read many different answers but either I don't understand them properly or they don't solve my problems in any way.
I know I could use Linq-toSql- or Entity, I tried them but I really don't know how to use them with an "unknown" query, I could try to research more anyway so if you think they will help me approaching a solution, by any means, I will try to learn it.
So to the point:
The function seems to stop at dataAdapter.Fill(results) when debugging, at that point is where the server tries to answer the query and just consume all its RAM and blocks itself. How can I solve this? I thought maybe by making SQL return a certain amount of data, store it in a certain collection, then continue returning data, and keep going until there is no more data to return from SQL, but I really don't know how to detect if there is any data left to return from SQL.
Also I thought about reading and storing in two different threads, but I don't know how the data that is in one thread can be stored in other thread async (and even less if it solves the issue).
So, yes, I don't have anything clear at all, so any guidance or tip would be highly appreciated.
Thanks in advance and sorry for the long post.
You can use pagination to fetch only part of the data.
Your code will be like this:
dataAdapter.Fill(results, 0, pageSize);
pageSize can be at size you want (100 or 250 for example).
You can get more information in this msdn article.
In order to investigate, try the following:
Start SQL profiler (it is usually installed along with SSMS and can be started from Management Studio, Tools menu)
Make sure you fill up some filters (either NT username or at least the database you are profiling). This is to catch as specific (i.e. only your) queries as possible
Include starting events to see when your query starts (e.g. RPC:Starting).
Start your application
Start the profiler before issuing the query (fill the adapter)
Issue the query -> you should see the query start in the profiler
Stop the profiler not to catch other queries (it puts overhead on SQL Server)
Stop the application (no reason to mess with server until the analysis is done)
Take the query within SQL Management Studio. I expect a SELECT that returns a lot of data. Do not run as it is, but put a TOP to limit its results. E.g. SELECT TOP 1000 <some columns> from ....
If the TOPed select runs slowly, you are returning too much data.
This may be due to returning some large fields such as N/VARCHAR(MAX) or VARBINARY(MAX). One possible solution is to exclude these fields from the initial SELECT and lazy-load this data (as needed).
Check these steps and come back with your actual query, if needed.
I'm working on a project where we're receiving data from multiple sources, that needs to be saved into various tables in our database.
Fast.
I've played with various methods, and the fastest I've found so far is using a collection of TableValue parameters, filling them up and periodically sending them to the database via a corresponding collection of stored procedures.
The results are quite satisfying. However, looking at disk usage (% Idle Time in Perfmon), I can see that the disk is getting periodically 'thrashed' (a 'spike' down to 0% every 13-18 seconds), whilst in between the %Idle time is around 90%. I've tried varying the 'batch' size, but it doesn't have an enormous influence.
Should I be able to get better throughput by (somehow) avoiding the spikes while decreasing the overall idle time?
What are some things I should be looking out to work out where the spiking is happening? (The database is in Simple recovery mode, and pre-sized to 'big', so it's not the log file growing)
Bonus: I've seen other questions referring to 'streaming' data into the database, but this seems to involve having a Stream from another database (last section here). Is there any way I could shoe-horn 'pushed' data into that?
A very easy way of inserting loads of data into an SQL-Server is -as mentioned- the 'bulk insert' method. ADO.NET offers a very easy way of doing this without the need of external files. Here's the code
var bulkCopy = new SqlBulkCopy(myConnection);
bulkCopy.DestinationTableName = "MyTable";
bulkCopy.WriteToServer (myDataSet);
That's easy.
But: myDataSet needs to have exactly the same structure as MyTable, i.e. Names, field types and order of fields must be exactly the same. If not, well there's a solution to that. It's column mapping. And this is even easier to do:
bulkCopy.ColumnMappings.Add("ColumnNameOfDataSet", "ColumnNameOfTable");
That's still easy.
But: myDataSet needs to fit into memory. If not, things become a bit more tricky as we have need a IDataReader derivate which allows us to instantiate it with an IEnumerable.
You might get all the information you need in this article.
Building on the code referred to in alzaimar's answer, I've got a proof of concept working with IObservable (just to see if I can). It seems to work ok. I just need to put together some tidier code to see if this is actually any faster than what I already have.
(The following code only really makes sense in the context of the test program in code download in the aforementioned article.)
Warning: NSFW, copy/paste at your peril!
private static void InsertDataUsingObservableBulkCopy(IEnumerable<Person> people,
SqlConnection connection)
{
var sub = new Subject<Person>();
var bulkCopy = new SqlBulkCopy(connection);
bulkCopy.DestinationTableName = "Person";
bulkCopy.ColumnMappings.Add("Name", "Name");
bulkCopy.ColumnMappings.Add("DateOfBirth", "DateOfBirth");
using(var dataReader = new ObjectDataReader<Person>(people))
{
var task = Task.Factory.StartNew(() =>
{
bulkCopy.WriteToServer(dataReader);
});
var stopwatch = Stopwatch.StartNew();
foreach(var person in people) sub.OnNext(person);
sub.OnCompleted();
task.Wait();
Console.WriteLine("Observable Bulk copy: {0}ms",
stopwatch.ElapsedMilliseconds);
}
}
It's difficult to comment without knowing the specifics, but one of the fastest ways to get data into SQL Server is Bulk Insert from a file.
You could write the incoming data to a temp file and periodically bulk insert it.
Streaming data into SQL Server Table-Valued parameter also looks like a good solution for fast inserts as they are held in memory. In answer to your question, yes you could use this, you just need to turn your data into a IDataReader. There's various ways to do this, from a DataTable for example see here.
If your disk is a bottleneck you could always optimise your infrastructure. Put database on a RAM disk or SSD for example.
I am working on a C# application, which loads data from a MS SQL 2008 or 2008 R2 database. The table looks something like this:
ID | binary_data | Timestamp
I need to get only the last entry and only the binary data. Entries to this table are added irregular from another program, so I have no way of knowing if there is a new entry.
Which version is better (performance etc.) and why?
//Always a query, which might not be needed
public void ProcessData()
{
byte[] data = "query code get latest binary data from db"
}
vs
//Always a smaller check-query, and sometimes two queries
public void ProcessData()
{
DateTime timestapm = "query code get latest timestamp from db"
if(timestamp > old_timestamp)
data = "query code get latest binary data from db"
}
The binary_data field size will be around 30kB. The function "ProcessData" will be called several times per minutes, but sometimes can be called every 1-2 seconds. This is only a small part of a bigger program with lots of threading/database access, so I want to the "lightest" solution. Thanks.
Luckily, you can have both:
SELECT TOP 1 binary_data
FROM myTable
WHERE Timestamp > #last_timestamp
ORDER BY Timestamp DESC
If there is a no record newer than #last_timestamp, no record will be returned and, thus, no data transmission takes place (= fast). If there are new records, the binary data of the newest is returned immediately (= no need for a second query).
I would suggest you perform tests using both methods as the answer would depend on your usages. Simulate some expected behaviour.
I would say though, that you are probably okay to just do the first query. Do what works. Don't prematurely optimise, if the single query is too slow, try your second two-query approach.
Two-step approach is more efficient from overall workload of system point of view:
Get informed that you need to query new data
Query new data
There are several ways to implement this approach. Here are a pair of them.
Using Query Notifications which is built-in functionality of SQL Server supported in .NET.
Using implied method of getting informed of database table update, e.g. one described in this article at SQL Authority blog
I think that the better path is a storedprocedure that keeps the logic inside the database, Something with an output parameter with the data required and a return value like a TRUE/FALSE to signal the presence of new data
I'm trying to implement Microsoft Synchronization services into a smart device application I am developing however I seem to have hit a brick wall and I am hoping someone will be able to provide a solution. I have managed to implement synchronization so that it downloads every record in a table, however I am wanting to filter the records so only the data relating to the user is downloaded. To achieve this I have added the a WHERE Operator.kde = #kde clause to the SelectIncrementalInsertsCommand, as shown in the following code.
this.SelectIncrementalInsertsCommand.CommandText = #"IF #sync_initialized = 0 SELECT dbo.Operator.[OperatorID], [kde], [OperatorName], [Pass] FROM dbo.Operator LEFT OUTER JOIN CHANGETABLE(CHANGES dbo.Operator, #sync_last_received_anchor) CT ON CT.[OperatorID] = dbo.Operator.[OperatorID] WHERE dbo.Operator.[kde] = #kde AND (CT.SYS_CHANGE_CONTEXT IS NULL OR CT.SYS_CHANGE_CONTEXT <> #sync_client_id_binary) ELSE BEGIN SELECT dbo.Operator.[OperatorID], [kde], [OperatorName], [Pass] FROM dbo.Operator JOIN CHANGETABLE(CHANGES dbo.Operator, #sync_last_received_anchor) CT ON CT.[OperatorID] = dbo.Operator.[OperatorID] WHERE dbo.Operator.[kde] = #kde AND (CT.SYS_CHANGE_OPERATION = 'I' AND CT.SYS_CHANGE_CREATION_VERSION <= #sync_new_received_anchor AND (CT.SYS_CHANGE_CONTEXT IS NULL OR CT.SYS_CHANGE_CONTEXT <> #sync_client_id_binary)); IF CHANGE_TRACKING_MIN_VALID_VERSION(object_id(N'dbo.Operator')) > #sync_last_received_anchor RAISERROR (N'SQL Server Change Tracking has cleaned up tracking information for table ''%s''. To recover from this error, the client must reinitialize its local database and try again',16,3,N'dbo.Operator') END ";
I have then declared the #kde parameter as follows.
this.SelectIncrementalInsertsCommand.Parameters.Add(new System.Data.SqlClient.SqlParameter("#kde", System.Data.SqlDbType.Int));
To pass in the parameter I have added the following line to the code responsible for initiating the synchronization.
syncAgent.Configuration.SyncParameters.Add(new SyncParameter("#kde", kde));
NOTE: the kde value is an integer that is passed into my sync method
Despite these filters being added, the synchronization process seems to be completely ignoring them and downloading all the data for every operator. I have investigated this issue online and my code seems identical to numerous tutorials I have read, however it still does not work as desired.
I am fairly new to Sync Services so if anyone could provide me with information and guidance to solving this issue I will be hugely grateful
Thank You in advance
have you tried this?
ADDING FILTER TO LOCAL DATABASE CACHE GENERATED SYNC
I would suggest you run SQL Profiler as well to see the actual commands being passed to SQL Server.
I managed to fix this problem, it turned out to be some "dirty" records on the database which for some reason were affecting synchronization. Once I deleted these records everything worked as it should.
In one sentence, what i ultimately need to know is how to share objects between mid-tier functions w/ out requiring the application tier to to pass the data model objects.
I'm working on building a mid-tier layer in our current environment for the company I am working for. Currently we are using primarily .NET for programming and have built custom data models around all of our various database systems (ranging from Oracle, OpenLDAP, MSSQL, and others).
I'm running into issues trying to pull our model from the application tier and move it into a series of mid-tier libraries. The main issue I'm running into is that the application tier has the ability to hang on to a cached object throughout the duration of a process and make updates based on the cached data, but the Mid-Tier operations do not.
I'm trying to keep the model objects out of the application as much as possible so that when we make a change to the underlying database structure, we can edit and redeploy the mid-tier easily and multiple applications will not need to be rebuilt. I'll give a brief update of what the issue is in pseudo-code, since that is what us developers understand best :)
main
{
MidTierServices.UpdateCustomerName("testaccount", "John", "Smith");
// since the data takes up to 4 seconds to be replicated from
// write server to read server, the function below is going to
// grab old data that does not contain the first name and last
// name update.... John Smith will be overwritten w/ previous
// data
MidTierServices.UpdateCustomerPassword("testaccount", "jfjfjkeijfej");
}
MidTierServices
{
void UpdateCustomerName(string username, string first, string last)
{
Customer custObj = DataRepository.GetCustomer(username);
/*******************
validation checks and business logic go here...
*******************/
custObj.FirstName = first;
custObj.LastName = last;
DataRepository.Update(custObj);
}
void UpdateCustomerPassword(string username, string password)
{
// does not contain first and last updates
Customer custObj = DataRepository.GetCustomer(username);
/*******************
validation checks and business logic go here...
*******************/
custObj.Password = password;
// overwrites changes made by other functions since data is stale
DataRepository.Update(custObj);
}
}
On a side note, options I've considered are building a home grown caching layer, which takes a lot of time and is a very difficult concept to sell to management. Use a different modeling layer that has built in caching support such as nHibernate: This would also be hard to sell to management, because this option would also take a very long time tear apart our entire custom model and replace it w/ a third party solution. Additionally, not a lot of vendors support our large array of databases. For example, .NET has LINQ to ActiveDirectory, but not a LINQ to OpenLDAP.
Anyway, sorry for the novel, but it's a more of an enterprise architecture type question, and not a simple code question such as 'How do I get the current date and time in .NET?'
Edit
Sorry, I forgot to add some very important information in my original post. I feel very bad because Cheeso went through a lot of trouble to write a very in depth response which would have fixed my issue were there not more to the problem (which I stupidly did not include).
The main reason I'm facing the current issue is in concern to data replication. The first function makes a write to one server and then the next function makes a read from another server which has not received the replicated data yet. So essentially, my code is faster than the data replication process.
I could resolve this by always reading and writing to the same LDAP server, but my admins would probably murder me for that. The specifically set up a server that is only used for writing and then 4 other servers, behind a load balancer, that are only used for reading. I'm in no way an LDAP administrator, so I'm not aware if that is standard procedure.
You are describing a very common problem.
The normal approach to address it is through the use of Optimistic Concurrency Control.
If that sounds like gobbledegook, it's not. It's pretty simple idea. The concurrency part of the term refers to the fact that there are updates happening to the data-of-record, and those updates are happening concurrently. Possibly many writers. (your situation is a degenerate case where a single writer is the source of the problem, but it's the same basic idea). The optimistic part I'll get to in a minute.
The Problem
It's possible when there are multiple writers that the read+write portion of two updates become interleaved. Suppose you have A and B, both of whom read and then update the same row in a database. A reads the database, then B reads the database, then B updates it, then A updates it. If you have a naive approach, then the "last write" will win, and B's writes may be destroyed.
Enter optimistic concurrency. The basic idea is to presume that the update will work, but check. Sort of like the trust but verify approach to arms control from a few years back. The way to do this is to include a field in the database table, which must be also included in the domain object, that provides a way to distinguish one "version" of the db row or domain object from another. The simplest is to use a timestamp field, named lastUpdate, which holds the time of last update. There are other more complex ways to do the consistency check, but timestamp field is good for illustration purposes.
Then, when the writer or updater wants to update the DB, it can only update the row for which the key matches (whatever your key is) and also when the lastUpdate matches. This is the verify part.
Since developers understand code, I'll provide some pseudo-SQL. Suppose you have a blog database, with an index, a headline, and some text for each blog entry. You might retrieve the data for a set of rows (or objects) like this:
SELECT ix, Created, LastUpdated, Headline, Dept FROM blogposts
WHERE CONVERT(Char(10),Created,102) = #targdate
This sort of query might retrieve all the blog posts in the database for a given day, or month, or whatever.
With simple optimistic concurrency, you would update a single row using SQL like this:
UPDATE blogposts Set Headline = #NewHeadline, LastUpdated = #NewLastUpdated
WHERE ix=#ix AND LastUpdated = #PriorLastUpdated
The update can only happen if the index matches (and we presume that's the primary key), and the LastUpdated field is the same as what it was when the data was read. Also note that you must insure to update the LastUpdated field for every update to the row.
A more rigorous update might insist that none of the columns had been updated. In this case there's no timestamp at all. Something like this:
UPDATE Table1 Set Col1 = #NewCol1Value,
Set Col2 = #NewCol2Value,
Set Col3 = #NewCol3Value
WHERE Col1 = #OldCol1Value AND
Col2 = #OldCol2Value AND
Col3 = #OldCol3Value
Why is it called "optimistic"?
OCC is used as an alternative to holding database locks, which is a heavy-handed approach to keeping data consistent. A DB lock might prevent anyone from reading or updating the db row, while it is held. This obviously has huge performance implications. So OCC relaxes that, and acts "optimistically", by presuming that when it comes time to update, the data in the table will not have been updated in the meantime. But of course it's not blind optimism - you have to check right before update.
Using Optimistic Cancurrency in practice
You said you use .NET. I don't know if you use DataSets for your data access, strongly typed or otherwise. But .NET DataSets, or specifically DataAdapters, include built-in support for OCC. You can specify and hand-code the UpdateCommand for any DataAdapter, and that is where you can insert the consistency checks. This is also possible within the Visual Studio design experience.
(source: asp.net)
If you get a violation, the update will return a result showing that ZERO rows were updated. You can check this in the DataAdapter.RowUpdated event. (Be aware that in the ADO.NET model, there's a different DataAdapter for each sort of database. The link there is for SqlDataAdapter, which works with SQL Server, but you'll need a different DA for different data sources.)
In the RowUpdated event, you can check for the number of rows that have been affected, and then take some action if the count is zero.
Summary
Verify the contents of the database have not been changed, before writing updates. This is called optimistic concurrency control.
Other links:
MSDN on Optimistic Concurrency Control in ADO.NET
Tutorial on using SQL Timestamps for OCC