I need to copy a large resultset from one database and save it to another database.
Stored procedures are used for both fetching and storing, because there is some logic involved during saving.
I'm trying to find an efficient solution; there is no way I can hold the whole dataset in memory, and I would like to minimize the number of round trips.
Data is read from the source table with:
var reader = fetchCommand.ExecuteReader();
while (reader.Read()){...}
Is there a way to insert this data into another SqlCommand without loading the whole dataset into a DataTable, but also without inserting rows one by one?
Both the source and target databases are MS SQL Server 2008, and they are on different servers. Using SSIS or linked servers is not an option.
EDIT:
It appears it's possible to stream rows into a stored procedure using table-valued parameters. I will investigate this approach as well.
UPDATE:
Yes, it's possible to stream data from command.ExecuteReader to another command like this:
var reader = selectCommand.ExecuteReader();
insertCommand.Parameters.Add(
new SqlParameter("#data", reader)
{SqlDbType = SqlDbType.Structured}
);
insertCommand.ExecuteNonQuery();
Where insertCommand is a stored procedure with a table-valued parameter named @data.
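For reference, here is a fuller sketch of the same pattern with connections and commands wrapped in using blocks. The stored procedure name, source query, and column names are placeholders, and the snippet assumes System.Data and System.Data.SqlClient:

using (var sourceConnection = new SqlConnection(sourceConnectionString))
using (var targetConnection = new SqlConnection(targetConnectionString))
{
    sourceConnection.Open();
    targetConnection.Open();

    using (var selectCommand = new SqlCommand("SELECT Id, Payload FROM dbo.SourceTable", sourceConnection))
    using (var reader = selectCommand.ExecuteReader())
    using (var insertCommand = new SqlCommand("dbo.SaveData", targetConnection))
    {
        insertCommand.CommandType = CommandType.StoredProcedure;
        insertCommand.CommandTimeout = 0; // a large copy can exceed the default 30 seconds

        // The reader is streamed into the table-valued parameter row by row;
        // nothing is buffered on the client.
        insertCommand.Parameters.Add(new SqlParameter("@data", SqlDbType.Structured)
        {
            Value = reader
        });

        insertCommand.ExecuteNonQuery();
    }
}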
You need SqlBulkCopy. You can just use it like this:
using (var reader = fetchCommand.ExecuteReader())
using (var bulkCopy = new SqlBulkCopy(myOtherDatabaseConnection))
{
bulkCopy.DestinationTableName = "...";
bulkCopy.ColumnMappings.Add(...); // ColumnMappings is read-only; add mappings as needed
bulkCopy.WriteToServer(reader);
}
There is also a BatchSize property. Something like 1000 rows might give you the best trade-off between memory usage and speed.
Although this doesn't let you pipe the data into a stored procedure, the best approach might be to copy it to a temporary table and then run a bulk update command on the server to move it into its final location. This is usually far faster than executing lots of separate statements for each row.
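A rough sketch of that staging-table approach, reusing the reader and connection from above; the staging table dbo.StagingTable, the final table dbo.TargetTable, and the column names are placeholders:

using (var reader = fetchCommand.ExecuteReader())
using (var bulkCopy = new SqlBulkCopy(myOtherDatabaseConnection))
{
    bulkCopy.DestinationTableName = "dbo.StagingTable";
    bulkCopy.BatchSize = 1000;
    bulkCopy.WriteToServer(reader);
}

// One set-based statement on the server instead of a round trip per row.
using (var moveCommand = new SqlCommand(
    "INSERT INTO dbo.TargetTable (Id, Payload) " +
    "SELECT Id, Payload FROM dbo.StagingTable;", myOtherDatabaseConnection))
{
    moveCommand.ExecuteNonQuery();
}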
You can use SqlBulkCopy with a data reader, which does roughly what you are asking (non-buffered, etc.); however, this won't call stored procedures to insert. If you want that, perhaps use SqlBulkCopy to push the data into a second table (same structure), then, at the DB server, loop over the rows calling the sproc locally. That way, latency ceases to be an issue (as the loop is all at the DB server).
Related
I have a stored procedure (SQL Server 2016) which currently returns 100K to 200K rows based on the parameters to that SP.
Each row can be 100KB to 200KB in size, so the total can be around 10GB to 20GB.
My client (a background job) has to call this SP, process all rows, and send them to another client.
What is the best approach to handle such scenarios?
Currently I am thinking of using a streaming enumerator with yield:
get a record whenever dataReader.Read() reads a row, process it, and send it to the other client.
using (var dataReader = command.ExecuteReader())
{
    while (dataReader.Read())
    {
        var obj = new SomeClass();
        // prepare SomeClass from the current row
        yield return obj;
    }
}
Is this approach sufficient to handle such large data?
Is there any better approach to it (such as multi-threading, etc.)?
If so, how should I approach it? Any pointers to refer to?
Edit: the SP has multiple joins and runs a couple of times a day.
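For illustration, a minimal sketch of the streaming enumerator described in the question, assuming a hypothetical stored procedure name and a SomeClass with Id and Payload members; CommandBehavior.SequentialAccess helps avoid buffering each large row in full:

public IEnumerable<SomeClass> StreamRows(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("dbo.GetLargeResultSet", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.CommandTimeout = 0; // the SP has multiple joins and can run for a while
        connection.Open();

        using (var dataReader = command.ExecuteReader(CommandBehavior.SequentialAccess))
        {
            while (dataReader.Read())
            {
                // With SequentialAccess, read the columns in ordinal order.
                var obj = new SomeClass
                {
                    Id = dataReader.GetInt32(0),
                    Payload = dataReader.GetString(1)
                };
                yield return obj;
            }
        }
    }
}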
According to your description, I believe this is a good scenario for an SSIS (Integration Services) package, which can run the query, write the final results to a CSV file, and let the client pick up that file for the exchange.
I am using this code to insert 1 million records into an empty table in the database. OK, so without much code, I will start from the point where I have already interacted with the data and read the schema into a DataTable.
So:
DataTable returnedDtViaLocalDbV11 = DtSqlLocalDb.GetDtViaConName(strConnName, queryStr, strReturnedDtName);
And now that we have returnedDtViaLocalDbV11, let's create a new DataTable to be a clone of the source database table:
DataTable NewDtForBlkInsert = returnedDtViaLocalDbV11.Clone();
Stopwatch SwSqlMdfLocalDb11 = Stopwatch.StartNew();
NewDtForBlkInsert.BeginLoadData();
for (int i = 0; i < 1000000; i++)
{
NewDtForBlkInsert.LoadDataRow(new object[] { null, "NewShipperCompanyName"+i.ToString(), "NewShipperPhone" }, false);
}
NewDtForBlkInsert.EndLoadData();
DBRCL_SET.UpdateDBWithNewDtUsingSQLBulkCopy(NewDtForBlkInsert, tblClients._TblName, strConnName);
SwSqlMdfLocalDb11.Stop();
var ResSqlMdfLocalDbv11_0 = SwSqlMdfLocalDb11.ElapsedMilliseconds;
This code populates 1 million records into an embedded SQL database (localDb) in 5200 ms. The rest of the code just implements the bulk copy, but I will post it anyway.
public string UpdateDBWithNewDtUsingSQLBulkCopy(DataTable TheLocalDtToPush, string TheOnlineSQLTableName, string WebConfigConName)
{
//Open a connection to the database.
using (SqlConnection connection = new SqlConnection(ConfigurationManager.ConnectionStrings[WebConfigConName].ConnectionString))
{
connection.Open();
// Perform an initial count on the destination table.
SqlCommand commandRowCount = new SqlCommand("SELECT COUNT(*) FROM "+TheOnlineSQLTableName +";", connection);
long countStart = System.Convert.ToInt32(commandRowCount.ExecuteScalar());
var nl = "\r\n";
string retStrReport = "";
retStrReport = string.Concat(string.Format("Starting row count = {0}", countStart), nl);
retStrReport += string.Concat("==================================================", nl);
// Create a table with some rows.
//DataTable newCustomers = TheLocalDtToPush;
// Create the SqlBulkCopy object.
// Note that the column positions in the source DataTable
// match the column positions in the destination table so
// there is no need to map columns.
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
bulkCopy.DestinationTableName = TheOnlineSQLTableName;
try
{
// Write from the source to the destination.
for (int colIndex = 0; colIndex < TheLocalDtToPush.Columns.Count; colIndex++)
{
bulkCopy.ColumnMappings.Add(colIndex, colIndex);
}
bulkCopy.WriteToServer(TheLocalDtToPush);
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
}
// Perform a final count on the destination
// table to see how many rows were added.
long countEnd = System.Convert.ToInt32(
commandRowCount.ExecuteScalar());
retStrReport += string.Concat("Ending row count = ", countEnd, nl);
retStrReport += string.Concat("==================================================", nl);
retStrReport += string.Concat((countEnd - countStart)," rows were added.", nl);
retStrReport += string.Concat("New Customers Was updated successfully", nl, "END OF PROCESS !");
//Console.ReadLine();
return retStrReport;
}
}
Trying it via a connection to SQL Server was around 7000 ms at best, and ~7700 ms on average. Going via a random key-value NoSQL database took around 40 seconds (really, I did not even keep records of it, as it was more than twice as slow as the SQL variants). So... is there a faster way than what I was testing in my code?
Edit
I am using Win7 x64 with 8 GB RAM and, most importantly I should think, an i5 at 3 GHz, which is not so great by now.
The 3x 500 GB WD drives in RAID-0 do the job even better.
I am just saying: if you check this on your own PC, compare it to any other method in your configuration.
Have you tried SSIS? I have never written an SSIS package with a localdb connection, but this is the sort of activity SSIS should be well suited for.
If your data source is a SQL Server, another idea would be setting up a linked server. I'm not sure if this would work with localdb. If you can set up a linked server, you could bypass the C# altogether and load your data with an INSERT ... SELECT ... FROM ... SQL statement.
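For example, a hedged sketch of issuing such a statement from C#; the linked server, database, and table names below are placeholders and assume the linked server is already configured on the destination instance:

using (var connection = new SqlConnection(destinationConnectionString))
using (var command = new SqlCommand(
    "INSERT INTO dbo.Clients (CompanyName, Phone) " +
    "SELECT CompanyName, Phone FROM [SourceLinkedServer].[SourceDb].[dbo].[Clients];",
    connection))
{
    connection.Open();
    command.CommandTimeout = 0; // a million-row copy can exceed the default timeout
    command.ExecuteNonQuery();
}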
You can use Dapper.NET.
Dapper is a micro-ORM: it executes a query and maps the results to a strongly typed list.
Object-relational mapping (ORM, O/RM, and O/R mapping) in computer software is a programming technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language.
For more info:
check out https://code.google.com/p/dapper-dot-net/
GitHub Repository: https://github.com/SamSaffron/dapper-dot-net
Hope it helps.
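A minimal Dapper sketch (requires the Dapper package and using Dapper;), assuming a hypothetical Shipper class whose property names match the column names used in the query:

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();

    // Query<T> executes the SQL and maps each row to a strongly typed object.
    List<Shipper> shippers = connection.Query<Shipper>(
        "SELECT Id, CompanyName, Phone FROM Shippers").ToList();

    // Execute runs the INSERT once per element of the sequence (not a bulk copy).
    connection.Execute(
        "INSERT INTO Shippers (CompanyName, Phone) VALUES (@CompanyName, @Phone)",
        shippers);
}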
Remove the looping... In SQL, try to make a table with 1 million rows... and left join it; use this for an INSERT/SELECT of the data.
Try sending it without storing it in a DataTable.
See the example at the end of this post, which lets you do it with an enumerator: http://www.developerfusion.com/article/122498/using-sqlbulkcopy-for-high-performance-inserts/
If you are just creating nonsense data, create a stored procedure and just call that through .NET.
If you are passing real data, again passing it to a stored proc would be quicker, but you would be best off dropping the table and recreating it with the data.
If you insert one row at a time, it will take longer than inserting it all at once. It will take even longer if you have indexes to write.
Create a single XML document for all the rows you want to save into the database. Pass this XML to a SQL stored procedure and save all the records in one call only.
But your stored procedure must be written so that it can read all the records from the XML and then insert them into the table.
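A rough sketch of the client side of that idea (using System.Xml.Linq and System.Linq); the stored procedure dbo.SaveClientsFromXml, its @clientData parameter, the clients collection, and the element/attribute names are all placeholders:

var xml = new XElement("Clients",
    clients.Select(c => new XElement("Client",
        new XAttribute("CompanyName", c.CompanyName),
        new XAttribute("Phone", c.Phone))));

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.SaveClientsFromXml", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add(new SqlParameter("@clientData", SqlDbType.Xml)
    {
        Value = xml.ToString()
    });

    connection.Open();
    command.ExecuteNonQuery();
}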
If this is a new project, I recommend you use Entity Framework. In this case you can create a List<> of objects with all the data you need and then simply add it entirely to the corresponding table.
This way you quickly get the needed data and then send it to the database in one go.
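A hedged Entity Framework sketch of that suggestion, assuming EF6, a hypothetical ShipperContext DbContext, and a Shipper entity mapped to the target table:

var shippers = new List<Shipper>();
for (int i = 0; i < 1000000; i++)
{
    shippers.Add(new Shipper
    {
        CompanyName = "NewShipperCompanyName" + i,
        Phone = "NewShipperPhone"
    });
}

using (var context = new ShipperContext())
{
    // Turning change detection off speeds up large AddRange calls in EF6.
    context.Configuration.AutoDetectChangesEnabled = false;
    context.Shippers.AddRange(shippers);
    context.SaveChanges(); // note: EF6 still issues one INSERT per entity
}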
I agree with Mike on SSIS, but it may not suit your environment; however, for ETL processes that involve cross-server calls and general data-flow processes it is a great built-in and highly integrated tool.
With 1 million rows you will likely have to do a bulk insert. Depending on the row size, you would not really be able to use a stored procedure unless you did this in batches. A DataTable will fill memory pretty quickly, again depending on the row size. You could make a stored procedure that takes a table type and call it every X number of rows, but why would we do this when you already have a better, more scalable solution? That million rows could be 50 million next year.
I have used SSIS a bit, and if that is an organizational fit I would suggest looking at it; but for a one-time job it wouldn't be worth the dependencies.
I have an application where I need to query a SQL Server CE database multiple times, often for the same data. The database has many tables, with the intention to increase the number of tables in the future, so nothing is fixed.
The process is very slow, so I need a way of dumping the whole database into memory and performing the queries there. The queries are all going to be very simple (on a par with "get the record with id X from table Y").
I was considering a DataTable, but I have many tables so that won't work.
I was also considering using LINQ, but each table is very different and I don't want to handwrite a new object each time (and .dbml files won't work with SQL Server CE - go figure).
Any other solution?
The way I ended up solving it is as follows.
I keep a Dictionary which maps table names to DataTables.
Whenever I try to access a DataTable, I check whether the name is already in the dictionary:
private static readonly object lockMe = new object();
private static readonly Dictionary<string, DataTable> dictionary = new Dictionary<string, DataTable>();

public static DataTable GetDataTable(string tableName)
{
    lock (lockMe)
    {
        // Lazily load the table into the cache on first access.
        if (!dictionary.ContainsKey(tableName))
        {
            ReadTableIntoMemory(tableName);
        }
        return dictionary[tableName];
    }
}
If it is not there, I lazily read the entire table into memory (using SELECT * and a DataTable) and add it to the dictionary.
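For completeness, a possible sketch of ReadTableIntoMemory, assuming the SQL Server CE provider (System.Data.SqlServerCe) and a shared connectionString field; the details are placeholders:

private static void ReadTableIntoMemory(string tableName)
{
    var table = new DataTable(tableName);

    using (var connection = new SqlCeConnection(connectionString))
    using (var adapter = new SqlCeDataAdapter("SELECT * FROM [" + tableName + "]", connection))
    {
        // Fill opens and closes the connection itself and buffers the whole table.
        adapter.Fill(table);
    }

    dictionary.Add(tableName, table);
}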
This solution works for my case, but it might be too specific as a general solution.
C# with .NET 2.0 and a SQL Server 2005 DB backend.
I have a bunch of XML files which contain data whose structure varies a little but is more or less as follows:
<TankAdvisory>
<WarningType name="Tank Overflow">
<ValidIn>All current tanks</ValidIn>
<Warning>Tank is close to capacity</Warning>
<IssueTime Issue-time="2011-02-11T10:00:00" />
<ValidFrom ValidFrom-time="2011-01-11T13:00:00" />
<ValidTo ValidTo-time="2011-01-11T14:00:00" />
</WarningType>
</TankAdvisory>
I have a single DB table that has all the above fields ready to be filled.
When I use the following method of reading the data from the XML file:
DataSet reportData = new DataSet();
reportData.ReadXml("../File.xml");
It successfully populates the DataSet, but with multiple tables. So when I come to use SqlBulkCopy I can either save just one table this way:
sbc.WriteToServer(reportData.Tables[0]);
Or, if I loop through all the tables in the DataSet and add each of them, a new row is added in the database for each table, when in actuality they all belong in one row.
Then of course there is also the issue of column mappings; I'm thinking that maybe SqlBulkCopy is the wrong way of doing this.
What I need is a quick way of getting the data from that XML file into the database under the relevant columns.
OK, so the original question is a little old, but I have just come across a way to resolve this issue.
All you need to do is loop through all the DataTables in your DataSet and add their columns to the one DataTable that has all the columns of the table in your DB, like so...
DataTable dataTable = reportData.Tables[0];
//Second DataTable
DataTable dtSecond = reportData.Tables[1];
foreach (DataColumn myCol in dtSecond.Columns)
{
sbc.ColumnMappings.Add(myCol.ColumnName, myCol.ColumnName);
dataTable.Columns.Add(myCol.ColumnName);
dataTable.Rows[0][myCol.ColumnName] = dtSecond.Rows[0][myCol];
}
//Finally Perform the BulkCopy
sbc.WriteToServer(dataTable);
If the tables contain more than one row, loop over all the rows when copying the values:
foreach (DataColumn myCol in dtSecond.Columns)
{
dataTable.Columns.Add(myCol.ColumnName);
for (int intRowcnt = 0; intRowcnt <= dtSecond.Rows.Count - 1; intRowcnt++)
{
dataTable.Rows[intRowcnt][myCol.ColumnName] = dtSecond.Rows[intRowcnt][myCol];
}
}
SqlBulkCopy is for many inserts. It's perfect for those cases where you would otherwise generate a lot of INSERT statements and juggle the limit on the total number of parameters per batch. The thing about the SqlBulkCopy class, though, is that it's cranky: unless you fully specify all column mappings for the data set, it will throw an exception.
I'm assuming that your data is quite manageable, since you're reading it into a DataSet. If you were to have even larger data sets, you could lift chunks into memory and then flush them to the database piece by piece. But if everything fits in one go, it's as simple as that.
SqlBulkCopy is the fastest way to put data into the database. Just set up column mappings for all the columns, otherwise it won't work.
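For instance, a small sketch of explicit, name-based column mappings; the destination table name dbo.TankAdvisory and the connection string are placeholders, and dataTable is the merged table built above:

using (var bulkCopy = new SqlBulkCopy(connectionString))
{
    bulkCopy.DestinationTableName = "dbo.TankAdvisory";
    foreach (DataColumn column in dataTable.Columns)
    {
        // Map source columns to destination columns by name rather than by position.
        bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
    }
    bulkCopy.WriteToServer(dataTable);
}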
Why reinvent the wheel? Use SSIS. Read with an XML Source, transform with one of the many transformations, then load it with an OLE DB Destination into the SQL Server table. You will never beat SSIS in terms of runtime, speed of deploying the solution, maintenance, error handling, and so on.
I'm currently using SQL Server 2008 in my project to store and fetch data. This has been working perfectly so far; I can fetch 20,000 records in less than 50 ms (JSON). But I'm facing a problem with inserts: I need to be able to insert something like 100,000 records every minute, and this seems to be very slow with SQL Server.
I've tried to use another database (a NoSQL DB), MongoDB, which is very fast at storing data (5 s compared to SQL Server's 270 s), but not as fast as SQL at fetching data (20,000 records => 180 ms).
So I'm asking here whether there is any way to make SQL faster at storing, or to make MongoDB faster at fetching (I'm not an expert in MongoDB; I know only the very basics about it).
public static void ExecuteNonQuery(string sql)
{
SqlConnection con = GetConnection();
con.Open();
SqlCommand cmd = new SqlCommand(sql, con);
try
{
cmd.ExecuteNonQuery();
}
finally
{
con.Close();
}
}
SQL's Insert function
public IEnumerable<T> GetRecords<T>(System.Linq.Expressions.Expression<Func<T, bool>> expression, int from, int to) where T : class, new()
{
return _db.GetCollection<T>(collectionName).Find<T>(expression).Skip(from).Limit(to).Documents;
}
Mongo's select function (MongoDB 1.6)
Update: data structure: (int) Id, (string) Data
I guess that you are executing each insert in a transaction of its own (an implicit transaction may have been created if you do not provide one explicitly). As SQL Server needs to ensure that each transaction is committed to the hard drive, every transaction has an overhead that is very significant.
To get things to go faster, try to perform many inserts (try a thousand or so) in a single ExecuteNonQuery() call. Also do not open and close the connection for every insert; keep it open (and stay in the same transaction) across several inserts.
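A hedged sketch of that batching idea, reusing GetConnection() from the question; the table name dbo.Records is a placeholder, and the columns follow the (int) Id, (string) Data structure mentioned in the update:

using (var con = GetConnection())
using (var cmd = con.CreateCommand())
{
    con.Open();

    using (var transaction = con.BeginTransaction())
    {
        var sql = new StringBuilder();
        for (int i = 0; i < 1000; i++)
        {
            sql.AppendFormat("INSERT INTO dbo.Records (Id, Data) VALUES ({0}, 'row {0}');", i);
        }

        cmd.Transaction = transaction;
        cmd.CommandText = sql.ToString();
        cmd.ExecuteNonQuery(); // one round trip and one commit instead of 1000 of each

        transaction.Commit();
    }
}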
You should have a look at the SqlBulkCopy Class
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
MongoDB is very fast on reads and writes. 50k reads and writes per second is doable on commodity hardware, depending on the data size. In addition, you always have the option to scale out with sharding and replica sets, but as said: 20k operations per second is nothing for MongoDB.
Generally, the speed of inserting data into the database is a function of the complexity of the operation.
If your inserts are significantly slow, it points to optimisation problems with the inserts. Identify exactly what SQL INSERT statements your program is generating and then use the database's EXPLAIN facility to figure out what operations the underlying database is using. This often gives you a clue as to how you need to change your setup to increase the speed of these operations.
It might mean you have to change your database, or it might mean batching your inserts into a single call rather than inserting each item separately.
I see you are setting up and closing the connection each time; this takes a significant amount of time in itself. Try using a persistent connection.