I have a huge amount of data in Excel files, with at least 20 columns per file.
I am working with .NET (C#). My task is to import only the rows that meet certain conditions into a SQL database. For example, I need to insert only the rows for the current year (or a selected year), and for the column named 'Full Employee Name' I need to check whether the value exists in the Resource Human table.
Another condition is to check whether the column name matches the column name in the SQL table.
I have managed to do this in code, but it takes at least 200 lines to cover all the possible checks. I read about SSIS (SQL Server Integration Services, a BI tool), and it looks like it could help with this task.
My question is how to do it; I am stuck with this new concept.
I think that choosing the best approach depends on your needs:
If you are looking to create automated jobs and import data from Excel to SQL periodically, I think it is better to go with SSIS.
If you are trying to create a small tool that converts an Excel file to a SQL table, then working with .NET is fine (a rough sketch follows this list).
If you are looking to loop over Excel files with different structures, then you should use .NET, or you have to convert the files to .csv and then use SSIS.
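For the .NET route, here is a rough sketch of what a conditional import could look like; the worksheet name, the column names ("Date", "Full Employee Name"), and the table names (ResourceHuman, Employees) are placeholders, not the actual schema:

using System;
using System.Data;
using System.Data.OleDb;
using System.Data.SqlClient;

class ExcelImporter
{
    // Rough sketch only: sheet, column, and table names are placeholders.
    static void Import(string excelPath, string sqlConnectionString, int selectedYear)
    {
        string excelConnStr =
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + excelPath +
            ";Extended Properties='Excel 12.0 Xml;HDR=YES'";

        var rows = new DataTable();
        using (var excel = new OleDbConnection(excelConnStr))
        using (var adapter = new OleDbDataAdapter("SELECT * FROM [Sheet1$]", excel))
        {
            adapter.Fill(rows); // load the worksheet into memory
        }

        using (var sql = new SqlConnection(sqlConnectionString))
        {
            sql.Open();
            foreach (DataRow row in rows.Rows)
            {
                // Condition 1: only rows from the selected year.
                if (Convert.ToDateTime(row["Date"]).Year != selectedYear)
                    continue;

                // Condition 2: the employee must exist in the reference table.
                using (var check = new SqlCommand(
                    "SELECT COUNT(*) FROM ResourceHuman WHERE FullName = @name", sql))
                {
                    check.Parameters.AddWithValue("@name", row["Full Employee Name"]);
                    if ((int)check.ExecuteScalar() == 0)
                        continue;
                }

                // Row passed all checks: insert it.
                using (var insert = new SqlCommand(
                    "INSERT INTO Employees (FullName, EntryDate) VALUES (@name, @date)", sql))
                {
                    insert.Parameters.AddWithValue("@name", row["Full Employee Name"]);
                    insert.Parameters.AddWithValue("@date", row["Date"]);
                    insert.ExecuteNonQuery();
                }
            }
        }
    }
}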
You can also refer to the following Microsoft documentation for more options for importing Excel files into SQL (SQL queries, linked servers, OPENROWSET, ...):
Import data from Excel to SQL Server or Azure SQL Database
If you've already got a working .NET solution (and 200 lines of code doesn't sound that bad to me), I wouldn't bother looking into SSIS to replace it.
So I'm using C# and Visual Studio 2013. I currently have an importer that uses System.Data.OleDb to connect to MS Access databases and System.Data.SqlClient for the SQL Server connection.
Originally, I would read data from MS Access into a DataTable and then store the data from the DataTable into SQL Server. It was working well until I finally hit a table with some 30-odd columns and almost 1 million rows, and I got an OutOfMemoryException.
So now I'm trying to think of a workaround. I'm thinking of adding a row-count check on an MS Access table before I attempt to load it into a DataTable; if it has a certain number of rows or more, I plan to write it to an external file and then import that file.
So what I'm asking is, does anyone know how I can go about this? The only solutions I've seen use Interop, and I've heard that as a practice you don't want to use Interop in code because it's slow and not terribly reliable. I was attempting to export from MS Access to a .csv or .txt file, but if a table doesn't have a primary key, I'm not sure how to iterate over it when it isn't already in a DataTable.
If you are importing a large amount of data, you could use an OleDbDataReader. Using an OleDbDataReader will not strain your memory, because you read through one record at a time while doing the inserts into the other database.
It may take slightly longer, but it will ensure completion without an OutOfMemoryException.
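A minimal sketch of that pattern, with placeholder connection strings and table names; SqlBulkCopy can consume the reader directly, so only one record is held in memory at a time:

using System.Data.OleDb;
using System.Data.SqlClient;

class AccessToSqlStreamer
{
    // Streams rows from Access into SQL Server without holding the whole
    // table in memory. Connection strings and table names are placeholders.
    static void Copy()
    {
        using (var access = new OleDbConnection(
            @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\source.accdb"))
        using (var cmd = new OleDbCommand("SELECT * FROM [BigTable]", access))
        {
            access.Open();
            using (OleDbDataReader reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy("Server=.;Database=Target;Integrated Security=true"))
            {
                bulk.DestinationTableName = "dbo.BigTable";
                bulk.BatchSize = 10000;      // commit in batches
                bulk.BulkCopyTimeout = 0;    // no timeout for very large tables
                bulk.WriteToServer(reader);  // pulls one row at a time from the reader
            }
        }
    }
}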
I have two databases. One of them belongs to a CRM software and is the source.
The other one will be the destination used by a tool I'm developing.
The destination will contain a table ADDRESSES with a subset of the columns of a table of the same name in the source database.
What is the best (most efficient) way to copy the data between those databases (by the way, they're on different SQL Server instances, if that's important)?
I could write a loop that does an INSERT into the destination for each row obtained from the source, but I don't think that is efficient.
My thoughts and information:
The data won't be altered on its way from source to destination
It will be altered on its way back
I don't have the complete structure of the source, but I know which fields I need and that they're guaranteed to be in the source (hence, accessing the rows obtained from the source by column index isn't possible).
I can't use LINQ.
Anything leading me in the right direction here is appreciated.
Edit:
I really need a C# way to copy the data. I also need to know how to merge the copied rows back to the source. Is it really necessary (or even best practice) to do this row by row?
Why write code to do this?
The single fastest and easiest way is just to use SQL Server's bcp.exe utility (bcp: Bulk Copy Program).
Export the data from the source server.
Zip it or tar it if it needs it.
FTP it over to where it needs to go, if you need to move it to another box.
Import it into the destination server.
You can accomplish the same thing via SQL Server Management Studio in a number of different ways. Once you've defined the task, it can be saved and it can be scheduled.
You can use SQL Server's PowerShell objects to do this as well.
If you're set on doing it in C#:
write your select query to get the data you want from the source server.
execute that and populate a temp file with the output.
execute SQL Server's bulk insert statement against the destination server to insert the data.
Note: For any of these techniques, you'll need to deal with identity columns if the target table has them. You'll also need to deal with key collisions. It is sometimes easier to bulk load the data into a perma-temp table first, and then apply the prerequisite transforms and manipulations to get it to where it needs to go.
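If you're set on the C# route, a variation on the steps above that skips the intermediate file is to stream the SELECT straight into SqlBulkCopy; a minimal sketch, with placeholder connection strings, table, and column names (the identity/key caveats in the note still apply):

using System.Data.SqlClient;

class CrossInstanceCopy
{
    // Streams the SELECT from the source instance straight into the
    // destination instance. Connection strings, table, and column names
    // are placeholders.
    static void Copy()
    {
        using (var source = new SqlConnection("Server=SourceSrv;Database=Crm;Integrated Security=true"))
        using (var cmd = new SqlCommand("SELECT AddressId, Street, City, Zip FROM dbo.ADDRESSES", source))
        {
            source.Open();
            using (SqlDataReader reader = cmd.ExecuteReader())
            using (var bulk = new SqlBulkCopy("Server=DestSrv;Database=Tool;Integrated Security=true"))
            {
                bulk.DestinationTableName = "dbo.ADDRESSES";
                // Map by name, since the destination only has a subset of the
                // source columns and ordinal positions may differ.
                bulk.ColumnMappings.Add("AddressId", "AddressId");
                bulk.ColumnMappings.Add("Street", "Street");
                bulk.ColumnMappings.Add("City", "City");
                bulk.ColumnMappings.Add("Zip", "Zip");
                bulk.WriteToServer(reader);
            }
        }
    }
}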
According to your comment on Jwrit's answer, you want a two-way sync.
If so, you might want to look into Microsoft Sync Framework.
We use it to sync 200+ tables from on-premises SQL Server to SQL Azure and from SQL Azure to SQL Azure.
It can be used purely from C#. However, it might offer a lot more than you want, and it might be overkill for a small project.
I'm just mentioning it so that you have another option for your project.
If these databases exist on two servers, you can set up a link between the servers by executing sp_addlinkedserver; there are instructions for setting this up here. This may come in handy if you plan on regularly sharing data.
http://msdn.microsoft.com/en-us/library/ff772782.aspx
Once the servers are linked, a simple INSERT ... SELECT using the four-part name can copy the rows from one table to the other (the names below are placeholders):
INSERT INTO db1.dbo.tblA (Field1, Field2, Field3)
SELECT Field1, Field2, Field3 FROM [LinkedServerName].db2.dbo.tblB
If the databases are on the same instance, you only need to execute SQL similar to the above, without the linked-server prefix.
If this is a one-time job, the best bet is normally SSIS (SQL Server Integration Services), unless there are complex data transformations: you can quickly and easily do the column mappings and have it done (reliably) in 15 minutes flat.
I have to import a log file into a SQL Server database. The file is big: around 6 MB with 40,000 lines. The import has to be done every day with the daily log. Each line of the log file contains many fields, which I have to import into the proper columns in the database for post-processing.
I'm unsure which of these solutions to choose:
- Use SQL Server Integration Services to do this.
- Write a C# app that uses bulk copy.
I'm relatively free to choose a solution (within SQL Server and the .NET Framework). Which solution is better for this, or can you suggest another one?
Thank you very much.
Edit: I tried SSIS and saw that it's really simple. But every day, after receiving the log file, my program has to automatically import it into the database. How can I do that?
I would write an SSIS package to do this.
You could use the import/export wizard to generate the beginnings of a package and adapt it to meet your exact needs.
To do this in SQL 2005, Right click on your database in object explorer in SQL Management Studio, go Tasks > Import Data, follow the wizard and at the end select to save the package.
I imagine it's a similar process in SQL 2008, but I don't have it to hand.
After you have saved your package, it's possible to schedule it using SQL Server Agent. When setting up the job, choose "SQL Server Integration Services Package" as the step type and select your package.
I would probably write a script that converts the log file into a SQL dump that inserts the log file's fields, and then load that SQL dump into the database (a rough sketch is below).
6MB is actually pretty small :)
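A minimal sketch of that idea, assuming a tab-separated log with at least three fields per line; the output file, table, and column names are placeholders:

using System.IO;

class LogToSqlDump
{
    // Converts a tab-separated log file into a script of INSERT statements.
    // Field layout, table, and column names are assumptions for illustration.
    static void Convert(string logPath, string sqlPath)
    {
        using (var output = new StreamWriter(sqlPath))
        {
            foreach (string line in File.ReadAllLines(logPath))
            {
                string[] fields = line.Split('\t');
                if (fields.Length < 3) continue; // skip malformed lines

                // Escape single quotes so the generated script stays valid.
                string loggedAt = fields[0].Replace("'", "''");
                string level    = fields[1].Replace("'", "''");
                string message  = fields[2].Replace("'", "''");

                output.WriteLine(
                    "INSERT INTO dbo.DailyLog (LoggedAt, Level, Message) VALUES ('" +
                    loggedAt + "', '" + level + "', '" + message + "');");
            }
        }
    }
}

The generated script can then be run with sqlcmd as part of the daily job.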
SQL Server Integration Services is more than up to the task. Bulk copy can become complicated really quickly, especially for people who are new to this. As a third option, you could write your own program to do INSERTs, but then again, that's what SSIS was built for, so just stick with that.
I am trying to replace a DTS Access exporter package with an exe we can call from our stored procedures (using xp_cmdshell).
We are in the middle of a transition between SQL Server 2000 and SQL Server 2005, and for the moment, if we can avoid using DTS or SSIS, that would be the best option.
I believe I have the following options:
Using a SQL data reader to read the SQL records, and using ADO.NET to insert the records into Access.
I have implemented this and it is WAY too slow. This is not an option.
Setting up linked tables in Access, then getting Access to pull the data out of SQL Server.
If anyone has any experience doing this, I would be grateful for some code examples or pointers to resources.
If there are any other options for transferring large amounts of data from SQL Server into an Access database, that would be awesome, but performance is a big issue, as we can be dealing with up to 1 million records per table.
Have you tried this: why not create a linked table in Access and pull the data from SQL Server, instead of pushing it from SQL Server to Access?
I've done plenty of cases where I start with an Access database, attach to SQL Server, create a Create Table or Insert Querydef, and write some code to execute the querydef, possibly with arguments. But there are a lot of assumptions I would need to make about your problem and your familiarity with Access to go into more detail. How far can you get with that description?
I have ended up using Access Interop; thanks to le dorfier for pointing me in the direction of the import function, which seems to be the simplest way.
I now have something along these lines:
Access.ApplicationClass app = new Access.ApplicationClass();
Access.DoCmd doCmd = null;
app.NewCurrentDatabase(_args.Single("o"));
doCmd = app.DoCmd;
//Create a view on the server temporarily with the query I want to export
doCmd.TransferDatabase(Access.AcDataTransferType.acImport,
    "ODBC Database",
    string.Format("ODBC;DRIVER=SQL Server;Trusted_Connection=Yes;SERVER={0};Database={1}", _args.Single("s"), _args.Single("d")),
    Microsoft.Office.Interop.Access.AcObjectType.acTable,
    viewName,          // name of the temporary view created on the server
    exportDetails[0],  // name of the destination table in the new mdb
    false, false);
//Drop the view on the server
//Release the COM objects and exit properly
app.Quit(Access.AcQuitOption.acQuitSaveNone);
System.Runtime.InteropServices.Marshal.ReleaseComObject(doCmd);
System.Runtime.InteropServices.Marshal.ReleaseComObject(app);
Have you looked at bcp? It's a command-line utility that's supposed to work well for importing and exporting large amounts of data. I've never tried to make it play nice with Access, but it's a great lightweight alternative to DTS and/or SSIS.
Like others have said, the easiest way I know to get data into an Access mdb is to set things up in Access to begin with. Roughly speaking:
Create linked tables to the SQL Server data you want to export (in Access: File --> Get External Data --> Link Tables). This just gives you a connection to SQL Server.
Create a local table that represents the schema of the data you want to export (on the Tables tab, click the "New" button and follow your nose).
Create an append query that selects data from the linked tables (SQL Server) and appends rows to the local table (Access mdb).
On the macros tab, create a new macro that executes the query you just created above (I can't recall the exact "action" to use, but it's something like OpenQuery or RunQuery); name the macro "autoexec", which will cause it to automatically run when the mdb is opened.
Use a script (or whatever) to copy and open the mdb when appropriate; the autoexec macro will kick things off and the query will copy data from SQL server to the mdb.
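For the last step, the "script" can be as small as a file copy plus launching the mdb; a sketch with placeholder paths:

using System.Diagnostics;
using System.IO;

class MdbExport
{
    // Copies a pre-built template mdb and opens it; opening it triggers the
    // autoexec macro, which runs the append query. Paths are placeholders.
    static void Run()
    {
        File.Copy(@"C:\templates\export_template.mdb", @"C:\exports\export_today.mdb", true);
        Process.Start(@"C:\exports\export_today.mdb");
    }
}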
I need to import the data from a .csv file into a database table (MS SQL Server 2005). SQL BULK INSERT seems like a good option, but the problem is that my DB server is not on the same box as my web server. This question describes the same issue; however, I don't have any control over my DB server and can't share any folders on it.
I need a way to import my .csv programmatically (C#). Any ideas?
EDIT: This is part of a website where a user can populate the table with the .csv contents, and this would happen on a weekly basis, if not more often.
You have several options:
SSIS
DTS
custom application
Any of these approaches ought to get the job done. If it is just scratch work it might be best to write a throwaway app in your favorite language just to get the data in. If it needs to be a longer-living solution you may want to look into SSIS or DTS as they are made for this type of situation.
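For the custom-application option, since the file can't be placed where BULK INSERT could read it, one approach is to parse the upload in the web tier and push it with SqlBulkCopy; a minimal sketch, assuming a simple three-column layout and a placeholder table name:

using System.Data;
using System.Data.SqlClient;
using System.IO;

class CsvImporter
{
    // Parses the uploaded .csv in the web tier and pushes it to the remote
    // SQL Server with SqlBulkCopy, so nothing has to live on the DB box.
    // The three-column layout and table name are assumptions.
    static void Import(string csvPath, string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("Col1");
        table.Columns.Add("Col2");
        table.Columns.Add("Col3");

        foreach (string line in File.ReadAllLines(csvPath))
        {
            // Naive split; use a real CSV parser if fields can contain commas or quotes.
            table.Rows.Add(line.Split(','));
        }

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.UserUpload"; // columns map by ordinal here
            bulk.WriteToServer(table);
        }
    }
}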
Try Rhino-ETL; it's an open-source ETL engine written in C# that can even use Boo for simple ETL scripts, so you don't need to compile it all the time.
The code can be found here:
https://github.com/hibernating-rhinos/rhino-etl
The guy who wrote it:
http://www.ayende.com/blog
The group lists have some discussions about it; I actually added bulk insert for Boo scripts a while ago.
http://groups.google.com/group/rhino-tools-dev
http://groups.google.com/group/rhino-tools-dev/browse_thread/thread/2ecc765c1872df19/d640cd259ed493f1
If you download the code there are several samples, also check the google groups list if you need more help.
I ended up using a CSV reader. I saw a reference to it in one of Jon Skeet's answers, but I can't find it again to post the link to it.
How big are your datasets? Unless they are very large, you can get away with parameterized INSERT statements. You may want to load into a staging table first for peace of mind or performance reasons.
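A minimal sketch of that approach, with placeholder table and column names; the records collection stands in for whatever the CSV parser produces:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

class StagingLoader
{
    // Loads parsed CSV rows into a staging table with a parameterized INSERT.
    // Table name, column names, sizes, and the two-field layout are placeholders.
    static void Load(string connectionString, IEnumerable<string[]> records)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO dbo.Staging_Upload (Col1, Col2) VALUES (@c1, @c2)", conn))
        {
            cmd.Parameters.Add("@c1", SqlDbType.NVarChar, 100);
            cmd.Parameters.Add("@c2", SqlDbType.NVarChar, 100);
            conn.Open();

            foreach (string[] record in records)
            {
                cmd.Parameters["@c1"].Value = record[0];
                cmd.Parameters["@c2"].Value = record[1];
                cmd.ExecuteNonQuery();
            }
        }
        // A single INSERT ... SELECT can then move the validated rows from the
        // staging table into the final table.
    }
}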