I have a problem.
I need to know if there is a way to delete a series of rows with C# Excel Interop without reading every single cell looking for a match. I have a 10k-row Excel sheet and I only want to delete certain rows, but using a for loop to find each row and delete it takes an absurd amount of time. In fact, it's faster to open the file by hand, apply a filter and delete the rows, but I want to do it all in code.
Does anyone know a way to do this?
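For what it's worth, the filter-then-delete approach described above can itself be automated with Interop. Here is a rough sketch, assuming the workbook path, the sheet index and the filter criteria ("DELETE_ME" in column 1) are placeholders you would replace with your own, and that the data starts in row 1 with a header row:

using Excel = Microsoft.Office.Interop.Excel;

class FilterAndDelete
{
    static void Main()
    {
        var app = new Excel.Application { DisplayAlerts = false };
        Excel.Workbook wb = app.Workbooks.Open(@"C:\data\orders.xlsx");   // placeholder path
        var ws = (Excel.Worksheet)wb.Worksheets[1];

        Excel.Range data = ws.UsedRange;
        int lastRow = data.Rows.Count;
        int lastCol = data.Columns.Count;

        // Filter column 1 for the marker value of the rows to remove (placeholder criteria).
        data.AutoFilter(Field: 1, Criteria1: "DELETE_ME");

        // Grab only the visible (matching) rows below the header and delete them in one call,
        // instead of testing every cell in a loop. SpecialCells throws if nothing matched.
        Excel.Range visible = ws.Range[ws.Cells[2, 1], ws.Cells[lastRow, lastCol]]
                                .SpecialCells(Excel.XlCellType.xlCellTypeVisible);
        visible.EntireRow.Delete();

        ws.AutoFilterMode = false;   // clear the filter again
        wb.Save();
        wb.Close();
        app.Quit();
    }
}

The point is that SpecialCells plus EntireRow.Delete removes all the matching rows in one operation, so you never iterate the 10k rows cell by cell.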
Related
I need to import sheets which look like the following:
March Orders
***Empty Row
Week Order # Date Cust #
3.1 271356 3/3/10 010572
3.1 280353 3/5/10 022114
3.1 290822 3/5/10 010275
3.1 291436 3/2/10 010155
3.1 291627 3/5/10 011840
The column headers are actually in row 3. I can use an Excel Source to import them, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
Have a look. The links have more details, but I've included some text from the pages (just in case the links go dead):
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While we are loading the text file to SQL Server via SSIS, we have the
provision to skip any number of leading rows from the source and load
the data to SQL Server. Is there any provision to do the same for an Excel file?
The source Excel file for me has some description in the leading 5 rows; I want to skip it and start the data load from row 6. Please provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in
SQL Server) and then use a conditional split to filter out everything
where the number <=5
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible, during an import from Excel to a DB table, to skip the first 6 rows, for example?
Also, the Excel data is divided into sections with headers. Is it possible, for example, to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number of columns that will be imported from your Excel file. In
your Data Flow task, you will need to set the "OpenRowset" Custom
Property of your Excel Connection (right-click your Excel connection >
Properties; in the Properties window, look for OpenRowset under Custom
Properties). To ignore the first 5 rows in Sheet1, and import columns
A-M, you would enter the following value for OpenRowset: Sheet1$A6:M
(notice, I did not specify a row number for column M. You can enter a
row number if you like, but in my case the number of rows can vary
from one iteration to the next)
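The same Sheet1$A6:M range notation also works outside SSIS if you read the sheet through OLEDB. A minimal sketch, assuming the ACE provider is installed and the file path is a placeholder:

using System;
using System.Data;
using System.Data.OleDb;

class ReadFromRowSix
{
    static void Main()
    {
        // Provider string and path are assumptions. HDR=NO makes the driver treat the first
        // row of the range as data; switch to HDR=YES if row 6 holds your column headers.
        string conn = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\orders.xlsx;" +
                      "Extended Properties=\"Excel 12.0 Xml;HDR=NO\"";
        using (var cn = new OleDbConnection(conn))
        using (var da = new OleDbDataAdapter("SELECT * FROM [Sheet1$A6:M]", cn))
        {
            var table = new DataTable();
            da.Fill(table);   // Fill opens and closes the connection itself
            Console.WriteLine("Rows read: " + table.Rows.Count);
        }
    }
}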
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in
each row that uniquely identifies it as a header row; skip the rows
that match this 'header logic'. Another option would be to import all
the rows and then remove the header rows using a SQL script in the
database...like a cursor that deletes every 12th row. Or you could
add an identity field with seed/increment of 1/1 and then delete all
rows with row numbers that divide perfectly by 12. Something like
that...
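As a rough in-memory illustration of that identity/modulo idea (outside of SSIS or SQL, just plain C# with made-up row data): keep every row whose 1-based number is not a multiple of 12.

using System;
using System.Linq;

class SkipEveryTwelfth
{
    static void Main()
    {
        // Stand-in for the imported rows; in a real package these would come from the Excel source.
        string[] rows = Enumerable.Range(1, 36).Select(n => "row " + n).ToArray();

        // Keep every row whose 1-based position is NOT a multiple of 12,
        // i.e. drop the repeating section headers described above.
        var kept = rows.Where((row, index) => (index + 1) % 12 != 0).ToList();

        Console.WriteLine(kept.Count + " of " + rows.Length + " rows kept");   // 33 of 36
    }
}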
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data
beginning in the 7th row.
Unlike the same operation with a csv file ('Header Rows to Skip' in
Connection Manager Editor), I can't seem to find a way to ignore the
first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow
Transformation objects, but I'm not very familiar with them.
A:
rbhro, actually there were
2 fields in the upper 5 rows that had some data that I think prevented
the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access
Mode' (it's drop down when you double-click the Excel Source object).
From there I was able to build a query ('Build Query' button) that
only grabbed the records I needed, something like this:
SELECT F4, F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')
Note: I initially tried an ISNUMERIC instead of 'IS NOT NULL', but
that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't
NULL (and fortunately F4 didn't contain any junk in the first 5
rows). I could skip the whole header row (row 6) with the 2nd WHERE
clause.
So that cleaned up my data source perfectly. All I needed to do now
was add a Data Conversion object in between the source and destination
(everything needed to be converted from unicode in the spreadsheet),
and it worked.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them, and it is up to them to meet the guidelines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts) and they need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data, and not repeat the headers multiple times. Most important of all, they must have the same columns with the same column titles in the same order every time. If they can't provide that, then you probably don't have something that will work for automated import, as you will get the file in a different format every time depending on the mood of the person who maintains the Excel spreadsheet.
Incidentally, we push really hard to never receive any data from Excel (this only works some of the time, but if they have the data in a database, they can usually accommodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package and that they will be charged for those development changes (assuming that these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled; a file with the wrong format will fail and be returned to them to fix if not.
If that doesn't work, may I suggest that you open the file, delete the first two rows, and save it as a text file; then write a data flow that will process the text file. SSIS does a lousy job of supporting Excel, and anything you can do to get the file into a different format will make life easier in the long run.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
Not entirely correct. SSIS forces you to use the format, and quite often it does not work correctly with Excel.
If you can't change the format, consider using our Advanced ETL Processor.
You can skip rows or fields and you can validate the data the way you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
The sky is the limit.
You can just use the OpenRowset property, which you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.
I need to write an application in C# (VS 2008) that will search a relatively large (80K-row) Excel file for a specific row. I would normally use ADO.NET, but Windows Mobile doesn't support this. I've tried to export the Excel file to XML and parse it with LINQ, but it is still slow. Does anyone have any suggestions?
I would suggest exporting the Excel table to CSV (Comma Separated Values) format.
If possible, you could load the complete file and perform a search using divide and conquer (a binary search).
Basically, the idea behind it is to alphabetically sort the column that contains the value you are looking for, e.g. we are looking for the name "peter" within the name column. Then you take the value in the very middle of the column (e.g. "malcom") and check if the value you are looking for comes before or after that middle value. In this case it should be AFTER "malcom" due to the sort, so you split the table in half and continue the search in the corresponding half of the table. You can repeat that recursively until you have a handful of records left (let's say 10) and perform a regular search to find the value.
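A minimal sketch of that binary search over rows already sorted by the name column; the row arrays and sample values here are made up:

using System;
using System.Collections.Generic;

class RowSearch
{
    // Binary search over rows that are already sorted by the name column.
    static string[] FindRow(List<string[]> sortedRows, int nameColumn, string target)
    {
        int low = 0, high = sortedRows.Count - 1;
        while (low <= high)
        {
            int mid = (low + high) / 2;
            int cmp = string.Compare(sortedRows[mid][nameColumn], target,
                                     StringComparison.OrdinalIgnoreCase);
            if (cmp == 0) return sortedRows[mid];   // found it
            if (cmp < 0) low = mid + 1;             // target sorts after the middle value
            else high = mid - 1;                    // target sorts before the middle value
        }
        return null;                                // not present
    }

    static void Main()
    {
        // Made-up rows, already sorted by the first column.
        var rows = new List<string[]>
        {
            new[] { "alice",  "271356" },
            new[] { "malcom", "280353" },
            new[] { "peter",  "290822" },
        };
        string[] hit = FindRow(rows, 0, "peter");
        Console.WriteLine(hit == null ? "not found" : string.Join(", ", hit));
    }
}

Each probe halves the remaining rows, so roughly 17 comparisons are enough to cover 80K rows.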
I once did something like this for my final paper. My implementation was built in C++ and used hash tables. It was even faster than Excel.
I have an xlsx file in which I have to delete the first column and add a new column at the end.
I am using the EPPlus library, but I can't find any documentation.
If you need this type of solution (where hiding the column is just not good enough), the other option is to create a new worksheet and copy all the columns except the ones you want to remove, then delete the original sheet. This is how I ended up implementing it. There may be unintended consequences in terms of formulas and such, but in my case I just had a table full of values and it worked great.
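Here is a rough EPPlus sketch of that copy-to-a-new-sheet approach, dropping the first column and adding a header for a new last column. The file path, sheet names and the "NewColumn" header are placeholders; newer EPPlus versions also have a DeleteColumn method on the worksheet.

using System.IO;
using OfficeOpenXml;   // EPPlus

class DropFirstColumn
{
    static void Main()
    {
        using (var package = new ExcelPackage(new FileInfo(@"C:\data\book.xlsx")))
        {
            ExcelWorksheet src = package.Workbook.Worksheets["Sheet1"];
            ExcelWorksheet dst = package.Workbook.Worksheets.Add("Sheet1_copy");

            int lastRow = src.Dimension.End.Row;
            int lastCol = src.Dimension.End.Column;

            // Copy every column except the first one, shifted one position to the left.
            for (int col = 2; col <= lastCol; col++)
                src.Cells[1, col, lastRow, col].Copy(dst.Cells[1, col - 1]);

            // The new column added at the end (header only here).
            dst.Cells[1, lastCol].Value = "NewColumn";

            package.Workbook.Worksheets.Delete(src);   // drop the original sheet
            dst.Name = "Sheet1";                       // give the copy the original name
            package.Save();
        }
    }
}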
Here is the link to download EPPlus Library Documentation:
http://epplus.codeplex.com/downloads/get/336102
and here is the link to sample project:
http://epplus.codeplex.com/downloads/get/329346
By the way, it seems that you can't remove a column. You may need to move all the data in the cells one column to the left. Another option may be hiding or collapsing the first column.
I want to read rows in a single column and put each value into a string variable, taking one row at a time. When I read a row, I want to delete that row from the sheet and save the Excel file. The second time, it skips the empty row and takes the next row, deletes it, and saves the file. This goes on. How can I achieve that using C#?
If you just wanted to read from Excel, I would suggest Linq to Excel.
However, since you also want to update the Excel document, the best places to get started for free are either Excel Interop or an OLEDB connection to Excel.
If you have a little budget, SpreadsheetGear is well worth investigating.
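A rough Interop sketch of that read-delete-save cycle; the path is a placeholder and the values are assumed to live in column A. Note that saving the workbook after every single row is slow, so batch the work if you can.

using System;
using Excel = Microsoft.Office.Interop.Excel;

class ReadDeleteSave
{
    static void Main()
    {
        var app = new Excel.Application { DisplayAlerts = false };
        Excel.Workbook wb = app.Workbooks.Open(@"C:\data\queue.xlsx");   // placeholder path
        var ws = (Excel.Worksheet)wb.Worksheets[1];

        int lastRow = ws.UsedRange.Rows.Count;
        for (int row = 1; row <= lastRow; row++)
        {
            var cell = (Excel.Range)ws.Cells[row, 1];
            string value = Convert.ToString(cell.Value2);
            if (string.IsNullOrEmpty(value)) continue;   // skip rows that are already empty

            Console.WriteLine("Read: " + value);
            cell.EntireRow.Delete();   // remove the row that was just read
            wb.Save();
            break;                     // one row per run, as described in the question
        }

        wb.Close();
        app.Quit();
    }
}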
I am currently documenting changes to a specific Excel file in SharePoint. Basically, we are taking the versions of the file, getting the sheet to load, grabbing the data in that sheet, and loading that data into another sheet in another workbook. I have everything in place, but I am looking for the best method for loading multiple sheets of data and putting them into one; the only problem is that most of the columns do not match. So in my head I have figured I should just copy the column headers from the latest version, then with the older versions, check whether the columns match up and, if they don't, create another column at the end.
I am sure that this will take too long and I only really need this for a one-time deal, because after that we will just add the new version to the sheet and it should take two seconds.
I am just looking for the best method, or proven methods. Thanks for any help.
As I read the question, you are asking how to concatenate tables while adding the necessary columns.
If you are doing this in C#, I suggest that you create a Dictionary that maps the column name to the column number in the accumulator sheet. Now you can easily map the columns from each of the source sheets to the accumulator sheet, and add a new column when the Dictionary has no key for it.
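A small sketch of that header-to-column Dictionary, using made-up in-memory headers rather than a real worksheet:

using System;
using System.Collections.Generic;

class ColumnAccumulator
{
    static void Main()
    {
        // Header -> column index in the accumulator sheet (1-based, like Excel columns).
        var map = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        string[] latestHeaders = { "Week", "Order #", "Date", "Cust #" };   // made-up headers
        for (int i = 0; i < latestHeaders.Length; i++)
            map[latestHeaders[i]] = i + 1;

        // Headers from an older version of the file; "Region" does not exist yet.
        string[] olderHeaders = { "Order #", "Region", "Date" };
        foreach (string header in olderHeaders)
        {
            int col;
            if (!map.TryGetValue(header, out col))
            {
                col = map.Count + 1;   // unknown header: append a new column at the end
                map[header] = col;
            }
            Console.WriteLine(header + " -> column " + col);
        }
    }
}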