I have a webpage where the user can upload an Excel file. I'm testing with two different files: one works without a problem, and the other gives me this error:
Error: Length cannot be less than zero. Parameter name: length
I know that sometimes this occurs when the file size is zero, but that is not the case here.
Can anyone shed light on this issue? Please let me know if you need more info.
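One classic source of this exact message, worth ruling out before anything else, is a String.Substring call with a negative length, which typically happens when IndexOf returns -1 because the file's name or contents lack a character the code expects. A minimal repro sketch; the file name here is purely hypothetical:

using System;

class SubstringRepro
{
    static void Main()
    {
        // Hypothetical input: a file name missing the character the code expects.
        string fileName = "workbook_no_extension";

        int dot = fileName.IndexOf('.');   // returns -1 when there is no '.'
        // fileName.Substring(0, dot);     // throws: "Length cannot be less than
        //                                 //  zero. Parameter name: length"
        string baseName = dot >= 0 ? fileName.Substring(0, dot) : fileName;
        Console.WriteLine(baseName);
    }
}

If one of your two files takes a parsing path like this, that would explain why only it fails even though neither file is empty.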
As noted, more info is needed. It's not clear whether you are opening the Excel file and processing it directly, reading the data into a DataTable via ODBC, or something else.
Most of my problems reading Excel files are caused either by column titles or by a particular column containing data of mixed types. Check first to see if your two Excel files have the same columns, all columns have names, etc.
When you read to a DataTable, the program takes a guess at the data type of each column. If the first several cells are empty, the guess may be wrong. If your data is like mine, a column that looks like it's all numbers may be half actual numbers, half strings. Or, a column of dates may have an illegal value.
I have better luck writing the data from Excel to a .csv file, and having the program write a schema.ini and read it with the Microsoft Text Driver, but that may not suit your data.
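To illustrate that route, here is a rough sketch using the ODBC Microsoft Text Driver, with schema.ini pinning the column types so the driver stops guessing. The folder, file name, and column layout are all assumptions; adjust them to your data:

using System;
using System.Data;
using System.Data.Odbc;
using System.IO;

class CsvWithSchemaIni
{
    static void Main()
    {
        string folder = @"C:\data";   // assumption: the exported CSV lives here

        // schema.ini declares the name and type of every column so the
        // driver no longer infers types from the first few rows.
        File.WriteAllText(Path.Combine(folder, "schema.ini"),
            "[orders.csv]\r\n" +
            "Format=CSVDelimited\r\n" +
            "ColNameHeader=True\r\n" +
            "Col1=OrderId Long\r\n" +
            "Col2=OrderDate DateTime\r\n" +
            "Col3=CustomerCode Text\r\n");

        string connect = "Driver={Microsoft Text Driver (*.txt; *.csv)};" +
                         "Dbq=" + folder + ";Extensions=csv;";

        var table = new DataTable();
        using (var conn = new OdbcConnection(connect))
        using (var adapter = new OdbcDataAdapter("SELECT * FROM orders.csv", conn))
        {
            adapter.Fill(table);      // column types now come from schema.ini
        }
        Console.WriteLine(table.Rows.Count + " rows loaded.");
    }
}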
Is there an UpdatePanel on your page?
I had a problem when I was trying to use a FileUpload control inside an UpdatePanel on the page.
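If that's the issue: a FileUpload control doesn't post its file content during an asynchronous (partial) postback. One common fix, assuming the upload is kicked off by a button inside the UpdatePanel, is to register that button for a full postback in the code-behind (btnUpload is an assumed control ID):

// Code-behind sketch (System.Web.UI).
protected void Page_Load(object sender, EventArgs e)
{
    // FileUpload cannot post its file during an async (partial) postback,
    // so register the button that triggers the upload for a full postback.
    ScriptManager sm = ScriptManager.GetCurrent(Page);
    if (sm != null)
    {
        sm.RegisterPostBackControl(btnUpload);
    }
}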
Very weird case. I suggest you make a copy of the file that works and test with that copy to see if it works too.
Or verify that both files were saved with the same version of Excel.
I have to replace a broken SSIS package that basically just combines the data from a number of different Excel files, and creates a single master file. My C# desktop application is putting all the data into the Excel file correctly, but the one problem is that the original file, which was of type .xls, has some kind of hidden formatting on the cells.
When you just view the data in the cells, it looks normal, but when you click on a cell and view it in the edit box, there is a tick mark (it looks like this: ') in front of the data. Editing it out does nothing; it just reappears.
I am guessing it is a "cheater" way to force the data to be of type character, since Excel is such a stinker about turning text which is made up of only numbers into a cell of type number, even if you write it as a string.
I wish we could do without it, but when we try to use a file without that formatting, it blows up the application. And I do not have the time or budget to rewrite the application.
How can I duplicate this "tick" mark in the files which I am writing? Currently I'm using EPPlus for writing, but I can also use Microsoft.Office.Interop.Excel.
If this is too difficult, I would also be open to someone telling me how to manually edit the file after I have created it programmatically, and just add the cell formatting to the data.
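For reference, the tick mark is Excel's text-prefix character, and as far as I know Interop reproduces it when you assign a string that starts with an apostrophe, since Interop goes through the same input path as typing. A minimal sketch; the file path and cell address are assumptions:

using System;
using Excel = Microsoft.Office.Interop.Excel;

class TickMarkWriter
{
    static void Main()
    {
        var app = new Excel.Application();
        // Path and cell address are for illustration only.
        Excel.Workbook book = app.Workbooks.Open(@"C:\out\master.xls");
        Excel.Worksheet sheet = (Excel.Worksheet)book.Worksheets[1];

        // A leading apostrophe in the assigned string becomes Excel's
        // text-prefix mark: the cell displays 010572, the formula bar
        // shows '010572, and the value stays text even though it is digits.
        sheet.Cells[2, 4] = "'010572";

        book.Save();
        book.Close();
        app.Quit();
    }
}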
I need to import sheets which look like the following:
March Orders
*** Empty Row
Week   Order #   Date     Cust #
3.1    271356    3/3/10   010572
3.1    280353    3/5/10   022114
3.1    290822    3/5/10   010275
3.1    291436    3/2/10   010155
3.1    291627    3/5/10   011840
The column headers are actually in row 3. I can use an Excel Source to import the data, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
Have a look at the following threads. The links have more details, but I've included some text from the pages (just in case the links go dead):
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While loading a text file into SQL Server via SSIS, we have the provision to skip any number of leading rows from the source and load the data to SQL Server. Is there any provision to do the same for an Excel file?
The source Excel file for me has some description in the leading 5 rows; I want to skip it and start the data load from row 6. Please provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in SQL Server) and then use a conditional split to filter out everything where the number <= 5.
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible, during an import from Excel to a DB table, to skip the first 6 rows, for example?
Also, the Excel data is divided into sections with headers. Is it possible, for example, to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number of columns that will be imported from your Excel file. In your Data Flow task, you will need to set the "OpenRowset" custom property of your Excel connection (right-click your Excel connection > Properties; in the Properties window, look for OpenRowset under Custom Properties). To ignore the first 5 rows in Sheet1 and import columns A-M, you would enter the following value for OpenRowset: Sheet1$A6:M (notice, I did not specify a row number for column M; you can enter a row number if you like, but in my case the number of rows can vary from one iteration to the next).
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in each row that uniquely identifies it as a header row, and skip the rows that match this 'header logic'. Another option would be to import all the rows and then remove the header rows using a SQL script in the database, like a cursor that deletes every 12th row. Or you could add an identity field with a seed/increment of 1/1 and then delete all rows with row numbers that divide perfectly by 12. Something like that...
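If it helps, the OpenRowset range trick from that answer can be smoke-tested outside SSIS with a plain OleDb query, since the Jet/ACE driver accepts the same sheet-plus-range syntax as a table name. The provider, path, and range below are assumptions to adapt:

using System;
using System.Data;
using System.Data.OleDb;

class SkipLeadingRows
{
    static void Main()
    {
        // Assumptions: ACE provider installed, .xls file, headers on row 6.
        string connect = "Provider=Microsoft.ACE.OLEDB.12.0;" +
                         @"Data Source=C:\data\orders.xls;" +
                         "Extended Properties=\"Excel 8.0;HDR=Yes\"";

        var table = new DataTable();
        using (var conn = new OleDbConnection(connect))
        using (var adapter = new OleDbDataAdapter(
                   "SELECT * FROM [Sheet1$A6:M]", conn))   // same range as OpenRowset
        {
            adapter.Fill(table);
        }
        Console.WriteLine(table.Rows.Count + " rows, " + table.Columns.Count + " columns");
    }
}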
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data beginning in the 7th row.
Unlike the same operation with a CSV file ('Header Rows to Skip' in the Connection Manager Editor), I can't seem to find a way to ignore the first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow Transformation objects, but I'm not very familiar with them.
A:
rbhro, actually there were 2 fields in the upper 5 rows that had some data that I think prevented the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access Mode' (it's a drop-down when you double-click the Excel Source object). From there I was able to build a query ('Build Query' button) that only grabbed the records I needed. Something like this:
SELECT F4, F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')
Note: I initially tried ISNUMERIC instead of 'IS NOT NULL', but that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't NULL (and fortunately F4 didn't contain any junk in the first 5 rows). I could skip the whole header row (row 6) with the 2nd WHERE clause.
So that cleaned up my data source perfectly. All I needed to do now was add a Data Conversion object between the source and destination (everything needed to be converted from Unicode in the spreadsheet), and it worked.
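The same 'SQL Command' style filtering can also be tried outside SSIS. With HDR=NO the driver names the columns F1, F2, ... exactly as in the quoted query; the provider and file path below are assumptions:

using System;
using System.Data;
using System.Data.OleDb;

class FilterHeaderRows
{
    static void Main()
    {
        // HDR=NO makes the driver name the columns F1, F2, ...
        string connect = "Provider=Microsoft.ACE.OLEDB.12.0;" +
                         @"Data Source=C:\data\orders.xls;" +
                         "Extended Properties=\"Excel 8.0;HDR=NO\"";

        string sql = "SELECT F4, F5, F6 FROM [Spreadsheet$] " +
                     "WHERE (F4 IS NOT NULL) AND (F4 <> 'TheHeaderFieldName')";

        var table = new DataTable();
        using (var conn = new OleDbConnection(connect))
        using (var adapter = new OleDbDataAdapter(sql, conn))
        {
            adapter.Fill(table);   // header and junk rows are filtered at the source
        }
        Console.WriteLine(table.Rows.Count + " data rows");
    }
}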
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them, and it is up to them to meet the guidelines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts), and they need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data, and not repeat the headers multiple times. Most important of all, they must have the same columns with the same column titles in the same order every time. If they can't provide that, then you probably don't have something that will work for automated import, as you will get the file in a different format every time depending on the mood of the person who maintains the Excel spreadsheet.
Incidentally, we push really hard to never receive any data from Excel (this only works some of the time, but if they have the data in a database, they can usually accommodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package, and that they will be charged for those development changes (assuming these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled; a file with the wrong format will fail and be returned to them to fix if not.
If that doesn't work, may I suggest that you open the file, delete the first two rows, and save it as a text file. Then write a data flow that will process the text file. SSIS did a lousy job of supporting Excel, and anything you can do to get the file into a different format will make life easier in the long run.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
Not entirely correct.
SSIS forces you to use that format, and quite often it does not work correctly with Excel.
If you can't change the format, consider using our Advanced ETL Processor.
You can skip rows or fields, and you can validate the data the way you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
Sky is the limit
You can just use the OpenRowset property, which you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.
I am wasting a great deal of time manipulating data in reports. Using a pivot table is a good idea, but how? I tried some free PivotTable classes, but they were lacking subtotals.
Then, another approach: for Excel output of reports I am using EPPlus, which also supports pivot tables. The problem is that some of our customers do not have an office suite (OpenOffice, Microsoft Office, etc.), so just creating and saving an xlsx file does not work. The only thing I can try with EPPlus is creating an ExcelPackage, filling a worksheet with data, and then creating a PivotTable from that data.
I have several questions;
1) From that PivotTable object, can I access the computed PivotTable fields and values? (Up to now I could not.)
2) Related to the above question: does an xlsx file contain the data shown in the PivotTables, or just the rules for creating them (like the table name, sourceRange, rowFields, columnFields, dataFields, aggregate options, etc.)? I made a small test about this, with the following steps:
Opened a new excel file.
Inserted some raw data.
Created pivot table with the data.
Changed some values of data. (without refreshing pivot table)
Saved and closed the file.
Opened the file back.
In fact my guess was that the pivot table would update according to the new data, but I was wrong; it did not update. This may be proof that an xlsx file contains not only the rules for a pivot table but also all of its values. If that is so, I have a hope of accessing that data without saving the file (and without needing any office programs).
3) Any other approach appreciated.
Thanks in advance
I am by no means an expert on EPPlus but have been working with it for the past few months and can hopefully shed some light on your questions.
If you create a brand new xlsx in EPPlus, add data to a worksheet, create a pivot table pointed at the data/worksheet, and save it, then the PivotTable does NOT contain any data. It merely contains the definition of how the PT should slice the data when the file is opened in Excel (as you mentioned in one of your questions).
When you actually open the file in Excel and SAVE IT, Excel copies all of the data that the PT relies on into the pivot table cache. This is why you can then delete the original cells that contained the data, save the file, reopen it in Excel (you might have to dismiss some errors), and still see the PT with data. You can even double-click one of the data cells in the PT and Excel will regenerate some or all (depending on which cell you clicked) of the associated data into a new sheet.
So yes, your guess was in fact wrong because of this pivot table cache. You have to tell Excel to update the data source from the Ribbon (assuming the data is still there) to see the new data show up.
So, to access the data, you can figure out where it sits by going into the PivotTable.WorkSheet object and pulling the data out of that. You can see how I did it in the extension method I created here:
Create Pivot Table Filters With EPPLUS
Another option would be to extract the actual worksheet XML file from the xlsx. An xlsx file (like any other MS Office .???x file) is just a ZIP file renamed, so you can use the standard .NET methods to get the XML files out of the zip and then use something like LinqToXml to extract the data. Something like this:
// Requires EPPlus (OfficeOpenXml) and System.Xml.Linq.
var zip = new ExcelPackage(file).Package;   // an xlsx is just a ZIP package
var recordspart = zip.GetPart(new Uri("/xl/worksheets/sheet1.xml", UriKind.Relative));
var recordsxml = XDocument.Load(recordspart.GetStream());   // raw SpreadsheetML
It won't be pretty doing all the XML manipulation, but if a final format of xlsx will not work, this may be your best option.
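To go one step further down that route, here is a hedged sketch that walks the sheet XML with LINQ to XML. The file path is an assumption, and note one simplification: a cell typed as a shared string stores an index into /xl/sharedStrings.xml rather than the text itself, which this sketch does not resolve:

using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using OfficeOpenXml;

class SheetXmlDump
{
    static void Main()
    {
        var file = new FileInfo(@"C:\data\report.xlsx");   // assumed path
        XNamespace ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

        var zip = new ExcelPackage(file).Package;
        var part = zip.GetPart(new Uri("/xl/worksheets/sheet1.xml", UriKind.Relative));
        var xml = XDocument.Load(part.GetStream());

        // Walk row -> cell -> value and dump each row tab-separated.
        foreach (var row in xml.Descendants(ns + "row"))
        {
            var cells = row.Elements(ns + "c")
                           .Select(c => (string)c.Element(ns + "v") ?? "");
            Console.WriteLine(string.Join("\t", cells));
        }
    }
}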
I have strings (4 columns and 20 rows) in an Excel sheet, and I need to call their positions in an app, such as column 2, row 159.
I was wondering what someone thought about my options and how I would move this data. I have looked into:
a plain 2-dimensional array
SQLite
LINQ
a dictionary/hash table
It felt like they all required almost entirely manual data entry, which I am trying to avoid. Does anyone have any ideas?
If you just need to take the data in your Excel spreadsheet and make it available to your WP7 application, this should be a straightforward process.
Firstly, I would save it in CSV format; this is the most accessible format for applications to read. Add it to your project as a resource and load it within your application, then use a CSV parser to load it into memory (a sketch follows below). You can then access it directly based on user input.
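A minimal sketch of that approach. The resource URI and assembly name are assumptions, and the comma split is naive (a real CSV parser should handle quoted fields):

using System;
using System.IO;
using System.Linq;
using System.Windows;

class CsvTable
{
    private string[][] _rows;

    // Load a CSV shipped with the app as a resource
    // (Build Action: Resource; "MyApp" is an assumed assembly name).
    public void Load()
    {
        var info = Application.GetResourceStream(
            new Uri("/MyApp;component/Data/strings.csv", UriKind.Relative));
        using (var reader = new StreamReader(info.Stream))
        {
            // Naive parse: assumes no commas inside the cell text itself.
            _rows = reader.ReadToEnd()
                          .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
                          .Select(line => line.Split(','))
                          .ToArray();
        }
    }

    // 1-based lookup matching "column 2, row 159" style references.
    public string Cell(int column, int row)
    {
        return _rows[row - 1][column - 1];
    }
}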
I have to read a PDF file which contains a table with several columns. Using iTextSharp I am able to read the file, but I get a bunch of unformatted text. I am not able to structure the data so that I can insert it into a database.
Any suggestions?
Unless it's structured text, there is no tagging to show columns. Tools like PdfBox make 'guesses' to try to extract the table.
There is an article explaining why text extraction is so hard at http://pdf.jpedal.org/java-pdf-blog/bid/12670/PDF-text
If I understand it correctly, PDF text is stored positionally, so it has no concept of rows or columns. That means you have to use heuristics based on the likelihood that you're reading from a different column.
You can try doing this by comparing the amount of space between the words; a rough sketch of that idea follows below. (I'm not familiar with the iTextSharp interface, so please forgive me if I'm mentioning things it's not capable of; I'm mostly familiar with pdfNet.)
Another idea that just came to me: the text may have visual cues such as vertical lines separating the columns. If that's the case, you should be able to come up with heuristics to determine whether the text is left or right of the column lines.
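To make the spacing idea concrete, here is a rough sketch of gap-based column splitting. The Word type is hypothetical; it stands in for whatever positioned text fragments your PDF library returns, and the gap threshold is something you would tune per document:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for the positioned fragments a PDF library returns.
class Word
{
    public double Left;     // x coordinate where the word starts
    public double Right;    // x coordinate where the word ends
    public string Text;
}

class ColumnSplitter
{
    // Split one visual line of words into columns: any horizontal gap
    // wider than gapThreshold is treated as a column boundary.
    static List<string> SplitIntoColumns(List<Word> line, double gapThreshold)
    {
        var ordered = line.OrderBy(w => w.Left).ToList();
        var columns = new List<string>();
        var current = new List<string> { ordered[0].Text };

        for (int i = 1; i < ordered.Count; i++)
        {
            if (ordered[i].Left - ordered[i - 1].Right > gapThreshold)
            {
                columns.Add(string.Join(" ", current));   // close this column
                current = new List<string>();
            }
            current.Add(ordered[i].Text);
        }
        columns.Add(string.Join(" ", current));
        return columns;
    }

    static void Main()
    {
        var line = new List<Word>
        {
            new Word { Left = 10,  Right = 40,  Text = "271356" },
            new Word { Left = 120, Right = 160, Text = "3/3/10" },   // big gap: new column
            new Word { Left = 240, Right = 290, Text = "010572" },
        };
        Console.WriteLine(string.Join(" | ", SplitIntoColumns(line, 30.0)));
    }
}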
...
However, the best thing to do, if possible, is to get hold of the data in a more database-friendly format. That will likely save heartaches in the long run.
-- Jason
I am concluding there is no straightforward way to do this, at least for reading the data in tabular format. I tried the suggestions provided by Mark, but they do not seem feasible for my requirements.