I have an Excel spreadsheet which contains addresses. I'm reading the data from the spreadsheet using OLEDB and storing it in a DataTable in C#.
String connString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + filename + @";Extended Properties=""Excel 8.0;HDR=1;IMEX=1""";
Here's the problem: When I use the DataSet Visualizer, I have an empty string in the zip code field.
12345-1234 --> ""
So I want to correct this behavior so that the zip code appears as it should. If I have to chop off the digits after the hyphen, that would be fine. How can I ensure the zip code gets read?
Excel often has its own ideas about how a column should be formatted. For example, if you have a column of zip codes, some with the Plus 4 and others without, it is pretty much a crapshoot how that column will be formatted. Maybe Excel will assume it's filled with Zip+4s, maybe it will assume 5-digit Zips, or maybe just numbers. I've worked with these files for years, and I'm convinced Microsoft uses a random number generator to make this decision. With OLE DB specifically, the driver picks one type per column from the first few rows, and a cell that doesn't fit that type comes back as NULL. That is most likely why the Zip+4 value shows up empty.
As for your original question, according to this site, CONVERT is a valid SQL scalar function, so maybe something like
SELECT CONVERT(BadField, SQL_CHAR) AS FixedField FROM [Table$]
might work?
My first inclination was to suggest using COM (instead of OleDb) to read the data from the spreadsheet. I'm pretty sure you would be able to read each cell's format and deal with it accordingly, but I always found Excel via COM to be difficult and not terribly fast.
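Once the value does come through, it may still arrive as a number (leading zeros lost) or with the +4 suffix. A small cleanup step after the DataTable is filled can normalize it. The sketch below assumes US ZIP codes where only the first five digits are needed, which the question says is acceptable. `ZipHelper.NormalizeZip` is a hypothetical helper. It cannot recover a value the driver has already turned into NULL or an empty string; that still has to be fixed at the driver level.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ZipHelper
{
    // Hypothetical helper: turns whatever the OLE DB driver returned for a ZIP
    // cell into a 5-digit string, or null when nothing usable came through.
    public static string NormalizeZip(object cellValue)
    {
        if (cellValue == null || cellValue is DBNull)
            return null;

        // Numeric cells arrive as double; format with no decimals or exponent.
        string text = cellValue is double d
            ? d.ToString("0", CultureInfo.InvariantCulture)
            : Convert.ToString(cellValue, CultureInfo.InvariantCulture).Trim();

        // Keep only the first five digits of a ZIP+4: "12345-1234" -> "12345".
        int hyphen = text.IndexOf('-');
        if (hyphen >= 0)
            text = text.Substring(0, hyphen);
        else if (text.Length == 9)
            text = text.Substring(0, 5);   // ZIP+4 stored without the hyphen

        if (text.Length == 0 || text.Length > 5 || !text.All(char.IsDigit))
            return null;

        // Restore leading zeros lost to a numeric conversion: 2134 -> "02134".
        return text.PadLeft(5, '0');
    }
}
```

A typical use is to apply it row by row to the filled DataTable and write the result into a string column, so that the cleaned value is not converted back to a number.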
Related
I received a requirement to save data in a CSV file and send it to customers.
Customers use both Excel and Notepad to view this file.
The data look like:
975567EB, 973456CE, 971343C8
Some of my values end in "E3", for example:
98765E3
so when the file is opened in Excel, the value changes to:
9.8765E+7
I wrote a C# program that forces this value to text by writing it as ="98765E3":
while (!sr.EndOfStream)
{
    var line = sr.ReadLine();
    var values = line.Split(',');
    values[0] = "=\"" + values[0] + "\"";   // wrap as ="..." so Excel treats the value as text
    listA.Add(new string[] { values[0], values[1], values[2], values[3] });
}
But customers who open the CSV file in Notepad see:
="98765E3"
How can I save a number as text in a CSV file so that it looks the same in both Excel and Notepad? Any suggestions are greatly appreciated!
Don't shoot the messenger.
Your problem is not the way you are exporting (creating...?) data in C#. It is with the way that you are opening the CSV files in Excel.
Excel has numerous options for importing text files that let you specify the data type of each field (aka column) being brought in, such as the FieldInfo parameter of Workbooks.OpenText and Range.TextToColumns, or the TextFileColumnDataTypes property of a QueryTable.
If you choose to double-click a CSV file in an Explorer folder window, then you will have to put up with Excel's 'best guesses' at your intended field type for each column. It's not going to stop halfway through an import process to ask your opinion. Some common errors include:
An alphanumeric value with an E will often be interpreted as scientific notation.
Half of the DMY dates will be misinterpreted as the wrong MDY dates (or vice versa). The other half will become text, since Excel cannot process something like 14/08/2015 as MDY.
A text value that starts with a + will often produce a #NAME? error because Excel thinks you are trying to bring in a formula that refers to a name.
That's a short list of common errors. There are others. Here are some common solutions.
Use Data ► Get External Data ► From Text. Explicitly specify any ambiguous column data type; e.g. 98765E3 as Text, dates as DMY, MDY, YMD, etc., as the case may be. There is even the option to discard a column of useless data.
Use File ► Open ► Text Files, which takes you through the same import wizard as the option above. Either command can be recorded as a macro for repeated use.
Use VBA's Workbooks.OpenText method and specify each column's FieldInfo position and data type (the latter with a XlColumnDataType constant).
Read the import file into memory and process it in a memory array before dumping it into the target worksheet.
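The in-memory route is usually written in VBA. In the question's language the same idea is to split each line yourself, so every field stays a string and nothing is type-guessed. The sketch below does RFC 4180 style field splitting for a single line. It assumes there are no line breaks inside quoted fields, and `CsvLine.Split` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one CSV line into string fields without any type conversion.
    // Handles quoted fields and doubled quotes ("") inside them; assumes the
    // line contains no embedded line breaks.
    public static List<string> Split(string line, char delimiter = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"');   // escaped quote inside a quoted field
                        i++;
                    }
                    else
                    {
                        inQuotes = false;      // closing quote
                    }
                }
                else
                {
                    current.Append(c);
                }
            }
            else if (c == '"')
            {
                inQuotes = true;               // opening quote
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());        // last field on the line
        return fields;
    }
}
```

Each field can then be written to the target as a string, so values like 98765E3 or 00012 arrive unchanged.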
There are less precise solutions that are still subject to some interpretation from Excel.
Use a prefix character (the apostrophe that Range.PrefixCharacter reports) to force numbers with leading zeroes, or alphanumeric values that could conceivably be misinterpreted as scientific notation, into the worksheet as text.
Use a text qualifier character, typically ASCII character 034 (i.e. "), to wrap values you want to be interpreted as text.
Copy and paste the entire text file into the target worksheet's column A then use the Range.TextToColumns method (again with FieldInfo options available for each column).
These latter two methods are going to cause some odd values in Notepad, but Notepad isn't Excel and cannot process half a million calculations and other operations in several seconds. If you must mash up the two programs, there will be some compromises.
My suggestion is to leave the values as best as they can be in Notepad and use the facilities and processes readily available in Excel to import the data properly.
I'm needing to import an Excel spreadsheet into my program and have the following code:
string connectionString = String.Format(@"Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=""Excel 8.0;IMEX=1;HDR=NO;""", "MyExcelFile.xls");
command.CommandText = "SELECT * FROM [Sheet1$]";
(Note: the code above isn't my real code, but it shows what I'm doing.)
I'm getting the file imported; the only problem is that any columns in the Excel sheet with values over 255 characters are being truncated.
Is there any way around this happening?
I read somewhere that if you make sure there is a long line of text in the column within the first 8 rows, then it will be treated as a memo field and therefore not truncated but that didn't seem to work.
Any ideas?
Graeme
Bumped into this one a few times. Fortunately there's a registry hack to fix it, described by Microsoft here: http://support.microsoft.com/kb/189897
Effectively, the driver (not Excel itself) only looks at the first eight rows of data to decide each column's type and length. If none of those values is longer than 255 characters, the column is treated as 255-character text and longer values are truncated. The KB article referenced above explains how to set the registry value "TypeGuessRows", which tells the driver how many rows to scan.
Probably your problem has an easier solution, but as a last resort, try to save your Excel file as a CSV text file, then process it using the regular file and string manipulation classes instead of the JET engine.
Because I couldn't find the exact answer I needed, I'm going to leave this here in case it helps anyone.
HKEY_LOCAL_MACHINE ► Software ► Wow6432Node ► Microsoft ► Office ► 12.0 ► Access Connectivity Engine ► Engines
TypeGuessRows = 0
Source
It is usually best to import into an existing table. It is not too difficult to create a suitable table via code.
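If the target is an Access/Jet database, one way to create such a table from C# is to build a CREATE TABLE statement that declares the long columns as MEMO (Jet SQL's long-text type). You would then run it with OleDbCommand.ExecuteNonQuery. The sketch below covers only the statement-building part, and the table and column names in the usage are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class JetDdl
{
    // Builds a Jet SQL CREATE TABLE statement in which every column is MEMO
    // (long text), so values longer than 255 characters are not truncated.
    // Names are wrapped in [brackets]; a "]" inside a name is not handled.
    public static string CreateMemoTable(string tableName, IEnumerable<string> columnNames)
    {
        var columns = columnNames.Select(name => "[" + name + "] MEMO");
        return "CREATE TABLE [" + tableName + "] (" + string.Join(", ", columns) + ")";
    }
}
```

For example, `JetDdl.CreateMemoTable("Import", new[] { "Name", "Notes" })` yields `CREATE TABLE [Import] ([Name] MEMO, [Notes] MEMO)`. Importing into that existing table then avoids the 255-character guess.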
I have a .NET Windows Forms application and I need to copy a list of 8-digit numeric codes to the clipboard so they can be pasted into an Excel sheet.
string tabbedText = string.Join("\n", codesArray);
Clipboard.SetText(tabbedText);
The problem is that when a code begins with one or more zeros (ex. "00001234") it's pasted as number with the zeros trimmed.
Is there a way to set the clipboard text so that Excel accepts it as text?
I would handle this problem inside Excel (and not programmatically in your application). Format the cells as Text, and then paste from the clipboard. This way leading zeros are always kept.
The standard way to 'tell' Excel to treat a string as a string is to prefix it with an apostrophe. Have you tried something like:
string tabbedText = "'" + string.Join("\n'", codesArray);
(Note the extra apostrophe in there; it's a bit hard to see.)
Of course, this may cause you issues if you're planning to use these values in Excel calculations afterwards, but there are ways to handle that too.
EDIT: This doesn't work in Excel, in that the apostrophe gets pasted in and shows up too. I'm leaving the answer here as an explicit statement that this approach won't help for Excel.
It does work for OpenOffice Calc, though.
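Because the apostrophe gets pasted literally into Excel, another option is the ="..." formula wrapper that other answers in this thread use for CSV files. This sketch assumes Excel parses each pasted line the way it parses typed input, so a line starting with = becomes a formula whose result is the text 00001234. That is worth verifying in your Excel version. `ClipboardText.AsExcelTextFormulas` is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ClipboardText
{
    // Hypothetical helper: wraps each code as an Excel formula ="..." so that a
    // pasted value such as 00001234 keeps its leading zeros. Any double quote
    // inside a code is doubled, as Excel string literals require.
    public static string AsExcelTextFormulas(IEnumerable<string> codes)
    {
        return string.Join("\n", codes.Select(c => "=\"" + c.Replace("\"", "\"\"") + "\""));
    }
}
```

It would be used as `Clipboard.SetText(ClipboardText.AsExcelTextFormulas(codesArray));`. The trade-off is that the pasted cells contain formulas rather than plain values.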
I have to build a C# program that writes CSV files containing long numbers (stored as strings in my program). The problem is that when I open the CSV file in Excel, the numbers appear like this:
1234E+ or 1234560000000 (the end of the number is 0)
How do I retain the formatting of the numbers? If I open the file as a text file, the numbers are formatted correctly.
Thanks in advance.
As others have mentioned, you can force the data to be a string. The best way to do that is ="1234567890123". The = makes the cell a formula, and the quotation marks make the enclosed value an Excel string literal. This will display all the digits, even beyond Excel's numeric precision limit, but the cell (generally) won't be usable directly in numeric calculations.
If you need the data to remain numeric, the best way is probably to create a native Excel file (.xls or .xlsx). Various approaches for that can be found in the solutions to this related Stack Overflow question.
If you don't mind having thousands separators, there is one other trick you can use, which is to make your C# program insert the thousands separators and surround the value in quotes: "1,234,567,890,123". Do not include a leading = (as that will force it to be a string). Note that in this case, the quotation marks are for protecting the commas in the CSV, not for specifying an Excel string literal.
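Both tricks from this answer can be produced from C#. The sketch below assumes the numbers are held as digit-only strings, as in the question, and a locale where the comma is Excel's thousands separator. `ExcelCsv.AsTextFormula` and `ExcelCsv.WithThousandsSeparators` are hypothetical names. The grouping is done on the string rather than by parsing, so values longer than a `long` can hold are still handled.

```csharp
using System.Text;

public static class ExcelCsv
{
    // ="1234567890123": the formula evaluates to text, so every digit is shown,
    // but the cell is a string, not a number.
    public static string AsTextFormula(string digits)
    {
        return "=\"" + digits + "\"";
    }

    // "1,234,567,890,123": the quotes only protect the commas inside the CSV
    // field (no leading =), so Excel still reads the value as a number.
    public static string WithThousandsSeparators(string digits)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < digits.Length; i++)
        {
            // Insert a comma before each group of three digits, counted from the right.
            if (i > 0 && (digits.Length - i) % 3 == 0)
                sb.Append(',');
            sb.Append(digits[i]);
        }
        return "\"" + sb + "\"";
    }
}
```

Because the second form stays numeric, it is still limited to Excel's 15 significant digits; only the first form preserves longer values exactly.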
Format those long numbers as strings by putting a ' (apostrophe) in front or making a formula out of it: ="1234567890123"
You can't. Excel stores numbers with fifteen digits of precision. If you don't mind not having the ability to perform calculations on the numbers from within Excel, you can store them as Text, and all of the digits will display.
When I generate data to be imported into Excel, I do not generate a CSV file if I want control over how the data are displayed. Instead, I write out an Excel file in which the properties of the cells are set appropriately. I do not know whether there is a library that would do that for you in C# without requiring Excel to be installed on the machine generating the files, but it is something to look into.
My two cents:
I think it's important to realize there is a difference between "data" and "formatting". In this example you are trying to store both in a data-only file. As you can tell from the other answers, this will change the nature of the data (in other words, cause it to be converted to a string). A CSV file is a data-only file. You can do some tricks here and there to merge formatting in with data, but to my way of thinking this essentially corrupts the data by merging it with non-data values, i.e., formatting.
If you really need to store formatting information, I suggest that, if you have time to develop it, you switch to a file type capable of storing formatting information separately from the data. This problem sounds like a good candidate for an XML Spreadsheet solution. That way you can specify not only your data, but also its type and any formatting you choose to use.
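Here is a minimal sketch of the XML Spreadsheet route. It assumes the Excel 2003 XML Spreadsheet (SpreadsheetML) format and uses System.Xml.Linq. `XmlSpreadsheet.Build` is a hypothetical name, and only the bare elements needed for typed cells are written (no styles or formatting).

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

public static class XmlSpreadsheet
{
    static readonly XNamespace Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Builds a minimal Excel 2003 XML Spreadsheet in which every cell is typed
    // ss:Type="String", so values like 00012 or 1234567890123 stay text.
    public static XDocument Build(string sheetName, IEnumerable<string[]> rows)
    {
        var table = new XElement(Ss + "Table");
        foreach (var row in rows)
        {
            var rowElement = new XElement(Ss + "Row");
            foreach (var value in row)
                rowElement.Add(new XElement(Ss + "Cell",
                    new XElement(Ss + "Data", new XAttribute(Ss + "Type", "String"), value)));
            table.Add(rowElement);
        }

        return new XDocument(
            // Processing instruction that tells Windows to open the file with Excel.
            new XProcessingInstruction("mso-application", "progid=\"Excel.Sheet\""),
            new XElement(Ss + "Workbook",
                new XAttribute("xmlns", Ss.NamespaceName),
                new XAttribute(XNamespace.Xmlns + "ss", Ss.NamespaceName),
                new XElement(Ss + "Worksheet",
                    new XAttribute(Ss + "Name", sheetName),
                    table)));
    }
}
```

Saving the result with `XmlSpreadsheet.Build("Sheet1", rows).Save("codes.xml")` should give a file Excel can open directly. Because each cell's type is declared rather than guessed, leading zeros and long digit strings survive.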
I'm able to connect to and read an Excel file with no problem. But when importing data such as zip codes that have leading zeros, how do you prevent Excel from guessing the data type and, in the process, stripping out the leading zeros?
I believe you have to set the option in your connection string to force textual import rather than auto-detecting it.
Provider=Microsoft.ACE.OLEDB.12.0;
Data Source=c:\path\to\myfile.xlsx;
Extended Properties=\"Excel 12.0 Xml;IMEX=1\";
Your mileage may vary depending on the version you have installed. The IMEX=1 extended property tells the driver to treat columns with intermixed data types as text.
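The connection string above can be built in C# so that the quoting of the inner Extended Properties value is handled in one place. This sketch assumes the ACE 12.0 provider is installed, as in the answer, and `ExcelConnection.ForXlsx` is a hypothetical helper. Note that IMEX=1 generally only helps when the rows the driver samples actually contain mixed types. A column whose sampled rows are all numeric can still come back numeric.

```csharp
public static class ExcelConnection
{
    // Hypothetical helper: builds an ACE OLE DB connection string for an .xlsx file.
    public static string ForXlsx(string path, bool firstRowIsHeader)
    {
        // HDR=YES/NO: whether the first row holds column names.
        // IMEX=1: ask the driver to return mixed-type columns as text.
        // The Extended Properties value contains semicolons, so it is wrapped
        // in double quotes inside the connection string.
        string hdr = firstRowIsHeader ? "YES" : "NO";
        return "Provider=Microsoft.ACE.OLEDB.12.0;" +
               "Data Source=" + path + ";" +
               "Extended Properties=\"Excel 12.0 Xml;HDR=" + hdr + ";IMEX=1\";";
    }
}
```

The result is passed to an `OleDbConnection` (System.Data.OleDb, Windows only) in the usual way.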
Prefix with '
Prefixing the contents of the cell with ' forces Excel to see it as text instead of a number. The ' won't be displayed in Excel.
There is a registry hack that can force the driver to read more than the first 8 rows when scanning a column to determine its type:
Change
HKLM\Software\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
to 0 to scan the maximum number of rows the driver allows (not literally every row), or to another number to scan that many rows.
Note that this will have a slight performance hit.
I think the way to do this would be to format the source Excel file so that the column is formatted as Text instead of General: select the entire column, right-click, choose Format Cells, and select Text from the list of categories.
I think that would explicitly define that the column content is text and should be treated as such.
Let me know if that works.
Saving the file as a tab delimited text file has also worked well.
Unfortunately, we can't rely on the columns of the Excel doc to stay in a particular format, as the users will be pasting data into it regularly. I don't want the app to crash if we're relying on a certain data type for a column.
Prefixing with ' would work; is there a reasonable way to do that programmatically once the data already exists in the Excel doc?
Sending the value 00022556 from SQL Server as '=" 00022556"' is an excellent way to handle the leading-zero problem.
Add "\t" before your string. It'll make the string seem in a new tab.