Unexpected chars in Excel - C#

I am using ADO.NET to fill a datatable from an Excel (xls) worksheet.
I got unexpected chars. At first I thought they were introduced somehow during the import, so I tried to eliminate them in the C# program, but nothing I tried worked.
Finally I traced the chars back to Excel, where I was able to use Excel's Replace function to replace them with ''. These chars show up as blanks in Excel, and I only found them by working backwards from their location in the DataTable, which I had dumped to a text file.
In Excel I also tried the Clear Formats function, but that didn't do the job.
How do I filter the input in the DataTable to keep only printable ASCII chars (32 to 126, i.e. space through ~)?
What kind of string do I get when I convert a value from a DataTable column of type System.String into a string? I don't seem to be able to identify the chars when I convert the string to an array of chars.
Any suggestions? Since these chars were unexpected, I want to be sure the spreadsheet input is filtered to keep only visible printing chars and spaces. The text being imported should be plain text only, with no numeric data.
The unexpected char that appears in the text file when I dump the table is ÿ.
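A .NET string is a sequence of UTF-16 char values, so casting each char to int shows exactly which code unit is present, independent of how the encoding of the dumped text file renders it (the ÿ in the dump is only what that file's encoding displayed). The sketch below assumes "printable" means 32 to 126; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: reports chars outside printable ASCII (32..126).
public static class CharInspector
{
    public static List<string> DescribeUnexpectedChars(string text)
    {
        var report = new List<string>();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            // A char is a UTF-16 code unit; its numeric value identifies it unambiguously.
            if (c < 32 || c > 126)
            {
                report.Add($"index {i}: U+{(int)c:X4}");
            }
        }
        return report;
    }
}
```

Run on a suspicious cell value, a real ÿ would be reported as U+00FF, while a non-breaking space (which also looks blank in Excel) would be reported as U+00A0.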

Do your source fields contain carriage returns ("\r"), newlines ("\n"), tabs ("\t"), or NULL fields?
Try stripping all of those characters before sending the information to the database.
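A minimal C# sketch of that suggestion, assuming "those characters" means control characters: char.IsControl returns true for U+0000 to U+001F (which includes \0, \t, \n and \r) and for U+007F to U+009F. The helper name is hypothetical. A DBNull cell value can be turned into an empty string with Convert.ToString before calling it.

```csharp
using System;
using System.Linq;

// Hypothetical helper that removes control characters from a cell value.
public static class ControlCharStripper
{
    // Removes every char for which char.IsControl is true.
    public static string StripControlChars(string input)
    {
        if (input == null) return string.Empty; // treat a missing value as empty text
        return new string(input.Where(c => !char.IsControl(c)).ToArray());
    }
}
```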

Thanks, Voyager, for your reply.
Not that I could tell. There were some nulls from empty cells, but I had gotten rid of them. I tried to filter out any \r, \n, \t and other non-printing chars. I've done that sort of thing in C many times, but I didn't seem to be able to do it in the C# program.
Finally I dropped down to the Excel worksheet itself and, with a VBA macro (module or whatever it is called), got rid of all the offending chars (less than 32 and greater than 126). There were a lot hanging around.
But all the data passes through the C# program (one program vs. many spreadsheets), so of course I'd prefer to fix the issue in the program.
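The filter the VBA macro applies can be done in the C# program instead, so every spreadsheet goes through it. A sketch, assuming the DataTable has already been filled and that only columns typed as System.String should be touched; the class and method names are hypothetical. Because it keeps only 32 to 126, it also drops non-ASCII characters such as ÿ (U+00FF) and non-breaking spaces.

```csharp
using System;
using System.Data;
using System.Text;

// Hypothetical whitelist filter mirroring the VBA macro (keep 32..126 only).
public static class DataTableAsciiFilter
{
    public static string KeepPrintableAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c >= 32 && c <= 126) sb.Append(c); // space through '~'
        }
        return sb.ToString();
    }

    // Rewrites every string cell in place; DBNull cells are left untouched.
    public static void FilterStringColumns(DataTable table)
    {
        foreach (DataRow row in table.Rows)
        {
            foreach (DataColumn column in table.Columns)
            {
                if (column.DataType == typeof(string) && row[column] is string s)
                {
                    row[column] = KeepPrintableAscii(s);
                }
            }
        }
    }
}
```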

Related

Escape semicolon in interpolated string C#

I am writing data to a .csv file and I need the expression below to be read correctly:
csvWriter.WriteLine($"Security: {sec.ID} ; Serial number: {sec.SecuritySerialNo}");
The semicolon in between is used to put the serial number in a separate cell.
The problem is that ID can also contain semicolons, which mess up the data, so I need to escape them.
I have tried to use replace:
csvWriter.WriteLine($"Security: {sec.ID.Replace(";", "")} ; Serial number: {sec.SecuritySerialNo}");
However, I don't want to delete the semicolons; I just want to escape them.
Let me emphasize again that the best way to create a CSV file is with a dedicated CSV library.
However, to resolve just the simple case in your question, you can put double quotes around each field. That is enough to tell Excel's parser how to handle your fields.
So, export the fields in this way:
csvWriter.WriteLine($"\"Security: {sec.ID}\";\"Serial number: {sec.SecuritySerialNo}\"");
Note that I removed the spaces around the semicolon. This matters: with a space before the opening quote, the field no longer starts with a quote, so Excel will not parse the line correctly.
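The quoted line above still breaks if ID itself contains a double quote. The usual CSV convention (described in RFC 4180, and followed by Excel when reading quoted fields) is to double any quote that appears inside a quoted field. A small hypothetical helper that applies this to every field:

```csharp
using System;

// Hypothetical helper: quotes each field and doubles embedded quotes.
public static class CsvQuoting
{
    // Separators (';' or ',') and line breaks inside the quotes stay literal.
    public static string Quote(string field)
    {
        return "\"" + (field ?? string.Empty).Replace("\"", "\"\"") + "\"";
    }

    // Quotes every field, then joins them with the separator.
    public static string FormatRow(char separator, params string[] fields)
    {
        var quoted = new string[fields.Length];
        for (int i = 0; i < fields.Length; i++)
        {
            quoted[i] = Quote(fields[i]);
        }
        return string.Join(separator.ToString(), quoted);
    }
}
```

With it, the line from the question could become csvWriter.WriteLine(CsvQuoting.FormatRow(';', $"Security: {sec.ID}", $"Serial number: {sec.SecuritySerialNo}")); a dedicated CSV library does the same thing more robustly.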

Field and text delimiters within cells in csv files

This is likely a very basic question, but despite trying, I could not find a satisfying answer to it. Feel free to skip to the question at the end if you aren't interested in the background.
The task:
I wish to create an easy localisation solution for my Unity projects. After some initial research I concluded it would be best to use a .csv file read by a StreamReader, so that translators would only ever have to interact with the CSV table, where information is neatly organized.
The main problem:
Due to the nature of the text, I need to account for line breaks and special characters inside the fields themselves. As such, I could not use the normal ReadLine() method.
I worked around this by using Read() and checking whether a line break falls inside a text-delimited field. But because I check for the text delimiter, I am afraid the reader might run into an unescaped delimiter that is part of the normal in-cell text (since the normal text delimiter is the quotation mark).
So I switched the text delimiter to §. But now every time I open the file, I have to re-enter § as the text delimiter in OpenOffice Calc, probably due to encoding differences. That's annoying, but not the end of the world.
My question:
How does OpenOffice (or similar software) usually tell in-cell commas/quotation marks apart from the ones used as delimiters? If I knew that, I could probably incorporate a similar approach in my reading of the file.
I've tried looking at the files in Notepad++, which revealed a difference in line breaks (\r instead of \r\n) and showed that the break is inside a text-delimited field, but when it comes to how Calc separates its delimiters from ones simply typed into the text of a field, I am drawing a blank.
[Screenshots not preserved: the translation file in OpenOffice Calc, and the same file in Notepad++ with all characters shown.]
I'd appreciate any insight or links on the topic.
From https://en.wikipedia.org/wiki/Comma-separated_values:
The CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain commas or even embedded line breaks.
LibreOffice Calc has a reasonable way to handle these things.
Use LF for line breaks and CR at the end of each record. It seems your code already handles this.
Use quotes to delimit strings when needed. If the string contains one or more quotes, then duplicate the quote to make it literal.
From the example in your question, it looks like you told Calc not to use any quotes as string delimiters. Why did you do this? When I tried it, LibreOffice (or Apache OpenOffice) showed the fields in different columns after opening the file saved that way.
The following example CSV file has fields that contain commas, quotes and line breaks.
When viewed in Calc:
    | A        | B
----+----------+---
  1 | 1,",2",  | 3
----+----------+---
  2 | a        | c
    | b        |
Calc correctly reads and saves the file as shown below. The save settings are the defaults: comma (,) as the field delimiter and double quote (") as the string delimiter.
"1,"",2"",",3[CR]
"a
b",c[CR]
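To answer the original question in code: a reader does not need an exotic delimiter, it only needs to track whether it is currently inside a quoted field. A quote at the start of a field opens it; inside it, "" is a literal quote and commas and line breaks are plain text; a single quote closes it. The sketch below is a hypothetical character-by-character reader in the spirit of the Read() loop from the question, not Calc's actual implementation; it assumes CR, LF or CRLF outside quotes ends a record.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal CSV reader illustrating how quoted fields are recognized.
public static class SimpleCsvReader
{
    public static List<List<string>> Parse(string text, char separator = ',')
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    field.Append(c); // separators and line breaks are plain text here
                }
                else if (i + 1 < text.Length && text[i + 1] == '"')
                {
                    field.Append('"'); // "" inside quotes is one literal quote
                    i++;               // skip the second quote
                }
                else
                {
                    inQuotes = false;  // a single quote closes the quoted section
                }
            }
            else if (c == '"' && field.Length == 0)
            {
                inQuotes = true;       // a quote at the start of a field opens it
            }
            else if (c == separator)
            {
                record.Add(field.ToString());
                field.Clear();
            }
            else if (c == '\r' || c == '\n')
            {
                record.Add(field.ToString());
                field.Clear();
                records.Add(record);
                record = new List<string>();
                // Treat CRLF as a single record break.
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
            }
            else
            {
                field.Append(c);       // includes quotes in the middle of unquoted text
            }
        }

        // Flush a final record that has no trailing line break.
        if (field.Length > 0 || record.Count > 0)
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```

Fed the saved file shown above, it returns two records: [1,",2",] [3] and [a(LF)b] [c], matching what Calc displays.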

CSV field with less than max cell limit of excel is getting truncated

We are trying to export data to a CSV file via C# code. Some of the fields are very long and exceed the maximum number of characters an Excel cell can hold (32,767, according to this site), so we truncate any data beyond that limit. However, when we open the CSV file in Excel, some of the characters are placed in the cell below, even though the text in the cell hasn't reached the limit. I have attached a sample CSV file; kindly download it and open it in Excel. The CSV field is 32,766 characters long, including the opening and closing double quotes.
Excel appears to have a limit of 32,758 characters per cell when importing a CSV file, despite the documentation saying that the limit is 32,767 characters. In other words, having 32,759 or more characters in a cell will cause problems when the CSV is imported.
Escaped double quotes ("") only count as a single character for this purpose.
The two double quotes that surround the cell content (if needed) don't count toward the limit.
Unix style newlines (\n) count as one character while Windows style newlines (\r\n) count as two characters. Note that, even if you specify Unix style newlines (\n) in your code, some functions/languages may automatically convert them to Windows style newlines (\r\n), which may then cause some of your otherwise valid cells to exceed the limit. Here's how to avoid that issue in several different languages:
C#: Use \n instead of Environment.NewLine.
C++ (fopen): Open the file in binary mode using the b flag in the mode parameter: fopen("filename", "wb").
C++ (ofstream): Use binary mode when opening the file for writing: std::ofstream outfile("filename", std::ios_base::binary | std::ios_base::out).
Java: Use \n instead of System.lineSeparator().
Python (io.open): Use newline='\n' in your call to io.open.
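A C# sketch that applies the points above to one long field before it is written. It assumes the empirically observed 32,758-character import limit from this answer and that length is counted in UTF-16 chars; the names are hypothetical. It normalizes \r\n to \n and truncates before escaping, so each escaped "" counts once.

```csharp
using System;

// Hypothetical helper that keeps a CSV field within Excel's observed import limit.
public static class CsvCellLimit
{
    // Empirical limit from the answer above (assumption; not documented by Microsoft).
    public const int ExcelCsvImportLimit = 32758;

    // Returns a quoted CSV field whose content stays within the limit.
    public static string ToCsvField(string raw, int limit = ExcelCsvImportLimit)
    {
        // Convert Windows line breaks to "\n" so each break counts as one character.
        string text = (raw ?? string.Empty).Replace("\r\n", "\n");

        if (text.Length > limit)
        {
            int cut = limit;
            // Avoid splitting a surrogate pair (e.g. an emoji) at the cut point.
            if (cut > 0 && char.IsHighSurrogate(text[cut - 1])) cut--;
            text = text.Substring(0, cut);
        }

        // Escape quotes only after truncating: an escaped "" counts as one character.
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }
}
```

When writing the records themselves, setting NewLine = "\n" on the StreamWriter (a settable TextWriter property) makes WriteLine emit \n instead of Environment.NewLine.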

Copy numeric codes to clipboard and paste to Excel without having them formatted as numbers

I have a .NET Windows Forms application and I need to copy a list of 8-digit numeric codes to the clipboard so they can be pasted into an Excel sheet.
string tabbedText = string.Join("\n", codesArray);
Clipboard.SetText(tabbedText);
The problem is that when a code begins with one or more zeros (ex. "00001234") it's pasted as number with the zeros trimmed.
Is there a way to set the clipboard text so that Excel accepts it as text?
I would handle this problem inside Excel (and not programmatically in your application). Format the target cells as Text, and then paste from the clipboard. This way the leading zeros are always kept.
EDIT: The apostrophe approach below doesn't work in Excel: the apostrophe gets pasted in and shows up too. I'm leaving the answer here as an explicit statement that this approach won't help for Excel.
It does work for OpenOffice Calc though.
The standard way to 'tell' Excel to treat a string as a string is to prefix it with an apostrophe. Have you tried something like:
string tabbedText = "'" + string.Join("\n'", codesArray);
(note the extra apostrophe in there... it's a bit hard to see).
Of course, this may cause you issues if you're planning to use this value thereafter in Excel calculations but there are ways to handle that too.

When I attach details to Excel, it truncates the 0 in front of the zip code

I am sending an email to the program manager and attaching all the details filled in by the applicant as a CSV. When they receive the email, the zip code is missing its leading 0.
I am using C# and ASP.NET. I placed a breakpoint just before I write the data to the CSV, and the value looks right, with the 0. But when I receive the email and open the file in Excel, the zip code is missing the 0.
Can anyone suggest how to fix this?
Thanks
Prepend an apostrophe (') to your numbers:
C#: string zip = "'0066222";
Excel: '0066222
Excel will then read the value as text and keep it as-is, rather than as a number (which would trim the leading 0s).
The data will need to be formatted like this:
="001",="002",="003"
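A sketch of producing that format in C#, assuming the values (zip codes here) contain no double quotes; the helper name is hypothetical. Each cell then holds a formula that returns the text, which keeps the leading zeros in Excel but may look odd to other tools that read the same CSV.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that writes each value as ="value".
public static class ExcelTextCsv
{
    // Excel evaluates ="001" to the text 001, so leading zeros survive.
    public static string FormatAsTextFormulas(IEnumerable<string> values)
    {
        return string.Join(",", values.Select(v => "=\"" + v + "\""));
    }
}
```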
Add a single quote character (') to the beginning of the zip code value before you write it to the Excel file. This tells Excel to show the exact value in that field; otherwise, Excel will interpret the field as numeric and trim the leading zeros.
