I am a new user of FileHelpers, trying to parse a comma-delimited text file. The parser in Excel has the concept of BOTH a field delimiter (such as a comma) AND a text delimiter (such as double quotes) for fields that contain funky stuff you want to pass through to the receiving field in the structure. The package generating the file may or may not include the text delimiter, depending on the content of the data in the field.
I have searched the documentation and this site, but have not found whether this is possible, although it seems to be a common function.
Is there a way to specify a text delimiter to the DelimitedFileEngine, the same way that you can define [DelimitedRecord(",")] to the class definition of the record you are parsing?
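FileHelpers does support this, through the [FieldQuoted] attribute on individual fields. A minimal sketch (class and field names are illustrative; QuoteMode.OptionalForBoth tells the engine that the quotes may or may not be present, which matches the "may or may not include the text delimiter" behaviour described above):

```csharp
using FileHelpers;

[DelimitedRecord(",")]
public class Customer
{
    // Quotes are optional on read and only added on write when needed.
    [FieldQuoted('"', QuoteMode.OptionalForBoth)]
    public string Name;

    [FieldQuoted('"', QuoteMode.OptionalForBoth)]
    public string Address;

    public int Age;
}

// Usage:
//   var engine = new DelimitedFileEngine<Customer>();
//   Customer[] rows = engine.ReadFile("customers.csv");
```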
Related
I am writing data to .csv file and I need the below expression to be read correctly:
csvWriter.WriteLine($"Security: {sec.ID} ; Serial number: {sec.SecuritySerialNo}");
the semicolon in between is used to put the serial number in a separate cell.
The problem is that ID can also contain semicolons and mess up the data, therefore I need to escape it.
I have tried to use replace:
csvWriter.WriteLine($"Security: {sec.ID.Replace(";", "")} ; Serial number: {sec.SecuritySerialNo}");
Deleting the semicolons is not what I want to achieve, though; I just want to escape them.
Let's emphasize again that the best way to create a CSV file is through a specialized CSV Parser library.
However, just to resolve the simple case presented by your question you could add double quotes around each field. This should be enough to explain to the Excel parser how to handle your fields.
So, export the fields in this way:
csvWriter.WriteLine($"\"Security: {sec.ID}\";\"Serial number: {sec.SecuritySerialNo}\"");
Notice that I have removed the blank space around the semicolon. This is important; otherwise Excel will not parse the line correctly.
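If the fields themselves can also contain double quotes, the usual CSV convention is to double each embedded quote inside a quoted field. A small helper along these lines (a sketch; the class and method names are illustrative):

```csharp
using System;

static class Csv
{
    // Wraps a value in double quotes and doubles any embedded quotes,
    // following the common CSV quoting convention (RFC 4180).
    public static string Quote(string value) =>
        "\"" + value.Replace("\"", "\"\"") + "\"";
}

// Usage:
//   csvWriter.WriteLine(Csv.Quote($"Security: {sec.ID}") + ";" +
//                       Csv.Quote($"Serial number: {sec.SecuritySerialNo}"));
```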
This is likely a very basic question that I could not, despite trying, find a satisfying answer to. Feel free to skip to the question at the end if you aren't interested in the background.
The task:
I wish to create an easy localisation solution for my Unity projects. After some initial research I concluded it would be best to use a .csv file read by a StreamReader, so that translators would only ever have to interact with the CSV table, where information is neatly organized.
The main problem:
Due to the nature of the text, I need to account for line breaks and special characters in the actual fields. As such I could not use the normal ReadLine() method.
I worked around this by using Read() and checking whether a line break falls within a text-delimiter bracket. But as I check for the text delimiter, I am afraid it might run into an unescaped delimiter that is part of the normal in-cell text (since the normal text delimiter is quotation marks).
So I switched the delimiter to §. But now every time I open the file I have to re-enter § as the text delimiter in OpenOffice Calc, probably due to encoding differences. Which is annoying, but not the end of the world.
My question:
How does OpenOffice (or similar software) usually tell in-cell commas/quotation marks apart from the ones used as delimiters? If I knew that, I could probably incorporate a similar approach in my reading of the file.
I've tried to look at the files with Notepad++, revealing a difference in line breaks (\r instead of \r\n), and obviously it's within a text-delimiter bracket, but when it comes to how it separates its delimiters from ones just entered in the text/field, I am drawing a blank.
Translation file in OpenOffice Calc:
Translation file in NotePad++, showing all characters:
I'd appreciate any insight or links on the topic.
From https://en.wikipedia.org/wiki/Comma-separated_values:
The CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain commas or even embedded line breaks.
LibreOffice Calc has a reasonable way to handle these things.
Use LF for line breaks and CR at the end of each record. It seems your code already handles this.
Use quotes to delimit strings when needed. If the string contains one or more quotes, then duplicate the quote to make it literal.
From the example in your question, it looks like you told Calc not to use any quotes as string delimiters. Why did you do this? When I tried it, LibreOffice (or Apache OpenOffice) showed the fields in different columns after opening the file saved that way.
The following example CSV file has fields that contain commas, quotes and line breaks.
When viewed in Calc:
  |    A     | B
--+----------+---
1 | 1,",2",  | 3
2 | a        | c
  | b        |
Calc correctly reads and saves the file as shown below. Settings when saving are Field delimiter , and String delimiter " which are the defaults.
"1,"",2"",",3[CR]
"a
b",c[CR]
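That doubling rule is also the answer to the question above: Calc tells an in-cell quote apart from a delimiter quote because literal quotes inside a quoted field are doubled. The rule can be implemented directly with a small state machine: track whether you are inside quotes, treat "" inside a quoted field as one literal quote, and only split on delimiters or line breaks when outside quotes. A sketch (names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvParser
{
    // Splits CSV text into records of fields, honouring quoted fields
    // that may contain delimiters, doubled quotes and line breaks.
    public static List<List<string>> Parse(string text, char delimiter = ',')
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < text.Length && text[i + 1] == '"')
                    { field.Append('"'); i++; }      // "" -> one literal quote
                    else inQuotes = false;            // closing quote
                }
                else field.Append(c);                 // delimiters/newlines kept verbatim
            }
            else if (c == '"') inQuotes = true;       // opening quote
            else if (c == delimiter) { record.Add(field.ToString()); field.Clear(); }
            else if (c == '\r' || c == '\n')          // end of record
            {
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                record.Add(field.ToString()); field.Clear();
                records.Add(record); record = new List<string>();
            }
            else field.Append(c);
        }
        if (field.Length > 0 || record.Count > 0)     // final record without trailing break
        { record.Add(field.ToString()); records.Add(record); }
        return records;
    }
}
```

Running this over the example file above yields two records: ("1,\",2\",", "3") and ("a\nb", "c").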
I have a List of Merge Field names passed into my application, along with a Word Document. The Merge Fields include periods and underscores, in the form "Here.Is.An.Example.Merge_Field". These separators are required - the characters may be able to be changed, but they cannot be removed altogether.
Several of these Merge Fields are contained within a datasource that is created for the document. The datasource has no data save for the merge field names - the merge itself takes place elsewhere in the process.
I attach the document to the datasource as below:
WordApplication.MailMerge.OpenDataSource(DataFilePath);
This loads the Merge Fields into the menu as desired, but all the periods have gone. I now have "HereIsAnExampleMerge_Field", which causes issues at other points in my application.
How can I prevent Word from removing these characters?
I don't think you can, because AFAIK the merge field names are modified to be valid bookmark names, which have certain restrictions.
(See, e.g. What are the limitations for bookmark names in Microsoft Word? for a discussion, except in this case I think Word will insert an "M" before an initial "_". Plus, names over 40 characters long are mangled to produce unique names.)
So what to do really depends on what issues you are facing.
You can retrieve the fieldnames that Word actually uses (i.e. its mangled names) from either ActiveDocument.MailMerge.DataSource.DataFields or .FieldNames. AFAIK these are in the same sequence as the fields in the data source, even though Word sometimes rearranges fields for display in the Edit Recipients dialog (e.g., it sorts field names that it considers to be address field names in a certain sequence). So that should allow you to match its field names with the originals.
Alternatively, if your code needs to insert { MERGEFIELD } fields in a Mail Merge Main Document and knows the sequence of the fields in the data source, you can use field numbers (1 for the first field retrieved by Word etc.), e.g. { MERGEFIELD 1 }. But beware, as I have never seen that facility documented anywhere by Microsoft.
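If the issue is matching your original names against Word's mangled ones, you can apply the same normalisation yourself and build a lookup. A sketch, assuming (based on observed behaviour, not any Microsoft documentation) that Word simply strips characters that are invalid in bookmark names, such as periods:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MergeFieldMap
{
    // Approximates Word's bookmark-name mangling by keeping only letters,
    // digits and underscores. This is an assumption inferred from the
    // "Here.Is.An.Example.Merge_Field" -> "HereIsAnExampleMerge_Field"
    // behaviour described above; it does not cover every edge case
    // (e.g. leading underscores or names over 40 characters).
    public static string Mangle(string name) =>
        new string(name.Where(ch => char.IsLetterOrDigit(ch) || ch == '_').ToArray());

    // Maps each mangled name back to the original field name.
    public static Dictionary<string, string> Build(IEnumerable<string> originals) =>
        originals.ToDictionary(Mangle, o => o);
}
```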
I am having an issue with importing a CSV file. The problem arises when an address field has multiple comma-separated values, e.g. home no, street no, town etc.
I tried to follow this article, http://forums.asp.net/t/1705264.aspx/1, but the problem was not solved, because a single field contains multiple comma-separated values.
Any idea or solution? I haven't found any help so far.
Thanks
Don't split the string yourself. Parsing CSV files is not trivial, and using str.Split(',') will give you a lot of headaches. Try using a more robust library like CsvHelper
- https://github.com/JoshClose/CsvHelper
If that doesn't work then the format of the file is probably incorrect. Check to make sure the text fields are properly quoted.
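For reference, typical CsvHelper usage looks like the sketch below (assuming a recent CsvHelper version; the record class and file path are illustrative). The library handles quoted fields containing commas, so an address like "home no, street no, town" stays in one column:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;

public class AddressRecord
{
    public string Name { get; set; }
    public string Address { get; set; }   // may contain commas if properly quoted
}

static class Importer
{
    public static List<AddressRecord> Load(string path)
    {
        using var reader = new StreamReader(path);
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);
        return csv.GetRecords<AddressRecord>().ToList();
    }
}
```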
Do you control the format of the CSV file? You could see about changing it to qualify the values by surrounding them with double quotes (""). Another option is to switch to a different delimiter like tabs.
Barring that, if the address is always in the same format, you could read in the file, and then inspect each record and manually concatenate columns 2, 3, and 4.
Are the fields surrounded by quotation marks? If so, split on "," rather than just ,.
Is the address field at the beginning or end of a record? If so, you can ignore the first x commas (if at the beginning) or split only the correct number of fields (if at the end).
Ideally, if you have control of the source file's creation, you would change the delimiter of either the address sub-fields, or the record fields.
I have a CSV file that is being exported from another system, whereby the column orders and definitions may change. I have found that FileHelpers is perfect for reading CSV files, but it seems you cannot use it unless you know the ordering of the columns before compiling the application. I want to know if it's at all possible to use FileHelpers in a non-typed way. Currently I am using it to read the file, but everything else I am doing by hand, so I have a class:
[DelimitedRecord(",")]
public class CSVRow
{
public string Content { get; set; }
}
This means each row ends up in Content, which is fine, as I then split the row myself. However, I am now having issues with this method because of commas inherent within the file, so a line might be:
"something",,,,0,,1,,"something else","","",,,"something, else"
My simple split on commas doesn't work on this string, as there is a comma in "something, else" which gets split. Obviously here is where something like FileHelpers comes in really handy, parsing these values and taking the quote marks into consideration. So is it possible to use FileHelpers in this way, without a known column definition, or at least by passing it a CSV string and getting a list of values back? Or is there any good library that does this?
You can use FileHelpers' RunTime records if you know (or can deduce) the order and definitions of the columns at runtime.
Otherwise, there are lots of questions about CSV libraries, e.g. Reading CSV files in C#
Edit: updated link. Original is archived here
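A sketch of the runtime-records approach, deducing the columns from the file's header row (assuming a recent FileHelpers version, where DelimitedClassBuilder lives in FileHelpers.Dynamic; the file path and class name are illustrative):

```csharp
using System.IO;
using System.Linq;
using FileHelpers;
using FileHelpers.Dynamic;

static class RuntimeCsv
{
    // Builds a record class at runtime from the CSV header, so the column
    // order and definitions don't need to be known at compile time.
    public static object[] Load(string path)
    {
        string[] headers = File.ReadLines(path).First().Split(',');

        var builder = new DelimitedClassBuilder("ImportRecord", ",");
        builder.IgnoreFirstLines = 1;                    // skip the header row
        foreach (var h in headers)
        {
            builder.AddField(h.Trim('"').Trim(), typeof(string));
            builder.LastField.FieldQuoted = true;        // handles "something, else"
            builder.LastField.QuoteMode = QuoteMode.OptionalForBoth;
        }

        var engine = new FileHelperEngine(builder.CreateRecordClass());
        return engine.ReadFile(path);
    }
}
```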