I have a CSV file from which I am loading data into a table.
Example:
"ABC",1,"Apple"
The requirement is that strings are wrapped in quotes (" ") and integers are unquoted.
The above line will split into three columns.
I am using the StreamReader class and splitting each line into columns with line.Split(','). It was working fine until, unfortunately, I got a record in the file with a comma inside the quoted string, like this:
"ABC,DEF,ghi",2,"Orange"
So instead of 3 columns the line now splits into five, and all the conversions fail.
Can anyone help me write C# code that replaces the commas inside the quotes with semicolons, without touching the commas between the columns?
Thank you.
Looks like your CSV might be RFC 4180 compliant. Use an RFC 4180 parser. Many of those exist. Check this one: http://www.codeproject.com/KB/database/CsvReader.aspx
This question is answered here:
Java: splitting a comma-separated string but ignoring commas in quotes
You could use the same regular expression, ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", with the C# method Regex.Split().
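Here is a minimal sketch of that approach in C#. One caveat when translating the regex from Java: .NET's Regex.Split includes the text of capturing groups in the result array, so the group inside the lookahead should be made non-capturing with (?: ... ) to get exactly one element per field.

```csharp
using System;
using System.Text.RegularExpressions;

class SplitDemo
{
    static void Main()
    {
        // Split only on commas followed by an even number of quotes,
        // i.e. commas that are NOT inside a quoted field.
        // Note the non-capturing group (?: ... ): with a capturing group,
        // .NET's Regex.Split would insert the captured text into the result.
        string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

        string line = "\"ABC,DEF,ghi\",2,\"Orange\"";
        string[] fields = Regex.Split(line, pattern);

        foreach (string f in fields)
            Console.WriteLine(f);
        // Prints three fields: "ABC,DEF,ghi" / 2 / "Orange"
        // (the quotes are still part of each field and must be trimmed separately)
    }
}
```

Note that this only splits correctly; it does not unquote the fields, and it will still misbehave on fields containing unescaped quotes or embedded line breaks.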
Related
I am writing data to a .csv file and I need the expression below to be read correctly:
csvWriter.WriteLine($"Security: {sec.ID} ; Serial number: {sec.SecuritySerialNo}");
The semicolon in the middle is used to put the serial number in a separate cell.
The problem is that the ID can also contain semicolons, which mess up the data, so I need to escape them.
I have tried to use replace:
csvWriter.WriteLine($"Security: {sec.ID.Replace(";", "")} ; Serial number: {sec.SecuritySerialNo}");
though deleting the semicolons is not what I want to achieve; I just want to escape them.
Let's emphasize again that the best way to create a CSV file is through a specialized CSV Parser library.
However, just to resolve the simple case presented in your question, you could add double quotes around each field. This should be enough to tell the Excel parser how to handle your fields.
So, export the fields in this way:
csvWriter.WriteLine($"\"Security: {sec.ID}\";\"Serial number: {sec.SecuritySerialNo}\"");
Notice that I have removed the blank spaces around the semicolon. This is important; otherwise Excel will not parse the line correctly.
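If the ID may itself contain a double quote, quoting the field alone is not enough: embedded quotes must be doubled, per the usual CSV convention. A minimal sketch (the Quote helper and the sample values are illustrative, not from the original post):

```csharp
using System;

class CsvWriteDemo
{
    // Wrap a field in double quotes and double any embedded quotes,
    // following the usual CSV escaping convention (RFC 4180).
    static string Quote(string field) =>
        "\"" + field.Replace("\"", "\"\"") + "\"";

    static void Main()
    {
        // Hypothetical values: the ID contains both a semicolon and a quote.
        string id = "AB;C\"X";
        string serial = "12345";

        string line = Quote($"Security: {id}") + ";" + Quote($"Serial number: {serial}");
        Console.WriteLine(line);
        // "Security: AB;C""X";"Serial number: 12345"
    }
}
```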
This is likely a very basic question that I could not, despite trying, find a satisfying answer to. Feel free to skip to the question at the end if you aren't interested in the background.
The task:
I wish to create an easy localisation solution for my Unity projects. After some initial research I concluded it would be best to use a .csv file read with a StreamReader, so that translators would only ever have to interact with the CSV table, where the information is neatly organized.
The main problem:
Due to the nature of the text, I need to account for line breaks and special characters in the actual fields. As such, I could not use the normal ReadLine() method.
I worked around this by using Read() and checking whether each line break falls inside a pair of text delimiters. But since I check for the text delimiter, I am afraid the code might run into an unescaped delimiter that is part of the normal in-cell text (since the normal text delimiter is the quotation mark).
So I switched the delimiter to §. But now, every time I open the file, I have to re-enter § as the text delimiter in OpenOffice Calc, probably due to encoding differences. This is annoying but not the end of the world.
My question:
How does OpenOffice (or similar software) usually tell in-cell commas/quotation marks apart from the ones used as delimiters? If I knew that, I could probably incorporate a similar approach in my reading of the file.
I've tried looking at the files in Notepad++, which revealed a difference in line breaks (\r instead of \r\n) and, obviously, that they sit within a text delimiter bracket. But when it comes to how it separates its delimiters from ones simply typed into the text of a field, I am drawing a blank.
Translation file in OpenOffice Calc:
Translation file in NotePad++, showing all characters:
I'd appreciate any insight or links on the topic.
From https://en.wikipedia.org/wiki/Comma-separated_values:
The CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain commas or even embedded line breaks.
LibreOffice Calc has a reasonable way to handle these things.
Use LF for line breaks and CR at the end of each record. It seems your code already handles this.
Use quotes to delimit strings when needed. If the string contains one or more quotes, then duplicate the quote to make it literal.
From the example in your question, it looks like you told Calc not to use any quotes as string delimiters. Why did you do this? When I tried it, LibreOffice (or Apache OpenOffice) showed the fields in different columns after opening the file saved that way.
The following example CSV file has fields that contain commas, quotes and line breaks.
When viewed in Calc:
          A         B
  1 |  1,",2",  |  3
  2 |  a        |  c
    |  b        |
Calc correctly reads and saves the file as shown below. The settings when saving are field delimiter , and string delimiter ", which are the defaults.
"1,"",2"",",3[CR]
"a
b",c[CR]
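To answer the "how does Calc tell them apart" part directly: a reader only needs the two rules above. Inside a quoted field, commas and line breaks are literal, and a doubled quote ("") is a literal quote; a single quote ends the field. A minimal sketch of such a reader, not production code, is:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class QuoteAwareCsv
{
    // Minimal RFC 4180-style reader: fields separated by commas, quoted
    // fields may contain commas, line breaks, and doubled quotes ("" -> ").
    public static List<List<string>> Parse(TextReader reader)
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        int c;

        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (inQuotes)
            {
                if (ch == '"')
                {
                    if (reader.Peek() == '"') { field.Append('"'); reader.Read(); } // "" -> literal "
                    else inQuotes = false;                                          // closing quote
                }
                else field.Append(ch);      // commas and line breaks kept verbatim
            }
            else if (ch == '"') inQuotes = true;
            else if (ch == ',') { record.Add(field.ToString()); field.Clear(); }
            else if (ch == '\r' || ch == '\n')
            {
                if (ch == '\r' && reader.Peek() == '\n') reader.Read(); // swallow CRLF
                record.Add(field.ToString()); field.Clear();
                records.Add(record); record = new List<string>();
            }
            else field.Append(ch);
        }
        if (field.Length > 0 || record.Count > 0)
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }

    static void Main()
    {
        // The example file from the answer above.
        string csv = "\"1,\"\",2\"\",\",3\r\n\"a\nb\",c\r\n";
        foreach (var rec in Parse(new StringReader(csv)))
            Console.WriteLine(string.Join(" | ", rec));
    }
}
```

Run against the example file above, this yields two records: the first with fields 1,",2", and 3, the second with a multi-line field a⏎b and c. Note that this sketch, like any standard CSV reader, still cannot cope with truly unescaped quotes inside a quoted field; those files are simply malformed.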
I am trying to parse a CSV file using Lumenworks CsvReader. Each data point is wrapped in double quotes, however, some values contain unescaped double quotes within the data, and other values contain commas within the data. The issue I am facing is that when I parse this with CsvReader, extra columns are ending up in my file due to Lumenworks seeing these characters as delimiters.
As you will read below, I've handled the issue with unescaped double quotes using a known solution, but this then results in the issue of extra columns being generated for the data with commas inside.
Example: 2 columns (each wrapped in quotes), with unescaped double quotes in one of the data points
"Name","Description"
"Bob","I am a "cool" guy"
When attempting to perform csvReader.ReadNextRecord(), instead of splitting this up into 2 columns, it splits it up into 4 columns:
Bob
I am a
cool
guy
I've used the solution provided in Reading csv having double quotes with lumenwork csv reader and it works quite well!
This is how I've implemented it:
char quotingCharacter = '\0';
char escapeCharacter = quotingCharacter;
char delimiter = ',';

using (CsvReader csvReader = new CsvReader(reader, false, delimiter, quotingCharacter, escapeCharacter, quotingCharacter, ValueTrimmingOptions.All))
{
    ...
    csvReader.ReadNextRecord();
    ...
}
HOWEVER, when I implement this fix for my CSV file, it then creates the same issue with columns that have commas inside:
Example: 2 columns (each wrapped in quotes), with commas in one of the data points, after implementing the double quote workaround
"Name","Description"
"Bob","I am related to Suzie, Betty, and Tommy"
With the aforementioned solution implemented, the csvReader now does not know to read the commas as part of the data. Instead of 2 columns, I am left with 4 columns:
Bob
I am related to Suzie
Betty
and Tommy
So the question is: how do I allow Lumenworks CsvReader to work around this bad data and have it interpret unescaped double quotes as the data itself? How can this be done in a way that doesn't then cause the commas within the data to be interpreted as the delimitation?
I am having an issue with importing a CSV file. The problem arises when an address field has multiple comma-separated values, e.g. home no, street no, town, etc.
I tried this article, http://forums.asp.net/t/1705264.aspx/1, but the problem was not solved, because a single field contains multiple comma-separated values.
Any idea or solution? I haven't been able to find any help.
Thanks
Don't split the string yourself. Parsing CSV files is not trivial, and using str.Split(',') will give you a lot of headaches. Try using a more robust library like CsvHelper
- https://github.com/JoshClose/CsvHelper
If that doesn't work then the format of the file is probably incorrect. Check to make sure the text fields are properly quoted.
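A minimal CsvHelper sketch for this case, assuming the file has a header row; the record class and its property names (Name, Address, City) are illustrative and must match your actual headers:

```csharp
using System;
using System.Globalization;
using System.IO;
using CsvHelper; // NuGet package: CsvHelper

class CsvHelperDemo
{
    // Illustrative record type; property names must match the header row.
    public class Row
    {
        public string Name { get; set; }
        public string Address { get; set; }
        public string City { get; set; }
    }

    static void Main()
    {
        // Inline sample data; in practice use new StreamReader(path).
        var data = "Name,Address,City\nBob,\"12 Main St, Apt 4\",Pasadena\n";
        using var reader = new StringReader(data);
        using var csv = new CsvReader(reader, CultureInfo.InvariantCulture);

        foreach (var row in csv.GetRecords<Row>())
            Console.WriteLine($"{row.Name} | {row.Address} | {row.City}");
        // The quoted address keeps its embedded comma: 12 Main St, Apt 4
    }
}
```

CsvHelper handles quoted fields, embedded commas, and escaped quotes for you, which is exactly the part that str.Split(',') gets wrong.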
Do you control the format of the CSV file? You could see about changing it to qualify the values by surrounding them with double quotes (""). Another option is to switch to a different delimiter like tabs.
Barring that, if the address is always in the same format, you could read in the file, and then inspect each record and manually concatenate columns 2, 3, and 4.
Are the fields surrounded by quotation marks? If so, split on "," rather than just ,.
Is the address field at the beginning or end of a record? If so, you can ignore the first x commas (if at the beginning) or split only the correct number of fields (if at the end).
Ideally, if you have control of the source file's creation, you would change the delimiter of either the address sub-fields, or the record fields.
So I'm reading a CSV file and splitting each line with "," as the delimiter,
but some fields are wrapped in quotes so that they are not split, because they contain a comma.
1530,Pasadena CA,"2008, 05/01","2005, 12/14"
Splitting on the comma alone, it would be:
1530
Pasadena CA
"2008
05/01"
"2005
12/14"
I need the split to take the quotes into consideration, so it's like this:
1530
Pasadena CA
"2008 05/01"
"2005 12/14"
Take a look at this page for a library that offers quick and easy CSV reading.
While it does add a new assembly reference, there is a class within the Visual Basic assemblies that should handle this well. At least then you know it's part of the framework. You can find details here: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx
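A short sketch of using that class from C# on the line from the question (add a reference to the Microsoft.VisualBasic assembly):

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // requires a reference to Microsoft.VisualBasic

class TextFieldParserDemo
{
    static void Main()
    {
        string line = "1530,Pasadena CA,\"2008, 05/01\",\"2005, 12/14\"";

        using var parser = new TextFieldParser(new StringReader(line));
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        parser.HasFieldsEnclosedInQuotes = true; // keep commas inside quoted fields

        while (!parser.EndOfData)
        {
            string[] fields = parser.ReadFields();
            foreach (string f in fields)
                Console.WriteLine(f);
        }
    }
}
```

This yields four fields: 1530, Pasadena CA, 2008, 05/01 and 2005, 12/14. Note that ReadFields() strips the surrounding quotes; if you need the quotes preserved in the output, re-add them after parsing.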