I have a CSV file exported from another system, and its column order and definitions may change. FileHelpers is perfect for reading CSV files, but it seems you cannot use it unless you know the column ordering before compiling the application. Is it at all possible to use FileHelpers in a non-typed way? Currently I use it to read the file and do everything else by hand, so I have a class:
[DelimitedRecord(",")]
public class CSVRow
{
public string Content { get; set; }
}
This means each row ends up in Content, which is fine, as I then split the row myself. However, I am now having issues with this approach because of commas inside the field values, so a line might be:
"something",,,,0,,1,,"something else","","",,,"something, else"
My simple split on commas doesn't work on this string, because the comma in "something, else" gets split too. Obviously this is where something like FileHelpers comes in really handy, parsing the values and taking the quote marks into account. So is it possible to use FileHelpers this way, without a known column definition, or at least to pass it a CSV string and get a list of values back? Or is there a good library that does this?
You can use FileHelpers' RunTime records if you know (or can deduce) the order and definitions of the columns at runtime.
Otherwise, there are lots of questions about CSV libraries, e.g. Reading CSV files in C#
Edit: updated link. Original is archived here
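If all you need is "pass in a CSV string, get a list of values back", here is a minimal sketch that does not use FileHelpers at all. It relies on TextFieldParser from the Microsoft.VisualBasic assembly that ships with .NET (on .NET Framework you have to add the reference yourself). The CsvLine class and Parse method names are made up for illustration.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class CsvLine
{
    // Parses one CSV line into its field values, honouring double-quoted fields.
    // No column definition is needed; you get back however many fields the line has.
    public static string[] Parse(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            // ReadFields returns null when there is nothing to read (e.g. an empty line).
            return parser.ReadFields() ?? new string[0];
        }
    }
}
```

For the sample line from the question, CsvLine.Parse should return 14 values, with the last one being something, else (comma intact, quotes removed).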
Related
I am new to FileHelpers and am trying to parse a comma-delimited text file. The parser in Excel has the concept of BOTH a field delimiter (such as a comma) AND a text delimiter (such as double quotes) for fields that contain funky stuff you want to pass through to the receiving field in the structure. The package generating the file may or may not include the text delimiter, depending on the content of the data in the field.
I have searched the documentation and this site, but have not found whether this is possible, although it seems to be a common function.
Is there a way to specify a text delimiter to the DelimitedFileEngine, the same way that you can define [DelimitedRecord(",")] to the class definition of the record you are parsing?
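For what it's worth, FileHelpers has a per-field FieldQuoted attribute (it also appears in a mapping class further down this page). A minimal sketch, assuming the FileHelpers package is referenced; the QuotedRow class and its field names are hypothetical:

```csharp
using FileHelpers;

[DelimitedRecord(",")]
public class QuotedRow
{
    // OptionalForRead: the field may or may not be wrapped in quotes in the
    // input, so both  abc  and  "a, b, c"  should be accepted for this column.
    [FieldQuoted('"', QuoteMode.OptionalForRead)]
    public string Name;

    [FieldQuoted('"', QuoteMode.OptionalForRead)]
    public string Notes;
}
```

The attribute goes on each field that may be quoted rather than on the engine or the record, so columns that are never quoted can simply leave it off.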
I am having an issue importing a CSV file. The problem arises when an address field contains multiple comma-separated values, e.g. home no., street no., town, etc.
I tried to follow this article, http://forums.asp.net/t/1705264.aspx/1, but it did not solve the problem, because a single field contains multiple comma-separated values.
Any ideas or solutions? I haven't found any help so far.
Thanks
Don't split the string yourself. Parsing CSV files is not trivial, and using str.Split(',') will give you a lot of headaches. Try using a more robust library like CsvHelper
- https://github.com/JoshClose/CsvHelper
If that doesn't work then the format of the file is probably incorrect. Check to make sure the text fields are properly quoted.
Do you control the format of the CSV file? You could see about changing it to qualify the values by surrounding them with double quotes (""). Another option is to switch to a different delimiter like tabs.
Barring that, if the address is always in the same format, you could read in the file, and then inspect each record and manually concatenate columns 2, 3, and 4.
Are the fields surrounded by quotation marks? If so, split on "," rather than just ,.
Is the address field at the beginning or end of a record? If so, you can ignore the first x commas (if at the beginning) or split only the correct number of fields (if at the end).
Ideally, if you have control of the source file's creation, you would change the delimiter of either the address sub-fields, or the record fields.
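If the file really cannot be changed and the address is the only field with stray commas, the "concatenate the address columns back together" idea can be sketched like this. It assumes you know how many fixed fields come before and after the address; the AddressSplitter name and the leadingCount / trailingCount parameters are invented for the example.

```csharp
using System;
using System.Linq;

static class AddressSplitter
{
    // Splits on every comma, then glues the surplus middle pieces back into one
    // address field. leadingCount / trailingCount are the numbers of ordinary
    // fields before and after the address (hypothetical, file-specific values).
    public static string[] Split(string line, int leadingCount, int trailingCount)
    {
        string[] parts = line.Split(',');
        int addressParts = parts.Length - leadingCount - trailingCount;
        if (addressParts < 1)
            throw new FormatException("Too few fields: " + line);

        string address = string.Join(",", parts.Skip(leadingCount).Take(addressParts));

        return parts.Take(leadingCount)
            .Concat(new[] { address })
            .Concat(parts.Skip(leadingCount + addressParts))
            .ToArray();
    }
}
```

For example, AddressSplitter.Split("1,John,12,High Street,Springfield,555-0100", 2, 1) gives four fields, the third being 12,High Street,Springfield. This only works when exactly one field can contain extra commas; with properly quoted input a real CSV parser is the better choice.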
I have a CSV file I need to process which is a bit of a nightmare. Essentially it is the following:
"Id","Name","Description"
"1","Test1","Test description text"
"2","Test2","<doc><style>body{font-family:"Calibri","sans-serif";}</style><p class="test_class"
name="test_name">Lots of word xdoc content here.</p></doc>"
"guid-xxxx-xxxx-xxxx-xxxx","Test3","Test description text 3"
I'm using the FileHelpers library to process the CSV rather than reinvent the wheel. However, because the description field contains unescaped Word xdoc XML, which itself contains quotes, the parser gets rather confused about the start and end points of each record.
The following is an example mapping class.
[DelimitedRecord(","), IgnoreFirst(1), IgnoreEmptyLines()]
public class CSVDoc
{
    #region Properties
    [FieldQuoted('"', QuoteMode.AlwaysQuoted), FieldTrim(TrimMode.Both)]
    public string Id;
    [FieldQuoted('"', QuoteMode.AlwaysQuoted), FieldTrim(TrimMode.Both)]
    public string Name;
    [FieldQuoted('"', QuoteMode.AlwaysQuoted), FieldTrim(TrimMode.Both)]
    public string Description;
    #endregion
}
I considered (despite my hate of regex for this kind of task) replacing all " with ' and then using ((?<=(^|',))'|'(?=($|,'))) pattern to replace all ' with " at the start and end of lines and where they are formatted ','. However, the dirty file contains some lines which end with a " and some css style attributes which are formatted ","
So now I'm left scratching my head trying to figure out how to do this and how it can be automated.
Any ideas?
You're going to have to re-invent the wheel, because that's not valid CSV or indeed a reasonable file at all - it doesn't have any sort of provably consistent escaping rules (e.g. we don't know if the plain-text columns are escaped correctly or not).
Your best bet is to ask the person producing this to fix the bug. The record should look like this, for example:
"2","Test2","<doc><style>body{font-family:""Calibri"",""sans-serif"";}</style><p class=""test_class""
name=""test_name"">Lots of word xdoc content here.</p></doc>"
Which your parser should handle fine, and which should not be hard for them to produce in a simple and efficient manner.
Failing that, you'll have to hand-code the parser to:
Read a line.
Check for an unescaped " (any " that isn't followed by a ", a comma, or whitespace).
If none found, parse as CSV.
If any found, parse as this horrible thing until you hit the line ending with "
It may be easier to look for < if that is consistently not used in the other lines. Or perhaps for <doc if it consistently identifies the correct rows.
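A rough sketch of the <doc variant of that idea, under some loud assumptions: every broken description starts with <doc and ends with </doc>" at the end of a line, there are exactly three columns, and the first two columns never contain the sequence ",". The DirtyCsvReader name is made up, and this is not a general CSV parser.

```csharp
using System;
using System.Collections.Generic;

static class DirtyCsvReader
{
    // Groups physical lines into logical records, then splits each record
    // into its three columns.
    public static IEnumerable<string[]> ReadRecords(IEnumerable<string> lines)
    {
        string pending = null;
        foreach (string line in lines)
        {
            string current = pending == null ? line : pending + "\n" + line;

            // Assumption: a record containing <doc is only complete once a
            // line ends with </doc>" ; until then keep accumulating lines.
            bool openDoc = current.Contains("<doc")
                && !current.EndsWith("</doc>\"", StringComparison.Ordinal);
            if (openDoc)
            {
                pending = current;
                continue;
            }

            pending = null;
            yield return SplitThreeColumns(current);
        }

        if (pending != null)
            throw new FormatException("Unterminated <doc record at end of input.");
    }

    static string[] SplitThreeColumns(string record)
    {
        // Split on the first two "," separators only; anything after that,
        // including further "," sequences, belongs to the description.
        string[] parts = record.Split(new[] { "\",\"" }, 3, StringSplitOptions.None);
        if (parts.Length != 3)
            throw new FormatException("Unexpected record: " + record);

        parts[0] = parts[0].TrimStart('"');
        if (parts[2].EndsWith("\"", StringComparison.Ordinal))
            parts[2] = parts[2].Substring(0, parts[2].Length - 1);
        return parts;
    }
}
```

Feeding it File.ReadLines(path) should yield one string[] of Id, Name and Description per record, with the inner quotes of the xdoc content left untouched.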
If you don't mind doing some pre-processing first, you can change the first and second "," on each line to "|" and then use FileHelpers to parse the file normally with | as the delimiter (assuming you don't have | in the last column, where the HTML tags are).
The pre-processing could be something like (Simple pseudo code) :
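// Needs: using System.Text; using System.Text.RegularExpressions;
var sb = new StringBuilder();
var regex = new Regex("\",\"");
// textFileLines: the lines of the file, e.g. from File.ReadLines(...)
foreach (string line in textFileLines)
{
    // Replace only the first two "," separators on each line with "|"
    sb.AppendLine(regex.Replace(line, "\"|\"", 2));
}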
I worked on the CSV-1203 File Format standard a few months ago, so the first thing to realise is that you're not dealing with a CSV file - even though it's named "xyz.CSV".
As others here have said, it will be easier to write your own reader; they're not too difficult. I too have a hatred of everything regex, but the good news is you can code any solution without ever using it.
A couple of things: There's a really weird thing Excel does to CSV files that begin with the two capital letters ID (without quotes). It thinks your CSV is a corrupted SYLK file! Try it.
For details of this issue and a detailed CSV File Format specification, please refer to http://mastpoint.curzonnassau.com/csv-1203
I am currently using the FileHelpers library (v2.0.0.0) to parse a CSV file. The CSV file is mapped to a class that has a handful of public properties, let's say there are N. The problem is that, by default, FileHelpers doesn't seem to correctly handle cases where the user specifies a CSV file that has more than N-1 commas. The remaining commas just get appended to the last property value.
I figured this must be configurable via FileHelpers' attributes, but I didn't see anything that would ignore fields that don't have a matching property in the record.
I looked into the RecordConditions, but using something like ExcludeIfEnds(",") looks like it will skip the line entirely if it ends with a comma, but I just want them stripped.
It's possible that my only recourse is to pre-process the file and strip any trailing commas, which is totally fine, but I wanted to know if FileHelpers can do this as well, and perhaps I'm just not seeing it in the docs.
Just an idea for a hack / workaround: you could create a property called "ExtraCommas" and add it to your class, so that extra commas are serialized there and not in the real properties of your object...
If the number of commas varies, I think you are out of luck and will have to do post-processing. However, if there is a fixed number of extra commas, you can add blank fields to your class to absorb them.
[FieldOrder(5)]
public string Blank1;
[FieldOrder(6)]
public string Blank2;
This doesn't really ever bite me because I don't use a FileHelpers class as a business class, I use it as an object to build the business class from. I store it for auditing. I think at one point I played around with making the fields for the Blanks private, not sure how that turned out.
Here is a custom method that you can use. It might not be the best solution, but it will solve the last comma problem. The code could certainly be optimized further; this is just to give you an idea of how to get around this kind of problem.
// Needs: using System; using System.IO;
static void Main()
{
    const string path = @"C:\Users\musab.shaheed\Desktop\csv.csv";
    foreach (string line in File.ReadLines(path))
    {
        // Strip trailing commas only; lines without one are left unchanged
        string fileText = line.TrimEnd(',');
        // store your data in here
        Console.WriteLine(fileText);
    }
}
I have to build a C# program that creates CSV files containing long numbers (held as strings in my program). The problem is that when I open the CSV file in Excel, the numbers appear like this:
1234E+ or 1234560000000 (the end of the number is 0)
How do I retain the formatting of the numbers? If I open the file as a text file, the numbers are formatted correctly.
Thanks in advance.
As others have mentioned, you can force the data to be a string. The best way to do that is ="1234567890123". The = makes the cell a formula, and the quotation marks make the enclosed value an Excel string literal. This will display all the digits, even beyond Excel's numeric precision limit, but the cell (generally) won't be usable directly in numeric calculations.
If you need the data to remain numeric, the best way is probably to create a native Excel file (.xls or .xlsx). Various approaches for that can be found in the solutions to this related Stack Overflow question.
If you don't mind having thousands separators, there is one other trick you can use, which is to make your C# program insert the thousands separators and surround the value in quotes: "1,234,567,890,123". Do not include a leading = (as that will force it to be a string). Note that in this case, the quotation marks are for protecting the commas in the CSV, not for specifying an Excel string literal.
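On the C# side, the two tricks above could be produced roughly like this. The ExcelCsvField class and its method names are made up for illustration. One assumption to flag: the formula variant is additionally wrapped as a regular quoted CSV field (inner quotes doubled), which Excel should read back as ="..."; check it against your Excel version.

```csharp
using System;
using System.Globalization;

static class ExcelCsvField
{
    // Standard CSV quoting: wrap in quotes and double any inner quotes.
    static string Quote(string value)
    {
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }

    // Emits the formula ="digits" so Excel treats the value as text
    // and displays every digit.
    public static string AsTextFormula(string digits)
    {
        return Quote("=\"" + digits + "\"");
    }

    // Emits the number with thousands separators and no leading =.
    // Here the surrounding quotes only protect the commas inside the field.
    public static string AsGroupedNumber(string digits)
    {
        long value = long.Parse(digits, CultureInfo.InvariantCulture);
        return Quote(value.ToString("N0", CultureInfo.InvariantCulture));
    }
}
```

AsTextFormula("1234567890123") produces "=""1234567890123""" and AsGroupedNumber("1234567890123") produces "1,234,567,890,123" (both including the outer quotes), ready to be joined with commas into a CSV line. The long.Parse call limits the second helper to values that fit in a 64-bit integer.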
Format those long numbers as strings by putting a ' (apostrophe) in front or making a formula out of it: ="1234567890123"
You can't. Excel stores numbers with fifteen digits of precision. If you don't mind not having the ability to perform calculations on the numbers from within Excel, you can store them as Text, and all of the digits will display.
When I generate data to be imported into Excel, I do not generate a CSV file if I want control over how the data are displayed. Instead, I write out an Excel file in which the cell properties are set appropriately. I do not know whether there is a C# library that would do that for you without requiring Excel to be installed on the machine generating the files, but it is something to look into.
My two cents:
I think it's important to realize there is a difference between "Data" and "Formatting". In this example you are kind of trying to store both in a data-only file. This will, as you can tell from the other answers, change the nature of the data (in other words, cause it to be converted to a string). A CSV file is a data-only file. You can do some tricks here and there to merge formatting in with the data, but to my way of thinking this essentially corrupts the data by merging it with non-data values, i.e. "Formatting".
If you really need to be able to store formatting information, I suggest that, if you have time to develop it, you switch to a file type capable of storing formatting info separately from the data. This problem sounds like a good candidate for an XML Spreadsheet solution. That way you can specify not only your data, but also its type and any formatting you choose to use.