I have a requirement to output some of our ERP data to a very specific CSV format with an exact number of fields per record, most of which we won't be providing at this time (or which have default values). To support future changes, I decided to describe the CSV format in a custom class of strings (all fields are strings), mark the fields we are not currently using as readonly, and default them to the values that should go into those columns (most are String.Empty). So the class looks something like this:
private class CustomClass
{
    public string field1 = String.Empty;
    public readonly string field2 = String.Empty; // Not going to be used
    public string field3 = String.Empty;
    public readonly string field4 = "N/A";        // Not going to be used
    ...
}
Now, after I populate the used fields, I need to take this data and export a specifically formatted comma delimited string. So using other posts on StackOverflow I came up with the following function to add to the class:
public string ToCsvFields()
{
    StringBuilder sb = new StringBuilder();
    foreach (var f in typeof(CustomClass).GetFields())
    {
        if (sb.Length > 0)
            sb.Append(",");
        var x = f.GetValue(this);
        if (x != null)
            sb.Append("\"" + x.ToString() + "\"");
    }
    return sb.ToString();
}
This works and gives me the exact CSV output I need for each line when I call CustomClass.ToCsvFields(), and makes it pretty easy to maintain if the consumer of the CSV changes their column definition. But this line in particular makes me feel like something could go wrong with production code: var x = f.GetValue(this);
I understand what it is doing, but I generally shy away from "this" in my code; am I just being paranoid, and is this totally acceptable code for this purpose?
Is there an easier way to turn a string into a CSV-compatible value, e.g. escaping and quoting as necessary?
Currently I have this:
public static object ToCsv(object obj, CultureInfo cultureInfo, string delimiter)
{
    CsvConfiguration config = new(cultureInfo) { Encoding = Encoding.UTF8, Delimiter = delimiter, HasHeaderRecord = false, ShouldQuote = (_) => true };
    using var memoryStream = new MemoryStream();
    using var streamWriter = new StreamWriter(memoryStream);
    using var csvWriter = new CsvWriter(streamWriter, config);
    csvWriter.WriteField(obj);
    csvWriter.Flush();
    memoryStream.Position = 0;
    return Encoding.UTF8.GetString(memoryStream.ToArray());
}
Not only does this code seem like overkill, I am also concerned about the performance.
The code is used in a copy & paste event where a DataGrid fires an event for each individual cell, and I need to parse each individual cell. Depending on the number of rows/columns the user has selected, this piece of code could be called thousands of times (once for each cell).
CsvWriter and CsvConfiguration are part of the CsvHelper library: https://github.com/JoshClose/CsvHelper
This method is NOT designed to be used to build each field in a possible CSV file cell by cell. It is a one-off to CSV-parse a single value. It is a good implementation of that, as it uses the same logic that a full-file approach would use, but it would be a very poor implementation for writing a whole file, or for use repeatedly on many fields in a file.
For the readers at home, if you need to use this library to write a file please consult the documentation: Writing a CSV file
using (var writer = new StreamWriter("path\\to\\file.csv"))
using (var csv = new CsvWriter(writer, CultureInfo.InvariantCulture))
{
    csv.WriteRecords(records);
}
To adequately judge/assess performance, it would be worth showing the code that is calling this method. As with the code above, to build a file you would set up the streams and config once for the file, then use WriteRecords, which will iterate over the objects in your list and internally call csvWriter.WriteField(value); for each property of those objects.
Side note: that method really should declare its return type as string, not object.
The code is used in a copy & paste event where a DataGrid fires an event for each individual cell, and I need to parse each individual cell. Depending on the number of rows/columns the user has selected, this piece of code could be called thousands of times (once for each cell).
If performance is a problem, do not try to handle this on a cell-by-cell basis; alternatively, give your user an alternate way to paste in a larger set of data that you can parse into a CSV file and then programmatically assign to the underlying data.
As you are using third-party libraries (Telerik and CsvHelper), it is worth consulting their forums for specific advice on how to intercept a paste event for a bulk paste without being forced to handle the cells individually.
That being said, we can improve the performance by taking some of the internals from CsvHelper. Note that you have specified that all fields should be quoted with ShouldQuote = (_) => true, so we can simplify it to this:
public static string ToCsv(object obj, CultureInfo cultureInfo, string quote, string escapedQuote, bool alwaysQuote = true)
{
    var field = String.Format(cultureInfo, "{0}", obj);
    if (alwaysQuote || field.Contains(quote))
    {
        // double any embedded quote characters, then wrap the field in quotes
        // (when not always quoting, you would also need to quote fields that
        // contain the delimiter or a newline)
        field = field.Replace(quote, escapedQuote);
        return quote + field + quote;
    }
    return field;
}
At this level, when we are only dealing with one individual value at a time, a simple string replace is likely to be as efficient as or more efficient than a regular-expression solution.
This code was deconstructed from CsvHelper.WriteField.
I know you mention CsvHelper, but here is a method I put together to build a CSV "cell" using StringBuilder:
/// <summary>
/// StringBuilder Extension method - Escape cells, as they may potentially contain reserved characters
/// </summary>
/// <param name="sb">StringBuilder that is assembling the csv string</param>
/// <param name="val">Value string to be persisted to the cell</param>
/// <returns>StringBuilder, with the escaped data cell appended</returns>
internal static StringBuilder EscapeCell(this StringBuilder sb, string val)
{
    if (string.IsNullOrWhiteSpace(val)) return sb;
    // no need to escape if it does not contain , " \r or \n
    if (!val.Contains(",") && !val.Contains("\"") && !val.Contains("\r") && !val.Contains("\n"))
    {
        sb.Append(val);
        return sb;
    }
    // surround in quotes + any internal quotes need to be doubled -> ex.,"this is an ""example"" of an escaped cell",
    string escaped = val[0] == '\"'
        ? val.Substring(1, val.Length - 2).Replace("\"", "\"\"")
        : val.Replace("\"", "\"\"");
    sb.Append('\"').Append(escaped).Append('\"');
    return sb;
}
The idea is that you want to escape the entire cell if it contains a special character that may break the CSV structure, and any internal " needs to be normalized as "".
Using StringBuilder throughout means building the CSV string is as fast as it can be. Then write the CSV string to a file as needed.
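A quick usage sketch of the approach above, building one row from several cells (the escaping logic is repeated here so the snippet compiles on its own; the cell values are made up):

```csharp
using System;
using System.Text;

internal static class CsvCell
{
    // Same escaping logic as the EscapeCell extension above, repeated here
    // so this sketch is self-contained.
    internal static StringBuilder EscapeCell(this StringBuilder sb, string val)
    {
        if (string.IsNullOrWhiteSpace(val)) return sb;
        if (!val.Contains(",") && !val.Contains("\"") && !val.Contains("\r") && !val.Contains("\n"))
        {
            sb.Append(val);
            return sb;
        }
        string escaped = val[0] == '\"'
            ? val.Substring(1, val.Length - 2).Replace("\"", "\"\"")
            : val.Replace("\"", "\"\"");
        sb.Append('\"').Append(escaped).Append('\"');
        return sb;
    }

    static void Main()
    {
        // Build one CSV row from three cells, two of which need escaping.
        var sb = new StringBuilder();
        sb.EscapeCell("plain").Append(',')
          .EscapeCell("has,comma").Append(',')
          .EscapeCell("an \"example\" cell");
        Console.WriteLine(sb.ToString());
        // plain,"has,comma","an ""example"" cell"
    }
}
```

Because every call returns the same StringBuilder, the cells chain naturally and no intermediate strings are allocated until the final ToString().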
I'm trying to get certain strings out of a text file and put it in a variable.
This is what the structure of the text file looks like. Keep in mind this is just one line; each line looks like this and is separated by a blank line:
Date: 8/12/2013 12:00:00 AM Source Path: \\build\PM\11.0.64.1\build.11.0.64.1.FileServerOutput.zip Destination Path: C:\Users\Documents\.NET Development\testing\11.0.64.1\build.11.0.55.5.FileServerOutput.zip Folder Updated: 11.0.64.1 File Copied: build.11.0.55.5.FileServerOutput.zip
I wasn't entirely sure what to use for a delimiter for this text file, or even whether I should be using a delimiter at all, since the format could be subject to change.
So just a quick example of what I want to happen with this, is I want to go through and grab the Destination Path and store it in a variable such as strDestPath.
Overall the code I came up with so far is this:
//find the variables from the text file
string[] lines = File.ReadAllLines(GlobalVars.strLogPath);
Yeah, not much. I thought perhaps I'd read one line at a time and search for what I'm looking for in that line, but honestly I'm not 100% sure whether I should stick with that approach or not...
If you are unsure how large your file is, you should consider using ReadLines, which uses deferred execution, instead of ReadAllLines:
var lines = File.ReadLines(GlobalVars.strLogPath);
The ReadLines and ReadAllLines methods differ as follows:
When you use ReadLines, you can start enumerating the collection of strings before the whole collection is returned; when you use ReadAllLines, you must wait for the whole array of strings be returned before you can access the array. Therefore, when you are working with very large files, ReadLines can be more efficient.
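Applied to the log format in the question, a sketch building on ReadLines (the label strings "Destination Path:" and "Folder Updated:" are taken from the sample line; ExtractBetween is a helper invented here):

```csharp
using System;
using System.IO;
using System.Linq;

class FindDestPath
{
    // Extract the text between two labels on one log line; returns null if the
    // start label is absent. If the end label is missing, take the rest of the line.
    public static string ExtractBetween(string line, string startLabel, string endLabel)
    {
        int start = line.IndexOf(startLabel);
        if (start < 0) return null;
        start += startLabel.Length;
        int end = line.IndexOf(endLabel, start);
        if (end < 0) end = line.Length;
        return line.Substring(start, end - start).Trim();
    }

    static void Main()
    {
        string logPath = "log.txt"; // e.g. GlobalVars.strLogPath from the question

        // ReadLines streams the file, so we stop reading at the first match.
        string strDestPath = File.ReadLines(logPath)
            .Select(line => ExtractBetween(line, "Destination Path:", "Folder Updated:"))
            .FirstOrDefault(p => p != null);

        Console.WriteLine(strDestPath);
    }
}
```

Anchoring on the literal labels rather than a delimiter means the code keeps working even if the spacing between fields changes.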
As weird as it might sound, you should take a look at Log Parser. If you are free to set the file format, you could use one that fits with Log Parser and, believe me, it will make your life a lot easier.
Once you load the file with Log Parser you can use queries to get the information you want. If you don't mind using interop in your project, you can even add a COM reference and use it from any .NET project.
This sample reads a HUGE CSV file and makes a bulk copy to the database to perform the final steps there. This is not really your case, but it shows you how easy it is to do this with Log Parser:
COMTSVInputContextClass logParserTsv = new COMTSVInputContextClass();
COMSQLOutputContextClass logParserSql = new COMSQLOutputContextClass();
LogQueryClassClass logParser = new LogQueryClassClass(); // query object (missing from the original snippet; name per the MSUtil interop)

logParserTsv.separator = ";";
logParserTsv.fixedSep = true;

logParserSql.database = _sqlDatabaseName;
logParserSql.server = _sqlServerName;
logParserSql.username = _sqlUser;
logParserSql.password = _sqlPass;
logParserSql.createTable = false;
logParserSql.ignoreIdCols = true;

// query shortened for clarity purposes
string SelectPattern = @"Select TO_STRING(UserName),TO_STRING(UserID) INTO {0} From {1}";
string query = string.Format(SelectPattern, _sqlTable, _csvPath);
logParser.ExecuteBatch(query, logParserTsv, logParserSql);
Log Parser is one of those hidden gems Microsoft has that most people don't know about. I have used it to read IIS logs, CSV files, txt files, etc. You can even generate graphics!
Just check it here http://support.microsoft.com/kb/910447/en
Looks like you need to create a Tokenizer. Try something like this:
Define a list of token values:
List<string> gTkList = new List<string>() { "Date:", "Source Path:" }; // ...etc.
Create a Token class:
public class Token
{
    private readonly string _tokenText;
    private string _val;
    private int _begin, _end;

    public Token(string tk, int beg, int end)
    {
        this._tokenText = tk;
        this._begin = beg;
        this._end = end;
        this._val = String.Empty;
    }

    public string TokenText
    {
        get { return _tokenText; }
    }

    public string Value
    {
        get { return _val; }
        set { _val = value; }
    }

    public int IdxBegin
    {
        get { return _begin; }
    }

    public int IdxEnd
    {
        get { return _end; }
    }
}
Create a method to Find your Tokens:
List<Token> FindTokens(string str)
{
    List<Token> retVal = new List<Token>();
    if (!String.IsNullOrWhiteSpace(str))
    {
        foreach (string cd in gTkList)
        {
            int fIdx = str.IndexOf(cd);
            if (fIdx > -1)
                retVal.Add(new Token(cd, fIdx, fIdx + cd.Length));
        }
    }
    return retVal;
}
Then just do something like this:
foreach (string ln in lines)
{
    // returns ordered list of tokens
    var tkns = FindTokens(ln);
    for (int i = 0; i < tkns.Count; i++)
    {
        int len = (i == tkns.Count - 1) ? ln.Length - tkns[i].IdxEnd : tkns[i + 1].IdxBegin - tkns[i].IdxEnd;
        tkns[i].Value = ln.Substring(tkns[i].IdxEnd, len).Trim();
    }
    // Do something with the gathered values
    foreach (Token tk in tkns)
    {
        // stuff
    }
}
I have some .csv files which I am parsing before storing in database.
I would like to make the application more robust, and perform validation upon the .csv files before saving them to the database.
So I am asking you guys if you have some good links, code examples, patterns, or advice on how to do this.
I will paste an example of my .csv file below. The different data fields in the .csv file are separated by tabs. Each new row of data is on a new line.
I have been thinking a little about the things I should validate against and came up with the list below (I am very open to other suggestions, in case you have anything you think should be added to the list):
Correct file encoding.
That the file is not empty.
Correct number of lines/columns.
Correct number/text/date formats.
Correct number ranges.
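A sketch of what those checks could look like for one tab-separated line (the column count, field indexes, and range rules here are illustrative, loosely based on the sample data below; adjust them to your real format):

```csharp
using System;
using System.Globalization;

static class LineValidator
{
    // Illustrative validation of one tab-separated line:
    // 7 columns, column 1 a positive integer, column 4 required non-empty.
    public static bool IsValidLine(string line, out string error)
    {
        error = null;
        if (string.IsNullOrWhiteSpace(line)) { error = "empty line"; return false; }

        string[] fields = line.Split('\t');
        if (fields.Length != 7) { error = "expected 7 columns, got " + fields.Length; return false; }

        if (!int.TryParse(fields[0], NumberStyles.None, CultureInfo.InvariantCulture, out int id))
        { error = "column 1 is not a number"; return false; }

        if (id <= 0) { error = "column 1 out of range"; return false; }

        if (fields[3].Length == 0) { error = "required column 4 is empty"; return false; }

        return true;
    }
}
```

Returning an error message alongside the bool makes it easy to report which line and which rule failed instead of rejecting the whole file silently.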
This is what my .csv file looks like (a file with two lines; the data on one line is separated by tabs):
4523424 A123456 GT-P1000 mobile phone Samsung XSD1234 135354191325234
345353 A134211 A8181 mobile phome HTC S4112-ad3 111911911932343
The string representation of the above looks like:
"4523424\tA123456\tGT-P1000\tmobile phone\tSamsung\tXSD1234\t135354191325234\r
\n345353\tA134211\tA8181\tmobile phome\tHTC\tS4112-ad3\t111911911932343\r\n"
So do you have any good design, links, patterns, code examples, etc. on how to do this in C#?
I do it like this:
Create a class to hold each parsed line with expected type
internal sealed class Record {
    public int Field1 { get; set; }
    public DateTime Field2 { get; set; }
    public decimal? PossibleEmptyField3 { get; set; }
    ...
}
Create a method that parses a line into the record
public Record ParseRecord(string[] fields) {
    if (fields.Length < SomeLineLength)
        throw new MalformedLineException(...)

    var record = new Record();
    record.Field1 = int.Parse(fields[0], NumberStyles.None, CultureInfo.InvariantCulture);
    record.Field2 = DateTime.ParseExact(fields[1], "yyyyMMdd", CultureInfo.InvariantCulture);
    if (fields[2] != "")
        record.PossibleEmptyField3 = decimal.Parse(fields[2]...)
    return record;
}
Create a method parsing the entire file
public List<Record> ParseStream(Stream stream) {
    var tfp = new TextFieldParser(stream);
    ...
    try {
        while (!tfp.EndOfData) {
            records.Add(ParseRecord(tfp.ReadFields()));
        }
    }
    catch (FormatException ex) {
        ... // show error
    }
    catch (MalformedLineException ex) {
        ... // show error
    }
    return records;
}
And then I create a number of methods validating the fields
public void ValidateField2(IEnumerable<Record> records) {
    foreach (var invalidRecord in records.Where(x => x.Field2 < DateTime.Today))
        ... // show error
}
I have tried various tools, but since the pattern is straightforward they don't help much.
(You should use a tool to split the line into fields)
You can use FileHelpers, a free/open-source .NET library, to deal with CSV and many other file formats.
adrianm and Nipun Ambastha
Thank you for your response to my question.
I solved my problem by writing a solution to validate my .csv file myself.
It's quite possible a more elegant solution could be made by using adrianm's code. I didn't do that, but I do encourage you to give adrianm's code a look.
I am validating the list below.
Empty file
new FileInfo(dto.AbsoluteFileName).Length == 0
Wrong formatting of file lines.
string[] items = line.Split('\t');
if (items.Length == 20)
Wrong datatype in line fields.
int number;
bool isNumber = int.TryParse(dataRow.ItemArray[0].ToString(), out number);
Missing required line fields.
if (dataRow.ItemArray[4].ToString().Length < 1)
To work through the contents of the .csv file I based my code on this code example:
http://bytes.com/topic/c-sharp/answers/256797-reading-tab-delimited-file
Probably you should take a look at
http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader
We have been using this in our projects; it's quite robust and does what it says.
I have a .txt file which has about 500k entries, each separated by new line. The file size is about 13MB and the format of each line is the following:
SomeText<tab>Value<tab>AnotherValue<tab>
My problem is to find a certain "string", given as input to the program, in the first column of the file, and get the corresponding Value and AnotherValue from the other two columns.
The first column is not sorted, but the second and third column values in the file are actually sorted. But, this sorting is of no good use to me.
The file is static and does not change. I was thinking of using Regex.IsMatch() here, but I am not sure whether going line by line is the best approach.
If the lookup time would otherwise increase drastically, I could probably rearrange the first column (and hence un-sort the second and third columns). Any suggestions on how to implement this approach, or the above approach if required?
After locating the string, how should I fetch those two column values?
EDIT
I realized that there will be quite a bit of searching in the file for at least one request by the user. If I have an array of values to be found, how can I return some kind of dictionary with the corresponding values of the found matches?
Maybe with this code:
var myLine = File.ReadAllLines("filename")
    .Select(line => line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
    .Single(s => s[0] == "string to find");
myLine is an array of strings that represents a row. You may also use the .AsParallel() extension method for better performance.
How many times do you need to do this search?
Is the cost of some pre-processing on startup worth it if you save time on each search?
Is loading all the data into memory at startup feasible?
Parse the file into objects and stick the results into a hashtable?
I don't think Regex will help you more than any of the standard string options. You are looking for a fixed string value, not a pattern, but I stand to be corrected on that.
Update
Presuming that the "SomeText" is unique, you can use a dictionary like this
Data represents the values coming in from the file.
MyData is a class to hold them in memory.
public IEnumerable<string> Data = new List<string>() {
    "Text1\tValue1\tAnotherValue1\t",
    "Text2\tValue2\tAnotherValue2\t",
    "Text3\tValue3\tAnotherValue3\t",
    "Text4\tValue4\tAnotherValue4\t",
    "Text5\tValue5\tAnotherValue5\t",
    "Text6\tValue6\tAnotherValue6\t",
    "Text7\tValue7\tAnotherValue7\t",
    "Text8\tValue8\tAnotherValue8\t"
};

public class MyData {
    public String SomeText { get; set; }
    public String Value { get; set; }
    public String AnotherValue { get; set; }
}
[TestMethod]
public void ParseAndFind() {
    var dictionary = Data.Select(line =>
    {
        var pieces = line.Split(new char[] { '\t' }, StringSplitOptions.RemoveEmptyEntries);
        return new MyData {
            SomeText = pieces[0],
            Value = pieces[1],
            AnotherValue = pieces[2],
        };
    }).ToDictionary<MyData, string>(dat => dat.SomeText);

    Assert.AreEqual("AnotherValue3", dictionary["Text3"].AnotherValue);
    Assert.AreEqual("Value7", dictionary["Text7"].Value);
}
hth,
Alan
var firstFoundLine = File.ReadLines("filename").FirstOrDefault(s => s.StartsWith("string"));
if (firstFoundLine != null)
{
    char yourColumnDelimiter = '\t';
    var columnValues = firstFoundLine.Split(new[] { yourColumnDelimiter });
    var secondColumn = columnValues[1];
    var thirdColumn = columnValues[2];
}
File.ReadLines is better than File.ReadAllLines because you won't need to read the whole file, only up to the matching string: http://msdn.microsoft.com/en-us/library/dd383503.aspx
Parse this monstrosity into some sort of database.
SQL Server/MySQL would be preferable, but if you can't use them for various reasons, SQLite or even Access or Excel could work.
Doing that a single time is not hard.
After you are done with that, searching will become easy and fast.
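As a rough sketch of the one-time import plus an indexed lookup (this assumes the Microsoft.Data.Sqlite NuGet package and the tab-separated three-column layout from the question; the table, column, and file names are made up):

```csharp
using System;
using System.IO;
using Microsoft.Data.Sqlite;

public static class LookupDb
{
    // Split one "SomeText<tab>Value<tab>AnotherValue<tab>" line; null if malformed.
    public static (string Key, string Val, string Another)? ParseEntry(string line)
    {
        var parts = line.Split('\t');
        if (parts.Length < 3) return null;
        return (parts[0], parts[1], parts[2]);
    }

    public static void Import(string dataFile, string dbFile)
    {
        using var conn = new SqliteConnection($"Data Source={dbFile}");
        conn.Open();

        using (var create = conn.CreateCommand())
        {
            // The PRIMARY KEY gives us an index, so lookups avoid a full scan.
            create.CommandText =
                "CREATE TABLE IF NOT EXISTS entries (some_text TEXT PRIMARY KEY, value TEXT, another_value TEXT)";
            create.ExecuteNonQuery();
        }

        // One transaction for the whole 500k-row import keeps it fast.
        using var tx = conn.BeginTransaction();
        using var insert = conn.CreateCommand();
        insert.Transaction = tx;
        insert.CommandText = "INSERT OR REPLACE INTO entries VALUES ($k, $v, $a)";
        insert.Parameters.AddWithValue("$k", "");
        insert.Parameters.AddWithValue("$v", "");
        insert.Parameters.AddWithValue("$a", "");

        foreach (var line in File.ReadLines(dataFile))
        {
            var entry = ParseEntry(line);
            if (entry is null) continue;
            var (k, v, a) = entry.Value;
            insert.Parameters["$k"].Value = k;
            insert.Parameters["$v"].Value = v;
            insert.Parameters["$a"].Value = a;
            insert.ExecuteNonQuery();
        }
        tx.Commit();
    }
}
```

After the import, each search is a single parameterized SELECT against the primary key instead of a scan over 500k lines.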
GetLines(inputPath).FirstOrDefault(p => p.Split('\t')[0] == "SearchText")

private static IEnumerable<string> GetLines(string inputFile)
{
    string filePath = Path.Combine(Directory.GetCurrentDirectory(), inputFile);
    return File.ReadLines(filePath);
}
I am trying to take all the hardcoded strings in a .cs file and load them from a constants file.
For instance
string capital="Washington";
should be loaded as
string capital=Constants.capital;
and that will be added in Constants.cs
public const string capital = "Washington";
I need a Java/C# snippet to do this. I can't use any third-party tools. Any help on this?
EDIT:
After reading the comments and answers, I get the feeling I was not clear. I just want a way to find all hardcoded constants (which will be wrapped in ""), rip them out, replace them with references to Constants, and add the corresponding property in Constants.cs. This can be simple text processing as well.
A few hints that should get you started:
Assume that your string processor function is called ProcessStrings.
1) Include Constants.cs in the same project as the ProcessStrings function, so it gets compiled in with the refactoring code.
2) Reflect over your Constants class to build a dictionary mapping language strings to constant names, something like:
Dictionary<String, String> constantList = new Dictionary<String, String>();
FieldInfo[] fields = typeof(Constants).GetFields(BindingFlags.Static | BindingFlags.Public);
String constantValue;
foreach (FieldInfo field in fields)
{
    if (field.FieldType == typeof(String))
    {
        constantValue = (string)field.GetValue(null);
        constantList.Add(constantValue, field.Name);
    }
}
3) constantList should now contain the full list of Constant names, indexed by the string they represent.
4) Grab all the lines from the file (using File.ReadAllLines).
5) Now iterate over the lines. Something like the following should allow you to ignore lines that you shouldn't be processing.
//check if the line is a comment or xml comment
if (Regex.IsMatch(lines[idx], @"^\s*//"))
    continue;
//check if the entry is an attribute
if (Regex.IsMatch(lines[idx], @"^\s*\["))
    continue;
//check if the line is part of a block comment (assuming a * at the start of the line)
if (Regex.IsMatch(lines[idx], @"^\s*(/\*+|\*+)"))
    continue;
//check if the line has been marked as ignored
//(this is something handy I use to mark a string to be ignored for any reason, just put //IgnoreString at the end of the line)
if (Regex.IsMatch(lines[idx], @"//\s*IgnoreString\s*$"))
    continue;
6) Now, match any quoted strings on the line, then go through each match and check it for a few conditions. You can remove some of these conditions if need be.
MatchCollection mC = Regex.Matches(lines[idx], "@?\"([^\"]+)\"");
foreach (Match m in mC)
{
    if (
        // Detect format insertion markers that are on their own and ignore them,
        !Regex.IsMatch(m.Value, @"""\s*\{\d(:\d+)?\}\s*""") &&
        //or check for strings of single character length that are not proper characters (-, /, etc)
        !Regex.IsMatch(m.Value, @"""\s*\\?[^\w]\s*""") &&
        //check for digit only strings, allowing for decimal places and an optional percentage or multiplier indicator
        !Regex.IsMatch(m.Value, @"""[\d.]+[%|x]?""") &&
        //check for array indexers
        !(m.Index > 0 && m.Index + m.Length < lines[idx].Length && lines[idx][m.Index - 1] == '[' && lines[idx][m.Index + m.Length] == ']')
    )
    {
        String toCheck = m.Groups[1].Value;
        //look up the string we found in our list of constants
        if (constantList.ContainsKey(toCheck))
        {
            String replaceString;
            replaceString = "Constants." + constantList[toCheck];
            //replace the line in the file
            lines[idx] = lines[idx].Replace("\"" + m.Groups[1].Value + "\"", replaceString);
        }
        else
        {
            //See Point 8....
        }
    }
}
7) Now join the array of lines back up, and write it back to the file. That should get you most of the way.
8) To get it to generate constants for strings you don't already have an entry for: in the else block for looking up the string, generate a name for the constant from the string (I just removed all special characters and spaces from the string and limited it to 10 words). Then use that name and the original string (from the toCheck variable in point 6) to make a constant declaration and insert it into Constants.cs.
Then when you run the function again, those new constants will be used.
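The name-generation step in point 8 could be sketched like this (the cleanup rules and 10-word limit are just the ones mentioned above; the helper names are invented):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class ConstantNamer
{
    // Build a constant name from a literal: strip non-word characters,
    // split into words, take at most 10, and join them in PascalCase.
    // Note: a name starting with a digit would still need a prefix to be
    // a valid C# identifier.
    public static string MakeConstantName(string literal)
    {
        var words = Regex.Replace(literal, @"[^\w\s]", " ")
            .Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)
            .Take(10)
            .Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1));
        string name = string.Concat(words);
        return name.Length == 0 ? "EmptyString" : name;
    }

    // Produce the declaration line to append to Constants.cs.
    public static string MakeDeclaration(string literal)
    {
        return $"public const string {MakeConstantName(literal)} = \"{literal.Replace("\"", "\\\"")}\";";
    }
}
```

Feeding each unmatched toCheck value through MakeDeclaration and appending the result to Constants.cs means the next run of the processor picks the new constants up automatically.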
I don't know if there is any such code available, but I am providing some guidelines on how it can be implemented.
You can write a macro/standalone application (I think macro is a better option)
Parse current document or all the files in the project/solution
Write a regular expression for finding the strings (what about strings in XAML?), something like [string]([a-z A-Z0-9])["]([a-z A-Z0-9])["][;] (this is not valid; I have just provided it for discussion)
Extract the constant from code.
Check if similar string is already there in your static class
If not found, insert new entry in static class
Replace string with the variable name
Goto step 2
Is there a reason why you can't put these into a static class or just in a file in your application? You can put constants anywhere, and as long as they are scoped properly you can access them from everywhere.
public const string capital = "Washington";
If const doesn't work in a static class, then it would be:
public static readonly string capital = "Washington";
If you really want to do it the way you describe, read the file with a StreamReader, split by \r\n, check whether the first token is "string", and then do all your replacements on that string element...
Make sure that every time you change that string declaration, you add the necessary lines to the other file.
You can create a class project for your constants, or if you have a helper class project, you can add a new class for your constants (Constants.cs).
public static class Constants
{
    public const string CAPITAL_Washington = "Washington";
}
You can now use this:
string capital = Constants.CAPITAL_Washington;
You might as well name your constants quite specifically.