Replacing hardcoded strings with constants in C# - c#

I am trying to take all the hardcoded strings in a .cs file and load them from a constants file.
For instance
string capital="Washington";
should be loaded as
string capital=Constants.capital;
and that will be added in Constants.cs
public final const capital="Washington";
I need a Java/C# snippet to do this. I can't use any third-party tools. Any help on this?
EDIT:
After reading the comments and answers I get the feeling I was not clear. I just want a way to find all hardcoded string literals (anything enclosed in ""), replace each one with a reference such as Constants.capital, and add the corresponding constant to Constants.cs. This can be simple text processing as well.

A few hints that should get you started:
Assume that your string processor function is called ProcessStrings.
1) Include Constants.cs into the same project as the ProcessStrings function, so it gets compiled in with the refactoring code.
2) Reflect over your Constants class to build a Dictionary of language strings to constant names, something like:
Dictionary<String, String> constantList = new Dictionary<String, String>();
FieldInfo[] fields = typeof(Constants).GetFields(BindingFlags.Static | BindingFlags.Public);
String constantValue;
foreach (FieldInfo field in fields)
{
if (field.FieldType == typeof(String))
{
constantValue = (string)field.GetValue(null);
constantList.Add(constantValue, field.Name);
}
}
3) constantList should now contain the full list of Constant names, indexed by the string they represent.
4) Grab all the lines from the file (using File.ReadAllLines).
5) Now iterate over the lines. Something like the following should allow you to ignore lines that you shouldn't be processing.
//check if the line is a comment or xml comment
if (Regex.IsMatch(lines[idx], @"^\s*//"))
continue;
//check if the entry is an attribute
if (Regex.IsMatch(lines[idx], @"^\s*\["))
continue;
//check if the line is part of a block comment (assuming a * at the start of the line)
if (Regex.IsMatch(lines[idx], @"^\s*(/\*+|\*+)"))
continue;
//check if the line has been marked as ignored
//(this is something handy I use to mark a string to be ignored for any reason, just put //IgnoreString at the end of the line)
if (Regex.IsMatch(lines[idx], @"//\s*IgnoreString\s*$"))
continue;
6) Now, match any quoted strings on the line, then go through each match and check it for a few conditions. You can remove some of these conditions if needs be.
MatchCollection mC = Regex.Matches(lines[idx], "@?\"([^\"]+)\"");
foreach (Match m in mC)
{
if (
// Detect format insertion markers that are on their own and ignore them,
!Regex.IsMatch(m.Value, @"""\s*\{\d(:\d+)?\}\s*""") &&
//or check for strings of single character length that are not proper characters (-, /, etc)
!Regex.IsMatch(m.Value, @"""\s*\\?[^\w]\s*""") &&
//check for digit only strings, allowing for decimal places and an optional percentage or multiplier indicator
!Regex.IsMatch(m.Value, @"""[\d.]+[%|x]?""") &&
//check for array indexers (guarding both ends of the line so the lookups stay in range)
!(m.Index > 0 && m.Index + m.Length < lines[idx].Length && lines[idx][m.Index - 1] == '[' && lines[idx][m.Index + m.Length] == ']')
)
{
String toCheck = m.Groups[1].Value;
//look up the string we found in our list of constants
if (constantList.ContainsKey(toCheck))
{
String replaceString;
replaceString = "Constants." + constantList[toCheck];
//replace the line in the file
lines[idx] = lines[idx].Replace("\"" + m.Groups[1].Value + "\"", replaceString);
}
else
{
//See Point 8....
}
}
}
7) Now join the array of lines back up, and write it back to the file. That should get you most of the way.
8) To get it to generate constants for strings you don't already have an entry for, in the else block for looking up the string,
generate a name for the constant from the string (I just removed all special characters and spaces from the string and limited it to 10 words). Then use that name and the original string (from the toCheck variable in point 6) to make a constant declaration and insert it into Constants.cs.
Then when you run the function again, those new constants will be used.
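Point 8 could be sketched roughly as follows. This is only an illustration of the idea, not the code I actually used: the class ConstantNameGenerator and its two methods are hypothetical names, and the "Str" prefix for awkward names is my own assumption.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for point 8; all names here are illustrative only.
public static class ConstantNameGenerator
{
    // Builds an identifier from the text of a string literal: keeps letters and
    // digits, capitalizes each word and uses at most maxWords words.
    public static string MakeName(string value, int maxWords = 10)
    {
        // Split on anything that is not a letter or digit and drop empty pieces.
        string[] words = Regex.Split(value, @"[^A-Za-z0-9]+")
                              .Where(w => w.Length > 0)
                              .Take(maxWords)
                              .ToArray();
        string name = string.Concat(
            words.Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
        // An identifier cannot be empty or start with a digit (assumed "Str" prefix).
        if (name.Length == 0 || char.IsDigit(name[0]))
            name = "Str" + name;
        return name;
    }

    // Produces the line to insert into Constants.cs. rawLiteral is the text exactly
    // as it appeared between the quotes in the source (toCheck in point 6), so it is
    // already escaped and is emitted unchanged.
    public static string MakeDeclaration(string rawLiteral)
    {
        return "public const string " + MakeName(rawLiteral) + " = \"" + rawLiteral + "\";";
    }
}
```

For example, ConstantNameGenerator.MakeDeclaration(toCheck) could be called in the else block and the result appended inside the Constants class. Two different strings can produce the same name, so a real implementation would also need to check for collisions.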

I don't know if there is any such code available, but I am providing some guidelines on how it can be implemented.
You can write a macro/standalone application (I think macro is a better option)
Parse current document or all the files in the project/solution
Write a regular expression for finding the strings (what about strings in XAML?). Something like [string]([a-z A-Z0-9])["]([a-z A-Z0-9])["][;] -- this is not valid, I have just provided it for discussion
Extract the constant from code.
Check if similar string is already there in your static class
If not found, insert new entry in static class
Replace string with the variable name
Goto step 2

Is there a reason why you can't put these into a static class or just in a file in your application? You can put constants anywhere and as long as they are scoped properly you can access them from everywhere.

public const string capital = "Washington";
if const doesn't work in static class, then it would be
public static readonly string capital = "Washington";

If you really want to do it the way you describe, read the file with a StreamReader, split by \r\n, check if the first thing is "string", and then do all your replacements on that string element...
Make sure that every time you change that string declaration, you add the necessary lines to the other file.

You can create a class project for your constants, or if you have a helper class project, you can add a new class for your constants (Constants.cs).
public static class Constants
{
public const string CAPITAL_Washington = "Washington";
}
You can now use this:
string capital = Constants.CAPITAL_Washington;
You might as well name your constants quite specific.

Related

Turn A Full Path Into A Path With Environment Variables

I want to turn a full path into an environment variable path using c#
Is this even possible?
i.e.
C:\Users\Username\Documents\Text.txt -> %USERPROFILE%\Documents\Text.txt
C:\Windows\System32\cmd.exe -> %WINDIR%\System32\cmd.exe
C:\Program Files\Program\Program.exe -> %PROGRAMFILES%\Program\Program.exe
It is possible by going over all environment variables and checking which variable's value is contained in the string, then replacing that part of the string with the corresponding variable name surrounded by %.
First naive attempt:
string Tokenify(string path)
{
foreach (DictionaryEntry e in Environment.GetEnvironmentVariables())
{
int index = path.IndexOf(e.Value.ToString());
if (index > -1)
{
//we need to make sure we're not already inside a tokenized part.
int numDelimiters = path.Take(index).Count(c => c == '%');
if (numDelimiters % 2 == 0)
{
path = path.Replace(e.Value.ToString(), $"%{e.Key.ToString()}%");
}
}
}
return path;
}
The code currently makes a faulty assumption that the environment variable's value appears only once in the path. This needs to be corrected, but let's put that aside for now.
Also note that not all environment variables represent directories. For example, if I run this method on the string "6", I get "%PROCESSOR_LEVEL%". This can be remedied by checking for Directory.Exists() on the environment variable value before using it. This will probably also invalidate the need for checking whether we are already in a tokenized part of the string.
You may also want to sort the environment variables by length so to always use the most specific one. Otherwise you can end up with:
%HOMEDRIVE%%HOMEPATH%\AppData\Local\Folder
instead of:
%LOCALAPPDATA%\Folder
Updated code that prefers the longest variable:
string Tokenify(string path)
{
//first find all the environment variables that represent paths.
var validEnvVars = new List<KeyValuePair<string, string>>();
foreach (DictionaryEntry e in Environment.GetEnvironmentVariables())
{
string envPath = e.Value.ToString();
if (System.IO.Directory.Exists(envPath))
{
//this would be the place to add any other filters.
validEnvVars.Add(new KeyValuePair<string, string>(e.Key.ToString(), envPath));
}
}
//sort them by length so we always get the most specific one.
//if you are dealing with a large number of strings then orderedVars can be generated just once and cached.
var orderedVars = validEnvVars.OrderByDescending(kv => kv.Value.Length);
foreach (var kv in orderedVars)
{
//using regex just for case insensitivity. Otherwise just use string.Replace.
path = Regex.Replace(path, Regex.Escape(kv.Value), $"%{kv.Key}%", RegexOptions.IgnoreCase);
}
return path;
}
You may still want to add checks to avoid double-tokenizing parts of the string, but that is much less likely to be an issue in this version.
Also you might want to filter out some variables like drive roots, e.g. (%HOMEDRIVE%) or by any other criteria.
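One way to sidestep double-tokenizing altogether is to replace only a prefix of the path, and only when the match ends on a directory boundary. The following is a rough sketch of that idea rather than a drop-in replacement: PathTokenizer and TokenifyPrefix are hypothetical names, and the variables are passed in as a dictionary (an assumption made so the method is easy to test) instead of being read from Environment.GetEnvironmentVariables().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the names are illustrative only.
public static class PathTokenizer
{
    // Replaces the longest variable value found at the start of the path,
    // but only when the match ends on a directory boundary.
    public static string TokenifyPrefix(string path, IDictionary<string, string> variables)
    {
        foreach (var kv in variables.OrderByDescending(v => v.Value.Length))
        {
            // Ignore trailing separators so "C:\Windows\" and "C:\Windows" behave alike.
            string root = kv.Value.TrimEnd('\\', '/');
            if (root.Length == 0)
                continue;
            if (!path.StartsWith(root, StringComparison.OrdinalIgnoreCase))
                continue;
            // "C:\Users\Bob" must not match "C:\Users\Bobby\file.txt".
            bool atBoundary = path.Length == root.Length
                || path[root.Length] == '\\'
                || path[root.Length] == '/';
            if (atBoundary)
                return "%" + kv.Key + "%" + path.Substring(root.Length);
        }
        return path; // no variable applies
    }
}
```

Because at most one replacement is made and it always starts at index 0, an inserted %NAME% token is never scanned again. The Directory.Exists filter from the version above would still be applied when building the dictionary.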

c# Read/ Write CSV - excluding Comma in field Value [duplicate]

I am looking for suggestions on how to handle a csv file that is being created, then uploaded by our customers, and that may have a comma in a value, like a company name.
Some of the ideas we are looking at are: quoted Identifiers (value "," values ","etc) or using a | instead of a comma. The biggest problem is that we have to make it easy, or the customer won't do it.
There's actually a spec for CSV format, RFC 4180 and how to handle commas:
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
http://tools.ietf.org/html/rfc4180
So, to have values foo and bar,baz, you do this:
foo,"bar,baz"
Another important requirement to consider (also from the spec):
If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"
As others have said, you need to escape values that include quotes. Here’s a little CSV reader in C♯ that supports quoted values, including embedded quotes and carriage returns.
By the way, this is unit-tested code. I’m posting it now because this question seems to come up a lot and others may not want an entire library when simple CSV support will do.
You can use it as follows:
using System;
public class test
{
public static void Main()
{
using ( CsvReader reader = new CsvReader( "data.csv" ) )
{
foreach( string[] values in reader.RowEnumerator )
{
Console.WriteLine( "Row {0} has {1} values.", reader.RowIndex, values.Length );
}
}
Console.ReadLine();
}
}
Here are the classes. Note that you can use the Csv.Escape function to write valid CSV as well.
using System.IO;
using System.Text.RegularExpressions;
public sealed class CsvReader : System.IDisposable
{
public CsvReader( string fileName ) : this( new FileStream( fileName, FileMode.Open, FileAccess.Read ) )
{
}
public CsvReader( Stream stream )
{
__reader = new StreamReader( stream );
}
public System.Collections.IEnumerable RowEnumerator
{
get {
if ( null == __reader )
throw new System.ApplicationException( "I can't start reading without CSV input." );
__rowno = 0;
string sLine;
string sNextLine;
while ( null != ( sLine = __reader.ReadLine() ) )
{
while ( rexRunOnLine.IsMatch( sLine ) && null != ( sNextLine = __reader.ReadLine() ) )
sLine += "\n" + sNextLine;
__rowno++;
string[] values = rexCsvSplitter.Split( sLine );
for ( int i = 0; i < values.Length; i++ )
values[i] = Csv.Unescape( values[i] );
yield return values;
}
__reader.Close();
}
}
public long RowIndex { get { return __rowno; } }
public void Dispose()
{
if ( null != __reader ) __reader.Dispose();
}
//============================================
private long __rowno = 0;
private TextReader __reader;
private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );
private static Regex rexRunOnLine = new Regex( @"^[^""]*(?:""[^""]*""[^""]*)*""[^""]*$" );
}
public static class Csv
{
public static string Escape( string s )
{
if ( s.Contains( QUOTE ) )
s = s.Replace( QUOTE, ESCAPED_QUOTE );
if ( s.IndexOfAny( CHARACTERS_THAT_MUST_BE_QUOTED ) > -1 )
s = QUOTE + s + QUOTE;
return s;
}
public static string Unescape( string s )
{
if ( s.StartsWith( QUOTE ) && s.EndsWith( QUOTE ) )
{
s = s.Substring( 1, s.Length - 2 );
if ( s.Contains( ESCAPED_QUOTE ) )
s = s.Replace( ESCAPED_QUOTE, QUOTE );
}
return s;
}
private const string QUOTE = "\"";
private const string ESCAPED_QUOTE = "\"\"";
private static char[] CHARACTERS_THAT_MUST_BE_QUOTED = { ',', '"', '\n' };
}
The CSV format uses commas to separate values. Values which contain carriage returns, linefeeds, commas, or double quotes are surrounded by double quotes. Values that contain double quotes are quoted and each literal quote is escaped by an immediately preceding quote. For example, the 3 values:
test
list, of, items
"go" he said
would be encoded as:
test
"list, of, items"
"""go"" he said"
Any field can be quoted but only fields that contain commas, CR/NL, or quotes must be quoted.
There is no real standard for the CSV format, but almost all applications follow the conventions documented here. The RFC that was mentioned elsewhere is not a standard for CSV, it is an RFC for using CSV within MIME and contains some unconventional and unnecessary limitations that make it useless outside of MIME.
A gotcha that many CSV modules I have seen don't accommodate is the fact that multiple lines can be encoded in a single field which means you can't assume that each line is a separate record, you either need to not allow newlines in your data or be prepared to handle this.
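To illustrate the multi-line gotcha, here is a minimal sketch of a parser that walks the whole text one character at a time instead of splitting on line breaks first. CsvText.Parse is a hypothetical name, and the sketch assumes the conventions described above (comma separator, double-quote quoting, doubled quotes inside quoted fields); it is not a complete or tuned implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; the names are illustrative only.
public static class CsvText
{
    // Parses a whole CSV document, so a quoted field may contain line breaks.
    public static List<string[]> Parse(string text)
    {
        var rows = new List<string[]>();
        var row = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool rowHasContent = false; // true once the current row has any character

        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c == '"' && i + 1 < text.Length && text[i + 1] == '"')
                {
                    field.Append('"'); // doubled quote means one literal quote
                    i++;
                }
                else if (c == '"')
                    inQuotes = false;  // closing quote
                else
                    field.Append(c);   // commas and newlines are plain data here
            }
            else if (c == '"')
            {
                inQuotes = true;
                rowHasContent = true;
            }
            else if (c == ',')
            {
                row.Add(field.ToString());
                field.Clear();
                rowHasContent = true;
            }
            else if (c == '\r' || c == '\n')
            {
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n')
                    i++; // treat CRLF as a single line break
                row.Add(field.ToString());
                field.Clear();
                rows.Add(row.ToArray());
                row.Clear();
                rowHasContent = false;
            }
            else
            {
                field.Append(c);
                rowHasContent = true;
            }
        }
        if (rowHasContent) // last record without a trailing line break
        {
            row.Add(field.ToString());
            rows.Add(row.ToArray());
        }
        return rows;
    }
}
```

Note that in this sketch a blank line yields a record with a single empty field, and an unterminated quote simply swallows the rest of the input; real code would probably want to report that as an error.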
Put double quotes around strings. That is generally what Excel does.
Ala Eli,
you escape a double quote as two
double quotes. E.g.
"test1","foo""bar","test2"
You can put double quotes around the fields. I don't like this approach, as it adds another special character (the double quote). Just define an escape character (usually backslash) and use it wherever you need to escape something:
data,more data,more data\, even,yet more
You don't have to try to match quotes, and you have fewer exceptions to parse. This simplifies your code, too.
There is a library available through nuget for dealing with pretty much any well formed CSV (.net) - CsvHelper
Example to map to a class:
var csv = new CsvReader( textReader );
var records = csv.GetRecords<MyClass>();
Example to read individual fields:
var csv = new CsvReader( textReader );
while( csv.Read() )
{
var intField = csv.GetField<int>( 0 );
var stringField = csv.GetField<string>( 1 );
var boolField = csv.GetField<bool>( "HeaderName" );
}
Letting the client drive the file format:
, is the standard field delimiter, " is the standard value used to escape fields that contain a delimiter, quote, or line ending.
To use (for example) # for fields and ' for escaping:
var csv = new CsvReader( textReader );
csv.Configuration.Delimiter = "#";
csv.Configuration.Quote = '\'';
// read the file however meets your needs
In case you're on a *nix-system, have access to sed and there can be one or more unwanted commas only in a specific field of your CSV, you can use the following one-liner in order to enclose them in " as RFC4180 Section 2 proposes:
sed -r 's/([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)/\1"\2"\3/' inputfile
Depending on which field the unwanted comma(s) may be in you have to alter/extend the capturing groups of the regex (and the substitution).
The example above will enclose the fourth field (out of six) in quotation marks.
In combination with the --in-place-option you can apply these changes directly to the file.
In order to "build" the right regex, there's a simple principle to follow:
For every field in your CSV that comes before the field with the unwanted comma(s) you write one [^,]*, and put them all together in a capturing group.
For the field that contains the unwanted comma(s) you write (.*).
For every field after the field with the unwanted comma(s) you write one ,.* and put them all together in a capturing group.
Here is a short overview of different possible regexes/substitutions depending on the specific field. If not given, the substitution is \1"\2"\3.
([^,]*)(,.*) #first field, regex
"\1"\2 #first field, substitution
(.*,)([^,]*) #last field, regex
\1"\2" #last field, substitution
([^,]*,)(.*)(,.*,.*,.*) #second field (out of five fields)
([^,]*,[^,]*,)(.*)(,.*) #third field (out of four fields)
([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*) #fourth field (out of six fields)
If you want to remove the unwanted comma(s) with sed instead of enclosing them with quotation marks refer to this answer.
As mentioned in my comment to harpo's answer, his solution is good and works in most cases; however, in some scenarios where commas are directly adjacent to each other, it fails to split on the commas.
This is because of the Regex string behaving unexpectedly as a verbatim string.
In order to get this to behave correctly, all " characters in the regex string need to be escaped manually without using the verbatim escape.
I.e. the regex should be this, using manual escapes:
",(?=(?:[^\"\"]*\"\"[^\"\"]*\"\")*(?![^\"\"]*\"\"))"
which translates into ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
When using a verbatim string @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" it behaves as the following, as you can see if you debug the regex:
",(?=(?:[^"]*"[^"]*")*(?![^"]*"))"
So in summary, I recommend harpo's solution, but watch out for this little gotcha!
I've included into the CsvReader a little optional failsafe to notify you if this error occurs (if you have a pre-known number of columns):
if (_expectedDataLength > 0 && values.Length != _expectedDataLength)
throw new DataLengthException(string.Format("Expected {0} columns when splitting csv, got {1}", _expectedDataLength, values.Length));
This can be injected via the constructor:
public CsvReader(string fileName, int expectedDataLength = 0) : this(new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
_expectedDataLength = expectedDataLength;
}
Add a reference to Microsoft.VisualBasic (yes, it says VisualBasic, but it works in C# just as well; remember that in the end it is all just IL).
Use the Microsoft.VisualBasic.FileIO.TextFieldParser class to parse the CSV file. Here is the sample code:
Dim parser As TextFieldParser = New TextFieldParser("C:\mar0112.csv")
parser.TextFieldType = FieldType.Delimited
parser.SetDelimiters(",")
While Not parser.EndOfData
'Processing row
Dim fields() As String = parser.ReadFields
For Each field As String In fields
'TODO: Process field
Next
End While
'Close the parser only after all rows have been read
parser.Close()
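Since the question is about C#, the same idea might look roughly like this in C#. This is a sketch: the wrapper class TextFieldParserSketch and its ReadAll method are hypothetical names, and it reads from a TextReader rather than a file path purely so that it is easy to try with in-memory data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical wrapper; only TextFieldParser itself comes from the framework.
public static class TextFieldParserSketch
{
    // Reads every record from the reader; quoted fields may contain commas.
    public static List<string[]> ReadAll(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // strip quotes, keep embedded commas
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        } // the using block closes the parser after all rows have been read
        return rows;
    }
}
```

For a file, something like TextFieldParserSketch.ReadAll(new StreamReader("data.csv")) would be the equivalent call, where data.csv is a placeholder name.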
You can use alternative "delimiters" like ";" or "|" but simplest might just be quoting which is supported by most (decent) CSV libraries and most decent spreadsheets.
For more on CSV delimiters and a spec for a standard format for describing delimiters and quoting see this webpage
If you're interested in a more educational exercise on how to parse files in general (using CSV as an example), you may check out this article by Julian Bucknall. I like the article because it breaks things down into much smaller problems that are much less insurmountable. You first create a grammar, and once you have a good grammar, it's a relatively easy and methodical process to convert the grammar into code.
The article uses C# and has a link at the bottom to download the code.
If you feel like reinventing the wheel, the following may work for you:
public static IEnumerable<string> SplitCSV(string line)
{
var s = new StringBuilder();
bool escaped = false, inQuotes = false;
foreach (char c in line)
{
if (c == ',' && !inQuotes)
{
yield return s.ToString();
s.Clear();
}
else if (c == '\\' && !escaped)
{
escaped = true;
}
else if (c == '"' && !escaped)
{
inQuotes = !inQuotes;
}
else
{
escaped = false;
s.Append(c);
}
}
yield return s.ToString();
}
In Europe we had this problem much earlier than this question. In Europe we use a comma as the decimal separator. See the numbers below:
| American | Europe |
| ------------- | ------------- |
| 0.5 | 0,5 |
| 3.14159265359 | 3,14159265359 |
| 17.54 | 17,54 |
| 175,186.15 | 175.186,15 |
So it isn't possible to use the comma separator for CSV files. Because of that reason, the CSV files in Europe are separated by a semicolon (;).
Programs like Microsoft Excel can read files with a semicolon, and it's possible to switch the separator. You could even use a tab (\t) as the separator. See this answer from Super User.
Here's a neat little workaround:
You can use a Greek Lower Numeral Sign instead (U+0375)
It looks like this ͵
Using this method saves you a lot of resources too...
I know it's almost 13 years later, but we came across a similar situation where the client inputs us a CSV and has values with commas, there are 2 use cases:
If the client uses a Windows Excel client to write the CSV (usually the case in a Windows environment), then double quotes are automatically added around the value.
The actual text value of the CSV:
3786962,1st Meridian Care Services,John,"Person A,Person B, Person C, Person D",Voyager
If the client is sending you the excel programmatically, then he should adhere to RFC4180 and enclose the value with "quotes". example:
Col1, Col2, "a, b, c", Col4
Just use SoftCircuits.CsvParser on NuGet. It will handle all those details for you and efficiently handles very large files. And, if needed, it can even import/export objects by mapping columns to object properties. In addition, my testing showed it averages nearly 4 times faster than the popular CsvHelper.
You can read the csv file like this.
this makes use of splits and takes care of spaces.
ArrayList List = new ArrayList();
static ServerSocket Server;
static Socket socket;
static ArrayList<Object> list = new ArrayList<Object>();
public static void ReadFromXcel() throws FileNotFoundException
{
File f = new File("Book.csv");
Scanner in = new Scanner(f);
int count =0;
String[] date;
String[] name;
String[] Temp = new String[10];
String[] Temp2 = new String[10];
String[] numbers;
ArrayList<String[]> List = new ArrayList<String[]>();
HashMap m = new HashMap();
in.nextLine();
date = in.nextLine().split(",");
name = in.nextLine().split(",");
numbers = in.nextLine().split(",");
while(in.hasNext())
{
String[] one = in.nextLine().split(",");
List.add(one);
}
int xount = 0;
//Making sure the lines don't start with a blank
for(int y = 0; y<= date.length-1; y++)
{
if(!date[y].equals(""))
{
Temp[xount] = date[y];
Temp2[xount] = name[y];
xount++;
}
}
date = Temp;
name =Temp2;
int counter = 0;
while(counter < List.size())
{
String[] list = List.get(counter);
String sNo = list[0];
String Surname = list[1];
String Name = list[2];
for(int x = 3; x < list.length; x++)
{
m.put(numbers[x], list[x]);
}
Object newOne = new newOne(sNo, Name, Surname, m, false);
StudentList.add(s);
System.out.println(s.sNo);
counter++;
}
I generally URL-encode the fields which can have any commas or any special chars. And then decode it when it is being used/displayed in any visual medium.
(commas becomes %2C)
Every language should have methods to URL-encode and decode strings.
e.g., in java
URLEncoder.encode(myString,"UTF-8"); //to encode
URLDecoder.decode(myEncodedstring, "UTF-8"); //to decode
I know this is a very general solution and it might not be ideal for situation where user wants to view content of csv file, manually.
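For C#, a rough equivalent of the same trick could use Uri.EscapeDataString and Uri.UnescapeDataString. The wrapper class UrlEncodedCsv and its methods are hypothetical names used only for this sketch.

```csharp
using System;
using System.Linq;

// Hypothetical helper; only the Uri methods come from the framework.
public static class UrlEncodedCsv
{
    // Encodes every field so that no literal comma remains, then joins with commas.
    public static string EncodeRow(params string[] fields)
    {
        return string.Join(",", fields.Select(f => Uri.EscapeDataString(f)));
    }

    // Splits on commas (safe, because data commas are %2C) and decodes each field.
    public static string[] DecodeRow(string line)
    {
        return line.Split(',').Select(f => Uri.UnescapeDataString(f)).ToArray();
    }
}
```

With this, UrlEncodedCsv.EncodeRow("Acme, Inc.", "42") gives Acme%2C%20Inc.,42, and DecodeRow turns that back into the two original values.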
I usually do this in my CSV files parsing routines. Assume that 'line' variable is one line within a CSV file and all of the columns' values are enclosed in double quotes. After the below two lines execute, you will get CSV columns in the 'values' collection.
// The two lines below split the columns and trim the DOUBLE QUOTES around values but NOT within them
string trimmedLine = line.Trim(new char[] { '\"' });
List<string> values = trimmedLine.Split(new string[] { "\",\"" }, StringSplitOptions.None).ToList();
The simplest solution I've found is the one LibreOffice uses:
Replace all literal " by ”
Put double quotes around your string
You can also use the one that Excel uses:
Replace all literal " by ""
Put double quotes around your string
Notice other people recommended to do only step 2 above, but that doesn't work with lines where a " is followed by a ,, like in a CSV where you want to have a single column with the string hello",world, as the CSV would read:
"hello",world"
Which is interpreted as a row with two columns: hello and world"
public static IEnumerable<string> LineSplitter(this string line, char
separator, char skip = '"')
{
var fieldStart = 0;
for (var i = 0; i < line.Length; i++)
{
if (line[i] == separator)
{
yield return line.Substring(fieldStart, i - fieldStart);
fieldStart = i + 1;
}
else if (i == line.Length - 1)
{
yield return line.Substring(fieldStart, i - fieldStart + 1);
fieldStart = i + 1;
}
if (line[i] == '"')
for (i++; i < line.Length && line[i] != skip; i++) { }
}
if (line[line.Length - 1] == separator)
{
yield return string.Empty;
}
}
I used Csvreader library but by using that I got data by exploding from comma(,) in column value.
So If you want to insert CSV file data which contains comma(,) in most of the columns values, you can use below function.
Author link => https://gist.github.com/jaywilliams/385876
function csv_to_array($filename='', $delimiter=',')
{
if(!file_exists($filename) || !is_readable($filename))
return FALSE;
$header = NULL;
$data = array();
if (($handle = fopen($filename, 'r')) !== FALSE)
{
while (($row = fgetcsv($handle, 1000, $delimiter)) !== FALSE)
{
if(!$header)
$header = $row;
else
$data[] = array_combine($header, $row);
}
fclose($handle);
}
return $data;
}
I used papaParse library to have the CSV file parsed and have the key-value pairs(key/header/first row of CSV file-value).
here is example that I use:
https://codesandbox.io/embed/llqmrp96pm
it has dummy.csv file in there to have the CSV parsing demo.
I've used it within reactJS though it is easy and simple to replicate in app written with any language.
An example might help to show how commas can be displayed in a .csv file. Create a simple text file as follows:
Save this text file as a text file with suffix ".csv" and open it with Excel 2000 from Windows 10.
aa,bb,cc,d;d
"In the spreadsheet presentation, the below line should look like the above line except the below shows a displayed comma instead of a semicolon between the d's."
aa,bb,cc,"d,d", This works even in Excel
aa,bb,cc,"d,d", This works even in Excel 2000
aa,bb,cc,"d ,d", This works even in Excel 2000
aa,bb,cc,"d , d", This works even in Excel 2000
aa,bb,cc, " d,d", This fails in Excel 2000 due to the space before the 1st quote
aa,bb,cc, " d ,d", This fails in Excel 2000 due to the space before the 1st quote
aa,bb,cc, " d , d", This fails in Excel 2000 due to the space before the 1st quote
aa,bb,cc,"d,d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
aa,bb,cc,"d ,d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
aa,bb,cc,"d , d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
Rule: If you want to display a comma in a cell (field) of a .csv file:
"Start and end the field with a double quotes, but avoid white space before the 1st quote"
As this is about general practices, let's start from rules of thumb:
Don't use CSV; use XML with a library to read and write the XML file instead.
If you must use CSV, do it properly and use a free library to parse and store the CSV files.
To justify 1), most CSV parsers aren't encoding-aware, so if you aren't dealing with US-ASCII you are asking for trouble.
For example, Excel 2002 stores the CSV in the local encoding without any note about the encoding. The CSV standard isn't widely adopted :(.
On the other hand, the XML standard is well adopted and it handles encodings pretty well.
To justify 2), there are tons of CSV parsers around for almost every language, so there is no need to reinvent the wheel even if the solution looks pretty simple.
To name a few:
for Python use the built-in csv module
for Perl check CPAN and Text::CSV
for PHP use the built-in fgetcsv/fputcsv functions
for Java check the SuperCSV library
Really, there is no need to implement this by hand unless you are going to parse it on an embedded device.
First, let's ask ourselves, "Why do we feel the need to handle commas differently for CSV files?"
For me, the answer is, "Because when I export data into a CSV file, the commas in a field disappear and my field gets separated into multiple fields where the commas appear in the original data." (That is because the comma is the CSV field separator character.)
Depending on your situation, semi colons may also be used as CSV field separators.
Given my requirements, I can use a character, e.g., single low-9 quotation mark, that looks like a comma.
So, here's how you can do it in Go:
// Replace special CSV characters with single low-9 quotation mark
func Scrub(a interface{}) string {
s := fmt.Sprint(a)
s = strings.Replace(s, ",", "‚", -1)
s = strings.Replace(s, ";", "‚", -1)
return s
}
The second comma looking character in the Replace function is decimal 8218.
Be aware that if you have clients that may have ASCII-only text readers, this decimal 8218 character will not look like a comma. If this is your case, then I'd recommend surrounding the field containing the comma (or semicolon) with double quotes per RFC 4180: https://www.rfc-editor.org/rfc/rfc4180
Thank you others in this post.
I used the information here to create a function in JavaScript that will get csv output for an array of objects which may have property values containing commas.
like
rowsArray = [{obj1prop1: "foo", obj1prop2: "bar,baz"}, {obj2prop1: "qux", obj2prop2: "quux,corge,thud"}]
into
csvRowsArray = [{obj1prop1: "foo", obj1prop2: "\"bar,baz\""}, {...} ]
To use the commas in the values in a csv, the value needs to be wrapped in double quotes. And in order to have double quotes in the value in the json object, they just need to be escaped, i.e., \", backslash double quote. The escape is made here by subbing in a template literal and including the necessary quotes `"${row[key]}"`. The quotes are escaped when put in the object.
Here is my function:
const calculateTheCSVExport = (props) => {
if (props.rows === undefined) return;
let jsonRowsArray = props.rows;
// console.log(jsonRowsArray);
let csvRowsArrayNoCommasInObjectValues = [];
let csvCurrRowObject = {}
jsonRowsArray.forEach(row => {
Object.keys(row).forEach(key => {
// console.log(key, row[key])
if (row[key].indexOf(',') > -1) {
csvCurrRowObject = {...csvCurrRowObject, [key]: `"${row[key]}"`} // enclose value in escaped double quotes in JSON in order to export commas to csv correctly. see more: https://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file
} else {
csvCurrRowObject = {...csvCurrRowObject, [key]: row[key]}
}
});
csvRowsArrayNoCommasInObjectValues.push(csvCurrRowObject);
csvCurrRowObject = {};
})
// console.log(csvRowsArrayNoCommasInObjectValues)
return csvRowsArrayNoCommasInObjectValues;
}
I think the easiest solution to this problem is to have the customer open the CSV in Excel and then use Ctrl + H to replace all commas with whatever identifier you want. This is very easy for the customer and requires only one change in your code to read the delimiter of your choice.
Use a tab character (\t) to separate the fields.

Parsing an inconsistent log file

I have a log file that I want to parse and load into a database. I'm struggling with the best way to go about parsing it.
The log file is in the format Category: Information
Case Number: CASE01
User ID: JOSM
Software: Microsoft Word
Date Started: 21-01-2010
Date Ended: 22-01-2010
Thing is, there's other bits and pieces thrown into the log file that mean the information isn't always present on the same line. I also only want the information, not the category.
So far, I've tried sticking it all into an array separated by \r\n, but I have to know the index of the information I want in order to consistently retrieve it, and that changes. I've also tried feeding it through StreamReader and saying
if (line.Contains("Case Number"))
{
tbReport.AppendText("Case Number: " + line.Remove(0, 13) + "\r\n");
}
Which gets me the information I want, but makes it very hard to do anything with.
I feel I'm better off going down the array path, but I could do with some guidance on how to search the array for the category, and then parse the information.
Once I can parse it accurately, adding it into a database should be fairly straightforward. As it's my first time attempting this, I'd be interested in any tips or guidance as to the best way to go about this though.
Thanks.
This will give you a collection with all key/value pairs.
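List<KeyValuePair<string, string>> items = new List<KeyValuePair<string, string>>();
var line = reader.ReadLine();
while (line != null)
{
int pos = line.IndexOf(':');
if (pos >= 0) //skip lines that are not "Category: Information"
{
items.Add(new KeyValuePair<string, string>(line.Substring(0, pos), line.Substring(pos+1).Trim()));
}
line = reader.ReadLine();
}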
If you have a log class which contains all possible names as properties, you can use reflection instead:
class LogEntry
{
public string CaseNumber { get; set; }
public string UserID { get; set; } //matches "User ID" once the space is removed
public string Software{ get; set; }
public string DateStarted { get; set; }
public string DateEnded { get; set; }
}
List<LogEntry> items = new List<LogEntry>();
var line = reader.ReadLine();
var currentEntry = new LogEntry();
while (line != null)
{
if (line == "") //empty line = new log entry. Change to your delimiter.
{
items.Add(currentEntry);
currentEntry = new LogEntry();
}
int pos = line.IndexOf(':');
if (pos > 0) //skip blank lines and lines without a category
{
var name = line.Substring(0, pos).Replace(" ", string.Empty);
var value = line.Substring(pos+1).Trim();
var pi = currentEntry.GetType().GetProperty(name);
if (pi != null) //ignore categories that LogEntry has no property for
{
pi.SetValue(currentEntry, value, null);
}
}
line = reader.ReadLine();
}
items.Add(currentEntry); //don't lose the last entry
Note that I've not tested the code (just written it directly in here). You have to add error checking and such. The last alternative is not very performant as it is, but should do OK.
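For the original question of finding a value by its category, here is a minimal sketch along the same lines. It assumes every wanted line looks like Category: Information and that anything without a colon is noise to be skipped. LogParser and Parse are names made up for this example, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LogParser
{
    // Reads "Category: Information" lines into a case-insensitive lookup.
    // Assumption: lines without a colon are noise and can be skipped.
    public static Dictionary<string, string> Parse(TextReader reader)
    {
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int pos = line.IndexOf(':');
            if (pos <= 0)
                continue;

            string category = line.Substring(0, pos).Trim();
            string information = line.Substring(pos + 1).Trim();
            fields[category] = information; // a repeated category overwrites the earlier value
        }
        return fields;
    }
}
```

With this, fields["Case Number"] gives just the information part, wherever in the file that line happened to be. If one file holds several cases you would still need a delimiter to split them, as in the LogEntry version above.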
Sounds like a good candidate for regular expressions:
http://www.regular-expressions.info/dotnet.html
They're not too easy to learn, but once you have a basic understanding, they can't be beaten for this kind of task.
It's not really a simple answer, but have you thought about using a regular expression to parse the information out?
Regular expressions are kind of hardcore, but they can parse complicated files quite easily.
From what I can see, the format is:
A line starts with A-Z, then (a-z or A-Z or 0-9 or space) one or more times, followed by a : and a space, and then the value.
So if you make a regular expression for that (if you wait a while I will try to make one for you), you can test each line with it. If it matches, the same regular expression also lets us take out the "key" and the value. If it doesn't match, we just append the line to the value of the last key.
Beware that it's not totally foolproof, as a continuation line could happen to start this way too, but it's about the best we can do, I think.
As promised, here is a starting point for your regular expression:
^(?'key'[A-Z][a-zA-Z0-9 ]+):\s(?'value'.+)
To explain what it does, we need to go through each part:
^ ensures that a match starts at the beginning of a line.
(?'key' is the syntax to begin a named "capture" group. It gives us easy access to the "key" part of the match.
We start that group with [A-Z], a character class that matches any capital letter, but only one.
[a-zA-Z0-9 ]+ is like the previous class, but for all capital or small letters, digits and the space character; the plus after the class says it can match one or more times. (No commas are needed between the ranges; a comma inside the brackets would match a literal comma. A plain space is used rather than \s so that a key cannot run across a line break.)
Then we end the group and put in our : followed by one whitespace character (\s).
We then begin a new group, the value group, just like the key group.
In it we write . (which matches any character except a line break), with a + after it to make it match one or more.
I actually think that you can take the whole string and run it through
Regex.Matches with RegexOptions.Multiline (so that ^ matches at the start of every line), and loop over the matches.
Then take match.Groups["key"] and match.Groups["value"] and put them into your array. (Sorry, I don't have Visual Studio handy to test it out.)
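To make the Regex.Matches idea concrete, here is a small sketch. It assumes the whole log has been read into one string, and it writes the named groups as (?<key>...), which is equivalent to the (?'key'...) form. LogRegexParser and Parse are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LogRegexParser
{
    // Multiline makes ^ match at the start of every line, not only at the start of the input.
    private static readonly Regex LineRx = new Regex(
        @"^(?<key>[A-Z][a-zA-Z0-9 ]+):\s(?<value>.+)",
        RegexOptions.Multiline);

    public static List<KeyValuePair<string, string>> Parse(string logText)
    {
        var pairs = new List<KeyValuePair<string, string>>();
        foreach (Match m in LineRx.Matches(logText))
        {
            // Trim removes the '\r' that '.' captures when lines end in "\r\n".
            pairs.Add(new KeyValuePair<string, string>(
                m.Groups["key"].Value,
                m.Groups["value"].Value.Trim()));
        }
        return pairs;
    }
}
```

Lines that do not match are simply ignored here; appending them to the previous value, as suggested above, would need a line-by-line loop instead.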

Trim all chars off file name after first "_"

I'd like to trim these purchase order file names (a few examples below) so that everything after the first "_" is omitted.
INCOLOR_fc06_NEW.pdf
Keep: INCOLOR (write this to db as the VendorID) Remove: _fc06_NEW.pdf
NORTHSTAR_sc09.xls
Keep: NORTHSTAR (write this to db as the VendorID) Remove: _sc09.xls
Our scenario: The managers are uploading these files to our Intranet web server, to make them available to download/view etc. I'm using Brettles NeatUpload, and for each file uploaded, am writing the file's attributes into the PO table (SQL 2000). The first part of the file name will be written to the DB as a VendorID.
The naming convention for these files is consistent in that the first part of the file is always the vendor name (or Vendor ID) followed by an "_", then other unpredictable chars used to identify the type of Purchase Order, then the file extension, which is consistently either .xls, .XLS, .PDF, or .pdf.
I tried TrimEnd - but the array of chars that you have to provide ends up being long and can conflict with the part of the file name I want to keep. I have a feeling I'm not using TrimEnd properly.
What is the best way to use string.TrimEnd (or any other string manipulation in C#) that will strip off all chars after the first "_" ?
String s = "INCOLOR_fc06_NEW.pdf";
int index = s.IndexOf("_");
return index >= 0 ? s.Substring(0,index) : s;
I'll probably offend the anti-regex lobby, but here I go (ducking):
string stripped = Regex.Replace(filename, @"(?<=[^_]*)_.*",String.Empty);
This code will strip all extra characters after the first '_', unless there is no '_' in the string (then it will just return the original string).
It's one line of code. It's slower than the more elaborate IndexOf() algorithm, but when used in a non-performance-sensitive part of the code, it's a good solution.
Get your flame-throwers out...
TrimEnd removes trailing white space, or trailing occurrences of whichever characters you pass to it, and it stops at the first character that is not in that set. It won't help you here. Read more about TrimEnd here:
http://msdn.microsoft.com/en-us/library/system.string.trimend.aspx
Bnaffas code (with a small tweak):
String fileName = "INCOLOR_fc06_NEW.pdf";
int index = fileName.IndexOf("_");
return index >= 0 ? fileName.Substring(0, index) : fileName;
If you want to do something with the other parts, you could use a Split
string fileName = "INCOLOR_fc06_NEW.pdf";
string[] parts = fileName.Split('_');
public string StripOffStuff(string sInput)
{
int iIndex = sInput.IndexOf("_");
return (iIndex >= 0) ? sInput.Substring(0, iIndex) : sInput;
}
// Call it like:
string sNewString = StripOffStuff("INCOLOR_fc06_NEW.pdf");
I would go with the Substring approach, but to round out the available solutions, here's a LINQ approach just for fun:
string filename = "INCOLOR_fc06_NEW.pdf";
string result = new string(filename.TakeWhile(c => c != '_').ToArray());
It'll return the original string if no underscore is found.
To go with all the "alternative" solutions, here's the second one that I thought of (after substring):
string filename = "INCOLOR_fc06_NEW.pdf";
string stripped = filename.Split('_')[0];
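For comparison, here are the IndexOf, Split and TakeWhile variants from the answers above side by side, as a rough sketch. VendorId and its method names are made up for this example. One thing to keep in mind: when a name has no underscore at all, every variant returns the whole name, extension included.

```csharp
using System;
using System.Linq;

public static class VendorId
{
    // Substring up to the first underscore; the whole name if there is none.
    public static string ViaIndexOf(string fileName)
    {
        int index = fileName.IndexOf('_');
        return index >= 0 ? fileName.Substring(0, index) : fileName;
    }

    // Split always yields at least one element, so [0] is safe.
    public static string ViaSplit(string fileName)
    {
        return fileName.Split('_')[0];
    }

    // Takes characters until the first underscore is seen.
    public static string ViaTakeWhile(string fileName)
    {
        return new string(fileName.TakeWhile(c => c != '_').ToArray());
    }
}
```

All three give INCOLOR for INCOLOR_fc06_NEW.pdf. IndexOf does the least work; Split is handy if you also want the remaining parts.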

mcdonalds to ProperCase in C#

How would you convert names to proper case in C#?
I have a list of names that I'd like to proof.
For example: mcdonalds to McDonalds or o'brien to O'Brien.
You could consider using a search engine to help you. Submit a query and see how the results have capitalized the name.
I wrote the following extension methods. Feel free to use them.
public static class StringExtensions
{
public static string ToProperCase( this string original )
{
if( string.IsNullOrEmpty( original ) )
return original;
string result = _properNameRx.Replace( original.ToLower( CultureInfo.CurrentCulture ), HandleWord );
return result;
}
public static string WordToProperCase( this string word )
{
if( string.IsNullOrEmpty( word ) )
return word;
if( word.Length > 1 )
return Char.ToUpper( word[0], CultureInfo.CurrentCulture ) + word.Substring( 1 );
return word.ToUpper( CultureInfo.CurrentCulture );
}
private static readonly Regex _properNameRx = new Regex( @"\b(\w+)\b" );
private static readonly string[] _prefixes = { "mc" };
private static string HandleWord( Match m )
{
string word = m.Groups[1].Value;
foreach( string prefix in _prefixes )
{
if( word.StartsWith( prefix, StringComparison.CurrentCultureIgnoreCase ) )
return prefix.WordToProperCase() + word.Substring( prefix.Length ).WordToProperCase();
}
return word.WordToProperCase();
}
}
There is absolutely no way for a computer just to magically know that the first "D" in "McDonalds" should be capitalized. So, I think there are two choices.
Someone out there may have a piece of software or a library that will do this for you.
Barring that, your only choice is to take the following approach: First, I'd look up the name in a dictionary of words that have "interesting" capitalization. Obviously you'd have to provide this dictionary yourself, unless one exists already. Second, apply an algorithm that corrects some of the obvious ones, like Celtic names beginning with O' and Mac and Mc, although given a large enough pool of names, such an algorithm will undoubtedly have a lot of false positives. Lastly, capitalize the first letter of every name that doesn't meet the first two criteria.
The hard part of this is the algorithms to decide on the capitalization. The string manipulation itself is pretty easy. There isn't a perfect way, since there are no "rules" for cases. One strategy might be a set of rules, such as "capitalize the first letter...usually" and "capitalize the 3rd letter if the first two letters are mc...usually"
Starting with a dictionary of real names and comparing them to your own name for matches will help. You could also take a dictionary of real names, generate a Markov chain from it, and throw any new names at the Markov chain to determine the capitalization. That's a crazy, complicated solution.
The ultimate perfect solution is to use humans to correct the data.
Doing this requires that your program be able to interpret the English language to an extent, or at the very least be able to break a string down into a set of words. There is no API built into the .NET Framework that can achieve this.
However if there was, you could use the following code.
public string ProperCase(string str, Func<string,bool> isWord) {
var word = new StringBuilder();
var cur = new StringBuilder();
for ( var i = 0; i < str.Length; i++ ) {
cur.Append(cur.Length == 0 ? Char.ToUpper(str[i]) : str[i]);
if ( isWord(cur.ToString()) ) {
word.Append(cur.ToString());
cur.Length = 0;
}
}
if ( cur.Length > 0 ) {
word.Append(cur);
}
return word.ToString();
}
It's not a perfect solution but it gives you a general idea of the outline
You could check the lower/mixed case surname against a dictionary (file) that has the correct casings in it, then return the 'real' value from the dictionary.
I had a quick google to see if one exists, but to no avail!
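A minimal sketch of that dictionary lookup, assuming you supply the list of correctly cased names yourself. The two entries below are placeholders; a real list would be loaded from a file or table. SurnameCasing and Fix are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SurnameCasing
{
    // Hypothetical exception list, keyed case-insensitively.
    private static readonly Dictionary<string, string> Known =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "mcdonalds", "McDonalds" },
            { "o'brien", "O'Brien" },
        };

    public static string Fix(string name)
    {
        if (string.IsNullOrEmpty(name))
            return name;

        string known;
        if (Known.TryGetValue(name, out known))
            return known;

        // Fallback: capitalize the first letter and lower-case the rest.
        return char.ToUpperInvariant(name[0]) + name.Substring(1).ToLowerInvariant();
    }
}
```

Names that are not in the list only get the simple first-letter treatment, so the result is only as good as the list behind it.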
I'm planning on writing such a function, but will probably not go into too many edge cases... Below is pseudo-code with regexes for matching:
start with /\b[A-Z]+\b/ as set matching, so each sequence of letters up against a word boundary, match as a set.
if the string is all uppercase...
lower-case the string
upper-case the first letter
do the following beginning of string replacements
Vanb -> VanB
Vanh -> VanH
Mc? -> Mc? (uppercase wildcard character)
Mac[^kh] -> Mac? (uppercase wildcard match)
With the replaced whole-name string do matching against other replacement sets like...
"De La " -> "de la "
That should catch most cases for names in particular... but a nice database of common name casing would be very nice.
Here was my solution. This hard-codes the names into the program but with a little work you could keep a text file outside of the program and read in the name exceptions (i.e. Van, Mc, Mac) and loop through them.
public static String toProperName(String name)
{
if (name != null)
{
if (name.Length >= 2 && name.ToLower().Substring(0, 2) == "mc") // Changes mcdonald to "McDonald"
return "Mc" + Regex.Replace(name.ToLower().Substring(2), @"\b[a-z]", m => m.Value.ToUpper());
if (name.Length >= 3 && name.ToLower().Substring(0, 3) == "van") // Changes vanwinkle to "VanWinkle"
return "Van" + Regex.Replace(name.ToLower().Substring(3), @"\b[a-z]", m => m.Value.ToUpper());
return Regex.Replace(name.ToLower(), @"\b[a-z]", m => m.Value.ToUpper()); // Changes to title case but also fixes
// apostrophes like O'HARE or o'hare to O'Hare
}
return "";
}
CultureInfo cultureInfo = Thread.CurrentThread.CurrentCulture;
TextInfo textInfo = cultureInfo.TextInfo;
string txt = textInfo.ToTitleCase("texthere");
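A short sketch of what ToTitleCase does and does not do here. It only capitalizes the first letter of each word, so on its own it turns mcdonalds into Mcdonalds rather than McDonalds. The example uses the invariant culture instead of the thread's culture so that the result does not depend on the machine; TitleCaseDemo and ToTitle are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class TitleCaseDemo
{
    // Lower-case first: ToTitleCase leaves words that are entirely upper case unchanged.
    public static string ToTitle(string text)
    {
        TextInfo textInfo = CultureInfo.InvariantCulture.TextInfo;
        return textInfo.ToTitleCase(text.ToLowerInvariant());
    }
}
```

So ToTitle("JOHN SMITH") gives John Smith, but the Mc, Mac and Van cases still need one of the prefix or dictionary approaches above.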
