C# performance - Regex vs. multiple Split [closed] - c#

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I'm working with a rather large set of strings I have to process as quickly as possible.
The format is quite fixed:
[name]/[type]:[list ([key] = [value],)]
or
[name]/[type]:[key]
I hope my representation is okay. What this means is that I have a word (in my case I call it Name), then a slash, followed by another word (I call it Type), then a colon, and it is either followed by a comma-separated list of key-value pairs (key = value), or a single key.
Name, Type cannot contain any whitespaces, however the key and value fields can.
Currently I'm using Regex to parse this data, and a split:
var regex = #"(\w+)\/(\w+):(.*)";
var r = new Regex(regex, RegexOptions.IgnoreCase | RegexOptions.Singleline);
var m = r.Match(Id);
if (m.Success) {
Name = m.Groups[1].Value;
Type= m.Groups[2].Value;
foreach (var intern in m.Groups[3].Value.Split(','))
{
var split = intern.Trim().Split('=');
if (split.Length == 2)
Items.Add(split[0], split[1]);
else if (split.Length == 1)
Items.Add(split[0], split[0]);
}
}
Now I know this is not the most optional case, but I'm not sure which would be the fastest:
Split the string first by the : then by / for the first element, and , for the second, then process the latter list and split again by =
Use the current mixture as it is
Use a completely regex-based
Of course I'm open to suggestions, my main goal is to achieve the fastest processing of this single string.

Its always fun to implement a custom parser. Obviously concerning code maintenance, Regex is probably the best choice, but if performance is an ultimate concern then you probably need a tailor made parser which even in the simplest syntaxes is quite a lot more of work.
I've whipped one up really quick (it might be a little hackish in some places) to see what it would take to implement one with some basic error recovery and information. This isn't tested in any way but I'd be curious, if its minimally functional, to know how well it stacks up with the Regex solution in terms of performance.
public class ParserOutput
{
public string Name { get; }
public string Type { get; }
public IEnumerable<Tuple<string, string>> KeyValuePairs { get; }
public bool ContainsKeyValuePairs { get; }
public bool HasErrors { get; }
public IEnumerable<string> ErrorDescriptions { get; }
public ParserOutput(string name, string type, IEnumerable<Tuple<string, string>> keyValuePairs, IEnumerable<string> errorDescriptions)
{
Name = name;
Type = type;
KeyValuePairs = keyValuePairs;
ContainsKeyValuePairs = keyValuePairs.FirstOrDefault()?.Item2?.Length > 0;
ErrorDescriptions = errorDescriptions;
HasErrors = errorDescriptions.Any();
}
}
public class CustomParser
{
private const char forwardSlash = '/';
private const char colon = ':';
private const char space = ' ';
private const char equals = '=';
private const char comma = ',';
StringBuilder buffer = new StringBuilder();
public ParserOutput Parse(string input)
{
var diagnosticsBag = new Queue<string>();
using (var enumerator = input.GetEnumerator())
{
var name = ParseToken(enumerator, forwardSlash, diagnosticsBag);
var type = ParseToken(enumerator, colon, diagnosticsBag);
var keyValuePairs = ParseListOrKey(enumerator, diagnosticsBag);
if (name.Length == 0)
{
diagnosticsBag.Enqueue("Input has incorrect format. Name could not be parsed.");
}
if (type.Length == 0)
{
diagnosticsBag.Enqueue("Input has incorrect format. Type could not be parsed.");
}
if (!keyValuePairs.Any() ||
input.Last() == comma /*trailing comma is error?*/)
{
diagnosticsBag.Enqueue("Input has incorrect format. Key / Value pairs could not be parsed.");
}
return new ParserOutput(name, type, keyValuePairs, diagnosticsBag);
}
}
private string ParseToken(IEnumerator<char> enumerator, char separator, Queue<string> diagnosticsBag)
{
buffer.Clear();
var allowWhitespaces = separator != forwardSlash && separator != colon;
while (enumerator.MoveNext())
{
if (enumerator.Current == space && !allowWhitespaces)
{
diagnosticsBag.Enqueue($"Input has incorrect format. {(separator == forwardSlash ? "Name" : "Type")} cannot contain whitespaces.");
}
else if (enumerator.Current != separator)
{
buffer.Append(enumerator.Current);
}
else
return buffer.ToString();
}
return buffer.ToString();
}
private IEnumerable<Tuple<string, string>> ParseListOrKey(IEnumerator<char> enumerator, Queue<string> diagnosticsBag)
{
buffer.Clear();
var isList = false;
while (true)
{
var key = ParseToken(enumerator, equals, diagnosticsBag);
var value = ParseToken(enumerator, comma, diagnosticsBag);
if (key.Length == 0)
break;
yield return new Tuple<string, string>(key, value);
if (!isList && value.Length != 0)
{
isList = true;
}
else if (isList && value.Length == 0)
{
diagnosticsBag.Enqueue($"Input has incorrect format: malformed [key / value] list.");
}
}
}
}

Related

How to validate or compare string by omitting certain part of it

I have a string as below
"a1/type/xyz/parts"
The part where 'xyz' exists is dynamic and varies accordingly at any size. I want to compare just the two strings are equal discarding the 'xyz' portion exactly.
For example I have string as below
"a1/type/abcd/parts"
Then my comparison has to be successful
I tried with regular expression as below. Though my knowledge on regular expressions is limited and it did not work. Probably something wrong in the way I used.
var regex = #"^[a-zA-Z]{2}/\[a-zA-Z]{16}/\[0-9a-zA-Z]/\[a-z]{5}/$";
var result = Regex.Match("mystring", regex).Success;
Another idea is to get substring of first and last part omitting the unwanted portion and comparing it.
The comparison should be successful by discarding certain portion of the string with effective code.
Comparison successful cases
string1: "a1/type/21412ghh/parts"
string2: "a1/type/eeeee122ghh/parts"
Comparison failure cases:
string1: "a1/type/21412ghh/parts"
string2: "a2/type/eeeee122ghh/parts/mm"
In short "a1/type/abcd/parts" in this part of string the non-bold part is static always.
Honestly, you could do this using regex, and pull apart the string. But you have a specified delimiter, just use String.Split:
bool AreEqualAccordingToMyRules(string input1, string input2)
{
var split1 = input1.Split('/');
var split2 = input2.Split('/');
return split1.Length == split2.Length // strings must have equal number of sections
&& split1[0] == split2[0] // section 1 must match
&& split1[1] == split2[1] // section 2 must match
&& split1[3] == split2[3] // section 4 must match
}
You can try Split (to get parts) and Linq (to exclude 3d one)
using System.Linq;
...
string string1 = "a1/type/xyz/parts";
string string2 = "a1/type/abcd/parts";
bool result = string1
.Split('/') // string1 parts
.Where((v, i) => i != 2) // all except 3d one
.SequenceEqual(string2 // must be equal to
.Split('/') // string2 parts
.Where((v, i) => i != 2)); // except 3d one
Here's a small programm using string functions to compare the parts before and after the middle part:
public class Program
{
public static void Main(string[] args)
{
Console.WriteLine(CutOutMiddle("a1/type/21412ghh/parts"));
Console.WriteLine("True: " + CompareNoMiddle("a1/type/21412ghh/parts", "a1/type/21412ghasdasdh/parts"));
Console.WriteLine("False: " + CompareNoMiddle("a1/type/21412ghh/parts", "a2/type/21412ghh/parts/someval"));
Console.WriteLine("False: " + CompareNoMiddle("a1/type/21412ghh/parts", "a1/type/21412ghasdasdh/parts/someappendix"));
}
private static bool CompareNoMiddle(string s1, string s2)
{
var s1CutOut = CutOutMiddle(s1);
var s2CutOut = CutOutMiddle(s2);
return s1CutOut == s2CutOut;
}
private static string CutOutMiddle(string val)
{
var fistSlash = val.IndexOf('/', 0);
var secondSlash = val.IndexOf('/', fistSlash+1);
var thirdSlash = val.IndexOf('/', secondSlash+1);
var firstPart = val.Substring(0, secondSlash);
var secondPart = val.Substring(thirdSlash, val.Length - thirdSlash);
return firstPart + secondPart;
}
}
returns
a1/type/parts
True: True
False: False
False: False
This solution should cover your case, as said by others, if you have a delimiter use it. In the function below you could change int skip for string ignore or something similar and within the comparison loop if(arrayStringOne[i] == ignore) continue;.
public bool Compare(string valueOne, string valueTwo, int skip) {
var delimiterOccuranceOne = valueOne.Count(f => f == '/');
var delimiterOccuranceTwo = valueTwo.Count(f => f == '/');
if(delimiterOccuranceOne == delimiterOccuranceTwo) {
var arrayStringOne = valueOne.Split('/');
var arrayStringTwo = valueTwo.Split('/');
for(int i=0; i < arrayStringOne.Length; ++i) {
if(i == skip) continue; // or instead of an index you could use a string
if(arrayStringOne[i] != arrayStringTwo[i]) {
return false;
}
}
return true;
}
return false;
}
Compare("a1/type/abcd/parts", "a1/type/xyz/parts", 2);

compare the characters in two strings

In C#, how do I compare the characters in two strings.
For example, let's say I have these two strings
"bc3231dsc" and "bc3462dsc"
How do I programically figure out the the strings
both start with "bc3" and end with "dsc"?
So the given would be two variables:
var1 = "bc3231dsc";
var2 = "bc3462dsc";
After comparing each characters from var1 to var2, I would want the output to be:
leftMatch = "bc3";
center1 = "231";
center2 = "462";
rightMatch = "dsc";
Conditions:
1. The strings will always be a length of 9 character.
2. The strings are not case sensitive.
The string class has 2 methods (StartsWith and Endwith) that you can use.
After reading your question and the already given answers i think there are some constraints are missing, which are maybe obvious to you, but not to the community. But maybe we can do a little guess work:
You'll have a bunch of string pairs that should be compared.
The two strings in each pair are of the same length or you are only interested by comparing the characters read simultaneously from left to right.
Get some kind of enumeration that tells me where each block starts and how long it is.
Due to the fact, that a string is only a enumeration of chars you could use LINQ here to get an idea of the matching characters like this:
private IEnumerable<bool> CommonChars(string first, string second)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
var charsToCompare = first.Zip(second, (LeftChar, RightChar) => new { LeftChar, RightChar });
var matchingChars = charsToCompare.Select(pair => pair.LeftChar == pair.RightChar);
return matchingChars;
}
With this we can proceed and now find out how long each block of consecutive true and false flags are with this method:
private IEnumerable<Tuple<int, int>> Pack(IEnumerable<bool> source)
{
if (source == null)
throw new ArgumentNullException("source");
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
bool current = iterator.Current;
int index = 0;
int length = 1;
while (iterator.MoveNext())
{
if(current != iterator.Current)
{
yield return Tuple.Create(index, length);
index += length;
length = 0;
}
current = iterator.Current;
length++;
}
yield return Tuple.Create(index, length);
}
}
Currently i don't know if there is an already existing LINQ function that provides the same functionality. As far as i have already read it should be possible with SelectMany() (cause in theory you can accomplish any LINQ task with this method), but as an adhoc implementation the above was easier (for me).
These functions could then be used in a way something like this:
var firstString = "bc3231dsc";
var secondString = "bc3462dsc";
var commonChars = CommonChars(firstString, secondString);
var packs = Pack(commonChars);
foreach (var item in packs)
{
Console.WriteLine("Left side: " + firstString.Substring(item.Item1, item.Item2));
Console.WriteLine("Right side: " + secondString.Substring(item.Item1, item.Item2));
Console.WriteLine();
}
Which would you then give this output:
Left side: bc3
Right side: bc3
Left side: 231
Right side: 462
Left side: dsc
Right side: dsc
The biggest drawback is in someway the usage of Tuple cause it leads to the ugly property names Item1 and Item2 which are far away from being instantly readable. But if it is really wanted you could introduce your own simple class holding two integers and has some rock-solid property names. Also currently the information is lost about if each block is shared by both strings or if they are different. But once again it should be fairly simply to get this information also into the tuple or your own class.
static void Main(string[] args)
{
string test1 = "bc3231dsc";
string tes2 = "bc3462dsc";
string firstmatch = GetMatch(test1, tes2, false);
string lasttmatch = GetMatch(test1, tes2, true);
string center1 = test1.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
string center2 = test2.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
}
public static string GetMatch(string fist, string second, bool isReverse)
{
if (isReverse)
{
fist = ReverseString(fist);
second = ReverseString(second);
}
StringBuilder builder = new StringBuilder();
char[] ar1 = fist.ToArray();
for (int i = 0; i < ar1.Length; i++)
{
if (fist.Length > i + 1 && ar1[i].Equals(second[i]))
{
builder.Append(ar1[i]);
}
else
{
break;
}
}
if (isReverse)
{
return ReverseString(builder.ToString());
}
return builder.ToString();
}
public static string ReverseString(string s)
{
char[] arr = s.ToCharArray();
Array.Reverse(arr);
return new string(arr);
}
Pseudo code of what you need..
int stringpos = 0
string resultstart = ""
while not end of string (either of the two)
{
if string1.substr(stringpos) == string1.substr(stringpos)
resultstart =resultstart + string1.substr(stringpos)
else
exit while
}
resultstart has you start string.. you can do the same going backwards...
Another solution you can use is Regular Expressions.
Regex re = new Regex("^bc3.*?dsc$");
String first = "bc3231dsc";
if(re.IsMatch(first)) {
//Act accordingly...
}
This gives you more flexibility when matching. The pattern above matches any string that starts in bc3 and ends in dsc with anything between except a linefeed. By changing .*? to \d, you could specify that you only want digits between the two fields. From there, the possibilities are endless.
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
List<string> common_str = commonStrings(s1,s2);
foreach ( var s in common_str)
Console.WriteLine(s);
}
static public List<string> commonStrings(string s1, string s2){
int len = s1.Length;
char [] match_chars = new char[len];
for(var i = 0; i < len ; ++i)
match_chars[i] = (Char.ToLower(s1[i])==Char.ToLower(s2[i]))? '#' : '_';
string pat = new String(match_chars);
Regex regex = new Regex("(#+)", RegexOptions.Compiled);
List<string> result = new List<string>();
foreach (Match match in regex.Matches(pat))
result.Add(s1.Substring(match.Index, match.Length));
return result;
}
}
for UPDATE CONDITION
using System;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
int len = 9;//s1.Length;//cond.1)
int l_pos = 0;
int r_pos = len;
for(int i=0;i<len && Char.ToLower(s1[i])==Char.ToLower(s2[i]);++i){
++l_pos;
}
for(int i=len-1;i>0 && Char.ToLower(s1[i])==Char.ToLower(s2[i]);--i){
--r_pos;
}
string leftMatch = s1.Substring(0,l_pos);
string center1 = s1.Substring(l_pos, r_pos - l_pos);
string center2 = s2.Substring(l_pos, r_pos - l_pos);
string rightMatch = s1.Substring(r_pos);
Console.Write(
"leftMatch = \"{0}\"\n" +
"center1 = \"{1}\"\n" +
"center2 = \"{2}\"\n" +
"rightMatch = \"{3}\"\n",leftMatch, center1, center2, rightMatch);
}
}

Best practice for parsing and validating mobile number

I wonder what the best practice for parsing and validating a mobile number before sending a text is. I've got code that works, but I'd like to find out better ways of doing it (as my last question, this is part of my early new years resolution to write better quality code!).
At the moment we are very forgiving when the user enters the number on the form, they can enter things like "+44 123 4567890", "00441234567890", "0123456789", "+44(0)123456789", "012-345-6789" or even "haven't got a phone".
However, to send the text the format must be 44xxxxxxxxxx (this is for UK mobiles only), so we need to parse it and validate it before we can send. Below is the code that I have for now (C#, asp.net), it would be great if anyone had any ideas on how to improve it.
Thanks,
Annelie
private bool IsMobileNumberValid(string mobileNumber)
{
// parse the number
_mobileNumber = ParsedMobileNumber(mobileNumber);
// check if it's the right length
if (_mobileNumber.Length != 12)
{
return false;
}
// check if it contains non-numeric characters
if(!Regex.IsMatch(_mobileNumber, #"^[-+]?[0-9]*\.?[0-9]+$"))
{
return false;
}
return true;
}
private string ParsedMobileNumber(string number)
{
number = number.Replace("+", "");
number = number.Replace(".", "");
number = number.Replace(" ", "");
number = number.Replace("-", "");
number = number.Replace("/", "");
number = number.Replace("(", "");
number = number.Replace(")", "");
number = number.Trim(new char[] { '0' });
if (!number.StartsWith("44"))
{
number = "44" + number;
}
return number;
}
EDIT
Here's what I ended up with:
private bool IsMobileNumberValid(string mobileNumber)
{
// remove all non-numeric characters
_mobileNumber = CleanNumber(mobileNumber);
// trim any leading zeros
_mobileNumber = _mobileNumber.TrimStart(new char[] { '0' });
// check for this in case they've entered 44 (0)xxxxxxxxx or similar
if (_mobileNumber.StartsWith("440"))
{
_mobileNumber = _mobileNumber.Remove(2, 1);
}
// add country code if they haven't entered it
if (!_mobileNumber.StartsWith("44"))
{
_mobileNumber = "44" + _mobileNumber;
}
// check if it's the right length
if (_mobileNumber.Length != 12)
{
return false;
}
return true;
}
private string CleanNumber(string phone)
{
Regex digitsOnly = new Regex(#"[^\d]");
return digitsOnly.Replace(phone, "");
}
Use a regular expression to remove any non-numeric characters instead of trying to guess how a person will enter their number - this will remove all your Replace() and Trim() methods, unless you really need to trim a leading zero.
string CleanPhone(string phone)
{
Regex digitsOnly = new Regex(#"[^\d]");
return digitsOnly.Replace(phone, "");
}
Alternatively, I would recommend you use a masked textbox to collect the # (there are many options available) to allow only numeric input, and display the input with whatever format you'd like. This way you're guaranteeing that the value received will be all numeric characters.
Check out QAS, it's a commercial solution.
They have email, phone and address validations.
http://www.qas.com/phone-number-validation-web-service.htm
We use their services for Address and Email (not phone) and have been satisfied with it.
#annelie maybe you can update your regular expression to a more powerful one. Check out this site here. It contains many expressions but I think one of the top 2 expressions in the site should be suitable to you.
public class PhoneNumber
{
public PhoneNumber(string value)
{
if (String.IsNullOrEmpty(value))
throw new ArgumentNullException("numberString", Properties.Resources.PhoneNumberIsNullOrEmpty);
var match = new Regex(#"\+(\w+) \((\w+)\) (\w+)", RegexOptions.Compiled).Match(value);
if (match.Success)
{
ushort countryCode = 0;
ushort localCode = 0;
int number = 0;
if (UInt16.TryParse(match.Result("$1"), out countryCode) &&
UInt16.TryParse(match.Result("$2"), out localCode) &&
Int32.TryParse(match.Result("$3"), out number))
{
this.CountryCode = countryCode;
this.LocalCode = localCode;
this.Number = number;
}
}
else
{
throw new ArgumentNullException("numberString", Properties.Resources.PhoneNumberInvalid);
}
}
public PhoneNumber(int countryCode, int localCode, int number)
{
if (countryCode == 0)
throw new ArgumentOutOfRangeException("countryCode", Properties.Resources.PhoneNumberIsNullOrEmpty);
else if (localCode == 0)
throw new ArgumentOutOfRangeException("localCode", Properties.Resources.PhoneNumberIsNullOrEmpty);
else if (number == 0)
throw new ArgumentOutOfRangeException("number", Properties.Resources.PhoneNumberIsNullOrEmpty);
this.CountryCode = countryCode;
this.LocalCode = localCode;
this.Number = number;
}
public int CountryCode { get; set; }
public int LocalCode { get; set; }
public int Number { get; set; }
public override string ToString()
{
return String.Format(System.Globalization.CultureInfo.CurrentCulture, "+{0} ({1}) {2}", CountryCode, LocalCode, Number);
}
public static bool Validate(string value)
{
return new Regex(#"\+\w+ \(\w+\) \w+", RegexOptions.Compiled).IsMatch(value);
}
public static bool Validate(string countryCode, string localCode, string number, out PhoneNumber phoneNumber)
{
var valid = false;
phoneNumber = null;
try
{
ushort uCountryCode = 0;
ushort uLocalCode = 0;
int iNumber = 0;
// match only if all three numbers have been parsed successfully
valid = UInt16.TryParse(countryCode, out uCountryCode) &&
UInt16.TryParse(localCode, out uLocalCode) &&
Int32.TryParse(number, out iNumber);
if (valid)
phoneNumber = new PhoneNumber(uCountryCode, uLocalCode, iNumber);
}
catch (ArgumentException)
{
// still not match
}
return valid;
}
}

Reading CSV files in C# [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
Does anyone know of an open-source library that allows you to parse and read .csv files in C#?
Here, written by yours truly to use generic collections and iterator blocks. It supports double-quote enclosed text fields (including ones that span mulitple lines) using the double-escaped convention (so "" inside a quoted field reads as single quote character). It does not support:
Single-quote enclosed text
\ -escaped quoted text
alternate delimiters (won't yet work on pipe or tab delimited fields)
Unquoted text fields that begin with a quote
But all of those would be easy enough to add if you need them. I haven't benchmarked it anywhere (I'd love to see some results), but performance should be very good - better than anything that's .Split() based anyway.
Now on GitHub
Update: felt like adding single-quote enclosed text support. It's a simple change, but I typed it right into the reply window so it's untested. Use the revision link at the bottom if you'd prefer the old (tested) code.
public static class CSV
{
public static IEnumerable<IList<string>> FromFile(string fileName, bool ignoreFirstLine = false)
{
using (StreamReader rdr = new StreamReader(fileName))
{
foreach(IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
}
}
public static IEnumerable<IList<string>> FromStream(Stream csv, bool ignoreFirstLine=false)
{
using (var rdr = new StreamReader(csv))
{
foreach (IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
}
}
public static IEnumerable<IList<string>> FromReader(TextReader csv, bool ignoreFirstLine=false)
{
if (ignoreFirstLine) csv.ReadLine();
IList<string> result = new List<string>();
StringBuilder curValue = new StringBuilder();
char c;
c = (char)csv.Read();
while (csv.Peek() != -1)
{
switch (c)
{
case ',': //empty field
result.Add("");
c = (char)csv.Read();
break;
case '"': //qualified text
case '\'':
char q = c;
c = (char)csv.Read();
bool inQuotes = true;
while (inQuotes && csv.Peek() != -1)
{
if (c == q)
{
c = (char)csv.Read();
if (c != q)
inQuotes = false;
}
if (inQuotes)
{
curValue.Append(c);
c = (char)csv.Read();
}
}
result.Add(curValue.ToString());
curValue = new StringBuilder();
if (c == ',') c = (char)csv.Read(); // either ',', newline, or endofstream
break;
case '\n': //end of the record
case '\r':
//potential bug here depending on what your line breaks look like
if (result.Count > 0) // don't return empty records
{
yield return result;
result = new List<string>();
}
c = (char)csv.Read();
break;
default: //normal unqualified text
while (c != ',' && c != '\r' && c != '\n' && csv.Peek() != -1)
{
curValue.Append(c);
c = (char)csv.Read();
}
result.Add(curValue.ToString());
curValue = new StringBuilder();
if (c == ',') c = (char)csv.Read(); //either ',', newline, or endofstream
break;
}
}
if (curValue.Length > 0) //potential bug: I don't want to skip on a empty column in the last record if a caller really expects it to be there
result.Add(curValue.ToString());
if (result.Count > 0)
yield return result;
}
}
Take a look at A Fast CSV Reader on CodeProject.
The last time this question was asked, here's the answer I gave:
If you're just trying to read a CSV file with C#, the easiest thing is to use the Microsoft.VisualBasic.FileIO.TextFieldParser class. It's actually built into the .NET Framework, instead of being a third-party extension.
Yes, it is in Microsoft.VisualBasic.dll, but that doesn't mean you can't use it from C# (or any other CLR language).
Here's an example of usage, taken from the MSDN documentation:
Using MyReader As New _
Microsoft.VisualBasic.FileIO.TextFieldParser("C:\testfile.txt")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",")
Dim currentRow As String()
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
Dim currentField As String
For Each currentField In currentRow
MsgBox(currentField)
Next
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Line " & ex.Message & _
"is not valid and will be skipped.")
End Try
End While
End Using
Again, this example is in VB.NET, but it would be trivial to translate it to C#.
I really like the FileHelpers library. It's fast, it's C# 100%, it's available for FREE, it's very flexible and easy to use.
I'm implementing Daniel Pryden's answer in C#, so it is easier to cut and paste and customize. I think this is the easiest method for parsing CSV files. Just add a reference and you are basically done.
Add the Microsoft.VisualBasic Reference to your project
Then here is sample code in C# from Joel's answer:
using (Microsoft.VisualBasic.FileIO.TextFieldParser MyReader = new
Microsoft.VisualBasic.FileIO.TextFieldParser(filename))
{
MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
MyReader.SetDelimiters(",");
while (!MyReader.EndOfData)
{
try
{
string[] fields = MyReader.ReadFields();
if (first)
{
first = false;
continue;
}
// This is how I treat my data, you'll need to throw this out.
//"Type" "Post Date" "Description" "Amount"
LineItem li = new LineItem();
li.date = DateTime.Parse(fields[1]);
li.description = fields[2];
li.Value = Convert.ToDecimal(fields[3]);
lineitems1.Add(li);
}
catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex)
{
MessageBox.Show("Line " + ex.Message +
" is not valid and will be skipped.");
}
}
}
Besides parsing/reading, some libraries do other nice things like convert the parsed data into object for you.
Here is an example of using CsvHelper (a library I maintain) to read a CSV file into objects.
var csv = new CsvHelper( File.OpenRead( "file.csv" ) );
var myCustomObjectList = csv.Reader.GetRecords<MyCustomObject>();
By default, conventions are used for matching the headers/columns with the properties. You can change the behavior by changing the settings.
// Using attributes:
public class MyCustomObject
{
[CsvField( Name = "First Name" )]
public string StringProperty { get; set; }
[CsvField( Index = 0 )]
public int IntProperty { get; set; }
[CsvField( Ignore = true )]
public string ShouldIgnore { get; set; }
}
Sometimes you don't "own" the object you want to populate the data with. In this case you can use fluent class mapping.
// Fluent class mapping:
public sealed class MyCustomObjectMap : CsvClassMap<MyCustomObject>
{
public MyCustomObjectMap()
{
Map( m => m.StringProperty ).Name( "First Name" );
Map( m => m.IntProperty ).Index( 0 );
Map( m => m.ShouldIgnore ).Ignore();
}
}
You can use Microsoft.VisualBasic.FileIO.TextFieldParser
get below code example from above article
static void Main()
{
string csv_file_path=#"C:\Users\Administrator\Desktop\test.csv";
DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
Console.WriteLine("Rows count:" + csvData.Rows.Count);
Console.ReadLine();
}
private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
DataTable csvData = new DataTable();
try
{
using(TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
//Making empty value as null
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData);
}
}
}
catch (Exception ex)
{
}
return csvData;
}

C# Stripping / converting one or more characters

Is there a fast way (without having to explicitly looping through each character in a string) and either stripping or keeping it. In Visual FoxPro, there is a function CHRTRAN() that does it great. Its on a 1:1 character replacement, but if no character in the alternate position, its stripped from the final string. Ex
CHRTRAN( "This will be a test", "it", "X" )
will return
"ThXs wXll be a es"
Notice the original "i" is converted to "X", and lower case "t" is stripped out.
I looked at the replace for similar intent, but did not see an option to replace with nothing.
I'm looking to make some generic routines to validate multiple origins of data that have different types of input restrictions. Some of the data may be coming from external sources, so its not just textbox entry validation I need to test for.
Thanks
All you need is a couple of calls to String.Replace().
string s = "This will be a test";
s = s.Replace("i", "X");
s = s.Replace("t", "");
Note that Replace() returns a new string. It does not alter the string itself.
Does this what you want?
"This will be a test".Replace("i", "X").Replace("t", String.Empty)
Here is a simple implementation of the CHRTRAN function - it does not work if the string contains \0 and is quite messy. You could write a nicer one using loops, but I just wanted to try it using LINQ.
public static String ChrTran(String input, String source, String destination)
{
return source.Aggregate(
input,
(current, symbol) => current.Replace(
symbol,
destination.ElementAtOrDefault(source.IndexOf(symbol))),
preResult => preResult.Replace("\0", String.Empty));
}
And the you can use it.
// Returns "ThXs wXll be a es"
String output = ChrTran("This will be a test", "it", "X");
Just to have a clean solution - the same without LINQ and working for the \0 cases, too, and it is almost in place because of using a StringBuilder but won't modify the input, of course.
public static String ChrTran(String input, String source, String destination)
{
StringBuilder result = new StringBuilder(input);
Int32 minLength = Math.Min(source.Length, destination.Length);
for (Int32 i = 0; i < minLength; i++)
{
result.Replace(source[i], destination[i]);
}
for (Int32 i = minLength; i < searchPattern.Length; i++)
{
result.Replace(source[i].ToString(), String.Empty);
}
return result.ToString();
}
Null reference handling is missing.
Inspired by tvanfosson's solution, I gave LINQ a second shot.
public static String ChrTran(String input, String source, String destination)
{
return new String(input.
Where(symbol =>
!source.Contains(symbol) ||
source.IndexOf(symbol) < destination.Length).
Select(symbol =>
source.Contains(symbol)
? destination[source.IndexOf(symbol)]
: symbol).
ToArray());
}
Here was my final function and works perfectly as expected.
public static String ChrTran(String ToBeCleaned,
String ChangeThese,
String IntoThese)
{
String CurRepl = String.Empty;
for (int lnI = 0; lnI < ChangeThese.Length; lnI++)
{
if (lnI < IntoThese.Length)
CurRepl = IntoThese.Substring(lnI, 1);
else
CurRepl = String.Empty;
ToBeCleaned = ToBeCleaned.Replace(ChangeThese.Substring(lnI, 1), CurRepl);
}
return ToBeCleaned;
}
This is a case where I think using LINQ overcomplicates the matter. This is simple and to the point:
private static string Translate(string input, string from, string to)
{
StringBuilder sb = new StringBuilder();
foreach (char ch in input)
{
int i = from.IndexOf(ch);
if (from.IndexOf(ch) < 0)
{
sb.Append(ch);
}
else
{
if (i >= 0 && i < to.Length)
{
sb.Append(to[i]);
}
}
}
return sb.ToString();
}
To "replace with nothing", just replace with an empty string. This will give you:
String str = "This will be a test";
str = str.Replace("i", "X");
str = str.Replace("t","");
A more general version as a string extension. Like the others this does not do a translation in place since strings are immutable in C#, but instead returns a new string with the replacements as specified.
public static class StringExtensions
{
public static string Translate( this string source, string from, string to )
{
if (string.IsNullOrEmpty( source ) || string.IsNullOrEmpty( from ))
{
return source;
}
return string.Join( "", source.ToCharArray()
.Select( c => Translate( c, from, to ) )
.Where( c => c != null )
.ToArray() );
}
private static string Translate( char c, string from, string to )
{
int i = from != null ? from.IndexOf( c ) : -1;
if (i >= 0)
{
return (to != null && to.Length > i)
? to[i].ToString()
: null;
}
else
{
return c.ToString();
}
}
}

Categories