Spelling libraries (like hunspell) in UWP Applications? - c#

I am porting an application for writers to the UWP platform. The only piece of the puzzle I have left is the NHunspell library. I use it extensively for spell checking and thesaurus features. I've customized the heck out of it, and created custom dictionaries for various things (e.g. a different dictionary for each writing project). This library is a beautiful thing.
However, I can't seem to include this DLL in my UWP application.
1) Is there a way to force the usage of this DLL? I really do like how the NHunspell system is set up. It makes common sense and is very fast and easy to use.
2) If not, can anyone recommend a better solution for custom dictionaries, customized spell checking, etc?
Update 3
After considerable research and reading online, I found a link discussing the theory of spell checking. Here is one quick example (the one I used the most).
http://www.anotherchris.net/csharp/how-to-write-a-spelling-corrector-in-csharp/
After reading this article, taking that base code, and stripping the English words from the Hunspell .dic files, I have created my own spell-checking library that works in UWP.
Once I get it solidified, I will post it as an answer below to donate to the SO community. :)
Update 2
I'm conceding the use of Hunspell. It doesn't look like it is possible at all... are there any other libraries/packages that anyone can suggest?
UPDATE :
I probably need to rephrase the statement that I can't include the DLL: I cannot include the DLL through NuGet. It complains that the DLL is not compatible with the UAP/UWP platform.
I am able to MANUALLY include the DLL in my project by linking to an existing DLL (not NuGet). However, that DLL does indeed prove to be incompatible with the UAP platform. A simple call to spellcheck a word works fine in WinForms, but immediately crashes with System.IO.FileNotFoundException.
The constructor of NHunspell does reach out to load the associated .dic and .aff files. However, I have mitigated this by loading the files into memory and then calling the alternate constructor, which takes a byte array instead of a file name for each of those files. It still crashes, but with a new Method not found error:
String System.AppDomain.get_RelativeSearchPath()
I am looking for any spell checking engine that will work within the UAP framework. I would prefer for it to be NHunspell simply for familiarity reasons. However, I'm not blind to the fact that this is becoming increasingly less-possible as an option.
People I work with have suggested that I use the built-in spellchecking options. However, I can't use the built-in Windows 10/TextBox spell checking features (that I know of), because I can't control custom dictionaries and I can't disable things like auto-capitalize and word-replacement (where it replaces the word for you if it thinks it is close enough to the right guess). Those things are chapter-suicide for writers! A writer can turn them off at the OS level, but they may want them on for other apps, just not this one.
Please let me know if there is a work-around for NHunspell. And if you don't know of a work-around, please let me know your best replacement custom spellcheck engine that works within the UAP framework.
As a side note, I also use NHunspell for its thesaurus capability. It works very well in my windows apps. I would also have to replace this functionality as well -- hopefully with the same engine as the spellcheck engine. However, if you know of a good thesaurus engine (but it doesn't spell check), that's good too!
Thank you!!

I downloaded the source code of the NHunspell library and tried to build it with UWP support, but I ran into problems with the marshalling code (Marshalling.cs).
The package loads native DLLs that only work on x86 and x64 architectures, so the app will not work on ARM devices (mobiles, tablets).
The package loads dlls with system calls:
[DllImport("kernel32.dll")]
internal static extern IntPtr LoadLibrary(string fileName);
and I think it needs to be rewritten to work in UWP, because UWP uses sandboxing.
IMHO there are only two options:
1) Rewrite the Marshalling class with the restrictions of UWP.
2) Not use Hunspell in your program.
I don't know much about using DLLs with UWP, but I believe the rewrite could be very difficult.

As promised, here is the class I built to do my spell checking.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;
namespace Com.HanelDev.HSpell
{
public class HSpellProcess
{
private Dictionary<string, string> _dictionary = new Dictionary<string, string>();
public int MaxSuggestionResponses { get; set; }
public HSpellProcess()
{
MaxSuggestionResponses = 10;
}
public void AddToDictionary(string w)
{
if (!_dictionary.ContainsKey(w.ToLower()))
{
_dictionary.Add(w.ToLower(), w);
}
else
{
// Upper case words are more specific (but may be the first word
// in a sentence.) Lower case words are more generic.
// If you put an upper-case word in the dictionary, then for
// it to be "correct" it must match case. This is not true
// for lower-case words.
// We want to only replace existing words with their more
// generic versions, not the other way around.
if (_dictionary[w.ToLower()].CaseSensitive())
{
_dictionary[w.ToLower()] = w;
}
}
}
public void LoadDictionary(byte[] dictionaryFile, bool resetDictionary = false)
{
if (resetDictionary)
{
_dictionary = new Dictionary<string, string>();
}
using (MemoryStream ms = new MemoryStream(dictionaryFile))
{
using (StreamReader sr = new StreamReader(ms))
{
string tmp = sr.ReadToEnd();
tmp = tmp.Replace("\r\n", "\r").Replace("\n", "\r");
string [] fileData = tmp.Split("\r".ToCharArray());
foreach (string line in fileData)
{
if (string.IsNullOrWhiteSpace(line) || line.StartsWith("#"))
{
continue;
}
string word = line;
// I added all of this for file imports (not array imports)
// to be able to handle words from Hunspell dictionaries.
// I don't get the hunspell derivatives, but at least I get
// the root word.
if (line.Contains("/"))
{
string[] arr = line.Split("/".ToCharArray());
word = arr[0];
}
AddToDictionary(word);
}
}
}
}
public void LoadDictionary(Stream dictionaryFileStream, bool resetDictionary = false)
{
string s = "";
using (StreamReader sr = new StreamReader(dictionaryFileStream))
{
s = sr.ReadToEnd();
}
byte [] bytes = Encoding.UTF8.GetBytes(s);
LoadDictionary(bytes, resetDictionary);
}
public void LoadDictionary(List<string> words, bool resetDictionary = false)
{
if (resetDictionary)
{
_dictionary = new Dictionary<string, string>();
}
foreach (string line in words)
{
if (string.IsNullOrWhiteSpace(line) || line.StartsWith("#"))
{
continue;
}
AddToDictionary(line);
}
}
public string ExportDictionary()
{
StringBuilder sb = new StringBuilder();
foreach (string k in _dictionary.Keys)
{
sb.AppendLine(_dictionary[k]);
}
return sb.ToString();
}
public HSpellCorrections Correct(string word)
{
HSpellCorrections ret = new HSpellCorrections();
ret.Word = word;
if (_dictionary.ContainsKey(word.ToLower()))
{
string testWord = word;
string dictWord = _dictionary[word.ToLower()];
if (!dictWord.CaseSensitive())
{
testWord = testWord.ToLower();
dictWord = dictWord.ToLower();
}
if (testWord == dictWord)
{
ret.SpelledCorrectly = true;
return ret;
}
}
// At this point, we know the word is assumed to be spelled incorrectly.
// Go get word candidates.
ret.SpelledCorrectly = false;
Dictionary<string, HSpellWord> candidates = new Dictionary<string, HSpellWord>();
List<string> edits = Edits(word);
GetCandidates(candidates, edits);
if (candidates.Count > 0)
{
return BuildCandidates(ret, candidates);
}
// If we didn't find any candidates by the main word, look for second-level candidates based on the original edits.
foreach (string item in edits)
{
List<string> round2Edits = Edits(item);
GetCandidates(candidates, round2Edits);
}
if (candidates.Count > 0)
{
return BuildCandidates(ret, candidates);
}
return ret;
}
private void GetCandidates(Dictionary<string, HSpellWord> candidates, List<string> edits)
{
foreach (string wordVariation in edits)
{
if (_dictionary.ContainsKey(wordVariation.ToLower()) &&
!candidates.ContainsKey(wordVariation.ToLower()))
{
HSpellWord suggestion = new HSpellWord(_dictionary[wordVariation.ToLower()]);
suggestion.RelativeMatch = RelativeMatch.Compute(wordVariation, suggestion.Word);
candidates.Add(wordVariation.ToLower(), suggestion);
}
}
}
private HSpellCorrections BuildCandidates(HSpellCorrections ret, Dictionary<string, HSpellWord> candidates)
{
var suggestions = candidates.OrderByDescending(c => c.Value.RelativeMatch);
int x = 0;
ret.Suggestions.Clear();
foreach (var suggest in suggestions)
{
x++;
ret.Suggestions.Add(suggest.Value.Word);
// only suggest the first X words.
if (x >= MaxSuggestionResponses)
{
break;
}
}
return ret;
}
private List<string> Edits(string word)
{
var splits = new List<Tuple<string, string>>();
var transposes = new List<string>();
var deletes = new List<string>();
var replaces = new List<string>();
var inserts = new List<string>();
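// Splits (include the split after the last character so that
// inserts can also append a letter at the end of the word)
for (int i = 0; i <= word.Length; i++)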
{
var tuple = new Tuple<string, string>(word.Substring(0, i), word.Substring(i));
splits.Add(tuple);
}
// Deletes
for (int i = 0; i < splits.Count; i++)
{
string a = splits[i].Item1;
string b = splits[i].Item2;
if (!string.IsNullOrEmpty(b))
{
deletes.Add(a + b.Substring(1));
}
}
// Transposes
for (int i = 0; i < splits.Count; i++)
{
string a = splits[i].Item1;
string b = splits[i].Item2;
if (b.Length > 1)
{
transposes.Add(a + b[1] + b[0] + b.Substring(2));
}
}
// Replaces
for (int i = 0; i < splits.Count; i++)
{
string a = splits[i].Item1;
string b = splits[i].Item2;
if (!string.IsNullOrEmpty(b))
{
for (char c = 'a'; c <= 'z'; c++)
{
replaces.Add(a + c + b.Substring(1));
}
}
}
// Inserts
for (int i = 0; i < splits.Count; i++)
{
string a = splits[i].Item1;
string b = splits[i].Item2;
for (char c = 'a'; c <= 'z'; c++)
{
inserts.Add(a + c + b);
}
}
return deletes.Union(transposes).Union(replaces).Union(inserts).ToList();
}
public HSpellCorrections CorrectFrom(string txt, int idx)
{
if (idx >= txt.Length)
{
return null;
}
// Find the next incorrect word.
string substr = txt.Substring(idx);
int idx2 = idx;
List<string> str = substr.Split(StringExtensions.WordDelimiters).ToList();
foreach (string word in str)
{
string tmpWord = word;
if (string.IsNullOrEmpty(word))
{
idx2++;
continue;
}
// If we have a possessive version of things, strip the 's off before testing
// the word. This will solve issues like "My [mother's] favorite ring."
if (tmpWord.EndsWith("'s"))
{
tmpWord = word.Substring(0, tmpWord.Length - 2);
}
// Skip things like ***, #HashTagsThatMakeNoSense and 1,2345.67
if (!tmpWord.IsWord())
{
idx2 += word.Length + 1;
continue;
}
HSpellCorrections cor = Correct(tmpWord);
if (cor.SpelledCorrectly)
{
idx2 += word.Length + 1;
}
else
{
cor.Index = idx2;
return cor;
}
}
return null;
}
}
}
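The class above refers to several types that are not shown in the post: HSpellCorrections, HSpellWord, RelativeMatch, and the StringExtensions helpers (CaseSensitive, IsWord, WordDelimiters). The following is only a guess at what they might look like, inferred from how they are used and from the comments in AddToDictionary. Every member body here is an assumption, in particular the delimiter set and the scoring in RelativeMatch.Compute, so treat it as a placeholder that makes the class compile rather than the author's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

namespace Com.HanelDev.HSpell
{
    // Hypothetical result type: only the members the class above touches.
    public class HSpellCorrections
    {
        public string Word { get; set; }
        public bool SpelledCorrectly { get; set; }
        public int Index { get; set; }
        public List<string> Suggestions { get; } = new List<string>();
    }

    // Hypothetical candidate type used while ranking suggestions.
    public class HSpellWord
    {
        public HSpellWord(string word) { Word = word; }
        public string Word { get; }
        public double RelativeMatch { get; set; }
    }

    public static class StringExtensions
    {
        // Assumed delimiter set; extend it to suit your text.
        public static readonly char[] WordDelimiters =
            { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };

        // Assumption: a word is "case sensitive" if it has any upper-case letter.
        public static bool CaseSensitive(this string s) => s.Any(char.IsUpper);

        // Assumption: a word is non-empty and made of letters, apostrophes, hyphens.
        public static bool IsWord(this string s) =>
            s.Length > 0 && s.All(ch => char.IsLetter(ch) || ch == '\'' || ch == '-');
    }

    public static class RelativeMatch
    {
        // Assumed scoring: share of positions holding the same letter
        // (ignoring case), from 0.0 (nothing matches) to 1.0 (identical).
        public static double Compute(string a, string b)
        {
            if (string.IsNullOrEmpty(a) || string.IsNullOrEmpty(b)) return 0.0;
            int same = 0;
            int shared = Math.Min(a.Length, b.Length);
            for (int i = 0; i < shared; i++)
            {
                if (char.ToLowerInvariant(a[i]) == char.ToLowerInvariant(b[i])) same++;
            }
            return (double)same / Math.Max(a.Length, b.Length);
        }
    }
}
```

With something like this in place, a call such as new HSpellProcess().Correct("speling") should compile against the class above; a real scoring function (for example an edit-distance based one) would probably rank suggestions better.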

You could use the Windows built-in spell checker directly so you can control its behavior better, and then apply your results to the textbox control yourself.
Have a look at ISpellChecker. It lets you add your own custom dictionary and has a lot more options to control its behavior. And yes, it's available for UWP.

Related

Separating Syllables into an array or list Unity C#

I'm trying to separate each syllable of an English word/name into a list. The code I have counts the syllables perfectly, which I also need, but I cannot figure out how to separate each syllable from the words given.
I'm using an input field, but for now the example word is my name.
I'm very new to programming, so please keep that in mind.
Preferred example output would be something like {"joh", "nat", "hon"}
using System.Collections;
using System.Collections.Generic;
using UnityEngine;
public class SyllableSeperator : MonoBehaviour
{
string word = "johnathon";
private void Start()
{
SyllableCount(word);
}
public static int SyllableCount(string word)
{
word = word.ToLower().Trim();
List<string> wordList = new List<string>();
bool lastWasVowel = false;
string vowels = "aeiouy";
int count = 0;
foreach (char c in word)
{
if (vowels.Contains(c))
{
if (!lastWasVowel)
{
count++;
lastWasVowel = true;
}
}
else
{
lastWasVowel = false;
}
}
if((word.EndsWith("e") || (word.EndsWith("es") || word.EndsWith("ed"))) && !word.EndsWith("le"))
{
count--;
}
return count;
}
}
What I have tried just does not work at all, so I figured I wouldn't post that code.
Again please keep in mind that I am brand new to Unity and programming in general. Someone may have already asked how to do this but the code was too advanced for me.
Assuming you are checking the syllable by going through each character, you can make a temporary string and keep adding characters to the string.
Once you determine that string is good enough to be a syllable, add it to the list.
Like so:
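public static int GetSyllableCount(string word, out List<string> syllableList)
{
word = word.ToLower().Trim();
// Assign the out parameter; redeclaring it as a local would not compile.
syllableList = new List<string>();
bool lastWasVowel = false;
bool hasVowel = false; // does the current syllable already contain a vowel?
string vowels = "aeiouy";
StringBuilder currSyllable = new StringBuilder(); // needs "using System.Text;"
foreach (char c in word)
{
if (vowels.IndexOf(c) >= 0)
{
if (!lastWasVowel && hasVowel)
{
// A new vowel group starts: finish the previous syllable
syllableList.Add(currSyllable.ToString());
currSyllable.Clear();
}
lastWasVowel = true;
hasVowel = true;
}
else
{
lastWasVowel = false;
}
// Add this character to the current syllable
currSyllable.Append(c);
}
// Whatever is left over is the final syllable
if (currSyllable.Length > 0)
{
syllableList.Add(currSyllable.ToString());
}
if ((word.EndsWith("e") || word.EndsWith("es") || word.EndsWith("ed")) && !word.EndsWith("le") && syllableList.Count > 1)
{
// The final "e" is usually silent: merge the last piece into the previous one
int last = syllableList.Count - 1;
syllableList[last - 1] += syllableList[last];
syllableList.RemoveAt(last);
}
return syllableList.Count;
}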
The out parameter modifier lets you return an additional value from the method.
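As a separate, smaller illustration of the out modifier on its own, here is a sketch that returns a count and hands back a list through an out parameter. The class and method names (OutParameterDemo, FindVowels) are made up for this example and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;

public static class OutParameterDemo
{
    // Returns the number of vowels and, through the out parameter,
    // the index of each vowel in the word.
    public static int FindVowels(string word, out List<int> positions)
    {
        // An out parameter must be assigned before the method returns.
        positions = new List<int>();
        for (int i = 0; i < word.Length; i++)
        {
            if ("aeiouy".IndexOf(char.ToLowerInvariant(word[i])) >= 0)
            {
                positions.Add(i);
            }
        }
        return positions.Count;
    }
}
```

The caller declares a variable and passes it with the out keyword: List<int> positions; int n = OutParameterDemo.FindVowels("johnathon", out positions); After the call, n holds the count and positions holds the indexes.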

parsing functions from text file without third party

I’ve been struggling to wrap my head around parsers and lexers, and I’m not even sure they are the best way to tackle my challenge. I don’t want to use any third party libraries because of the unnecessary overhead, project size, and possible license issues.
Also, there won’t be any arithmetic, loops, or while, for, or foreach statements.
It’s a very simple parser that adds or instantiates objects from a text file.
For instance,
buildings.addWithMaterials([buildable:mansion], {{[materials: brick], [materials: wood]}, {[materials: wood], [materials: brick]}});
Parsing this text would add a Mansion made of two pieces of brick and wood to the buildings Collection.
The object buildable contains the properties Name which in this case is Mansion
and some building components which in this case are the materials Brick and Wood.
Any tip/direction to search for doing this?
I've looked and searched within stackoverflow, most entries I've stumbled upon refer to third parties
like Sprache and more.
If I missed an article, :/ sorry, please point it out to me
Thanks and kind regards, Nick
Providing all of the items in your script are in the same format, you can use something like this to parse it:
public static class ScriptParser
{
public static void Parse(string filename)
{
string[] script = null;
using (StreamReader reader = File.OpenText(filename))
{
// read the whole file, remove all line breaks and split
// by semicolons to get individual commands
script = reader.ReadToEnd().Replace("\r", string.Empty).Replace("\n", string.Empty).Split(';');
}
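foreach (string command in script)
{
// the text after the final ';' is empty, so skip blank commands
if (!string.IsNullOrWhiteSpace(command))
HandleCommand(command.Trim());
}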
}
// as arguments seem to be grouped you can't just split
// by commas so this will get the arguments recursively
private static object[] ParseArgs(string argsString)
{
int startPos = 0;
int depth = 0;
List<object> args = new List<object>();
Action<int> addArg = pos =>
{
string arg = argsString.Substring(startPos, pos - startPos).Trim();
if (arg.StartsWith("{") && arg.EndsWith("}"))
{
arg = arg.Substring(1, arg.Length - 2);
args.Add(ParseArgs(arg));
}
else
{
args.Add(arg);
}
};
for (int i = 0; i < argsString.Length; i++)
{
switch (argsString[i])
{
case ',':
if (depth == 0)
{
addArg(i);
startPos = i + 1;
}
break;
case '{':
// increase depth of parsing so that commas
// within braces are ignored
depth++;
break;
case '}':
// decrease depth when exiting braces
depth--;
break;
}
}
// as there is no final comma
addArg(argsString.Length);
return args.ToArray();
}
private static void HandleCommand(string commandString)
{
Command command = new Command();
string prefix = commandString.Substring(0, commandString.IndexOf("("));
command.Collection = prefix.Split('.')[0];
command.Action = prefix.Split('.')[1];
string commandArgs = commandString.Substring(commandString.IndexOf("("));
commandArgs = commandArgs.Substring(1, commandArgs.Length - 2);
command.Args = ParseArgs(commandArgs);
command.Execute();
}
private class Command
{
public string Collection;
public string Action;
public object[] Args;
public void Execute()
{
// need to handle any expected commands
if (Collection == "buildings")
{
if (Action == "addWithMaterials")
{
buildings.AddWithMaterials((string)Args[0], (object[])Args[1]);
}
}
}
}
//Eg.
public static BuildingCollection buildings = new BuildingCollection();
public class BuildingCollection
{
public void AddWithMaterials(string buildingName, object[] materials)
{
// the parameter is named buildingName so it does not clash with the local below
Building building = new Building();
foreach (object[] materialsPart in materials)
{
// more stuff for you to work out
// you will need to match the strings to material classes
}
}
}
}
Usage:
ScriptParser.Parse("scriptfile.txt");
Note: This lacks error handling, which you will definitely want when parsing a file as it could contain anything.
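As one possible starting point for that error handling, here is a sketch of the same depth-counting idea that rejects unbalanced braces instead of silently producing odd results. The names ArgSplitter and SplitTopLevel are invented for this example, and it only covers the splitting step, not the rest of the parser.

```csharp
using System;
using System.Collections.Generic;

public static class ArgSplitter
{
    // Splits on commas that are not inside { } and rejects unbalanced braces.
    public static List<string> SplitTopLevel(string args)
    {
        if (args == null) throw new ArgumentNullException(nameof(args));
        var parts = new List<string>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < args.Length; i++)
        {
            char ch = args[i];
            if (ch == '{')
            {
                depth++;
            }
            else if (ch == '}')
            {
                depth--;
                if (depth < 0)
                    throw new FormatException("Unexpected '}' at position " + i);
            }
            else if (ch == ',' && depth == 0)
            {
                // A comma outside all braces ends the current argument.
                parts.Add(args.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        if (depth != 0)
            throw new FormatException("Missing '}' in: " + args);
        // There is no trailing comma, so add the final argument here.
        parts.Add(args.Substring(start).Trim());
        return parts;
    }
}
```

Wrapping the call in a try/catch for FormatException lets the caller report which command in the script was malformed and carry on with the next one.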

Counting/sorting characters in a text file

I am trying to write a program that reads a text file, sorts it by character, and keeps track of how many times each character appears in the document. This is what I have so far.
class Program
{
static void Main(string[] args)
{
CharFrequency[] Charfreq = new CharFrequency[128];
try
{
string line;
System.IO.StreamReader file = new System.IO.StreamReader(@"C:\Users\User\Documents\Visual Studio 2013\Projects\Array_Project\wap.txt");
while ((line = file.ReadLine()) != null)
{
int ch = file.Read();
if (Charfreq.Contains(ch))
{
}
}
file.Close();
Console.ReadLine();
}
catch (Exception e)
{
Console.WriteLine("The process failed: {0}", e.ToString());
}
}
}
My question is, what should go in the if statement here?
I also have a Charfrequency class, which I'll include here in case it is helpful/necessary that I include it (and yes, it is necessary that I use an array versus a list or arraylist).
public class CharFrequency
{
private char m_character;
private long m_count;
public CharFrequency(char ch)
{
Character = ch;
Count = 0;
}
public CharFrequency(char ch, long charCount)
{
Character = ch;
Count = charCount;
}
public char Character
{
set
{
m_character = value;
}
get
{
return m_character;
}
}
public long Count
{
get
{
return m_count;
}
set
{
if (value < 0)
value = 0;
m_count = value;
}
}
public void Increment()
{
m_count++;
}
public override bool Equals(object obj)
{
bool equal = false;
CharFrequency cf = new CharFrequency('\0', 0);
cf = (CharFrequency)obj;
if (this.Character == cf.Character)
equal = true;
return equal;
}
public override int GetHashCode()
{
return m_character.GetHashCode();
}
public override string ToString()
{
String s = String.Format("'{0}' ({1}) = {2}", m_character, (byte)m_character, m_count);
return s;
}
}
Have a look at this post.
https://codereview.stackexchange.com/questions/63872/counting-the-number-of-character-occurrences
It uses LINQ to achieve your goal
You shouldn't use Contains.
First, you need to initialize your Charfreq array:
CharFrequency[] Charfreq = new CharFrequency[128];
for (int i = 0; i < Charfreq.Length; i++)
{
Charfreq[i] = new CharFrequency((char)i);
}
Then, inside the try block, you can read the file one character at a time:
int ch;
// -1 means that there are no more characters to read,
// otherwise ch is the char read
while ((ch = file.Read()) != -1)
{
CharFrequency cf = new CharFrequency((char)ch);
// This works because CharFrequency overloads the
// Equals method, and the Equals method checks only
// for the Character property of CharFrequency
int ix = Array.IndexOf(Charfreq, cf);
// if there is the "right" charfrequency
if (ix != -1)
{
Charfreq[ix].Increment();
}
}
Note that this isn't the way I would write the program. These are the minimum changes needed to make your program work.
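For comparison, here is a sketch of a more direct approach that still uses an array: since the character code is already a number, it can serve as the array index, so no search is needed. To stay self-contained this uses a plain long[] rather than the CharFrequency class, and the name AsciiCounter is made up for the example; the same indexing idea would apply to a CharFrequency[] that has been initialized as shown above.

```csharp
using System;
using System.IO;

public static class AsciiCounter
{
    // Counts characters with codes 0..127; anything else is ignored.
    public static long[] Count(TextReader reader)
    {
        var counts = new long[128];
        int ch;
        // Read() returns -1 when there are no more characters.
        while ((ch = reader.Read()) != -1)
        {
            if (ch < counts.Length)
            {
                counts[ch]++; // the character code is the array index
            }
        }
        return counts;
    }
}
```

A StreamReader is a TextReader, so the file from the question can be passed in directly; for a quick check, new StringReader("banana") works as well.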
As a sidenote, this program will count the "frequency" of ASCII characters (characters with code <= 127)
And this is a useless initialization:
CharFrequency cf = new CharFrequency('\0', 0);
cf = (CharFrequency)obj;
This is enough:
CharFrequency cf = (CharFrequency)obj;
Otherwise you are creating a CharFrequency just to discard it on the next line.
A dictionary is well suited for a task like this. You didn't say which character set and encoding the file was in. So, because Unicode is so common, let's assume the Unicode character set and UTF-8 encoding. (After all, it is the default for .NET, Java, JavaScript, HTML, XML,….) If that's not the case then read the file using the applicable encoding and fix your code because you currently are using UTF-8 in your StreamReader.
Next comes iterating across the "characters". And then incrementing the count for a "character" in the dictionary as it is seen in the text.
Unicode does have a few complex features. One is combining characters, where a base character can be overlaid with diacritics etc. Users view such combinations as one "character", or, as Unicode calls them, graphemes. Thankfully, .NET gives us the StringInfo class that iterates over them as a "text element."
So, if you think about it, using an array would be quite difficult. You'd have to build your own dictionary on top of your array.
The example below uses a Dictionary and is runnable using a LINQPad script. After it creates the dictionary, it orders and dumps it with a nice display.
var path = Path.GetTempFileName();
// Get some text we know is encoded in UTF-8 to simplify the code below
// and contains combining codepoints as a matter of example.
using (var web = new WebClient())
{
web.DownloadFile("http://superuser.com/questions/52671/which-unicode-characters-do-smilies-like-%D9%A9-%CC%AE%CC%AE%CC%83-%CC%83%DB%B6-consist-of", path);
}
// since the question asks to analyze a file
var content = File.ReadAllText(path, Encoding.UTF8);
var frequency = new Dictionary<String, int>();
var itor = System.Globalization.StringInfo.GetTextElementEnumerator(content);
while (itor.MoveNext())
{
var element = (String)itor.Current;
if (!frequency.ContainsKey(element))
{
frequency.Add(element, 0);
}
frequency[element]++;
}
var histogram = frequency
.OrderByDescending(f => f.Value)
// jazz it up with the list of codepoints in each text element
.Select(pair =>
{
var bytes = Encoding.UTF32.GetBytes(pair.Key);
var codepoints = new UInt32[bytes.Length/4];
Buffer.BlockCopy(bytes, 0, codepoints, 0, bytes.Length);
return new {
Count = pair.Value,
textElement = pair.Key,
codepoints = codepoints.Select(cp => String.Format("U+{0:X4}", cp) ) };
});
histogram.Dump(); // For use in LINQPad
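For readers without LINQPad, the counting step can also be tried on an in-memory string. This is a small sketch of the same StringInfo-based loop; the class name TextElementCounter is invented here, and the sample string in the note below is just an illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class TextElementCounter
{
    // Counts text elements (base character plus any combining marks)
    // rather than individual UTF-16 code units.
    public static Dictionary<string, int> Count(string text)
    {
        var frequency = new Dictionary<string, int>();
        TextElementEnumerator itor = StringInfo.GetTextElementEnumerator(text);
        while (itor.MoveNext())
        {
            string element = itor.GetTextElement();
            // TryGetValue leaves seen at 0 when the element is new.
            frequency.TryGetValue(element, out int seen);
            frequency[element] = seen + 1;
        }
        return frequency;
    }
}
```

Calling TextElementCounter.Count("e\u0301xe\u0301") treats each "e" followed by the combining acute accent U+0301 as one element, so the result has two keys rather than three.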

what to change to use data from csv file not from SQL db [duplicate]

Does anyone know of an open-source library that allows you to parse and read .csv files in C#?
Here, written by yours truly to use generic collections and iterator blocks. It supports double-quote enclosed text fields (including ones that span multiple lines) using the double-escaped convention (so "" inside a quoted field reads as a single quote character). It does not support:
Single-quote enclosed text
\ -escaped quoted text
alternate delimiters (won't yet work on pipe or tab delimited fields)
Unquoted text fields that begin with a quote
But all of those would be easy enough to add if you need them. I haven't benchmarked it anywhere (I'd love to see some results), but performance should be very good - better than anything that's .Split() based anyway.
Now on GitHub
Update: felt like adding single-quote enclosed text support. It's a simple change, but I typed it right into the reply window so it's untested. Use the revision link at the bottom if you'd prefer the old (tested) code.
public static class CSV
{
public static IEnumerable<IList<string>> FromFile(string fileName, bool ignoreFirstLine = false)
{
using (StreamReader rdr = new StreamReader(fileName))
{
foreach(IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
}
}
public static IEnumerable<IList<string>> FromStream(Stream csv, bool ignoreFirstLine=false)
{
using (var rdr = new StreamReader(csv))
{
foreach (IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
}
}
public static IEnumerable<IList<string>> FromReader(TextReader csv, bool ignoreFirstLine=false)
{
if (ignoreFirstLine) csv.ReadLine();
IList<string> result = new List<string>();
StringBuilder curValue = new StringBuilder();
char c;
c = (char)csv.Read();
while (csv.Peek() != -1)
{
switch (c)
{
case ',': //empty field
result.Add("");
c = (char)csv.Read();
break;
case '"': //qualified text
case '\'':
char q = c;
c = (char)csv.Read();
bool inQuotes = true;
while (inQuotes && csv.Peek() != -1)
{
if (c == q)
{
c = (char)csv.Read();
if (c != q)
inQuotes = false;
}
if (inQuotes)
{
curValue.Append(c);
c = (char)csv.Read();
}
}
result.Add(curValue.ToString());
curValue = new StringBuilder();
if (c == ',') c = (char)csv.Read(); // either ',', newline, or endofstream
break;
case '\n': //end of the record
case '\r':
//potential bug here depending on what your line breaks look like
if (result.Count > 0) // don't return empty records
{
yield return result;
result = new List<string>();
}
c = (char)csv.Read();
break;
default: //normal unqualified text
while (c != ',' && c != '\r' && c != '\n' && csv.Peek() != -1)
{
curValue.Append(c);
c = (char)csv.Read();
}
result.Add(curValue.ToString());
curValue = new StringBuilder();
if (c == ',') c = (char)csv.Read(); //either ',', newline, or endofstream
break;
}
}
if (curValue.Length > 0) //potential bug: I don't want to skip on a empty column in the last record if a caller really expects it to be there
result.Add(curValue.ToString());
if (result.Count > 0)
yield return result;
}
}
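Tracing the reader above on input that does not end with a line break, such as a,b, suggests the last character can be dropped, because the loops stop as soon as Peek() reports the end of the stream. As an alternative, here is a sketch of the same idea written as a small state machine that consumes one character per iteration. It is not the author's code: the name CsvSketch is made up, it handles only double-quoted fields with "" escapes, and like the original it skips blank lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvSketch
{
    public static IEnumerable<IList<string>> Read(TextReader reader)
    {
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool sawAnything = false; // true once the current line has a comma or quote
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            if (inQuotes)
            {
                if (c != '"') { field.Append(c); }
                else if (reader.Peek() == '"') { reader.Read(); field.Append('"'); } // "" becomes "
                else { inQuotes = false; } // closing quote
            }
            else if (c == '"') { inQuotes = true; sawAnything = true; }
            else if (c == ',')
            {
                record.Add(field.ToString());
                field.Clear();
                sawAnything = true;
            }
            else if (c == '\r' || c == '\n')
            {
                if (c == '\r' && reader.Peek() == '\n') reader.Read(); // treat CRLF as one break
                if (sawAnything || field.Length > 0) // skip blank lines
                {
                    record.Add(field.ToString());
                    yield return record;
                    record = new List<string>();
                }
                field.Clear();
                sawAnything = false;
            }
            else { field.Append(c); }
        }
        if (sawAnything || field.Length > 0) // last record without a trailing line break
        {
            record.Add(field.ToString());
            yield return record;
        }
    }
}
```

Because every character goes through the same loop, the end of the stream needs only the single check after it, which is where the final record is emitted.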
Take a look at A Fast CSV Reader on CodeProject.
The last time this question was asked, here's the answer I gave:
If you're just trying to read a CSV file with C#, the easiest thing is to use the Microsoft.VisualBasic.FileIO.TextFieldParser class. It's actually built into the .NET Framework, instead of being a third-party extension.
Yes, it is in Microsoft.VisualBasic.dll, but that doesn't mean you can't use it from C# (or any other CLR language).
Here's an example of usage, taken from the MSDN documentation:
Using MyReader As New _
Microsoft.VisualBasic.FileIO.TextFieldParser("C:\testfile.txt")
MyReader.TextFieldType = FileIO.FieldType.Delimited
MyReader.SetDelimiters(",")
Dim currentRow As String()
While Not MyReader.EndOfData
Try
currentRow = MyReader.ReadFields()
Dim currentField As String
For Each currentField In currentRow
MsgBox(currentField)
Next
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Line " & ex.Message & _
"is not valid and will be skipped.")
End Try
End While
End Using
Again, this example is in VB.NET, but it would be trivial to translate it to C#.
I really like the FileHelpers library. It's fast, it's C# 100%, it's available for FREE, it's very flexible and easy to use.
I'm implementing Daniel Pryden's answer in C#, so it is easier to cut and paste and customize. I think this is the easiest method for parsing CSV files. Just add a reference and you are basically done.
Add the Microsoft.VisualBasic Reference to your project
Then here is sample code in C# from Joel's answer:
bool first = true; // used below to skip the header row
using (Microsoft.VisualBasic.FileIO.TextFieldParser MyReader = new
Microsoft.VisualBasic.FileIO.TextFieldParser(filename))
{
MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
MyReader.SetDelimiters(",");
while (!MyReader.EndOfData)
{
try
{
string[] fields = MyReader.ReadFields();
if (first)
{
first = false;
continue;
}
// This is how I treat my data, you'll need to throw this out.
//"Type" "Post Date" "Description" "Amount"
LineItem li = new LineItem();
li.date = DateTime.Parse(fields[1]);
li.description = fields[2];
li.Value = Convert.ToDecimal(fields[3]);
lineitems1.Add(li);
}
catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex)
{
MessageBox.Show("Line " + ex.Message +
" is not valid and will be skipped.");
}
}
}
Besides parsing/reading, some libraries do other nice things, like converting the parsed data into objects for you.
Here is an example of using CsvHelper (a library I maintain) to read a CSV file into objects.
var csv = new CsvHelper( File.OpenRead( "file.csv" ) );
var myCustomObjectList = csv.Reader.GetRecords<MyCustomObject>();
By default, conventions are used for matching the headers/columns with the properties. You can change the behavior by changing the settings.
// Using attributes:
public class MyCustomObject
{
[CsvField( Name = "First Name" )]
public string StringProperty { get; set; }
[CsvField( Index = 0 )]
public int IntProperty { get; set; }
[CsvField( Ignore = true )]
public string ShouldIgnore { get; set; }
}
Sometimes you don't "own" the object you want to populate the data with. In this case you can use fluent class mapping.
// Fluent class mapping:
public sealed class MyCustomObjectMap : CsvClassMap<MyCustomObject>
{
public MyCustomObjectMap()
{
Map( m => m.StringProperty ).Name( "First Name" );
Map( m => m.IntProperty ).Index( 0 );
Map( m => m.ShouldIgnore ).Ignore();
}
}
You can use Microsoft.VisualBasic.FileIO.TextFieldParser.
The code example below is taken from the article mentioned above:
static void Main()
{
string csv_file_path=@"C:\Users\Administrator\Desktop\test.csv";
DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
Console.WriteLine("Rows count:" + csvData.Rows.Count);
Console.ReadLine();
}
private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
DataTable csvData = new DataTable();
try
{
using(TextFieldParser csvReader = new TextFieldParser(csv_file_path))
{
csvReader.SetDelimiters(new string[] { "," });
csvReader.HasFieldsEnclosedInQuotes = true;
string[] colFields = csvReader.ReadFields();
foreach (string column in colFields)
{
DataColumn datecolumn = new DataColumn(column);
datecolumn.AllowDBNull = true;
csvData.Columns.Add(datecolumn);
}
while (!csvReader.EndOfData)
{
string[] fieldData = csvReader.ReadFields();
//Making empty value as null
for (int i = 0; i < fieldData.Length; i++)
{
if (fieldData[i] == "")
{
fieldData[i] = null;
}
}
csvData.Rows.Add(fieldData);
}
}
}
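catch (Exception ex)
{
// NOTE: an empty catch hides parse errors; at least report them
Console.WriteLine("Failed to read CSV: " + ex.Message);
}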
return csvData;
}

Reading CSV files in C# [closed]

Does anyone know of an open-source library that allows you to parse and read .csv files in C#?
Here, written by yours truly to use generic collections and iterator blocks. It supports double-quote enclosed text fields (including ones that span multiple lines) using the double-escaped convention (so "" inside a quoted field reads as a single quote character). It does not support:
Single-quote enclosed text
\ -escaped quoted text
alternate delimiters (won't yet work on pipe or tab delimited fields)
Unquoted text fields that begin with a quote
But all of those would be easy enough to add if you need them. I haven't benchmarked it anywhere (I'd love to see some results), but performance should be very good - better than anything that's .Split() based anyway.
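To illustrate the double-escaped convention mentioned above, here is a minimal sketch for a single field. The QuoteDemo class and Unquote helper are hypothetical names used only for illustration; they are not part of the CSV class in this answer, which handles the same rule character by character while reading.

```csharp
using System;

public static class QuoteDemo
{
    // Hypothetical helper: strips the enclosing double quotes from one raw field
    // and collapses each doubled quote ("") into a single quote character.
    public static string Unquote(string raw)
    {
        if (raw.Length >= 2 && raw[0] == '"' && raw[raw.Length - 1] == '"')
        {
            return raw.Substring(1, raw.Length - 2).Replace("\"\"", "\"");
        }
        return raw; // unquoted fields are returned unchanged
    }

    public static void Main()
    {
        // Raw field as it would appear in the file: "He said ""hi"""
        Console.WriteLine(Unquote("\"He said \"\"hi\"\"\""));
    }
}
```

This only covers a field that has already been isolated; finding the field boundaries (commas and line breaks inside quotes) is the harder part that the parser has to deal with.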
Now on GitHub
Update: felt like adding single-quote enclosed text support. It's a simple change, but I typed it right into the reply window so it's untested. Use the revision link at the bottom if you'd prefer the old (tested) code.
// Requires: using System.Collections.Generic; using System.IO; using System.Text;
public static class CSV
{
    public static IEnumerable<IList<string>> FromFile(string fileName, bool ignoreFirstLine = false)
    {
        using (StreamReader rdr = new StreamReader(fileName))
        {
            foreach (IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
        }
    }

    public static IEnumerable<IList<string>> FromStream(Stream csv, bool ignoreFirstLine = false)
    {
        using (var rdr = new StreamReader(csv))
        {
            foreach (IList<string> item in FromReader(rdr, ignoreFirstLine)) yield return item;
        }
    }

    public static IEnumerable<IList<string>> FromReader(TextReader csv, bool ignoreFirstLine = false)
    {
        if (ignoreFirstLine) csv.ReadLine();

        IList<string> result = new List<string>();
        StringBuilder curValue = new StringBuilder();
        // Keep the raw int from Read() so that -1 (end of stream) is detected directly;
        // looping on Peek() dropped the last character when the input had no trailing newline
        int c = csv.Read();
        while (c != -1)
        {
            switch (c)
            {
                case ',': //empty field
                    result.Add("");
                    c = csv.Read();
                    break;
                case '"': //qualified text
                case '\'':
                    int q = c;
                    c = csv.Read();
                    bool inQuotes = true;
                    while (inQuotes && c != -1)
                    {
                        if (c == q)
                        {
                            // a doubled quote stays in the field; a single one closes it
                            c = csv.Read();
                            if (c != q)
                                inQuotes = false;
                        }
                        if (inQuotes)
                        {
                            curValue.Append((char)c);
                            c = csv.Read();
                        }
                    }
                    result.Add(curValue.ToString());
                    curValue = new StringBuilder();
                    if (c == ',') c = csv.Read(); // either ',', newline, or endofstream
                    break;
                case '\n': //end of the record
                case '\r':
                    //potential bug here depending on what your line breaks look like
                    if (result.Count > 0) // don't return empty records
                    {
                        yield return result;
                        result = new List<string>();
                    }
                    c = csv.Read();
                    break;
                default: //normal unqualified text
                    while (c != ',' && c != '\r' && c != '\n' && c != -1)
                    {
                        curValue.Append((char)c);
                        c = csv.Read();
                    }
                    result.Add(curValue.ToString());
                    curValue = new StringBuilder();
                    if (c == ',') c = csv.Read(); //either ',', newline, or endofstream
                    break;
            }
        }
        if (curValue.Length > 0) //potential bug: I don't want to skip on a empty column in the last record if a caller really expects it to be there
            result.Add(curValue.ToString());
        if (result.Count > 0)
            yield return result;
    }
}
Take a look at A Fast CSV Reader on CodeProject.
The last time this question was asked, here's the answer I gave:
If you're just trying to read a CSV file with C#, the easiest thing is to use the Microsoft.VisualBasic.FileIO.TextFieldParser class. It's actually built into the .NET Framework, instead of being a third-party extension.
Yes, it is in Microsoft.VisualBasic.dll, but that doesn't mean you can't use it from C# (or any other CLR language).
Here's an example of usage, taken from the MSDN documentation:
Using MyReader As New _
        Microsoft.VisualBasic.FileIO.TextFieldParser("C:\testfile.txt")
    MyReader.TextFieldType = FileIO.FieldType.Delimited
    MyReader.SetDelimiters(",")
    Dim currentRow As String()
    While Not MyReader.EndOfData
        Try
            currentRow = MyReader.ReadFields()
            Dim currentField As String
            For Each currentField In currentRow
                MsgBox(currentField)
            Next
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            MsgBox("Line " & ex.Message & _
                "is not valid and will be skipped.")
        End Try
    End While
End Using
Again, this example is in VB.NET, but it would be trivial to translate it to C#.
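As a rough sketch of that translation, the version below assumes console output in place of MsgBox and an in-memory string in place of C:\testfile.txt. The TextFieldParserDemo class, the ReadRows helper and the sample data are made-up names and values for illustration; the TextFieldParser members are the same ones the VB.NET sample uses.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TextFieldParserDemo
{
    // Hypothetical helper: reads every well-formed row from the given reader.
    public static List<string[]> ReadRows(TextReader input)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(input))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            while (!parser.EndOfData)
            {
                try
                {
                    rows.Add(parser.ReadFields());
                }
                catch (MalformedLineException ex)
                {
                    // Same handling as the VB.NET sample: report and skip the line
                    Console.WriteLine("Line " + ex.Message + " is not valid and will be skipped.");
                }
            }
        }
        return rows;
    }

    public static void Main()
    {
        // Assumption: sample data standing in for the contents of C:\testfile.txt
        string sample = "name,comment\nAda,\"likes commas, a lot\"\n";
        foreach (string[] row in ReadRows(new StringReader(sample)))
        {
            Console.WriteLine(string.Join(" | ", row));
        }
    }
}
```

Passing a TextReader rather than a file path keeps the sketch runnable without a file on disk; the constructor that takes a path works the same way for real files.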
I really like the FileHelpers library. It's fast, it's C# 100%, it's available for FREE, it's very flexible and easy to use.
I'm implementing Daniel Pryden's answer in C#, so it is easier to cut and paste and customize. I think this is the easiest method for parsing CSV files. Just add a reference and you are basically done.
Add the Microsoft.VisualBasic Reference to your project
Then here is sample code in C# from Joel's answer:
// first, lineitems1 and LineItem are not declared in the original snippet;
// first is assumed here to be a flag used to skip the header row.
bool first = true;
using (Microsoft.VisualBasic.FileIO.TextFieldParser MyReader = new
    Microsoft.VisualBasic.FileIO.TextFieldParser(filename))
{
    MyReader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited;
    MyReader.SetDelimiters(",");
    while (!MyReader.EndOfData)
    {
        try
        {
            string[] fields = MyReader.ReadFields();
            if (first)
            {
                first = false;
                continue;
            }
            // This is how I treat my data, you'll need to throw this out.
            //"Type" "Post Date" "Description" "Amount"
            LineItem li = new LineItem();
            li.date = DateTime.Parse(fields[1]);
            li.description = fields[2];
            li.Value = Convert.ToDecimal(fields[3]);
            lineitems1.Add(li);
        }
        catch (Microsoft.VisualBasic.FileIO.MalformedLineException ex)
        {
            MessageBox.Show("Line " + ex.Message +
                " is not valid and will be skipped.");
        }
    }
}
Besides parsing/reading, some libraries do other nice things like converting the parsed data into objects for you.
Here is an example of using CsvHelper (a library I maintain) to read a CSV file into objects.
var csv = new CsvHelper( File.OpenRead( "file.csv" ) );
var myCustomObjectList = csv.Reader.GetRecords<MyCustomObject>();
By default, conventions are used for matching the headers/columns with the properties. You can change the behavior by changing the settings.
// Using attributes:
public class MyCustomObject
{
    [CsvField( Name = "First Name" )]
    public string StringProperty { get; set; }

    [CsvField( Index = 0 )]
    public int IntProperty { get; set; }

    [CsvField( Ignore = true )]
    public string ShouldIgnore { get; set; }
}
Sometimes you don't "own" the object you want to populate the data with. In this case you can use fluent class mapping.
// Fluent class mapping:
public sealed class MyCustomObjectMap : CsvClassMap<MyCustomObject>
{
    public MyCustomObjectMap()
    {
        Map( m => m.StringProperty ).Name( "First Name" );
        Map( m => m.IntProperty ).Index( 0 );
        Map( m => m.ShouldIgnore ).Ignore();
    }
}
You can use Microsoft.VisualBasic.FileIO.TextFieldParser
The code example below comes from the article above:
// Requires: using System; using System.Data; using Microsoft.VisualBasic.FileIO;
static void Main()
{
    string csv_file_path = @"C:\Users\Administrator\Desktop\test.csv";
    DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
    Console.WriteLine("Rows count:" + csvData.Rows.Count);
    Console.ReadLine();
}
private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
    DataTable csvData = new DataTable();
    try
    {
        using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
        {
            csvReader.SetDelimiters(new string[] { "," });
            csvReader.HasFieldsEnclosedInQuotes = true;
            // The first row supplies the column names
            string[] colFields = csvReader.ReadFields();
            foreach (string column in colFields)
            {
                DataColumn dataColumn = new DataColumn(column);
                dataColumn.AllowDBNull = true;
                csvData.Columns.Add(dataColumn);
            }
            while (!csvReader.EndOfData)
            {
                string[] fieldData = csvReader.ReadFields();
                // Store empty values as null
                for (int i = 0; i < fieldData.Length; i++)
                {
                    if (fieldData[i] == "")
                    {
                        fieldData[i] = null;
                    }
                }
                csvData.Rows.Add(fieldData);
            }
        }
    }
    catch (Exception ex)
    {
        // Report the failure instead of silently swallowing it
        Console.WriteLine("Could not read CSV file: " + ex.Message);
    }
    return csvData;
}
