Text parsing application in C# without third-party libraries


For example, there is a line:
name, tax, company.
To separate the fields I need a split method:
string[] text = File.ReadAllLines("file.csv", Encoding.Default);
foreach (string line in text)
{
string[] words = line.Split(',');
foreach (string word in words)
{
Console.WriteLine(word);
}
}
Console.ReadKey();
But how do I split a line when a quoted field itself contains a comma?
name, tax, "company, Ariel";
"name, surname", tax, company;
and so on, so that the result looks like this:
Max | 12.3 | company, Ariel
Alex, Smith | 13.1 | Oriflame
The input data will not always be in an ideal format as in the example: there may be three quotes in a row, or a line without commas. The program must not crash in any case. If a line cannot be parsed, it should report that with a message.

Split on double quotes first, then split the unquoted parts on commas.
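A minimal sketch of that idea, written as a single pass over the characters rather than two Split calls. The names CsvLine and TrySplit are made up for this illustration. It assumes a doubled quote inside a quoted field means a literal quote, trims whitespace around each field, and reports a problem through the return value instead of throwing:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Tries to split one line into fields, honoring double-quoted fields.
    // Returns false and sets an error message on malformed input.
    public static bool TrySplit(string line, out List<string> fields, out string error)
    {
        fields = new List<string>();
        error = null;
        if (line == null) { error = "Line is null."; return false; }

        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    // Assumption: "" inside a quoted field is a literal quote.
                    if (i + 1 < line.Length && line[i + 1] == '"') { current.Append('"'); i++; }
                    else inQuotes = false;
                }
                else current.Append(c);
            }
            else if (c == '"')
            {
                // A quote may only open a field (leading spaces are ignored).
                if (current.ToString().Trim().Length != 0)
                {
                    error = "Unexpected quote at position " + i + ".";
                    return false;
                }
                current.Clear();
                inQuotes = true;
            }
            else if (c == ',')
            {
                // A comma outside quotes ends the current field.
                fields.Add(current.ToString().Trim());
                current.Clear();
            }
            else current.Append(c);
        }

        if (inQuotes) { error = "Unterminated quoted field."; return false; }
        fields.Add(current.ToString().Trim());
        return true;
    }
}
```

Calling CsvLine.TrySplit on the line Max, 12.3, "company, Ariel" and joining the fields with " | " gives Max | 12.3 | company, Ariel. A line made of three quotes in a row returns false with the "Unterminated quoted field." message. The sketch does not treat a trailing ";" specially, so it stays part of the last field.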

You can use TextFieldParser from Microsoft.VisualBasic.FileIO:
var list = new List<Data>();
using (TextFieldParser parser = new TextFieldParser(filePath))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.Delimiters = new string[] { "," };
    // Treat commas inside double quotes as part of the field
    parser.HasFieldsEnclosedInQuotes = true;
    // Skip the header row
    if (!parser.EndOfData)
        parser.ReadFields();
    while (!parser.EndOfData)
    {
        string[] parts = parser.ReadFields();
        list.Add(new Data
        {
            People = parts[0],
            Tax = Double.Parse(parts[1]),
            Company = parts[2]
        });
    }
}
Where Data is defined as
public class Data
{
public string People{get;set;}
public double Tax{get;set;}
public string Company{get;set;}
}
Note that this requires a reference to the Microsoft.VisualBasic assembly and a using directive for Microsoft.VisualBasic.FileIO.
Example data:
Name,Tax,Company
Max,12.3,"company, Ariel"
Ariel,13.1,"company, Oriflame"

Here's a bit of code that might help. It's not the most efficient, but I use it to 'see' what is going on with the parsing when a particular line is giving trouble.
string[] text = File.ReadAllLines("file.csv", Encoding.Default);
string[] datArr;
string tmpStr;
foreach (string line in text)
{
ParseString(line, ",", "!####!", out datArr, out tmpStr);
foreach(string s in datArr)
{
Console.WriteLine(s);
}
}
Console.ReadKey();
private static void ParseString(string inputString, string origDelim, string newDelim, out string[] retArr, out string retStr)
{
string tmpStr = inputString;
retArr = new[] {""};
retStr = "";
if (!string.IsNullOrWhiteSpace(tmpStr))
{
//If there is only one Quote character in the line, ignore/remove it:
if (tmpStr.Count(f => f == '"') == 1)
tmpStr = tmpStr.Replace("\"", "");
string[] tmpArr = tmpStr.Split(new[] {origDelim}, StringSplitOptions.None);
var inQuote = 0;
StringBuilder lineToWrite = new StringBuilder();
foreach (var s in tmpArr)
{
if (s.Contains("\""))
inQuote++;
switch (inQuote)
{
case 1:
//Begin quoted text
lineToWrite.Append(lineToWrite.Length > 0
? newDelim + s.Replace("\"", "")
: s.Replace("\"", ""));
if (s.Length > 4 && s.Substring(0, 2) == "\"\"" && s.Substring(s.Length - 2, 2) != "\"\"")
{
//if string has two quotes at the beginning and is > 4 characters and the last two characters are NOT quotes,
//inquote needs to be incremented.
inQuote++;
}
else if ((s.Substring(0, 1) == "\"" && s.Substring(s.Length - 1, 1) == "\"" &&
s.Length > 1) || (s.Count(x => x == '\"') % 2 == 0))
{
//if string has more than one character and both begins and ends with a quote, then it's ok and counter should be reset.
//if string has an EVEN number of quotes, it should be ok and counter should be reset.
inQuote = 0;
}
else
{
inQuote++;
}
break;
case 2:
//text between the quotes
//If we are here the origDelim value was found between the quotes
//include origDelim so there is no data loss.
//Example quoted text: "Dr. Mario, Sr, MD";
// ", Sr" would be handled here
// ", MD" would be handled in case 3 end of quoted text.
lineToWrite.Append(origDelim + s);
break;
case 3:
//End quoted text
//If we are here the origDelim value was found between the quotes
//and we are at the end of the quoted text
//include origDelim so there is no data loss.
//Example quoted text: "Dr. Mario, MD"
// ", MD" would be handled here.
lineToWrite.Append(origDelim + s.Replace("\"", ""));
inQuote = 0;
break;
default:
lineToWrite.Append(lineToWrite.Length > 0 ? newDelim + s : s);
break;
}
}
if (lineToWrite.Length > 0)
{
retStr = lineToWrite.ToString();
retArr = retStr.Split(new[] {newDelim}, StringSplitOptions.None);
}
}
}

Related

Can't properly rebuild a string with Replacement values from Dictionary

I am trying to build a file using a template. I am processing the file line by line in a while loop. The first section of the file, the first 35 lines, is header information. The placeholders in it are surrounded by # signs. Take this string for example:
Field InspectionStationID 3 {"PVA TePla #WSM#", "sw#data.tool_context.TOOL_SOFTWARE_VERSION#", "#data.context.TOOL_ENTITY#"}
The expected output should be:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
This header section uses a different mapping than the rest of the file so I wanted to parse the file line by line from top to bottom and use a different logic for each section so that I don't waste time parsing the entire file at once multiple times for different sections.
The logic uses two dictionaries populated from an XML file. Because the file has multiple tables, I combined them into the two dictionaries like so:
var headerCdataIndexKeyVals = new Dictionary<string, int>(){
{"data.tool_context.TOOL_SOFTWARE_VERSION", 1},
{"data.context.TOOL_ENTITY",0}
};
var headerCdataArrayKeyVals = new Dictionary<string, List<string>>();
var tool_contextCdataList = new List<string>{"HM654", "sw0.2.002"};
var contextCdataList = new List<string>{"WSM102"};
headerCdataArrayKeyVals.Add("tool_context", tool_contextCdataList);
headerCdataArrayKeyVals.Add("context", contextCdataList);
This lets me map the values to their respective positions in the string in one go, without having to loop through multiple dictionaries.
I am using the following logic:
public static string FindSubsInDelimetersAndReturn(string str, char openDelimiter, char closeDelimiter, HeaderMapperData mapperData )
{
string newString = string.Empty;
// Stores the indices of the opening delimiters
Stack <int> dels = new Stack <int>();
for (int i = 0; i < str.Length; i++)
{
var let = str[i];
// If opening delimeter
// is encountered
if (str[i] == openDelimiter && dels.Count == 0)
{
dels.Push(i);
}
// If closing delimeter
// is encountered
else if (str[i] == closeDelimiter && dels.Count > 0)
{
// Extract the position
// of opening delimeter
int pos = dels.Peek();
dels.Pop();
// Length of substring
int len = i - 1 - pos;
// Extract the substring
string headerSubstring = str.Substring(pos + 1, len);
bool hasKey = mapperData.HeaderCdataIndexKeyVals.TryGetValue(headerSubstring.ToUpper(), out int headerCdataIndex);
string[] headerSubstringSplit = headerSubstring.Split('.');
string headerCDataVal = string.Empty;
if (hasKey)
{
if (headerSubstring.Contains("CONTAINER.CONTEXT", StringComparison.OrdinalIgnoreCase))
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper() + '.' + headerSubstringSplit[2].ToUpper()][headerCdataIndex];
//mapperData.HeaderCdataArrayKeyVals[]
}
else
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper()][headerCdataIndex];
}
string strToReplace = openDelimiter + headerSubstring + closeDelimiter;
string sub = str.Remove(i + 1);
sub = sub.Replace(strToReplace, headerCDataVal);
newString += sub;
}
else if (headerSubstring == "WSM" && closeDelimiter == '#')
{
string sub = str.Remove(len + 1);
newString += sub.Replace(openDelimiter + headerSubstring + closeDelimiter, "");
}
else
{
newString += let;
}
}
}
return newString;
}
}
But my output turns out to be:
"\tFie\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw0.2.002\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw#data.tool_context.TOOL_SOFTWARE_VERSION#\", \"WSM102"
Can someone help me understand why this is happening and how I can correct it so that I get this output:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
Am I even trying to solve this the right way, or is there a better, cleaner way to do it? By the way, if the key is not in the dictionary, I replace it with an empty string.
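One thing that stands out in the code above: str.Remove(i + 1) returns everything from the start of the string up to index i, so every replacement appends the whole prefix to newString again, which would explain the repeated text in the output. A possible cleaner approach, sketched here under assumptions, is to let Regex.Replace find each #...# placeholder and compute its replacement in a callback. The names HeaderTemplate and Fill are made up for this illustration, and the two dictionaries are assumed to have the shape shown in the question (the second dotted segment of the key selects the list):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderTemplate
{
    // indexByKey: placeholder name -> position in its value list.
    // valuesByTable: table name (second dotted segment of the key) -> values.
    public static string Fill(
        string line,
        IDictionary<string, int> indexByKey,
        IDictionary<string, List<string>> valuesByTable)
    {
        // Each match is one "#name#" placeholder; the callback returns its replacement.
        return Regex.Replace(line, "#([^#]+)#", match =>
        {
            string key = match.Groups[1].Value;
            string[] segments = key.Split('.');
            if (segments.Length > 1
                && indexByKey.TryGetValue(key, out int index)
                && valuesByTable.TryGetValue(segments[1], out List<string> values)
                && index >= 0 && index < values.Count)
            {
                return values[index];
            }
            // Unknown key (for example WSM): replace with an empty string.
            return string.Empty;
        });
    }
}
```

With the sample line and dictionaries from the question, this yields Field InspectionStationID 3 {"PVA TePla ", "sw0.2.002", "WSM102"}. Note the space left before the closing quote after "PVA TePla": removing it as well would need an extra rule, such as trimming whitespace in front of a placeholder that resolves to nothing.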

C# string parse

I have a string like this:
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
Each pair of single quotes encloses a field, and the fields are separated by commas. I want to empty the 8th field in the string. I cannot simply do Replace("MUL,NBLD,NITA,NUND", "") because that field could contain anything. Also note that the 4th field is a number and therefore has no single quotes around the 5.
How can I achieve this?

static void Main()
{
var temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
var parts = Split(temp).ToArray();
parts[7] = null;
var ret = string.Join(",", parts);
// or replace the above 3 lines with this...
//var ret = string.Join(",", Split(temp).Select((v,i)=>i!=7 ? v : null));
//ret == "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40',,'','Address line 2'"
}
public static IEnumerable<string> Split(string input, char delimiter = ',', char quote = '\'')
{
string temp = "";
bool skipDelimiter = false;
foreach (var c in input)
{
if (c == quote)
skipDelimiter = !skipDelimiter;
else if (c == delimiter && !skipDelimiter)
{
//do split
yield return temp;
temp = "";
continue;
}
temp += c;
}
yield return temp;
}

I made a small implementation below. I explain the logic in the comments. Basically you want to write a simple parser to accomplish what you described.
edit0: just realized I did the opposite of what you asked for oops..fixed now
edit1: replacing the string with null as opposed to eliminating the entire field from the comma-delimited list.
static void Main(string[] args)
{
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
//keep track of the single quotes
int singleQuoteCount= 0;
//keep track of commas
int comma_count = 0;
String field = "";
foreach (Char chr in temp)
{
//add to the field string if we are not between the 7th and 8th comma not counting commas between single quotes
if (comma_count != 7)
field += chr;
//plug in an empty string between two single quotes instead of whatever chars are in the eighth field.
else if (chr == '\'' && singleQuoteCount %2 ==1)
field += "\'',";
if (chr == '\'') singleQuoteCount++;
//only want to add to comma_count if we are outside of single quotes.
if (singleQuoteCount % 2 == 0 && chr == ',') comma_count++;
}
}

If you used '-' (or another character) instead of ',' inside the fields (for example 'MUL-NBLD-NITA-NUND'), you could use this code:
static void Main(string[] args)
{
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL-NBLD-NITA-NUND','','Address line 2'";
temp = replaceField(temp, 8);
}
static string replaceField(string list, int field)
{
string[] fields = list.Split(',');
string chosenField = fields[field - 1 /*<--Arrays start at 0!*/];
if(!(field == fields.Length))
list = list.Replace(chosenField + ",", "");
else
list = list.Replace("," + chosenField, "");
return list;
}
//Return-Value: "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','','Address line 2'"

C# string.split() separate string by uppercase

I've been using the Split() method to split strings, but it only works if you pass a separator character to string.Split(). Is there any way to split a string at each uppercase letter?
Is it possible to get the words out of an unseparated string like:
DeleteSensorFromTemplate
so that the resulting string looks like:
Delete Sensor From Template

Use Regex.Split:
string[] split = Regex.Split(str, @"(?<!^)(?=[A-Z])");
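To get the single spaced string the question asks for, the split parts can be joined again. A short sketch (the CamelCase class and ToWords method are names made up for this illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CamelCase
{
    public static string ToWords(string input)
    {
        // Split at every position that is followed by A-Z, except at the start of the string.
        string[] parts = Regex.Split(input, @"(?<!^)(?=[A-Z])");
        return string.Join(" ", parts);
    }
}
```

CamelCase.ToWords("DeleteSensorFromTemplate") returns "Delete Sensor From Template". The pattern only looks at the ASCII letters A-Z and splits before every capital, so a run of capitals would be broken into single letters.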

Another way with regex:
public static string SplitCamelCase(string input)
{
return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim();
}

If you do not like RegEx and you really just want to insert the missing spaces, this will do the job too:
public static string InsertSpaceBeforeUpperCase(this string str)
{
var sb = new StringBuilder();
char previousChar = char.MinValue; // Unicode '\0'
foreach (char c in str)
{
if (char.IsUpper(c))
{
// If not the first character and previous character is not a space, insert a space before uppercase
if (sb.Length != 0 && previousChar != ' ')
{
sb.Append(' ');
}
}
sb.Append(c);
previousChar = c;
}
return sb.ToString();
}

I had some fun with this one and came up with a function that splits by case, as well as groups together caps (it assumes title case for whatever follows) and digits.
Examples:
Input -> "TodayIUpdated32UPCCodes"
Output -> "Today I Updated 32 UPC Codes"
Code (please excuse the funky symbols I use)...
public static string[] SplitByCase(this string s) {
var ʀ = new List<string>();
var ᴛ = new StringBuilder();
var previous = SplitByCaseModes.None;
foreach(var ɪ in s) {
SplitByCaseModes mode_ɪ;
if(string.IsNullOrWhiteSpace(ɪ.ToString())) {
mode_ɪ = SplitByCaseModes.WhiteSpace;
} else if("0123456789".Contains(ɪ)) {
mode_ɪ = SplitByCaseModes.Digit;
} else if(ɪ == ɪ.ToString().ToUpper()[0]) {
mode_ɪ = SplitByCaseModes.UpperCase;
} else {
mode_ɪ = SplitByCaseModes.LowerCase;
}
if((previous == SplitByCaseModes.None) || (previous == mode_ɪ)) {
ᴛ.Append(ɪ);
} else if((previous == SplitByCaseModes.UpperCase) && (mode_ɪ == SplitByCaseModes.LowerCase)) {
if(ᴛ.Length > 1) {
ʀ.Add(ᴛ.ToString().Substring(0, ᴛ.Length - 1));
ᴛ.Remove(0, ᴛ.Length - 1);
}
ᴛ.Append(ɪ);
} else {
ʀ.Add(ᴛ.ToString());
ᴛ.Clear();
ᴛ.Append(ɪ);
}
previous = mode_ɪ;
}
if(ᴛ.Length != 0) ʀ.Add(ᴛ.ToString());
return ʀ.ToArray();
}
private enum SplitByCaseModes { None, WhiteSpace, Digit, UpperCase, LowerCase }

Here's another different way if you don't want to be using string builders or RegEx, which are totally acceptable answers. I just want to offer a different solution:
string Split(string input)
{
string result = "";
for (int i = 0; i < input.Length; i++)
{
if (char.IsUpper(input[i]))
{
result += ' ';
}
result += input[i];
}
return result.Trim();
}

how to split comma with double quotes in c#?

string strExample =
"\"10553210\",\"na\",\"398,633,000\",\"20130709\",\"20130502\",\"20120724\",";
How can I split the string above on ","?
I need a result like:
string[] arrExample = YourFunc(strExample);
arrExample[0] == "10553210";
arrExample[1] == "na";
arrExample[2] == "398,633,000";
...
using a split option.
Thanks in advance.

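Here is an easy way:
using System.IO;
using Microsoft.VisualBasic.FileIO;
string[] arrExample;
using (var csvParser = new TextFieldParser(new StringReader(strExample)))
{
    // Delimiters must be set before reading fields
    csvParser.Delimiters = new string[] { "," };
    csvParser.HasFieldsEnclosedInQuotes = true;
    arrExample = csvParser.ReadFields();
}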

You can split not on the comma "," but on the whole string "\",\"".
Do not forget to trim the leading and trailing quotation marks:
string strExample =
"\"10553210\",\"na\",\"398,633,000\",\"20130709\",\"20130502\",\"20120724\"";
string[] arrExample = strExample.Trim('"').Split(new string[] {"\",\""}, StringSplitOptions.None);

You can split on "," (including the quotes). You then have to strip the extra quote from the first and the last entry:
string[] arr = strExample.Split(new string[] { "\",\"" },
StringSplitOptions.None);
//remove the extra quotes from the last and the first entry
arr[0] = arr[0].Substring(1, arr[0].Length - 1);
int last = arr.Length - 1;
arr[last] = arr[last].Substring(0, arr[last].Length - 1);

string[] arrExample = strExample.Split(",");
would do it, but your code won't compile. I assume you meant:
string strExample = "10553210,na,398,633,000,20130709,20130502,20120724";
If this isn't what you meant, please correct the question.

Assuming you meant this:
string strExample = "\"10553210\",\"na\",\"398,633,000\",\"20130709\",\"20130502\",\"20120724\"";
Split then Select the substring:
string[] parts = strExample.Split(',').Select(x => x.Substring(1, x.Length - 2)).ToArray();

strExample.Split(',');
You need to escape the double quotes if they're meant to be contained in your example string.

Using the example from Jodrell
private string[] SplitFields(string csvValue)
{
//if there aren't quotes, use the faster function
if (!csvValue.Contains('\"') && !csvValue.Contains('\''))
{
return csvValue.Trim(',').Split(',');
}
else
{
//there are quotes, use this built in text parser
using(var csvParser = new Microsoft.VisualBasic.FileIO.TextFieldParser(new StringReader(csvValue.Trim(','))))
{
csvParser.Delimiters = new string[] { "," };
csvParser.HasFieldsEnclosedInQuotes = true;
return csvParser.ReadFields();
}
}
}

This worked for me:
public static IEnumerable<string> SplitCSV(string strInput)
{
    string[] str = strInput.Split(',');
    StringBuilder quoteS = null;
    foreach (string s in str)
    {
        if (quoteS == null && s.StartsWith("\""))
        {
            if (s.Length > 1 && s.EndsWith("\""))
            {
                // Field opens and closes within the same piece
                yield return s;
                continue;
            }
            // Start of a quoted field that contains commas
            quoteS = new StringBuilder(s);
            continue;
        }
        if (quoteS != null)
        {
            // Re-attach the comma that Split removed
            quoteS.Append($",{s}");
            if (s.EndsWith("\""))
            {
                yield return quoteS.ToString();
                quoteS = null;
            }
            continue;
        }
        yield return s;
    }
    // Unterminated quoted field: return what was collected so far
    if (quoteS != null)
        yield return quoteS.ToString();
}
static void Main(string[] args)
{
string s = "111,222,\"33,44,55\",666,\"77,88\",\"99\"";
Console.WriteLine(s);
var sp = SplitCSV(s);
foreach (string s1 in sp)
{
Console.WriteLine(s1);
}
Console.ReadKey();
}

You can do it like this:
string stringname = "10553210,na,398,633,000,20130709,20130502,20120724";
List<string> asd = stringname.Split(',').ToList();
or, if you want an array:
string[] asd = stringname.Split(',');

Is there a better way to implement Shift+Tab or Decrease Indent?

This is how I implemented Shift+Tab (decrease indent):
if ((Keyboard.Modifiers & ModifierKeys.Shift) == ModifierKeys.Shift && e.Key == Key.Tab)
{
// Shift+Tab
int selStart = txtEditor.SelectionStart;
int selLength = txtEditor.SelectionLength;
string selText = txtEditor.SelectedText;
string text = txtEditor.Text;
// find new lines that are followed by 1 or more spaces
Regex regex = new Regex(Environment.NewLine + @"(\s+)");
Match m = regex.Match(selText);
string spaces;
while (m.Success)
{
GroupCollection grps = m.Groups;
spaces = grps[1].Value;
int i = 0;
// remove 1 space on each loop to a max of 4 spaces
while (i < 4 && spaces.Length > 0)
{
spaces = spaces.Remove(0, 1);
i++;
}
// update spaces in selText
selText = selText.Remove(grps[1].Index, grps[1].Length).Insert(grps[1].Index, spaces);
m = regex.Match(selText, grps[1].Index + spaces.Length);
}
// commit changes to selText to text
text = text.Remove(selStart, selLength).Insert(selStart, selText);
// decrease indent of 1st line
// - find 1st character of selection
regex = new Regex(@"\w");
m = regex.Match(text, selStart);
int start = selStart;
if (m.Success) {
start = m.Index;
}
// - start search for spaces
regex = new Regex(Environment.NewLine + @"(\s+)", RegexOptions.RightToLeft);
m = regex.Match(text, start);
if (m.Success) {
spaces = m.Groups[1].Value;
int i = 0;
while (i < 4 && spaces.Length > 0) {
spaces = spaces.Remove(0, 1); // remove 1 space
i++;
}
text = text.Remove(m.Groups[1].Index, m.Groups[1].Length).Insert(m.Groups[1].Index, spaces);
selStart = m.Groups[1].Index;
}
txtEditor.Text = text;
txtEditor.SelectionStart = selStart;
txtEditor.SelectionLength = selText.Length;
e.Handled = true;
}
The code looks messy and I wonder if there's a better way.

Personally, I wouldn't use Regex for this.
Untested, probably needs modification:
public static class StringExtensions
{
// Removes leading white-spaces in a string up to a maximum
// of 'level' characters
public static string ReduceIndent(this string line, int level)
{
// Produces an IEnumerable<char> with the characters
// of the string verbatim, other than leading white-spaces
var unindentedChars = line.SkipWhile((c, index) => char.IsWhiteSpace(c) && index < level);
return new string(unindentedChars.ToArray());
}
// Applies a transformation to each line of a string and returns the
// transformed string
public static string LineTransform(this string text, Func<string,string> transform)
{
//Splits the string into an array of lines
var lines = text.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
//Applies the transformation to each line
var transformedLines = lines.Select(transform);
//Joins the transformed lines into a new string
return string.Join(Environment.NewLine, transformedLines.ToArray());
}
}
...
if ((Keyboard.Modifiers & ModifierKeys.Shift) == ModifierKeys.Shift && e.Key == Key.Tab)
{
// Reduces the indent level of the selected text by applying the
// 'ReduceIndent' transformation to each line of the text.
string replacement = txtEditor.SelectedText
.LineTransform(line => line.ReduceIndent(4));
int selStart = txtEditor.SelectionStart;
int selLength = txtEditor.SelectionLength;
txtEditor.Text = txtEditor.Text
.Remove(selStart, selLength)
.Insert(selStart, replacement);
txtEditor.SelectionStart = selStart;
txtEditor.SelectionLength = replacement.Length;
e.Handled = true;
}
EDIT:
Added comments to the code as per the request of the OP.
For more info:
Extension Methods
Func<T, TResult> delegate
Enumerable.SkipWhile extension method
Lambda Expressions

I'm thinking out loud here, as I have never implemented a text editor.
What if you represent each line by an object with an indentation property that is reflected in the rendering of the line? Then it would be easy to increase and decrease the indent.
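A rough sketch of what such a line object could look like. The EditorLine class and its members are invented for this illustration, and it assumes indentation is made of spaces only:

```csharp
using System;

public sealed class EditorLine
{
    // Number of leading spaces used when the line is rendered
    public int Indent { get; private set; }

    // Text of the line without its leading indentation
    public string Content { get; }

    public EditorLine(string raw)
    {
        string trimmed = raw.TrimStart(' ');
        Indent = raw.Length - trimmed.Length;
        Content = trimmed;
    }

    public void IncreaseIndent(int width = 4) => Indent += width;

    // Never goes below zero, so Shift+Tab on an unindented line is a no-op
    public void DecreaseIndent(int width = 4) => Indent = Math.Max(0, Indent - width);

    // Rendering re-creates the leading spaces from the Indent property
    public override string ToString() => new string(' ', Indent) + Content;
}
```

A line created from six spaces followed by foo renders as two spaces plus foo after one DecreaseIndent() call, and as plain foo after a second call. Shift+Tab on a selection would then just call DecreaseIndent() on each selected line and re-render.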
