C# string parse

I have a string like this:
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
Each field is enclosed in single quotes, and fields are delimited by commas. I want to empty the 8th field in the string. I cannot simply do Replace("MUL,NBLD,NITA,NUND", "") because that field could contain anything. Also note that the 4th field is a number and therefore has no single quotes around the 5.
How can I achieve this?

using System.Collections.Generic;
using System.Linq;
static void Main()
{
var temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
var parts = Split(temp).ToArray();
parts[7] = null;
var ret = string.Join(",", parts);
// or replace the above 3 lines with this...
//var ret = string.Join(",", Split(temp).Select((v,i)=>i!=7 ? v : null));
//ret == "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40',,'','Address line 2'"
}
// Splits on delimiters that are outside quotes; the quote characters are kept in each part
public static IEnumerable<string> Split(string input, char delimiter = ',', char quote = '\'')
{
string temp = "";
bool skipDelimiter = false;
foreach (var c in input)
{
if (c == quote)
skipDelimiter = !skipDelimiter;
else if (c == delimiter && !skipDelimiter)
{
//do split
yield return temp;
temp = "";
continue;
}
temp += c;
}
yield return temp;
}

I made a small implementation below and explain the logic in the comments. Basically, you want to write a simple parser to accomplish what you described.
Edit 0: I just realized I did the opposite of what you asked for; fixed now.
Edit 1: the 8th field is now replaced with an empty string ('') instead of being eliminated from the comma-delimited list entirely.
using System;
static void Main(string[] args)
{
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL,NBLD,NITA,NUND','','Address line 2'";
//keep track of the single quotes
int singleQuoteCount = 0;
//keep track of commas
int comma_count = 0;
String field = "";
foreach (Char chr in temp)
{
//copy the character unless we are inside the 8th field, i.e. after the 7th comma (not counting commas between single quotes)
if (comma_count != 7)
field += chr;
//at the closing quote of the 8th field, write an empty quoted string plus the following comma instead of the field's contents
else if (chr == '\'' && singleQuoteCount % 2 == 1)
field += "'',";
if (chr == '\'') singleQuoteCount++;
//only add to comma_count if we are outside of single quotes
if (singleQuoteCount % 2 == 0 && chr == ',') comma_count++;
}
//prints 'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','','','Address line 2'
Console.WriteLine(field);
}
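A more general sketch of the same character-by-character idea, assuming a 0-based field index and quotes that are balanced and never escaped inside a field. FieldEditor and EmptyField are illustrative names, not an existing API. It tracks whether it is inside quotes to find where the target field starts and ends, then splices in an empty quoted string:

```csharp
using System;

public static class FieldEditor
{
    // Replaces the contents of one delimited field with an empty quoted string ('').
    // Delimiters inside quotes are ignored. fieldIndex is 0-based (8th field = 7).
    // Assumes quotes are balanced and never escaped inside a field.
    public static string EmptyField(string line, int fieldIndex, char delimiter = ',', char quote = '\'')
    {
        bool inQuotes = false;
        int field = 0;                            // index of the field being scanned
        int start = fieldIndex == 0 ? 0 : -1;     // first character of the target field
        int end = line.Length;                    // one past the last character of the target field

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == quote)
            {
                inQuotes = !inQuotes;             // entering or leaving a quoted section
            }
            else if (c == delimiter && !inQuotes)
            {
                field++;
                if (field == fieldIndex)
                {
                    start = i + 1;                // target field begins after this delimiter
                }
                else if (field == fieldIndex + 1)
                {
                    end = i;                      // target field ends at this delimiter
                    break;
                }
            }
        }

        if (start < 0)
            throw new ArgumentOutOfRangeException(nameof(fieldIndex), "The line has fewer fields than requested.");

        // Keep everything around the field, swapping its text (quotes included) for ''.
        return line.Substring(0, start) + new string(quote, 2) + line.Substring(end);
    }
}
```

Note that this sketch always writes '' into the target field, even if the original field was an unquoted number like the 4th one.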

If you used '-' (or another character) instead of ',' inside the fields (for example 'MUL-NBLD-NITA-NUND'), you could use this code:
using System.Collections.Generic;
static void Main(string[] args)
{
string temp = "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','MUL-NBLD-NITA-NUND','','Address line 2'";
temp = replaceField(temp, 8);
}
//removes the given (1-based) field from the list; only safe when no field contains a comma
static string replaceField(string list, int field)
{
List<string> fields = new List<string>(list.Split(','));
//remove by position rather than with string.Replace, which would also hit identical text elsewhere in the line
fields.RemoveAt(field - 1 /*<--Arrays start at 0!*/);
return string.Join(",", fields);
}
//Return-Value: "'ADDR_LINE_2','MODEL','TABLE',5,'S','Y','C40','','Address line 2'"
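If the fields do contain commas, one regex-based alternative (a different technique, sketched here on the assumption that single quotes are balanced and never escaped) is to split only on commas that are followed by an even number of single quotes, i.e. commas that sit outside any '...' pair. QuotedFieldSplitter and SplitFields are illustrative names:

```csharp
using System.Text.RegularExpressions;

public static class QuotedFieldSplitter
{
    // A comma counts as a separator only if the rest of the line contains an
    // even number of single quotes, which means the comma is outside quotes.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^']*'[^']*')*[^']*$)");

    // Quote characters are kept in each returned field.
    public static string[] SplitFields(string line) => CommaOutsideQuotes.Split(line);
}
```

Assigning "''" to element 7 of the result and joining it back with string.Join(",", ...) empties the 8th field. The lookahead rescans the rest of the line for every comma, which is fine for short lines like this one.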

Related

How can I trim strings down to the first occurrence of a ";" or a "["

I have C# strings that look like this:
"かたむく;かたぶく[ok]"
"そば[側,傍];そく[側];はた"
"くすり"
"おととい[gikun];おとつい[gikun];いっさくじつ"
How can I trim these down so that the output contains only the text up to the first occurrence of a ";" character (not a normal semicolon) or a "[", or, if neither is present, the original string unchanged?
"かたむく"
"そば"
"くすり"
"おととい"
Is this something that would be best done with Regex, or should I use IndexOf-style code?
You don't need a Regex, just string.IndexOfAny. Something like:
var inputs = new[]
{
"かたむく;かたぶく[ok]",
"そば[側,傍];そく[側];はた",
"くすり",
"おととい[gikun];おとつい[gikun];いっさくじつ"
};
var separators = new[] {'；', '['}; // '；' is the full-width semicolon (U+FF1B), not ';'
foreach (var input in inputs)
{
var separatorPosition = input.IndexOfAny(separators);
if (separatorPosition >= 0)
{
Debug.WriteLine($"Split: {input.Substring(0, separatorPosition)}");
}
else
{
Debug.WriteLine($"No Split: {input}");
}
}
I get the following output from your inputs:
Split: かたむく;かたぶく
Split: そば
No Split: くすり
Split: おととい
It doesn't quite match what you show, but I think it's correct (and what you show isn't): your sample strings contain a normal ';', which is not in the separators list.
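If you do want the output listed in the question, a minimal sketch is to put both semicolon forms in the stop list: the ASCII ';' and the full-width '；' (U+FF1B). Treating both as stops is an assumption about the data, and Truncation.TrimAtFirstStop is just an illustrative name:

```csharp
public static class Truncation
{
    // Stop characters: ASCII ';', full-width '；' (U+FF1B) and '['.
    // Treating both semicolon forms as stops is an assumption about the data.
    private static readonly char[] Stops = { ';', '\uFF1B', '[' };

    // Returns the text before the first stop character,
    // or the whole string when none of them occurs.
    public static string TrimAtFirstStop(string input)
    {
        if (input == null) return null;
        int index = input.IndexOfAny(Stops);
        return index < 0 ? input : input.Substring(0, index);
    }
}
```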
Expanding on my comment, "IndexOf can be used to find the first index of the [ character, and Substring can return the string up to that point."
public static string GetSubstringToChar(string input, char delimeter = '[')
{
if (input == null || !input.Contains(delimeter)) return input;
return input.Substring(0, input.IndexOf(delimeter));
}
To make this work with multiple delimiters, we can pass in an array of delimiter characters and use IndexOfAny:
public static string GetSubstringToChar(string input, char[] delimeters)
{
if (input == null || !input.Any(delimeters.Contains)) return input;
return input.Substring(0, input.IndexOfAny(delimeters));
}
You could then call this like:
var strings = new List<string>
{
"かたむく;かたぶく[ok]",
"そば[側,傍];そく[側];はた",
"くすり",
"おととい[gikun];おとつい[gikun];いっさくじつ",
};
var delimeters = new[] { ';', '[' };
foreach (var str in strings)
{
Console.WriteLine(GetSubstringToChar(str, delimeters));
}
An extension method with a little validation will do the job.
public static string GetUntil(this string input, char[] delimiters)
{
if (input == null || input.IndexOfAny(delimiters) == -1)
return input;
else
return input.Split(delimiters)[0];
}
Then call it like this:
var test = "かたむく;かたぶく[ok]".GetUntil(new char[] { '；', '[' });

text parsing application c# without third party libraries

For example, there is a line:
name, tax, company.
To separate the fields, I need a split method.
string[] text = File.ReadAllLines("file.csv", Encoding.Default);
foreach (string line in text)
{
string[] words = line.Split(',');
foreach (string word in words)
{
Console.WriteLine(word);
}
}
Console.ReadKey();
But how do I split a line when a field in quotes contains a comma, for example:
name, tax, "company, Ariel";
"name, surname", tax, company;
and so on, so that the result looks like this:
Max | 12.3 | company, Ariel
Alex, Smith| 13.1 | Oriflame
Keep in mind that the input data will not always be in an ideal format (as it is in the example): there may be three quotes in a row, or a line without commas. The program must not crash in any case; if a line cannot be parsed, it should report that.
Split on double quotes first, then split the first resulting string on commas.
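Instead of two Split calls, a single pass over the characters with an "inside quotes" flag handles commas inside quotes and can report bad lines without throwing. This is a sketch of that state-machine approach (a swapped-in technique, not the two-step split just described). CsvLine.TryParse is a hypothetical helper, and it assumes double quotes, "" as an escaped quote, and no trailing ';' on the line:

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Parses one comma-separated line in a single pass.
    // - Fields may be wrapped in double quotes; commas inside quotes are kept.
    // - "" inside a quoted field is an escaped quote character.
    // - Unquoted fields are trimmed; quoted fields are returned as written.
    // Returns false instead of throwing when the line is malformed (an
    // unterminated quote, or text after a closing quote), so the caller can report it.
    public static bool TryParse(string line, out List<string> fields)
    {
        fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        bool quoted = false;   // the current field was quoted and its closing quote has been seen

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');   // escaped quote
                    i++;
                }
                else
                {
                    inQuotes = false;      // closing quote
                    quoted = true;
                }
            }
            else if (c == ',')
            {
                fields.Add(quoted ? current.ToString() : current.ToString().Trim());
                current.Clear();
                quoted = false;
            }
            else if (quoted)
            {
                // Only whitespace may follow a closing quote before the next comma.
                if (!char.IsWhiteSpace(c)) return false;
            }
            else if (c == '"')
            {
                // An opening quote is only accepted at the start of a field.
                if (current.ToString().Trim().Length != 0) return false;
                current.Clear();
                inQuotes = true;
            }
            else
            {
                current.Append(c);
            }
        }

        if (inQuotes) return false;   // the line ended inside a quoted field
        fields.Add(quoted ? current.ToString() : current.ToString().Trim());
        return true;
    }
}
```

Lines like the question's examples would need their trailing ';' removed first (for example with TrimEnd(';')), since the sketch treats text after a closing quote as an error.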
You can use TextFieldParser from Microsoft.VisualBasic.FileIO:
using System.Globalization;
using Microsoft.VisualBasic.FileIO;
var list = new List<Data>();
using (TextFieldParser parser = new TextFieldParser(filePath))
{
parser.Delimiters = new string[] { "," };
parser.HasFieldsEnclosedInQuotes = true;
//skip the header line
if (!parser.EndOfData)
parser.ReadFields();
while (!parser.EndOfData)
{
string[] parts = parser.ReadFields();
list.Add(new Data
{
People = parts[0],
//parse with the invariant culture so "12.3" works regardless of the machine's locale
Tax = Double.Parse(parts[1], CultureInfo.InvariantCulture),
Company = parts[2]
});
}
}
Where Data is defined as
public class Data
{
public string People{get;set;}
public double Tax{get;set;}
public string Company{get;set;}
}
Note that you need a using directive for Microsoft.VisualBasic.FileIO (and, on .NET Framework, a reference to the Microsoft.VisualBasic assembly).
Example data:
Name,Tax,Company
Max,12.3,"company, Ariel"
Ariel,13.1,"company, Oriflame"
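Since the question asks that bad lines be reported rather than crash the program, here is a sketch of the TextFieldParser approach with error handling. ReadFields throws MalformedLineException for a line it cannot parse; catching it inside the loop records the problem and lets reading continue with the next line. CsvReading.ReadAll and the errors list are illustrative names:

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvReading
{
    // Reads all records from CSV text. A line TextFieldParser cannot parse raises
    // MalformedLineException; it is recorded in 'errors' and reading continues.
    public static List<string[]> ReadAll(TextReader reader, List<string> errors)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;   // "company, Ariel" stays one field

            while (!parser.EndOfData)
            {
                try
                {
                    rows.Add(parser.ReadFields());
                }
                catch (MalformedLineException ex)
                {
                    errors.Add($"Line {ex.LineNumber} could not be parsed: {parser.ErrorLine}");
                }
            }
        }
        return rows;
    }
}
```

The header row is returned like any other row here; skip rows[0] if you do not need it.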
Here's a bit of code that might help. It's not the most efficient, but I use it to 'see' what is going on with the parsing when a particular line is giving trouble.
using System.IO;
using System.Linq;
using System.Text;
string[] text = File.ReadAllLines("file.csv", Encoding.Default);
string[] datArr;
string tmpStr;
foreach (string line in text)
{
ParseString(line, ",", "!####!", out datArr, out tmpStr);
foreach(string s in datArr)
{
Console.WriteLine(s);
}
}
Console.ReadKey();
private static void ParseString(string inputString, string origDelim, string newDelim, out string[] retArr, out string retStr)
{
string tmpStr = inputString;
retArr = new[] {""};
retStr = "";
if (!string.IsNullOrWhiteSpace(tmpStr))
{
//If there is only one Quote character in the line, ignore/remove it:
if (tmpStr.Count(f => f == '"') == 1)
tmpStr = tmpStr.Replace("\"", "");
string[] tmpArr = tmpStr.Split(new[] {origDelim}, StringSplitOptions.None);
var inQuote = 0;
StringBuilder lineToWrite = new StringBuilder();
foreach (var s in tmpArr)
{
if (s.Contains("\""))
inQuote++;
switch (inQuote)
{
case 1:
//Begin quoted text
lineToWrite.Append(lineToWrite.Length > 0
? newDelim + s.Replace("\"", "")
: s.Replace("\"", ""));
if (s.Length > 4 && s.Substring(0, 2) == "\"\"" && s.Substring(s.Length - 2, 2) != "\"\"")
{
//if string has two quotes at the beginning and is > 4 characters and the last two characters are NOT quotes,
//inquote needs to be incremented.
inQuote++;
}
else if ((s.Substring(0, 1) == "\"" && s.Substring(s.Length - 1, 1) == "\"" &&
s.Length > 1) || (s.Count(x => x == '\"') % 2 == 0))
{
//if string has more than one character and both begins and ends with a quote, then it's ok and counter should be reset.
//if string has an EVEN number of quotes, it should be ok and counter should be reset.
inQuote = 0;
}
else
{
inQuote++;
}
break;
case 2:
//text between the quotes
//If we are here the origDelim value was found between the quotes
//include origDelim so there is no data loss.
//Example quoted text: "Dr. Mario, Sr, MD";
// ", Sr" would be handled here
// ", MD" would be handled in case 3 end of quoted text.
lineToWrite.Append(origDelim + s);
break;
case 3:
//End quoted text
//If we are here the origDelim value was found between the quotes
//and we are at the end of the quoted text
//include origDelim so there is no data loss.
//Example quoted text: "Dr. Mario, MD"
// ", MD" would be handled here.
lineToWrite.Append(origDelim + s.Replace("\"", ""));
inQuote = 0;
break;
default:
lineToWrite.Append(lineToWrite.Length > 0 ? newDelim + s : s);
break;
}
}
if (lineToWrite.Length > 0)
{
retStr = lineToWrite.ToString();
retArr = retStr.Split(new[] {newDelim}, StringSplitOptions.None);
}
}
}

C# string.split() separate string by uppercase

I've been using the Split() method to split strings, but it only works if you pass string.Split() a specific character to split on. Is there any way to split a string wherever an uppercase letter appears?
Is it possible to get separate words from a string that has no separators, like:
DeleteSensorFromTemplate
And the result string is to be like:
Delete Sensor From Template
Use Regex.Split:
string[] split = Regex.Split(str, @"(?<!^)(?=[A-Z])");
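Both parts of that pattern are zero-width assertions, so Regex.Split cuts the string at those positions without removing any characters: (?=[A-Z]) matches just before an uppercase ASCII letter, and (?<!^) rules out position 0 so the result does not start with an empty string. A small wrapper (the class and method names are illustrative):

```csharp
using System.Text.RegularExpressions;

public static class CaseSplitter
{
    // (?<!^)    : not at position 0, so the result has no leading empty string
    // (?=[A-Z]) : the position just before an uppercase ASCII letter
    // Both are zero-width, so Regex.Split removes no characters.
    public static string[] SplitOnUppercase(string input) =>
        Regex.Split(input, @"(?<!^)(?=[A-Z])");
}
```

Consecutive capitals are split one by one, so "HTMLParser" gives H, T, M, L, Parser.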
Another way with regex:
public static string SplitCamelCase(string input)
{
return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim();
}
If you do not like RegEx and you really just want to insert the missing spaces, this will do the job too:
public static string InsertSpaceBeforeUpperCase(this string str)
{
var sb = new StringBuilder();
char previousChar = char.MinValue; // Unicode '\0'
foreach (char c in str)
{
if (char.IsUpper(c))
{
// If not the first character and previous character is not a space, insert a space before uppercase
if (sb.Length != 0 && previousChar != ' ')
{
sb.Append(' ');
}
}
sb.Append(c);
previousChar = c;
}
return sb.ToString();
}
I had some fun with this one and came up with a function that splits by case, as well as groups together caps (it assumes title case for whatever follows) and digits.
Examples:
Input -> "TodayIUpdated32UPCCodes"
Output -> "Today I Updated 32 UPC Codes"
Code (please excuse the funky symbols I use)...
public static string[] SplitByCase(this string s) {
var ʀ = new List<string>();
var ᴛ = new StringBuilder();
var previous = SplitByCaseModes.None;
foreach(var ɪ in s) {
SplitByCaseModes mode_ɪ;
if(string.IsNullOrWhiteSpace(ɪ.ToString())) {
mode_ɪ = SplitByCaseModes.WhiteSpace;
} else if("0123456789".Contains(ɪ)) {
mode_ɪ = SplitByCaseModes.Digit;
} else if(ɪ == ɪ.ToString().ToUpper()[0]) {
mode_ɪ = SplitByCaseModes.UpperCase;
} else {
mode_ɪ = SplitByCaseModes.LowerCase;
}
if((previous == SplitByCaseModes.None) || (previous == mode_ɪ)) {
ᴛ.Append(ɪ);
} else if((previous == SplitByCaseModes.UpperCase) && (mode_ɪ == SplitByCaseModes.LowerCase)) {
if(ᴛ.Length > 1) {
ʀ.Add(ᴛ.ToString().Substring(0, ᴛ.Length - 1));
ᴛ.Remove(0, ᴛ.Length - 1);
}
ᴛ.Append(ɪ);
} else {
ʀ.Add(ᴛ.ToString());
ᴛ.Clear();
ᴛ.Append(ɪ);
}
previous = mode_ɪ;
}
if(ᴛ.Length != 0) ʀ.Add(ᴛ.ToString());
return ʀ.ToArray();
}
private enum SplitByCaseModes { None, WhiteSpace, Digit, UpperCase, LowerCase }
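For comparison, roughly the same grouping (acronyms kept together, digits split off) can be sketched with a single Regex.Replace on zero-width boundaries. It covers ASCII letters and digits only, and it is an alternative technique rather than a port of the SplitByCase code above; CaseSpacer is an illustrative name:

```csharp
using System.Text.RegularExpressions;

public static class CaseSpacer
{
    // Zero-width boundaries where a space is inserted.
    private static readonly Regex Boundary = new Regex(
        "(?<=[a-z0-9])(?=[A-Z])" +      // lowercase or digit, then uppercase: "TodayI" -> "Today I"
        "|(?<=[A-Z])(?=[A-Z][a-z])" +   // end of an acronym: "UPCCodes" -> "UPC Codes"
        "|(?<=[A-Za-z])(?=[0-9])");     // letter, then digit: "Updated32" -> "Updated 32"

    public static string SplitByCase(string input) => Boundary.Replace(input, " ");
}
```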
Here's another way if you don't want to use a StringBuilder or Regex, which are totally acceptable answers; I just want to offer a different solution:
string Split(string input)
{
string result = "";
for (int i = 0; i < input.Length; i++)
{
if (char.IsUpper(input[i]))
{
result += ' ';
}
result += input[i];
}
return result.Trim();
}

Splitting a string array

I have a string array string[] arr, which contains values like N36102W114383, N36102W114382, etc.
I want to split each string so that the value comes out in two parts, like N36082 and W115080.
What is the best way to do this?
This should work for you.
Regex regexObj = new Regex(@"\w\d+"); // matches a word character followed by a sequence of digits
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
Console.WriteLine(matchResults.Value); // two matches: N36102 and W114383
matchResults = matchResults.NextMatch();
}
If the format is the same every time, you can just do this:
string[] split_data = data_string.Insert(data_string.IndexOf("W"), ",")
.Split(",", StringSplitOptions.None);
Here you insert a recognizable delimiter into your string and then split it by this delimiter.
Forgive me if this doesn't quite compile, but I'd just break down and write the string processing function by hand:
public static IEnumerable<string> Split(string str)
{
char [] chars = str.ToCharArray();
int last = 0;
for(int i = 1; i < chars.Length; i++) {
if(char.IsLetter(chars[i])) {
yield return new string(chars, last, i - last);
last = i;
}
}
yield return new string(chars, last, chars.Length - last);
}
If you use C#, please try:
String[] code = new Regex("(?:([A-Z][0-9]+))").Split(text).Where(e => e.Length > 0 && e != ",").ToArray();
In case you're only looking for the format NxxxxxWxxxxx, this will do just fine:
Regex r = new Regex(@"(N[0-9]+)(W[0-9]+)");
Match mc = r.Match(arr[i]);
string N = mc.Groups[1].Value;
string W = mc.Groups[2].Value;
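Applied to the whole array, a sketch might look like this. It assumes every element is exactly two letter-plus-digits parts (as in N36102W114383) and throws on anything else rather than silently skipping it; the class, method and tuple names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CoordinateSplitter
{
    // One letter + digits, twice, and nothing else, e.g. "N36102W114383".
    private static readonly Regex Pair = new Regex(@"^([A-Z][0-9]+)([A-Z][0-9]+)$");

    public static List<(string First, string Second)> SplitAll(IEnumerable<string> values)
    {
        var result = new List<(string First, string Second)>();
        foreach (string value in values)
        {
            Match m = Pair.Match(value.Trim());
            if (!m.Success)
                throw new FormatException($"Unexpected value: '{value}'");
            result.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }
}
```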
Using string.Split and char.IsLetter, this is relatively easy in C#.
Don't forget to write unit tests - the following may have some corner case errors!
// input has form "N36102W114383, N36102W114382"
// output: "N36102", "W114383", "N36102", "W114382", ...
string[] ParseSequenceString(string input)
{
string[] inputStrings = input.Split(',');
List<string> outputStrings = new List<string>();
foreach (string value in inputStrings) {
// Trim removes the space after each comma
List<string> valuesInString = ParseValuesInString(value.Trim());
outputStrings.AddRange(valuesInString);
}
return outputStrings.ToArray();
}
// input has form "N36102W114383"
// output: "N36102", "W114383"
List<string> ParseValuesInString(string inputString)
{
List<string> outputValues = new List<string>();
string currentValue = string.Empty;
foreach (char c in inputString)
{
// a letter starts a new value, so store the value collected so far
if (char.IsLetter(c) && currentValue.Length > 0)
{
outputValues.Add(currentValue);
currentValue = string.Empty;
}
currentValue += c;
}
if (currentValue.Length > 0)
outputValues.Add(currentValue);
return outputValues;
}

Is there a better way to implement Shift+Tab or Decrease Indent?

This is how I implemented Shift+Tab (decrease indent):
if ((Keyboard.Modifiers & ModifierKeys.Shift) == ModifierKeys.Shift && e.Key == Key.Tab)
{
// Shift+Tab
int selStart = txtEditor.SelectionStart;
int selLength = txtEditor.SelectionLength;
string selText = txtEditor.SelectedText;
string text = txtEditor.Text;
// find new lines that are followed by 1 or more spaces
Regex regex = new Regex(Environment.NewLine + @"(\s+)");
Match m = regex.Match(selText);
string spaces;
while (m.Success)
{
GroupCollection grps = m.Groups;
spaces = grps[1].Value;
int i = 0;
// remove 1 space on each loop to a max of 4 spaces
while (i < 4 && spaces.Length > 0)
{
spaces = spaces.Remove(0, 1);
i++;
}
// update spaces in selText
selText = selText.Remove(grps[1].Index, grps[1].Length).Insert(grps[1].Index, spaces);
m = regex.Match(selText, grps[1].Index + spaces.Length);
}
// commit changes to selText to text
text = text.Remove(selStart, selLength).Insert(selStart, selText);
// decrease indent of 1st line
// - find 1st character of selection
regex = new Regex(@"\w");
m = regex.Match(text, selStart);
int start = selStart;
if (m.Success) {
start = m.Index;
}
// - start search for spaces
regex = new Regex(Environment.NewLine + @"(\s+)", RegexOptions.RightToLeft);
m = regex.Match(text, start);
if (m.Success) {
spaces = m.Groups[1].Value;
int i = 0;
while (i < 4 && spaces.Length > 0) {
spaces = spaces.Remove(0, 1); // remove 1 space
i++;
}
text = text.Remove(m.Groups[1].Index, m.Groups[1].Length).Insert(m.Groups[1].Index, spaces);
selStart = m.Groups[1].Index;
}
txtEditor.Text = text;
txtEditor.SelectionStart = selStart;
txtEditor.SelectionLength = selText.Length;
e.Handled = true;
}
The code looks messy, and I wonder if there's a better way.
Personally, I wouldn't use Regex for this.
Untested, probably needs modification:
public static class StringExtensions
{
// Removes leading white-spaces in a string up to a maximum
// of 'level' characters
public static string ReduceIndent(this string line, int level)
{
// Produces an IEnumerable<char> with the characters
// of the string verbatim, other than leading white-spaces
var unindentedChars = line.SkipWhile((c, index) => char.IsWhiteSpace(c) && index < level);
return new string(unindentedChars.ToArray());
}
// Applies a transformation to each line of a string and returns the
// transformed string
public static string LineTransform(this string text, Func<string,string> transform)
{
//Splits the string into an array of lines
var lines = text.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
//Applies the transformation to each line
var transformedLines = lines.Select(transform);
//Joins the transformed lines into a new string
return string.Join(Environment.NewLine, transformedLines.ToArray());
}
}
...
if ((Keyboard.Modifiers & ModifierKeys.Shift) == ModifierKeys.Shift && e.Key == Key.Tab)
{
// Reduces the indent level of the selected text by applying the
// 'ReduceIndent' transformation to each line of the text.
string replacement = txtEditor.SelectedText
.LineTransform(line => line.ReduceIndent(4));
int selStart = txtEditor.SelectionStart;
int selLength = txtEditor.SelectionLength;
txtEditor.Text = txtEditor.Text
.Remove(selStart, selLength)
.Insert(selStart, replacement);
txtEditor.SelectionStart = selStart;
txtEditor.SelectionLength = replacement.Length;
e.Handled = true;
}
EDIT:
Added comments to the code as per the request of the OP.
For more info:
Extension Methods
Func<T, TResult> delegate
Enumerable.SkipWhile extension method
Lambda Expressions
I'm just thinking out loud, as I have never implemented a text editor.
What if you represent each line by an object with an indentation property that is reflected in the rendering of the line? Then it would be easy to increase and decrease the indent.
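A minimal sketch of that idea, assuming a hypothetical EditorLine type that is not part of WPF's TextBox: each line keeps an indent level separate from its text, Shift+Tab just decrements the level (never below zero), and spaces are produced only when the line is rendered:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model: the indent level is stored separately from the line's text.
public sealed class EditorLine
{
    public int IndentLevel { get; private set; }
    public string Content { get; }

    public EditorLine(string content, int indentLevel = 0)
    {
        Content = content;
        IndentLevel = Math.Max(0, indentLevel);
    }

    // Tab
    public void Indent() => IndentLevel++;

    // Shift+Tab: never goes below zero.
    public void Outdent() => IndentLevel = Math.Max(0, IndentLevel - 1);

    // Rendering turns the level into spaces (4 per level by default).
    public string Render(int spacesPerLevel = 4) =>
        new string(' ', IndentLevel * spacesPerLevel) + Content;
}

public static class EditorLines
{
    // Outdents every selected line and returns the rendered text.
    public static string OutdentAll(IList<EditorLine> lines, string newLine = "\n")
    {
        foreach (var line in lines) line.Outdent();
        return string.Join(newLine, lines.Select(l => l.Render()));
    }
}
```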
