I want to replace the delimiter comma with tabs in a CSV file
Input
Output
Note that commas inside fields enclosed by quotes should not be replaced. Also, the double quotes should be omitted from the output.
I tried the following, but the code also replaces commas inside quoted fields:
public void Replace_comma_with_tabs(string path)
{
var file = File
.ReadLines(path)
.SkipWhile(line => string.IsNullOrWhiteSpace(line)) // To be on the safe side
.Select((line, index) => line.Replace(',', '\t')) // replace ',' with '\t'
.ToList(); // Materialization, since we write into the same file
File.WriteAllLines(path, file);
}
How can I skip commas for the words enclosed by quotes?
Here is one way of doing it. It uses a flag, quotesStarted, to decide whether a comma should be treated as a delimiter or as part of the text in a column. I also used StringBuilder, since it performs well for repeated string concatenation. The code reads the lines and, for each line, iterates through its characters and checks for those with special meaning (double quote, comma, tab). Commas between double quotes are kept, the double quotes themselves are copied to the output unchanged, and a tab in the input raises an exception:
static void Main(string[] args)
{
var path = "data.txt";
var file = File.ReadLines(path).ToArray();
StringBuilder sbFile = new StringBuilder();
foreach (string line in file)
{
if (String.IsNullOrWhiteSpace(line) == false)
{
bool quotesStarted = false;
StringBuilder sbLine = new StringBuilder();
foreach (char currentChar in line)
{
if (currentChar == '"')
{
quotesStarted = !quotesStarted;
sbLine.Append(currentChar);
}
else if (currentChar == ',')
{
if (quotesStarted)
sbLine.Append(currentChar);
else
sbLine.Append("\t");
}
else if (currentChar == '\t')
throw new Exception("Tab found");
else
sbLine.Append(currentChar);
}
sbFile.AppendLine(sbLine.ToString());
}
}
File.WriteAllText("Result-" + path, sbFile.ToString());
}
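The question also asks for the double quotes to be left out of the output, which the loop above does not do. A minimal sketch of that variation follows. The class and method names are hypothetical, and it assumes that a quoted field never spans lines and that a doubled quote inside a quoted field stands for one literal quote. Unlike the code above, it passes existing tabs through instead of throwing.

```csharp
using System;
using System.Text;

public static class CsvToTsvSketch
{
    // Hypothetical helper: converts one CSV line to a tab-separated line
    // and drops the quotes that enclose a field.
    public static string ConvertLine(string line)
    {
        var sb = new StringBuilder(line.Length);
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '"')
            {
                // Assumption: "" inside a quoted field is an escaped literal quote.
                if (inQuotes && i + 1 < line.Length && line[i + 1] == '"')
                {
                    sb.Append('"');
                    i++; // skip the second quote of the pair
                }
                else
                {
                    inQuotes = !inQuotes; // enclosing quote: toggle and do not copy it
                }
            }
            else if (c == ',' && !inQuotes)
            {
                sb.Append('\t'); // delimiter comma
            }
            else
            {
                sb.Append(c); // ordinary character, or a comma inside quotes
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ConvertLine("1,\"Smith, John\",\"say \"\"hi\"\"\""));
    }
}
```

To apply it to the file, ConvertLine would take the place of line.Replace(',', '\t') inside the Select call from the question.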
There are many ways to do this, but here's one. It only includes the code to transform a string of comma-delimited text that may contain quoted fields. You'd use "ToTabs" instead of "Replace" inside your Select statement. You'll have to harden it with some error checking.
This will handle escaped quotes inside of quoted fields and it transforms existing tabs to spaces, but it's not a full-blown CSV parser.
static class CsvHelper
{
public static string ToTabs(this string source)
{
Func<char,char> getState = NotInQuotes;
char last = ' ';
char InQuotes(char ch)
{
if ('"' == ch && last != '"')
getState = NotInQuotes;
else if ('\t' == ch)
ch = ' ';
last = ch;
return ch;
}
char NotInQuotes(char ch)
{
last = ch;
if ('"' == ch)
getState = InQuotes;
else if (',' == ch)
return '\t';
else if ('\t' == ch)
ch = ' ';
return ch;
}
return string.Create(source.Length, getState, (buffer,_) =>
{
for (int i = 0; i < source.Length; ++i)
{
buffer[i] = getState(source[i]);
}
});
}
}
static void Main(string[] _)
{
const string Source = "a,string,with,commas,\"field,with,\"\"commas\", and, another";
var withTabs = Source.ToTabs();
Console.WriteLine(Source);
Console.WriteLine(withTabs);
}
To change commas in a string to tabs, use the Replace method.
Example:
string str = "Lucy, John, Mark, Grace";
string str2 = str.Replace(",", "\t");
Related
How to split text into words?
Example text:
'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'
The words in that line are:
Oh
you
can't
help
that
said
the
Cat
we're
all
mad
here
I'm
mad
You're
mad
Split text on whitespace, then trim punctuation.
var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
var punctuation = text.Where(Char.IsPunctuation).Distinct().ToArray();
var words = text.Split().Select(x => x.Trim(punctuation));
Agrees exactly with example.
First, remove all special characters:
var fixedInput = Regex.Replace(input, "[^a-zA-Z0-9% ._]", string.Empty);
// This regex doesn't support apostrophe so the extension method is better
Then split it:
var split = fixedInput.Split(' ');
For a simpler C# solution for removing special characters (one that you can easily change), add this extension method (I added support for an apostrophe):
public static string RemoveSpecialCharacters(this string str) {
var sb = new StringBuilder();
foreach (char c in str) {
if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '\'' || c == ' ') {
sb.Append(c);
}
}
return sb.ToString();
}
Then use it like so:
var words = input.RemoveSpecialCharacters().Split(' ');
You'll be surprised to know that this extension method is very efficient (surely much more efficient than the Regex), so I suggest you use it ;)
Update
I agree that this is an English only approach but to make it Unicode compatible all you have to do is replace:
(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
With:
char.IsLetter(c)
which supports Unicode. .NET also offers char.IsSymbol and char.IsLetterOrDigit for other cases.
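As a rough illustration of that change, here is a sketch of the extension method rewritten with char.IsLetterOrDigit, which covers both the letter and the digit ranges of the original condition. The class and method names are made up for this example.

```csharp
using System;
using System.Text;

public static class UnicodeCleanupSketch
{
    // Keeps letters and digits from any script, plus apostrophes and spaces.
    public static string RemoveSpecialCharactersUnicode(this string str)
    {
        var sb = new StringBuilder(str.Length);
        foreach (char c in str)
        {
            if (char.IsLetterOrDigit(c) || c == '\'' || c == ' ')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // \u00EB, \u00E9 and \u00E0 are accented letters (precomposed forms).
        string[] words = "Zo\u00EB said: d\u00E9j\u00E0 vu!"
            .RemoveSpecialCharactersUnicode()
            .Split(' ');
        Console.WriteLine(string.Join("|", words));
    }
}
```

Note that char.IsLetter and char.IsLetterOrDigit work on single UTF-16 code units, so letters written as a base character plus a separate combining mark would lose the mark with this approach.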
Just to add a variation on @Adam Fridental's answer, which is very good, you could try this Regex:
var text = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
var matches = Regex.Matches(text, @"\w+[^\s]*\w+|\w");
foreach (Match match in matches) {
var word = match.Value;
}
I believe this is the shortest RegEx that will get all the words
\w+[^\s]*\w+|\w
If you don't want to use a Regex object, you could do something like...
string mystring="Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.";
List<string> words=mystring.Replace(",","").Replace(":","").Replace(".","").Split(" ").ToList();
You'll still have to handle the trailing apostrophe at the end of "that,'"
This is one solution; I don't use any helper classes or library methods.
public static List<string> ExtractChars(string inputString) {
var result = new List<string>();
int startIndex = -1;
for (int i = 0; i < inputString.Length; i++) {
var character = inputString[i];
if ((character >= 'a' && character <= 'z') ||
(character >= 'A' && character <= 'Z')) {
if (startIndex == -1) {
startIndex = i;
}
if (i == inputString.Length - 1) {
result.Add(GetString(inputString, startIndex, i));
}
continue;
}
if (startIndex != -1) {
result.Add(GetString(inputString, startIndex, i - 1));
startIndex = -1;
}
}
return result;
}
public static string GetString(string inputString, int startIndex, int endIndex) {
string result = "";
for (int i = startIndex; i <= endIndex; i++) {
result += inputString[i];
}
return result;
}
If you want to use a loop to check each character and keep all the punctuation from the input string, I've created this class. The GetSplitSentence() method returns a list of SentenceSplitResult. The list holds all the words as well as every other character (punctuation, digits, and spaces), each stored as its own item. Use sentenceSplitResult.isAWord to check whether an item is a word.
public class SentenceSplitResult
{
public string word;
public bool isAWord;
}
public class StringsHelper
{
private readonly List<SentenceSplitResult> outputList = new List<SentenceSplitResult>();
private readonly string input;
public StringsHelper(string input)
{
this.input = input;
}
public List<SentenceSplitResult> GetSplitSentence()
{
StringBuilder sb = new StringBuilder();
try
{
if (String.IsNullOrEmpty(input)) {
Logger.Log(new ArgumentNullException(), "GetSplitSentence - input is null or empty");
return outputList;
}
bool isAletter = IsAValidLetter(input[0]);
// Each char is checked to see whether it is part of a word.
// If YES > store the char for later
// If NO > save the pending word (if any) and then save the non-word char
foreach (var _char in input)
{
isAletter = IsAValidLetter(_char);
if (isAletter == true)
{
sb.Append(_char);
}
else
{
SaveWord(sb.ToString());
sb.Clear();
SaveANotWord(_char);
}
}
SaveWord(sb.ToString());
}
catch (Exception ex)
{
Logger.Log(ex);
}
return outputList;
}
private static bool IsAValidLetter(char _char)
{
if ((Char.IsPunctuation(_char) == true) || (_char == ' ') || (Char.IsNumber(_char) == true))
{
return false;
}
return true;
}
private void SaveWord(string word)
{
if (String.IsNullOrEmpty(word) == false)
{
outputList.Add(new SentenceSplitResult()
{
isAWord = true,
word = word
});
}
}
private void SaveANotWord(char _char)
{
outputList.Add(new SentenceSplitResult()
{
isAWord = false,
word = _char.ToString()
});
}
}
You could try using a regex to remove the apostrophes that aren't surrounded by letters (i.e. single quotes) and then using the Char static methods to strip all the other characters. By calling the regex first you can keep the contraction apostrophes (e.g. can't) but remove the single quotes like in 'Oh.
string myText = "'Oh, you can't help that,' said the Cat: 'we're all mad here. I'm mad. You're mad.'";
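// Match a quote that is not surrounded by word characters on both sides;
// a plain "\b" in a regular C# string would be a backspace, so use a verbatim string
Regex reg = new Regex(@"(?<!\w)['""]|['""](?!\w)");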
myText = reg.Replace(myText, "");
string[] listOfWords = RemoveCharacters(myText);
public string[] RemoveCharacters(string input)
{
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
if (Char.IsLetter(c) || Char.IsWhiteSpace(c) || c == '\'')
sb.Append(c);
}
return sb.ToString().Split(' ');
}
I have a string that represents an action.
Each argument in the action is separated by the character ';'.
For each argument I want to replace ',' with '.', but only if the ',' is not between ' characters, using Regex replace.
For example:
1- "ActionName('1,b';1,2)"
2- "ActionName('a,b';1,2;1.2;'1,3')"
Desired result:
1- "ActionName('1,b';1.2)"
2- "ActionName('a,b';1.2;1.2;'1,3')"
Conditions:
The ',' can appear multiple times inside a string.
Currently I split the string on ';', loop over all the parts, and split each part on '\''.
Example Code:
public string Transform(string expression)
{
string newExpression = string.Empty;
string[] expParts = expression.Split(';');
for (int i = 0; i < expParts.Length; i++)
{
string newSubExpression = string.Empty;
string[] subExpParts = expParts[i].Split(new char[] { '\'' });
for (int subIndex = 0; subIndex < subExpParts.Length; subIndex += 2)
{
newSubExpression += subExpParts[subIndex].Replace(',', '.');
if (subIndex < subExpParts.Length - 1)
newSubExpression += "\'" + subExpParts[subIndex + 1] + "\'";
}
newExpression += newSubExpression;
if (i < expParts.Length - 1)
newExpression = newExpression + ";"; // rejoin with the separator the string was split on
}
return newExpression;
}
You can use (?<=^([^']|'[^']*')*),
var myPattern= "(?<=^([^']|'[^']*')*),";
var regex = new Regex(myPattern);
var result = regex.Replace("ActionName('a,b';1,2;1.2;'1,3')", ".");
Output
ActionName('a,b';1.2;1.2;'1,3')
Since you have tagged the question with regex, here is a regex that works for your input (at least for what you posted):
(,(?![\w\d]*'))
Just an example, I think that it can be useful for you as a starting point...
You need to replace each match with a '.'. In C# you can do it like this:
result = Regex.Replace(input, @"(,(?![\w\d]*'))", @".");
Take a look at regex lookaround documentation for more information.
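To make the "works for your input" caveat concrete, here is a small sketch that runs the lookahead pattern on the sample from the question and on a made-up input with a space after the quoted comma. The class and method names are hypothetical. The negative lookahead only skips a comma when nothing but word characters sits between it and the next quote, so the second input is changed inside the quotes as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSketch
{
    // Replaces a comma unless it is followed by word characters and then a single quote.
    public static string ReplaceCommas(string input)
    {
        return Regex.Replace(input, @",(?![\w\d]*')", ".");
    }

    public static void Main()
    {
        // Sample from the question: quoted commas are left alone.
        Console.WriteLine(ReplaceCommas("ActionName('a,b';1,2;1.2;'1,3')"));
        // prints ActionName('a,b';1.2;1.2;'1,3')

        // Made-up input: the space after the quoted comma defeats the lookahead.
        Console.WriteLine(ReplaceCommas("ActionName('a, b';1,2)"));
        // prints ActionName('a. b';1.2)
    }
}
```

If quoted text can contain spaces or punctuation, an approach that tracks whether the scan is inside quotes is likely a safer starting point.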
A simple FSM (finite state machine) will do. Note that we have just two states (encoded with inQuotation): we are either within a quoted chunk or not.
public static string Transform(string expression) {
if (string.IsNullOrEmpty(expression))
return expression; // Or throw ArgumentNullException
StringBuilder sb = new StringBuilder(expression.Length);
bool inQuotation = false;
foreach (char c in expression)
if (c == ',' && !inQuotation)
sb.Append('.');
else {
if (c == '\'')
inQuotation = !inQuotation;
sb.Append(c);
}
return sb.ToString();
}
Tests:
string[] tests = new string[] {
"ActionName('1,b';1,2)",
"ActionName('a,b';1,2;1.2;'1,3')",
};
var result = tests
.Select((line, index) => $"{index + 1}- {Transform(line)}");
Console.WriteLine(string.Join(Environment.NewLine, result));
Outcome:
1- ActionName('1,b';1.2)
2- ActionName('a,b';1.2;1.2;'1,3')
So I have a string that I need to split by semicolons
Email address: "one@tw;,.'o"@hotmail.com;"some;thing"@example.com
Both of the email addresses are valid
So I want to have a List<string> of the following:
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
But the way I am currently splitting the addresses is not working:
var addresses = emailAddressString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim()).ToList();
Because of the multiple ; characters I end up with invalid email addresses.
I have tried a few different ways, even going down working out if the string contains quotes and then finding the index of the ; characters and working it out that way, but it's a real pain.
Does anyone have any better suggestions?
Assuming that double-quotes are not allowed, except for the opening and closing quotes ahead of the "at" sign @, you can use this regular expression to capture e-mail addresses:
((?:[^@"]+|"[^"]*")@[^;]+)(?:;|$)
The idea is to capture either an unquoted [^@"]+ or a quoted "[^"]*" part prior to @, and then capture everything up to semicolon ; or the end anchor $.
var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world";
var mm = Regex.Matches(input, "((?:[^@\"]+|\"[^\"]*\")@[^;]+)(?:;|$)");
foreach (Match m in mm) {
Console.WriteLine(m.Groups[1].Value);
}
This code prints
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
If you would like to allow escaped double-quotes inside double-quotes, you could use a more complex expression:
((?:(?:[^@\"]|(?<=\\)\")+|\"([^\"]|(?<=\\)\")*\")@[^;]+)(?:;|$)
Everything else remains the same.
I obviously started writing my anti regex method at around the same time as juharr (Another answer). I thought that since I already have it written I would submit it.
public static IEnumerable<string> SplitEmailsByDelimiter(string input, char delimiter)
{
var startIndex = 0;
var delimiterIndex = 0;
while (delimiterIndex >= 0)
{
delimiterIndex = input.IndexOf(delimiter, startIndex); // use the parameter, not a hard-coded ';'
string substring = input;
if (delimiterIndex > 0)
{
substring = input.Substring(0, delimiterIndex);
}
if (!substring.Contains("\"") || substring.IndexOf("\"") != substring.LastIndexOf("\""))
{
yield return substring;
input = input.Substring(delimiterIndex + 1);
startIndex = 0;
}
else
{
startIndex = delimiterIndex + 1;
}
}
}
Then the following
var input = "blah@blah.com;\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;asdasd@asd.co.uk;";
foreach (var email in SplitEmailsByDelimiter(input, ';'))
{
Console.WriteLine(email);
}
Would give this output
blah@blah.com
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
asdasd@asd.co.uk
You can also do this without using regular expressions. The following extension method will allow you to specify a delimiter character and a character to begin and end escape sequences. Note it does not validate that all escape sequences are closed.
public static IEnumerable<string> SpecialSplit(
this string str, char delimiter, char beginEndEscape)
{
int beginIndex = 0;
int length = 0;
bool escaped = false;
foreach (char c in str)
{
if (c == beginEndEscape)
{
escaped = !escaped;
}
if (!escaped && c == delimiter)
{
yield return str.Substring(beginIndex, length);
beginIndex += length + 1;
length = 0;
continue;
}
length++;
}
yield return str.Substring(beginIndex, length);
}
Then the following
var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;\"D;D@blah;blah.com\"";
foreach (var address in input.SpecialSplit(';', '"'))
Console.WriteLine(address);
Will give this output
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
"D;D@blah;blah.com"
Here's a version that also supports a single escape character. Two consecutive escape characters become one literal escape character. An escaped beginEndEscape character does not start or end an escape sequence, and an escaped delimiter does not split the string. Any other character that follows the escape character is kept as is, with the escape character removed.
public static IEnumerable<string> SpecialSplit(
this string str, char delimiter, char beginEndEscape, char singleEscape)
{
StringBuilder builder = new StringBuilder();
bool escapedSequence = false;
bool previousEscapeChar = false;
foreach (char c in str)
{
if (c == singleEscape && !previousEscapeChar)
{
previousEscapeChar = true;
continue;
}
if (c == beginEndEscape && !previousEscapeChar)
{
escapedSequence = !escapedSequence;
}
if (!escapedSequence && !previousEscapeChar && c == delimiter)
{
yield return builder.ToString();
builder.Clear();
continue;
}
builder.Append(c);
previousEscapeChar = false;
}
yield return builder.ToString();
}
Finally you probably should add null checking for the string that is passed in and note that both will return a sequence with one empty string if you pass in an empty string.
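One detail worth keeping in mind when adding that null check: a method that uses yield return does not run any of its body until the sequence is enumerated, so a check placed inside it would fire late. A minimal sketch of one way around this, assuming a hypothetical SpecialSplitChecked wrapper around a private iterator that follows the same toggling logic as the first version above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SpecialSplitSketch
{
    // Non-iterator wrapper: the argument check runs as soon as the method is called.
    public static IEnumerable<string> SpecialSplitChecked(
        this string str, char delimiter, char beginEndEscape)
    {
        if (str == null)
        {
            throw new ArgumentNullException(nameof(str));
        }
        return SplitIterator(str, delimiter, beginEndEscape);
    }

    // Iterator part: splits on the delimiter unless inside an escaped sequence.
    private static IEnumerable<string> SplitIterator(
        string str, char delimiter, char beginEndEscape)
    {
        int beginIndex = 0;
        bool escaped = false;
        for (int i = 0; i < str.Length; i++)
        {
            if (str[i] == beginEndEscape)
            {
                escaped = !escaped;
            }
            else if (!escaped && str[i] == delimiter)
            {
                yield return str.Substring(beginIndex, i - beginIndex);
                beginIndex = i + 1;
            }
        }
        yield return str.Substring(beginIndex);
    }

    public static void Main()
    {
        foreach (var part in "\"a;b\"@x.com;c@y.com".SpecialSplitChecked(';', '"'))
        {
            Console.WriteLine(part);
        }
        // An empty input still yields one (empty) element.
        Console.WriteLine("".SpecialSplitChecked(';', '"').Count());
    }
}
```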
I know how to get substrings from a string which are comma separated, but here's a complication: what if a substring contains a comma?
If a substring contains a comma, a new line, or double quotes, the entire substring is enclosed in double quotes.
If a substring contains a double quote, the double quote is escaped with another double quote.
The worst-case scenario would be something like this:
first,"second, second","""third"" third","""fourth"", fourth"
In this case substrings are:
first
second, second
"third" third
"fourth", fourth
second, second is enclosed in double quotes; I don't want those double quotes in the list/array.
"third" third is enclosed in double quotes because it contains double quotes, and those are escaped with additional double quotes. Again, I don't want the enclosing double quotes in the list/array, nor the double quotes that escape other double quotes, but I do want the original double quotes that are part of the substring.
One way using TextFieldParser:
using (var reader = new StringReader("first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\""))
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader))
{
parser.Delimiters = new[] { "," };
parser.HasFieldsEnclosedInQuotes = true;
while (!parser.EndOfData)
{
foreach (var field in parser.ReadFields())
Console.WriteLine(field);
}
}
For
first
second, second
"third" third
"fourth", fourth
Try this
string input = "first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\"";
string[] output = input.Split(new string[] {"\",\""}, StringSplitOptions.RemoveEmptyEntries);
I would suggest constructing a small state machine for this problem. You would have states like:
Out - before the first field is reached
InQuoted - you were Out and " arrived; now you're in and the field is quoted
InQuotedMaybeOut - you were InQuoted and " arrived; now you wait for the next character to figure whether it is another " or something else; if else, then select the next valid state (character could be space, new line, comma, so you decide the next state); otherwise, if " arrived, you push " to the output and step back to InQuoted
In - after Out, when any character has arrived except , and ", you are automatically inside a new field which is not quoted.
This will certainly read CSV correctly. You can also make the separator configurable, so that you support TSV or semicolon-separated format.
Also keep in mind one very important case in the CSV format: a quoted field may contain a new line! Another special case to keep an eye on is the empty field (like: ,,).
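The four states listed above can be sketched roughly as follows. This is only one possible reading of that description, not a complete CSV reader: the names are hypothetical, it parses a single record, it treats a newline outside quotes as ordinary text, and it is lenient about text that follows a closing quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CsvStateMachineSketch
{
    // The four states from the description above.
    private enum State { Out, In, InQuoted, InQuotedMaybeOut }

    public static List<string> ParseRecord(string input, char separator = ',')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        var state = State.Out;

        // Stores the field collected so far and returns to the Out state.
        void Emit()
        {
            fields.Add(current.ToString());
            current.Clear();
            state = State.Out;
        }

        foreach (char c in input)
        {
            switch (state)
            {
                case State.Out:
                    if (c == '"') state = State.InQuoted;
                    else if (c == separator) Emit(); // empty field, as in ",,"
                    else { current.Append(c); state = State.In; }
                    break;

                case State.In:
                    if (c == separator) Emit();
                    else current.Append(c);
                    break;

                case State.InQuoted:
                    if (c == '"') state = State.InQuotedMaybeOut;
                    else current.Append(c); // separators and newlines are data here
                    break;

                case State.InQuotedMaybeOut:
                    if (c == '"') { current.Append('"'); state = State.InQuoted; }
                    else if (c == separator) Emit();
                    else { current.Append(c); state = State.In; } // lenient choice
                    break;
            }
        }

        if (state == State.InQuoted)
        {
            throw new FormatException("Unterminated quoted field.");
        }
        Emit(); // the last field has no trailing separator
        return fields;
    }

    public static void Main()
    {
        string line = "first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\"";
        foreach (string field in ParseRecord(line))
        {
            Console.WriteLine(field);
        }
    }
}
```

Because the separator is a parameter, the same sketch could be called with '\t' or ';'. With this design an empty input produces a single empty field.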
This is not the most elegant solution but it might help you. I would loop through the characters and do an odd-even count of the quotes. For example you have a bool that is true if you have encountered an odd number of quotes and false for an even number of quotes.
Any comma encountered while this bool value is true should not be considered as a separator. If you know it is a separator you can do several things with that information. Below I replaced the delimiter with something more manageable (not very efficient though):
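bool odd = false;
char replacementDelimiter = '|'; // Or some very unlikely character
char[] chars = str.ToCharArray(); // strings are immutable, so edit a copy
for (int i = 0; i < chars.Length; ++i)
{
if (chars[i] == '"')
odd = !odd; // toggle on every quote
else if (chars[i] == ',')
{
if (!odd)
chars[i] = replacementDelimiter; // comma outside quotes is a real separator
}
}
string[] commaSeparatedTokens = new string(chars).Split(replacementDelimiter);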
At this point you should have an array of strings that are separated on the commas that you have intended. From here on it will be simpler to handle the quotes.
I hope this can help you.
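For the remaining step of handling the quotes in each token, a small helper along these lines might be enough. It is a sketch with a hypothetical name, and it assumes a token is either fully enclosed in double quotes or not quoted at all.

```csharp
using System;

public static class UnquoteSketch
{
    // Removes one pair of enclosing double quotes, then turns each "" into ".
    public static string Unquote(string token)
    {
        if (token.Length >= 2 && token[0] == '"' && token[token.Length - 1] == '"')
        {
            token = token.Substring(1, token.Length - 2).Replace("\"\"", "\"");
        }
        return token;
    }

    public static void Main()
    {
        Console.WriteLine(Unquote("\"\"\"third\"\" third\"")); // prints "third" third
        Console.WriteLine(Unquote("first"));                   // prints first
    }
}
```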
Mini parser
using System;
using System.Collections.Generic;
using System.Text;
namespace ConsoleApp
{
class Program
{
private static IEnumerable<string> Parse(string input)
{
if (string.IsNullOrWhiteSpace(input))
{
// empty string => nothing to do
yield break;
}
int count = input.Length;
StringBuilder sb = new StringBuilder();
int j;
for (int i = 0; i < count; i++)
{
char c = input[i];
if (c == ',')
{
yield return sb.ToString();
sb.Clear();
}
else if (c == '"')
{
// begin quoted string
sb.Clear();
for (j = i + 1; j < count; j++)
{
if (input[j] == '"')
{
// quote
if (j < count - 1 && input[j + 1] == '"')
{
// double quote
sb.Append('"');
j++;
}
else
{
break;
}
}
else
{
sb.Append(input[j]);
}
}
yield return sb.ToString();
// clear buffer and skip to next comma
sb.Clear();
for (i = j + 1; i < count && input[i] != ','; i++) ;
}
else
{
sb.Append(c);
}
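}
// emit the final unquoted field, which has no trailing comma
if (sb.Length > 0)
{
yield return sb.ToString();
}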
}
[STAThread]
static void Main(string[] args)
{
foreach (string str in Parse("first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\""))
{
Console.WriteLine(str);
}
Console.WriteLine();
Console.WriteLine("Press any key to continue...");
Console.ReadKey();
}
}
}
Result
first
second, second
"third" third
"fourth", fourth
Thank you for your answers, but before I got to see them I wrote this solution, it's not pretty but it works for me.
string line = "first,\"second, second\",\"\"\"third\"\" third\",\"\"\"fourth\"\", fourth\"";
var substringArray = new List<string>();
string substring = null;
var doubleQuotesCount = 0;
for (var i = 0; i < line.Length; i++)
{
if (line[i] == ',' && (doubleQuotesCount % 2) == 0)
{
substringArray.Add(substring);
substring = null;
doubleQuotesCount = 0;
continue;
}
else
{
if (line[i] == '"')
doubleQuotesCount++;
substring += line[i];
//If it is a last character
if (i == line.Length - 1)
{
substringArray.Add(substring);
substring = null;
doubleQuotesCount = 0;
}
}
}
for(var i = 0; i < substringArray.Count; i++)
{
if (substringArray[i] != null)
{
//remove first double quote
if (substringArray[i][0] == '"')
{
substringArray[i] = substringArray[i].Substring(1);
}
//remove last double quote
if (substringArray[i][substringArray[i].Length - 1] == '"')
{
substringArray[i] = substringArray[i].Remove(substringArray[i].Length - 1);
}
//Replace double double quotes with single double quote
substringArray[i] = substringArray[i].Replace("\"\"", "\"");
}
}
I need to verify whether a string contains a + inside a pair of single quotes.
Example: string str = "'Name + R405'".
However, the string may contain more than one such quoted range.
Example: string str = "'Name + R405' + '(Name)'". In this case, the second + has a particular function in my code (it is outside the single quotes).
In other words, I need to identify only the + characters that are within single quotes. If there is another way to do this, please explain it to me.
Update:
The text within single quotes (the text I need) may itself contain other single quotes. Therefore, I cannot simply look for the beginning and end of a pair of single quotes.
Update 2:
I have a problem that might be a little complicated. My system has functions that take certain strings, and those strings are manipulated according to certain rules:
Text in single quotes is not altered or manipulated;
+ is used to separate one piece of text from another;
My string must accept any character (this is a problem, I know).
For example: "'Name' + On + 'Sector'". In strings like this, only the part "On" is manipulated by these methods. However, I also have strings like "'Name + Code' + On + 'Sector'" or "'Name'+Code '+ On +'Sector'". The "Name + Code"/"Name'+Code" part must not be manipulated. The methods get "confused" by this kind of text and act on the + and single quotes that are inside parts of the text that should not be changed. I cannot change the methods, so I must pre-process the string before passing it to them.
You can do this by iterating through the characters and keeping track of the single quotes you have seen.
public static bool HasPlusBetweenSingleQuotes(string str)
{
bool inSingleQuotes = false;
foreach (char c in str)
{
if (c == '\'')
{
inSingleQuotes = !inSingleQuotes;
}
else if (c == '+' && inSingleQuotes)
{
return true;
}
}
return false;
}
If you need the indexes of the plus signs within single quotes you can do the following.
public static IEnumerable<int> PlusBetweenSingleQuotesIndexes(string str)
{
bool inSingleQuotes = false;
for(int i=0;i<str.Length;i++)
{
if (str[i] == '\'')
{
inSingleQuotes = !inSingleQuotes;
}
else if (str[i] == '+' && inSingleQuotes)
{
yield return i;
}
}
}
Note that these methods do not verify that every opening single quote has a closing single quote.
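If that matters, a cheap first check is whether the string contains an even number of single quotes. The sketch below uses a hypothetical helper name. An even count does not prove that the quotes pair up the way the author intended, but an odd count is a reliable sign that the toggling logic above will end up in the wrong state.

```csharp
using System;
using System.Linq;

public static class QuoteBalanceSketch
{
    // True when the number of single quotes is even, so every toggle gets undone.
    public static bool HasEvenSingleQuoteCount(string str)
    {
        return str.Count(c => c == '\'') % 2 == 0;
    }

    public static void Main()
    {
        Console.WriteLine(HasEvenSingleQuoteCount("'Name + R405' + '(Name)'"));    // True (4 quotes)
        Console.WriteLine(HasEvenSingleQuoteCount("'Name'+Code '+ On +'Sector'")); // False (5 quotes)
    }
}
```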
EDIT
If quotes can be escaped, you just check whether the previous character is the escape character, such as \.
public static bool HasPlusBetweenSingleQuotes(string str)
{
bool inSingleQuotes = false;
char previous = ' '; // just defaulting to a space.
foreach (char c in str)
{
if (c == '\'' && previous != '\\')
{
inSingleQuotes = !inSingleQuotes;
}
else if (c == '+' && inSingleQuotes)
{
return true;
}
previous = c;
}
return false;
}
I'm not sure if this can be done with a regular expression (it might be possible). It would be easier to loop over the characters and track whether you are inside or outside of quotes.
bool inBlock = false;
foreach (var aChar in mySentence) {
// A single quote toggles the state; any other character leaves it unchanged
inBlock = (aChar == '\'') ? !inBlock : inBlock;
if (inBlock && aChar == '+') {
// do stuff here
}
}
As a note, the code might not work, I didn't test it.
Why not invert the logic here and use the "concatenation sequences" as the structure for the pattern? These can be described as a sequence of + or +On+ (with optional spaces) that are in between single quoted (possibly non-balanced) strings. Match the "glue" sequence bookended by a lookbehind for a ' and a lookahead for a ', and you can parse the string into "single quoted strings" and "glue" tokens:
var strings = new string[]
{"'Name'+Code '+ On +'Sector'",
"'Name + R405' + '(Name)'",
"'Name + Code' + On + 'Sector'",
"'Name''+'Sector'"
};
const string pattern = @"(?<=')(\s*\+\s*|\s*\+\s*On\s*\+\s*)(?=')";
foreach (string s in strings)
{
Console.WriteLine("input:"+s);
string[] tokens = Regex.Split(s, pattern);
foreach (string token in tokens)
{
Console.WriteLine("token:->{0}<-", token);
}
//tokens.Where((x, i) => i % 2 == 0) //single quoted strings
//tokens.Where((x, i) => i % 2 != 0) //glue sequences
}