Regex to match only \" not \\\" - c#

I have to be able to split readed line of code by File.ReadLines() by ';' when i got something like that in source code (two or more code lines in one line):
string firstString = "xyzxyz"; string secodnString = "zyxzyx";
the problem is that inside those strings can be another ; or even ", and then this line:
string firstString = "xyz;xyz\"inside quote\""; string secondString =
"zyx;zyx";
readed looks like this:
"string firstString = \"xyz;xyz\\\"inside quote\\\"\"; string secondString
= \"zyx;zyx\";
So I figured that I can determine if ';' is inside string due to difference in \" and \\" by Regex, but i cant figure aout how to match \" but not to match \\" i've tried:
"[^\\\\]\"" or "[^\\]\""
but it does not work. Thanks in andvace.
EDIT, my only problem is the regex, rest of it i got already writen like that:
List<string> vrlSplitedLine = vrlLines[i].Trim().Split(';').ToList();
List<string> vrlFinallSplitedLine = new List<string>();
string vrlReatachedString = string.Empty;
for(int j = 0; j < vrlSplitedLine.Count; j++)
{
if(Regex.Matches(vrlSplitedLine[j], "[^\\\\]\"").Count % 2 != 0)
{
vrlReatachedString = vrlSplitedLine[j];
int k = j;
do
{
k++;
vrlReatachedString = vrlReatachedString + ';' + vrlSplitedLine[k];
}
while (Regex.Matches(vrlSplitedLine[k], "[^\\\\]\"").Count % 2 == 0);
vrlFinallSplitedLine.Add(vrlReatachedString);
j = k;
}
else
{
vrlFinallSplitedLine.Add(vrlSplitedLine[j]);
}
}

Related

C#: Need to split a string into a string[] and keeping the delimiter (also a string) at the beginning of the string

I think I am too dumb to solve this problem...
I have some formulas which need to be "translated" from one syntax to another.
Let's say I have a formula that goes like that (it's a simple one, others have many "Ceilings" in it):
string formulaString = "If([Param1] = 0, 1, Ceiling([Param2] / 0.55) * [Param3])";
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
My attempt is to split the fomulaString at "Ceiling(" so I am able to iterate through the string array and insert my string at the correct index (counting every "(" and ")" to get the right index)
What I have so far:
//splits correct, but loses "CEILING("
string[] parts = formulaString.Split(new[] { "CEILING(" }, StringSplitOptions.None);
//splits almost correct, "CEILING(" is in another group
string[] parts = Regex.Split(formulaString, #"(CEILING\()");
//splits almost every letter
string[] parts = Regex.Split(formulaString, #"(?=[(CEILING\()])");
When everything is done, I concat the string so I have my complete formula again.
What do I have to set as Regex pattern to achieve this sample? (Or any other method that will help me)
part1 = "If([Param1] = 0, 1, ";
part2 = "Ceiling([Param2] / 0.55) * [Param3])";
//part3 = next "CEILING(" in a longer formula and so on...
As I mention in a comment, you almost got it: (?=Ceiling). This is incomplete for your use case unfortunately.
I need to replace "Ceiling()" with "Ceiling(; 1)" (basically, insert "; 1" before the ")").
Depending on your regex engine (for example JS) this works:
string[] parts = Regex.Split(formulaString, #"(?<=Ceiling\([^)]*(?=\)))");
string modifiedFormula = String.join("; 1", parts);
The regex
(?<=Ceiling\([^)]*(?=\)))
(?<= ) Positive lookbehind
Ceiling\( Search for literal "Ceiling("
[^)] Match any char which is not ")" ..
* .. 0 or more times
(?=\)) Positive lookahead for ")", effectively making us stop before the ")"
This regex is a zero-assertion, therefore nothing is lost and it will cut your strings before the last ")" in every "Ceiling()".
This solution would break whenever you have nested "Ceiling()". Then your only solution would be writing your own parser for the same reasons why you can't parse markup with regex.
Regex.Replace(formulaString, #"(?<=Ceiling\()(.*?)(?=\))","$1; 1");
Note: This will not work for nested "Ceilings", but it does for Ceiling(), It will also not work fir Ceiling(AnotherFunc(x)). For that you need something like:
Regex.Replace(formulaString, #"(?<=Ceiling\()((.*\((?>[^()]+|(?1))*\))*|[^\)]*)(\))","$1; 1$3");
but I could not get that to work with .NET, only in JavaScript.
This is my solution:
private string ConvertCeiling(string formula)
{
int ceilingsCount = formula.CountOccurences("Ceiling(");
int startIndex = 0;
int bracketCounter;
for (int i = 0; i < ceilingsCount; i++)
{
startIndex = formula.IndexOf("Ceiling(", startIndex);
bracketCounter = 0;
for (int j = 0; j < formula.Length; j++)
{
if (j < startIndex) continue;
var c = formula[j];
if (c == '(')
{
bracketCounter++;
}
if (c == ')')
{
bracketCounter--;
if (bracketCounter == 0)
{
// found end
formula = formula.Insert(j, "; 1");
startIndex++;
break;
}
}
}
}
return formula;
}
And CountOccurence:
public static int CountOccurences(this string value, string parameter)
{
int counter = 0;
int startIndex = 0;
int indexOfCeiling;
do
{
indexOfCeiling = value.IndexOf(parameter, startIndex);
if (indexOfCeiling < 0)
{
break;
}
else
{
startIndex = indexOfCeiling + 1;
counter++;
}
} while (true);
return counter;
}

Convert string ToTitleCase in c# [duplicate]

This question already has answers here:
Best way to convert Pascal Case to a sentence
(17 answers)
Closed 9 years ago.
I am looking for some solution to my problem, here is what i need, just an example
i have phrase
"ProgrammingIsIntresting"
i need it to split and make a string like "Programming Is Intresting".
CultureInfo.CurrentCulture.TextInfo.ToTitleCase work here but how can i put space literal here.
here is what i have and it seems i am stuck here.
var UpperChars = mystring.Where(c => Char.IsUpper(c));
foreach (var ch in UpperChars)
{
if (mystring.IndexOf(ch) == 0)
continue;
}
Try this:
return Regex.Replace(input, "([A-Z])"," $1", RegexOptions.Compiled).Trim();
from http://weblogs.asp.net/jgalloway/archive/2005/09/27/426087.aspx
or:
var splitted = Regex.Replace("ProgrammingIsIntresting",
#"(\B[A-Z]+?(?=[A-Z][^A-Z])|\B[A-Z]+?(?=[^A-Z]))", " $1");
second one will deal with SQLIsCool example
string myString = "ProgrammingIsIntresting";
String newString = "";
char intermediate;
for (int i = 0; i < myString.Length; i++)
{
intermediate = myString[i];
if(char.IsUpper(intermediate) && (i != 0))
newString = newString + " " + intermediate.ToString();
else
newString = newString + intermediate.ToString();
}
Console.WriteLine(newString);
Another possible approach could be this which I just tried:
string a = "IAmVahid";
List<char> b= new List<char>();
int j = 0;
for (int i = 0; i < a.Length; i++)
{
if (char.IsUpper(a[i]))
{
b.Add(' ');
}
b.Add(a[i]);
}
ouputTxt.Text = new String(b.ToArray()) ;

Better way for the special concatenation of two strings

I want to concatenate two strings in such a way, that after the first character of the first string, the first character of second string comes, and then the second character of first string comes and then the second character of the second string comes and so on. Best explained by some example cases:
s1="Mark";
s2="Zukerberg"; //Output=> MZaurkkerberg
if:
s1="Zukerberg";
s2="Mark" //Output=> ZMuakrekrberg
if:
s1="Zukerberg";
s2="Zukerberg"; //Output=> ZZuukkeerrbbeerrgg
I've written the following code which gives the expected output but its seems to be a lot of code. Is there any more efficient way for doing this?
public void SpecialConcat(string s1, string s2)
{
string[] concatArray = new string[s1.Length + s2.Length];
int k = 0;
string final = string.Empty;
string superFinal = string.Empty;
for (int i = 0; i < s1.Length; i++)
{
for (int j = 0; j < s2.Length; j++)
{
if (i == j)
{
concatArray[k] = s1[i].ToString() + s2[j].ToString();
final = string.Join("", concatArray);
}
}
k++;
}
if (s1.Length > s2.Length)
{
string subOne = s1.Remove(0, s2.Length);
superFinal = final + subOne;
}
else if (s2.Length > s1.Length)
{
string subTwo = s2.Remove(0, s1.Length);
superFinal = final + subTwo;
}
else
{
superFinal = final;
}
Response.Write(superFinal);
}
}
I have written the same logic in Javascript also, which works fine but again a lot of code.
var s1 = "Mark";
var s2 = "Zukerberg";
var common = string.Concat(s1.Zip(s2, (a, b) => new[]{a, b}).SelectMany(c => c));
var shortestLength = Math.Min(s1.Length, s2.Length);
var result =
common + s1.Substring(shortestLength) + s2.Substring(shortestLength);
var stringBuilder = new StringBuilder();
for (int i = 0; i < Math.Max(s1.Length, s2.Length); i++)
{
if (i < s1.Length)
stringBuilder.Append(s1[i]);
if (i < s2.Length)
stringBuilder.Append(s2[i]);
}
string result = stringBuilder.ToString();
In JavaScript, when working with strings, you are also working with arrays, so it will be easier. Also + will concatenate for you. Replace string indexing with charAt if you want IE7- support.
Here is the fiddle:
http://jsfiddle.net/z6XLh/1
var s1 = "Mark";
var s2 = "ZuckerFace";
var out ='';
var l = s1.length > s2.length ? s1.length : s2.length
for(var i = 0; i < l; i++) {
if(s1[i]) {
out += s1[i];
}
if(s2[i]){
out += s2[i];
}
}
console.log(out);
static string Join(string a, string b)
{
string returnVal = "";
int length = Math.Min(a.Length, b.Length);
for (int i = 0; i < length; i++)
returnVal += "" + a[i] + b[i];
if (a.Length > length)
returnVal += a.Substring(length);
else if(b.Length > length)
returnVal += b.Substring(length);
return returnVal;
}
Could possibly be improved through stringbuilder
Just for the sake of curiosity, here's an unreadable one-liner (which I have nevertheless split over multiple lines ;))
This uses the fact that padding a string to a certain length does nothing if the string is already at least that length. That means padding each string to the length of the other string will have the result of padding out with spaces the shorter one to the length of the longer one.
Then we use .Zip() to concatenate each of the pairs of characters into a string.
Then we call string.Concat(IEnumerable<string>) to concatenate the zipped strings into a single string.
Finally, we remove the extra padding spaces we introduced earlier by using string.Replace().
var result = string.Concat
(
s1.PadRight(s2.Length)
.Zip
(
s2.PadRight(s1.Length),
(a,b)=>string.Concat(a,b)
)
).Replace(" ", null);
On one line [insert Coding Horror icon here]:
var result = string.Concat(s1.PadRight(s2.Length).Zip(s2.PadRight(s1.Length), (a,b)=>string.Concat(a,b))).Replace(" ", null);
Just off the top of my head, this is how I might do it.
var s1Length = s1.Length;
var s2Length = s2.Length;
var count = 0;
var o = "";
while (s1Length + s2Length > 0) {
if (s1Length > 0) {
s1Length--;
o += s1[count];
}
if (s2Length > 0) {
s2Length--;
o += s2[count];
}
count++;
}
Here's another one-liner:
var s1 = "Mark";
var s2 = "Zukerberg";
var result = string.Join("",
Enumerable.Range(0, s1.Length).ToDictionary(x => x * 2, x => s1[x])
.Concat(Enumerable.Range(0, s2.Length).ToDictionary(x => x * 2+1, x => s2[x]))
.OrderBy(d => d.Key).Select(d => d.Value));
Basically, this converts both strings into dictionaries with keys that will get the resulting string to order itself correctly. The Enumerable range is used to associate an index with each letter in the string. When we store the dictionaries, it multiplies the index on s1 by 2, resulting in <0,M>,<2,a>,<4,r>,<6,k>, and multiplies s2 by 2 then adds 1, resulting in <1,Z>,<3,u>,<5,k>, etc.
Once we have these dictionaries, we combine them with the .Concat and sort them with the .OrderBy,which gives us <0,M>,<1,Z>,<2,a>,<3,u>,... Then we just dump them into the final string with the string.join at the beginning.
Ok, this is the *second shortest solution I could come up with:
public string zip(string s1, string s2)
{
return (string.IsNullOrWhiteSpace(s1+s2))
? (s1[0] + "" + s2[0] + zip(s1.Substring(1) + " ", s2.Substring(1) + " ")).Replace(" ", null)
: "";
}
var result = zip("mark","zukerberg");
Whoops! My original shortest was the same as mark's above...so, second shortest i could come up with! I had hoped I could really trim it down with the recursion, but not so much.
var sWordOne = "mark";// ABCDEF
var sWordTwo = "zukerberg";// 123
var result = (sWordOne.Length > sWordTwo.Length) ? zip(sWordOne, sWordTwo) : zip(sWordTwo, sWordOne);
//result = "zmuakrekrberg"
static string zip(string sBiggerWord, string sSmallerWord)
{
if (sBiggerWord.Length < sSmallerWord.Length) return string.Empty;// Invalid
if (sSmallerWord.Length == 0) sSmallerWord = " ";
return string.IsNullOrEmpty(sBiggerWord) ? string.Empty : (sBiggerWord[0] + "" + sSmallerWord[0] + zip(sBiggerWord.Substring(1),sSmallerWord.Substring(1))).Replace(" ","");
}
A simple alternative without Linq witchcraft:
string Merge(string one, string two)
{
var buffer = new char[one.Length + two.Length];
var length = Math.Max(one.Length, two.Length);
var index = 0;
for (var i = 0; i < length; i ++)
{
if (i < one.Length) buffer[index++] = one[i];
if (i < two.Length) buffer[index++] = two[i];
}
return new string(buffer);
}

Concatenate neighboring characters of a special character "-"

i am developing an application using c#.net in which i need that if a input entered by user contains the character '-'(hyphen) then i want the immediate neighbors of the hyphen(-) to be concatenated for example if a user enters
A-B-C then i want it to be replaced with ABC
AB-CD then i want it to be replaced like BC
ABC-D-E then i want it to be replaced like CDE
AB-CD-K then i want it to be replaced like BC and DK both separated by keyword and
after getting this i have to prepare my query to database.
i hope i made the problem clear but if need more clarification let me know.
Any help will be appreciated much.
Thanks,
Devjosh
Use:
string[] input = {
"A-B-C",
"AB-CD",
"ABC-D-E",
"AB-CD-K"
};
var regex = new Regex(#"\w(?=-)|(?<=-)\w", RegexOptions.Compiled);
var result = input.Select(s => string.Concat(regex.Matches(s)
.Cast<Match>().Select(m => m.Value)));
foreach (var s in result)
{
Console.WriteLine(s);
}
Output:
ABC
BC
CDE
BCDK
Untested, but this should do the trick, or at the very least lead you in the right direction.
private string Prepare(string input)
{
StringBuilder output = new StringBuilder();
char[] chars = input.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
if (chars[i] == '-')
{
if (i > 0)
{
output.Append(chars[i - 1]);
}
if (++i < chars.Length)
{
output.Append(chars[i])
}
else
{
break;
}
}
}
return output.ToString();
}
If you want each pair to form a separate object in an array, try the following code:
private string[] Prepare(string input)
{
List<string> output = new List<string>();
char[] chars = input.ToCharArray();
for (int i = 0; i < chars.Length; i++)
{
if (chars[i] == '-')
{
string o = string.Empty;
if (i > 0)
{
o += chars[i - 1];
}
if (++i < chars.Length)
{
o += chars[i]
}
output.Add(o);
}
}
return output.ToArray();
}
Correct me if I am wrong but surely all you need to do is remove the '-'?
like this:
"A-B-C".Replace("-","");
You can even solve this with a one-liner (although a bit ugly):
String.Join(String.Empty, input.Split('-').Select(q => (q.Length == 0 ? String.Empty : (q.Length > 1 ? (q.First() + q.Last()).ToString() : q.First().ToString())))).Substring(((input[0] + input[1]).ToString().Contains('-') ? 0 : 1), input.Length - ((input[0] + input[1]).ToString().Contains('-') ? 0 : 1) - ((input[input.Length - 1] + input[input.Length - 2]).ToString().Contains('-') ? 0 : 1));
first it splits the string to an array on each '-', then it concatenates only the first and the last character of each string (or just the only character if there's only one, and it leaves the empty string if there's nothing there), and then it concatenates the resulting enumerable to a String. Finally we strip the first and the last letter, if they are not in the needed range.
I know, it's ugly, I'm just saying that it's possible..
Probably it's way better to just use a simple
new Regex(#"\w(?=-)|(?<=-)\w", RegexOptions.Compiled)
and then work with that..
EDIT #Kirill Polishchuk was faster.. his solution should work..
EDIT 2
After the Question has been updated, here's a snippet that should do the trick:
string input = "A-B-C";
string s2;
string s3 = "";
string s4 = "";
var splitted = input.Split('-');
foreach(string s in splitted) {
if (s.Length == 0)
s2 = String.Empty;
else
if (s.Length > 1)
s2 = (s.First() + s.Last()).ToString();
else
s2 = s.First().ToString();
s3 += s4 + s2;
s4 = " and ";
}
int beginning;
int end;
if (input.Length > 1)
{
if ((input[0] + input[1]).ToString().Contains('-'))
beginning = 0;
else
beginning = 1;
if ((input[input.Length - 1] + input[input.Length - 2]).ToString().Contains('-'))
end = 0;
else
end = 1;
}
else
{
if ((input[0]).ToString().Contains('-'))
beginning = 0;
else
beginning = 1;
if ((input[input.Length - 1]).ToString().Contains('-'))
end = 0;
else
end = 1;
}
string result = s3.Substring(beginning, s3.Length - beginning - end);
It's not very elegant, but it should work (not tested though..). it works nearly the same as the one-liner above...

CSV Validation in C# - Making sure each row has the same number of commas

I wish to implement a fairly simple CSV checker in my C#/ASP.NET application - my project automatically generates CSV's from GridView's for users, but I want to be able to quickly run through each line and see if they have the same amount of commas, and throw an exception if any differences occur. So far I have this, which does work but there are some issues I'll describe soon:
int? CommaCount = null;
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
String Str = null;
//This loops through all the headerrow cells and writes them to the stringbuilder
for (int k = 0; k <= (grd.Columns.Count - 1); k++)
{
sw.Write(grd.HeaderRow.Cells[k].Text + ",");
}
sw.WriteLine(",");
//This loops through all the main rows and writes them to the stringbuilder
for (int i = 0; i <= grd.Rows.Count - 1; i++)
{
StringBuilder RowString = new StringBuilder();
for (int j = 0; j <= grd.Columns.Count - 1; j++)
{
//We'll need to strip meaningless junk such as <br /> and
Str = grd.Rows[i].Cells[j].Text.ToString().Replace("<br />", "");
if (Str == " ")
{
Str = "";
}
Str = "\"" + Str + "\"" + ",";
RowString.Append(Str);
sw.Write(Str);
}
sw.WriteLine();
//The below code block ensures that each row contains the same number of commas, which is crucial
int RowCommaCount = CheckChar(RowString.ToString(), ',');
if (CommaCount == null)
{
CommaCount = RowCommaCount;
}
else
{
if (CommaCount!= RowCommaCount)
{
throw new Exception("CSV generated is corrupt - line " + i + " has " + RowCommaCount + " commas when it should have " + CommaCount);
}
}
}
sw.Close();
And my CheckChar method:
protected static int CheckChar(string Input, char CharToCheck)
{
int Counter = 0;
foreach (char StringChar in Input)
{
if (StringChar == CharToCheck)
{
Counter++;
}
}
return Counter;
}
Now my problem is, if a cell in the grid contains a comma, my check char method will still count these as delimiters so will return an error. As you can see in the code, I wrap all the values in " characters to 'escape' them. How simple would it be to ignore commas in values in my method? I assume I'll need to rewrite the method quite a lot.
var rx = new Regex("^ ( ( \"[^\"]*\" ) | ( (?!$)[^\",] )+ | (?<1>,) )* $", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace | RegexOptions.Multiline);
var matches = rx.Matches("Hello,World,How,Are\nYou,Today,This,Is,\"A beautiful, world\",Hi!");
for (int i = 1; i < matches.Count; i++) {
if (matches[i].Groups[1].Captures.Count != matches[i - 1].Groups[1].Captures.Count) {
throw new Exception();
}
}
You could just use a regular expression that matches one item and count the number of matches in your line. An example of such a regex is the following:
var itemsRegex =
new Regex(#"(?<=(^|[\" + separator + #"]))((?<item>[^""\" + separator +
#"\n]*)|(?<item>""([^""]|"""")*""))(?=($|[\" + separator + #"]))");
Just do something like the following (assuming you don't want to have " inside your fields (otherwise these need some extra handling)):
protected static int CheckChar(string Input, char CharToCheck, char fieldDelimiter)
{
int Counter = 0;
bool inValue = false;
foreach (char StringChar in Input)
{
if (StringChar == fieldDelimiter)
inValue = !inValue;
else if (!inValue && StringChar == CharToCheck)
Counter++;
}
return Counter;
}
This will cause inValue to be true while inside fields. E.g. pass '"' as fieldDelimiter to ignore everything between "...". Just note that this won't handle escaped " (like "" or \"). You'd have to add such handling yourself.
Instead of checking the resulting string (the cake) you should check the fields (ingredients) before you concatenate (mix) them. That would give you the change to do something constructive (escaping/replacing) and throwing an exception only as a last resort.
In general, "," are legal in .csv fields, as long as the string fields are quoted. So internal "," should not be a problem, but the quotes may well be.

Categories