how to deal with string.split by position - c#

I'd like to ask one question about String.Split
For example:
char[] semicolon=new [] {';'};
char[] bracket=new [] {'[',']'};
string str="AND[Firstpart;Sndpart]";
I can split str by bracket and then by semicolon.
Finally, I get the Firstpart and Sndpart inside the brackets.
But if str="AND[AND[Firstpart;Sndpart];sndpart]";
how can I get AND[Firstpart;Sndpart] and sndpart?
Is there a way to tell C# to split by the second semicolon?
Thanks for your help

One way is to hide the characters inside brackets by replacing them with characters that are not used in any of your strings.
Method HideSplit: this method replaces the separator characters inside brackets with placeholder ones, performs the split, and then restores the original characters in the result.
This method may be overkill if you need to do this many times, but you should be able to optimize it easily once you get the idea.
private static void Main()
{
char[] semicolon = new[] { ';' };
char[] bracket = new[] { '[', ']' };
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] splitbyBracket = HideSplit(str, bracket);
}
private static string[] HideSplit(string str,char[] separator)
{
int counter = 0; // When counter is more than 0 it means we are inside brackets
StringBuilder result = new StringBuilder(); // To build up string as result
foreach (char ch in str)
{
if(ch == ']') counter--;
if (counter > 0) // if we are inside brackets perform hide
{
if (ch == '[') result.Append('\uFFF0'); // add '\uFFF0' instead of '['
else if (ch == ']') result.Append('\uFFF1');
else if (ch == ';') result.Append('\uFFF2');
else result.Append(ch);
}
else result.Append(ch);
if (ch == '[') counter++;
}
string[] split = result.ToString().Split(separator); // Perform split. (characters are hidden now)
return split.Select(x => x
.Replace('\uFFF0', '[')
.Replace('\uFFF1', ']')
.Replace('\uFFF2', ';')).ToArray(); // unhide characters and give back result.
// Don't forget: using System.Linq; and using System.Text; (for StringBuilder)
}
Some examples :
string[] a1 = HideSplit("AND[AND[Firstpart;Sndpart];sndpart]", bracket);
// Will give you this array { AND , AND[Firstpart;Sndpart];sndpart } plus a trailing empty string, because the input ends with ']'
string[] a2 = HideSplit("AND[Firstpart;Sndpart];sndpart", semicolon);
// Will give you this array { AND[Firstpart;Sndpart] , sndpart }
string[] a3 = HideSplit("AND[Firstpart;Sndpart]", bracket);
// Will give you this array { AND , Firstpart;Sndpart } plus a trailing empty string, because the input ends with ']'
string[] a4 = HideSplit("Firstpart;Sndpart", semicolon);
// Will give you this array { Firstpart , Sndpart }
And you can continue splitting this way.

Is there a way to tell c# to split by second semicolon?
There is no direct way to do that, but if that is precisely what you want, it's not hard to achieve:
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] tSplits = str.Split(new[] { ';' }, 3); // at most 3 parts: only the first two semicolons are used
string[] splits = { tSplits[0] + ";" + tSplits[1], tSplits[2] };
You could achieve the same result using a combination of IndexOf() and Substring(). However, that is most likely not what you'll end up using, as it's too specific and breaks down on other inputs.
For your case, you need something that understands context.
In real-world complex cases you'd probably use a lexer / parser, but that seems like overkill here.
Your best bet is probably a loop: walk through all characters while counting opening and closing square brackets, and split when you find a semicolon while the count is 1.
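The loop described above could look roughly like the sketch below. It assumes the brackets are balanced and that the goal is the ';'-separated arguments directly inside the outermost bracket pair. The names BracketSplitter and SplitOuterArguments are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class BracketSplitter
{
    // Hypothetical helper: returns the ';'-separated arguments found directly
    // inside the outermost [...] pair, leaving nested brackets intact.
    // Assumes the brackets in the input are balanced.
    public static List<string> SplitOuterArguments(string input)
    {
        var parts = new List<string>();
        int depth = 0;   // number of '[' currently open
        int start = -1;  // index where the current argument begins

        for (int i = 0; i < input.Length; i++)
        {
            char ch = input[i];
            if (ch == '[')
            {
                depth++;
                if (depth == 1) start = i + 1; // first argument starts after the outer '['
            }
            else if (ch == ']')
            {
                if (depth == 1 && start >= 0)
                {
                    parts.Add(input.Substring(start, i - start)); // close the last argument
                    start = -1;
                }
                depth--;
            }
            else if (ch == ';' && depth == 1)
            {
                parts.Add(input.Substring(start, i - start)); // separator at the outer level
                start = i + 1;
            }
        }
        return parts;
    }
}
```

Calling SplitOuterArguments("AND[AND[Firstpart;Sndpart];sndpart]") yields AND[Firstpart;Sndpart] and sndpart; calling it again on the first element yields Firstpart and Sndpart.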

You can use Regex.Split, which is a more flexible form of String.Split:
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] arr = Regex.Split(str, @"(.*?;.*?;)");
foreach (var s in arr)
Console.WriteLine("'{0}'", s);
// output: ''
// 'AND[AND[Firstpart;Sndpart];'
// 'sndpart]'
Regex.Split splits not by chars but by any substring matching a regular expression, so it comes down to constructing a pattern that meets your requirements. Splitting by the second semicolon is in practice splitting by a string that ends in a semicolon and contains another semicolon before it, so the pattern by which you split the input string could be, for example: (.*?;.*?;).
The returned array has three elements instead of two. The match starts at the beginning of the input string, so the text before it is the empty string, which is returned as the first element. Because the pattern is wrapped in a capturing group, the matched text itself is also included in the result.
You can read more about Regex.Split on MSDN.
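If the empty first element and the semicolon kept at the end of the middle element are unwanted, one alternative is Regex.Match with two capture groups instead of Regex.Split. This is only a sketch. The class and method names are invented, and it treats the second semicolon as the sole split point regardless of bracket nesting.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SecondSemicolon
{
    // Group 1: everything up to (not including) the second ';'.
    // Group 2: everything after it.
    private static readonly Regex Pattern =
        new Regex(@"^(.*?;.*?);(.*)$", RegexOptions.Singleline);

    // Hypothetical helper: returns two parts, or the input unchanged
    // (as a single element) when it has fewer than two semicolons.
    public static string[] SplitAtSecond(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return new[] { input };
        }
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }
}
```

For "AND[AND[Firstpart;Sndpart];sndpart]" this returns AND[AND[Firstpart;Sndpart] and sndpart], with no empty element.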

Related

Fixing badly formatted string with number and thousands separator

I am receiving a string with numbers, nulls, and delimiters that are the same as characters in the numbers. Also there are quotes around numbers that contain a comma(s). With C#, I want to parse out the string, such that I have a nice, pipe delimited series of numbers, no commas, 2 decimal places.
I tried the standard replace, removing certain string patterns to clean it up but I can't hit every case. I've removed the quotes first, but then I get extra numbers as the thousands separator turns into a delimiter. I attempted to use Regex.Replace with wildcards but can't get anything out of it due to the multiple numbers with quotes and commas inside the quotes.
edit for Silvermind: temp = Regex.Replace(temp, "(?:\",.*\")","($1 = .\n)");
I don't have control over the file I receive. I can get most of the data cleaned up. It's when the string looks like the following, that there is a problem:
703.36,751.36,"1,788.36",887.37,891.37,"1,850.37",843.37,"1,549,797.36",818.36,749.36,705.36,0.00,"18,979.70",934.37
Should I look for the quote character, find the next quote character, remove commas from everything between those 2 chars, and move on? This is where I'm headed but there has to be something more elegant out there (yes - I don't program in C# that often - I'm a DBA).
I would like to see the thousands separator removed, and no quotes.
This regex pattern will match all of the individual numbers in your string:
(".*?")|(\d+(\.\d+)?)
(".*?") matches quoted values like "1,788.36"
(\d+(\.\d+)?) matches unquoted values like 123.45 or 123 (the dot is escaped so that it matches only a literal decimal point)
From there, you can do a simple search and replace on each match to get a "clean" number.
Full code:
var s = "703.36,751.36,\"1,788.36\",887.37,891.37,\"1,850.37\",843.37,\"1,549,797.36\",818.36,749.36,705.36,0.00,\"18,979.70\",934.37";
Regex r = new Regex("(\".*?\")|(\\d+(\\.\\d+)?)"); // escaped dot: match a literal decimal point only
List<double> results = new List<double>();
foreach (Match m in r.Matches(s))
{
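string cleanNumber = m.Value.Replace("\"", "").Replace(",", ""); // strip quotes and thousands separators
results.Add(double.Parse(cleanNumber, System.Globalization.CultureInfo.InvariantCulture)); // parse '.' as the decimal point regardless of the current culture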
}
Console.WriteLine(string.Join(", ", results));
Output:
703.36, 751.36, 1788.36, 887.37, 891.37, 1850.37, 843.37, 1549797.36, 818.36, 749.36, 705.36, 0, 18979.7, 934.37
This would be simpler to solve with a parser-style solution that keeps track of state. Regex is meant for regular text; as soon as you have context, the problem gets hard to solve with regex. Something like this would work.
internal class Program
{
private static string testString = "703.36,751.36,\"1,788.36\",887.37,891.37,\"1,850.37\",843.37,\"1,549,797.36\",818.36,749.36,705.36,0.00,\"18,979.70\",934.37";
private static void Main(string[] args)
{
bool inQuote = false;
List<string> numbersStr = new List<string>();
int StartPos = 0;
StringBuilder SB = new StringBuilder();
for(int x = 0; x < testString.Length; x++)
{
if(testString[x] == '"')
{
inQuote = !inQuote;
continue;
}
if(testString[x] == ',' && !inQuote )
{
numbersStr.Add(SB.ToString());
SB.Clear();
continue;
}
if(char.IsDigit(testString[x]) || testString[x] == '.')
{
SB.Append(testString[x]);
}
}
if(SB.Length != 0)
{
numbersStr.Add(SB.ToString());
}
var nums = numbersStr.Select(x => double.Parse(x));
foreach(var num in nums)
{
Console.WriteLine(num);
}
Console.ReadLine();
}
}
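Neither answer above produces the pipe-delimited, two-decimal output the question asks for. The sketch below builds on the same state-tracking idea to do that. It assumes that '.' is always the decimal point, that commas inside quotes are thousands separators, and that empty (null) fields should stay empty. The names NumberLineCleaner and ToPipeDelimited are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

public static class NumberLineCleaner
{
    // Hypothetical helper: turns one comma-delimited line with optionally
    // quoted numbers into a pipe-delimited line with two decimal places.
    public static string ToPipeDelimited(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuote = false;

        foreach (char ch in line)
        {
            if (ch == '"')
            {
                inQuote = !inQuote;             // toggle state, drop the quote itself
            }
            else if (ch == ',' && !inQuote)
            {
                fields.Add(current.ToString()); // a comma outside quotes ends the field
                current.Clear();
            }
            else if (ch != ',')
            {
                current.Append(ch);             // commas inside quotes are dropped
            }
        }
        fields.Add(current.ToString());         // last field has no trailing comma

        // Assumption: empty fields are passed through as empty strings.
        var formatted = fields.Select(f =>
            f.Trim().Length == 0
                ? ""
                : decimal.Parse(f, NumberStyles.Number, CultureInfo.InvariantCulture)
                         .ToString("F2", CultureInfo.InvariantCulture));

        return string.Join("|", formatted);
    }
}
```

Using decimal rather than double keeps values such as 0.00 and 18,979.70 exact before they are formatted with "F2".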

What is the regular expression to replace white space with a specified character?

I have searched a lot of questions and answers, but I only found lengthy and complicated expressions. I want to replace all white space in a string with ',' (comma). I know it can be done with a regex, but I don't know enough about regex to do it. I have checked some links but didn't find an exact answer. If you know of a posted question or answer like this, please point me to it.
My string is defined as below.
string sText = "BankMaster AccountNo decimal To varchar";
and the result should be returned as below.
"BankMaster,AccountNo,decimal,To,varchar"
Full Code:
string sItems = Clipboard.GetText();
string[] lines = sItems.Split('\n');
for (int iLine =0; iLine<lines.Length;iLine++)
{
string sLine = lines[iLine];
sLine = //CODE TO REPLACE WHITE SPACE WITH ','
string[] cells = sLine.Split(',');
grdGrid.Rows.Add(iLine, cells[0], cells[1], cells[2], cells[4]);
}
Additional Details
I have more than 16000 lines in a list, and all lines are formatted like the example above. So I am going to use a regular expression instead of a loop and recursive function calls. If you know a faster way than regex, please suggest it.
string result = Regex.Replace(sText, "\\s+", ",");
\s+ matches one or more consecutive whitespace characters of any kind.
The regex engine treats space, tab (\t), newline (\n) and carriage return (\r), among others, as whitespace.
string a = "Some text with spaces";
Regex rgx = new Regex("\\s+");
string result = rgx.Replace(a, ",");
Console.WriteLine(result);
The code above will replace each run of white space with a single ',' character.
There are lots of samples that do this with regular expressions:
Flex: replace all spaces with comma,
Regex replace all commas with value,
http://www.perlmonks.org/?node_id=896548,
http://www.dslreports.com/forum/r20971008-sed-help-whitespace-to-comma
Try This:
string str = "BankMaster AccountNo decimal To varchar";
StringBuilder temp = new StringBuilder();
str=str.Trim(); //trim before logic to avoid any trailing/leading whitespaces.
foreach(char ch in str)
{
if (ch == ' ' && temp[temp.Length-1] != ',')
{
temp.Append(",");
}
else if (ch != ' ')
{
temp.Append(ch.ToString());
}
}
Console.WriteLine(temp);
Output:
BankMaster,AccountNo,decimal,To,varchar
Try this:
sText = Regex.Replace(sText, @"\s+", ",");
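The question also asks for alternatives to regex. One option, sketched below, is to split on whitespace with String.Split and rejoin with String.Join. Passing a null separator array makes Split treat white-space characters as delimiters. Whether this is faster than Regex.Replace for 16000 lines is not shown here and would need to be measured. The class and method names are made up.

```csharp
using System;

public static class WhitespaceToComma
{
    // Hypothetical helper: collapses each run of whitespace into one comma.
    // Unlike Regex.Replace with \s+, leading and trailing whitespace
    // do not produce leading or trailing commas.
    public static string Convert(string text)
    {
        string[] tokens = text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(",", tokens);
    }
}
```

If the next step is sLine.Split(','), as in the question, the tokens array can be used directly and the join skipped.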

split strings into many strings by newline?

I have incoming data that needs to be split into multiple values, e.g.
2345\n564532\n345634\n234 234543\n1324 2435\n
The length is inconsistent when I receive it, the spacing is inconsistent when it is present, and I want to analyze the last 3 digits before each \n. How do I break off the string and turn it into a new string? Like I said, this round it may have 3 \n characters, next time it may have 10. How do I create 3 new strings, analyze them, then destroy them before the next 10 come in?
string[] result = x.Split('\r');
result = x.Split(splitAtReturn, StringSplitOptions.None);
string stringToAnalyze = null;
foreach (string s in result)
{
if (s != "\r")
{
stringToAnalyze += s;
}
else
{
// how do I analyze the characters here?
}
}
You could use the string.Split method. In particular, I suggest the overload that takes a string array of possible separators, because splitting on newlines poses a particular problem. In your example every newline is simply '\n', but on some operating systems the newline sequence is '\r\n'. If you can't rule out having both in the same file, then use:
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
string[] result = test.Split(new string[] {"\n", "\r\n"}, StringSplitOptions.RemoveEmptyEntries);
If instead you are certain that the file contains only the newline sequence used by your OS, then you could use:
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
string[] result = test.Split(new string[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries drops the empty strings that a pair of consecutive newlines or a trailing newline would otherwise produce.
Now you can work on the array examining the last 3 digits of every string
foreach(string s in result)
{
// Check to have at least 3 chars, no less
// otherwise an exception will occur
int maxLen = Math.Min(s.Length, 3);
string lastThree = s.Substring(s.Length - maxLen, maxLen);
// ... work on the last 3 characters
}
Instead, if you want to work only using the index of the newline character without splitting the original string, you could use string.IndexOf in this way
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
int pos = -1;
while((pos = test.IndexOf('\n', pos + 1)) != -1)
{
if(pos >= 3) // Substring(pos - 3, 3) needs at least 3 characters before the newline
{
string last3part = test.Substring(pos - 3, 3);
Console.WriteLine(last3part);
}
}
string lines = "2345\n564532\n345634\n234 234543\n1324 2435\n";
var last3Digits = lines.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Select(line => line.Substring(Math.Max(0, line.Length - 3))) // guard against lines shorter than 3 characters
.ToList();
foreach(var my3digitnum in last3Digits)
{
}
last3Digits : [345, 532, 634, 543, 435]
This has been answered before, check this thread:
Easiest way to split a string on newlines in .NET?
An alternative way is using StringReader:
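using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line;
while ((line = reader.ReadLine()) != null) { // ReadLine returns null at the end of the input
// process each line here
}
}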
Your answer is: theStringYouGot.Split('\n'); which gives you an array of strings to process.
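Since the question mentions inconsistent spacing and asks specifically for the last 3 digits before each newline, a variation that ignores non-digit characters may be closer to the intent. The following is a sketch under that assumption. LineAnalyzer and LastThreeDigitsPerLine are invented names, and lines without any digits are skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LineAnalyzer
{
    // Hypothetical helper: for each line, keeps only the digit characters
    // and returns the last three of them (or fewer if the line is short).
    public static List<string> LastThreeDigitsPerLine(string input)
    {
        var results = new List<string>();
        using (var reader = new StringReader(input))
        {
            string line;
            // ReadLine handles "\n", "\r" and "\r\n" and returns null at the end.
            while ((line = reader.ReadLine()) != null)
            {
                string digits = new string(line.Where(c => char.IsDigit(c)).ToArray());
                if (digits.Length == 0)
                {
                    continue; // assumption: lines without digits are ignored
                }
                int take = Math.Min(3, digits.Length);
                results.Add(digits.Substring(digits.Length - take));
            }
        }
        return results;
    }
}
```

The returned list is an ordinary local object, so nothing has to be destroyed explicitly before the next batch arrives; it becomes eligible for garbage collection once it is no longer referenced.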

Fastest way to remove the leading special characters in string in c#

I am using C# and I have strings like
-Xyz
--Xyz
---Xyz
-Xyz-Abc
--Xyz-Abc
I simply want to remove any leading special characters until the first letter. Note: special characters in the middle of the string should stay as they are. What is the fastest way to do this?
You could use string.TrimStart and pass in the characters you want to remove:
var result = yourString.TrimStart('-', '_');
However, this is only a good idea if the number of special characters you want to remove is well-known and small.
If that's not the case, you can use regular expressions:
var result = Regex.Replace(yourString, "^[^A-Za-z0-9]*", "");
I prefer these two methods:
List<string> strings = new List<string>()
{
"-Xyz",
"--Xyz",
"---Xyz",
"-Xyz-Abc",
"--Xyz-Abc"
};
foreach (var s in strings)
{
string temp;
// String.TrimStart Method
char[] charsToTrim = { '*', ' ', '\'', '-', '_' }; // Add more
temp = s.TrimStart(charsToTrim);
Console.WriteLine(temp);
// Enumerable.SkipWhile Method
// Char.IsPunctuation Method (see also Char.IsLetter, Char.IsLetterOrDigit, etc.); needs using System.Linq;
temp = new String(s.SkipWhile(x => Char.IsPunctuation(x)).ToArray());
Console.WriteLine(temp);
}
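Since the question asks for the fastest way, a plain index loop followed by a single Substring call is another candidate. It avoids both the regex engine and LINQ enumerators, though actual timings are not shown here and would have to be measured. In this sketch, "special character" is assumed to mean anything for which Char.IsLetter returns false, and the names are made up.

```csharp
using System;

public static class LeadingTrimmer
{
    // Hypothetical helper: removes everything before the first letter.
    // Characters after the first letter are left untouched.
    public static string TrimLeadingNonLetters(string s)
    {
        int i = 0;
        while (i < s.Length && !char.IsLetter(s[i]))
        {
            i++; // skip leading non-letter characters
        }
        return s.Substring(i); // returns "" when the string has no letters
    }
}
```

For the inputs from the question, "---Xyz" becomes "Xyz" and "--Xyz-Abc" becomes "Xyz-Abc".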

How to find the number of occurrences of a letter in only the first sentence of a string?

I want to find the number of occurrences of the letter "a" in only the first sentence. The code below counts "a" in all sentences, but I want it for only the first sentence.
static void Main(string[] args)
{
string text; int k = 0;
text = "bla bla bla. something second. maybe last sentence.";
foreach (char a in text)
{
char b = 'a';
if (b == a)
{
k += 1;
}
}
Console.WriteLine("number of a in first sentence is " + k);
Console.ReadKey();
}
This splits the string into an array separated by '.', '!' or '?', then counts the 'a' characters in the first element of the array (the first sentence).
var count = text.Split(new[] { '.', '!', '?' })[0].Count(c => c == 'a'); // needs using System.Linq;
This example assumes a sentence is separated by a ., ? or !. If you have a decimal number in your string (e.g. 123.456), that will count as a sentence break. Breaking up a string into accurate sentences is a fairly complex exercise.
This is perhaps more verbose than what you were looking for, but hopefully it'll breed understanding as you read through it.
public static void Main()
{
//Make an array of the possible sentence enders. Doing this pattern lets us easily update
// the code later if it becomes necessary, or allows us easily to move this to an input
// parameter
string[] SentenceEnders = new string[] {"$", @"\.", @"\?", @"\!" /* Add Any Others */};
string WhatToFind = "a"; //What are we looking for? Regular Expressions Will Work Too!!!
string SentenceToCheck = "This, but not to exclude any others, is a sample."; //First example
string MultipleSentencesToCheck = @"
Is this a sentence
that breaks up
among multiple lines?
Yes!
It also has
more than one
sentence.
"; //Second Example
//This will split the input on all the enders put together(by way of joining them in [] inside a regular
// expression.
string[] SplitSentences = Regex.Split(SentenceToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase);
//SplitSentences is an array, with sentences on each index. The first index is the first sentence
string FirstSentence = SplitSentences[0];
//Now, split that single sentence on our matching pattern for what we should be counting
string[] SubSplitSentence = Regex.Split(FirstSentence, WhatToFind, RegexOptions.IgnoreCase);
//Now that it's split, it's split a number of times that matches how many matches we found, plus one
// (The "Left over" is the +1
int HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the first sentence, {0} '{1}'.", HowMany, WhatToFind));
//Do all this again for the second example. Note that ideally, this would be in a separate function
// and you wouldn't be writing code twice, but I wanted you to see it without all the comments so you can
// compare and contrast
SplitSentences = Regex.Split(MultipleSentencesToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
SubSplitSentence = Regex.Split(SplitSentences[0], WhatToFind, RegexOptions.IgnoreCase | RegexOptions.Singleline);
HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the second sentence, {0} '{1}'.", HowMany, WhatToFind));
}
Here is the output:
We found, in the first sentence, 3 'a'.
We found, in the second sentence, 4 'a'.
You didn't define "sentence", but if we assume it's always terminated by a period (.), just add this inside the loop:
if (a == '.') {
break;
}
Expand from this to support other sentence delimiters.
Simply "break" the foreach(...) loop when you encounter a "." (period)
Well, assuming you define a sentence as ending with a '.':
Use String.IndexOf() to find the position of the first '.'. After that, search in a Substring instead of the entire string.
Find the position of the '.' in the text (you can use Split).
Count the 'a' characters in the text from position 0 to the first '.'.
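The IndexOf-based suggestions above can be sketched as follows. The sketch uses IndexOfAny so that '!' and '?' also end a sentence, which is an assumption beyond the period-only definition. The comparison is case-sensitive, and the names are invented.

```csharp
using System;

public static class FirstSentenceCounter
{
    private static readonly char[] SentenceEnders = { '.', '!', '?' };

    // Hypothetical helper: counts occurrences of target before the first
    // sentence terminator, or in the whole text if there is none.
    public static int CountInFirstSentence(string text, char target)
    {
        int end = text.IndexOfAny(SentenceEnders);
        if (end < 0)
        {
            end = text.Length; // no terminator: treat the whole text as one sentence
        }

        int count = 0;
        for (int i = 0; i < end; i++)
        {
            if (text[i] == target)
            {
                count++;
            }
        }
        return count;
    }
}
```

For the text from the question, CountInFirstSentence(text, 'a') returns 3, one for each "bla". No substring or array is allocated along the way.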
string SentenceToCheck = "Hi, I can wonder this situation where I can do best";
// Here are several ways to count the letter (needs using System.Linq; and using System.Text.RegularExpressions;)
// Using regular expressions
int HowMany = Regex.Split(SentenceToCheck, "a", RegexOptions.IgnoreCase).Length - 1; // case-insensitive
int i = Regex.Matches(SentenceToCheck, "a").Count; // case-sensitive
// Simple way (case-sensitive)
int Count = SentenceToCheck.Length - SentenceToCheck.Replace("a", "").Length;
// LINQ, method syntax (case-insensitive); Where filters, whereas Select would keep every character
var _lamdaCount = SentenceToCheck.ToCharArray()
.Where(t => t.ToString().ToUpper().Equals("A")).Count();
// LINQ, query syntax (case-insensitive)
var _linqAIEnumareable = from _char in SentenceToCheck.ToCharArray()
where _char.ToString().ToUpper().Equals("A")
select _char;
int a = _linqAIEnumareable.Count();
// LINQ, query syntax (case-sensitive)
var _linqCount = from g in SentenceToCheck.ToCharArray()
where g.ToString().Equals("a")
select g;
int b = _linqCount.Count();
