split strings into many strings by newline? - c#

i have incoming data that needs to be split into multiple values...ie.
2345\n564532\n345634\n234 234543\n1324 2435\n
The length is inconsistent when i receive it, the spacing is inconsistent when it is present, and i want to analyze the last 3 digits before each \n. how do i break off the string and turn it into a new string? like i said, this round, it may have 3 \n commands, next time, it may have 10, how do i create 3 new strings, analyze them, then destroy them before the next 10 come in?
string[] result = x.Split('\r');
result = x.Split(splitAtReturn, StringSplitOptions.None);
string stringToAnalyze = null;
foreach (string s in result)
{
if (s != "\r")
{
stringToAnalyze += s;
}
else
{
how do i analyze the characters here?
}
}

You could use the string.Split method. In particular I suggest to use the overload that use a string array of possible separators. This because splitting on the newline character poses an unique problem. In you example all the newline chars are simply a '\n', but for some OS the newline char is '\r\n' and if you can't rule out the possibility to have the twos in the same file then
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
string[] result = test.Split(new string[] {"\n", "\r\n"}, StringSplitOptions.RemoveEmptyEntries);
Instead if your are certain that the file contains only the newline separator allowed by your OS then you could use
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
string[] result = test.Split(new string[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
The StringSplitOptions.RemoveEmptyEntries allows to capture a pair of consecutive newline or an ending newline as an empty string.
Now you can work on the array examining the last 3 digits of every string
foreach(string s in result)
{
// Check to have at least 3 chars, no less
// otherwise an exception will occur
int maxLen = Math.Min(s.Length, 3);
string lastThree = s.Substring(s.Length - maxLen, maxLen);
... work on last 3 digits
}
Instead, if you want to work only using the index of the newline character without splitting the original string, you could use string.IndexOf in this way
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
int pos = -1;
while((pos = test.IndexOf('\n', pos + 1)) != -1)
{
if(pos < test.Length)
{
string last3part = test.Substring(pos - 3, 3);
Console.WriteLine(last3part);
}
}

string lines = "2345\n564532\n345634\n234 234543\n1324 2435\n";
var last3Digits = lines.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Select(line => line.Substring(line.Length - 3))
.ToList();
foreach(var my3digitnum in last3Chars)
{
}
last3Digits : [345, 532, 634, 543, 435]

This has been answered before, check this thread:
Easiest way to split a string on newlines in .NET?
An alternative way is using StringReader:
using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line = reader.ReadLine();
}

Your answer is: theStringYouGot.Split('\n'); where you get an array of strings to do your processing for.

Related

Fixing badly formatted string with number and thousands seperator

I am receiving a string with numbers, nulls, and delimiters that are the same as characters in the numbers. Also there are quotes around numbers that contain a comma(s). With C#, I want to parse out the string, such that I have a nice, pipe delimited series of numbers, no commas, 2 decimal places.
I tried the standard replace, removing certain string patterns to clean it up but I can't hit every case. I've removed the quotes first, but then I get extra numbers as the thousands separator turns into a delimiter. I attempted to use Regex.Replace with wildcards but can't get anything out of it due to the multiple numbers with quotes and commas inside the quotes.
edit for Silvermind: temp = Regex.Replace(temp, "(?:\",.*\")","($1 = .\n)");
I don't have control over the file I receive. I can get most of the data cleaned up. It's when the string looks like the following, that there is a problem:
703.36,751.36,"1,788.36",887.37,891.37,"1,850.37",843.37,"1,549,797.36",818.36,749.36,705.36,0.00,"18,979.70",934.37
Should I look for the quote character, find the next quote character, remove commas from everything between those 2 chars, and move on? This is where I'm headed but there has to be something more elegant out there (yes - I don't program in C# that often - I'm a DBA).
I would like to see the thousands separator removed, and no quotes.
This regex pattern will match all of the individual numbers in your string:
(".*?")|(\d+(.\d+)?)
(".*?") matches things like "123.45"
(\d+(.\d+)?) matches things like 123.45 or 123
From there, you can do a simple search and replace on each match to get a "clean" number.
Full code:
var s = "703.36,751.36,\"1,788.36\",887.37,891.37,\"1,850.37\",843.37,\"1,549,797.36\",818.36,749.36,705.36,0.00,\"18,979.70\",934.37";
Regex r = new Regex("(\".*?\")|(\\d+(.\\d+)?)");
List<double> results = new List<double>();
foreach (Match m in r.Matches(s))
{
string cleanNumber = m.Value.Replace("\"", "");
results.Add(double.Parse(cleanNumber));
}
Console.WriteLine(string.Join(", ", results));
Output:
703.36, 751.36, 1788.36, 887.37, 891.37, 1850.37, 843.37, 1549797.36, 818.36, 749.36, 705.36, 0, 18979.7, 934.37
This would be simpler to solve with a parser type solution which keeps track of state. Regex is for regular text anytime you have context it gets hard to solve with regex. Something like this would work.
internal class Program
{
private static string testString = "703.36,751.36,\"1,788.36\",887.37,891.37,\"1,850.37\",843.37,\"1,549,797.36\",818.36,749.36,705.36,0.00,\"18,979.70\",934.37";
private static void Main(string[] args)
{
bool inQuote = false;
List<string> numbersStr = new List<string>();
int StartPos = 0;
StringBuilder SB = new StringBuilder();
for(int x = 0; x < testString.Length; x++)
{
if(testString[x] == '"')
{
inQuote = !inQuote;
continue;
}
if(testString[x] == ',' && !inQuote )
{
numbersStr.Add(SB.ToString());
SB.Clear();
continue;
}
if(char.IsDigit(testString[x]) || testString[x] == '.')
{
SB.Append(testString[x]);
}
}
if(SB.Length != 0)
{
numbersStr.Add(SB.ToString());
}
var nums = numbersStr.Select(x => double.Parse(x));
foreach(var num in nums)
{
Console.WriteLine(num);
}
Console.ReadLine();
}
}

how to deal with string.split by position

I'd like to ask one question about String.Split
For example:
char[] semicolon=new [] {';'};
char[] bracket=new [] {'[',']'};
string str="AND[Firstpart;Sndpart]";
I can split str by bracket and then split by semicolon.
Finally,I get the Firstpart and Sndpart in the bracket.
But If str="AND[AND[Firstpart;Sndpart];sndpart];
How can I get AND[Firpart;Sndpart] and sndpart?
Is there a way to tell c# to split by second semicolon?
Thanks for your help
One way is to hide characters inside bracket with a character that is not used in any of your strings.
Method HideSplit: This method will change separator characters inside brackets with fake ones. Then it will perform split and will give back the result with original characters.
This method maybe an overkill if you want to do this many times. but you should be able to optimize it easily if you got the idea.
private static void Main()
{
char[] semicolon = new[] { ';' };
char[] bracket = new[] { '[', ']' };
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] splitbyBracket = HideSplit(str, bracket);
}
private static string[] HideSplit(string str,char[] separator)
{
int counter = 0; // When counter is more than 0 it means we are inside brackets
StringBuilder result = new StringBuilder(); // To build up string as result
foreach (char ch in str)
{
if(ch == ']') counter--;
if (counter > 0) // if we are inside brackets perform hide
{
if (ch == '[') result.Append('\uFFF0'); // add '\uFFF0' instead of '['
else if (ch == ']') result.Append('\uFFF1');
else if (ch == ';') result.Append('\uFFF2');
else result.Append(ch);
}
else result.Append(ch);
if (ch == '[') counter++;
}
string[] split = result.ToString().Split(separator); // Perform split. (characters are hidden now)
return split.Select(x => x
.Replace('\uFFF0', '[')
.Replace('\uFFF1', ']')
.Replace('\uFFF2', ';')).ToArray(); // unhide characters and give back result.
// dont forget: using System.Linq;
}
Some examples :
string[] a1 = HideSplit("AND[AND[Firstpart;Sndpart];sndpart]", bracket);
// Will give you this array { AND , AND[Firstpart;Sndpart];sndpart }
string[] a2 = HideSplit("AND[Firstpart;Sndpart];sndpart", semicolon);
// Will give you this array { AND[Firstpart;Sndpart] , sndpart }
string[] a3 = HideSplit("AND[Firstpart;Sndpart]", bracket);
// Will give you this array { AND , Firstpart;Sndpart }
string[] a4 = HideSplit("Firstpart;Sndpart", semicolon);
// Will give you this array { Firstpart , Sndpart }
And you can continue splitting this way.
Is there a way to tell c# to split by second semicolon?
There is no direct way to do that, but if that is precisely what you want, it's not hard to achieve:
string str="AND[AND[Firstpart;Sndpart];sndpart];
string[] tSplits = str.Split(';', 3);
string[] splits = { tSplits[0] + ";" + tSplits[1], tSplits[2] };
You could achieve the same result using a combination of IndexOf() and Substring(), however that is most likely not what you'll end up using as it's too specific and not very helpful for various inputs.
For your case, you need something that understands context.
In real-world complex cases you'd probably use a lexer / parser, but that seems like an overkill here.
Your best effort would probably be to use a loop, walk through all characters while counting +/- square brackets and spliting when you find a semicolon & the count is 1.
You can use Regex.Split, which is a more flexible form of String.Split:
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] arr = Regex.Split(str, #"(.*?;.*?;)");
foreach (var s in arr)
Console.WriteLine("'{0}'", s);
// output: ''
// 'AND[AND[Firstpart;Sndpart];'
// 'sndpart]'
Regex.Split splits not by chars, but by a string matching a regex expression, so it comes down to constructing a regex pattern meeting particular requirements. Splitting by a second semicolon is in practice splitting by a string that ends in a semicolon and that contains another semicolon before, so the matching pattern by which you split the input string could be for example: (.*?;.*?;).
The returned array has three elements instead of two because the splitting regex matches the beginning of the input string, in this case the empty string is returned as the first element.
You can read more on Regex.Split on msdn.

String splitting with a special structure

I have strings of the following form:
str = "[int]:[int],[int]:[int],[int]:[int],[int]:[int], ..." (for undefined number of times).
What I did was this:
string[] str_split = str.Split(',');
for( int i = 0; i < str_split.Length; i++ )
{
string[] str_split2 = str_split[i].Split(':');
}
Unfortunately this breaks when some of the numbers have extra ',' inside a number. For example, we have something like this:
695,000:14,306,000:12,136000:12,363000:6
in which the followings are the numbers, ordered from the left to the right:
695,000
14
306,000
12
136000
12
363000
6
How can I resolve this string splitting problem?
If it is the case that only the number to the left of the colon separator can contain commas, then you could simply express this as:
string s = "695,000:14,306,000:12,136000:12,363000:6";
var parts = Regex.Split(s, #":|(?<=:\d+),");
The regex pattern, which identifies the separators, reads: "any colon, or any comma that follows a colon and a sequence of digits (but not another comma)".
A simple solution is split using : as delimiter. The resultant array will have numbers of the format [int],[int]. Parse through the array and split each entry using , as the delimiter. This will give you an array of [int] numbers.
It might not be the best way to do it and it might not work all the time but here's what I'd do.
string[] leftRightDoubles = str.Split(':');
foreach(string substring in leftRightDoubles){
string[] indivNumbers = str.Split(',');
//if indivNumbers.Length == 2, you know that these two are separate numbers
//if indivNumbers.Length > 2, use heuristics to determine which parts belong to which number
if(indivNumbers.Length > 2) {
for(int i = 0, i < indivNumbers.Length, i++) {
if(indivNumbers[i] != '000') { //Or use some other heuristic
//It's a new number
} else {
//It's the rest of previous number
}
}
}
}
//It's sort of pseudocode with comments (haven't touched C# in a while so I don't want to write full C# code)

Length of string WITHOUT spaces (C#)

Quick little question...
I need to count the length of a string, but WITHOUT the spaces inside of it.
E.g. for a string like "I am Bob", string.Length would return 8 (6 letters + 2 spaces).
I need a method, or something, to give me the length (or number of) just the letters (6 in the case of "I am Bob")
I have tried the following
s.Replace (" ", "");
s.Replace (" ", null);
s.Replace (" ", string.empty);
to try and get "IamBob", which I did, but it didn't solve my problem because it still counted "" as a character.
Any help?
This returns the number of non-whitespace characters:
"I am Bob".Count(c => !Char.IsWhiteSpace(c));
Demo
Char.IsWhiteSpace:
White space characters are the following Unicode characters:
Members of the SpaceSeparator category, which includes the characters SPACE (U+0020), OGHAM SPACE MARK (U+1680), MONGOLIAN VOWEL SEPARATOR (U+180E), EN QUAD (U+2000), EM QUAD (U+2001), EN SPACE (U+2002), EM SPACE (U+2003), THREE-PER-EM SPACE (U+2004), FOUR-PER-EM SPACE (U+2005), SIX-PER-EM SPACE (U+2006), FIGURE SPACE (U+2007), PUNCTUATION SPACE (U+2008), THIN SPACE (U+2009), HAIR SPACE (U+200A), NARROW NO-BREAK SPACE (U+202F), MEDIUM MATHEMATICAL SPACE (U+205F), and IDEOGRAPHIC SPACE (U+3000).
Members of the LineSeparator category, which consists solely of the LINE SEPARATOR character (U+2028).
Members of the ParagraphSeparator category, which consists solely of the PARAGRAPH SEPARATOR character (U+2029).
The characters CHARACTER TABULATION (U+0009), LINE FEED (U+000A), LINE TABULATION (U+000B), FORM FEED (U+000C), CARRIAGE RETURN (U+000D), NEXT LINE (U+0085), and NO-BREAK SPACE (U+00A0).
No. It doesn't.
string s = "I am Bob";
Console.WriteLine(s.Replace(" ", "").Length); // 6
Console.WriteLine(s.Replace(" ", null).Length); //6
Console.WriteLine(s.Replace(" ", string.Empty).Length); //6
Here is a DEMO.
But what are whitespace characters?
http://en.wikipedia.org/wiki/Whitespace_character
You probably forgot to reassign the result of Replace. Try this:
string s = "I am bob";
Console.WriteLine(s.Length); // 8
s = s.Replace(" ", "");
Console.WriteLine(s.Length); // 6
A pretty simple way is to write an extension method that will do just that- count the characters without the white spaces. Here's the code:
public static class MyExtension
{
public static int CharCountWithoutSpaces(this string str)
{
string[] arr = str.Split(' ');
string allChars = "";
foreach (string s in arr)
{
allChars += s;
}
int length = allChars.Length;
return length;
}
}
To execute, simply call the method on the string:
string yourString = "I am Bob";
int count = yourString.CharCountWithoutSpaces();
Console.WriteLine(count); //=6
Alternatively, you can split the string an way you want if you don't want to include say, periods or commas:
string[] arr = str.Split('.');
or:
string[] arr = str.Split(',');
this is fastest way:
var spaceCount = 0;
for (var i 0; i < #string.Lenght; i++)
{
if (#string[i]==" ") spaceCount++;
}
var res = #string.Lenght-spaceCount;
Your problem is probably related to Replace() method not actually changing the string, rather returning the replaced value;
string withSpaces = "I am Bob";
string withoutSpaces = withSpaces.Replace(" ","");
Console.WriteLine(withSpaces);
Console.WriteLine(withoutSpaces);
Console.WriteLine(withSpaces.Length);
Console.WriteLine(withoutSpaces.Length);
//output
//I am Bob
//IamBob
//8
//6
You can use a combination of Length and Count functions on the string object. Here is a simple example.
string sText = "This is great text";
int nSpaces = sText.Length - sText.Count(Char.IsWhiteSpace);
This will count single or multiple (consistent) spaces accurately.
Hope it helps.

How to indicate whitespaces while reading from a .txt file

I have a simple .txt file with X,Y-values in it. It is structured like this:
-25.7754 35.87
-22.1233 32.16
-20.361 30.75
etc.
I am able to read single lines or the whole text to the end, with objstream.ReadToEnd(); & objstream.ReadLine().
But here's my question how could I indicate when the String after the first value ends so I can save/parse it to float & proceed reading the value of the next string?
Here is the read functionality I have so far :)
StreamReader objStream = new StreamReader("C:blablabla\\Text.asc");
textBox1.Text = objStream.ReadLine();
Thanks in advance,
BC++
Use String.split()
As requested, an example :
string s = "there is a cat";
//
// Split string on spaces.
// ... This will separate all the words.
//
string[] words = s.Split(' ');
foreach (string word in words)
{
Console.WriteLine(word);
}
The output is :
there
is
a
cat
Look at the string.Split methods:
var line1 = objStream.ReadLine();
var lineParts = line1.Split(" ".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
textBox1.Text = lineParts[0];
textBox2.Text = lineParts[1];
Note the use of an overload that uses StringSplitOptions.RemoveEmptyEntries - the means that if you have multiple spaces in succession, the result will not contain empty entries.
If you really mean white-space and not space then you have to go this way:
string line = "-25.7754 35.87";
string[] values = line.Split(new char[] { }, StringSplitOptions.RemoveEmptyEntries);
The difference from the other answers in the splitting character. If this not defined then white-space characters are assumed to be the delimiters. In other words you will get the same result for
string line = "-25.7754\t35.87"; // tab instead of spaces.
You will have the flexibility to split correctly fixed length or tab delimited lines using the same code.

Categories