Fixing badly formatted string with number and thousands seperator - c#

I am receiving a string with numbers, nulls, and delimiters that are the same as characters in the numbers. Also there are quotes around numbers that contain a comma(s). With C#, I want to parse out the string, such that I have a nice, pipe delimited series of numbers, no commas, 2 decimal places.
I tried the standard replace, removing certain string patterns to clean it up but I can't hit every case. I've removed the quotes first, but then I get extra numbers as the thousands separator turns into a delimiter. I attempted to use Regex.Replace with wildcards but can't get anything out of it due to the multiple numbers with quotes and commas inside the quotes.
edit for Silvermind: temp = Regex.Replace(temp, "(?:\",.*\")","($1 = .\n)");
I don't have control over the file I receive. I can get most of the data cleaned up. It's when the string looks like the following, that there is a problem:
703.36,751.36,"1,788.36",887.37,891.37,"1,850.37",843.37,"1,549,797.36",818.36,749.36,705.36,0.00,"18,979.70",934.37
Should I look for the quote character, find the next quote character, remove commas from everything between those 2 chars, and move on? This is where I'm headed but there has to be something more elegant out there (yes - I don't program in C# that often - I'm a DBA).
I would like to see the thousands separator removed, and no quotes.

This regex pattern will match all of the individual numbers in your string:
(".*?")|(\d+(.\d+)?)
(".*?") matches things like "123.45"
(\d+(.\d+)?) matches things like 123.45 or 123
From there, you can do a simple search and replace on each match to get a "clean" number.
Full code:
var s = "703.36,751.36,\"1,788.36\",887.37,891.37,\"1,850.37\",843.37,\"1,549,797.36\",818.36,749.36,705.36,0.00,\"18,979.70\",934.37";
Regex r = new Regex("(\".*?\")|(\\d+(.\\d+)?)");
List<double> results = new List<double>();
foreach (Match m in r.Matches(s))
{
string cleanNumber = m.Value.Replace("\"", "");
results.Add(double.Parse(cleanNumber));
}
Console.WriteLine(string.Join(", ", results));
Output:
703.36, 751.36, 1788.36, 887.37, 891.37, 1850.37, 843.37, 1549797.36, 818.36, 749.36, 705.36, 0, 18979.7, 934.37

This would be simpler to solve with a parser type solution which keeps track of state. Regex is for regular text anytime you have context it gets hard to solve with regex. Something like this would work.
internal class Program
{
private static string testString = "703.36,751.36,\"1,788.36\",887.37,891.37,\"1,850.37\",843.37,\"1,549,797.36\",818.36,749.36,705.36,0.00,\"18,979.70\",934.37";
private static void Main(string[] args)
{
bool inQuote = false;
List<string> numbersStr = new List<string>();
int StartPos = 0;
StringBuilder SB = new StringBuilder();
for(int x = 0; x < testString.Length; x++)
{
if(testString[x] == '"')
{
inQuote = !inQuote;
continue;
}
if(testString[x] == ',' && !inQuote )
{
numbersStr.Add(SB.ToString());
SB.Clear();
continue;
}
if(char.IsDigit(testString[x]) || testString[x] == '.')
{
SB.Append(testString[x]);
}
}
if(SB.Length != 0)
{
numbersStr.Add(SB.ToString());
}
var nums = numbersStr.Select(x => double.Parse(x));
foreach(var num in nums)
{
Console.WriteLine(num);
}
Console.ReadLine();
}
}

Related

c# add comma before every numbers in my string except first number

I am developing as application in asp.net mvc.
I have a string like below
string myString = "1A5#3a2#"
now I want to add a comma after every occurrence of number in my string except the first occurrence.
like
string myNewString "1A,5#,3a,2#";
I know I can use loop for this like below
myNewString
foreach(var ch in myString)
{
if (ch >= '0' && ch <= '9')
{
myNewString = myNewString ==""?"":myNewString + "," + Convert.ToString(ch);
}
else
{
myNewString = myNewString ==""? Convert.ToString(ch): myNewString + Convert.ToString(ch);
}
}
You could use this StringBuilder approach:
public static string InsertBeforeEveryDigit(string input, char insert)
{
StringBuilder sb = new(input);
for (int i = sb.Length - 2; i >= 0; i--)
{
if (!char.IsDigit(sb[i]) && char.IsDigit(sb[i+1]))
{
sb.Insert(i+1, insert);
}
}
return sb.ToString();
}
Console.Write(InsertBeforeEveryDigit("1A5#3a2#", ',')); // 1A,5#,3a,2#
Update: This answer gives a different result than the one from TWM if the string contains consecutive digits like here: "12A5#3a2#". My answer gives: 12A,5#,3a,2#,
TWM's gives: 1,2A,5#,3a,2#. Not sure what is desired.
so, as I understood the below code will work for you
StringBuilder myNewStringBuilder = new StringBuilder();
foreach(var ch in myString)
{
if (ch >= '0' && ch <= '9')
{
if (myNewStringBuilder.Length > 0)
{
myNewStringBuilder.Append(",");
}
myNewStringBuilder.Append(ch);
}
else
{
myNewStringBuilder.Append(ch);
}
}
myString = myNewStringBuilder.ToString();
NOTE
Instead of using myNewString variable, I've used StringBuilder object to build up the new string. This is more efficient than concatenating strings, as concatenating strings creates new strings and discards the old ones. The StringBuilder object avoids this by efficiently storing the string in a mutable buffer, reducing the number of object allocations and garbage collections.
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. 😊
string myString = "1A5#3a2#";
var result = Regex.Replace(myString, #"(?<=\d\D*)\d\D*", ",$&");
Regex explanation (#regex101):
\d\D* - matches every occurrence of a digit with any following non-digits (zero+)
(?<=\d\D*) - negative lookbehind so we have at least one group with digit before (i.e. ignore first)
This can be updated if you need to handle consecutive digits (i.e. "1a55b" -> "1a,55b") by changing \d to \d+:
var result = Regex.Replace(myString, #"(?<=\d+\D*)\d+\D*", ",$&");

how to deal with string.split by position

I'd like to ask one question about String.Split
For example:
char[] semicolon=new [] {';'};
char[] bracket=new [] {'[',']'};
string str="AND[Firstpart;Sndpart]";
I can split str by bracket and then split by semicolon.
Finally,I get the Firstpart and Sndpart in the bracket.
But If str="AND[AND[Firstpart;Sndpart];sndpart];
How can I get AND[Firpart;Sndpart] and sndpart?
Is there a way to tell c# to split by second semicolon?
Thanks for your help
One way is to hide characters inside bracket with a character that is not used in any of your strings.
Method HideSplit: This method will change separator characters inside brackets with fake ones. Then it will perform split and will give back the result with original characters.
This method maybe an overkill if you want to do this many times. but you should be able to optimize it easily if you got the idea.
private static void Main()
{
char[] semicolon = new[] { ';' };
char[] bracket = new[] { '[', ']' };
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] splitbyBracket = HideSplit(str, bracket);
}
private static string[] HideSplit(string str,char[] separator)
{
int counter = 0; // When counter is more than 0 it means we are inside brackets
StringBuilder result = new StringBuilder(); // To build up string as result
foreach (char ch in str)
{
if(ch == ']') counter--;
if (counter > 0) // if we are inside brackets perform hide
{
if (ch == '[') result.Append('\uFFF0'); // add '\uFFF0' instead of '['
else if (ch == ']') result.Append('\uFFF1');
else if (ch == ';') result.Append('\uFFF2');
else result.Append(ch);
}
else result.Append(ch);
if (ch == '[') counter++;
}
string[] split = result.ToString().Split(separator); // Perform split. (characters are hidden now)
return split.Select(x => x
.Replace('\uFFF0', '[')
.Replace('\uFFF1', ']')
.Replace('\uFFF2', ';')).ToArray(); // unhide characters and give back result.
// dont forget: using System.Linq;
}
Some examples :
string[] a1 = HideSplit("AND[AND[Firstpart;Sndpart];sndpart]", bracket);
// Will give you this array { AND , AND[Firstpart;Sndpart];sndpart }
string[] a2 = HideSplit("AND[Firstpart;Sndpart];sndpart", semicolon);
// Will give you this array { AND[Firstpart;Sndpart] , sndpart }
string[] a3 = HideSplit("AND[Firstpart;Sndpart]", bracket);
// Will give you this array { AND , Firstpart;Sndpart }
string[] a4 = HideSplit("Firstpart;Sndpart", semicolon);
// Will give you this array { Firstpart , Sndpart }
And you can continue splitting this way.
Is there a way to tell c# to split by second semicolon?
There is no direct way to do that, but if that is precisely what you want, it's not hard to achieve:
string str="AND[AND[Firstpart;Sndpart];sndpart];
string[] tSplits = str.Split(';', 3);
string[] splits = { tSplits[0] + ";" + tSplits[1], tSplits[2] };
You could achieve the same result using a combination of IndexOf() and Substring(), however that is most likely not what you'll end up using as it's too specific and not very helpful for various inputs.
For your case, you need something that understands context.
In real-world complex cases you'd probably use a lexer / parser, but that seems like an overkill here.
Your best effort would probably be to use a loop, walk through all characters while counting +/- square brackets and spliting when you find a semicolon & the count is 1.
You can use Regex.Split, which is a more flexible form of String.Split:
string str = "AND[AND[Firstpart;Sndpart];sndpart]";
string[] arr = Regex.Split(str, #"(.*?;.*?;)");
foreach (var s in arr)
Console.WriteLine("'{0}'", s);
// output: ''
// 'AND[AND[Firstpart;Sndpart];'
// 'sndpart]'
Regex.Split splits not by chars, but by a string matching a regex expression, so it comes down to constructing a regex pattern meeting particular requirements. Splitting by a second semicolon is in practice splitting by a string that ends in a semicolon and that contains another semicolon before, so the matching pattern by which you split the input string could be for example: (.*?;.*?;).
The returned array has three elements instead of two because the splitting regex matches the beginning of the input string, in this case the empty string is returned as the first element.
You can read more on Regex.Split on msdn.

String splitting with a special structure

I have strings of the following form:
str = "[int]:[int],[int]:[int],[int]:[int],[int]:[int], ..." (for undefined number of times).
What I did was this:
string[] str_split = str.Split(',');
for( int i = 0; i < str_split.Length; i++ )
{
string[] str_split2 = str_split[i].Split(':');
}
Unfortunately this breaks when some of the numbers have extra ',' inside a number. For example, we have something like this:
695,000:14,306,000:12,136000:12,363000:6
in which the followings are the numbers, ordered from the left to the right:
695,000
14
306,000
12
136000
12
363000
6
How can I resolve this string splitting problem?
If it is the case that only the number to the left of the colon separator can contain commas, then you could simply express this as:
string s = "695,000:14,306,000:12,136000:12,363000:6";
var parts = Regex.Split(s, #":|(?<=:\d+),");
The regex pattern, which identifies the separators, reads: "any colon, or any comma that follows a colon and a sequence of digits (but not another comma)".
A simple solution is split using : as delimiter. The resultant array will have numbers of the format [int],[int]. Parse through the array and split each entry using , as the delimiter. This will give you an array of [int] numbers.
It might not be the best way to do it and it might not work all the time but here's what I'd do.
string[] leftRightDoubles = str.Split(':');
foreach(string substring in leftRightDoubles){
string[] indivNumbers = str.Split(',');
//if indivNumbers.Length == 2, you know that these two are separate numbers
//if indivNumbers.Length > 2, use heuristics to determine which parts belong to which number
if(indivNumbers.Length > 2) {
for(int i = 0, i < indivNumbers.Length, i++) {
if(indivNumbers[i] != '000') { //Or use some other heuristic
//It's a new number
} else {
//It's the rest of previous number
}
}
}
}
//It's sort of pseudocode with comments (haven't touched C# in a while so I don't want to write full C# code)

split strings into many strings by newline?

i have incoming data that needs to be split into multiple values...ie.
2345\n564532\n345634\n234 234543\n1324 2435\n
The length is inconsistent when i receive it, the spacing is inconsistent when it is present, and i want to analyze the last 3 digits before each \n. how do i break off the string and turn it into a new string? like i said, this round, it may have 3 \n commands, next time, it may have 10, how do i create 3 new strings, analyze them, then destroy them before the next 10 come in?
string[] result = x.Split('\r');
result = x.Split(splitAtReturn, StringSplitOptions.None);
string stringToAnalyze = null;
foreach (string s in result)
{
if (s != "\r")
{
stringToAnalyze += s;
}
else
{
how do i analyze the characters here?
}
}
You could use the string.Split method. In particular I suggest to use the overload that use a string array of possible separators. This because splitting on the newline character poses an unique problem. In you example all the newline chars are simply a '\n', but for some OS the newline char is '\r\n' and if you can't rule out the possibility to have the twos in the same file then
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
string[] result = test.Split(new string[] {"\n", "\r\n"}, StringSplitOptions.RemoveEmptyEntries);
Instead if your are certain that the file contains only the newline separator allowed by your OS then you could use
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
string[] result = test.Split(new string[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
The StringSplitOptions.RemoveEmptyEntries allows to capture a pair of consecutive newline or an ending newline as an empty string.
Now you can work on the array examining the last 3 digits of every string
foreach(string s in result)
{
// Check to have at least 3 chars, no less
// otherwise an exception will occur
int maxLen = Math.Min(s.Length, 3);
string lastThree = s.Substring(s.Length - maxLen, maxLen);
... work on last 3 digits
}
Instead, if you want to work only using the index of the newline character without splitting the original string, you could use string.IndexOf in this way
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
int pos = -1;
while((pos = test.IndexOf('\n', pos + 1)) != -1)
{
if(pos < test.Length)
{
string last3part = test.Substring(pos - 3, 3);
Console.WriteLine(last3part);
}
}
string lines = "2345\n564532\n345634\n234 234543\n1324 2435\n";
var last3Digits = lines.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Select(line => line.Substring(line.Length - 3))
.ToList();
foreach(var my3digitnum in last3Chars)
{
}
last3Digits : [345, 532, 634, 543, 435]
This has been answered before, check this thread:
Easiest way to split a string on newlines in .NET?
An alternative way is using StringReader:
using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line = reader.ReadLine();
}
Your answer is: theStringYouGot.Split('\n'); where you get an array of strings to do your processing for.

How to find the number of occurrences of a letter in only the first sentence of a string?

I want to find number of letter "a" in only first sentence. The code below finds "a" in all sentences, but I want in only first sentence.
static void Main(string[] args)
{
string text; int k = 0;
text = "bla bla bla. something second. maybe last sentence.";
foreach (char a in text)
{
char b = 'a';
if (b == a)
{
k += 1;
}
}
Console.WriteLine("number of a in first sentence is " + k);
Console.ReadKey();
}
This will split the string into an array seperated by '.', then counts the number of 'a' char's in the first element of the array (the first sentence).
var count = Text.Split(new[] { '.', '!', '?', })[0].Count(c => c == 'a');
This example assumes a sentence is separated by a ., ? or !. If you have a decimal number in your string (e.g. 123.456), that will count as a sentence break. Breaking up a string into accurate sentences is a fairly complex exercise.
This is perhaps more verbose than what you were looking for, but hopefully it'll breed understanding as you read through it.
public static void Main()
{
//Make an array of the possible sentence enders. Doing this pattern lets us easily update
// the code later if it becomes necessary, or allows us easily to move this to an input
// parameter
string[] SentenceEnders = new string[] {"$", #"\.", #"\?", #"\!" /* Add Any Others */};
string WhatToFind = "a"; //What are we looking for? Regular Expressions Will Work Too!!!
string SentenceToCheck = "This, but not to exclude any others, is a sample."; //First example
string MultipleSentencesToCheck = #"
Is this a sentence
that breaks up
among multiple lines?
Yes!
It also has
more than one
sentence.
"; //Second Example
//This will split the input on all the enders put together(by way of joining them in [] inside a regular
// expression.
string[] SplitSentences = Regex.Split(SentenceToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase);
//SplitSentences is an array, with sentences on each index. The first index is the first sentence
string FirstSentence = SplitSentences[0];
//Now, split that single sentence on our matching pattern for what we should be counting
string[] SubSplitSentence = Regex.Split(FirstSentence, WhatToFind, RegexOptions.IgnoreCase);
//Now that it's split, it's split a number of times that matches how many matches we found, plus one
// (The "Left over" is the +1
int HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the first sentence, {0} '{1}'.", HowMany, WhatToFind));
//Do all this again for the second example. Note that ideally, this would be in a separate function
// and you wouldn't be writing code twice, but I wanted you to see it without all the comments so you can
// compare and contrast
SplitSentences = Regex.Split(MultipleSentencesToCheck, "[" + String.Join("", SentenceEnders) + "]", RegexOptions.IgnoreCase | RegexOptions.Singleline);
SubSplitSentence = Regex.Split(SplitSentences[0], WhatToFind, RegexOptions.IgnoreCase | RegexOptions.Singleline);
HowMany = SubSplitSentence.Length - 1;
System.Console.WriteLine(string.Format("We found, in the second sentence, {0} '{1}'.", HowMany, WhatToFind));
}
Here is the output:
We found, in the first sentence, 3 'a'.
We found, in the second sentence, 4 'a'.
You didn't define "sentence", but if we assume it's always terminated by a period (.), just add this inside the loop:
if (a == '.') {
break;
}
Expand from this to support other sentence delimiters.
Simply "break" the foreach(...) loop when you encounter a "." (period)
Well, assuming you define a sentence as being ended with a '.''
Use String.IndexOf() to find the position of the first '.'. After that, searchin a SubString instead of the entire string.
find the place of the '.' in the text ( you can use split )
count the 'a' in the text from the place 0 to instance of the '.'
string SentenceToCheck = "Hi, I can wonder this situation where I can do best";
//Here I am giving several way to find this
//Using Regular Experession
int HowMany = Regex.Split(SentenceToCheck, "a", RegexOptions.IgnoreCase).Length - 1;
int i = Regex.Matches(SentenceToCheck, "a").Count;
// Simple way
int Count = SentenceToCheck.Length - SentenceToCheck.Replace("a", "").Length;
//Linq
var _lamdaCount = SentenceToCheck.ToCharArray().Where(t => t.ToString() != string.Empty)
.Select(t => t.ToString().ToUpper().Equals("A")).Count();
var _linqAIEnumareable = from _char in SentenceToCheck.ToCharArray()
where !String.IsNullOrEmpty(_char.ToString())
&& _char.ToString().ToUpper().Equals("A")
select _char;
int a =linqAIEnumareable.Count;
var _linqCount = from g in SentenceToCheck.ToCharArray()
where g.ToString().Equals("a")
select g;
int a = _linqCount.Count();

Categories