I have strings of the following form:
str = "[int]:[int],[int]:[int],[int]:[int],[int]:[int], ..." (for undefined number of times).
What I did was this:
string[] str_split = str.Split(',');
for( int i = 0; i < str_split.Length; i++ )
{
string[] str_split2 = str_split[i].Split(':');
}
Unfortunately this breaks when some of the numbers have extra ',' inside a number. For example, we have something like this:
695,000:14,306,000:12,136000:12,363000:6
in which the numbers, from left to right, are:
695,000
14
306,000
12
136000
12
363000
6
How can I resolve this string splitting problem?
If it is the case that only the number to the left of the colon separator can contain commas, then you could simply express this as:
string s = "695,000:14,306,000:12,136000:12,363000:6";
var parts = Regex.Split(s, @":|(?<=:\d+),");
The regex pattern, which identifies the separators, reads: "any colon, or any comma that follows a colon and a sequence of digits (but not another comma)".
A simple solution is to split using : as the delimiter. Each entry of the resulting array (apart from the first and the last) then has the form [int],[int]. Go through the array and split each entry at its first comma only, since a later comma may belong to the number itself. This gives you the individual [int] values.
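The split-on-colon idea above could be sketched roughly as follows. This is only a sketch, and it assumes that only the number to the left of each ':' can contain commas. The class and method names (PairSplitter, SplitNumbers) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class PairSplitter
{
    // Assumption: only the number to the LEFT of each ':' may contain commas.
    public static List<string> SplitNumbers(string input)
    {
        var numbers = new List<string>();
        string[] chunks = input.Split(':');
        for (int i = 0; i < chunks.Length; i++)
        {
            bool isFirst = i == 0;
            bool isLast = i == chunks.Length - 1;
            if (isFirst || isLast)
            {
                // The first chunk is a whole left-hand number,
                // the last chunk is a whole right-hand number.
                numbers.Add(chunks[i]);
                continue;
            }
            // Middle chunks look like "right,left": cut at the first comma only.
            int comma = chunks[i].IndexOf(',');
            if (comma < 0)
            {
                numbers.Add(chunks[i]);
                continue;
            }
            numbers.Add(chunks[i].Substring(0, comma));
            numbers.Add(chunks[i].Substring(comma + 1));
        }
        return numbers;
    }
}
```

Calling PairSplitter.SplitNumbers on the sample string from the question should give the eight numbers listed there, with the thousands separators still in place.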
It might not be the best way to do it and it might not work all the time but here's what I'd do.
string[] leftRightDoubles = str.Split(':');
foreach (string substring in leftRightDoubles) {
    string[] indivNumbers = substring.Split(',');
    // If indivNumbers.Length == 2, you know that these two are separate numbers.
    // If indivNumbers.Length > 2, use heuristics to determine which parts belong to which number.
    if (indivNumbers.Length > 2) {
        for (int i = 0; i < indivNumbers.Length; i++) {
            if (indivNumbers[i] != "000") { // Or use some other heuristic
                // It's a new number
            } else {
                // It's the rest of the previous number
            }
        }
    }
}
// This is a sketch with comments rather than full C# code (I haven't touched C# in a while).
I want to find out the values of the two numbers in the string. The difficulty is that the two numbers are variables and I don't know how long they are, since they grow over time. The program writes the variables to a text file. I then read the file back into a string. After that I need the two numbers as ints so that I can restore the two variables and keep counting with them.
Example code:
int num1 = 0;
int num2 = 0;
string text = "";
// Imagine: for each press of B, num1++; for each press of N, num2++
File.WriteAllText("C:\\ExampleFile", $"Num1 = {num1}, Num2 = {num2}!");
// The program is restarted between these two calls
text = File.ReadAllText("C:\\ExampleFile");
// Now I want to get the values of num1 and num2
// text.Substring(6, ...) // something like this, I'm not sure
num1 = ?
num2 = ?
Hope you will understand :)
Instead of using SubString, you could use Regular Expressions for the purpose. For example,
var regex = new Regex(@"=\s*(?<Number1>\d*),\s*Num2\s*=\s*(?<Number2>\d*)");
var matches = regex.Match(text);
if(matches.Success)
{
var num1 = Int32.Parse(matches.Groups["Number1"].Value);
var num2 = Int32.Parse(matches.Groups["Number2"].Value);
}
The regex defines two named groups (written as "(?<Name>expression)"). The first group captures a number preceded by an '=' and optional whitespace. It is followed by a ',' and optional whitespace. Then comes the second group, which is preceded by the text "Num2" and an '=', and has the same definition as the first group.
You could convert them to strings using ToString() and then read String.Length. While a string is not technically a char[] (even if it is probably wrapping one), it can be treated like one. However, that might give you the wrong results: the string representation used for printing or parsing is determined in large part by the user's culture as set in Windows.
And aside from differing thousands and decimal separators, cultures even differ in how they group digits, as with the Indian lakh and crore.
If the question is about the number of digits in base 10, this would not work. However, I remember a recent case where someone worked out how many repeating digits the decimal part of a division had. I think this code would be a proper adaptation to your case:
int current = Input;
int Digits = 1;
while(current >= 10){
current = current / 10;
Digits++;
}
You could just split the string on your delimiter characters and use it as an array.
var input = "Num1 = 123, Num2 = 456!";
//Results in the array { "Num1","123","Num2","456" }
var tokens = input.Split(new char[] { '=', ',', ' ', '!'}, StringSplitOptions.RemoveEmptyEntries);
var num1 = int.Parse(tokens[1]);
var num2 = int.Parse(tokens[3]);
I am receiving a string with numbers, nulls, and delimiters that are the same as characters in the numbers. Also there are quotes around numbers that contain a comma(s). With C#, I want to parse out the string, such that I have a nice, pipe delimited series of numbers, no commas, 2 decimal places.
I tried the standard replace, removing certain string patterns to clean it up but I can't hit every case. I've removed the quotes first, but then I get extra numbers as the thousands separator turns into a delimiter. I attempted to use Regex.Replace with wildcards but can't get anything out of it due to the multiple numbers with quotes and commas inside the quotes.
edit for Silvermind: temp = Regex.Replace(temp, "(?:\",.*\")","($1 = .\n)");
I don't have control over the file I receive. I can get most of the data cleaned up. It's when the string looks like the following, that there is a problem:
703.36,751.36,"1,788.36",887.37,891.37,"1,850.37",843.37,"1,549,797.36",818.36,749.36,705.36,0.00,"18,979.70",934.37
Should I look for the quote character, find the next quote character, remove commas from everything between those 2 chars, and move on? This is where I'm headed but there has to be something more elegant out there (yes - I don't program in C# that often - I'm a DBA).
I would like to see the thousands separator removed, and no quotes.
This regex pattern will match all of the individual numbers in your string:
(".*?")|(\d+(\.\d+)?)
(".*?") matches quoted values like "1,788.36"
(\d+(\.\d+)?) matches unquoted values like 123.45 or 123
From there, you can do a simple search and replace on each match to get a "clean" number.
Full code:
var s = "703.36,751.36,\"1,788.36\",887.37,891.37,\"1,850.37\",843.37,\"1,549,797.36\",818.36,749.36,705.36,0.00,\"18,979.70\",934.37";
Regex r = new Regex("(\".*?\")|(\\d+(\\.\\d+)?)");
List<double> results = new List<double>();
foreach (Match m in r.Matches(s))
{
string cleanNumber = m.Value.Replace("\"", "");
results.Add(double.Parse(cleanNumber));
}
Console.WriteLine(string.Join(", ", results));
Output:
703.36, 751.36, 1788.36, 887.37, 891.37, 1850.37, 843.37, 1549797.36, 818.36, 749.36, 705.36, 0, 18979.7, 934.37
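Since the question asks for pipe-delimited output with two decimal places, the same matching idea could be extended along these lines. This is a sketch only: PipeFormatter and ToPipeDelimited are names I made up, it parses with the invariant culture (so ',' is the thousands separator and '.' the decimal point), and it silently skips empty fields because the question doesn't show what the nulls look like.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class PipeFormatter
{
    public static string ToPipeDelimited(string input)
    {
        // Either a quoted field (which may contain commas) or a bare number.
        var pattern = new Regex("\"[^\"]*\"|\\d+(\\.\\d+)?");
        var formatted = pattern.Matches(input)
            .Cast<Match>()
            // Drop the surrounding quotes.
            .Select(m => m.Value.Replace("\"", ""))
            // NumberStyles.Number accepts thousands separators and a decimal point.
            .Select(v => decimal.Parse(v, NumberStyles.Number, CultureInfo.InvariantCulture))
            // "F2" gives exactly two decimal places and no group separators.
            .Select(d => d.ToString("F2", CultureInfo.InvariantCulture));
        return string.Join("|", formatted);
    }
}
```

Using decimal instead of double here is a design choice to keep values like 0.00 and 18,979.70 from losing their trailing zeros through floating-point formatting.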
This would be simpler to solve with a parser-style solution that keeps track of state. Regex is meant for regular text; as soon as you have context, it gets hard to solve with regex. Something like this would work.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

internal class Program
{
private static string testString = "703.36,751.36,\"1,788.36\",887.37,891.37,\"1,850.37\",843.37,\"1,549,797.36\",818.36,749.36,705.36,0.00,\"18,979.70\",934.37";
private static void Main(string[] args)
{
bool inQuote = false;
List<string> numbersStr = new List<string>();
StringBuilder SB = new StringBuilder();
for(int x = 0; x < testString.Length; x++)
{
if(testString[x] == '"')
{
inQuote = !inQuote;
continue;
}
if(testString[x] == ',' && !inQuote )
{
numbersStr.Add(SB.ToString());
SB.Clear();
continue;
}
if(char.IsDigit(testString[x]) || testString[x] == '.')
{
SB.Append(testString[x]);
}
}
if(SB.Length != 0)
{
numbersStr.Add(SB.ToString());
}
var nums = numbersStr.Select(x => double.Parse(x));
foreach(var num in nums)
{
Console.WriteLine(num);
}
Console.ReadLine();
}
}
I have incoming data that needs to be split into multiple values, i.e.
2345\n564532\n345634\n234 234543\n1324 2435\n
The length is inconsistent when I receive it, the spacing is inconsistent when it is present, and I want to analyze the last 3 digits before each \n. How do I break off the string and turn it into new strings? As I said, this round it may have 3 \n characters and next time it may have 10. How do I create 3 new strings, analyze them, then destroy them before the next 10 come in?
string[] result = x.Split('\r');
result = x.Split(splitAtReturn, StringSplitOptions.None);
string stringToAnalyze = null;
foreach (string s in result)
{
if (s != "\r")
{
stringToAnalyze += s;
}
else
{
// How do I analyze the characters here?
}
}
You could use the string.Split method. In particular, I suggest the overload that takes a string array of possible separators, because splitting on the newline character poses a particular problem. In your example all the newlines are simply '\n', but on some operating systems the newline is '\r\n'. If you can't rule out having both in the same file, then:
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
string[] result = test.Split(new string[] {"\n", "\r\n"}, StringSplitOptions.RemoveEmptyEntries);
If instead you are certain that the file contains only the newline separator used by your OS, then you could use
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
string[] result = test.Split(new string[] {Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries drops the empty strings that a pair of consecutive newlines or a trailing newline would otherwise produce.
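To see what the option does in practice, here is a small sketch comparing the two settings. SplitOptionsDemo and SplitLines are names chosen just for this illustration.

```csharp
using System;

public static class SplitOptionsDemo
{
    // Splits on either Windows-style or Unix-style newlines.
    public static string[] SplitLines(string text, StringSplitOptions options)
    {
        return text.Split(new[] { "\r\n", "\n" }, options);
    }
}
```

With the input "a\n\nb\n", StringSplitOptions.None keeps the empty entries produced by the doubled and the trailing newline (four elements), while StringSplitOptions.RemoveEmptyEntries leaves only "a" and "b".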
Now you can work on the array examining the last 3 digits of every string
foreach(string s in result)
{
// Check to have at least 3 chars, no less
// otherwise an exception will occur
int maxLen = Math.Min(s.Length, 3);
string lastThree = s.Substring(s.Length - maxLen, maxLen);
// ... work on the last 3 digits
}
Instead, if you want to work only using the index of the newline character without splitting the original string, you could use string.IndexOf in this way
string test = "2345\n564532\n345634\n234 234543\n1324 2435\n";
int pos = -1;
while((pos = test.IndexOf('\n', pos + 1)) != -1)
{
if(pos >= 3) // make sure there are 3 characters before the newline
{
string last3part = test.Substring(pos - 3, 3);
Console.WriteLine(last3part);
}
}
string lines = "2345\n564532\n345634\n234 234543\n1324 2435\n";
var last3Digits = lines.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
.Select(line => line.Substring(line.Length - 3))
.ToList();
foreach(var my3digitnum in last3Digits)
{
// work with my3digitnum here
}
last3Digits : [345, 532, 634, 543, 435]
This has been answered before, check this thread:
Easiest way to split a string on newlines in .NET?
An alternative way is using StringReader:
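using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
    string line;
    // ReadLine returns null once the end of the input is reached
    while ((line = reader.ReadLine()) != null) {
        // process line here
    }
}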
Your answer is: theStringYouGot.Split('\n'); where you get an array of strings to do your processing for.
I need to get all numbers from a string like this:
"156234 something 567345 another thing 45789 anything"
The result should be a collection of numbers having:
156234, 567345, 45789
I tried @"\d+", but it will only give me 156234.
EDIT: The numbers are integers, however they can also occur like this: "156234 something 567345 another thing 45789 anything2345". In this case I only need the integers 156234, 567345, 45789 and not 156234, 567345, 45789, 2345.
Also, the integers which I don't want will always be preceded by text, e.g. anything2345.
Everything is OK with your regex, you just need to iterate over all the matches.
Regex regex = new Regex(@"\d+");
foreach (Match match in regex.Matches("156234 something 567345 another thing 45789 anything"))
{
Console.WriteLine(match.Value);
}
You want to split on the non-digit characters, not the digits. Use the capital D:
string[] myStrings = Regex.Split(sentence, @"\D+");
If 2345 should not be matched in your revised sample string (156234 something 567345 another thing 45789 anything2345), you could use Dima's solution but with the regex:
\b\d+\b
This assures that the number is surrounded by word boundaries.
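Put together with the loop over all matches, the word-boundary pattern could be used roughly like this. It is a sketch, and NumberExtractor and StandaloneIntegers are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberExtractor
{
    // Returns only the digit runs that stand alone as whole words,
    // so "anything2345" contributes nothing.
    public static List<int> StandaloneIntegers(string text)
    {
        return Regex.Matches(text, @"\b\d+\b")
            .Cast<Match>()
            .Select(m => int.Parse(m.Value))
            .ToList();
    }
}
```

For the revised sample string this should return 156234, 567345 and 45789. Note that int.Parse will throw on digit runs too long for an int, so a long or a string result may fit better if the numbers can be large.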
This is in Java. You don't need to match the numbers one by one: replaceAll (which itself takes a regular expression) can replace every run of non-digits with a space, after which you can split and calculate the sum.
public static int getSumOfNumbers(String s) {
    int sum = 0;
    // Collapse every run of non-digits into a single space, then trim the ends
    String x = s.replaceAll("\\D+", " ").trim();
    if (!x.isEmpty()) {
        for (String part : x.split(" "))
            sum += Integer.parseInt(part);
    }
    System.out.println(x + " : and sum is : " + sum);
    return sum;
}
I have a string which consists of a number of ordered terms separated by newlines (\n), as shown in the following example (note that the string is an element of an array of strings):
term 1
term 2
.......
.......
term n
I want to take only a specific number of terms, let's say 1000, and discard the rest. I'm trying the following code:
string[] training = traindocs[tr].Trim().Split('\n');
List <string> trainterms = new List<string>();
for (int i = 0; i < 1000; i++)
{
if (i >= training.Length)
break;
trainterms.Add(training[i].Trim().Split('\t')[0]);
}
Can I perform this operation without using a List or any other data structure? I mean, just extract the specific number of terms into the array (training) directly? Thanks in advance.
How about LINQ? The .Take() extension method kind of seems to fit your bill:
List<string> trainterms = traindocs[tr].Trim().Split('\n').Take(1000).ToList();
According to MSDN you can use an overloaded version of the split method.
public string[] Split( char[] separator, int count,
StringSplitOptions options )
Parameters
separator Type: System.Char[] An array of Unicode characters that
delimit the substrings in this string, an empty array that contains no
delimiters, or null.
count Type: System.Int32 The maximum number of
substrings to return.
options Type: System.StringSplitOptions
StringSplitOptions.RemoveEmptyEntries to omit empty array elements
from the array returned; or StringSplitOptions.None to include empty
array elements in the array returned.
Return Value
Type: System.String[] An array whose elements contain the substrings
in this string that are delimited by one or more characters in
separator. For more information, see the Remarks section.
So something like so:
String str = "A,B,C,D,E,F,G,H,I";
String[] str2 = str.Split(new Char[]{','}, 5, StringSplitOptions.RemoveEmptyEntries);
System.Console.WriteLine(str2.Length);
System.Console.Read();
Would print: 5
EDIT:
Upon further investigation it seems that the count parameter just instructs when the splitting stops. The rest of the string will be kept in the last element.
So, the code above, would yield the following result:[0] = A, [1] = B, [2] = C, [3] = D, [4] = E,F,G,H,I, which is not something you seem to be after.
To fix this, you would need to do something like so:
String str = "A\nB\nC\nD\nE\nF\nG\nH\nI";
List<String> myList = str.Split(new Char[]{'\n'}, 5, StringSplitOptions.RemoveEmptyEntries).ToList<String>();
myList[myList.Count - 1] = myList[myList.Count - 1].Split(new Char[] { '\n' })[0];
System.Console.WriteLine(myList.Count);
foreach (String str1 in myList)
{
System.Console.WriteLine(str1);
}
System.Console.Read();
The code above will only retain the first 5 (in your case, 1000) elements. Thus, I think that Darin's solution might be cleaner, if you will.
If you want the most efficient (fastest) way, use the overload of String.Split that takes the total number of items required.
If you want the easy way, use LINQ.
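For the LINQ route, a sketch that goes straight to an array (no List) and also keeps the "text before the first tab" step from the question might look like this. TermExtractor and FirstTerms are hypothetical names, and this favours readability over speed since Split still processes the whole string.

```csharp
using System;
using System.Linq;

public static class TermExtractor
{
    // Returns at most maxTerms terms, one per line,
    // keeping only the part of each line before the first tab.
    public static string[] FirstTerms(string document, int maxTerms)
    {
        return document.Trim()
            .Split('\n')
            .Take(maxTerms)
            .Select(line => line.Trim().Split('\t')[0])
            .ToArray();
    }
}
```

In the question's terms, the call would be something like TermExtractor.FirstTerms(traindocs[tr], 1000). Take simply returns fewer items when the document has fewer lines, so no explicit length check is needed.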