I found an error in my code where Substring does not work. It says "startIndex cannot be larger than the length of string".
static int MyIntegerParse(string possibleInt)
{
    int i;
    return int.TryParse(possibleInt.Substring(2), out i) ? i : 0;
}
I call the method here:
var parsed = File.ReadLines(filename)
    .Select(line => line.Split(' ')
        .Select(MyIntegerParse)
        .ToArray())
    .ToArray();
But I don't understand why it's an error, because I have used Substring before and it worked. Can I ask for help here? Thanks.
Sample string:
10192 20351 30473 40499 50449 60234
10192 20207 30206 40203 50205 60226
10192 20252 30312 40376 50334 60252
Substring will fail when possibleInt contains fewer than two characters, so you should add that test to your code as well. I suspect that your Split call produces an empty string in some circumstances. This empty string is passed into your int parser, which then fails on the Substring call. So, you should probably do two things:
Get rid of empty strings in the splitting
Handle short or empty strings deliberately in your parsing code
Getting rid of empty strings is quite easy:
var parsed = File.ReadLines(filename)
    .Select(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(MyIntegerParse)
        .ToArray())
    .ToArray();
Adding deliberate handling of empty strings can be done like so:
static int MyIntegerParse(string possibleInt)
{
    if (string.IsNullOrEmpty(possibleInt) || possibleInt.Length < 2)
    {
        return 0;
    }
    int i;
    return int.TryParse(possibleInt.Substring(2), out i) ? i : 0;
}
...or if you are a fan of compact and hard-to-read constructs:
static int MyIntegerParse(string possibleInt)
{
    int i;
    return (!string.IsNullOrEmpty(possibleInt)
            && possibleInt.Length >= 2
            && int.TryParse(possibleInt.Substring(2), out i)) ? i : 0;
}
Note that I have chosen to return 0 when I get strings that are too short. In your case it might make more sense to return some other value, throw an exception or use a Debug.Assert statement.
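Here is a minimal end-to-end sketch of both fixes together. It assumes the lines are already in memory: the array stands in for File.ReadLines(filename), and the doubled space in the second line is a hypothetical example of how Split(' ') could produce the empty entries suspected above.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for File.ReadLines(filename).
// The second line contains a doubled space, which would make Split(' ')
// produce an empty entry.
string[] lines =
{
    "10192 20351 30473 40499 50449 60234",
    "10192  20207 30206 40203 50205 60226",
};

// Guarded parser: strings shorter than two characters yield 0 instead of
// letting Substring(2) throw ArgumentOutOfRangeException.
static int MyIntegerParse(string possibleInt)
{
    if (string.IsNullOrEmpty(possibleInt) || possibleInt.Length < 2)
    {
        return 0;
    }
    return int.TryParse(possibleInt.Substring(2), out int i) ? i : 0;
}

// RemoveEmptyEntries keeps the empty string from ever reaching the parser.
int[][] parsed = lines
    .Select(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Select(MyIntegerParse)
        .ToArray())
    .ToArray();

foreach (int[] row in parsed)
{
    Console.WriteLine(string.Join(",", row));
}
```

Without RemoveEmptyEntries the second row would contain seven values, one of them the 0 that the guard returns for the empty string.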
The possibleInt string needs to be at least two characters long. When it isn't then you'll see the error that you've described.
Add this before your return statement and see if that helps you figure out what's going on:
Debug.Assert(!string.IsNullOrEmpty(possibleInt) && possibleInt.Length >= 2);
In a Debug build this will stop with an assertion failure if either of the two conditions above is not met.
You could also use a Code Contract like this:
Contract.Assert(!string.IsNullOrEmpty(possibleInt) && possibleInt.Length >= 2);
You are getting this exception because you are trying to get the substring of a string starting at an index that is greater than the length of the string.
someString.Substring(x) gives you the substring of someString starting at position x, which is zero-based. You are getting this exception because in this case 2 is greater than the length of that particular string.
Put a try/catch or a breakpoint around it and you will see that the string causing this exception has a length of less than 2.
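A small sketch of the boundary described above: a start index equal to the length is allowed and returns an empty string, while a start index past the end throws ArgumentOutOfRangeException.

```csharp
using System;

// startIndex may equal Length: the result is simply an empty string.
string atEnd = "ab".Substring(2);
Console.WriteLine(atEnd.Length);   // 0

// startIndex greater than Length throws ArgumentOutOfRangeException.
bool threw = false;
try
{
    "a".Substring(2);
}
catch (ArgumentOutOfRangeException ex)
{
    threw = true;
    Console.WriteLine(ex.Message);
}
```

This is why a two-character token such as "12" gets past Substring(2) (and then simply fails TryParse), while an empty or one-character token throws.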
The string you are attempting to parse is not that long. From the .NET documentation for the startIndex parameter of Substring:
The zero-based starting character position of a substring in this instance.
The string you are passing in either has 0 or 1 characters in it. You need to modify your code to handle such a situation.
EDIT: Additionally, you should be removing empty elements from your file using an overload of split:
.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
I am writing a program that adds the numbers in a string, where the numbers are separated by delimiters:
private static readonly char[] Separators = { ',', '\n', '/', '#' };
public int Add(string numbers)
{
    if (numbers.Equals(string.Empty))
    {
        return 0;
    }
    return numbers.Split(Separators).Select(int.Parse).Sum();
}
When I pass the following string to the Add method: //#\n2#3
I get the error "Input string was not in a correct format."
I expect the answer to be 5.
By default, string.Split will create empty groups if two delimiters are right next to each other. For example "3,,4".Split(','); will produce an array with three elements ("3", empty string, and "4").
You can change this in one of two ways. The first (and probably simpler) is to have the Split ignore empty entries.
numbers.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
Or you can filter them out with Where in LINQ:
numbers.Split(Separators).Where(x => x.Length > 0)
This will prevent elements with a blank string value reaching int.Parse. Of course, there are still other things you should do to validate your input before attempting to parse, but that's another topic.
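A self-contained sketch of the Add method with the RemoveEmptyEntries fix applied. It is written as a local function over a local separators array so it runs as a single file; in the original class these stay a method and a static field.

```csharp
using System;
using System.Linq;

char[] separators = { ',', '\n', '/', '#' };

// Sums the integers in `numbers`, ignoring the empty pieces that appear
// when delimiters are adjacent, as in "//#\n".
int Add(string numbers)
{
    if (string.IsNullOrEmpty(numbers))
    {
        return 0;
    }
    return numbers
        .Split(separators, StringSplitOptions.RemoveEmptyEntries)
        .Select(s => int.Parse(s))
        .Sum();
}

Console.WriteLine(Add("//#\n2#3"));   // 5
```

For "//#\n2#3", the plain Split produces four empty strings followed by "2" and "3"; with RemoveEmptyEntries only "2" and "3" reach int.Parse.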
using System;
using System.IO;
namespace Test_Arrays_and_Files
{
    class Program
    {
        static void Main(string[] args)
        {
            string tFile = @"C:\Programming\GLO\DC\dcw.txt";
            string read = File.ReadAllText(tFile);
            string[] test = read.Split(',');
            int[] ints = Array.ConvertAll(test, int.Parse);
            Console.WriteLine(ints[0]);
        }
    }
}
Input data:
Text file contents (one value per line, each followed by a comma):
35,
35,
40,
40,
40,
I'm getting System.FormatException: Input string was not in a correct format.
Please help, and sorry for the bad post; I'm new here.
There are two issues in your code:
Your string ends with a comma, which creates an empty entry after the split. This is the reason you are getting the error.
Your delimiter could be $",{Environment.NewLine}", not only ','.
So to convert the given string to an int array, first Trim() the commas off the ends of the input string and then split by $",{Environment.NewLine}".
Like this:
using System.Linq;
...
var result = str.Trim(',')  // Remove leading and trailing comma(s); TrimEnd(',') works as well
    .Split($",{Environment.NewLine}", StringSplitOptions.RemoveEmptyEntries)  // Split by the given delimiter (this string overload needs .NET Core 2.0 or later)
    .Select(int.Parse)
    .ToArray();  // Convert string[] to int[]
You're only getting it because the file has a comma at the end, which means Split ends up churning out an empty string in the very last position. int.Parse will choke on the empty string.
There are plenty of ways you could solve it. One is to tell Split not to return empty entries, by changing the split line of your code to:
string[] test = read.Split(new[]{','}, StringSplitOptions.RemoveEmptyEntries);
You could instead trim the comma off the end, but using the above approach means your parsing would also survive a blank line in the middle of the file, so overall it's more robust.
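A runnable sketch of this fix against the sample data. It assumes the file content is already in a string (standing in for File.ReadAllText(tFile)) and that the file ends with a newline. That trailing newline leaves a whitespace-only last piece which RemoveEmptyEntries alone would keep, so the sketch also passes StringSplitOptions.TrimEntries, which requires .NET 5 or later.

```csharp
using System;

// Stand-in for File.ReadAllText(tFile): one value per line, each followed
// by a comma, and (as an assumption) a trailing newline at the end.
string read = "35,\r\n35,\r\n40,\r\n40,\r\n40,\r\n";

// TrimEntries (.NET 5+) trims the "\r\n" around each piece; combined with
// RemoveEmptyEntries, pieces that end up empty are then dropped.
string[] test = read.Split(new[] { ',' },
    StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

int[] ints = Array.ConvertAll(test, s => int.Parse(s));
Console.WriteLine(string.Join(" ", ints));   // 35 35 40 40 40
```

If the file has no trailing newline, the RemoveEmptyEntries line from the answer is enough on its own, because int.Parse tolerates the leading "\r\n" on each piece.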
Generally, when parsing strings, it's more robust to use TryParse than Parse. TryParse takes the variable to store the result in and returns a Boolean telling you whether the parsing succeeded:
int[] ints = Array.ConvertAll(test, GetIntOrMinusOne);

// put a helper method next to Main (static, because Main is static)
private static int GetIntOrMinusOne(string s)
{
    if (int.TryParse(s, out var t))
        return t;
    return -1;
}
For this we need to get a bit more involved with the ConvertAll call. Instead of telling ConvertAll to call a "method that converts a string to an int" like int.Parse, we write our own mini method that tries to parse and, if that fails, returns something like -1. Then we nominate that as the method ConvertAll calls to do the conversion, instead of int.Parse.
It's important to note that this introduces -1 into the resulting int[] array wherever there was bad data in the string array. In your later processing you would then need some check to avoid them (such as skipping them).
You can shorten that code above by turning the method into a lambda:
int[] ints = Array.ConvertAll(test, s => int.TryParse(s, out var t) ? t : -1);
The part s => int.TryParse(s, out var t) ? t : -1 is a lambda: "just the important parts of a method".
But I'm not sure that you'll have come across lambdas yet, judging by the style of the rest of the code.
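A short sketch of the lambda version in action, including the "skip them later" step. The input array is hypothetical (one bad entry and one empty entry), and the sketch assumes -1 never occurs as a genuine value, since it is used as the "bad data" marker.

```csharp
using System;
using System.Linq;

// Hypothetical input with one unparsable entry and one empty entry.
string[] test = { "35", "4x", "40", "" };

// Unparsable entries become the sentinel -1 instead of throwing.
int[] ints = Array.ConvertAll(test, s => int.TryParse(s, out var t) ? t : -1);
Console.WriteLine(string.Join(" ", ints));    // 35 -1 40 -1

// Later processing skips the sentinel values.
int[] valid = ints.Where(n => n != -1).ToArray();
Console.WriteLine(string.Join(" ", valid));   // 35 40
```

If -1 can be a legitimate value in your data, a nullable result (int?) is a safer marker than a sentinel.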
I am trying to parse a string into an array and am looking for a very concise approach.
string line = "[1, 2, 3]";
string[] input = line.Substring(1, line.Length - 2).Split();
int[] num = input.Skip(2)
                 .Select(y => int.Parse(y))
                 .ToArray();
I tried removing Skip(2), but then I cannot get the array because of a non-int string. My question is: what is the execution order of those LINQ functions? How many times is Skip called here?
Thanks in advance.
The order is the order that you specify. So input.Skip(2) skips the first two strings in the array, and only the last one remains, which is "3". That can be parsed to an int. If you remove the Skip(2), you are trying to parse all of them. That doesn't work because the commas are still there: you have split by whitespace but not removed the commas.
You could use line.Trim('[', ']').Split(','); and int.TryParse:
string line = "[1, 2, 3]";
string[] input = line.Trim('[', ']').Split(',');
int i = 0;
int[] num = input.Where(s => int.TryParse(s, out i)) // you could use s.Trim but the spaces don't hurt
                 .Select(s => i)
                 .ToArray();
Just to clarify, I have used int.TryParse only to make sure that you don't get an exception if the input contains invalid data. It doesn't fix anything; it would also work with int.Parse.
Update: as Eric Lippert pointed out in the comments, using int.TryParse in a LINQ query this way can be harmful, because the query depends on the side effect of the shared variable i. So it's better to use a helper method that encapsulates int.TryParse and returns a Nullable<int>, such as this extension:
// Extension methods must be declared in a non-nested, non-generic static class.
public static int? TryGetInt32(this string item)
{
    int i;
    bool success = int.TryParse(item, out i);
    return success ? (int?)i : (int?)null;
}
Now you can use it in a LINQ query in this way:
string line = "[1, 2, 3]";
string[] input = line.Trim('[', ']').Split(',');
int[] num = input.Select(s => s.TryGetInt32())
                 .Where(n => n.HasValue)
                 .Select(n => n.Value)
                 .ToArray();
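A single-file sketch of the nullable-helper approach. To keep it self-contained the helper is written as a local function rather than an extension method (extension methods need their own static class), so it is called as TryGetInt32(s) instead of s.TryGetInt32().

```csharp
using System;
using System.Linq;

// Same logic as the TryGetInt32 extension, as a local function.
static int? TryGetInt32(string item) =>
    int.TryParse(item, out int i) ? i : (int?)null;

string line = "[1, 2, 3]";
string[] input = line.Trim('[', ']').Split(',');

// Each element is parsed on its own; no variable is shared between the
// lambdas, so the result no longer depends on evaluation order.
int[] num = input.Select(s => TryGetInt32(s))
                 .Where(n => n.HasValue)
                 .Select(n => n.Value)
                 .ToArray();

Console.WriteLine(string.Join(",", num));   // 1,2,3
```

The leading spaces in " 2" and " 3" are accepted because int.TryParse allows leading and trailing whitespace by default.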
The reason it does not work unless you skip the first two entries is that those entries have commas after the ints. Your input looks like this:
"1," "2," "3"
Only the last entry can be parsed as an int; the initial two will produce an exception.
Passing comma and space as separators to Split will fix the problem:
string[] input = line
    .Substring(1, line.Length - 2)
    .Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);
Note the use of StringSplitOptions.RemoveEmptyEntries to remove empty strings caused by both comma and space being used between entries.
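Putting that split together with the rest of the original query gives a runnable sketch in which the Skip(2) is no longer needed:

```csharp
using System;
using System.Linq;

string line = "[1, 2, 3]";

// Strip the brackets, then split on both ',' and ' '. RemoveEmptyEntries
// discards the empty strings that appear between each ',' and ' '.
string[] input = line
    .Substring(1, line.Length - 2)
    .Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries);

// Every entry is now a bare integer, so all of them can be parsed.
int[] num = input.Select(s => int.Parse(s)).ToArray();
Console.WriteLine(string.Join(",", num));   // 1,2,3
```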
I think it would be better to treat the input as JSON and deserialize it with Json.NET (Newtonsoft.Json):
var list = JsonConvert.DeserializeObject<List<int>>(line);
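If you would rather not add the Json.NET dependency, the same idea works with System.Text.Json, which ships with .NET Core 3.0 and later. This is a swapped-in library, not the one the answer above uses:

```csharp
using System;
using System.Text.Json;

string line = "[1, 2, 3]";

// "[1, 2, 3]" is a valid JSON array, so it can be deserialized directly.
// Deserialize returns null for the JSON literal null, hence the fallback.
int[] num = JsonSerializer.Deserialize<int[]>(line) ?? Array.Empty<int>();
Console.WriteLine(string.Join(",", num));   // 1,2,3
```

Either way, the JSON approach only fits if the input really is JSON; a trailing comma or other non-JSON syntax will make deserialization throw.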
You might try:
string line = "[1,2,3]";
IEnumerable<int> intValues = from i in line.Split(',')
                             select Convert.ToInt32(i.Trim('[', ' ', ']'));
This has been asked a few different ways, but I am debating "my way" vs. "your way" with another developer. The language is C#.
I want to parse a pipe delimited string where the first 2 characters of each chunk is my tag.
The rules (not my rules, but rules I have been given and must follow):
I can't change the format of the string.
This function will be called possibly many times so efficiency is key.
I need to keep it simple.
The input string and tag I am looking for may/will change during runtime.
Example input string: AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4
Example tag I may need value for: AB
My way: I split the string into an array on the delimiter and loop through the array each time the function is called. For each element I look at the first 2 characters and, on a match, return the value minus the first 2 characters.
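A sketch of the split-based approach just described, for comparison with the IndexOf version in the answers. FindBySplit is a hypothetical name, and the ordinal comparison is an assumption (it matches ASCII tags exactly and skips culture-sensitive rules).

```csharp
using System;

// Split on '|', compare each chunk's leading characters with the tag, and
// return the rest of the first matching chunk.
static string FindBySplit(string source, string tag)
{
    foreach (string chunk in source.Split('|'))
    {
        if (chunk.StartsWith(tag, StringComparison.Ordinal))
        {
            return chunk.Substring(tag.Length);
        }
    }
    return string.Empty;   // tag not present
}

string input = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";
Console.WriteLine(FindBySplit(input, "AB"));   // VALUE2
```

Each call allocates one string per chunk plus the array, which is the extra memory the answers below point out; the IndexOf/Substring version allocates only the returned value.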
The "other guy's" way is to take the string and use a combination of IndexOf and Substring to find the starting and ending points of the field I am looking for, and then use Substring again to pull out the value minus the first 2 characters. So he would call IndexOf("|AB") and then find the next pipe in the string; these are the start and end. Then he would Substring that out.
I would think that IndexOf and Substring scan the string char by char each time, so this would be less efficient than splitting it into large chunks and reading each chunk minus the first 2 characters. Or is there another way that is better than what both of us have proposed?
The other guy's approach is going to be more efficient in time, given that the input string needs to be re-evaluated each time. If the input string is long, it also won't require the extra memory that splitting the string would.
If I'm trying to code a really tight loop I prefer to directly use array/string operators rather than LINQ to avoid that additional overhead:
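static string inputString = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";

// Returns the value stored under `tag` (without the tag), or string.Empty.
static string FindString(string tag)
{
    int startIndex;
    // Ordinal comparison avoids culture-sensitive matching and is faster.
    if (inputString.StartsWith(tag, StringComparison.Ordinal))
    {
        startIndex = tag.Length;   // the first field has no leading '|'
    }
    else
    {
        startIndex = inputString.IndexOf("|" + tag, StringComparison.Ordinal);
        if (startIndex == -1)
            return string.Empty;
        startIndex += tag.Length + 1;   // skip the '|' and the tag
    }
    // The value runs to the next '|' or to the end of the string.
    int endIndex = inputString.IndexOf('|', startIndex);
    if (endIndex == -1)
        endIndex = inputString.Length;
    return inputString.Substring(startIndex, endIndex - startIndex);
}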
I've done a lot of parsing in C# and I would probably take the approach suggested by the "other guys" just because it would be a bit lighter on resources used and likely to be a little faster as well.
That said, as long as the data isn't too big, there's nothing wrong with the first approach and it will be much easier to program.
Something like this may work:
string myString = "AOVALUE1|ABVALUE2|ACVALUE3|ADVALUE4";
string selector = "AB";
// Substring(selector.Length) strips only the leading tag; Replace would also remove later occurrences inside the value.
var results = myString.Split('|').Where(x => x.StartsWith(selector)).Select(x => x.Substring(selector.Length));
This returns a sequence of the matches, in this case just one: "VALUE2".
If you are just looking for the first or only match, this will work:
string result = myString.Split('|').Where(x => x.StartsWith(selector)).Select(x => x.Substring(selector.Length)).FirstOrDefault();
Substring does not parse (search) the string; it just copies the requested characters.
IndexOf does search the string.
My preference would be the Split method, primarily for coding efficiency:
string[] inputArr = input.Split("|".ToCharArray()).Select(s => s.Substring(2)).ToArray();
is pretty concise. How many lines of code does the Substring/IndexOf method take?
This is a spin-off from the discussion in some other question.
Suppose I've got to parse a huge number of very long strings. Each string contains a sequence of doubles (in text representation, of course) separated by whitespace. I need to parse the doubles into a List<double>.
The standard parsing technique (using string.Split + double.TryParse) seems to be quite slow: for each of the numbers we need to allocate a string.
I tried to do it the old C-like way: compute the indices of the beginning and the end of the substrings containing the numbers, and parse each one "in place", without creating an additional string. (See http://ideone.com/Op6h0; the relevant part is shown below.)
int startIdx, endIdx = 0;
while (true)
{
    startIdx = endIdx;
    // no find_first_not_of in C#
    while (startIdx < s.Length && s[startIdx] == ' ') startIdx++;
    if (startIdx == s.Length) break;
    endIdx = s.IndexOf(' ', startIdx);
    if (endIdx == -1) endIdx = s.Length;
    // how to extract a double here?
}
There is an overload of string.IndexOf that searches only within a given substring, but I failed to find a method for parsing a double from a substring without actually extracting that substring first.
Does anyone have an idea?
There is no managed API to parse a double from a substring. My guess is that allocating the string will be insignificant compared to all the floating point operations in double.Parse.
Anyway, you can save the allocation by creating a "buffer" string of length 100, consisting of whitespace only, once. Then, for every number you want to parse, you copy its chars into this buffer string using unsafe code and fill the rest of the buffer with whitespace. For parsing you can then use a NumberStyles value that includes AllowTrailingWhite (for example NumberStyles.Float, which also allows a sign, a decimal point and an exponent), so the trailing whitespace is ignored.
Getting a pointer to string is actually a fully supported operation:
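// Requires compiling with unsafe code enabled (/unsafe or <AllowUnsafeBlocks>).
string l_pos = new string(' ', 100); // don't write to a shared string!
unsafe
{
    fixed (char* l_pSrc = l_pos)
    {
        // do some work: copy the number's chars into l_pSrc and pad with ' '
    }
}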
C# has special syntax to bind a string to a char*.
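On .NET Core 2.1 and later, which postdate this answer, there is a managed way to do this: double.TryParse has an overload that takes a ReadOnlySpan<char>, and string.AsSpan(start, length) slices without allocating. A sketch that completes the loop from the question on that basis, assuming space is the only separator as in the original loop; ParseDoubles is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Needs .NET Core 2.1+ for the ReadOnlySpan<char> overload of TryParse.
static List<double> ParseDoubles(string s)
{
    var result = new List<double>();
    int startIdx, endIdx = 0;
    while (true)
    {
        startIdx = endIdx;
        while (startIdx < s.Length && s[startIdx] == ' ') startIdx++;
        if (startIdx == s.Length) break;
        endIdx = s.IndexOf(' ', startIdx);
        if (endIdx == -1) endIdx = s.Length;

        // Parse the number in place: the span is a view, not a new string.
        ReadOnlySpan<char> token = s.AsSpan(startIdx, endIdx - startIdx);
        if (double.TryParse(token, NumberStyles.Float, CultureInfo.InvariantCulture, out double value))
        {
            result.Add(value);
        }
    }
    return result;
}

List<double> values = ParseDoubles("  1.5 -2.25  3e2 ");
Console.WriteLine(values.Count);   // 3
```

Tokens that fail to parse are silently skipped here; whether to skip, throw or record them depends on the data. Using InvariantCulture keeps '.' as the decimal separator regardless of the machine's locale.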
If you want to do it really fast, I would use a state machine.
It could look like this:
enum State
{
    Separator, Sign, Mantissa // etc.
}

State currentState = State.Separator;
int prefix = 0, exponent = 0, mantissa = 0;
foreach (var ch in inputString)
{
    switch (currentState)
    {
        // set the new currentState depending on ch and currentState
        case State.Separator:
            GotNewDouble(prefix, exponent, mantissa);
            break;
        // ... one case per remaining state
    }
}
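A runnable sketch of the state-machine idea, deliberately limited to space-separated numbers of the form [+|-]digits[.digits]. Exponents, NaN/Infinity and error reporting are left out, the state names are my own, and the states are local constants rather than an enum only so the sketch fits in one file. ParseWithStateMachine is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

static List<double> ParseWithStateMachine(string input)
{
    // Parser states.
    const int Separator = 0, Integer = 1, Fraction = 2;

    var result = new List<double>();
    int state = Separator;
    long mantissa = 0;     // all digits seen so far, decimal point ignored
    long scale = 1;        // 10^(number of digits after the decimal point)
    bool negative = false;

    // Walk one position past the end and treat it as a space, which flushes
    // the last number without allocating a new string.
    for (int i = 0; i <= input.Length; i++)
    {
        char ch = i < input.Length ? input[i] : ' ';
        bool isDigit = ch >= '0' && ch <= '9';

        switch (state)
        {
            case Separator:
                if (isDigit) { mantissa = ch - '0'; state = Integer; }
                else if (ch == '-' || ch == '+') { negative = ch == '-'; state = Integer; }
                break;

            case Integer:
            case Fraction:
                if (isDigit)
                {
                    mantissa = mantissa * 10 + (ch - '0');
                    if (state == Fraction) scale *= 10;
                }
                else if (ch == '.' && state == Integer)
                {
                    state = Fraction;
                }
                else if (ch == ' ')
                {
                    // End of a number: build the double and reset.
                    double value = (double)mantissa / scale;
                    result.Add(negative ? -value : value);
                    mantissa = 0;
                    scale = 1;
                    negative = false;
                    state = Separator;
                }
                // Any other character is silently ignored in this sketch.
                break;
        }
    }
    return result;
}

List<double> values = ParseWithStateMachine("1.5 -2.25  300 0.1");
Console.WriteLine(values.Count);   // 4
```

Dividing the integer mantissa by a power of ten gives correctly rounded results only while both fit exactly in a double; very long inputs would overflow the long or lose precision, which is where a real double.Parse does considerably more work.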