How to Split in C#? - c#

I am using this code to split a string:
columnVal = val.Contains(":") ? val.Split(':')[1] : val;
For Example #2 Approval Level: Formulator I get the result Formulator, as expected.
But there is a problem when the value contains a date and time:
Example #2 Approval Date: 11/18/2015 3:53:22 PM
Result: 11/18/2015 3
I want the whole string 11/18/2015 3:53:22 PM.
How can I do this? Please help.

You need this overload of string.Split, where you can specify the maximum number of substrings to return:
columnVal = val.Contains(":") ? val.Split(new [] {':'}, 2)[1] : val;
So it will stop splitting after the first :.
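A minimal sketch of that overload in use. The helper name ValueAfterFirstColon and the class SplitDemo are made up for illustration, and the Trim() call is an addition that the one-liner above does not have:

```csharp
using System;

public static class SplitDemo
{
    // Returns the text after the first ':' (trimmed), or the input
    // unchanged when it contains no ':' at all.
    public static string ValueAfterFirstColon(string val)
    {
        // count = 2 caps the result at two pieces, so the colons inside
        // the time portion are left alone.
        string[] parts = val.Split(new[] { ':' }, 2);
        return parts.Length == 2 ? parts[1].Trim() : val;
    }
}
```

Calling SplitDemo.ValueAfterFirstColon on the date example from the question should give back the full date and time rather than stopping at the hour.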

If you only want the string split once, rather than at every possible symbol match, use string.Substring and string.IndexOf.
var index = val.IndexOf(':') + 1;
columnVal = (index < val.Length && index >= 0)
? val.Substring(index)
: val;

You need to find the first occurrence of : and take the rest of the string using string.Substring after that in order to get your desired output.
columnVal = val.Contains(":") ? val.Substring(val.IndexOf(':') + 1).Trim() : val;

string parseTarget = "Example #2 Approval Date: 11/18/2015 3:53:22 PM";
int colonPosition = parseTarget.IndexOf(":");
if (colonPosition == -1)
{
throw new Exception("Missing delimiter");
}
// Take everything after the first colon.
string Value = parseTarget.Substring(colonPosition + 1, parseTarget.Length - colonPosition - 1);
Console.WriteLine(Value);

Rather than using String.Split / String.Contains, I would advise using Regex.Split. It's more flexible in that you can specify a maximum amount of positions and different splitting parameters. On top of that it's shorter and more readable than the other answers provided, imo. See Regex documentation
Code snippet for you:
Regex rgx = new Regex(":");
// A count of 2 yields at most two pieces: the text before the first ':' and everything after it.
string[] result = rgx.Split(val, 2);
string date = result[1];
Alternatively, you can replace the last two lines with the following snippet if there will ALWAYS be a split:
string date = rgx.Split(val, 2)[1];

Related

Splitting data with inconsistent delimiters

I have data files coming in on a server that I need to split into [date time] and [value]. In most of them a single delimiter separates the time from the value, and a space separates the date from the time. I already have a program that processes the data with a simple Split(char[]), but I have now found data where the delimiter is a space, and I am wondering how best to tackle this.
Most of the files I have encountered look like this:
18-06-2014 12:00:00|220.6
The delimiters vary, but I handled that with a char[]. Today I ran into a problem with this format:
18-06-2014 12:00:00 220.6
This complicates things a little. Would the easy solution be to add a space to my split characters and, when I get 3 parts, combine the first two before processing?
I'm looking for a second opinion on this. The date format can also change to something like d/m/yy, and the number of lines can run into the millions, so I would like to keep it as efficient as possible.
Yes, I believe the most efficient solution is to add space as a delimiter and then just combine the first two parts if you get three. That is going to be more efficient than regex.
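A rough sketch of that idea, assuming every line has exactly one space between date and time. The class name LineSplitter, the method SplitLine and the delimiter set are hypothetical, not taken from the question:

```csharp
using System;

public static class LineSplitter
{
    // Hypothetical delimiter set; extend it with whatever the files use.
    private static readonly char[] Delimiters = { '|', ';', ' ' };

    // Splits "18-06-2014 12:00:00|220.6" or "18-06-2014 12:00:00 220.6"
    // into a timestamp string and a value string.
    public static (string Timestamp, string Value) SplitLine(string line)
    {
        string[] parts = line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);

        // With space in the delimiter set, date and time arrive as separate
        // parts, so a well-formed line always yields three pieces.
        if (parts.Length != 3)
            throw new FormatException("Expected date, time and value: " + line);

        // Recombine date and time before further processing.
        return (parts[0] + " " + parts[1], parts[2]);
    }
}
```

Whether this really beats a regex for millions of lines would need measuring on the actual data; the sketch only shows the recombination step.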
You've got a string 18-06-2014 12:00:00 220.6 where the first 19 characters are the date, one character is the separator, and the remaining characters are the value. So:
var test = "18-06-2014 12:00:00|220.6";
var dateString = test.Remove(19);
var val = test.Substring(20);
Added normalization:
static void Main(string[] args) {
var test = "18-06-2014 12:00:00|220.6";
var test2 = "18-6-14 12:00:00|220.6";
var test3 = "8-06-14 12:00:00|220.6";
Console.WriteLine(test);
Console.WriteLine(TryNormalizeImportValue(test));
Console.WriteLine(test2);
Console.WriteLine(TryNormalizeImportValue(test2));
Console.WriteLine(test3);
Console.WriteLine(TryNormalizeImportValue(test3));
}
private static string TryNormalizeImportValue(string value) {
var valueSplittedByDateSeparator = value.Split('-');
if (valueSplittedByDateSeparator.Length < 3) throw new InvalidDataException();
var normalizedDay = NormalizeImportDayValue(valueSplittedByDateSeparator[0]);
var normalizedMonth = NormalizeImportMonthValue(valueSplittedByDateSeparator[1]);
var valueYearPartSplittedByDateTimeSeparator = valueSplittedByDateSeparator[2].Split(' ');
if (valueYearPartSplittedByDateTimeSeparator.Length < 2) throw new InvalidDataException();
var normalizedYear = NormalizeImportYearValue(valueYearPartSplittedByDateTimeSeparator[0]);
var valueTimeAndValuePart = valueYearPartSplittedByDateTimeSeparator[1];
return string.Concat(normalizedDay, '-', normalizedMonth, '-', normalizedYear, ' ', valueTimeAndValuePart);
}
private static string NormalizeImportDayValue(string value) {
return value.Length == 2 ? value : "0" + value;
}
private static string NormalizeImportMonthValue(string value) {
return value.Length == 2 ? value : "0" + value;
}
private static string NormalizeImportYearValue(string value) {
return value.Length == 4 ? value : DateTime.Now.Year.ToString(CultureInfo.InvariantCulture).Remove(2) + value;
}
Well you can use this one to get the date and the value.
(((0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[012])-(19|20)\d\d)\s((\d{2}:?){3})|(\d+\.?\d+))
This will give you 2 matches
1º 18-06-2014 12:00:00
2º 220.6
Example:
http://regexr.com/391d3
This regex matches both kinds of strings, capturing the two tokens to Groups 1 and 2.
Note that we are not using \d because in .NET it can match any Unicode digits such as Thai...
The key is in the [ |] character class, which specifies your two allowable delimiters
Here is the regex:
^([0-9]{2}-[0-9]{2}-[0-9]{4} (?:[0-9]{2}:){2}[0-9]{2})[ |]([0-9]{3}\.[0-9])$
In the demo, please pay attention to the capture Groups in the right pane.
Here is how to retrieve the values:
var myRegex = new Regex(@"^([0-9]{2}-[0-9]{2}-[0-9]{4} (?:[0-9]{2}:){2}[0-9]{2})[ |]([0-9]{3}\.[0-9])$", RegexOptions.IgnoreCase);
// Group 1 holds the date and time, Group 2 holds the value.
string mydate = myRegex.Match(s1).Groups[1].Value;
Console.WriteLine(mydate);
string myvalue = myRegex.Match(s1).Groups[2].Value;
Console.WriteLine(myvalue);
Please let me know if you have questions
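To illustrate the remark about \d in the answer above, here is a small sketch comparing \d, [0-9] and \d under RegexOptions.ECMAScript. The class and method names are invented for the example:

```csharp
using System.Text.RegularExpressions;

public static class DigitMatching
{
    // "\u0E53" is THAI DIGIT THREE, a Unicode decimal digit outside ASCII.
    public const string ThaiThree = "\u0E53";

    // Default .NET behavior: \d matches any Unicode decimal digit.
    public static bool MatchesBackslashD(string s) =>
        Regex.IsMatch(s, @"^\d$");

    // An explicit character class only accepts ASCII 0-9.
    public static bool MatchesAsciiClass(string s) =>
        Regex.IsMatch(s, @"^[0-9]$");

    // RegexOptions.ECMAScript restricts \d to [0-9].
    public static bool MatchesEcmaScriptD(string s) =>
        Regex.IsMatch(s, @"^\d$", RegexOptions.ECMAScript);
}
```

With the Thai digit as input, only MatchesBackslashD should return true; an ASCII "3" passes all three checks.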
Given the provided format I'd use something like
char delimiter = ' '; //or whatever the delimiter for the specific file is, this can be set in a previous step
int index = line.LastIndexOf(delimiter);
var date = line.Remove(index);
var value = line.Substring(++index);
If there are that many lines and efficiency matters, you could determine the delimiter once from the first line, by looping back from the end and finding the first character that is not a digit or dot (or comma if the value can contain those), and then use something such as the above.
If each line can contain a different delimiter, you could always track back to the first not value char as described above and still maintain adequate performance.
Edit: for completeness sake, to find the delimiter, you could perform the following once per file (provided that the delimiter stays consistent within the file)
char delimiter = '\0';
for (int i = line.Length - 1; i >= 0; i--)
{
var c= line[i];
if (!char.IsDigit(c) && c != '.')
{
delimiter = c;
break;
}
}

Split a string at 2 points

I have a file called file_test1.txt and I want to extract just test1 from the name and place it in a string. What's the best way of doing this?
E.g.
string fullfile = @"C:\file_test1.txt";
string section = [test1] from fullfile; // <- expected result
I want to be able to split on 'file_' and '.txt' as the 'test1' section could be larger or smaller however the 'file_' and '.txt' will always be the same.
Try Path.GetFileNameWithoutExtension(fullfile).Substring(5) (or Substring("TEMPLATE_PREFIX".Length))
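The Path.GetFileNameWithoutExtension suggestion above could be wrapped up as follows. This is only a sketch: the class name FileSection, the method Extract and the default prefix argument are assumptions made for the example.

```csharp
using System;
using System.IO;

public static class FileSection
{
    // Strips the extension and a fixed prefix such as "file_" from a file name.
    // Note: a Windows-style path like C:\file_test1.txt is only split into
    // directory and file name when the code runs on Windows.
    public static string Extract(string fullPath, string prefix = "file_")
    {
        string name = Path.GetFileNameWithoutExtension(fullPath);

        // Leave the name untouched when the expected prefix is missing.
        return name.StartsWith(prefix, StringComparison.Ordinal)
            ? name.Substring(prefix.Length)
            : name;
    }
}
```

Using prefix.Length instead of a hard-coded 5 keeps the call correct if the prefix ever changes.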
You can try Split:
var test = Path.GetFileNameWithoutExtension(fullfile).Split('_')[1];
Try following
string fullfile = @"C:\file_test1.txt";
var name = fullfile.Substring(8, fullfile.Length - 12);
As C:\file_ and .txt are fixed, you can take a Substring starting at index 8 (skipping the leading name) with a length of the total string length - 12 (12 => length of the leading name plus the trailing extension).
Thought I'd give a solution that uses Split and handles files with multiple underscores:
string.Join("_", Path.GetFileNameWithoutExtension(file).Split('_').Skip(1));
String.Split() works quite well for my uses:
http://msdn.microsoft.com/en-us/library/b873y76a.aspx
Obviously many ways to accomplish this. Here's yet another approach:
string fullfile = @"C:\file_test1.txt";
int index1 = fullfile.LastIndexOf("file_");
if (index1 != -1)
{
int index2 = fullfile.IndexOf(".", index1);
if (index2 != -1)
{
string section = fullfile.Substring(index1 + 5, index2 - index1 - 5);
}
}
You could also get "test1", or any subsequent filename (assuming your file naming convention remains constant!) using this regular expression:
var defaultRegex = new Regex(@"(?<=_).*(?=\.txt)");
var matches = defaultRegex.Matches(fullfile);
var match = matches[0].Value;
The regular expression:
(?<=_).*(?=\.txt)
uses a positive lookbehind to find text preceded by '_', and a positive lookahead to find text that has '.txt' after it. The dot is escaped so that it matches a literal '.' rather than any character.

get all characters to right of last dash

I have the following:
string test = "9586-202-10072";
How would I get all characters to the right of the final - so 10072. The number of characters is always different to the right of the last dash.
How can this be done?
You can get the position of the last - with str.LastIndexOf('-'). So the next step is obvious:
var result = str.Substring(str.LastIndexOf('-') + 1);
Correction:
As Brian states below, using this on a string with no dashes will result in the original string being returned.
You could use LINQ, and save yourself the explicit parsing:
string test = "9586-202-10072";
string lastFragment = test.Split('-').Last();
Console.WriteLine(lastFragment);
I can see this post was viewed over 46,000 times. I would bet many of the 46,000 viewers are asking this question simply because they just want the file name... and these answers can be a rabbit hole if you cannot make your substring verbatim using the at sign.
If you simply want to get the file name, then there is a simple answer which should be mentioned here. Even if it's not the precise answer to the question.
result = Path.GetFileName(fileName);
see https://msdn.microsoft.com/en-us/library/system.io.path.getfilename(v=vs.110).aspx
string tail = test.Substring(test.LastIndexOf('-') + 1);
YourString.Substring(YourString.LastIndexOf("-") + 1);
With C# 8 and later you can use a range indexer as follows:
string test = "9586-202-10072";
var foo = test?[(test.LastIndexOf('-') + 1)..];
// foo is => 10072
string atest = "9586-202-10072";
int indexOfHyphen = atest.LastIndexOf("-");
if (indexOfHyphen >= 0)
{
string contentAfterLastHyphen = atest.Substring(indexOfHyphen + 1);
Console.WriteLine(contentAfterLastHyphen );
}
See the String.LastIndexOf method.
I created a string extension for this, hope it helps.
public static string GetStringAfterChar(this string value, char substring)
{
if (!string.IsNullOrWhiteSpace(value))
{
var index = value.LastIndexOf(substring);
return index >= 0 ? value.Substring(index + 1) : value;
}
return string.Empty;
}
test[(test.LastIndexOf('-') + 1)..]
C# 8 (late 2019) introduces range operator and simplifies it a bit further. The two dots here means from the index (inclusive) till the end of string.
test.Substring(test.LastIndexOf("-") + 1)
and... in case you need the left part of a string:
private string AllTheLeftPart(string theString)
{
int index = theString.LastIndexOf('-');
// Without a dash there is nothing to cut off, so return the string unchanged.
string leftPart = index >= 0 ? theString.Substring(0, index) : theString;
return leftPart;
}

how to place - in a string

I have a string "8329874566".
I want to place - in the string like this "832-98-4566"
Which string function can I use?
I would have done something like this..
string value = "8329874566";
value = value.Insert(6, "-").Insert(3, "-");
You convert it to a number and then format the string.
What I like most about this is that it's easier to read and understand what's going on than using a few substring methods.
string str = "832984566";
string val = long.Parse(str).ToString("###-##-####");
There may be a tricky-almost-unreadable regex solution, but this one is pretty readable, and easy.
The first parameter of the .Substring() method is the index at which to start taking characters, and the second is the number of characters to take; if you omit the second, you get everything up to the end of the string:
String value = "8329874566";
String Result = value.Substring(0,3) + "-" + value.Substring(3,2) + "-" + value.Substring(6);
--[edit]--
I just noticed that the expected result in your example drops one of the digits (the '7'). If you want to keep it, change the last call to Substring(5); if you want the '7' but only four digits in the last group, use Substring(5, 4).
Are you trying to do this like American Social Security numbers? I.e., with a hyphen after the third and fifth numerals? If so:
string s = "8329874566";
string t = String.Format("{0}-{1}-{2}", s.Substring(0, 3), s.Substring(3, 2), s.Substring(5));
Just out of completeness, a regular expression variant:
Regex.Replace(s, @"(\d{3})(\d{2})(\d{4})", "$1-$2-$3");
I consider the Insert variant to be the cleanest, though.
This works fine, and I think that is more clear:
String value = "8329874566";
value = value.Insert(3, "-").Insert(6, "-");
The console outputs shows this:
832-98-74566
If the hyphens are to go in the same place each time, then you could simply concatenate together the pieces of the original string like this:
// 0123456789 <- index
string number = "8329874566";
string formatted = number.Substring(0, 3) + "-" + number.Substring(3, 2) + "-" + number.Substring(5);
For a general way of making mutable strings, use the StringBuilder class. This allows deletions and insertions to be made before calling ToString to produce the final string.
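As a sketch of the StringBuilder route mentioned above, assuming the 3-2-4 grouping discussed in this thread and an input of at least five characters (the class and method names are made up for illustration):

```csharp
using System.Text;

public static class HyphenFormatter
{
    // Inserts hyphens after the third and fifth characters.
    // Assumes digits.Length >= 5; shorter input makes Insert throw
    // an ArgumentOutOfRangeException.
    public static string InsertHyphens(string digits)
    {
        var sb = new StringBuilder(digits);

        // Work from right to left so the first insertion does not
        // shift the index of the second one.
        sb.Insert(5, "-");
        sb.Insert(3, "-");

        return sb.ToString();
    }
}
```

For two insertions the gain over string.Insert is negligible; a StringBuilder starts to pay off when many edits are applied to the same text.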
You could try the following:
string strNumber = "8329874566";
string strNewNumber = strNumber.Substring(0, 3) + "-" + strNumber.Substring(3, 2) + "-" + strNumber.Substring(6);
or something in this manner
string val = "832984566";
string result = String.Format("{0}-{1}-{2}", val.Substring(0,3), val.Substring(3,2), val.Substring(5,4));
var result = string.Concat(value.Substring(0,3), "-", value.Substring(3,2), "-", value.Substring(5,4));
or
var value = "8329874566".Insert(3, "-").Insert(6, "-");
Now how about this for a general solution?
// uglified code to fit within horizontal limits
public static string InsertAtIndices
(this string original, string insertion, params int[] insertionPoints) {
var mutable = new StringBuilder(original);
var validInsertionPoints = insertionPoints
.Distinct()
.Where(i => i >= 0 && i < original.Length)
.OrderByDescending(i => i);
foreach (int insertionPoint in validInsertionPoints)
mutable.Insert(insertionPoint, insertion);
return mutable.ToString();
}
Usage:
string ssn = "832984566".InsertAtIndices("-", 3, 5);
string crazy = "42387542342309856340924803"
.InsertAtIndices(":", 1, 2, 3, 4, 5, 6, 17, 200, -1, -1, 2, 3, 3, 4);
Console.WriteLine(ssn);
Console.WriteLine(crazy);
Output:
832-98-4566
4:2:3:8:7:5:42342309856:340924803
Overkill? Yeah, maybe...
P.S. Yes, I am regex illiterate--something I hope to rectify someday.
A straightforward (but not flexible) approach would be looping over the characters of the string while keeping a counter running. You can then construct a new string character by character. You can add the '-' character after the 3rd and 5th character.
A better approach may be to use a function to insert a single character in the middle of the string at a specific index. String.Insert() would do well. The only thing to pay attention to here is that the string indexes will get off by one with each insert.
EDIT more language-specific as per comments
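The counter-based loop described in this answer might look roughly like this. It is a sketch under the assumption that hyphens go after the 3rd and 5th characters; the names are hypothetical:

```csharp
using System.Text;

public static class CounterInsert
{
    // Builds the result character by character and appends a hyphen
    // after the 3rd and 5th input characters.
    public static string HyphenateAfterThirdAndFifth(string digits)
    {
        var result = new StringBuilder(digits.Length + 2);
        int count = 0;

        foreach (char c in digits)
        {
            result.Append(c);
            count++;

            if (count == 3 || count == 5)
                result.Append('-');
        }

        return result.ToString();
    }
}
```

Note that an input of exactly 3 or 5 characters ends with a trailing hyphen in this sketch. With String.Insert the off-by-one mentioned above shows up as Insert(3, "-").Insert(6, "-"): the second index is 5 plus the one character already inserted.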

Help me delete the last three chars of any string please!

Test string:
the%20matrix%20
How can I delete the last three chars? Using this code gives me an out of index exception:
y = y.Substring(y.Length - 4, y.Length - 1);
Seems this isn't your REAL problem; if you want to remove that "%20", you should use:
string test = "the%20matrix%20";
string clean = HttpUtility.UrlDecode(test);
if (clean.Length > 2) // if you still want to strip last chars...
clean = clean.Substring(0, clean.Length - 3);
As dalovega said, you need the first parameter of Substring to be 0 and the second Length - 3. As an alternative:
if(y.Length >= 3)
{
y = y.Remove(y.Length - 3);
}
You want
y.Substring(0, y.Length - 3)
If you want to delete the last three characters, you need the first parameter of your Substring method to be zero.
You need to check that the string is at least 3 characters long first.
if (y.Length > 2)
{
y = y.Substring(0, y.Length - 3);
}
As others have said, the overload of Substring you want takes startIndex and length as its parameters.
Though what do you want to do with 1 or 2 character strings?
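One possible answer to that question, sketched as a helper that returns an empty string when the input is too short. That policy is an assumption, as are the names Trimmer and RemoveLast:

```csharp
public static class Trimmer
{
    // Drops the last `count` characters. Strings with `count` characters
    // or fewer become empty instead of throwing; null or empty input and
    // non-positive counts are returned unchanged.
    public static string RemoveLast(string s, int count)
    {
        if (string.IsNullOrEmpty(s) || count <= 0)
            return s;

        return s.Length > count
            ? s.Substring(0, s.Length - count)
            : string.Empty;
    }
}
```

If short strings should instead be left as they are, replace string.Empty with s.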
I found this post through a search. I was concatenating values with a delimiter using a StringBuilder and wanted to remove the last delimiter.
var delim = "{somedelimiter}";
var sb = new StringBuilder();
//concat the values into one string
foreach (var val in values)
{
sb.Append(val);
sb.Append(delim);
}
var finalValue = sb.ToString();
finalValue = finalValue.Remove(finalValue.Length - delim.Length);
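As a different technique from the StringBuilder loop above, string.Join places the delimiter only between items, so there is no trailing delimiter to remove. A minimal sketch, with a hypothetical wrapper name:

```csharp
using System.Collections.Generic;

public static class DelimiterJoin
{
    // Concatenates the values with `delim` between them and nothing
    // after the last one.
    public static string JoinValues(IEnumerable<string> values, string delim)
    {
        return string.Join(delim, values);
    }
}
```

An empty sequence yields an empty string, whereas calling Remove on the empty result of the loop version would throw.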
y = y.Remove(y.LastIndexOf("stringToRemove"));
