Extracting substring based on the same identifier in two locations

I found this question, which achieves what I am looking for, however I only have one problem: the "start" and "end" of the substring are the same character.
My string is:
.0.label unicode "Area - 110"
and I want to extract the text between the inverted commas ("Area - 110").
In the linked question, the answers are all using specific identifiers, and IndexOf solutions. The problem is that if I do the same, IndexOf will likely return the same value.
Additionally, if I use Split methods, the text I want to keep is not a fixed length - it could be one word, it could be seven; so I am also having issues specifying the indexes of the first and last word in that collection as well.

The problem is that if I do the same, IndexOf will likely return the same value.
A common trick in this situation is to use LastIndexOf to find the location of the closing double-quote:
int start = str.IndexOf('"');
int end = str.LastIndexOf('"');
if (start >= 0 && end > start) {
// We have two separate locations
Console.WriteLine(str.Substring(start+1, end-start-1));
}
Demo.
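If the string may contain more than one quoted section, LastIndexOf would span all of them. A minimal sketch of an alternative, assuming you want only the first quoted section: start the second search one position past the first hit, so IndexOf cannot return the same value twice. The class and method names here are illustrative only.

```csharp
using System;

public static class QuoteExtractor
{
    // Returns the text between the first two occurrences of 'delimiter',
    // or null when fewer than two occurrences exist.
    public static string BetweenFirstPair(string text, char delimiter)
    {
        int start = text.IndexOf(delimiter);
        if (start < 0) return null;

        // Begin the second search one position past the first hit,
        // so the same character cannot be found twice.
        int end = text.IndexOf(delimiter, start + 1);
        if (end < 0) return null;

        return text.Substring(start + 1, end - start - 1);
    }

    public static void Main()
    {
        string str = ".0.label unicode \"Area - 110\"";
        Console.WriteLine(BetweenFirstPair(str, '"')); // Area - 110
    }
}
```

Returning null for a missing delimiter is a design choice made for this sketch; throwing would work equally well if a missing quote indicates corrupt input.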

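I would do it like this:
string str = ".0.label unicode \"Area - 110\"";
// Drop everything up to and including the first quote...
str = str.Substring(str.IndexOf("\"") + 1);
// ...then keep everything before the next quote.
str = str.Substring(0, str.IndexOf("\""));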
In fact, this is one of my most used helper methods/extensions, because it is quite versatile:
/// <summary>
/// Isolates the text in between the parameters, exclusively, using invariant, case-sensitive comparison.
/// Both parameters may be null to skip either step. If specified but not found, a FormatException is thrown.
/// </summary>
public static string Isolate(this string str, string entryString, string exitString)
{
if (!string.IsNullOrEmpty(entryString))
{
int entry = str.IndexOf(entryString, StringComparison.InvariantCulture);
if (entry == -1) throw new FormatException($"String.Isolate failed: \"{entryString}\" not found in string \"{str.Truncate(80)}\".");
str = str.Substring(entry + entryString.Length);
}
if (!string.IsNullOrEmpty(exitString))
{
int exit = str.IndexOf(exitString, StringComparison.InvariantCulture);
if (exit == -1) throw new FormatException($"String.Isolate failed: \"{exitString}\" not found in string \"{str.Truncate(80)}\".");
str = str.Substring(0, exit);
}
return str;
}
You'd use that like this:
string str = ".0.label unicode \"Area - 110\"";
string output = str.Isolate("\"", "\"");

Related

How do I cut off a certain part of a String?

I have a big String in my program.
For Example:
String Newspaper = "...Blablabla... What do you like?...Blablabla... ";
Now I want to cut out the "What do you like?" and write it to a new String. The problem is that the "Blablabla" is something different every time. By "cut out" I mean that you supply a start word and an end word, and everything written between them should end up in the new string. The sentence "What do you like?" sometimes changes, except for the start word "What" and the end word "like?".
Thanks for any responses.

You can write the following method:
public static string CutOut(string s, string start, string end)
{
int startIndex = s.IndexOf(start);
if (startIndex == -1) {
return null;
}
int endIndex = s.IndexOf(end, startIndex);
if (endIndex == -1) {
return null;
}
return s.Substring(startIndex, endIndex - startIndex + end.Length);
}
It returns null if either the start or end pattern is not found. Only end patterns that follow the start pattern are searched for.
If you are working with C# 8+ and .NET Core 3.0+, you can also replace the last line with
return s[startIndex..(endIndex + end.Length)];
Test:
string input = "...Blablabla... What do you like?...Blablabla... ";
Console.WriteLine(CutOut(input, "What ", " like?"));
prints:
What do you like?
If you are happy with Regex, you can also write:
public static string CutOutRegex(string s, string start, string end)
{
Match match = Regex.Match(s, $@"\b{Regex.Escape(start)}.*{Regex.Escape(end)}");
if (match.Success) {
return match.Value;
}
return null;
}
The \b ensures that the start pattern is only found at the beginning of a word. You can drop it if you want. Also, if the end pattern occurs more than once, the result will include all of them unlike the first example with IndexOf which will only include the first one.
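To make the difference concrete, here is a small sketch comparing the greedy quantifier `.*` used above with the lazy quantifier `.*?`, which stops at the first end pattern, as the IndexOf version does. The class name, the `lazy` parameter, and the sample sentence are assumptions made for this illustration; the `\b` anchor is left out to keep the pattern short.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVersusLazy
{
    // Matches from 'start' to 'end'; 'lazy' selects ".*?" instead of ".*".
    public static string CutOut(string s, string start, string end, bool lazy)
    {
        string middle = lazy ? ".*?" : ".*";
        Match match = Regex.Match(s, Regex.Escape(start) + middle + Regex.Escape(end));
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string input = "What do you like? Do you like tea?";
        // Greedy: runs to the last "like" in the input.
        Console.WriteLine(CutOut(input, "What", "like", lazy: false)); // What do you like? Do you like
        // Lazy: stops at the first "like" after the start pattern.
        Console.WriteLine(CutOut(input, "What", "like", lazy: true));  // What do you like
    }
}
```

Note that `.` does not match a newline by default, so text spanning several lines would need RegexOptions.Singleline.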

You have to do a substring, like the example below. See source for more information on substrings.
// A long string
string bio = "Mahesh Chand is a founder of C# Corner. Mahesh is also an " +
"author, speaker, and software architect. Mahesh founded C# Corner in " +
"2000.";
// Get first 12 characters substring from a string
string authorName = bio.Substring(0, 12);
Console.WriteLine(authorName);

In this case I would do it like this, cut the first part and then the second and concatenate with the fixed words using them as a parameter for cutting.
public string CutPhrase(string phrase)
{
var fst = "What";
var snd = "like?";
string[] cut1 = phrase.Split(new[] { fst }, StringSplitOptions.None);
string[] cut2 = cut1[1].Split(new[] { snd }, StringSplitOptions.None);
// cut2[0] already carries the spaces around the middle text, so add none here
var rst = $"{fst}{cut2[0]}{snd}";
return rst;
}

Detecting newline in string and adding a character before it

Hi, I have the following string:
* lalalalalaal
* 12121212121212
* 36363636363636
* 21454545454545454
Every line of the list starts with "\r\n* ".
Is there a way to detect the "\r\n* " sequence at the beginning of each line and replace it with the numbers 1, 2, 3, ... n? For example, something like this:
1. lalalalalaal
2. 12121212121212
3. 36363636363636
4. 21454545454545454
I imagine building an array and running a for loop would be required, but I can't get my head around where I am supposed to start.

If I understand you correctly, you have a string that looks like this:
"\r\n* lalalalalaal\r\n* 12121212121212\r\n* 36363636363636\r\n* 21454545454545454"
And you want to replace "\r\n*" with "\r\n1.", where the number 1 increments each time the search string is found.
If so, here's one way to do it: Use the IndexOf method to find the location of the string you're searching for, keep a counter variable that increments every time you find the search term, and then use Substring to get the sub strings before and after the part to replace ('*'), and then put the counter's value between them:
static string ReplaceWithIncrementingNumber(string input, string find, string partToReplace)
{
if (input == null || find == null ||
partToReplace == null || !find.Contains(partToReplace))
{
return input;
}
// Get the index of the first occurrence of our 'find' string
var index = input.IndexOf(find);
// Track the number of occurrences we've found, to use as a replacement string
var counter = 1;
while (index > -1)
{
// Get the leading string up to '*', add the counter, then add the trailing string
input = input.Substring(0, index) +
find.Replace(partToReplace, $"{counter++}.") +
input.Substring(index + find.Length);
// Find the next occurrence of our 'find' string
index = input.IndexOf(find, index + find.Length);
}
return input;
}
Here's a sample using your input string:
static void Main()
{
var input = "\r\n* lalalalalaal\r\n* 12121212121212\r\n* " +
"36363636363636\r\n* 21454545454545454";
Console.WriteLine(ReplaceWithIncrementingNumber(input, "\r\n*", "*"));
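Console.WriteLine("\nDone! Press any key to exit...");
Console.ReadKey();
}
Output of the ReplaceWithIncrementingNumber call (preceded by a blank line, because the input starts with "\r\n"):
1. lalalalalaal
2. 12121212121212
3. 36363636363636
4. 21454545454545454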

You can use LINQ and String.Replace to achieve this (assuming you already have your strings in a list, as the second part of your question suggests):
var result = list.Select((x, index) => $"{index + 1}. {x.Replace("\r\n* ", string.Empty)}");
If you do not have them in a list, you can split your string first:
var result = str.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
.Select((x, index) => $"{index + 1}. {x.Replace("* ", string.Empty)}");
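Another option, not taken from the answers above, is Regex.Replace with a match evaluator that increments a counter on every hit. This is a sketch under the assumption that each bullet is a "* " at the very start of a line; the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BulletNumbering
{
    // Replaces each line-leading "* " with "1. ", "2. ", and so on.
    public static string NumberBullets(string input)
    {
        int counter = 0;
        // With RegexOptions.Multiline, ^ also matches right after each '\n'.
        // The lambda runs once per match, from left to right.
        return Regex.Replace(input, @"^\* ", m => (++counter) + ". ", RegexOptions.Multiline);
    }

    public static void Main()
    {
        string input = "\r\n* lalalalalaal\r\n* 12121212121212";
        Console.WriteLine(NumberBullets(input));
    }
}
```

Because the string is rewritten in a single pass, this avoids the repeated Substring calls, at the cost of regex overhead.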

Splitting data with inconsistent delimiters

I have data files coming in on a server that I need to split into [date time] and [value]. Most of them have a single delimiter between the time and the value, and a space between the date and the time. I already have a program processing the data with a simple Split(char[]), but I have now found data where the delimiter is a space, and I am wondering how best to tackle this.
Most files I have encountered look like this:
18-06-2014 12:00:00|220.6
The delimiters vary, but I handled that with a char[]. Today, however, I ran into a problem with this format:
18-06-2014 12:00:00 220.6
This complicates things a little. Would the easy solution be to add a space to my split characters and, when I get three parts, combine the first two before processing?
I'm looking for a second opinion on this. Also, the date format can change to something like d/m/yy, and the number of lines can run into the millions, so I would like to keep it as efficient as possible.

Yes, I believe the most efficient solution is to add space as a delimiter and then just combine the first two if you get three. That is going to be more efficient than regex.
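A minimal sketch of that idea follows. The delimiter set, the class name, and the tuple field names are assumptions; adjust them to whatever the real files contain. Once a space is in the delimiter set, every well-formed line yields three parts (date, time, value), so anything else is treated as an error here.

```csharp
using System;

public static class LineSplitter
{
    // Hypothetical delimiter set: extend it with whatever the files use.
    private static readonly char[] Delimiters = { '|', ';', ' ' };

    // Splits "date time<delimiter>value" into (timestamp, value).
    public static (string Timestamp, string Value) SplitLine(string line)
    {
        string[] parts = line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 3)
            throw new FormatException("Unexpected number of fields: " + line);

        // parts[0] is the date, parts[1] the time, parts[2] the value.
        return (parts[0] + " " + parts[1], parts[2]);
    }

    public static void Main()
    {
        Console.WriteLine(SplitLine("18-06-2014 12:00:00|220.6")); // (18-06-2014 12:00:00, 220.6)
        Console.WriteLine(SplitLine("18-06-2014 12:00:00 220.6")); // (18-06-2014 12:00:00, 220.6)
    }
}
```

Rejoining the date and time allocates one extra string per line; if that matters at millions of lines, it is worth measuring against the LastIndexOf approach.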

You've got a string such as 18-06-2014 12:00:00 220.6, where the first 19 characters are the date and time, one character is the separator, and the remaining characters are the value. So:
var test = "18-06-2014 12:00:00|220.6";
var dateString = test.Remove(19);
var val = test.Substring(20);
Added normalization:
static void Main(string[] args) {
var test = "18-06-2014 12:00:00|220.6";
var test2 = "18-6-14 12:00:00|220.6";
var test3 = "8-06-14 12:00:00|220.6";
Console.WriteLine(test);
Console.WriteLine(TryNormalizeImportValue(test));
Console.WriteLine(test2);
Console.WriteLine(TryNormalizeImportValue(test2));
Console.WriteLine(test3);
Console.WriteLine(TryNormalizeImportValue(test3));
}
private static string TryNormalizeImportValue(string value) {
var valueSplittedByDateSeparator = value.Split('-');
if (valueSplittedByDateSeparator.Length < 3) throw new InvalidDataException();
var normalizedDay = NormalizeImportDayValue(valueSplittedByDateSeparator[0]);
var normalizedMonth = NormalizeImportMonthValue(valueSplittedByDateSeparator[1]);
var valueYearPartSplittedByDateTimeSeparator = valueSplittedByDateSeparator[2].Split(' ');
if (valueYearPartSplittedByDateTimeSeparator.Length < 2) throw new InvalidDataException();
var normalizedYear = NormalizeImportYearValue(valueYearPartSplittedByDateTimeSeparator[0]);
var valueTimeAndValuePart = valueYearPartSplittedByDateTimeSeparator[1];
return string.Concat(normalizedDay, '-', normalizedMonth, '-', normalizedYear, ' ', valueTimeAndValuePart);
}
private static string NormalizeImportDayValue(string value) {
return value.Length == 2 ? value : "0" + value;
}
private static string NormalizeImportMonthValue(string value) {
return value.Length == 2 ? value : "0" + value;
}
private static string NormalizeImportYearValue(string value) {
return value.Length == 4 ? value : DateTime.Now.Year.ToString(CultureInfo.InvariantCulture).Remove(2) + value;
}

Well you can use this one to get the date and the value.
(((0[1-9]|[12][0-9]|3[01])-(0[1-9]|1[012])-(19|20)\d\d)\s((\d{2}:?){3})|(\d+\.?\d+))
This will give you 2 matches
1º 18-06-2014 12:00:00
2º 220.6
Example:
http://regexr.com/391d3

This regex matches both kinds of strings, capturing the two tokens to Groups 1 and 2.
Note that we are not using \d because in .NET it can match any Unicode digits such as Thai...
The key is in the [ |] character class, which specifies your two allowable delimiters
Here is the regex:
^([0-9]{2}-[0-9]{2}-[0-9]{4} (?:[0-9]{2}:){2}[0-9]{2})[ |]([0-9]{3}\.[0-9])$
In the demo, please pay attention to the capture Groups in the right pane.
Here is how to retrieve the values:
var myRegex = new Regex(@"^([0-9]{2}-[0-9]{2}-[0-9]{4} (?:[0-9]{2}:){2}[0-9]{2})[ |]([0-9]{3}\.[0-9])$", RegexOptions.IgnoreCase);
// s1 is the input line; match once and read both groups
Match match = myRegex.Match(s1);
string mydate = match.Groups[1].Value;
Console.WriteLine(mydate);
string myvalue = match.Groups[2].Value;
Console.WriteLine(myvalue);
Please let me know if you have questions

Given the provided format I'd use something like
char delimiter = ' '; //or whatever the delimiter for the specific file is, this can be set in a previous step
int index = line.LastIndexOf(delimiter);
var date = line.Remove(index);
var value = line.Substring(++index);
If there are that many lines and efficiency matters, you could obtain the delimiter once on the first line, by looping back from the end and find the first index that is not a digit or dot (or comma if the value can contain those) to determine the delimiter, and then use something such as the above.
If each line can contain a different delimiter, you could always track back to the first not value char as described above and still maintain adequate performance.
Edit: for completeness sake, to find the delimiter, you could perform the following once per file (provided that the delimiter stays consistent within the file)
char delimiter = '\0';
for (int i = line.Length - 1; i >= 0; i--)
{
var c= line[i];
if (!char.IsDigit(c) && c != '.')
{
delimiter = c;
break;
}
}

Getting substring between two separators in an arbitrary position

I have the following string:
string source = "Test/Company/Business/Department/Logs.tvs/v1";
The / character is the separator between the various elements in the string. I need to get the last two elements of the string. I have the following code for this purpose, and it works fine. Is there any faster or simpler code for this?
CODE
static void Main()
{
string component = String.Empty;
string version = String.Empty;
string source = "Test/Company/Business/Department/Logs.tvs/v1";
if (!String.IsNullOrEmpty(source))
{
String[] partsOfSource = source.Split('/');
if (partsOfSource != null)
{
if (partsOfSource.Length > 2)
{
component = partsOfSource[partsOfSource.Length - 2];
}
if (partsOfSource.Length > 1)
{
version = partsOfSource[partsOfSource.Length - 1];
}
}
}
Console.WriteLine(component);
Console.WriteLine(version);
Console.Read();
}

Why no regular expression? This one is fairly easy:
.*/(?<component>.*)/(?<version>.*)$
You can even label your groups so for your match all you need to do is:
component = myMatch.Groups["component"].Value;
version = myMatch.Groups["version"].Value;

The following should be faster, as it only scans as much of the string as it needs to to find two / and it doesn't bother splitting up the whole string:
string component = "";
string version = "";
string source = "Test/Company/Business/Department/Logs.tvs/v1";
int last = source.LastIndexOf('/');
if (last != -1)
{
int penultimate = source.LastIndexOf('/', last - 1);
version = source.Substring(last + 1);
component = source.Substring(penultimate + 1, last - penultimate - 1);
}
That said, as with all performance questions: profile! Try the two side-by-side with a big list of real-life inputs and see which is fastest.
(Also, this will leave empty strings rather than throw an exception if there is no slash in the input... but throw if source is null, lazy me.)

Your approach is the most suitable one given that you are looking for substrings at a particular index. A LINQ expression to do the same in this case will likely not improve the code or its readability.
For reference, there is some great information from Microsoft here on working with strings and LINQ. In particular see the article here which covers some examples with both LINQ and RegEx.
EDIT: +1 For Matt's named group within RegEx approach... that's the nicest solution I've seen.

Your code mostly looks fine. A couple of points to note:
String.Split() will never return null, so you don't need the null check on it.
If the source string has fewer than two / characters, how would you deal with that? (The Original Post was updated to address this)
Do you really want to just output empty strings if your source string is null or empty (or invalid)? If you have specific expectations about the nature of the input, you may want to consider failing fast when those expectations are not met.
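As an illustration of the fail-fast suggestion, the sketch below throws instead of silently returning empty strings. The method name, the choice of exception types, and the rule that a single '/' is enough are all assumptions made for this example, not something stated in the question.

```csharp
using System;

public static class SourcePath
{
    // Returns the last two '/'-separated elements, throwing on malformed input.
    public static (string Component, string Version) ParseTail(string source)
    {
        if (string.IsNullOrEmpty(source))
            throw new ArgumentException("Source must not be null or empty.", nameof(source));

        // Split never returns null, so no null check is needed here.
        string[] parts = source.Split('/');
        if (parts.Length < 2)
            throw new FormatException("Expected at least one '/' in: " + source);

        return (parts[parts.Length - 2], parts[parts.Length - 1]);
    }

    public static void Main()
    {
        var (component, version) = ParseTail("Test/Company/Business/Department/Logs.tvs/v1");
        Console.WriteLine(component); // Logs.tvs
        Console.WriteLine(version);   // v1
    }
}
```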

You could try something like this, but I doubt it would be much faster. You could take some measurements with System.Diagnostics.Stopwatch to see whether it is worth it.
string source = "Test/Company/Business/Department/Logs.tvs/v1";
int index1 = source.LastIndexOf('/');
string last = source.Substring(index1 + 1);
string substring = source.Substring(0, index1);
int index2 = substring.LastIndexOf('/');
string secondLast = substring.Substring(index2 + 1);
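A rough way to take such a measurement is sketched below. The iteration count and helper names are arbitrary choices for this example, and a simple loop like this only gives a coarse indication; a dedicated benchmarking tool would give more reliable numbers.

```csharp
using System;
using System.Diagnostics;

public static class TailBenchmark
{
    // Splits the whole string and takes the last element.
    public static string LastViaSplit(string source)
    {
        string[] parts = source.Split('/');
        return parts[parts.Length - 1];
    }

    // Scans backwards for the last '/' only.
    public static string LastViaLastIndexOf(string source)
    {
        return source.Substring(source.LastIndexOf('/') + 1);
    }

    public static void Main()
    {
        const string source = "Test/Company/Business/Department/Logs.tvs/v1";
        const int iterations = 1000000;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) LastViaSplit(source);
        sw.Stop();
        Console.WriteLine("Split:       " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++) LastViaLastIndexOf(source);
        sw.Stop();
        Console.WriteLine("LastIndexOf: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

The timings depend on the machine and runtime, so run it against realistic inputs before drawing conclusions.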

I would try
string source = "Test/Company/Business/Department/Logs.tvs/v1";
var components = source.Split('/').Reverse().Take(2);
String last = string.Empty;
var enumerable = components as string[] ?? components.ToArray();
if (enumerable.Count() == 2)
last = enumerable.FirstOrDefault();
var secondLast = enumerable.LastOrDefault();
Hope this will help

you can retrieve the last two words using the process as below:
string source = "Test/Company/Business/Department/Logs.tvs/v1";
String[] partsOfSource = source.Split('/');
// Need at least two parts to print the last two
if (partsOfSource.Length >= 2)
for (int i = partsOfSource.Length - 2; i <= partsOfSource.Length - 1; i++)
Console.WriteLine(partsOfSource[i]);

How to extract string at a certain character that is repeated within string?

How can I get "MyLibrary.Resources.Images.Properties" and "Condo.gif" from a "MyLibrary.Resources.Images.Properties.Condo.gif" string.
I also need it to be able to handle something like "MyLibrary.Resources.Images.Properties.legend.House.gif" and return "House.gif" and "MyLibrary.Resources.Images.Properties.legend".
IndexOf LastIndexOf wouldn't work because I need the second to last '.' character.
Thanks in advance!
UPDATE
Thanks for the answers so far, but I really need it to be able to handle different namespaces. So what I'm really asking is: how do I split on the second-to-last occurrence of a character in a string?

You can use LINQ to do something like this:
string target = "MyLibrary.Resources.Images.Properties.legend.House.gif";
var elements = target.Split('.');
const int NumberOfFileNameElements = 2;
string fileName = string.Join(
".",
elements.Skip(elements.Length - NumberOfFileNameElements));
string path = string.Join(
".",
elements.Take(elements.Length - NumberOfFileNameElements));
This assumes that the file name part only contains a single . character, so to get it you skip the number of remaining elements.

You can either use a Regex or String.Split with '.' as the separator and return the second-to-last + '.' + last pieces.
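The String.Split variant could look roughly like this. It is a sketch that assumes the file name is always exactly the last two dot-separated pieces; the class, method, and tuple field names are invented for the example.

```csharp
using System;

public static class ResourceName
{
    // Splits "A.B.C.name.ext" into ("A.B.C", "name.ext").
    public static (string Prefix, string FileName) SplitFileName(string resource)
    {
        string[] parts = resource.Split('.');
        if (parts.Length < 3)
            throw new FormatException("Expected at least two '.' characters: " + resource);

        // Second-to-last + '.' + last piece form the file name.
        string fileName = parts[parts.Length - 2] + "." + parts[parts.Length - 1];
        // Everything before those two pieces is the namespace prefix.
        string prefix = string.Join(".", parts, 0, parts.Length - 2);
        return (prefix, fileName);
    }

    public static void Main()
    {
        var (prefix, fileName) = SplitFileName("MyLibrary.Resources.Images.Properties.legend.House.gif");
        Console.WriteLine(prefix);   // MyLibrary.Resources.Images.Properties.legend
        Console.WriteLine(fileName); // House.gif
    }
}
```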

You can look for IndexOf("MyLibrary.Resources.Images.Properties."), add "MyLibrary.Resources.Images.Properties.".Length to that, and then call .Substring(..) from that position.

If you know exactly what you're looking for, and it's trailing, you could use string.EndsWith. Something like
if("MyLibrary.Resources.Images.Properties.Condo.gif".EndsWith("Condo.gif"))
If that's not the case, check out regular expressions. Then you could do something like
if(Regex.IsMatch("MyLibrary.Resources.Images.Properties.Condo.gif", @"Condo\.gif$"))
Or a more generic way: split the string on '.' then grab the last two items in the array.

string input = "MyLibrary.Resources.Images.Properties.legend.House.gif";
//if string isn't already validated, make sure there are at least two
//periods here or you'll error out later on.
int index = input.LastIndexOf('.', input.LastIndexOf('.') - 1);
string first = input.Substring(0, index);
string second = input.Substring(index + 1);

Try splitting the string into an array, by separating it by each '.' character.
You will then have something like:
{"MyLibrary", "Resources", "Images", "Properties", "legend", "House", "gif"}
You can then take the last two elements.

Just break down and do it in a char loop:
int NthLastIndexOf(string str, char ch, int n)
{
if (n <= 0) throw new ArgumentException();
for (int idx = str.Length - 1; idx >= 0; --idx)
if (str[idx] == ch && --n == 0)
return idx;
return -1;
}
This is less expensive than trying to coax it using string splitting methods and isn't a whole lot of code.
string s = "1.2.3.4.5";
int idx = NthLastIndexOf(s, '.', 3);
string a = s.Substring(0, idx); // "1.2"
string b = s.Substring(idx + 1); // "3.4.5"
