Find all occurrences of substrings matching a pattern

I would like to use C# to extract from a single string all substrings that match this pattern: whitespace followed by any text.
For example, given the string “This is a very short sentence”, I want to obtain 5 strings:
“is a very short sentence”
“a very short sentence”
“very short sentence”
“short sentence”
“sentence”
As the example shows, the substrings should not include the leading whitespace. Being able to access each string by index would also be great.
I tried to use a regex, but I could not get past the first match.
Please help.

Using Split and some Linq:
string text2 = "This is a very short sentence";
// Get all words except first one
var parts = text2.Split(' ').Skip(1);
// Generate various combinations
var result = Enumerable.Range(0, parts.Count())
    .Select(i => string.Join(" ", parts.Skip(i)));
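Since the question also asks for access by index, here is a minimal sketch of the same Split and LINQ idea wrapped in a method that returns a List<string>. The class and method names (SuffixDemo, SuffixesAfterSpaces) are made up for illustration, and the sketch assumes words are separated by single spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuffixDemo
{
    // Returns every suffix of the text that starts right after a space.
    public static List<string> SuffixesAfterSpaces(string text)
    {
        // Materialize the words once so they are not re-split on every enumeration.
        string[] words = text.Split(' ').Skip(1).ToArray();
        return Enumerable.Range(0, words.Length)
            .Select(i => string.Join(" ", words.Skip(i)))
            .ToList();
    }

    public static void Main()
    {
        List<string> result = SuffixesAfterSpaces("This is a very short sentence");
        Console.WriteLine(result.Count); // 5
        Console.WriteLine(result[0]);    // is a very short sentence
        Console.WriteLine(result[4]);    // sentence
    }
}
```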

Try a loop with the Substring method:
string inputStr = "This is a very short sentence";
List<string> subStringList = new List<string>();
while (inputStr.IndexOf(' ') != -1)
{
    // Drop everything up to and including the next space
    inputStr = inputStr.Substring(inputStr.IndexOf(' ') + 1);
    subStringList.Add(inputStr);
}
Console.WriteLine(String.Join("\n", subStringList));
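The question mentions that a regex attempt got stuck on the first match. That usually happens because a match consumes the rest of the string. One possible way around it, offered here as a sketch and not taken from the answers above, is to match a zero-width position and capture the remainder inside a lookahead. The names RegexSuffixDemo and SuffixesAfterWhitespace are hypothetical, and the input is assumed to be a single line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexSuffixDemo
{
    public static List<string> SuffixesAfterWhitespace(string text)
    {
        // (?<=\s)    : the position must be preceded by a whitespace character
        // (?=(\S.*)) : look ahead and capture the rest of the line without consuming it,
        //              so the next match can start inside the text captured here
        return Regex.Matches(text, @"(?<=\s)(?=(\S.*))")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        foreach (string s in SuffixesAfterWhitespace("This is a very short sentence"))
        {
            Console.WriteLine(s);
        }
    }
}
```

Because each match is empty and only the capture group holds text, the matches can overlap, which an ordinary consuming pattern cannot do.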

Related

Substring until space

I have string like this:
Some data of the string Job ID_Of_the_job some other data of the string
I need to get this ID_Of_the_job
I have this stored in a string variable named notes:
intIndex = notes.IndexOf("Job ")
strJob = notes.Substring(intIndex+4, ???)
I don't know how to get the length of this job ID.
Thanks for help,
Marc

Since you're already using string.IndexOf, here's a solution which builds on that.
Note that there's an overload of String.IndexOf which takes a parameter saying where to start searching.
We've managed to find the beginning of the Job ID, by doing:
int startIndex = notes.IndexOf("Job ") + "Job ".Length;
startIndex is the index of the "I" in "ID_Of_the_job".
We can then use IndexOf again to find the next space -- which will be the space following "ID_Of_the_job":
int endIndex = notes.IndexOf(" ", startIndex);
We can then use Substring:
string jobId = notes.Substring(startIndex, endIndex - startIndex);
Note that there's no error-handling here: if either of the IndexOf fails to find the thing you're looking for, it will return -1, and your code will do strange things. It would be a good idea to handle these cases!
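One way the missing error handling could look is sketched below. The helper name TryGetJobId is hypothetical, and treating an ID that runs to the end of the string as valid is an assumption about what you want.

```csharp
using System;

public static class JobIdDemo
{
    // Returns true and sets jobId when "Job " is followed by a non-empty token.
    public static bool TryGetJobId(string notes, out string jobId)
    {
        const string marker = "Job ";
        jobId = string.Empty;

        int markerIndex = notes.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0)
        {
            return false; // marker not present
        }

        int startIndex = markerIndex + marker.Length;
        int endIndex = notes.IndexOf(' ', startIndex);
        if (endIndex < 0)
        {
            endIndex = notes.Length; // assumption: the ID may run to the end of the string
        }

        jobId = notes.Substring(startIndex, endIndex - startIndex);
        return jobId.Length > 0;
    }

    public static void Main()
    {
        string notes = "Some data of the string Job ID_Of_the_job some other data of the string";
        if (TryGetJobId(notes, out string jobId))
        {
            Console.WriteLine(jobId); // ID_Of_the_job
        }
    }
}
```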
Another, terser solution is to use Regex.
string jobId = Regex.Match(notes, @"Job (\S+)").Groups[1].Value;
The regular expression Job (\S+) looks for the text "Job ", followed by 1 or more non-whitespace characters. It puts those non-whitespace characters into a capture group (which becomes Groups[1]), which we can read out.
In this case, jobId will be an empty string if the regex doesn't match.
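If an empty string is too easy to confuse with a real result, you can check Match.Success explicitly. This is only a sketch; the method name FindJobId and the choice to return null on failure are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JobRegexDemo
{
    // Returns the job ID, or null when the text contains no "Job <id>" pair.
    public static string FindJobId(string notes)
    {
        Match match = Regex.Match(notes, @"Job (\S+)");
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FindJobId("Some data Job ID_Of_the_job more data")); // ID_Of_the_job
        Console.WriteLine(FindJobId("nothing to see") == null);                // True
    }
}
```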

I think I'd make life easy, split the string on spaces and take the string after the array slot that had Job in it:
var notes = "Some data of the string Job ID_Of_the_job some other data of the string";
var bits = notes.Split();
var job = bits[Array.IndexOf(bits, "Job") + 1]; // arrays have no instance IndexOf, so use the static Array.IndexOf
If you know the job number will occur within the first 10 (say) words, you can stop splitting after a certain number of parts, with e.g. Split(new[]{' '}, 10). This gives the first 9 words and then the rest of the string in the 10th slot, which could be a useful performance boost.
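Here is a small sketch of the count-limited split. The helper name FindJob is hypothetical. It returns null when "Job" is not found among the fully split words, or when the slot after it is the unsplit remainder, which is an assumption about how you would want to treat those cases.

```csharp
using System;

public static class LimitedSplitDemo
{
    public static string FindJob(string notes, int maxParts)
    {
        // At most maxParts elements; the last one holds whatever was not split.
        string[] bits = notes.Split(new[] { ' ' }, maxParts);
        int index = Array.IndexOf(bits, "Job");
        if (index < 0 || index + 1 >= bits.Length)
        {
            return null;
        }

        // If the following slot is the unsplit remainder it still contains spaces.
        string candidate = bits[index + 1];
        return candidate.Contains(" ") ? null : candidate;
    }

    public static void Main()
    {
        string notes = "Some data of the string Job ID_Of_the_job some other data of the string";
        Console.WriteLine(FindJob(notes, 10));        // ID_Of_the_job
        Console.WriteLine(FindJob(notes, 3) == null); // True: "Job" lies beyond the split limit
    }
}
```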
You could also pull this fairly easily with regex:
var r = new Regex("Job (?<j>[^ ]+)");
var m = r.Match(notes);
var job = m.Groups["j"].Value;
If you can more accurately define the format of a job number e.g. "it's between 2-3 digits, then a underscore, slash or hyphen, followed by 4 digits", then you don't even have to use Job to locate it, you can put the pattern into the regex:
var r = new Regex(@"(?<j>\d{2,3}[-_/]\d{4})");
That will pick out a string of the given pattern (\d digits {2 to 3 of}, then [hyphen or underscore or slash], then \d digits {4 of}).
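To illustrate a format-based pattern like this, here is a sketch run against a couple of made-up inputs. The job number format, the sample strings, and the name FindJobNumber are all hypothetical; substitute whatever your real format is.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JobNumberDemo
{
    // Hypothetical format: 2-3 digits, then a hyphen, underscore or slash, then 4 digits.
    private static readonly Regex JobNumber = new Regex(@"(?<j>\d{2,3}[-_/]\d{4})");

    // Returns the first job number found, or an empty string if there is none.
    public static string FindJobNumber(string notes)
    {
        return JobNumber.Match(notes).Groups["j"].Value;
    }

    public static void Main()
    {
        Console.WriteLine(FindJobNumber("Order 123-4567 shipped")); // 123-4567
        Console.WriteLine(FindJobNumber("ref 12/3456 pending"));    // 12/3456
    }
}
```

Note that without anchors such as \b the pattern will also match inside a longer run of digits, so you may want to tighten it for real data.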

The first step you already did: find the string "Job ". The second step is to split the remainder by ' ' and take the first piece, which is the ID.
var input = "Some data of the string Job ID_Of_the_job some other data of the string";
Console.WriteLine(input.Substring(input.IndexOf("Job") + 4).Split(' ')[0]);

C# Extract part of the string that starts with specific letters

I have a string which I extract from an HTML document like this:
var elas = htmlDoc.DocumentNode.SelectSingleNode("//a[@class='a-size-small a-link-normal a-text-normal']");
if (elas != null)
{
//
_extractedString = elas.Attributes["href"].Value;
}
The HREF attribute contains this part of the string:
gp/offer-listing/B002755TC0/
I'm trying to extract the B002755TC0 value, but the string varies in length, so I cannot simply use the Substring method with fixed positions.
Instead, I was wondering whether there's a clever way to do this, perhaps by matching the beginning of the part I'm searching for?
For example, I know for a fact that each href has the structure I've shown, so I would simply match this keyword:
offer-listing/
So I would find this keyword and then extract the part of the string after it (B002755TC0) up to the next "/" character?
Can someone help me out with this?

This is a perfect job for a regular expression:
string text = "gp/offer-listing/B002755TC0/";
Regex pattern = new Regex(@"offer-listing/(\w+)/");
Match match = pattern.Match(text);
string whatYouAreLookingFor = match.Groups[1].Value;
Explanation: we just match the exact pattern you need.
'offer-listing/'
followed by any combination of (at least one) 'word characters' (letters, digits, underscore),
followed by a slash.
The parentheses () mean 'capture this group' (so we can extract it later with match.Groups[1]).
EDIT: if you also want to extract from this: /dp/B01KRHBT9Q/
Then you could use this pattern:
Regex pattern = new Regex(@"/(\w+)/$");
which will match both this string and the previous one. The $ stands for the end of the string, so this literally means:
capture the characters in between the last two slashes of the string
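As a quick check, here is a sketch that runs both patterns against both sample hrefs. The class and method names are made up for illustration, and both methods return an empty string when nothing matches.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HrefDemo
{
    // Captures the segment that follows "offer-listing/".
    public static string AfterOfferListing(string href)
    {
        return Regex.Match(href, @"offer-listing/(\w+)/").Groups[1].Value;
    }

    // Captures the segment between the last two slashes.
    public static string LastSegment(string href)
    {
        return Regex.Match(href, @"/(\w+)/$").Groups[1].Value;
    }

    public static void Main()
    {
        Console.WriteLine(AfterOfferListing("gp/offer-listing/B002755TC0/")); // B002755TC0
        Console.WriteLine(LastSegment("gp/offer-listing/B002755TC0/"));       // B002755TC0
        Console.WriteLine(LastSegment("/dp/B01KRHBT9Q/"));                    // B01KRHBT9Q
        Console.WriteLine(AfterOfferListing("/dp/B01KRHBT9Q/").Length);       // 0 (no match)
    }
}
```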

Though there is already an accepted answer, I thought I'd share another solution that does not use Regex. Find the position of your pattern in the input and add its length; the wanted text starts at that index. To find the end, search for the first "/" after the beginning of the wanted text:
string input = "gp/offer-listing/B002755TC0/";
string pat = "offer-listing/";
int beginning = input.IndexOf(pat) + pat.Length;
int end = input.IndexOf("/", beginning);
string result = input.Substring(beginning, end - beginning);
If your desired output is always the last piece, you could also use split and get the last non-empty piece:
string result2 = input.Split(new string[] { "/" }, StringSplitOptions.RemoveEmptyEntries)
    .Last();

Extracting only the substring containing letters from a string containing digits strings and symbols

I have a string that is like the following:
string str = "hello_16_0_2016";
What I want is to extract hello from the string. The string is autogenerated, so the letters can occur anywhere in it and I cannot rely on a fixed position.
For example, I could take the first five characters of the string above and store them in a new variable.
But since the position of the letters is random and I want to extract only the letters, can someone guide me to the correct way to do this?

Could you just use a simple regular expression to pull out only alphabetic characters, assuming you only need a-z?
using System.Text.RegularExpressions;
var str = "hello_16_0_2016";
var onlyLetters = Regex.Replace(str, @"[^a-zA-Z]", "");
// onlyLetters = "hello"

I'd use something like this (uses Linq):
var str = "hello_16_0_2016";
var result = string.Concat(str.Where(char.IsLetter));
Or, if performance is a concern (because you have to do this on a tight loop, or have to convert hundreds of thousands of strings), it'd probably be faster to do:
var result = new string(str.Where(char.IsLetter).ToArray());

But as occurring of letters is random and I want to extract only
letters from the string above, so can someone guide me to the correct
procedure to do this?
The following will extract the first text, without numbers anywhere in the string:
Console.WriteLine(Regex.Match("hello_16_0_2016", @"[A-Za-z]+").Value); // "hello"
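The three approaches in these answers agree on hello_16_0_2016 but differ on other inputs, so it is worth knowing which behavior you want. The sketch below compares them; the method names and the extra sample strings are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LetterExtractionDemo
{
    // Removes everything that is not an ASCII letter.
    public static string KeepAsciiLetters(string s) => Regex.Replace(s, @"[^a-zA-Z]", "");

    // Keeps every character that char.IsLetter accepts, which includes non-ASCII letters.
    public static string KeepUnicodeLetters(string s) => string.Concat(s.Where(char.IsLetter));

    // Returns only the first contiguous run of ASCII letters.
    public static string FirstAsciiLetterRun(string s) => Regex.Match(s, @"[A-Za-z]+").Value;

    public static void Main()
    {
        string twoRuns = "ab_12_cd";
        Console.WriteLine(KeepAsciiLetters(twoRuns));    // abcd
        Console.WriteLine(KeepUnicodeLetters(twoRuns));  // abcd
        Console.WriteLine(FirstAsciiLetterRun(twoRuns)); // ab

        string accented = "h\u00e9llo_16"; // "héllo_16"
        Console.WriteLine(KeepAsciiLetters(accented).Length);   // 4 (the accented letter is dropped)
        Console.WriteLine(KeepUnicodeLetters(accented).Length); // 5
    }
}
```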

Looking for the simplest way to extract two strings from another in C#

I have the following strings:
string a = "1. testdata";
string b = "12. testdata xxx";
What I would like is to extract the number into one string and the characters following the number into another. I tried using .IndexOf(".") followed by Remove, Trim and Substring. If possible, I would like to find something simpler, as I have to do this in a lot of places in my code.

If the format is always the same, you could do:
string[] parts = a.Split('.');

The solutions proposed so far are not correct.
First, after Split('.') or Split(".") the second substring will start with a space.
Second, if the string contains more than one dot, you'll have to do more work after the split.
A more robust solution is below:
string a = "11. Test string. With dots.";
var res = a.Split(new[] {". "}, 2, StringSplitOptions.None);
string number = res[0];
string val = res[1];
The argument 2 specifies the maximum number of strings to return. Thus, when the string contains several dots, the split happens only at the first ". ".
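Since the question says this is needed in many places, it may help to wrap the split in a helper that also reports malformed input. This is a sketch only: the name TrySplitNumbered is hypothetical, and parsing the number to an int is an assumption (the question asked for a string, so drop the int.TryParse if you want the raw text).

```csharp
using System;

public static class NumberedLineDemo
{
    // Splits "12. some text" into 12 and "some text"; returns false if the shape is wrong.
    public static bool TrySplitNumbered(string line, out int number, out string text)
    {
        number = 0;
        text = string.Empty;

        // Split only at the first ". " so later dots stay in the text part.
        string[] parts = line.Split(new[] { ". " }, 2, StringSplitOptions.None);
        if (parts.Length != 2 || !int.TryParse(parts[0], out number))
        {
            return false;
        }

        text = parts[1];
        return true;
    }

    public static void Main()
    {
        if (TrySplitNumbered("11. Test string. With dots.", out int number, out string text))
        {
            Console.WriteLine(number); // 11
            Console.WriteLine(text);   // Test string. With dots.
        }
    }
}
```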

string[] list = a.Split('.');
string numbers = list[0];
string chars = list[1];

Please tell me what the problem is c# regex.split()

string temp_constraint = row["Constraint_Name"].ToString();
string split_string = "FK_"+tableName+"_";
string[] words = Regex.Split(temp_constraint, split_string);
I am trying to split a string using another string:
temp_constraint = FK_ss_foo_ss_fee
split_string = FK_ss_foo_
But it returns a one-element array containing the same string as temp_constraint.
Please help.

Your split operation works fine for me:
string temp_constraint = "FK_ss_foo_ss_fee";
string split_string = "FK_ss_foo_";
string[] words = Regex.Split(temp_constraint, split_string);
foreach (string word in words)
{
    Console.WriteLine(">{0}<", word);
}
Output:
><
>ss_fee<
I think the problem is that your variables are not set to what you think they are. You will need to debug to find the error elsewhere in your program.
I would also avoid using Split for this (both Regex and String.Split). You aren't really splitting the input - you are removing a string from the start. Split might not always do what you want. Imagine if you have a foreign key like the following:
FK_ss_foo_ss_fee_FK_ss_foo_ss_bee
You want to get ss_fee_FK_ss_foo_ss_bee but split would give you ss_fee_ and ss_bee. This is a contrived example, but it does demonstrate that what you are doing is not a split.
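If the goal really is to remove a known prefix, a sketch along these lines avoids splitting altogether. The helper name RemovePrefix is hypothetical, and returning the input unchanged when the prefix is absent is an assumption about the desired behavior.

```csharp
using System;

public static class PrefixDemo
{
    // Strips the prefix once from the start; later occurrences are left alone.
    public static string RemovePrefix(string value, string prefix)
    {
        return value.StartsWith(prefix, StringComparison.Ordinal)
            ? value.Substring(prefix.Length)
            : value;
    }

    public static void Main()
    {
        string prefix = "FK_ss_foo_";
        Console.WriteLine(RemovePrefix("FK_ss_foo_ss_fee", prefix));                  // ss_fee
        Console.WriteLine(RemovePrefix("FK_ss_foo_ss_fee_FK_ss_foo_ss_bee", prefix)); // ss_fee_FK_ss_foo_ss_bee
    }
}
```

Unlike Regex.Split, this treats the prefix as plain text, so characters that are special in regular expressions need no escaping.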

You should use String.Split instead:
string[] words =
    temp_constraint.Split(new[] { split_string }, StringSplitOptions.None);

String.Split with a character array splits on each individual character, which is often not ideal.
The following article shows how to split text by an entire word:
http://www.bytechaser.com/en/functions/ufgr7wkpwf/split-text-by-words-and-not-character-arrays.aspx
