Extracting a string before space and sequence of numbers - c#

I have a string that has numbers dash and numbers so it can be
1-2
234-45
23-8
It can be any sequence of digits up to 12 characters long.
All of these numbers are preceded by a string. I need to extract that string before the sequence begins.
This is a Test1 1-2
This is a test for the first time 234-45
This is a test that is good 23-8
so I need to extract
This is a Test1
This is a test for the first time
This is a test that is good
there is only one space between this string and the sequence.
Is there any way I can extract that string? The Split method is not working here.
I forgot to mention that I have numbers/test before the string too so it can be
2123 This is a test for the first time 23-456
or
Ac23 This is a test for the first time 23-457
any help will be appreciated.

Here's one way:
var sample = "2123 This is a Test1 1-2";
// Find the first occurrence of a space, and record the position of
// the next letter
var start = sample.IndexOf(' ') + 1;
// Pull from the string everything starting with the index found above
// to the last space (accounting for the difference from the starting index)
var text = sample.Substring(start, sample.LastIndexOf(' ') - start);
After this, text should equal:
This is a Test1
Wrap it up in a nice little function and send your collection of strings through it:
string ParseTextFromLine(string input)
{
var start = input.IndexOf(' ') + 1;
return input.Substring(start, input.LastIndexOf(' ') - start);
}
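The function above throws an ArgumentOutOfRangeException when the line has fewer than two spaces (for example "Test1 1-2"). A minimal sketch of a guarded variant follows; the class name LineParser and the method name TryParseTextFromLine are made up for this illustration and do not come from the answer.

```csharp
using System;

public static class LineParser
{
    // Hypothetical helper: extracts the text between the first and the last
    // space. Returns false instead of throwing when the line does not
    // contain two distinct spaces to cut between.
    public static bool TryParseTextFromLine(string input, out string text)
    {
        text = string.Empty;
        if (string.IsNullOrEmpty(input))
        {
            return false;
        }

        int first = input.IndexOf(' ');
        int last = input.LastIndexOf(' ');
        if (first < 0 || last <= first)
        {
            return false; // zero or one space: nothing to extract
        }

        int start = first + 1;
        text = input.Substring(start, last - start);
        return true;
    }
}
```

Note that this still assumes every line starts with a leading token, as in the "2123 ..." and "Ac23 ..." samples; for lines without one, the first word would be cut off.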

This is pretty easy,
string s = "This is a Test1 1-2";
s = s.Substring(0, s.LastIndexOf(" "));
and now s will be "This is a Test1"

Related

Substring until space

I have string like this:
Some data of the string Job ID_Of_the_job some other data of the string
I need to get this ID_Of_the_job
I have this stored in a string variable named notes:
intIndex = notes.IndexOf("Job ")
strJob = notes.Substring(intIndex+4, ???)
I don't know how to get the length of this job ID.
Thanks for help,
Marc
Since you're already using string.IndexOf, here's a solution which builds on that.
Note that there's an overload of String.IndexOf which takes a parameter saying where to start searching.
We've managed to find the beginning of the Job ID, by doing:
int startIndex = notes.IndexOf("Job ") + "Job ".Length;
startIndex is the index of the "I" in "ID_Of_the_job".
We can then use IndexOf again to find the next space -- which will be the space following "ID_Of_the_job":
int endIndex = notes.IndexOf(" ", startIndex);
We can then use Substring:
string jobId = notes.Substring(startIndex, endIndex - startIndex);
Note that there's no error-handling here: if either of the IndexOf fails to find the thing you're looking for, it will return -1, and your code will do strange things. It would be a good idea to handle these cases!
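One way to handle those -1 cases is sketched below. The names JobIdParser and TryGetJobId are hypothetical, and treating an ID at the very end of the string as valid is an assumption about what you want.

```csharp
using System;

public static class JobIdParser
{
    // Hypothetical helper: returns false when "Job " is missing or when
    // nothing follows it, instead of letting Substring throw.
    public static bool TryGetJobId(string notes, out string jobId)
    {
        jobId = string.Empty;
        const string marker = "Job ";

        int markerIndex = notes.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0)
        {
            return false; // "Job " not found
        }

        int startIndex = markerIndex + marker.Length;
        int endIndex = notes.IndexOf(' ', startIndex);
        if (endIndex < 0)
        {
            endIndex = notes.Length; // assumption: the ID may run to the end
        }

        jobId = notes.Substring(startIndex, endIndex - startIndex);
        return jobId.Length > 0;
    }
}
```

A caller can then branch on the return value rather than checking for an empty or garbage result.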
Another, terser solution is to use Regex.
string jobId = Regex.Match(notes, @"Job (\S+)").Groups[1].Value;
The regular expression Job (\S+) looks for the text "Job ", followed by 1 or more non-whitespace characters. It puts those non-whitespace characters into a capture group (which becomes Groups[1]), which we can read out.
In this case, jobId will be an empty string if the regex doesn't match.
See these working on dotnetfiddle.
I think I'd make life easy, split the string on spaces and take the string after the array slot that had Job in it:
var notes = "Some data of the string Job ID_Of_the_job some other data of the string";
var bits = notes.Split();
var job = bits[Array.IndexOf(bits, "Job") + 1]; // arrays have no instance IndexOf, so use Array.IndexOf
If you're on a recent .net and know the job number will occur within the first 10 (say) words, then you can stop splitting after a certain number of words, with e.g. Split(new[]{' '}, 10) - this gives the first 9 words then the rest of the string in the 10th slot which could be a useful performance boost
You could also pull this fairly easily with regex:
var r = new Regex("Job (?<j>[^ ]+)"); // greedy: a lazy +? at the end of the pattern would capture only one character
var m = r.Match(notes);
var job = m.Groups["j"].Value;
If you can more accurately define the format of a job number e.g. "it's between 2-3 digits, then a underscore, slash or hyphen, followed by 4 digits", then you don't even have to use Job to locate it, you can put the pattern into the regex:
var r = new Regex(@"(?<j>\d{2,3}[-_\\]\d{4})");
That will pick out a string of the given pattern (\digits {2 to 3 of}, then [hyphen or underscore or slash], then \digits {4 of}).. For example
First step you already did: find the string "Job id ". Second step is to split result by ' ' to extract id.
var input = "Some data of the string Job ID_Of_the_job some other data of the string";
Console.WriteLine(input.Substring(input.IndexOf("Job") + 4).Split(' ')[0]);
Fiddle.

Remove all characters after the fourth space in a string

I want to remove all characters after the fourth space in a string.
Example:
Source: AAD BCCD QWD SDKE DJQWEK DJT
Result: AAD BCCD QWD SDKE
I tried to use String.IndexOf, but I couldn't get it to work.
Here is my code:
Result = source.Substring(source.IndexOf(string.Empty, source.IndexOf(string.Empty) + 3));
You could try this:
string result = string.Join(" ", source.Split(' ').Take(4));
This splits the original source string at each space character, takes the first 4 occurrences and concatenates them with a space character.
It will also work correctly when the source string contains fewer than four spaces.
Maybe try this (if it's still relevant, of course). GetNthIndex is a helper you would have to write yourself; it is not part of .NET:
string Source = "AAD BCCD QWD SDKE DJQWEK DJT";
int space = GetNthIndex(Source, ' ', 4);
string result = Source.Substring(0, space);
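Since GetNthIndex is not a built-in string method, here is one possible sketch of it. The class name StringSearch and the choice to return -1 when there are fewer than n occurrences are assumptions, not something the answer specifies.

```csharp
using System;

public static class StringSearch
{
    // Hypothetical helper: returns the index of the n-th (1-based)
    // occurrence of c in s, or -1 if s contains fewer than n occurrences.
    public static int GetNthIndex(string s, char c, int n)
    {
        int index = -1;
        for (int i = 0; i < n; i++)
        {
            // Continue the search just past the previous hit.
            index = s.IndexOf(c, index + 1);
            if (index < 0)
            {
                return -1;
            }
        }
        return index;
    }
}
```

Because Substring(0, -1) throws, a caller should check for -1 and keep the whole string in that case.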
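You can also loop over the characters with a counter and stop when you reach the fourth space:
// requires: using System.Text;
var output = new StringBuilder();
int counter = 0;
foreach (char c in source)
{
    // Count spaces; stop before copying the fourth one
    if (c == ' ' && ++counter == 4)
        break;
    output.Append(c);
}
string result = output.ToString();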

String methods cutting out string parts

I've built a string builder to add spaces into text if it is capital. The sentence entered would look like this : "ThisIsASentence." Since it starts with a capital, the string builder would modify the sentence to look like this: " This Is A Sentence."
My problem is, If I were to have a sentence like "thisIsASentence." the string builder will separate the sentence like normal : " this Is A Sentence."
Still both have a space in front of the first character.
When the sentence runs through this line:
result = result.Substring(1, 1).ToUpper() + result.Substring(2).ToLower();
If the first letter entered was lowercase, it gets cut off and the second letter becomes uppercase.
The line was meant to keep the first letter entered capitalized and set the rest lowercase.
Adding a trim statement before running that line changes nothing with the output.
Here is my overall code right now:
private void btnChange_Click(object sender, EventArgs e)
{
// New string named sentence, assigning the text input to sentence.
string sentence;
sentence = txtSentence.Text;
// String builder to let us modify string
StringBuilder sentenceSB = new StringBuilder();
/*
* For every character in the string "sentence" if the character is uppercase,
* add a space before the letter,
* if it isn't, add nothing.
*/
foreach (char c in sentence)
{
if (char.IsUpper(c))
{
sentenceSB.Append(" ");
}
sentenceSB.Append(c);
}
// Store the edited sentence into the "result" string
string result = sentenceSB.ToString();
// Starting at the 2nd spot and going 1, makes the first character capitalized
// Starting at position 3 and going to end change them to lower case.
result = result.Substring(1, 1).ToUpper() + result.Substring(2).ToLower();
// set the label text to equal "result" and set it visible.
lblChanged.Text = result.ToString();
lblChanged.Visible = true;
When you run the code with "thisIsASentence", result will be "this Is A Sentence" after your foreach loop, since no space is inserted at the beginning.
The next line then takes the character at index 1 (the 'h' in "this"), makes it uppercase, and appends the rest of the string in lowercase, resulting in "His is a sentence".
To fix this, you can do result = result.Trim() after the loop, and then start at index 0, making the next line result = result.Substring(0, 1).ToUpper() + result.Substring(1).ToLower();
With result.Substring(1, 1), you are assuming the first letter of the input is always capitalized, so that your loop always adds a space at the beginning of the string. You have already seen that this isn't the case.
So I see basically two options for you:
Wrap that line in an if block that checks for spaces before replacing;
Capitalize the first letter of your input, if it's allowed by your spec.
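As a rough sketch of how the fixes above could fit together, the version below never emits the leading space in the first place, so index 0 is always the first letter. SentenceFormatter and SplitOnCapitals are illustrative names, not part of the question's code.

```csharp
using System.Text;

public static class SentenceFormatter
{
    // Hypothetical helper: inserts a space before each capital letter
    // (except at the very start), then capitalizes the first character
    // and lowercases the rest.
    public static string SplitOnCapitals(string sentence)
    {
        var sb = new StringBuilder();
        foreach (char c in sentence)
        {
            // Skip the space when nothing has been written yet.
            if (char.IsUpper(c) && sb.Length > 0)
            {
                sb.Append(' ');
            }
            sb.Append(c);
        }

        string result = sb.ToString();
        if (result.Length == 0)
        {
            return result;
        }
        return char.ToUpper(result[0]) + result.Substring(1).ToLower();
    }
}
```

In the click handler, lblChanged.Text could then be set to SentenceFormatter.SplitOnCapitals(txtSentence.Text).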

String split using C#

I have the following string:
string text = "1. This is first sentence. 2. This is the second sentence. 3. This is the third sentence. 4. This is the fourth sentence."
I want to split it according to 1. 2. 3. and so on:
result[0] == "This is first sentence."
result[1] == "This is the second sentence."
result[2] == "This is the third sentence."
result[3] == "This is the fourth sentence."
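Is there any way I can do it in C#?
Assuming that you can't encounter the pattern "X. " (an integer, followed by a period, followed by a space) inside your sentences, this should work:
String[] result = Regex.Split(text, @"[0-9]+\. ");
Note that because text starts with "1. ", the first element of result is an empty string, and every sentence except the last keeps its trailing space.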
is it possible that there will be numbers in the sentence too?
Since I don't know your formatting, and you already said you cannot split on EOL/new line, I would try something like...
List<string> lines = new List<string>();
string buffer = "";
int count = 1;
foreach(char c in input)
{
if(c.ToString() == count.ToString())
{
if(!string.IsNullOrEmpty(buffer))
{
lines.Add(buffer);
buffer = "";
}
count++;
}
buffer += c;
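}
// Add the final sentence, which is still in the buffer after the loop
if(!string.IsNullOrEmpty(buffer))
{
lines.Add(buffer);
}
//lines will now contain your split data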
You can then access each sentence like this...
string s1 = lines[0];
string s2 = lines[1];
string s3 = lines[2];
Important: Make sure you check the count of lines before getting sentence like...
string s1 = lines.Count > 0 ? lines[0] : "";
This makes the big assumption that a sentence will not contain the next line's number (i.e. sentence 2 will not contain the digit 3). Because it compares one character at a time, it also only works for single-digit numbers.
If this does not help, then provide your input in its original format (do not add line breaks if there are none).
EDIT: Fixed my code (wrong variable sorry)
int index = 1;
String[] result = Regex.Split(text, @"[0-9]+\. ").Where(i => !string.IsNullOrEmpty(i)).Select(i => (index++).ToString() + ". " + i).ToArray();
result will contain your sentences, including the "line number".
You could split on the '.' char and drop anything smaller than 2 char from the resulting array.
Of course, this relies on the fact that you would have no datapoints of 1 character other than the numeric indicator, if that was the case you could also check for it as a numeric value.
This approach would also drop the period from your sentences, so you'd have to add that back in. It takes a fair amount of manipulation, but it saves you from having to read each character and make a decision on it individually.
This is the easiest way:
var str = "1. This is first sentence." +
"2. This is the second sentence." +
"3. This is the third sentence." +
"n. This is the nenth sentence";
//set your max number e.g 10000
var num = Enumerable.Range(1, 10000).Select(x=>x.ToString()+".").ToArray();
var res=str.Split(num ,StringSplitOptions.RemoveEmptyEntries);
Hope this help ;)

How can get a substring from a string in C#?

I have a large string and it’s stored in a string variable, str. And I want to get a substring from that in C#.
Suppose the string is: " Retrieves a substring from this instance. The substring starts at a specified character position, "
The substring result what I want to display is: The substring starts at a specified character position.
You could do this manually or using the IndexOf method.
Manually:
int index = 43;
string piece = myString.Substring(index);
Using IndexOf, you can see where the full stop is:
int index = myString.IndexOf(".") + 1;
string piece = myString.Substring(index);
string newString = str.Substring(0, 10);
will give you the first 10 characters (from position 0 to position 9).
See String.Substring Method.
Here is an example of getting a substring from index 14 (the 15th character) to the end of the string. You can modify it to fit your needs.
string text = "Retrieves a substring from this instance. The substring starts at a specified character position.";
// Get substring where 14 is the start index
string substring = text.Substring(14);
Making the assumption that you want to split on the full stop (.), then here's an approach that would capture all occurrences:
// Add @ to the string to allow split over multiple lines
// (for display purposes to save the scroll bar from
// appearing on a Stack Overflow question :))
string strBig = @"Retrieves a substring from this instance.
The substring starts at a specified character position. great";
// Split the string on the full stop, if it has a length>0
// then, trim that string to remove any undesired spaces
IEnumerable<string> subwords = strBig.Split('.')
.Where(x => x.Length > 0).Select(x => x.Trim());
// Iterate around the new 'collection' to sanity check it
foreach (var subword in subwords)
{
Console.WriteLine(subword);
}
string text = "Retrieves a substring from this instance. The substring starts at a specified character position. Some other text";
string result = text.Substring(text.IndexOf('.') + 1, text.LastIndexOf('.') - text.IndexOf('.'));
This will cut out the part of the string that lies between the first and the last full stop (the last full stop is included).
A better solution using a range in C# 8:
string s = " Retrieves a substring from this instance. The substring starts at a specified character position, ";
string subString = s[43..^2]; // The substring starts at a specified character position
var data =" Retrieves a substring from this instance. The substring starts at a specified character position.";
var result = data.Split(new[] {'.'}, 1)[0];
Output:
Retrieves a substring from this instance. The substring starts at a
specified character position.
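The answers above call Substring, which allocates a new string each time. If performance matters, you can slice a span instead, which avoids the copy:
ReadOnlySpan<char> yourStringSpan = yourString.AsSpan();
int index = yourStringSpan.IndexOf('.') + 1;
ReadOnlySpan<char> piece = yourStringSpan.Slice(index); // call piece.ToString() only when you need a string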
It's easy to rewrite this code in C#...:
This method works if your value is between two substrings!
For example:
stringContent = "[myName]Alex[myName][color]red[color][etc]etc[etc]"
The calls should be:
myNameValue = SplitStringByASubstring(stringContent, "[myName]")
colorValue = SplitStringByASubstring(stringContent, "[color]")
etcValue = SplitStringByASubstring(stringContent, "[etc]")
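The body of SplitStringByASubstring is not shown here, so the following is only a guess at what such a method might look like in C#: it returns the text between the first two occurrences of the delimiter. The class name TagParser and the empty-string result for a missing delimiter are assumptions.

```csharp
using System;

public static class TagParser
{
    // Hypothetical reconstruction: returns the text between the first and
    // second occurrence of tag in content, or an empty string when the
    // tag does not occur twice.
    public static string SplitStringByASubstring(string content, string tag)
    {
        int start = content.IndexOf(tag, StringComparison.Ordinal);
        if (start < 0)
        {
            return string.Empty;
        }
        start += tag.Length;

        int end = content.IndexOf(tag, start, StringComparison.Ordinal);
        if (end < 0)
        {
            return string.Empty;
        }
        return content.Substring(start, end - start);
    }
}
```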
