How to extract range of characters from a string - c#

If I have a string such as the following:
String myString = "SET(someRandomName, \"hi\", u)";
where I know that "SET(" will always exist in the string, but the length of "someRandomName" is unknown, how would I go about deleting all the characters from "(" to the first instance of a double quote (")? To reiterate, I would like to delete this substring: "SET(someRandomName, \"" from myString.
How would I do this in C#.Net?
EDIT: I don't want to use regex for this.

Provided the string always has this structure, the easiest approach is to use String.IndexOf() to look up the index of the first occurrence of ". String.Substring() then gives you the appropriate portion of the original string.
Likewise, you can use String.LastIndexOf() to find the index of the first " from the end of the string. Then you will be able to extract just the value of the second argument ("hi" in your sample).
You will end up with something like this:
int begin = myString.IndexOf('"');
int end = myString.LastIndexOf('"');
string secondArg = myString.Substring(begin, end - begin + 1);
This will yield "\"hi\"" in secondArg.
UPDATE: To remove a portion of the string, use the String.Remove() method:
int begin = myString.IndexOf('(');
int end = myString.IndexOf('"');
string altered = myString.Remove(begin + 1, end - begin - 1);
This will yield "SET(\"hi\", u)" in altered.
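A minimal sketch that wraps the Remove approach above in a helper with guards for missing delimiters. The class and method names (ArgumentStripper, RemoveFirstArgument) are made up for illustration and are not part of the original answer.

```csharp
using System;

public static class ArgumentStripper
{
    // Removes everything between the first '(' and the first '"' that follows it.
    // Returns the input unchanged when either delimiter is missing.
    public static string RemoveFirstArgument(string input)
    {
        int open = input.IndexOf('(');
        if (open < 0) return input;

        int quote = input.IndexOf('"', open + 1);
        if (quote < 0) return input;

        // Keep the '(' itself and drop the characters up to (not including) the quote.
        return input.Remove(open + 1, quote - open - 1);
    }
}
```

Calling ArgumentStripper.RemoveFirstArgument(myString) on the sample string should give the same result as the snippet above, while a string without "(" or without a quote comes back untouched instead of throwing.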

I know it's been years, but .NET has also evolved in the meantime.
Consider using the range operator, in case anyone is looking here for an answer.
Assuming that SET( and \"hi\", u) have constant lengths (4 and 8 characters, not counting the escapes):
var sub = myString[4..^8];
myString = myString.Replace(sub, replaceValue);
More examples and a good explanation are in this article or, of course, in the Microsoft docs.
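As a hedged illustration of the range syntax, the sketch below slices the sample string with fixed offsets. It assumes C# 8 or later on a runtime that provides System.Range (for example .NET Core 3.0+), and it assumes the 4-character prefix and 8-character suffix from the question; the names RangeDemo, MiddlePart and WithoutMiddle are invented for this example.

```csharp
using System;

public static class RangeDemo
{
    // Characters between the 4-character prefix "SET(" and the 8-character suffix.
    // Throws if the input is shorter than 12 characters.
    public static string MiddlePart(string input)
    {
        return input[4..^8];
    }

    // Prefix and suffix glued together, i.e. the input with the middle part deleted.
    public static string WithoutMiddle(string input)
    {
        return input[..4] + input[^8..];
    }
}
```

Here 4.. counts from the start and ^8 counts from the end, so the start of a range must not lie after its end. Building the result from the two outer slices also avoids Replace, which would remove every occurrence of the middle text rather than just that one position.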

This is pretty awful, but this will accomplish what you want with a simple linq statement. Just presenting as an alternative to the IndexOf answers.
string myString = "SET(someRandomName, \"hi\", 0)";
string fixedStr = new String( myString.ToCharArray().Take( 4 ).Concat( myString.ToCharArray().SkipWhile( c => c != '"' ) ).ToArray() );
yields: SET("hi", 0)
Note: the Take is hard-coded to 4 characters; you could alter it to keep the leading characters listed in an array instead.

I assume you want to transform
SET(someRandomName, "hi", u)
into:
SET(u)
To achieve that, you can use:
String newString = "SET(" + myString.Substring(myString.LastIndexOf(',') + 1).Trim();
To explain this bit by bit:
myString.LastIndexOf(',')
will give you the index (position) of your last , character. Increment it by 1 to get the start index of the third argument in your SET function.
myString.Substring(myString.LastIndexOf(',') + 1)
The Substring method will eliminate all characters up to the specified position. In this case, we’re eliminating everything up to (and including) the last ,. In the example above, this would eliminate the SET(someRandomName, "hi", part, and leave us with u).
The Trim is necessary simply to remove the leading space character before your u.
Finally, we prepend SET( to our substring (since we had formerly removed it due to our Substring).
Edit: Based on your comment below (which contradicts what you asked in your question), you can use:
String newString = "SET(" + myString.Substring(myString.IndexOf(',') + 1).Trim();

Related

How to strip a string from the point a hyphen is found within the string C#

I'm currently trying to strip a string of data that may contain the hyphen symbol.
E.g. Basic logic:
string stringin = "test - 9894"; OR Data could be == "test";
if (string contains a hyphen "-"){
Strip stringin;
output would be "test" deleting from the hyphen.
}
Console.WriteLine(stringin);
The current C# code I'm trying to get to work is shown below:
string Details = "hsh4a - 8989";
var regexItem = new Regex("^[^-]*-?[^-]*$");
string stringin;
stringin = Details.ToString();
if (regexItem.IsMatch(stringin)) {
stringin = stringin.Substring(0, stringin.IndexOf("-") - 1); //Strip from the ending chars and - once - is hit.
}
Details = stringin;
Console.WriteLine(Details);
But it throws an error when the string does not contain any hyphens.
How about just doing this?
stringin.Split('-')[0].Trim();
You could even specify the maximum number of substrings using an overload of Split. Note that the count must be 2; a count of 1 returns the whole string unsplit.
stringin.Split(new[] { '-' }, 2)[0].Trim();
Your regex is asking for "zero or one repetition of -", which means that it matches even if your input does NOT contain a hyphen. Thereafter you do this
stringin.Substring(0, stringin.IndexOf("-") - 1)
Which throws an ArgumentOutOfRangeException: there is no hyphen to find, so IndexOf returns -1 and the requested length becomes -2.
Make a simple change to your regex and it works with or without - ask for "one or more hyphens":
var regexItem = new Regex("^[^-]*-+[^-]*$");
here -------------------------^
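To make the difference between the two patterns concrete, here is a small sketch that holds both as compiled Regex objects. The class and field names (HyphenPatterns, Optional, Required) are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class HyphenPatterns
{
    // Original pattern: "-?" means zero or one hyphen, so hyphen-free input matches too.
    public static readonly Regex Optional = new Regex("^[^-]*-?[^-]*$");

    // Corrected pattern: "-+" requires at least one hyphen.
    public static readonly Regex Required = new Regex("^[^-]*-+[^-]*$");
}
```

With these, HyphenPatterns.Optional.IsMatch("test") is true, which is why the Substring call was reached without a hyphen, while HyphenPatterns.Required.IsMatch("test") is false and HyphenPatterns.Required.IsMatch("test - 9894") is true.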
It seems that you want the (sub)string after the dash ('-') if the original contains one, or the original string if it doesn't have a dash.
If that's your case:
String Details = "hsh4a - 8989";
Details = Details.Substring(Details.IndexOf('-') + 1);
I wouldn't use regex for this case if I were you, it makes the solution much more complex than it can be.
For a string that I am sure will have no more than a couple of dashes, I would use this code, because it is a one-liner and very simple:
string str= entryString.Split(new [] {'-'}, StringSplitOptions.RemoveEmptyEntries)[0];
If you know that a string might contain a large number of dashes, this approach is not recommended: it will create a large number of strings even though you are looking for just the first one. So the solution would look something like this:
int firstDashIndex = entryString.IndexOf("-");
string str = firstDashIndex > -1? entryString.Substring(0, firstDashIndex) : entryString;
You don't need a regex for this. A simple IndexOf call will give you the index of the hyphen, then you can clean it up from there.
This is also a great place to start writing unit tests. They are very good for stuff like this.
Here's what the code could look like:
string inputString = "ho-something";
string outPutString = inputString;
var hyphenIndex = inputString.IndexOf('-');
if (hyphenIndex > -1)
{
outPutString = inputString.Substring(0, hyphenIndex);
}
return outPutString;
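The IndexOf approach above can be wrapped in a small helper that is easy to cover with unit tests, as suggested. This is only a sketch: the names HyphenStripper and BeforeHyphen are hypothetical, and trimming the result is an assumption based on the "test - 9894" example in the question.

```csharp
using System;

public static class HyphenStripper
{
    // Returns the text before the first '-', trimmed.
    // Input without a hyphen is returned whole (also trimmed).
    public static string BeforeHyphen(string input)
    {
        int hyphenIndex = input.IndexOf('-');
        string head = hyphenIndex >= 0 ? input.Substring(0, hyphenIndex) : input;
        return head.Trim();
    }
}
```

Cases worth turning into tests include an input with a hyphen ("test - 9894"), one without ("test"), and one that starts with a hyphen ("-abc", which yields an empty string).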

How do I Split string only at last occurrence of special character and use both sides after split

I want to split a string only at last occurrence of special character.
I try to parse a name of a tab from browser, so my initial string looks for example like this:
Untitled - Google Chrome
That is easy to solve as there is a Split function. Here is my implementation:
var pageparts= Regex.Split(inputWindow.ToString(), " - ");
InsertWindowName(pageparts[0].ToString(), pageparts[1].ToString());//method to save string into separate columns in DB
This works, but problem occurs, when I get a page like this:
SQL injection - Wikipedia, the free encyclopedia - Mozilla Firefox
Here are two dashes, which means, that after split is done, there are 3 separate strings in array and if I would continue normally, database would contain in first column value "SQL injection" and in second column value "Wikipedia, the free encyclopedia". Last value will be completely left out.
What I want is that first column in database will have value:
"SQL injection - Wikipedia, the free encyclopedia" and second column will have:
"Mozilla Firefox". Is that somehow possible?
I tried to use a Split(" - ").Last() function (even LastOrDefault() too), but then I only got a last string. I need to get both side of the original string. Just separated by last dash.
You can use String.Substring with String.LastIndexOf:
string str = "SQL injection - Wikipedia, the free encyclopedia - Mozilla Firefox";
int lastIndex = str.LastIndexOf('-');
if (lastIndex + 1 < str.Length)
{
string firstPart = str.Substring(0, lastIndex);
string secondPart = str.Substring(lastIndex + 1);
}
Create an extension method (or a simple method) to perform that operation, and also add some error checking for lastIndex, which is -1 when the delimiter is not found.
EDIT:
If you want to split on " - " (space-hyphen-space), then use the following to calculate lastIndex:
string str = "FirstPart - Mozzila Firefox-somethingWithoutSpace";
string delimiter = " - ";
int lastIndex = str.LastIndexOf(delimiter);
if (lastIndex + delimiter.Length < str.Length)
{
string firstPart = str.Substring(0, lastIndex);
string secondPart = str.Substring(lastIndex + delimiter.Length);
}
So for string like:
"FirstPart - Mozzila Firefox-somethingWithoutSpace"
Output would be:
FirstPart
Mozzila Firefox-somethingWithoutSpace
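The answer above suggests an extension method with error checking for lastIndex. One possible shape is sketched below; the names StringSplitExtensions and TrySplitAtLast are invented for this example, and it assumes a non-empty delimiter and ordinal (culture-insensitive) matching.

```csharp
using System;

public static class StringSplitExtensions
{
    // Splits 'input' at the last occurrence of 'delimiter'.
    // Returns false when the delimiter is absent; 'head' then holds the whole input
    // and 'tail' is empty.
    public static bool TrySplitAtLast(this string input, string delimiter,
                                      out string head, out string tail)
    {
        int lastIndex = input.LastIndexOf(delimiter, StringComparison.Ordinal);
        if (lastIndex < 0)
        {
            head = input;
            tail = string.Empty;
            return false;
        }

        head = input.Substring(0, lastIndex);
        tail = input.Substring(lastIndex + delimiter.Length);
        return true;
    }
}
```

For a window title, title.TrySplitAtLast(" - ", out page, out browser) fills both columns in one call, and the boolean result tells the caller whether a browser name was present at all.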
Please forgive the rough solution; I'm sure there is a better approach, but here is one proposal, assuming you are coding in C#.
Correct me if I have misread the question: you want two columns returned, the first holding all text before the last dash (even if it contains other dashes) and the second holding all text after the last dash. If so, let's do it.
// I only use Split when I want every piece in its own array position. In your case I assumed you want just 2 values (if possible), so Substring is enough.
static void Main(string[] args)
{
string firstname = "";
string lastName = "";
string variablewithdata = "SQL injection - Wikipedia, -the free encyclopedia - Mozilla Firefox";
// variablewithdata.LastIndexOf('-') returns the zero-based position of the last occurrence of that character.
// Check the result first: LastIndexOf returns -1 when the character is not found, so only call Substring when the value is not -1.
firstname = variablewithdata.Substring(0, (variablewithdata.LastIndexOf('-') - 1));
lastName = variablewithdata.Substring(variablewithdata.LastIndexOf('-') + 1);
Console.WriteLine("FirstColumn: {0} \nLastColumn:{1}",firstname,lastName);
Console.ReadLine();
}
If that's not what you want, can you explain what should be returned for, say, "SQL injection - Wikipedia,- the free - encyclopedia - Mozilla Firefox"?
Forgive the unclean code; I'm bored today.
If you don't care about reassembling strings, you could use something like :
var pageparts= Regex.Split(inputWindow.ToString(), " - ");
var firstPart = string.Join(" - ", pageparts.Take(pageparts.Length - 1));
var secondPart = pageparts.Last();
InsertWindowName(firstPart, secondPart);

Split values in arrays

I have a long string from which I want to store the keywords in an array or collection. The format of my string is like below:
Title: My Test Page Title.
Desc: My page description.
Keywords: Bessel function, legendre function, Differential Equations, Bessel, Legendre, Homogenous, Assignment & Maths Homework Help.
Bessel & Legendre Function:
Homogenous Equations of the second order of the type
+ x + ( - )y = 0, v [0, ), x [0, )………………….(1)
(1 - ) - 2x + n (n + 1)y = 0, n = 1, 2 ……, x (-1, 1)…………………(2)
In this string I want to store all the keywords in an array/collection, split on commas.
My problem is finding the start and end points of the keyword list. I can get the starting point from Keywords:, but what should the ending point be? There is no fixed format; the only thing that is fixed is that a paragraph follows the Keywords section.
Can anyone suggest a regular expression for this?
there will be a Para
Seems like you should first split the string into lines.
And then the line that starts with Keywords: holds your keywords.
You can use the string.Split() method to split into lines as well as for breaking out the keywords.
It also looks like the Keywords section ends with a full stop, so you could find the next full stop, i.e. IndexOf(".") after "Keywords:".
I think this should do:
string afterKeywords = data.Substring(data.IndexOf("Keywords:") + 9);
string beforeNextPara = afterKeywords.Substring(0, afterKeywords.IndexOf(Environment.NewLine + Environment.NewLine));
var dataWeNeed = beforeNextPara.Split(',');
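The line-based suggestion above (split into lines, find the one starting with Keywords:, then split on commas) could be sketched as follows. The names KeywordParser and ExtractKeywords are hypothetical, and the sketch assumes all keywords sit on a single line ending in a full stop, as in the sample text.

```csharp
using System;
using System.Linq;

public static class KeywordParser
{
    // Finds the line that starts with "Keywords:" and splits its content on commas.
    // Returns an empty array when no such line exists.
    public static string[] ExtractKeywords(string text)
    {
        const string marker = "Keywords:";

        string line = text
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .FirstOrDefault(l => l.TrimStart().StartsWith(marker, StringComparison.OrdinalIgnoreCase));

        if (line == null) return new string[0];

        return line.TrimStart()
            .Substring(marker.Length)   // drop the "Keywords:" label
            .TrimEnd(' ', '.')          // drop the trailing full stop
            .Split(',')
            .Select(k => k.Trim())
            .Where(k => k.Length > 0)
            .ToArray();
    }
}
```

Working per line sidesteps the question of where the keyword list ends: the end of the line is the end of the list, so no search for a blank line or the next paragraph is needed.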

.NET String parsing performance improvement - Possible Code Smell

The code below is designed to take a string in and remove any of a set of arbitrary words that are considered non-essential to a search phrase.
I didn't write the code, but need to incorporate it into something else. It works, and that's good, but it just feels wrong to me. However, I can't seem to get my head outside the box that this method has created to think of another approach.
Maybe I'm just making it more complicated than it needs to be, but I feel like this might be cleaner with a different technique, perhaps by using LINQ.
I would welcome any suggestions; including the suggestion that I'm over thinking it and that the existing code is perfectly clear, concise and performant.
So, here's the code:
private string RemoveNonEssentialWords(string phrase)
{
//This array is being created manually for demo purposes. In production code it's passed in from elsewhere.
string[] nonessentials = {"left", "right", "acute", "chronic", "excessive", "extensive",
"upper", "lower", "complete", "partial", "subacute", "severe",
"moderate", "total", "small", "large", "minor", "multiple", "early",
"major", "bilateral", "progressive"};
int index = -1;
for (int i = 0; i < nonessentials.Length; i++)
{
index = phrase.ToLower().IndexOf(nonessentials[i]);
while (index >= 0)
{
phrase = phrase.Remove(index, nonessentials[i].Length);
phrase = phrase.Trim().Replace("  ", " ");
index = phrase.IndexOf(nonessentials[i]);
}
}
return phrase;
}
Thanks in advance for your help.
Cheers,
Steve
This appears to be an algorithm for removing stop words from a search phrase.
Here's one thought: If this is in fact being used for a search, do you need the resulting phrase to be a perfect representation of the original (with all original whitespace intact), but with stop words removed, or can it be "close enough" so that the results are still effectively the same?
One approach would be to tokenize the phrase (using the approach of your choice - could be a regex, I'll use a simple split) and then reassemble it with the stop words removed. Example:
public static string RemoveStopWords(string phrase, IEnumerable<string> stop)
{
var tokens = Tokenize(phrase);
var filteredTokens = tokens.Where(s => !stop.Contains(s));
return string.Join(" ", filteredTokens.ToArray());
}
public static IEnumerable<string> Tokenize(string phrase)
{
return phrase.Split(' ');
// Or use a regex, such as:
// return Regex.Split(phrase, @"\W+");
}
This won't give you exactly the same result, but I'll bet that it's close enough and it will definitely run a lot more efficiently. Actual search engines use an approach similar to this, since everything is indexed and searched at the word level, not the character level.
I guess your code is not doing what you want it to do anyway. "moderated" would be converted to "d" if I'm right. To get a good solution you have to specify your requirements a bit more detailed. I would probably use Replace or regular expressions.
I would use a regular expression (created inside the function) for this task. I think it would be capable of doing all the processing at once without having to make multiple passes through the string or having to create multiple intermediate strings.
private string RemoveNonEssentialWords(string phrase)
{
return Regex.Replace(phrase, // input
@"\b(" + String.Join("|", nonessentials) + @")\b", // pattern
"", // replacement
RegexOptions.IgnoreCase)
.Replace("  ", " ");
}
The \b at the beginning and end of the pattern makes sure that the match is on a boundary between alphanumeric and non-alphanumeric characters. In other words, it will not match just part of the word, like your sample code does.
Yeah, that smells.
I like little state machines for parsing, they can be self-contained inside a method using lists of delegates, looping through the characters in the input and sending each one through the state functions (which I have return the next state function based on the examined character).
For performance I would flush out whole words to a string builder after I've hit a separating character and checked the word against the list (might use a hash set for that)
I would create a hash table of the words to remove, then parse the phrase word by word and drop any word found in the hash. That takes only one pass through the array, and I believe creating a hash table is O(n).
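A minimal sketch of the hash-based idea, using HashSet<string> rather than a Hashtable. The names StopWordFilter and RemoveStopWords are invented here, and the sketch assumes that words are separated by whitespace and that collapsing runs of whitespace to single spaces is acceptable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordFilter
{
    // Splits on whitespace, drops words found in the stop set, rejoins with single spaces.
    public static string RemoveStopWords(string phrase, IEnumerable<string> stopWords)
    {
        // Case-insensitive set, so "Left" and "left" are treated alike.
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);

        var kept = phrase
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries) // null separator = whitespace
            .Where(word => !stopSet.Contains(word));

        return string.Join(" ", kept);
    }
}
```

Because whole words are compared, "moderated" survives even when "moderate" is in the list. Punctuation attached to a word (such as "left,") is not handled and would need a different tokenizer.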
How does this look?
foreach (string nonEssent in nonessentials)
{
phrase = phrase.Replace(nonEssent, String.Empty);
}
phrase = phrase.Replace("  ", " ");
If you want to go the Regex route, you could do it like this. If you're going for speed it's worth a try and you can compare/contrast with other methods:
Start by creating a Regex from the array input. Something like:
var regexString = "\\b(" + string.Join("|", nonessentials) + ")\\b";
That will result in something like:
\b(left|right|chronic)\b
Then create a Regex object to do the find/replace:
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(regexString, System.Text.RegularExpressions.RegexOptions.IgnoreCase);
Then you can just do a Replace like so:
string fixedPhrase = regex.Replace(phrase, "");
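Putting the regex steps above together, one possible helper is sketched below. The names RegexStopWordFilter and RemoveWords are hypothetical. Escaping each word and collapsing leftover whitespace are additions that the answer does not include.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexStopWordFilter
{
    public static string RemoveWords(string phrase, IEnumerable<string> words)
    {
        // Escape each word so regex metacharacters in the list are treated literally.
        string pattern = @"\b(" + string.Join("|", words.Select(w => Regex.Escape(w))) + @")\b";

        string stripped = Regex.Replace(phrase, pattern, "", RegexOptions.IgnoreCase);

        // Collapse the whitespace runs left behind by removed words and trim the ends.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

If the word list is fixed, building the Regex once and reusing it (as the answer does) avoids re-parsing the pattern on every call.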

String functions

I want to search for a given string within another string (e.g. find whether "something" exists inside "something like this"). How can I do the following?
Know the position at which "something" is located (in the current example, this is 0).
Extract everything to the left or to the right, up to the character found (see 1).
Extract a substring beginning where the sought string was found and running for X characters (in Visual Basic 6/VBA I would use the Mid function).
string searched = "something like this";
1.
int pos = searched.IndexOf("something");
2.
string start = searched.Substring(0, pos);
string endstring = searched.Substring(pos);
3.
string mid = searched.Substring(pos, x);
Have you looked at the String.Substring() method? You can use the IndexOf() method to see if the substring exists first.
Take a look at the System.String member functions, in particular the IndexOf method.
Use int String.IndexOf(String).
I would do something like this:
string s = "I have something like this";
//question No. 1
int pos = s.IndexOf("something");
//question No. 2
string[] separator = {"something"};
string[] leftAndRightEntries = s.Split(separator, StringSplitOptions.None);
//question No. 3
int x = pos + 10;
string substring = s.Substring(pos, x);
I would avoid using Split, as it's designed to give you multiple results. I would stick with the code in the first example, though the second block should actually read...
string start = searched.Substring(0, pos);
string endstring;
if(pos < searched.Length - 1)
endstring = searched.Substring(pos + "something".Length);
else
endstring = string.Empty;
The key difference is accounting for the length of the string to find (hence the rather odd-looking "something".Length, as this example is designed for you to be able to plop in your own variable).
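The three operations from the question can be gathered into small helpers that account for the length of the search string and for the not-found case. This is a sketch only: StringSearch, LeftOf, RightOf and Mid are made-up names, and this Mid is zero-based, unlike the one-based VB6 function.

```csharp
using System;

public static class StringSearch
{
    // Text to the left of the first occurrence of 'value', or null if it is not found.
    public static string LeftOf(string source, string value)
    {
        int pos = source.IndexOf(value, StringComparison.Ordinal);
        return pos < 0 ? null : source.Substring(0, pos);
    }

    // Text to the right of the first occurrence of 'value', or null if it is not found.
    public static string RightOf(string source, string value)
    {
        int pos = source.IndexOf(value, StringComparison.Ordinal);
        return pos < 0 ? null : source.Substring(pos + value.Length);
    }

    // Up to 'length' characters starting at zero-based 'start'; clamps instead of throwing.
    public static string Mid(string source, int start, int length)
    {
        if (start < 0 || start >= source.Length || length <= 0) return string.Empty;
        return source.Substring(start, Math.Min(length, source.Length - start));
    }
}
```

Returning null from LeftOf and RightOf keeps "not found" distinguishable from "found at the very start or end", where the result is an empty string.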
