C# splitting a string into three pieces - c#

Hello everybody, I asked this question a few hours ago: C# get username from string. split
Now I have a more difficult problem. I'm trying to get Acid, Player and m249 from this string:
L 02/28/2012 - 06:14:22: "Acid<1><VALVE_ID_PENDING><CT>"
killed "Player<2><VALVE_ID_PENDING><TERRORIST>" with "m249"
I tried this
int start = Data.ToString().IndexOf('"') + 1;
int end = Data.ToString().IndexOf('<');
var Killer = Data.ToString().Substring(start, end - start);
int start1 = Data.ToString().IndexOf("killed") + 1;
int end1 = Data.ToString().IndexOf('<') + 4;
var Victim = Data.ToString().Substring(start1, end1 - start1);
but it throws this exception on the last line:
Length cannot be less than zero.
Parameter name: length
Is it possible to get both player names and the last string (m249)?
Thanks

Here is a simple example of how you can do it with regex. Depending on how much the string varies, this one may work for you. I'm assuming that quotes (") are consistent as well as the text between them. You'll need to add this line at the top:
using System.Text.RegularExpressions;
Code:
string input = "L 02/28/2012 - 06:14:22: \"Acid<1><VALVE_ID_PENDING><CT>\" killed \"Player<2><VALVE_ID_PENDING><TERRORIST>\" with \"m249\"";
Regex reg = new Regex("[^\"]+\"([^<]+)<[^\"]+\" killed \"([A-Za-z0-9]+)[^\"]+\" with \"([A-Za-z0-9]+)\"");
Match m = reg.Match(input);
if (m.Success)
{
string player1 = m.Groups[1].ToString();
string player2 = m.Groups[2].ToString();
string weapon = m.Groups[3].ToString();
}
The syntax breakdown for the regex is this:
[^\"]+
means, go till we hit a double quote (")
\"
means take the quote as the next part of the string, since the previous term brings us to it, but doesn't go past it.
([^<]+)<
The parenthesis means we are interested in the results of this part, we will seek till we hit a less than (<). since this is the first "group" we're looking to extract, it's referred to as Groups[1] in the match. Again we have the character we were searching for to consume it and continue our search.
<[^\"]+\" killed \"
This will again search, without keeping the results due to no parenthesis, till we hit the next quote mark. We then manually specify the string of (" killed ") since we're interested in what's after that.
([A-Za-z0-9]+)
This will capture any alphanumeric characters, upper or lowercase, as our Groups[2] result.
[^\"]+\"
Search and ignore the rest till we hit the next double quote
with \"
Another literal string that we're using as a marker
([A-Za-z0-9]+)
Same as above: the parentheses return the alphanumeric text as our Groups[3] result.
\"
End it off with the last quote.
Hopefully this explains it. A google for "Regular Expressions Cheat Sheet" is very useful for remembering these rules.

Should be super easy to parse. I recognized that it was CS. Take a look at Valve's documentation here:
https://developer.valvesoftware.com/wiki/HL_Log_Standard#057._Kills
Update:
If you're not comfortable with regular expressions, this implementation will do what you want as well and is along the lines of what you attempted to do:
public void Parse(string killLog)
{
string[] parts = killLog.Split(new[] { " killed ", " with " }, StringSplitOptions.None);
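int nameStart = parts[0].IndexOf('"') + 1; // skip the timestamp prefix before the first quote
string player1 = parts[0].Substring(nameStart, parts[0].IndexOf('<') - nameStart);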
string player2 = parts[1].Substring(1, parts[1].IndexOf('<') - 1);
string weapon = parts[2].Replace("\"", "");
}
Personally, I would use a RegEx.
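As a rough sketch of the regex route, the pattern can be wrapped in a small helper that uses named groups instead of numeric indexes. The class name KillLogParser, the method TryParse and the group names are made up for illustration, and the pattern assumes the quoting shown in the question's sample line; it is not taken from Valve's documentation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class KillLogParser
{
    // Each name runs from an opening quote up to the first '<'.
    // [^<"]+ accepts any characters in a name, not only alphanumerics.
    private static readonly Regex KillLine = new Regex(
        "\"(?<killer>[^<\"]+)<[^\"]*\" killed \"(?<victim>[^<\"]+)<[^\"]*\" with \"(?<weapon>[^\"]+)\"");

    // Returns true and fills the out parameters when the line looks like a kill entry.
    public static bool TryParse(string line, out string killer, out string victim, out string weapon)
    {
        killer = victim = weapon = null;
        if (line == null)
            return false;

        Match m = KillLine.Match(line);
        if (!m.Success)
            return false;

        killer = m.Groups["killer"].Value;
        victim = m.Groups["victim"].Value;
        weapon = m.Groups["weapon"].Value;
        return true;
    }
}
```

Returning a bool rather than throwing lets the caller skip log lines that are not kill entries.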

Related

Substring until space

I have string like this:
Some data of the string Job ID_Of_the_job some other data of the string
I need to get this ID_Of_the_job
I have this stored in a string variable named notes:
intIndex = notes.IndexOf("Job ")
strJob = notes.Substring(intIndex+4, ???)
I don't know how to get the length of this job ID.
Thanks for help,
Marc
Since you're already using string.IndexOf, here's a solution which builds on that.
Note that there's an overload of String.IndexOf which takes a parameter saying where to start searching.
We've managed to find the beginning of the Job ID, by doing:
int startIndex = notes.IndexOf("Job ") + "Job ".Length;
startIndex is the index of the "I" in "ID_Of_the_job".
We can then use IndexOf again to find the next space -- which will be the space following "ID_Of_the_job":
int endIndex = notes.IndexOf(" ", startIndex);
We can then use Substring:
string jobId = notes.Substring(startIndex, endIndex - startIndex);
Note that there's no error-handling here: if either of the IndexOf fails to find the thing you're looking for, it will return -1, and your code will do strange things. It would be a good idea to handle these cases!
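One possible way to handle those cases is sketched below. The names JobIdFinder and TryGetJobId are invented for this example, and treating "ID runs to the end of the string" as valid is an assumption about what the caller wants.

```csharp
using System;

// Hypothetical helper illustrating the -1 checks.
public static class JobIdFinder
{
    public static bool TryGetJobId(string notes, out string jobId)
    {
        jobId = null;
        if (notes == null)
            return false;

        const string marker = "Job ";
        int markerIndex = notes.IndexOf(marker, StringComparison.Ordinal);
        if (markerIndex < 0)
            return false; // "Job " does not occur at all

        int startIndex = markerIndex + marker.Length;
        int endIndex = notes.IndexOf(' ', startIndex);
        if (endIndex < 0)
            endIndex = notes.Length; // no trailing space: the ID runs to the end

        if (endIndex == startIndex)
            return false; // nothing follows the marker

        jobId = notes.Substring(startIndex, endIndex - startIndex);
        return true;
    }
}
```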
Another, terser solution is to use Regex.
string jobId = Regex.Match(notes, @"Job (\S+)").Groups[1].Value;
The regular expression Job (\S+) looks for the text "Job ", followed by 1 or more non-whitespace characters. It puts those non-whitespace characters into a capture group (which becomes Groups[1]), which we can read out.
In this case, jobId will be an empty string if the regex doesn't match.
See these working on dotnetfiddle.
I think I'd make life easy, split the string on spaces and take the string after the array slot that had Job in it:
var notes = "Some data of the string Job ID_Of_the_job some other data of the string";
var bits = notes.Split();
var job = bits[Array.IndexOf(bits, "Job") + 1];
If you're on a recent .net and know the job number will occur within the first 10 (say) words, then you can stop splitting after a certain number of words, with e.g. Split(new[]{' '}, 10) - this gives the first 9 words then the rest of the string in the 10th slot which could be a useful performance boost
You could also pull this fairly easily with regex:
var r = new Regex("Job (?<j>[^ ]+)");
var m = r.Match(notes);
var job = m.Groups["j"].Value;
If you can more accurately define the format of a job number e.g. "it's between 2-3 digits, then a underscore, slash or hyphen, followed by 4 digits", then you don't even have to use Job to locate it, you can put the pattern into the regex:
var r = new Regex(@"(?<j>\d{2,3}[-_\\]\d{4})");
That will pick out a string of the given pattern (\digits {2 to 3 of}, then [hyphen or underscore or slash], then \digits {4 of}).. For example
First step you already did: find the string "Job id ". Second step is to split result by ' ' to extract id.
var input = "Some data of the string Job ID_Of_the_job some other data of the string";
Console.WriteLine(input.Substring(input.IndexOf("Job") + 4).Split(' ')[0]);
Fiddle.

Console application that counts the number of words in a sentence. (C#)

This program checks every character in a sentence. Every time the character is a space(" ") the numberOfWords (variable) will be incremented by 1.
Is this the right way to do it?
string sentence;
int numberOfWords;
int sentenceLength;
int counter;
string letter;
Console.Write("Sentence :");
sentence = Console.ReadLine();
sentenceLength = sentence.Length;
numberOfWords = 1;
counter = 0;
while (counter < sentenceLength)
{
letter = Convert.ToString(sentence[counter]);
if (letter == " ")
{
numberOfWords++;
counter++;
}
else
{
counter++;
}
}
Console.Write("Number of words in this sentence :");
Console.WriteLine(numberOfWords);
Console.ReadLine();
Well, the easy answer is; don't reinvent the wheel, use existing tools:
var numberOfWords = sentence.Split(' ', StringSplitOptions.RemoveEmptyEntries).Length;
But that would be cheating...
So, taking your code, there are a few things that need to be fixed:
First, don't make your method do too many things. There is no reason why a method counting words should know anything about how to output the result to any given user interface. Simply make a method that knows how to count words and returns the number of words:
public static int CountWords(string sentence) { ...}
Now you can reuse this method in any type of application; console, windows forms, WPF, etc.
Second, take corner or trivial cases out of the equation fast. Null sentences are either an error or have no words. Make a choice on how you want to process this scenario. If 0 words makes sense, you can solve a few cases in one strike:
if (string.IsNullOrWhiteSpace(sentence))
return 0;
Third, don't perform unnecessary conversions; converting chars to strings simply to perform an equality check with " " is wasteful. Compare chars directly (' '), or use the aptly named char.IsWhiteSpace(+) method.
Fourth, your logic is flawed. Double spaces, leading spaces, etc. will all give you wrong results. The reason being that your condition on when to count a word is faulty. Encountering a whitespace doesn't necessarily mean a new word is on the way; another whitespace might be waiting, you’ve already encountered a white space in the previous iteration, the sentence might end, etc.
In order to make your logic work you need to keep track of what happened before, what’s happening now and what will happen next... if that sounds messy and over complicated, don’t worry, you are absolutely right.
A simpler way is to shift your logic just a little; let’s say we encounter a new word every time we find a non-whitespace character(*) that is preceded by whitespace or sits at the very start of the sentence. What happens after is irrelevant, so we’ve just made things a lot easier:
var counter = 0;
var words = 0;
var previousIsWhiteSpace = true; // treat the start of the sentence as preceded by whitespace, so the first word is counted
while (counter < sentence.Length)
{
if (char.IsWhiteSpace(sentence[counter]))
{
previousIsWhiteSpace = true;
}
else if (previousIsWhiteSpace)
{
words += 1;
previousIsWhiteSpace = false;
}
counter += 1;
}
Put it all together and you are done.
(+) this will actually flag more than a regular space as a valid whitespace; tab, new line, etc. will all return true.
(*) I’m ignoring scenarios involving punctuation marks, separators, etc.
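Putting the pieces together might look roughly like this. The class name WordCounter is made up for the example, and the sketch assumes that the start of the sentence should count as if it were preceded by whitespace, so that a sentence without leading spaces still has its first word counted.

```csharp
using System;

// Hypothetical container class for the CountWords method.
public static class WordCounter
{
    public static int CountWords(string sentence)
    {
        // Null, empty and whitespace-only input all contain zero words.
        if (string.IsNullOrWhiteSpace(sentence))
            return 0;

        int words = 0;
        bool previousIsWhiteSpace = true; // assumption: start of string acts as a boundary

        foreach (char c in sentence)
        {
            bool isWhiteSpace = char.IsWhiteSpace(c);

            // A word starts at a non-whitespace character that follows whitespace.
            if (!isWhiteSpace && previousIsWhiteSpace)
                words++;

            previousIsWhiteSpace = isWhiteSpace;
        }

        return words;
    }
}
```

Because the method only returns a number, the same code can be called from a console program, a form, or a unit test.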
Just sticking with the style of your implementation, assuming the input is only split by single spaces, it's much nicer to just split your sentence string on each white space character.
string[] words = sentence.Trim().Split((char[])null);
With a null separator array, white space is used to split (the cast avoids an ambiguous-overload error on newer .NET versions). Trim() removes trailing and leading white space characters.
Then, using words.Length you can easily get how many words there are separated by white space. However, this won't account for double spaces or empty sentences. Removing double or more spaces is best achieved with regex.
RegexOptions options = RegexOptions.None;
Regex regex = new Regex("[ ]{2,}", options);
sentence = regex.Replace(sentence, " ");

C#: Remove Excess Text From String

Okay, so after looking around here on SO, I have found a solution that meets about 95% of my requirement, although I believe it may need to be redone at this point.
ISSUE
Say I have a value range supplied as "1000 - 1009 ABC1 ABC SOMETHING ELSE" where I just need the 1000 - 1009 part. I need to be able to remove excess characters from the string supplied, even if they truly are accepted characters, but only if they are part of secondary strings with text. (Sorry if that description seems odd, my mind isn't full power today.)
CURRENT SOLUTION
I currently have a simple method utilizing Linq to return only accepted characters, however this will return "1000 - 10091" which is not the range I am needing. I've thought about looping through the strings individual characters and comparing to previous characters as I go using IsDigit and IsLetter to my advantage, but then comes the issue of replacing the unacceptable characters or removing them. I think if I gave it a day or two I could figure it out with a clear mind, but it needs to be done by the end of the day, and I am banging my head against the keyboard.
void RemoveExcessText(ref string val) {
string allowedChars = "0123456789-+>";
val = new string(val.Where(c => allowedChars.Contains(c)).ToArray());
}
// Alternatively?
char previousChar = ' ';
for (int i = 0; i < val.Length; i++) {
if (char.IsLetter(val[i])) {
previousChar = val[i];
val.Remove(i, 1);
} else if (char.IsDigit(val[i])) {
if (char.IsLetter(previousChar)) {
val.Remove(i, 1);
}
}
}
But how do I account for white space and leave in the +, -, and > characters? I am losing my mind on this one today.
Why not use a regular expression?
Regex.Match("1000 - 1009 ABC1 ABC SOMETHING ELSE", @"^(\d+)([\s\-]+)(\d+)");
Should give you what you want
I made a fiddle
You use a regular expression with a capturing group:
Regex r = new Regex("^(?<v>[-0-9 ]+)");
This means "from the start of the input string (^), match [0 to 9 or space or hyphen], keep going for as many occurrences of these characters as are available (+), and store the result in the named group v ((?<v>...))"
We get it out like this:
r.Matches(input)[0].Groups["v"].Value
Note though that if the input string doesn't match, the match collection will be empty and indexing it with [0] will throw. To guard against that you might want to add some extra error checking:
MatchCollection mc = r.Matches(input);
if (mc.Count > 0)
MessageBox.Show(mc[0].Groups["v"].Value);
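When only the first occurrence matters, Regex.Match together with Match.Success is a slightly shorter way to do the same check. The sketch below assumes the range is always two runs of digits separated by a hyphen (the + and > characters mentioned in the question are not handled); RangeExtractor and ExtractRange are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the leading "digits - digits" range, or null.
public static class RangeExtractor
{
    private static readonly Regex LeadingRange = new Regex(@"^\s*(\d+)\s*-\s*(\d+)");

    public static string ExtractRange(string input)
    {
        if (input == null)
            return null;

        Match m = LeadingRange.Match(input);
        if (!m.Success)
            return null; // no range at the start of the input

        // Rebuild the range with normalized spacing around the hyphen.
        return m.Groups[1].Value + " - " + m.Groups[2].Value;
    }
}
```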
You could match this with a regular expression. \d{1,4} means match a decimal digit at least once up to 4 times. Followed by space, hyphen, space, and 1 to 4 digits again, then anything else. Only the part inside parenthesis is output in your results.
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
var pattern = @"(^\d{1,4} - \d{1,4}).*";
string input = ("1000 - 1009 ABC1 ABC SOMETHING ELSE");
string replacement = "$1";
string result = Regex.Replace(input, pattern, replacement);
Console.WriteLine(result);
}
}
https://dotnetfiddle.net/cZGlX4

Capture two blocks in a string

I have a string that's in this format:
Message: Something bad happened in This.Place < Description> Some sort of information here< /Description>< Error> Some other stuff< /Error>< Message> Some message here.
I can't seem to figure out how to match everything in the Description block and also everything in the Message block using regex.
My question is in two parts: 1.) Is regex the right choice for this?
2.) If so, how can I match those two blocks and exclude the rest?
I can match the first part with a simple < Description>.*< /Description>, but can't match < Message>. I've tried excluding everything inbetween by trying to use what's described here http://blog.codinghorror.com/excluding-matches-with-regular-expressions/
With all the disclaimers about parsing XML with regex, it's still good to know how to do this with regex.
For instance, if you had your back against the wall, this would work for the < Description> tag (adapt it for the other tag).
(?<=< Description>).*?(?=< /Description>)
Some things you need to know:
The (?<=< Description>) is a lookbehind that asserts that at that position in the string, what precedes is < Description>. So if you change the spaces in your tag, all bets are off. To handle potential typing errors (depending on the origin of your text), you can insert optional spaces: (?<=< *Description *>) where the * repeats the space character zero or more times. The lookbehind is only an assertion, it does not consume any characters.
The .*? lazily eats up all characters until it can find what follows...
Which is the (?=< /Description>) lookahead that asserts that at that position in the string, what follows is < /Description>
In code, this becomes something like:
description = Regex.Match(yourstring, "(?<=< *Description *>).*?(?=< */Description *>)").Value;
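Adapting the same lookaround idea to both blocks could look like the sketch below. Since the sample text has no closing tag for Message, the sketch assumes the message simply runs to the end of the string; the class and method names are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper built on the lookbehind/lookahead pattern.
public static class ErrorTextExtractor
{
    // Text between "< Description>" and "< /Description>", spaces inside the tags optional.
    private static readonly Regex DescriptionPattern = new Regex(
        @"(?<=< *Description *>).*?(?=< */Description *>)", RegexOptions.Singleline);

    // Assumption: no closing Message tag exists, so take everything up to the end.
    private static readonly Regex MessagePattern = new Regex(
        @"(?<=< *Message *>).*", RegexOptions.Singleline);

    public static string GetDescription(string text)
    {
        Match m = DescriptionPattern.Match(text);
        return m.Success ? m.Value.Trim() : null;
    }

    public static string GetMessage(string text)
    {
        Match m = MessagePattern.Match(text);
        return m.Success ? m.Value.Trim() : null;
    }
}
```

The leading "Message:" in the sample is not matched because the lookbehind requires the angle brackets around the tag name.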
This is how I'd parse it. Caveat: I've written the regex assuming the format shown in the example you've provided is pretty rigid; if the data varies a little (say, there isn't always a space after the '<' characters), you'll need to tweak it a little. But this should get you going.
var text = "Message: Something bad happened in This.Place < Description> Some"+
" sort of information here< /Description>< Error> Some other stuff"+
"< /Error>< Message> Some message here.";
var regex = new Regex(
"^.*?<\\sDescription\\>(?<description>.*?)<\\s/Description\\>"+
".*?<\\sMessage\\>(?<message>.*?)$",
RegexOptions.IgnoreCase | RegexOptions.Singleline
);
var matches = regex.Match(text);
if (matches.Success) {
var desc = matches.Groups["description"].Value;
// " Some sort of information here"
var msg = matches.Groups["message"].Value;
// " Some message here."
}
It was fairly difficult to try to remove the non-XML-formatted data from the text, so IndexOf and Substring ended up being what I used. IndexOf will find the index of a specified character or string, and Substring captures characters based on a starting point and a count of how many it should capture.
foreach (string j in errorList)
{
int descriptionBegin = j.IndexOf("<Description>") + 13; // starts after the opening tag (13 chars long)
int descriptionEnd = j.IndexOf("</Description>"); // ends where the closing tag begins
int messageBegin = j.IndexOf("<Message>") + 9; // starts after the opening tag (9 chars long)
int messageEnd = j.IndexOf("</Message>"); // ends where the closing tag begins
int descriptionDiff = descriptionEnd - descriptionBegin; // number of chars between the tags
int messageDiff = messageEnd - messageBegin; // number of chars between the tags
string description = j.Substring(descriptionBegin, descriptionDiff); // grabs only the specified number of chars
string message = j.Substring(messageBegin, messageDiff); // grabs only the specified number of chars
}
Thanks @Lucius for the suggestion.
@Darryl that actually looks like it might work. Thanks for the thorough answer...I might try that out for other stuff in the future (non-XML of course :))

How to capitalize first letter of each sentence?

I know how to capitalize first letter in each word. But I want to know how to capitalize first letter of each sentence in C#.
This is not necessarily a trivial problem. Sentences can end with a number of different punctuation marks, and those same punctuation marks don't always denote the end of a sentence (abbreviations like Dr. may pose a particular problem because there are potentially many of them).
That being said, you might be able to get a "good enough" solution by using regular expressions to look for words after a sentence-ending punctuation, but you would have to add quite a few special cases. It might be easier to process the string character by character or word by word. You would still have to handle all the same special cases, but it might be easier than trying to build that into a regex.
There are lots of weird rules for grammar and punctuation. Any solution you come up with probably won't be able to take them all into account. Some things to consider:
Sentences can end with different punctuation marks (. ! ?)
Some punctuation marks that end sentences might also be used in the middle of a sentence (e.g. abbreviations such as Dr. Mr. e.g.)
Sentences could contain nested sentences. Quotations could pose a particular problem (e.g. He said, "This is a hard problem! I wonder," he mused, "if it can be solved.")
As a first approximation, you could probably treat any sequence like [a-z]\.[ \n\t] as the end of a sentence.
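A minimal sketch of that first approximation follows. It uses a slightly broader pattern than the one above (it also accepts ! and ? as terminators and handles the very first letter), and it makes no attempt at abbreviations, quotations or non-ASCII letters. SentenceCaser is a made-up name.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: naive sentence capitalization.
public static class SentenceCaser
{
    // Group 1: start of the string (plus optional white space), or a terminator
    // followed by white space. Group 2: the lowercase letter to capitalize.
    private static readonly Regex SentenceStart = new Regex(@"(^\s*|[.!?]\s+)([a-z])");

    public static string CapitalizeSentences(string text)
    {
        if (string.IsNullOrEmpty(text))
            return text;

        return SentenceStart.Replace(
            text,
            m => m.Groups[1].Value + m.Groups[2].Value.ToUpperInvariant());
    }
}
```

Requiring white space after the terminator leaves strings such as www.example.com alone, but it also means the word after an abbreviation like "Dr." gets capitalized.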
Consider a sentence as a word containing spaces and ending with a period.
There's some VB code on this page which shouldn't be too hard to convert to C#.
However, subsequent posts point out the errors in the algorithm.
This blog has some C# code which claims to work:
It auto capitalises the first letter after every full stop (period), question mark and exclamation mark.
UPDATE 16 Feb 2010: I’ve reworked it so that it doesn’t affect strings such as URL’s and the like
Don't forget sentences with parentheses. Also, * if used as an indicator for bold text.
http://www.grammarbook.com/punctuation/parens.asp
I needed to do something similar, and this served my purposes. I pass in my "sentences" as a IEnumerable of strings.
// Read sentences from text file (each sentence on a separate line)
IEnumerable<string> lines = File.ReadLines(inputPath);
// Call method below
lines = CapitalizeFirstLetterOfString(lines);
private static IEnumerable<string> CapitalizeFirstLetterOfString(IEnumerable<string> inputLines)
{
// Will output: Lorem lipsum et
List<string> outputLines = new List<string>();
TextInfo textInfo = new CultureInfo("en-US", false).TextInfo;
foreach (string line in inputLines)
{
string lineLowerCase = textInfo.ToLower(line);
string[] lineSplit = lineLowerCase.Split(' ');
bool first = true;
for (int i = 0; i < lineSplit.Length; i++ )
{
if (first)
{
lineSplit[0] = textInfo.ToTitleCase(lineSplit[0]);
first = false;
}
}
outputLines.Add(string.Join(" ", lineSplit));
}
return outputLines;
}
I know I'm a little late, but just like you, I needed to capitalize the first character of each of my sentences.
I landed here (and on a lot of other pages while I was researching) and found nothing that helped. So I burned some neurons and wrote an algorithm myself.
Here is my extension method to capitalize sentences:
public static string CapitalizeSentences(this string Input)
{
if (String.IsNullOrEmpty(Input))
return Input;
if (Input.Length == 1)
return Input.ToUpper();
Input = Regex.Replace(Input, @"\s+", " ");
Input = Input.Trim().ToLower();
Input = Char.ToUpper(Input[0]) + Input.Substring(1);
var objDelimiters = new string[] { ". ", "! ", "? " };
foreach (var objDelimiter in objDelimiters)
{
var varDelimiterLength = objDelimiter.Length;
var varIndexStart = Input.IndexOf(objDelimiter, 0);
while (varIndexStart > -1)
{
Input = Input.Substring(0, varIndexStart + varDelimiterLength) + (Input[varIndexStart + varDelimiterLength]).ToString().ToUpper() + Input.Substring((varIndexStart + varDelimiterLength) + 1);
varIndexStart = Input.IndexOf(objDelimiter, varIndexStart + 1);
}
}
return Input;
}
Details about the algorithm:
This simple algorithm starts by collapsing every run of white space into a single space and lowercasing the text. Then it capitalizes the first character of the string and searches for every delimiter. When it finds one, it capitalizes the very next character.
I made it easy to add, remove, or edit the delimiters, so you can change how the code behaves with a small edit.
It doesn't check whether the substrings run past the end of the string, because the delimiters end with spaces and the algorithm starts with a Trim(), so every delimiter found in the string is followed by another character.
Important:
You didn't specify exactly what your needs are: whether it's a grammar corrector, just a way to prettify text, and so on. So keep in mind that my algorithm fits my needs, which may be different from yours.
*This algorithm was created to turn a "Product Description" that isn't normalized (it's almost always entirely uppercase) into a nicer format for the user. To be more specific, I need to show a pretty and "smaller" text, and all-uppercase text is the opposite of what I want. So it was not created to be grammatically perfect.
*Also, there may be some exceptions where a character is not uppercased because of bad formatting.
*I chose to include spaces in the delimiters, so "http://www.stackoverflow.com" will not become "http://www.Stackoverflow.Com". On the other hand, sentences like "the box is blue.it's on the floor" will become "The box is blue.it's on the floor", and not "The box is blue.It's on the floor"
*Abbreviations will trigger capitalization too, but once again that's not a problem for me, because I only need to show a product description (where grammar is not extremely critical). And after abbreviations like Mr. or Dr. the next word is a name, so capitalizing it is correct.
If you, or somebody else, needs a more accurate algorithm, I'll be glad to improve it.
Hope I could help somebody!
However you can make a class or method to convert each text in TitleCase. Here is the example you just need to call the method.
public static string ToTitleCase(string strX)
{
string[] aryWords = strX.Trim().Split(' ');
List<string> lstLetters = new List<string>();
List<string> lstWords = new List<string>();
foreach (string strWord in aryWords)
{
int iLCount = 0;
foreach (char chrLetter in strWord.Trim())
{
if (iLCount == 0)
{
lstLetters.Add(chrLetter.ToString().ToUpper());
}
else
{
lstLetters.Add(chrLetter.ToString().ToLower());
}
iLCount++;
}
lstWords.Add(string.Join("", lstLetters));
lstLetters.Clear();
}
string strNewString = string.Join(" ", lstWords);
return strNewString;
}
