sentence capitalizer - c#

I'm woefully attempting a programming assignment. I'm not looking for a "this is how you do this" but more of a "what am I doing wrong?"
I'm attempting to capitalize the start of each sentence from a string input. So for example the string "Hello. my name is john. i like to ride bikes." I would modify the string and return it with capitals for example: "Hello. My name is john. I like to ride bikes." My logic seems a bit flawed and I'm very lost.
What I have so far is below. Basically all I'm doing is testing for a punctuation mark signifying the end of a sentence, and then trying to replace the character. I'm also testing whether it's at the end of the string so as not to create IndexOutOfRange exceptions. Although, that's all I've been getting :(
private string SentenceCapitalizer(string input)
{
for (int i = 0; i < input.Length; i++)
{
if (input[i] == '.' || input[i] == '!' || input[i] == '?')
{
if (!(input[i] == input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
}
}
return input;
}
Any help is greatly appreciated. I'm just learning C# so the most basic of help would be of service. I don't know much :P

Instead of
if (!(input[i + 2] >= input.Length))
It should be
if (!(i + 2 >= input.Length))
You should be comparing indices, not characters: input[i + 2] is a character, while i + 2 is an index.

You are comparing the character at the current index with the length of the string, and then attempting to alter the character 2 positions further along:
if (!(input[i] == input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
Should be changed to
if (!((i + 2) >= input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
This will check that there is a character 2 places after a punctuation mark. Also use >= rather than ==: since you're jumping 2 positions, you can overshoot the length of the string, in which case == still returns false even though there is no such index. Note that Replace returns a new string, so the result still has to be assigned back to input.

Strings are immutable, so this has no effect on str:
var str = "123";
str.Replace('1', '2');
You have to assign the result:
var str = "123";
str = str.Replace('1', '2');
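To tie this back to the question's loop: Replace(char, char) returns a new string and swaps every occurrence of that character, not only the one at a given index. A minimal sketch that edits a char array copy instead, assuming exactly one space follows each punctuation mark as in the question's example (the class name CapitalizeDemo is made up for illustration):

```csharp
using System;

public static class CapitalizeDemo
{
    // Capitalizes the character two positions after '.', '!' or '?'.
    // Assumption: exactly one space follows the punctuation mark.
    public static string SentenceCapitalizer(string input)
    {
        char[] chars = input.ToCharArray(); // mutable copy of the immutable string
        for (int i = 0; i < chars.Length; i++)
        {
            bool endsSentence = chars[i] == '.' || chars[i] == '!' || chars[i] == '?';
            if (endsSentence && i + 2 < chars.Length) // compare indices, not characters
            {
                chars[i + 2] = char.ToUpperInvariant(chars[i + 2]);
            }
        }
        return new string(chars);
    }
}
```

Calling CapitalizeDemo.SentenceCapitalizer("Hello. my name is john. i like to ride bikes.") should give "Hello. My name is john. I like to ride bikes."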

Ok, others have provided you with some pointers to stop the obvious errors, but I'll try to give you some thoughts on how to best implement this.
It is worth thinking about this as a 3-step process:
1. Tokenize the string into sentences
2. Ensure that the first character of each token is uppercase
3. Reconstruct the string by joining the tokens back together
(1) I'll leave to your imagination, but the idea is to end up with an array of strings with each element representing a "sentence" according to your requirement
(2) Is pretty much as simple as
// Uppercase character 0, and join it to everything from character 1 onwards
var fixedToken = char.ToUpper(token[0], CultureInfo.CurrentCulture)
+ token.Substring(1);
(3) Is also simple
// reconstruct string by joining all tokens with a space
var reconstructed = String.Join(" ",tokens);
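Putting the three steps together might look roughly like the sketch below. The tokenizing step is my own assumption (the answer deliberately leaves it open): it splits on whitespace that follows '.', '!' or '?' using a lookbehind, so the punctuation stays attached to its sentence. The class name SentenceTokens is hypothetical, and the invariant culture is used only to keep the result predictable.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class SentenceTokens
{
    public static string Capitalize(string input)
    {
        // (1) Tokenize: split on whitespace preceded by sentence-ending punctuation.
        string[] tokens = Regex.Split(input, @"(?<=[.!?])\s+");

        // (2) Uppercase the first character of every non-empty token.
        var fixedTokens = tokens.Select(token =>
            token.Length == 0
                ? token
                : char.ToUpper(token[0], CultureInfo.InvariantCulture) + token.Substring(1));

        // (3) Reconstruct the string by joining the tokens with a single space.
        return string.Join(" ", fixedTokens);
    }
}
```

One consequence of step (3) is that runs of whitespace between sentences collapse to a single space; if the original spacing matters, the char-by-char approach is the safer choice.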

Related

Regex match up to the end of a standard pattern

I'm working on an application to manage filenames of downloaded TV shows. Basically it will search the directory and clean up the filenames, replacing things like full stops with spaces and getting rid of the description at the end of the filename (e.g. .1080p.BluRay.x264-ROVERS) that follows the easily recognizable pattern of, e.g., S01E13.
What I want to do is to make a regex expression for use in C# to just extract whatever is before the SnnEnn including itself (where n is any whole positive integer).
But I don't know enough regex to get going.
For example, if I had the filename TV.Show.S01E01.1080p.BluRay.x264-ROVERS, the query would only get TV.Show.S01E01, irrespective of how many words are before the pattern, so it could be TV.Show.On.ABC.S01E01 and it would still work.
Thanks for any help :)

Try this
string input = "TV.Show.S01E01.1080p.BluRay.x264-ROVERS";
string pattern = @"(?'pattern'^.*\d\d[A-Z]\d\d)";
string results = Regex.Match(input, pattern).Groups["pattern"].Value;
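A slightly stricter variation, offered as a sketch rather than a drop-in answer: it spells out the S and E literally, allows any number of digits, and uses a lazy prefix so that the match stops at the first SnnEnn it finds. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class EpisodePrefix
{
    // Returns everything up to and including the first S<digits>E<digits>,
    // or an empty string when the file name contains no such marker.
    public static string Extract(string fileName)
    {
        Match m = Regex.Match(fileName, @"^.*?S\d+E\d+");
        return m.Success ? m.Value : string.Empty;
    }
}
```

With this, EpisodePrefix.Extract("TV.Show.S01E01.1080p.BluRay.x264-ROVERS") should return "TV.Show.S01E01", and a name such as "TV.Show.On.ABC.S01E01.720p" should return "TV.Show.On.ABC.S01E01".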

There is a more obvious way without regex:
string GetNameByPattern(string s)
{
const int patternLength = 6; // length of "SnnEnn"
for (int i = 0; i <= s.Length - patternLength; i++)
{
string part = s.Substring(i, patternLength);
if (part[0] == 'S' && part[3] == 'E') // candidate
if (Char.IsDigit(part[1]) && Char.IsDigit(part[2]) && Char.IsDigit(part[4]) && Char.IsDigit(part[5]))
return s.Substring(0, i + patternLength);
}
return "";
}

find if string has slash in front

I want to find out whether a string has "/" in front of it. In my code I get the IndexOf of the string and check whether there is anything before it, which works, but how do I find out if that character actually is a forward slash? Here is my code:
string test = "/images/";
if (test.IndexOf(@"images/") - 1 == -1)
{
}
EDIT
Some of my strings may have full url and some may be as above and some may not have / at all hence using index of

Do you mean:
if (test.StartsWith("/"))
? (It's not clear what your sample code is trying to achieve.)
Note that "/" is a forward-slash, not a backslash - and you don't need the verbatim string literal in your case, given that the string doesn't contain any backslashes or line breaks.
EDIT: Your question isn't clear, but I suspect you want something like:
int index = test.IndexOf(targetString);
if (index > 0 && test[index - 1] == '/')
{
// There's a leading forward slash. Deal with it appropriately
}

You can use the StartsWith() method:
if(test.StartsWith("/"))
{
}

if (test.StartsWith("images") ||
test.IndexOf("/images") > -1 ||
test.IndexOf("\\images") > -1)

Too many good answers :)
Anyway, I meant to say the following:
string str = @"/\images\///";
Match matchfirstFwdSlash = Regex.Match(str, "^[\\/]", RegexOptions.IgnoreCase);
if (matchfirstFwdSlash.Success)
{ MessageBox.Show("Success", "Success"); }
else
{ MessageBox.Show("Oops", "Oops"); }

//You can find this way
string test = "/Images/";
string a = test.Split('/')[0];
if (a=="")
{
}

Check string for invalid characters? Smartest way?

I would like to check some string for invalid characters. By invalid characters I mean characters that should not be there. What characters are these? That varies, but I think that's not that important; what's important is how I should do it and what the easiest and best (performance-wise) way is.
Let's say I just want strings that contain 'A-Z', 'empty', '.', '$', '0-9'
So if I have a string like "HELLO STaCKOVERFLOW" => invalid, because of the 'a'.
OK, now how to do that? I could make a List<char> and put every char in it that is not allowed and check the string against this list. Maybe not a good idea, because there are a lot of chars then. But I could make a list that contains all of the allowed chars, right? And then? For every char in the string I have to compare against the List<char>? Any smart code for this? And another question: if I add A-Z to the List<char> I have to add 26 chars manually, but as far as I know these chars are 65-90 in the ASCII table, so can I add them more easily? Any suggestions? Thank you

You can use a regular expression for this:
Regex r = new Regex("[^A-Z0-9.$ ]");
if (r.IsMatch(SomeString)) {
// validation failed
}
To create a list of characters from A-Z or 0-9 you would use a simple loop:
for (char c = 'A'; c <= 'Z'; c++) {
// c or c.ToString() depending on what you need
}
But you don't need that with the Regex - pretty much every regex engine understands the range syntax (A-Z).
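As a sketch of how the negated character class can be wrapped up (the class and method names are my own): the pattern matches any single character outside the allowed set, so the string is valid exactly when there is no match. Inside the brackets '.' and '$' are literal characters, and a trailing $ after the class would restrict the check to the final character only, so it is left out here.

```csharp
using System.Text.RegularExpressions;

public static class AllowedCharsRegex
{
    // Matches any one character that is NOT A-Z, 0-9, '.', '$' or a space.
    private static readonly Regex Disallowed = new Regex("[^A-Z0-9.$ ]");

    // Valid means: no disallowed character anywhere in the string.
    public static bool IsValid(string s)
    {
        return !Disallowed.IsMatch(s);
    }
}
```

For example, AllowedCharsRegex.IsValid("HELLO STACKOVERFLOW") should be true, while "HELLO STaCKOVERFLOW" should be rejected because of the lowercase 'a'.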

I have only just written such a function, along with an extended version that restricts the first and last characters when needed. The original function merely checks whether the string consists of valid characters only. The extended function takes two extra integers: the number of valid characters at the beginning of the list to skip when checking the first and the last character, respectively. In practice it simply calls the original function 3 times. In the example below it ensures that the string begins with a letter and doesn't end with an underscore.
StrChr(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
StrChrEx(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 11, 1);
BOOL __cdecl StrChr(CHAR* str, CHAR* chars)
{
for (int s = 0; str[s] != 0; s++)
{
int c = 0;
while (true)
{
if (chars[c] == 0)
{
return false;
}
else if (str[s] == chars[c])
{
break;
}
else
{
c++;
}
}
}
return true;
}
BOOL __cdecl StrChrEx(CHAR* str, CHAR* chars, UINT excl_first, UINT excl_last)
{
char first[2] = {str[0], 0};
char last[2] = {str[strlen(str) - 1], 0};
if (!StrChr(str, chars))
{
return false;
}
if (excl_first != 0)
{
if (!StrChr(first, chars + excl_first))
{
return false;
}
}
if (excl_last != 0)
{
if (!StrChr(last, chars + excl_last))
{
return false;
}
}
return true;
}

If you are using C#, you can do this easily using a List and Contains. You can do this with single characters (in a string) or a multi-character string just the same:
var pn = "The String To ChecK";
var badStrings = new List<string>()
{
" ","\t","\n","\r"
};
foreach(var badString in badStrings)
{
if(pn.Contains(badString))
{
//Do something
}
}

If you're not super good with regular expressions, then there is another way to go about this in C#. Here is a block of code I wrote to test a string variable named notifName:
var alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z";
var numbers = "0,1,2,3,4,5,6,7,8,9";
var specialChars = " ,(,),_,[,],!,*,-,.,+,-";
var validChars = (alphabet + "," + alphabet.ToUpper() + "," + numbers + "," + specialChars).Split(',');
for (int i = 0; i < notifName.Length; i++)
{
if (Array.IndexOf(validChars, notifName[i].ToString()) < 0) {
errorFound = $"Invalid character '{notifName[i]}' found in notification name.";
break;
}
}
You can change the characters added to the array as needed. The Array IndexOf method is the key to the whole thing. Of course if you want commas to be valid, then you would need to choose a different split character.

Not enough reps to comment directly, but I recommend the Regex approach. One small caveat: you probably need to anchor both ends of the input string, and you will want at least one character to match. So (with thanks to ThiefMaster), here's my regex to validate user input for a simple arithmetical calculator (plus, minus, multiply, divide):
Regex r = new Regex(@"^[0-9\.\-\+\*\/ ]+$");

I'd go with a regex, but still need to add my 2 cents here, because all the proposed non-regex solutions are O(MN) in the worst case (string is valid) which I find repulsive for religious reasons.
Even more so when LINQ offers a simpler and more efficient solution than nesting loops:
var isInvalid = "The String To Test".Intersect("ALL_INVALID_CHARS").Any();
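The one-liner above works from a list of invalid characters. For the question's whitelist (A-Z, 0-9, '.', '$' and space), a rough equivalent is to build the allowed set once with the range loops and then test every character against it. This is only a sketch; AllowedCharsLinq and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AllowedCharsLinq
{
    private static readonly HashSet<char> Allowed = BuildAllowed();

    private static HashSet<char> BuildAllowed()
    {
        var set = new HashSet<char> { '.', '$', ' ' };
        for (char c = 'A'; c <= 'Z'; c++) set.Add(c); // 26 letters, no manual typing
        for (char c = '0'; c <= '9'; c++) set.Add(c);
        return set;
    }

    // True when every character of s is in the allowed set.
    public static bool IsValid(string s)
    {
        return s.All(c => Allowed.Contains(c));
    }
}
```

Because each HashSet lookup takes roughly constant time, the check stays linear in the length of the string instead of growing with the size of the allowed list.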

How to extract string at a certain character that is repeated within string?

How can I get "MyLibrary.Resources.Images.Properties" and "Condo.gif" from a "MyLibrary.Resources.Images.Properties.Condo.gif" string.
I also need it to be able to handle something like "MyLibrary.Resources.Images.Properties.legend.House.gif" and return "House.gif" and "MyLibrary.Resources.Images.Properties.legend".
IndexOf LastIndexOf wouldn't work because I need the second to last '.' character.
Thanks in advance!
UPDATE
Thanks for the answers so far, but I really need it to be able to handle different namespaces. So really what I'm asking is: how do I split on the second-to-last '.' character in a string?

You can use LINQ to do something like this:
string target = "MyLibrary.Resources.Images.Properties.legend.House.gif";
var elements = target.Split('.');
const int NumberOfFileNameElements = 2;
string fileName = string.Join(
".",
elements.Skip(elements.Length - NumberOfFileNameElements));
string path = string.Join(
".",
elements.Take(elements.Length - NumberOfFileNameElements));
This assumes that the file name part only contains a single . character, so to get it you skip the number of remaining elements.

You can either use a Regex or String.Split with '.' as the separator and return the second-to-last + '.' + last pieces.

You can look for IndexOf("MyLibrary.Resources.Images.Properties."), add that to "MyLibrary.Resources.Images.Properties.".Length and then .Substring(..) from that position.

If you know exactly what you're looking for, and it's trailing, you could use string.EndsWith. Something like
if ("MyLibrary.Resources.Images.Properties.Condo.gif".EndsWith("Condo.gif"))
If that's not the case, check out regular expressions. Then you could do something like this (where input is the string being tested)
if (Regex.IsMatch(input, @"Condo\.gif$"))
Or a more generic way: split the string on '.' then grab the last two items in the array.

string input = "MyLibrary.Resources.Images.Properties.legend.House.gif";
//if string isn't already validated, make sure there are at least two
//periods here or you'll error out later on.
int index = input.LastIndexOf('.', input.LastIndexOf('.') - 1);
string first = input.Substring(0, index);
string second = input.Substring(index + 1);
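The validation that the comment mentions could be folded into a small helper along these lines. This is a sketch with made-up names (ResourceName, TrySplit); it reports failure instead of throwing when the input has fewer than two periods.

```csharp
public static class ResourceName
{
    // Splits input at the second-to-last '.'.
    // Returns false when the input contains fewer than two '.' characters.
    public static bool TrySplit(string input, out string prefix, out string fileName)
    {
        prefix = string.Empty;
        fileName = string.Empty;

        int last = input.LastIndexOf('.');
        if (last <= 0) return false; // no period, or nothing before the only one

        int secondLast = input.LastIndexOf('.', last - 1);
        if (secondLast < 0) return false; // only one period in the string

        prefix = input.Substring(0, secondLast);
        fileName = input.Substring(secondLast + 1);
        return true;
    }
}
```

For "MyLibrary.Resources.Images.Properties.legend.House.gif" this should yield "MyLibrary.Resources.Images.Properties.legend" and "House.gif", while a bare "Condo.gif" returns false.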

Try splitting the string into an array, by separating it by each '.' character.
You will then have something like:
{"MyLibrary", "Resources", "Images", "Properties", "legend", "House", "gif"}
You can then take the last two elements.

Just break down and do it in a char loop:
int NthLastIndexOf(string str, char ch, int n)
{
if (n <= 0) throw new ArgumentException();
for (int idx = str.Length - 1; idx >= 0; --idx)
if (str[idx] == ch && --n == 0)
return idx;
return -1;
}
This is less expensive than trying to coax it using string splitting methods and isn't a whole lot of code.
string s = "1.2.3.4.5";
int idx = NthLastIndexOf(s, '.', 3);
string a = s.Substring(0, idx); // "1.2"
string b = s.Substring(idx + 1); // "3.4.5"

c#: regex how to differentiate between two variations of a string

This is tough to explain well enough to ask the question, but I'll try:
I have two possibilities of user input:
S01E05 or 0105 (two different input strings)
which both translate to season 01, episode 05
but if the user inputs it backwards, E05S01 or 0501, I need to be able to return the same result, Season 01 Episode 05
The control for this would be the user defining the format of the original filename with something like this:
"SssEee" -- uppercase 'S' denoting that the following lowercase 's' belong to Season and uppercase 'E' denoting that the following lowercase 'e' belong to Episode. So if the user decides to define the format as EeeSss then my function should still return the same result since it knows which numbers belong to season or episode.
I don't have anything working quite yet to share, but what I was toying with is a loop that builds the regex pattern. The function, so far, accepts the user format and the file name:
public static int(string userFormat, string fileName)
{
}
the userFormat would be a string and look something like this:
t.t.t.SssEee
or even
t.SssEee
where t is for title, and the rest you know.
The file name might look like this:
battlestar.galactica.S01E05.mkv
I've got the function that extracts the title from the file name by using the userFormat to build the regex string:
public static string GetTitle(string userFormat, string fileName)
{
string pattern = "^";
char positionChar;
string fileTitle;
for (short i = 0; i < userFormat.Length; i++)
{
positionChar = userFormat[i];
//build the regex pattern
if (positionChar == 't')
{
pattern += @"\w+";
}
else if (positionChar == '#')
{
pattern += @"\d+";
}
else if (positionChar == ' ')
{
pattern += @"\s+";
}
else
pattern += positionChar;
}
//pulls out the title with or without the delimiter
Match title = Regex.Match(fileName, pattern, RegexOptions.IgnoreCase);
fileTitle = title.Groups[0].Value;
//remove the delimiter
string[] tempString = fileTitle.Split(@"\/.-<>".ToCharArray());
fileTitle = "";
foreach (string part in tempString)
{
fileTitle += part + " ";
}
return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(fileTitle);
}
But I'm kind of stumped on how to do the extraction of the episode and season numbers. In my head I'm thinking the process would look something like:
Look through the userFormat string to find the uppercase S
Determine how many lowercase 's' are following the uppercase S
Build the regex expression that describes this
Search through the file name and find that pattern
Extract the number from that pattern
Sounds simple enough, but I'm having a hard time putting it into action. The complication is the fact that the format in the filename could be S01E05 or it could be simply 0105. Either scenario would be identified by the user when they define the format.
Ex 1. the file name is battlestar.galactica.S01E05
the user format submitted will be t.t.?ss?ee
Ex 2. the file name is battlestar.galactica.0105
the user format submitted will be t.t.SssEee
Ex 3. the file name is battlestar.galactica.0501
the user format submitted will be t.t.EeeSss
Sorry for the book... the concept is simple: the regex function should be dynamic, allowing the user to define the format of a file name so that my method can generate the expression and use it to extract information from the file name. Something is telling me that this is simpler than it seems... but I'm at a loss. Any suggestions?

So if I read this right, you know where the Season/Episode number is in the string because the user has told you. That is, you have t.t.<number>.more.stuff. And <number> can take one of these forms:
SssEee
EeeSss
ssee
eess
Or did you say that the user can define how many digits will be used for season and episode? That is, could it be S01E123?
I'm not sure you need a regex for this. Since you know the format, and it appears that things are separated by periods (I assume that there can't be periods in the individual fields), you should be able to use String.Split to extract the pieces, and you know from the user's format where the Season/Episode is in the resulting array. So you now have a string that takes one of the forms above.
You have the user's format definition and the Season/Episode number. You should be able to write a loop that steps through the two strings together and extracts the necessary information, or issues an error.
string UserFormat = "SssEee";
string EpisodeNumber = "0105";
int ifmt = 0;
int iepi = 0;
int season = 0;
int episode = 0;
while (ifmt < UserFormat.Length && iepi < EpisodeNumber.Length)
{
if (UserFormat[ifmt] == 'S' || UserFormat[ifmt] == 'E')
{
if (EpisodeNumber[iepi] == UserFormat[ifmt])
{
++iepi;
}
else if (!char.IsDigit(EpisodeNumber[iepi]))
{
// Error! Chars didn't match, and it wasn't a digit.
break;
}
++ifmt;
}
else
{
char c = EpisodeNumber[iepi];
if (!char.IsDigit(c))
{
// error. Expected digit.
break;
}
if (UserFormat[ifmt] == 'e')
{
episode = (episode * 10) + (int)c - (int)'0';
}
else if (UserFormat[ifmt] == 's')
{
season = (season * 10) + (int)c - (int)'0';
}
else
{
// user format is broken
break;
}
++iepi;
++ifmt;
}
}
Note that you'll probably have to do some checking to see that the lengths are correct. That is, the code above will accept S01E1 when the user's format is SssEee. There's a bit more error handling that you can add, depending on how worried you are about bad input. But I think this gives you the gist of the idea.
I have to think that's going to be a whole lot easier than trying to dynamically build regular expressions.
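For comparison, the regex-building route that the question describes might look roughly like the sketch below. It is not part of the answer above, and all names are hypothetical. Runs of lowercase 's' or 'e' become fixed-width digit groups named "season" and "episode", and the uppercase 'S' and 'E' markers become optional literals so that both S01E05 and 0105 are accepted. It only handles the season/episode token, not the whole file name.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class EpisodeFormat
{
    // Turns a format such as "SssEee" or "EeeSss" into an anchored regex.
    public static string BuildPattern(string userFormat)
    {
        var sb = new StringBuilder("^");
        int i = 0;
        while (i < userFormat.Length)
        {
            char ch = userFormat[i];
            if (ch == 's' || ch == 'e')
            {
                // Count the run of identical lowercase letters: that is the digit width.
                int start = i;
                while (i < userFormat.Length && userFormat[i] == ch) i++;
                string name = ch == 's' ? "season" : "episode";
                sb.Append("(?<" + name + ">[0-9]{" + (i - start) + "})");
            }
            else if (ch == 'S' || ch == 'E')
            {
                sb.Append(ch).Append('?'); // marker letter may be absent in the input
                i++;
            }
            else
            {
                sb.Append(Regex.Escape(ch.ToString())); // any other character is literal
                i++;
            }
        }
        return sb.Append('$').ToString();
    }

    public static bool TryParse(string userFormat, string text, out int season, out int episode)
    {
        season = 0;
        episode = 0;
        Match m = Regex.Match(text, BuildPattern(userFormat), RegexOptions.IgnoreCase);
        if (!m.Success || !m.Groups["season"].Success || !m.Groups["episode"].Success)
            return false;
        season = int.Parse(m.Groups["season"].Value);
        episode = int.Parse(m.Groups["episode"].Value);
        return true;
    }
}
```

Because the digit widths come from the format, an input like S01E1 is rejected for the format SssEee, which covers the length check mentioned above. Whether this is simpler than the character loop is a matter of taste.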

After @Sinaesthetic answered my question we can reduce his original post to:
The challenge is to receive any of these inputs:
0105 (if your input is 0105 you assume SxxEyy)
S01E05
E05S01 OR
1x05 (read as season 1 episode 5)
and transform any of these inputs into: S01E05
At this point title and file format are irrelevant, they just get tacked on to the ends.
Based on that the following code will always result in 'Battlestar.Galactica.S01E05.mkv'
static void Main(string[] args)
{
string[] inputs = new string[6] { "E05S01", "S01E05", "0105", "105", "1x05", "1x5" };
foreach (string input in inputs)
{
Console.WriteLine(FormatEpisodeTitle("Battlestar.Galactica", input, "mkv"));
}
Console.ReadLine();
}
private static string FormatEpisodeTitle(string showTitle, string identifier, string fileFormat)
{
//first make identifier upper case
identifier = identifier.ToUpper();
//normalize for SssEee & EeeSss
if (identifier.IndexOf('S') > identifier.IndexOf('E'))
{
identifier = identifier.Substring(identifier.IndexOf('S')) + identifier.Substring(identifier.IndexOf('E'), identifier.IndexOf('S'));
}
//now get rid of S and replace E with x as needed:
identifier = identifier.Replace("S", string.Empty).Replace("E", "X");
//at this point, if there isn't an "X" we need one, as in 105 or 0105
if (identifier.IndexOf('X') == -1)
{
identifier = identifier.Substring(0, identifier.Length - 2) + "X" + identifier.Substring(identifier.Length - 2);
}
//now split by the 'X'
string[] identifiers = identifier.Split('X');
// and put it back together:
identifier = 'S' + identifiers[0].PadLeft(2, '0') + 'E' + identifiers[1].PadLeft(2, '0');
//tack it all together
return showTitle + '.' + identifier + '.' + fileFormat;
}
