Regex match up to the end of a standard pattern

Regex match up to the end of a standard pattern - c#

I'm working on an application to manage filenames of downloaded TV Shows. Basically it will search the directory and clean up the filenames, removing things like full stops and replacing them with spaces and getting rid of the descriptions at the end of the filename after the easily recognizable pattern of, for eg., S01E13. (.1080p.BluRay.x264-ROVERS)
What I want to do is to make a regex expression for use in C# to just extract whatever is before the SnnEnn including itself (where n is any whole positive integer).
But, i don't know much regex to get me going
For example, if I had the filename TV.Show.S01E01.1080p.BluRay.x264-ROVERS, the query would only get TV.Show.S01E01, irrespective of how many words are before the pattern, so it could be TV.Show.On.ABC.S01E01 and it would still work.
Thanks for any help :)

Try this
string input = "TV.Show.S01E01.1080p.BluRay.x264-ROVERS";
string pattern = #"(?'pattern'^.*\d\d[A-Z]\d\d)";
string results = Regex.Match(input, pattern).Groups["pattern"].Value;

There is more obvious way without regex:
string GetNameByPattern(string s)
{
const string pattern_length = 6; //SnnEnn
for (int i = 0; i < s.Length - pattern_length; i++)
{
string part = s.SubString(i, pattern_length);
if (part[0] == 'S' && part[3] == 'N') //candidat
if (Char.IsDigit(part[1]) && Char.IsDigit(part[2]) && Char.IsDigit(part[4]) && Char.IsDigit(part[5]))
return s.SubString(0, i + pattern_length);
}
return "";
}

Related

how to find text in a string in c#

I am learning Dotnet c# on my own.
how to find whether a given text exists or not in a string and if exists, how to find count of times the word has got repeated in that string. even if the word is misspelled, how to find it and print that the word is misspelled?
we can do this with collections or linq in c# but here i used string class and used contains method but iam struck after that.
if we can do this with help of linq, how?
because linq works with collections, Right?
you need a list in order to play with linq.
but here we are playing with string(paragraph).
how linq can be used find a word in paragraph?
kindly help.
here is what i have tried so far.
string str = "Education is a ray of light in the darkness. It certainly is a hope for a good life. Eudcation is a basic right of every Human on this Planet. To deny this right is evil. Uneducated youth is the worst thing for Humanity. Above all, the governments of all countries must ensure to spread Education";
for(int i = 0; i < i++)
if (str.Contains("Education") == true)
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}

You can make a string a string[] by splitting it by a character/string. Then you can use LINQ:
if(str.Split().Contains("makes"))
{
// note that the default Split without arguments also includes tabs and new-lines
}
If you don't care whether it is a word or just a sub-string, you can use str.Contains("makes") directly.
If you want to compare in a case insensitive way, use the overload of Contains:
if(str.Split().Contains("makes", StringComparer.InvariantCultureIgnoreCase)){}

string str = "money makes many makes things";
var strArray = str.Split(" ");
var count = strArray.Count(x => x == "makes");

the simplest way is to use Split extension to split the string into an array of words.
here is an example :
var words = str.Split(' ');
if(words.Length > 0)
{
foreach(var word in words)
{
if(word.IndexOf("makes", StringComparison.InvariantCultureIgnoreCase) != -1)
{
Console.WriteLine("found");
}
else
{
Console.WriteLine("not found");
}
}
}
Now, since you just want the count of number word occurrences, you can use LINQ to do that in a single line like this :
var totalOccurrences = str.Split(' ').Count(x=> x.IndexOf("makes", StringComparison.InvariantCultureIgnoreCase) != -1);
Note that StringComparison.InvariantCultureIgnoreCase is required if you want a case-insensitive comparison.

PigLatin how can I strip punctuation from a string? And Then add it back?

Working on program for class call pig Latin. It works for what I need for class. It ask just to type in a phase to convert. But I notice if I type a sentence with punctuation at the end it will mess up the last word translation. Trying to figure out the best way to fix this. New at programming but I would need away for it to check last character in word to check for punctuations. Remove it before translation and then add it back. Not sure how to do that. Been reading about char.IsPunctuation. Plus not sure what part of my code I would had for that check.
public static string MakePigLatin(string str)
{
string[] words = str.Split(' ');
str = String.Empty;
for (int i = 0; i < words.Length; i++)
{
if (words[i].Length <= 1) continue;
string pigTrans = new String(words[i].ToCharArray());
pigTrans = pigTrans.Substring(1, pigTrans.Length - 1) + pigTrans.Substring(0, 1) + "ay ";
str += pigTrans;
}
return str.Trim();
}

The following should get you strings of letters for converting while passing through any non-letter characters that follow them.
Splitter based on Splitting a string in C#
public static string MakePigLatin(string str) {
MatchCollection matches = Regex.Matches(str, #"([a-zA-Z]*)([^a-zA-Z]*)");
StringBuilder result = new StringBuilder(str.Length * 2);
for (int i = 0; i < matches.Count; ++i) {
string pigTrans = matches[i].Groups[1].Captures[0].Value ?? string.Empty;
if (pigTrans.Length > 1) {
pigTrans = pigTrans.Substring(1) + pigTrans.Substring(0, 1) + "ay";
}
result.Append(pigTrans).Append(matches[i].Groups[2].Captures[0].Value);
}
return result.ToString();
}
The matches variable should contain all the match collections of 2 groups. The first group will be 0 or more letters to translate followed by a second group of 0 or more non-letters to pass through. The StringBuilder should be more memory efficient than concatenating System.String values. I gave it a starting allocation of double the initial string size just to avoid having to double the allocated space. If memory is tight, maybe 1.25 or 1.5 instead of 2 would be better, but you'd probably have to convert it back to int after. I took the length calculation off your Substring call because leaving it out grabs everything to the end of the string already.

sentence capitalizer

I'm woefully attempting a programming assignment. I'm not looking for a "this is how you do this" but more of a "what am I doing wrong?"
I'm attempting to capitalize the start of each sentence from a string input. So for example the string "Hello. my name is john. i like to ride bikes." I would modify the string and return it with capitals for example: "Hello. My name is john. I like to ride bikes." My logic seems a bit flawed and I'm very lost.
What I have so far below. Basically all I'm doing is testing for a punctuation signifying the end of a sentence. And then trying to replace the character. Also testing if it's the at the end of the string as to not create IndexOutOfRange exceptions. Although, that's all I've been getting :(
private string SentenceCapitalizer(string input)
{
for (int i = 0; i < input.Length; i++)
{
if (input[i] == '.' || input[i] == '!' || input[i] == '?')
{
if (!(input[i] == input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
}
}
return input;
}
Any help is greatly appreciated. I'm just learning C# so the most basic of help would be of service. I don't know much :P

Instead of
if (!(input[i + 2] >= input.Length))
It should be
if (!(i + 2 >= input.Length))
You are comparing indices, not characters

You are checking if your current index is less than or equal to the length of the string and then attempting to alter an index 2 further along
if (!(input[i] == input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
Should be changed to
if (!((i + 2) >= input.Length))
{
input.Replace(input[i + 2], char.ToUpper(input[i + 2]));
}
This will check that there is a value 2 places after a punctuation mark. Also make use of >= rather than == since you're jumping 2 you might end up going over the length of the array where == still returns false but there is no index.

Strings are immutable, you can't do:
var str = "123";
str.Replace('1', '2');
You have to do:
var str = "123";
str = str.Replace('1', '2');

Ok, others have provided you with some pointers to stop the obvious errors, but I'll try to give you some thoughts on how to best implement this.
It is worth thinking about this as a 3-step process
Tokenize the string into sentences
Ensure that the first character of each token is uppercase
reconstruct the string by joining the tokens back together
(1) I'll leave to your imagination, but the idea is to end up with an array of strings with each element representing a "sentence" according to your requirement
(2) Is pretty much as simple as
// Upercase character 0, and join it to everything from character 1 onwards
var fixedToken = token[0].ToUpper(CultureInfo.CurrentCulture)
+ token.Substring(1);
(3) Is also simple
// reconstruct string by joining all tokens with a space
var reconstructed = String.Join(" ",tokens);

How to extract string at a certain character that is repeated within string?

How can I get "MyLibrary.Resources.Images.Properties" and "Condo.gif" from a "MyLibrary.Resources.Images.Properties.Condo.gif" string.
I also need it to be able to handle something like "MyLibrary.Resources.Images.Properties.legend.House.gif" and return "House.gif" and "MyLibrary.Resources.Images.Properties.legend".
IndexOf LastIndexOf wouldn't work because I need the second to last '.' character.
Thanks in advance!
UPDATE
Thanks for the answers so far but I really need it to be able to handle different namespaces. So really what I'm asking is how to I split on the second to last character in a string?

You can use LINQ to do something like this:
string target = "MyLibrary.Resources.Images.Properties.legend.House.gif";
var elements = target.Split('.');
const int NumberOfFileNameElements = 2;
string fileName = string.Join(
".",
elements.Skip(elements.Length - NumberOfFileNameElements));
string path = string.Join(
".",
elements.Take(elements.Length - NumberOfFileNameElements));
This assumes that the file name part only contains a single . character, so to get it you skip the number of remaining elements.

You can either use a Regex or String.Split with '.' as the separator and return the second-to-last + '.' + last pieces.

You can look for IndexOf("MyLibrary.Resources.Images.Properties."), add that to MyLibrary.Resources.Images.Properties.".Length and then .Substring(..) from that position

If you know exactly what you're looking for, and it's trailing, you could use string.endswith. Something like
if("MyLibrary.Resources.Images.Properties.Condo.gif".EndsWith("Condo.gif"))
If that's not the case check out regular expressions. Then you could do something like
if(Regex.IsMatch("Condo.gif"))
Or a more generic way: split the string on '.' then grab the last two items in the array.

string input = "MyLibrary.Resources.Images.Properties.legend.House.gif";
//if string isn't already validated, make sure there are at least two
//periods here or you'll error out later on.
int index = input.LastIndexOf('.', input.LastIndexOf('.') - 1);
string first = input.Substring(0, index);
string second = input.Substring(index + 1);

Try splitting the string into an array, by separating it by each '.' character.
You will then have something like:
{"MyLibrary", "Resources", "Images", "Properties", "legend", "House", "gif"}
You can then take the last two elements.

Just break down and do it in a char loop:
int NthLastIndexOf(string str, char ch, int n)
{
if (n <= 0) throw new ArgumentException();
for (int idx = str.Length - 1; idx >= 0; --idx)
if (str[idx] == ch && --n == 0)
return idx;
return -1;
}
This is less expensive than trying to coax it using string splitting methods and isn't a whole lot of code.
string s = "1.2.3.4.5";
int idx = NthLastIndexOf(s, '.', 3);
string a = s.Substring(0, idx); // "1.2"
string b = s.Substring(idx + 1); // "3.4.5"

Splitting a string into words in a culture neutral way

I've come up with the method below that aims to split a text of variable length into an array of words for further full text index processing (stop word removal, followed by stemmer). The results seem to be ok but I would like to hear opinions how reliable this implementation would against texts in different languages. Would you recommend using a regex for this instead? Please note that I've opted against using String.Split() because that would require me to pass a list of all known seperators which is exactly what I was trying to avoid when I wrote the function
P.S: I can't use a full blown full text search engine like Lucene.Net for several reasons (Silverlight, Overkill for project scope etc).
public string[] SplitWords(string Text)
{
bool inWord = !Char.IsSeparator(Text[0]) && !Char.IsControl(Text[0]);
var result = new List<string>();
var sbWord = new StringBuilder();
for (int i = 0; i < Text.Length; i++)
{
Char c = Text[i];
// non separator char?
if(!Char.IsSeparator(c) && !Char.IsControl(c))
{
if (!inWord)
{
sbWord = new StringBuilder();
inWord = true;
}
if (!Char.IsPunctuation(c) && !Char.IsSymbol(c))
sbWord.Append(c);
}
// it is a separator or control char
else
{
if (inWord)
{
string word = sbWord.ToString();
if (word.Length > 0)
result.Add(word);
sbWord.Clear();
inWord = false;
}
}
}
return result.ToArray();
}

Since you said in culture neutral way, I really doubt if Regular Expression (word boundary: \b) will do. I have googled a bit and found this. Hope it would be useful.
I am pretty surprised that there is no built-in Java BreakIterator equivalent...

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regex match up to the end of a standard pattern - c#

Try this string input = "TV.Show.S01E01.1080p.BluRay.x264-ROVERS"; string pattern = #"(?'pattern'^.*\d\d[A-Z]\d\d)"; string results = Regex.Match(input, pattern).Groups["pattern"].Value;

Related

how to find text in a string in c#

PigLatin how can I strip punctuation from a string? And Then add it back?

sentence capitalizer

How to extract string at a certain character that is repeated within string?

Splitting a string into words in a culture neutral way

Categories

Resources