I'm trying to extract initials from a display name so that I can display them.
I'm finding it difficult because the display name is a single string containing one or more words. How can I achieve this?
Example:
'John Smith' => JS
'Smith, John' => SJ
'John' => J
'Smith' => S
public static SearchDto ToSearchDto(this PersonBasicDto person)
{
return new SearchDto
{
Id = new Guid(person.Id),
Label = person.DisplayName,
Initials = //TODO: GetInitials Code
};
}
I used the following solution: I created a helper method which allowed me to test for multiple cases.
public static string GetInitials(this string name)
{
if (string.IsNullOrWhiteSpace(name))
{
return string.Empty;
}
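string[] nameSplit = name.Trim().Split(new string[] { ",", " " }, StringSplitOptions.RemoveEmptyEntries);
// Input such as "," leaves no parts after splitting, so there is no initial to take.
if (nameSplit.Length == 0)
{
return string.Empty;
}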
var initials = nameSplit[0].Substring(0, 1).ToUpper();
if (nameSplit.Length > 1)
{
initials += nameSplit[nameSplit.Length - 1].Substring(0, 1).ToUpper();
}
return initials;
}
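As a quick check, the helper can be run against the example names from the question. This is only a minimal sketch: the `InitialsDemo` class and its `Main` method are scaffolding added for illustration, and the extra guard for input such as "," is an assumption about the desired behaviour.

```csharp
using System;

public static class InitialsDemo
{
    // First letter of the first word plus, if present, first letter of the last word.
    public static string GetInitials(this string name)
    {
        if (string.IsNullOrWhiteSpace(name))
        {
            return string.Empty;
        }

        string[] parts = name.Split(new[] { ",", " " }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length == 0)
        {
            return string.Empty; // e.g. the input was just ","
        }

        string initials = parts[0].Substring(0, 1).ToUpper();
        if (parts.Length > 1)
        {
            initials += parts[parts.Length - 1].Substring(0, 1).ToUpper();
        }
        return initials;
    }

    public static void Main()
    {
        Console.WriteLine("John Smith".GetInitials());        // JS
        Console.WriteLine("Smith, John".GetInitials());       // SJ
        Console.WriteLine("John".GetInitials());              // J
        Console.WriteLine("john ronald smith".GetInitials()); // JS (middle names are skipped)
    }
}
```

Note that this variant deliberately keeps only the first and last word, so a middle name does not contribute an initial.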
Or just another variation as an extension method, with a small amount of sanity checking
Given
public static class StringExtensions
{
public static string GetInitials(this string value)
=> string.Concat(value
.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
.Where(x => x.Length >= 1 && char.IsLetter(x[0]))
.Select(x => char.ToUpper(x[0])));
}
Usage
var list = new List<string>()
{
"James blerg Smith",
"Michael Smith",
"Robert Smith 3rd",
"Maria splutnic Garcia",
"David Smith",
"Maria Rodriguez",
"Mary Smith",
"Maria Hernandez"
};
foreach (var name in list)
Console.WriteLine(name.GetInitials());
Output
JBS
MS
RS
MSG
DS
MR
MS
MH
This is simple, easy-to-understand code that also handles names with first, middle and last parts, such as "John Smith William".
Test at: https://dotnetfiddle.net/kmaXXE
Console.WriteLine(GetInitials("John Smith")); // JS
Console.WriteLine(GetInitials("Smith, John")); // SJ
Console.WriteLine(GetInitials("John")); // J
Console.WriteLine(GetInitials("Smith")); // S
Console.WriteLine(GetInitials("John Smith William")); // JSW
Console.WriteLine(GetInitials("John H Doe")); // JHD
static string GetInitials(string name)
{
// StringSplitOptions.RemoveEmptyEntries excludes empty spaces returned by the Split method
string[] nameSplit = name.Split(new string[] { "," , " "}, StringSplitOptions.RemoveEmptyEntries);
string initials = "";
foreach (string item in nameSplit)
{
initials += item.Substring(0, 1).ToUpper();
}
return initials;
}
How about the following:
void Main()
{
Console.WriteLine(GetInitials("John Smith"));
Console.WriteLine(GetInitials("Smith, John"));
Console.WriteLine(GetInitials("John"));
Console.WriteLine(GetInitials("Smith"));
}
private string GetInitials(string name)
{
if (string.IsNullOrWhiteSpace(name))
{
return string.Empty;
}
var splitted = name?.Split(' ');
var initials = $"{splitted[0][0]}{(splitted.Length > 1 ? splitted[splitted.Length - 1][0] : (char?)null)}";
return initials;
}
Output:
JS - SJ - J - S
The code below takes the first letter of every word in the string and prints it as a capital letter.
static void printInitials(String name)
{
if (name.Length == 0)
return;
// Char.ToUpper returns a char in C#, so no cast is needed.
Console.Write(Char.ToUpper(name[0]));
// Traverse the rest of the string and
// print the character that follows each space.
for (int i = 1; i < name.Length - 1; i++)
if (name[i] == ' ' && name[i + 1] != ' ')
Console.Write(" " + Char.ToUpper(name[i + 1]));
}
I would like to write C# code that parses nested parentheses into array elements, but only at the first level. An example will make this clearer:
I want this string:
"(example (to (parsing nested paren) but) (first lvl only))"
to be parsed into:
["example", "(to (parsing nested paren) but)", "(first lvl only)"]
I was thinking about using regex but can't figure out how to properly use them without implementing this behaviour from scratch.
In the case of malformed inputs I would like to return an empty array, or an array ["error"]
I developed a parser for your example. I also checked some other examples which you can see in the code.
using System;
using System.Collections;
using System.Collections.Generic;
public class Program
{
public static void Main()
{
string str = "(example (to (parsing nested paren) but) (first lvl only))"; // => [example , (to (parsing nested paren) but) , (first lvl only)]
//string str = "(first)(second)(third)"; // => [first , second , third]
//string str = "(first(second)third)"; // => [first , (second) , third]
//string str = "(first(second)(third)fourth)"; // => [first , (second) , (third) , fourth]
//string str = "(first((second)(third))fourth)"; // => [first , ((second)(third)) , fourth]
//string str = "just Text"; // => [ERROR]
//string str = "start with Text (first , second)"; // => [ERROR]
//string str = "(first , second) end with text"; // => [ERROR]
//string str = ""; // => [ERROR]
//string str = "("; // => [ERROR]
//string str = "(first()(second)(third))fourth)"; // => [ERROR]
//string str = "(((extra close parentheses))))"; // => [ERROR]
var res = Parser.parse(str);
showRes(res);
}
static void showRes(ArrayList res)
{
var strings = res.ToArray();
var theString = string.Join(" , ", strings);
Console.WriteLine("[" + theString + "]");
}
}
public class Parser
{
static Dictionary<TokenType, TokenType> getRules()
{
var rules = new Dictionary<TokenType, TokenType>();
rules.Add(TokenType.OPEN_PARENTHESE, TokenType.START | TokenType.OPEN_PARENTHESE | TokenType.CLOSE_PARENTHESE | TokenType.SIMPLE_TEXT);
rules.Add(TokenType.CLOSE_PARENTHESE, TokenType.SIMPLE_TEXT | TokenType.CLOSE_PARENTHESE);
rules.Add(TokenType.SIMPLE_TEXT, TokenType.SIMPLE_TEXT | TokenType.CLOSE_PARENTHESE | TokenType.OPEN_PARENTHESE);
rules.Add(TokenType.END, TokenType.CLOSE_PARENTHESE);
return rules;
}
static bool isValid(Token prev, Token cur)
{
var rules = Parser.getRules();
return rules.ContainsKey(cur.type) && ((prev.type & rules[cur.type]) == prev.type);
}
public static ArrayList parse(string sourceText)
{
ArrayList result = new ArrayList();
int openParenthesesCount = 0;
Lexer lexer = new Lexer(sourceText);
Token prevToken = lexer.getStartToken();
Token currentToken = lexer.readNextToken();
string tmpText = "";
while (currentToken.type != TokenType.END)
{
if (currentToken.type == TokenType.OPEN_PARENTHESE)
{
openParenthesesCount++;
if (openParenthesesCount > 1)
{
tmpText += currentToken.token;
}
}
else if (currentToken.type == TokenType.CLOSE_PARENTHESE)
{
openParenthesesCount--;
if (openParenthesesCount < 0)
{
return Parser.Error();
}
if (openParenthesesCount > 0)
{
tmpText += currentToken.token;
}
}
else if (currentToken.type == TokenType.SIMPLE_TEXT)
{
tmpText += currentToken.token;
}
if (!Parser.isValid(prevToken, currentToken))
{
return Parser.Error();
}
if (openParenthesesCount == 1 && tmpText.Trim() != "")
{
result.Add(tmpText);
tmpText = "";
}
prevToken = currentToken;
currentToken = lexer.readNextToken();
}
if (openParenthesesCount != 0)
{
return Parser.Error();
}
if (!Parser.isValid(prevToken, currentToken))
{
return Parser.Error();
}
if (tmpText.Trim() != "")
{
result.Add(tmpText);
}
return result;
}
static ArrayList Error()
{
var er = new ArrayList();
er.Add("ERROR");
return er;
}
}
class Lexer
{
string _txt;
int _index;
public Lexer(string text)
{
this._index = 0;
this._txt = text;
}
public Token getStartToken()
{
return new Token(-1, TokenType.START, "");
}
public Token readNextToken()
{
if (this._index >= this._txt.Length)
{
return new Token(-1, TokenType.END, "");
}
Token t = null;
string txt = "";
if (this._txt[this._index] == '(')
{
txt = "(";
t = new Token(this._index, TokenType.OPEN_PARENTHESE, txt);
}
else if (this._txt[this._index] == ')')
{
txt = ")";
t = new Token(this._index, TokenType.CLOSE_PARENTHESE, txt);
}
else
{
txt = this._readText();
t = new Token(this._index, TokenType.SIMPLE_TEXT, txt);
}
this._index += txt.Length;
return t;
}
private string _readText()
{
string txt = "";
int i = this._index;
while (i < this._txt.Length && this._txt[i] != '(' && this._txt[i] != ')')
{
txt = txt + this._txt[i];
i++;
}
return txt;
}
}
class Token
{
public int position
{
get;
private set;
}
public TokenType type
{
get;
private set;
}
public string token
{
get;
private set;
}
public Token(int position, TokenType type, string token)
{
this.position = position;
this.type = type;
this.token = token;
}
}
[Flags]
enum TokenType
{
START = 1,
OPEN_PARENTHESE = 2,
SIMPLE_TEXT = 4,
CLOSE_PARENTHESE = 8,
END = 16
}
Well, a regex will do the job:
var text = @"(example (to (parsing nested paren) but) (first lvl only))";
var pattern = @"\(([\w\s]+) (\([\w\s]+ \([\w\s]+\) [\w\s]+\)) (\([\w\s]+\))\)*";
try
{
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
Match m = r.Match(text);
string group_1 = m.Groups[1].Value; //example
string group_2 = m.Groups[2].Value; //(to (parsing nested paren) but)
string group_3 = m.Groups[3].Value; //(first lvl only)
return new string[]{group_1,group_2,group_3};
}
catch(Exception ex){
return new string[]{"error"};
}
Hopefully this helps; I tested it on dotnetfiddle.
Edit:
This might get you started on building the right expression for whatever patterns you run into, and you could then build a recursive function to parse the rest into the desired output :)
RegEx is not recursive. You either count bracket level, or recurse.
A non-recursive parser loop that I tested on the example you show is:
string SplitFirstLevel(string s)
{
List<string> result = new List<string>();
int p = 0, level = 0;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == '(')
{
level++;
if (level == 1) p = i + 1;
if (level == 2)
{
result.Add('"' + s.Substring(p, i - p) + '"');
p = i;
}
}
if (s[i] == ')')
if (--level == 0)
result.Add('"' + s.Substring(p, i - p) + '"');
}
return "[" + String.Join(",", result) + "]";
}
Note: after some more testing, I see that your specification is unclear. How should orphaned level 1 terms, that is, terms without brackets of their own, be delimited?
For example, my parser translates
(example (to (parsing nested paren) but) (first lvl only))
to:
["example ","(to (parsing nested paren) but) ","(first lvl only)"]
and
(example (to (parsing nested paren)) but (first lvl only))
to:
["example ","(to (parsing nested paren)) but ","(first lvl only)"]
In both cases "example" becomes a separate term, while "but" is grouped with the term before it. In the first example this is logical, because "but" is inside the brackets, but it may be unwanted in the second case, where "but" should arguably be separated just like "example", which also has no brackets of its own.
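One way to resolve that ambiguity is to treat every bare word at level 1 as its own term and to report malformed input explicitly. The following is a minimal sketch under that assumption, not a definitive parser; `FirstLevelSplitter`, `Split` and `AddWords` are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class FirstLevelSplitter
{
    // Splits "(a (b c) d)" into ["a", "(b c)", "d"]; returns ["error"] for malformed input.
    public static List<string> Split(string input)
    {
        var error = new List<string> { "error" };
        if (string.IsNullOrWhiteSpace(input))
        {
            return error;
        }

        string s = input.Trim();
        var result = new List<string>();
        int depth = 0;
        int textStart = 0;   // where the current run of bare level-1 text begins
        int groupStart = 0;  // where the current nested group begins

        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == '(')
            {
                depth++;
                if (depth == 1)
                {
                    if (i != 0) return error; // a second top-level group
                    textStart = 1;
                }
                else if (depth == 2)
                {
                    // Flush the bare words seen before this nested group.
                    AddWords(result, s.Substring(textStart, i - textStart));
                    groupStart = i;
                }
            }
            else if (s[i] == ')')
            {
                depth--;
                if (depth < 0) return error; // more ")" than "("
                if (depth == 1)
                {
                    // A nested group just closed: keep it verbatim, brackets included.
                    result.Add(s.Substring(groupStart, i - groupStart + 1));
                    textStart = i + 1;
                }
                else if (depth == 0)
                {
                    if (i != s.Length - 1) return error; // text after the outer ")"
                    AddWords(result, s.Substring(textStart, i - textStart));
                }
            }
            else if (depth == 0)
            {
                return error; // text outside the outer parentheses
            }
        }
        return depth == 0 ? result : error;
    }

    private static void AddWords(List<string> result, string text)
    {
        result.AddRange(text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" | ", Split("(example (to (parsing nested paren)) but (first lvl only))")));
        // example | (to (parsing nested paren)) | but | (first lvl only)
    }
}
```

With this rule "but" is separated in the second example, while in the first example it stays inside "(to (parsing nested paren) but)" because it sits within that group's brackets.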
Code:
string animals = "cat98dog75";
What I am trying to achieve:
string a = "cat98";
string b = "dog75";
Question :
How do I split the string using some range delimiter?
example :
animals.split();
I suggest matching with a help of regular expressions:
using System.Linq;
using System.Text.RegularExpressions;
...
string animals = "cat98dog75";
string[] items = Regex
.Matches(animals, "[a-zA-Z]+[0-9]*")
.OfType<Match>()
.Select(match => match.Value)
.ToArray();
string a = items[0];
string b = items[1];
Console.Write(string.Join(", ", items));
Outcome:
cat98, dog75
In case you want to split the initial string by equal size chunks:
int size = 5;
string[] items = Enumerable
.Range(0, animals.Length / size + (animals.Length % size > 0 ? 1 : 0))
.Select(index => (index + 1) * size <= animals.Length
? animals.Substring(index * size, size)
: animals.Substring(index * size))
.ToArray();
string a = items[0];
string b = items[1];
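On .NET 6 or later, the same equal-size split can be written with `Enumerable.Chunk`, which removes the index arithmetic. This is a small sketch assuming that runtime is available; `ChunkDemo` and `SplitEvery` are names made up for the example.

```csharp
using System;
using System.Linq;

public static class ChunkDemo
{
    // Splits text into pieces of at most `size` characters (requires .NET 6+ for Chunk).
    public static string[] SplitEvery(string text, int size)
    {
        return text.Chunk(size)
                   .Select(chars => new string(chars))
                   .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SplitEvery("cat98dog75", 5))); // cat98, dog75
        Console.WriteLine(string.Join(", ", SplitEvery("cat98dog75", 4))); // cat9, 8dog, 75
    }
}
```

The last piece is simply shorter when the length is not a multiple of the chunk size.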
This might do the trick for you
string animals = "cat98dog75";
string[] DiffAnimals = Regex.Split(animals, "(?<=[0-9]{2})")
.Where(s => s != String.Empty) //Just to Remove Empty Entries.
.ToArray();
If you want to split each animal's name from its number, try the following.
I know it's rather long.
private static void SplitChars()
{
string animals = "cat98dog75";
Dictionary<string, string> dMyobject = new Dictionary<string, string>();
string sType = "",sCount = "";
bool bLastOneWasNum = false;
foreach (var item in animals.ToCharArray())
{
if (char.IsLetter(item))
{
if (bLastOneWasNum)
{
dMyobject.Add(sType, sCount);
sType = ""; sCount = "";
bLastOneWasNum = false;
}
sType = sType + item;
}
else if (char.IsNumber(item))
{
bLastOneWasNum = true;
sCount = sCount + item;
}
}
dMyobject.Add(sType, sCount);
foreach (var item in dMyobject)
{
Console.WriteLine(item.Key + " - " + item.Value);
}
}
You will get output as
cat - 98
dog - 75
Basically, you get the type and the number separately, so if you want to use the count you don't need to split again.
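The same name/number separation can also be sketched with named capture groups, which keeps the code short. This is an illustrative alternative rather than the code above; `AnimalParser` and `Parse` are hypothetical names, and a list of pairs is used instead of a dictionary because `Dictionary.Add` throws if the same animal appears twice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnimalParser
{
    // Returns (name, count) pairs in the order they appear in the input.
    public static List<KeyValuePair<string, int>> Parse(string input)
    {
        var result = new List<KeyValuePair<string, int>>();
        foreach (Match m in Regex.Matches(input, @"(?<name>[a-zA-Z]+)(?<count>[0-9]+)"))
        {
            string name = m.Groups["name"].Value;
            int count = int.Parse(m.Groups["count"].Value);
            result.Add(new KeyValuePair<string, int>(name, count));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var pair in Parse("cat98dog75"))
        {
            Console.WriteLine(pair.Key + " - " + pair.Value);
        }
        // cat - 98
        // dog - 75
    }
}
```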
I have two strings, which can differ in length. I want to walk through them and find the matching sequences. When I find a change, I want to print that word in capitals and then carry on through the string until I find the next change, and so on. I'm not sure how to go about this; I tried comparing whole words, but I'm having issues with that.
For example, given string one = "This is a new value" and string two = "This This is a new also brand value", I want to go through each string from the start, find the matching sequences (e.g. "This is"), notice where a word was added, change it to upper case, and then carry on. Expected output = "THIS this is a new ALSO BRAND value"
Some code I was trying. I don't think this is the right approach.
static void Main(string[] args)
{
string one = "This is a new value";
string two = "This This is a new also brand value";
var coll = two.Split(' ').Select(p => one.Contains(p) ? p : p.ToUpperInvariant());
Console.WriteLine(string.Join(" ", coll));
Console.ReadKey();
}
Is this what you're looking for? The description isn't fantastic, but judging by the answers this seems to be in the same ballpark, and it uses LINQ for less code and complication.
class Program
{
static void Main(string[] args)
{
string one = "This is text one";
string two = "This is string text two not string one";
var coll = two.Split(' ').Select(p => one.Contains(p) ? p : p.ToUpperInvariant());
Console.WriteLine(string.Join(" ", coll)); // This is STRING text TWO NOT STRING one
Console.ReadKey();
}
}
You can break this out to a separate method and pass your variables in as parameters.
You can convert the strings to char arrays and compare them character by character, for example with the following code.
string one = "this is string one";
string two = "this is string one or two";
char[] oneChar = one.ToCharArray();
char[] twoChar = two.ToCharArray();
int index = 0;
List<char> Diff = new List<char>();
if (oneChar.Length > twoChar.Length)
{
foreach (char item in twoChar)
{
if (item != oneChar[index])
Diff.Add(item);
index++;
}
for (int i = index; i < oneChar.Length; i++)
{
Diff.Add(oneChar[i]);
}
}
else if (oneChar.Length < twoChar.Length)
{
foreach (char item in oneChar)
{
if (item != twoChar[index])
Diff.Add(twoChar[index]);
index++;
}
for (int i = index; i < twoChar.Length; i++)
{
Diff.Add(twoChar[i]);
}
}
else//equal length
{
foreach (char item in twoChar)
{
if (item != oneChar[index])
Diff.Add(item);
index++; // advance, otherwise every character is compared with oneChar[0]
}
}
Console.WriteLine(Diff.ToArray());//" or two"
Is that what you need? (Updated)
var value1 = "This is a new Value";
var value2 = "This is also a new value";
var separators = new[] { " " };
var value1Split = value1.Split(separators, StringSplitOptions.None);
var value2Split = value2.Split(separators, StringSplitOptions.None);
var result = new List<string>();
var i = 0;
var j = 0;
while (i < value1Split.Length && j < value2Split.Length)
{
if (value1Split[i].Equals(value2Split[j], StringComparison.OrdinalIgnoreCase))
{
result.Add(value2Split[j]);
i++;
j++;
}
else
{
result.Add(value2Split[j].ToUpper());
j++;
}
}
Console.WriteLine(string.Join(" ", result));
Console.ReadKey();
Note that if value1 = "This is a new Value" and value2 = "This is also a new value" should produce "This is ALSO a new value", then value1 = "This is text one" and value2 = "This is string text two not string one" will produce "This is STRING text TWO NOT STRING one", not "This is STRING TEXT TWO NOT STRING ONE" as you mentioned before.
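If words can also be removed or reordered, a word-level longest common subsequence (LCS) gives a more robust alignment than a single forward scan. The sketch below swaps in that technique as an illustration; `WordDiff` and `HighlightAdded` are invented names, comparison is case-sensitive, and when a word is repeated, which copy gets flagged depends on the alignment the LCS happens to pick.

```csharp
using System;
using System.Collections.Generic;

public static class WordDiff
{
    // Returns `changed` with every word that is not part of the LCS with `original` upper-cased.
    public static string HighlightAdded(string original, string changed)
    {
        string[] a = original.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        string[] b = changed.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // lcs[i, j] = length of the longest common subsequence of a[i..] and b[j..].
        var lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
        {
            for (int j = b.Length - 1; j >= 0; j--)
            {
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);
            }
        }

        var result = new List<string>();
        int x = 0, y = 0;
        while (y < b.Length)
        {
            if (x < a.Length && a[x] == b[y])
            {
                result.Add(b[y]); // word is common to both strings
                x++;
                y++;
            }
            else if (x < a.Length && lcs[x + 1, y] >= lcs[x, y + 1])
            {
                x++; // word exists only in the original: skip it
            }
            else
            {
                result.Add(b[y].ToUpperInvariant()); // word was added in `changed`
                y++;
            }
        }
        return string.Join(" ", result);
    }

    public static void Main()
    {
        Console.WriteLine(HighlightAdded("This is a new value", "This is also a new value"));
        // This is ALSO a new value
    }
}
```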
I'm new to C# and I have a question.
I have the following string:
string taxNumber = "1222233333445";
I want to extract data from this string like this:
string a = "1"
string b = "2222"
string c = "33333"
string d = "44"
string e = "5"
Please tell me which method I can use to get this data from the string.
Thank you very much ^^
Use the String.Substring(int index, int length) method
string a = taxNumber.Substring(0, 1);
string b = taxNumber.Substring(1, 4);
// etc
Oh well, the best I can come up with is this:
IEnumerable<string> numbers
= taxNumber.ToCharArray()
.Distinct()
.Select(c => new string(c, taxNumber.Count(t => t == c)));
foreach (string numberGroup in numbers)
{
Console.WriteLine(numberGroup);
}
Outputs:
1
2222
33333
44
5
This also works. You don't need to hard-code the number of characters; you can check this by changing the number of 1s, 2s, etc.
string taxNumber = "1222233333445";
string s1 = taxNumber.Substring(taxNumber.IndexOf("1"), ((taxNumber.Length - taxNumber.IndexOf("1")) - (taxNumber.Length - taxNumber.LastIndexOf("1"))) + 1);
string s2 = taxNumber.Substring(taxNumber.IndexOf("2"), ((taxNumber.Length - taxNumber.IndexOf("2")) - (taxNumber.Length - taxNumber.LastIndexOf("2"))) + 1);
string s3 = taxNumber.Substring(taxNumber.IndexOf("3"), ((taxNumber.Length - taxNumber.IndexOf("3")) - (taxNumber.Length - taxNumber.LastIndexOf("3"))) + 1);
You can use Char.IsDigit to identify digits out of string, and may apply further logic as follows:
// Start at 1 so that taxNumber[i - 1] is always a valid index.
for (int i = 1; i < taxNumber.Length; i++)
{
if (Char.IsDigit(taxNumber[i]))
{
if (taxNumber[i - 1] == taxNumber[i])
{
/*Further assign values*/
}
}
}
Try this Code
string taxNumber = "1222233333445";
char[] aa = taxNumber.ToCharArray();
List<string> finals = new List<string>();
string temp = string.Empty;
for (int i = 0; i < aa.Length; i++)
{
if (i == 0)
{
temp = aa[i].ToString();
}
else
{
if (aa[i].ToString() == aa[i - 1].ToString())
{
temp += aa[i];
}
else
{
if (temp != string.Empty)
{
finals.Add(temp);
temp = aa[i].ToString();
}
}
if (i == aa.Length - 1)
{
if (aa[i].ToString() != aa[i - 1].ToString())
{
temp = aa[i].ToString();
finals.Add(temp);
}
else
{
finals.Add(temp);
}
}
}
}
Then check the contents of the finals list.
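You may use a regex:
string strRegex = @"(1+|2+|3+|4+|5+|6+|7+|8+|9+|0+)";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"1222233333445";
// Because the pattern is a capture group, Split returns the captured runs,
// interleaved with empty strings that may need to be filtered out.
return myRegex.Split(strTargetString);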
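A backreference can express "one digit followed by any number of copies of itself" without listing every digit, and using `Regex.Matches` instead of `Split` avoids empty entries. This is a minimal sketch; `DigitRuns` and `SplitRuns` are names chosen for the example only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitRuns
{
    // Returns each run of identical consecutive digits as its own string.
    public static string[] SplitRuns(string digits)
    {
        return Regex.Matches(digits, @"([0-9])\1*")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", SplitRuns("1222233333445"))); // 1 2222 33333 44 5
        Console.WriteLine(string.Join(" ", SplitRuns("1121")));          // 11 2 1
    }
}
```

Unlike an approach that counts every occurrence of each distinct digit, this keeps separate runs of the same digit apart, as the second call shows.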
I have been trying to use a C# Regex unsuccessfully to remove certain strings from a movie name.
Examples of the file names I'm working with are:
EuroTrip (2004) [SD]
Event Horizon (1997) [720]
Fast & Furious (2009) [1080p]
Star Trek (2009) [Unknown]
I'd like to remove anything in square brackets or parentheses (including the brackets themselves).
So far I'm using:
movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([*\\(\\d{4}\\)])", "");
Which seems to remove the Year and Parenthesis ok, but I just can't figure out how to remove the Square Brackets and content without affecting other parts... I've had miscellaneous results but the closest one has been:
movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([?\\[+A-Z+\\]])", "");
Which left me with:
urorip (2004)
Instead of:
EuroTrip (2004) [SD]
Any whitespace left at the ends is OK, as I will just perform
movieTitleToFetch = movieTitleToFetch.Trim();
at the end.
Thanks in advance,
Alex
This regex pattern should work OK, although it may need a bit of tweaking:
"[\[\(].+?[\]\)]"
Regex.Replace(movieTitleToFetch, @"[\[\(].+?[\]\)]", "");
This should match anything from either "[" or "(" until the next occurrence of "]" or ")".
If that does not work, try removing the escape character for the parentheses, like so:
Regex.Replace(movieTitleToFetch, @"[\[(].+?[\])]", "");
@Craigt is pretty much spot on, but it's possibly cleaner to ensure that the brackets are matched.
([\[].*?[\]]|[\(].*?[\)])
I know I'm late to this thread, but I wrote a simple algorithm to sanitize downloaded movie filenames.
It runs these steps:
Removes everything in brackets (if it finds a year there, it tries to keep that info).
Removes a list of commonly used words (720p, bdrip, h264 and so on).
Assumes there can be language info in the title and removes it when it sits at the end of the remaining string (before the special words).
If a year was not found inside brackets, it looks for one at the end of the remaining string (as for languages).
While doing this it replaces dots and underscores with spaces, so the title is ready to be used, for example, as a query for a search API.
Here's the test in xUnit (I used mostly Italian titles to test it):
using Grappachu.Movideo.Core.Helpers.TitleCleaner;
using SharpTestsEx;
using Xunit;
namespace Grappachu.MoVideo.Test
{
public class TitleCleanerTest
{
[Theory]
[InlineData("Avengers.Confidential.La.Vedova.Nera.E.Punisher.2014.iTALiAN.Bluray.720p.x264 - BG.mkv",
"Avengers Confidential La Vedova Nera E Punisher", 2014)]
[InlineData("Fuck You, Prof! (2013) BDRip 720p HEVC ITA GER AC3 Multi Sub PirateMKV.mkv",
"Fuck You, Prof!", 2013)]
[InlineData("Il Libro della Giungla(2016)(BDrip1080p_H264_AC3 5.1 Ita Eng_Sub Ita Eng)by siste82.avi",
"Il Libro della Giungla", 2016)]
[InlineData("Il primo dei bugiardi (2009) [Mux by Little-Boy]", "Il primo dei bugiardi", 2009)]
[InlineData("Il.Viaggio.Di.Arlo-The.Good.Dinosaur.2015.DTS.ITA.ENG.1080p.BluRay.x264-BLUWORLD",
"il viaggio di arlo", 2015)]
[InlineData("La Mafia Uccide Solo D'estate 2013 .avi",
"La Mafia Uccide Solo D'estate", 2013)]
[InlineData("Ip.Man.3.2015.iTA.AC3.5.1.448.Chi.Aac.BluRay.m1080p.x264.Sub.[scambiofile.info].mkv",
"Ip Man 3", 2015)]
[InlineData("Inferno.2016.BluRay.1080p.AC3.ITA.AC3.ENG.Subs.x264-WGZ.mkv",
"Inferno", 2016)]
[InlineData("Ghostbusters.2016.iTALiAN.BDRiP.EXTENDED.XviD-HDi.mp4",
"Ghostbusters", 2016)]
[InlineData("Transcendence.mkv", "Transcendence", null)]
[InlineData("Being Human (Forsyth, 1994).mkv", "Being Human", 1994)]
public void Clean_should_return_title_and_year_when_possible(string filename, string title, int? year)
{
var res = MovieTitleCleaner.Clean(filename);
res.Title.ToLowerInvariant().Should().Be.EqualTo(title.ToLowerInvariant());
res.Year.Should().Be.EqualTo(year);
}
}
}
And here is the first version of the code:
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
namespace Grappachu.Movideo.Core.Helpers.TitleCleaner
{
public class MovieTitleCleanerResult
{
public string Title { get; set; }
public int? Year { get; set; }
public string SubTitle { get; set; }
}
public class MovieTitleCleaner
{
private const string SpecialMarker = "§=§";
private static readonly string[] ReservedWords;
private static readonly string[] SpaceChars;
private static readonly string[] Languages;
static MovieTitleCleaner()
{
ReservedWords = new[]
{
SpecialMarker, "hevc", "bdrip", "Bluray", "x264", "h264", "AC3", "DTS", "480p", "720p", "1080p"
};
var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
var l = cultures.Select(x => x.EnglishName).ToList();
l.AddRange(cultures.Select(x => x.ThreeLetterISOLanguageName));
Languages = l.Distinct().ToArray();
SpaceChars = new[] {".", "_", " "};
}
public static MovieTitleCleanerResult Clean(string filename)
{
var temp = Path.GetFileNameWithoutExtension(filename);
int? maybeYear = null;
// Remove what's inside brackets trying to keep year info.
temp = RemoveBrackets(temp, '{', '}', ref maybeYear);
temp = RemoveBrackets(temp, '[', ']', ref maybeYear);
temp = RemoveBrackets(temp, '(', ')', ref maybeYear);
// Removes special markers (codecs, formats, etc.)
var tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
var title = string.Empty;
for (var i = 0; i < tokens.Length; i++)
{
var tok = tokens[i];
if (ReservedWords.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase)))
{
if (title.Length > 0)
break;
}
else
{
title = string.Join(" ", title, tok).Trim();
}
}
temp = title;
// Remove language info when it is found just before the special markers (should not remove "English" if it's inside the title)
tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
for (var i = tokens.Length - 1; i >= 0; i--)
{
var tok = tokens[i];
if (Languages.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase)))
tokens[i] = string.Empty;
else
break;
}
title = string.Join(" ", tokens).Trim();
// If year is not found inside parenthesis try to catch at the end, just after the title
if (!maybeYear.HasValue)
{
var resplit = title.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
var last = resplit.Last();
if (LooksLikeYear(last))
{
maybeYear = int.Parse(last);
title = title.Replace(last, string.Empty).Trim();
}
}
// TODO: review this. when there's one dash separates main title from subtitle
var res = new MovieTitleCleanerResult();
res.Year = maybeYear;
if (title.Count(x => x == '-') == 1)
{
var sp = title.Split('-');
res.Title = sp[0];
res.SubTitle = sp[1];
}
else
{
res.Title = title;
}
return res;
}
private static string RemoveBrackets(string inputString, char openChar, char closeChar, ref int? maybeYear)
{
var str = inputString;
while (str.IndexOf(openChar) > 0 && str.IndexOf(closeChar) > 0)
{
var dataGraph = str.GetBetween(openChar.ToString(), closeChar.ToString());
if (LooksLikeYear(dataGraph))
{
maybeYear = int.Parse(dataGraph);
}
else
{
var parts = dataGraph.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
foreach (var part in parts)
if (LooksLikeYear(part))
{
maybeYear = int.Parse(part);
break;
}
}
str = str.ReplaceBetween(openChar, closeChar, string.Format(" {0} ", SpecialMarker));
}
return str;
}
private static bool LooksLikeYear(string dataRound)
{
// Anchor the end as well, so that a token such as "2016x" is not passed on to int.Parse.
return Regex.IsMatch(dataRound, "^(19|20)[0-9][0-9]$");
}
}
public static class StringUtils
{
public static string GetBetween(this string src, string a, string b,
StringComparison comparison = StringComparison.Ordinal)
{
var idxStr = src.IndexOf(a, comparison);
var idxEnd = src.IndexOf(b, comparison);
if (idxStr >= 0 && idxEnd > 0)
{
if (idxStr > idxEnd)
Swap(ref idxStr, ref idxEnd);
return src.Substring(idxStr + a.Length, idxEnd - idxStr - a.Length);
}
return src;
}
private static void Swap<T>(ref T idxStr, ref T idxEnd)
{
var temp = idxEnd;
idxEnd = idxStr;
idxStr = temp;
}
public static string ReplaceBetween(this string s, char begin, char end, string replacement = null)
{
var regex = new Regex(string.Format("\\{0}.*?\\{1}", begin, end));
return regex.Replace(s, replacement ?? string.Empty);
}
}
}
This does the trick:
@"(\[[^\]]*\])|(\([^\)]*\))"
It removes anything from "[" to the next "]" and anything from "(" to the next ")".
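Applied to the file names from the question, that pattern could be wrapped as follows. This is a minimal sketch: `BracketStripper` and `StripBrackets` are hypothetical names, and the final `Trim()` follows the question's plan to trim leftover whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // Removes every "[...]" and "(...)" section, then trims the leftover whitespace.
    public static string StripBrackets(string title)
    {
        return Regex.Replace(title, @"(\[[^\]]*\])|(\([^\)]*\))", "").Trim();
    }

    public static void Main()
    {
        Console.WriteLine(StripBrackets("EuroTrip (2004) [SD]"));          // EuroTrip
        Console.WriteLine(StripBrackets("Fast & Furious (2009) [1080p]")); // Fast & Furious
        Console.WriteLine(StripBrackets("Star Trek [Unknown]"));           // Star Trek
    }
}
```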
Can you just use:
string MovieTitle="Star Trek (2009) [Unknown]";
movieTitleToFetch= MovieTitle.IndexOf('(')>MovieTitle.IndexOf('[')?
MovieTitle.Substring(0,MovieTitle.IndexOf('[')):
MovieTitle.Substring(0,MovieTitle.IndexOf('('));
Can't we use this instead?
if(movieTitleToFetch.Contains("("))
movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("("));
The code above returns the movie title for these strings:
EuroTrip (2004) [SD]
Event Horizon (1997) [720]
Fast & Furious (2009) [1080p]
Star Trek (2009) [Unknown]
If there is a case where you have no year but only the type, i.e.:
EuroTrip [SD]
Event Horizon [720]
Fast & Furious [1080p]
Star Trek [Unknown]
then use this
if(movieTitleToFetch.Contains("("))
movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("("));
else if(movieTitleToFetch.Contains("["))
movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("["));
I came up with .+\s(?<year>\(\d{4}\))\s(?<format>\[\w+\]) which matches any of your examples, and contains the year and format as named capture groups to help you replace them.
This pattern translates as:
Any character, one or more repetitions
Whitespace
Literal '(' followed by 4 digits followed by literal ')' (year)
Whitespace
Literal '[' followed by alphanumeric characters, one or more repetitions, followed by literal ']' (format)
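To show how the named groups can be read back, here is a small sketch built on a slightly modified pattern. The `title` group, the `^`/`$` anchors and the choice to capture the year and format without their brackets are additions made for this example, and `MovieFileName.TryParse` is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MovieFileName
{
    private static readonly Regex Pattern =
        new Regex(@"^(?<title>.+)\s\((?<year>[0-9]{4})\)\s\[(?<format>\w+)\]$");

    // Returns false (and empty/zero outputs) when the name does not match "Title (yyyy) [format]".
    public static bool TryParse(string fileName, out string title, out int year, out string format)
    {
        Match m = Pattern.Match(fileName);
        title = m.Success ? m.Groups["title"].Value : string.Empty;
        year = m.Success ? int.Parse(m.Groups["year"].Value) : 0;
        format = m.Success ? m.Groups["format"].Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        if (TryParse("Fast & Furious (2009) [1080p]", out string title, out int year, out string format))
        {
            Console.WriteLine(title + " / " + year + " / " + format); // Fast & Furious / 2009 / 1080p
        }
    }
}
```

A name without a year, such as "EuroTrip [SD]", does not match this pattern, so such cases would need a separate rule.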