String Replacements for Word Merge - c#

using asp.net 4
we do a lot of Word merges at work. rather than using the complicated conditional statements of Word i want to embed my own syntax. something like:
Dear Mr. { select lastname from users where userid = 7 },
Your invoice for this quarter is: ${ select amount from invoices where userid = 7 }.
......
ideally, i'd like this to get turned into:
string.Format("Dear Mr. {0}, Your invoice for this quarter is: ${1}", sqlEval[0], sqlEval[1]);
any ideas?

Well, I don't really recommend rolling your own solution for this, however I will answer the question as asked.
First, you need to process the text and extract the SQL statements. For that you'll need a simple parser:
/// <summary>Parses the input string and extracts a unique list of all placeholders.</summary>
/// <remarks>
/// This method does not handle escaping of delimiters
/// </remarks>
public static IList<string> Parse(string input)
{
const char placeholderDelimStart = '{';
const char placeholderDelimEnd = '}';
var characters = input.ToCharArray();
var placeHolders = new List<string>();
string currentPlaceHolder = string.Empty;
bool inPlaceHolder = false;
for (int i = 0; i < characters.Length; i++)
{
var currentChar = characters[i];
// Start of a placeholder
if (!inPlaceHolder && currentChar == placeholderDelimStart)
{
currentPlaceHolder = string.Empty;
inPlaceHolder = true;
continue;
}
// Start of a placeholder when we already have one
if (inPlaceHolder && currentChar == placeholderDelimStart)
throw new InvalidOperationException("Unexpected character detected at position " + i);
// We found the end marker while in a placeholder - we're done with this placeholder
if (inPlaceHolder && currentChar == placeholderDelimEnd)
{
if (!placeHolders.Contains(currentPlaceHolder))
placeHolders.Add(currentPlaceHolder);
inPlaceHolder = false;
continue;
}
// End of a placeholder with no matching start
if (!inPlaceHolder && currentChar == placeholderDelimEnd)
throw new InvalidOperationException("Unexpected character detected at position " + i);
if (inPlaceHolder)
currentPlaceHolder += currentChar;
}
return placeHolders;
}
Okay, so that will get you a list of SQL statements extracted from the input text. You'll probably want to tweak it to use properly typed parser exceptions and some input guards (which I elided for clarity).
Now you just need to replace those placeholders with the results of the evaluated SQL:
// Sample input
var input = "Hello Mr. {select firstname from users where userid=7}";
string output = input;
var extractedStatements = Parse(input);
foreach (var statement in extractedStatements)
{
// Execute the SQL statement
var result = Evaluate(statement);
// Update the output with the result of the SQL statement
output = output.Replace("{" + statement + "}", result);
}
This is obviously not the most efficient way to do this, but I think it sufficiently demonstrates the concept without muddying the waters.
You'll need to define the Evaluate(string) method. This will handle executing the SQL.

I just finished building a proprietary solution like this for a law firm here.
I evaluated a product called Windward reports. It's a tad pricy, esp if you need a lot of copies, but for one user it's not bad.
it can pull from XML or SQL data sources (or more if I remember).
Might be worth a look (and no I don't work for 'em, just evaluated their stuff)

You might want to check out the razor engine project on codeplex
http://razorengine.codeplex.com/
Using SQL etc within your template looks like a bad idea. I'd suggest you make a ViewModel for each template.
The Razor thing is really easy to use. Just add a reference, import the namespace, and call the Parse method like so:
(VB guy so excuse syntax!)
MyViewModel myModel = new MyViewModel("Bob",150.00); //set properties
string myTemplate = "Dear Mr. #Model.FirstName, Your invoice for this quarter is: #Model.InvoiceAmount";
string myOutput = Razor.Parse(myTemplate, myModel);
Your string can come from anywhere - I use this with my templates stored in a database, you could equally load it from files or whatever. It's very powerful as a view engine, you can do conditional stuff, loops, etc etc.

i ended up rolling my own solution but thanks. i really dislike if statements. i'll need to refactor them out. here it is:
var mailingMergeString = new MailingMergeString(input);
var output = mailingMergeString.ParseMailingMergeString();
public class MailingMergeString
{
private string _input;
public MailingMergeString(string input)
{
_input = input;
}
public string ParseMailingMergeString()
{
IList<SqlReplaceCommand> sqlCommands = new List<SqlReplaceCommand>();
var i = 0;
const string openBrace = "{";
const string closeBrace = "}";
while (string.IsNullOrWhiteSpace(_input) == false)
{
var sqlReplaceCommand = new SqlReplaceCommand();
var open = _input.IndexOf(openBrace) + 1;
var close = _input.IndexOf(closeBrace);
var length = close != -1 ? close - open : _input.Length;
var newInput = _input.Substring(close + 1);
var nextClose = newInput.Contains(openBrace) ? newInput.IndexOf(openBrace) : newInput.Length;
if (i == 0 && open > 0)
{
sqlReplaceCommand.Text = _input.Substring(0, open - 1);
_input = _input.Substring(open - 1);
}
else
{
sqlReplaceCommand.Command = _input.Substring(open, length);
sqlReplaceCommand.PlaceHolder = openBrace + i + closeBrace;
sqlReplaceCommand.Text = _input.Substring(close + 1, nextClose);
sqlReplaceCommand.NewInput = _input.Substring(close + 1);
_input = newInput.Contains(openBrace) ? sqlReplaceCommand.NewInput : string.Empty;
}
sqlCommands.Add(sqlReplaceCommand);
i++;
}
return sqlCommands.GetParsedString();
}
internal class SqlReplaceCommand
{
public string Command { get; set; }
public string SqlResult { get; set; }
public string PlaceHolder { get; set; }
public string Text { get; set; }
protected internal string NewInput { get; set; }
}
}
internal static class SqlReplaceExtensions
{
public static string GetParsedString(this IEnumerable<MailingMergeString.SqlReplaceCommand> sqlCommands)
{
return sqlCommands.Aggregate("", (current, replaceCommand) => current + (replaceCommand.PlaceHolder + replaceCommand.Text));
}
}

Related

Can't properly rebuild a string with Replacement values from Dictionary

I am trying to build a file using a template. I am processing the file in a while loop line by line. The first section of the file, first 35 lines are header information. The infromation is surrounded by # signs. Take this string for example:
Field InspectionStationID 3 {"PVA TePla #WSM#", "sw#data.tool_context.TOOL_SOFTWARE_VERSION#", "#data.context.TOOL_ENTITY#"}
The expected output should be:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
This header section uses a different mapping than the rest of the file so I wanted to parse the file line by line from top to bottom and use a different logic for each section so that I don't waste time parsing the entire file at once multiple times for different sections.
The logic uses two dictionaries populated from an xml file. Because the file has mutliple tables, I combined them in the two dictionaries like so:
var headerCdataIndexKeyVals = Dictionary<string, int>(){
{"data.tool_context.TOOL_SOFTWARE_VERSION", 1},
{"data.context.TOOL_ENTITY",0}
};
var headerCdataArrayKeyVals = new Dictionary<string, List<string>>();
var tool_contextCdataList = new list <string>{"HM654", "sw0.2.002"};
var contextCdataList = new List<string>{"WSM102"}
headerCdataArrayKeyVals.add("tool_context", tool_contextCdataList);
headerCdataArrayKeyVals.add("context", contextCdataList);
To help me map the values to their respective positions in the string in one go and without having to loop through multiple dictionaries.
I am using the following logic:
public static string FindSubsInDelimetersAndReturn(string str, char openDelimiter, char closeDelimiter, HeaderMapperData mapperData )
{
string newString = string.Empty;
// Stores the indices of
Stack <int> dels = new Stack <int>();
for (int i = 0; i < str.Length; i++)
{
var let = str[i];
// If opening delimeter
// is encountered
if (str[i] == openDelimiter && dels.Count == 0)
{
dels.Push(i);
}
// If closing delimeter
// is encountered
else if (str[i] == closeDelimiter && dels.Count > 0)
{
// Extract the position
// of opening delimeter
int pos = dels.Peek();
dels.Pop();
// Length of substring
int len = i - 1 - pos;
// Extract the substring
string headerSubstring = str.Substring(pos + 1, len);
bool hasKey = mapperData.HeaderCdataIndexKeyVals.TryGetValue(headerSubstring.ToUpper(), out int headerCdataIndex);
string[] headerSubstringSplit = headerSubstring.Split('.');
string headerCDataVal = string.Empty;
if (hasKey)
{
if (headerSubstring.Contains("CONTAINER.CONTEXT", StringComparison.OrdinalIgnoreCase))
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper() + '.' + headerSubstringSplit[2].ToUpper()][headerCdataIndex];
//mapperData.HeaderCdataArrayKeyVals[]
}
else
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper()][headerCdataIndex];
}
string strToReplace = openDelimiter + headerSubstring + closeDelimiter;
string sub = str.Remove(i + 1);
sub = sub.Replace(strToReplace, headerCDataVal);
newString += sub;
}
else if (headerSubstring == "WSM" && closeDelimiter == '#')
{
string sub = str.Remove(len + 1);
newString += sub.Replace(openDelimiter + headerSubstring + closeDelimiter, "");
}
else
{
newString += let;
}
}
}
return newString;
}
}
But my output turns out to be:
"\tFie\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw0.2.002\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw#data.tool_context.TOOL_SOFTWARE_VERSION#\", \"WSM102"
Can someone help understand why this is happening and how I can go about correcting it so I get the output:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
Am i even trying to solve this the right way or is there a better cleaner way to do it? Btw if the key is not in the dictionary I replace it with empty string

How remove some special words from a string content?

I have some strings containing code for emoji icons, like :grinning:, :kissing_heart:, or :bouquet:. I'd like to process them to remove the emoji codes.
For example, given:
Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
I want to get this:
Hello , how are you? Are you fine?
I know I can use this code:
richTextBox2.Text = richTextBox1.Text.Replace(":kissing_heart:", "").Replace(":bouquet:", "").Replace(":grinning:", "").ToString();
However, there are 856 different emoji icons I have to remove (which, using this method, would take 856 calls to Replace()). Is there any other way to accomplish this?
You can use Regex to match the word between :anything:. Using Replace with function you can make other validation.
string pattern = #":(.*?):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, (m) =>
{
if (m.ToString().Split(' ').Count() > 1) // more than 1 word and other validations that will help preventing parsing the user text
{
return m.ToString();
}
return String.Empty;
}); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
If you don't want to use Replace that make use of a lambda expression, you can use \w, as #yorye-nathan mentioned, to match only words.
string pattern = #":(\w*):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, String.Empty); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
string Text = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
i would solve it that way
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
Emoj.ForEach(x => Text = Text.Replace(x, string.Empty));
UPDATE - refering to Detail's Comment
Another approach: replace only existing Emojs
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
var Matches = Regex.Matches(Text, #":(\w*):").Cast<Match>().Select(x => x.Value);
Emoj.Intersect(Matches).ToList().ForEach(x => Text = Text.Replace(x, string.Empty));
But i'm not sure if it's that big difference for such short chat-strings and it's more important to have code that's easy to read/maintain. OP's question was about reducing redundancy Text.Replace().Text.Replace() and not about the most efficient solution.
I would use a combination of some of the techniques already suggested. Firstly, I'd store the 800+ emoji strings in a database and then load them up at runtime. Use a HashSet to store these in memory, so that we have a O(1) lookup time (very fast). Use Regex to pull out all potential pattern matches from the input and then compare each to our hashed emoji, removing the valid ones and leaving any non-emoji patterns the user has entered themselves...
public class Program
{
//hashset for in memory representation of emoji,
//lookups are O(1), so very fast
private HashSet<string> _emoji = null;
public Program(IEnumerable<string> emojiFromDb)
{
//load emoji from datastore (db/file,etc)
//into memory at startup
_emoji = new HashSet<string>(emojiFromDb);
}
public string RemoveEmoji(string input)
{
//pattern to search for
string pattern = #":(\w*):";
string output = input;
//use regex to find all potential patterns in the input
MatchCollection matches = Regex.Matches(input, pattern);
//only do this if we actually find the
//pattern in the input string...
if (matches.Count > 0)
{
//refine this to a distinct list of unique patterns
IEnumerable<string> distinct =
matches.Cast<Match>().Select(m => m.Value).Distinct();
//then check each one against the hashset, only removing
//registered emoji. This allows non-emoji versions
//of the pattern to survive...
foreach (string match in distinct)
if (_emoji.Contains(match))
output = output.Replace(match, string.Empty);
}
return output;
}
}
public class MainClass
{
static void Main(string[] args)
{
var program = new Program(new string[] { ":grinning:", ":kissing_heart:", ":bouquet:" });
string output = program.RemoveEmoji("Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:");
Console.WriteLine(output);
}
}
Which results in:
Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:,
but valid :nonetheless:
You do not have to replace all 856 emoji's. You only have to replace those that appear in the string. So have a look at:
Finding a substring using C# with a twist
Basically you extract all tokens ie the strings between : and : and then replace those with string.Empty()
If you are concerned that the search will return strings that are not emojis such as :some other text: then you could have a hash table lookup to make sure that replacing said found token is appropriate to do.
Finally got around to write something up. I'm combining a couple previously mentioned ideas, with the fact we should only loop over the string once. Based on those requirement, this sound like the perfect job for Linq.
You should probably cache the HashSet. Other than that, this has O(n) performance and only goes over the list once. Would be interesting to benchmark, but this could very well be the most efficient solution.
The approach is pretty straight forwards.
First load all Emoij in a HashSet so we can quickly look them up.
Split the string with input.Split(':') at the :.
Decide if we keep the current element.
If the last element was a match, keep the current element.
If the last element was no match, check if the current element matches.
If it does, ignore it. (This effectively removes the substring from the output).
If it doesn't, append : back and keep it.
Rebuild our string with a StringBuilder.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
static class Program
{
static void Main(string[] args)
{
ISet<string> emojiList = new HashSet<string>(new[] { "kissing_heart", "bouquet", "grinning" });
Console.WriteLine("Hello:grinning: , ho:w: a::re you?:kissing_heart:kissing_heart: Are you fine?:bouquet:".RemoveEmoji(':', emojiList));
Console.ReadLine();
}
public static string RemoveEmoji(this string input, char delimiter, ISet<string> emojiList)
{
StringBuilder sb = new StringBuilder();
input.Split(delimiter).Aggregate(true, (prev, curr) =>
{
if (prev)
{
sb.Append(curr);
return false;
}
if (emojiList.Contains(curr))
{
return true;
}
sb.Append(delimiter);
sb.Append(curr);
return false;
});
return sb.ToString();
}
}
}
Edit: I did something cool using the Rx library, but then realized Aggregate is the IEnumerable counterpart of Scan in Rx, thus simplifying the code even more.
If efficiency is a concern and to avoid processing "false positives", consider rewriting the string using a StringBuilder while skipping the special emoji tokens:
static HashSet<string> emojis = new HashSet<string>()
{
"grinning",
"kissing_heart",
"bouquet"
};
static string RemoveEmojis(string input)
{
StringBuilder sb = new StringBuilder();
int length = input.Length;
int startIndex = 0;
int colonIndex = input.IndexOf(':');
while (colonIndex >= 0 && startIndex < length)
{
//Keep normal text
int substringLength = colonIndex - startIndex;
if (substringLength > 0)
sb.Append(input.Substring(startIndex, substringLength));
//Advance the feed and get the next colon
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
if (colonIndex < 0) //No more colons, so no more emojis
{
//Don't forget that first colon we found
sb.Append(':');
//Add the rest of the text
sb.Append(input.Substring(startIndex));
break;
}
else //Possible emoji, let's check
{
string token = input.Substring(startIndex, colonIndex - startIndex);
if (emojis.Contains(token)) //It's a match, so we skip this text
{
//Advance the feed
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
}
else //No match, so we keep the normal text
{
//Don't forget the colon
sb.Append(':');
//Instead of doing another substring next loop, let's just use the one we already have
sb.Append(token);
startIndex = colonIndex;
}
}
}
return sb.ToString();
}
static void Main(string[] args)
{
List<string> inputs = new List<string>()
{
"Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:",
"Tricky test:123:grinning:",
"Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:"
};
foreach (string input in inputs)
{
Console.WriteLine("In <- " + input);
Console.WriteLine("Out -> " + RemoveEmojis(input));
Console.WriteLine();
}
Console.WriteLine("\r\n\r\nPress enter to exit...");
Console.ReadLine();
}
Outputs:
In <- Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
Out -> Hello , how are you? Are you fine?
In <- Tricky test:123:grinning:
Out -> Tricky test:123
In <- Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:
Out -> Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:, but valid :nonetheless:
Use this code I put up below I think using this function your problem will be solved.
string s = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
string rmv = ""; string remove = "";
int i = 0; int k = 0;
A:
rmv = "";
for (i = k; i < s.Length; i++)
{
if (Convert.ToString(s[i]) == ":")
{
for (int j = i + 1; j < s.Length; j++)
{
if (Convert.ToString(s[j]) != ":")
{
rmv += s[j];
}
else
{
remove += rmv + ",";
i = j;
k = j + 1;
goto A;
}
}
}
}
string[] str = remove.Split(',');
for (int x = 0; x < str.Length-1; x++)
{
s = s.Replace(Convert.ToString(":" + str[x] + ":"), "");
}
Console.WriteLine(s);
Console.ReadKey();
I'd use extension method like this:
public static class Helper
{
public static string MyReplace(this string dirty, char separator)
{
string newText = "";
bool replace = false;
for (int i = 0; i < dirty.Length; i++)
{
if(dirty[i] == separator) { replace = !replace ; continue;}
if(replace ) continue;
newText += dirty[i];
}
return newText;
}
}
Usage:
richTextBox2.Text = richTextBox2.Text.MyReplace(':');
This method show be better in terms of performance compare to one with Regex
I would split the text with the ':' and then build the string excluding the found emoji names.
const char marker = ':';
var textSections = text.Split(marker);
var emojiRemovedText = string.Empty;
var notMatchedCount = 0;
textSections.ToList().ForEach(section =>
{
if (emojiNames.Contains(section))
{
notMatchedCount = 0;
}
else
{
if (notMatchedCount++ > 0)
{
emojiRemovedText += marker.ToString();
}
emojiRemovedText += section;
}
});

compare the characters in two strings

In C#, how do I compare the characters in two strings.
For example, let's say I have these two strings
"bc3231dsc" and "bc3462dsc"
How do I programically figure out the the strings
both start with "bc3" and end with "dsc"?
So the given would be two variables:
var1 = "bc3231dsc";
var2 = "bc3462dsc";
After comparing each characters from var1 to var2, I would want the output to be:
leftMatch = "bc3";
center1 = "231";
center2 = "462";
rightMatch = "dsc";
Conditions:
1. The strings will always be a length of 9 character.
2. The strings are not case sensitive.
The string class has 2 methods (StartsWith and Endwith) that you can use.
After reading your question and the already given answers i think there are some constraints are missing, which are maybe obvious to you, but not to the community. But maybe we can do a little guess work:
You'll have a bunch of string pairs that should be compared.
The two strings in each pair are of the same length or you are only interested by comparing the characters read simultaneously from left to right.
Get some kind of enumeration that tells me where each block starts and how long it is.
Due to the fact, that a string is only a enumeration of chars you could use LINQ here to get an idea of the matching characters like this:
private IEnumerable<bool> CommonChars(string first, string second)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
var charsToCompare = first.Zip(second, (LeftChar, RightChar) => new { LeftChar, RightChar });
var matchingChars = charsToCompare.Select(pair => pair.LeftChar == pair.RightChar);
return matchingChars;
}
With this we can proceed and now find out how long each block of consecutive true and false flags are with this method:
private IEnumerable<Tuple<int, int>> Pack(IEnumerable<bool> source)
{
if (source == null)
throw new ArgumentNullException("source");
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
bool current = iterator.Current;
int index = 0;
int length = 1;
while (iterator.MoveNext())
{
if(current != iterator.Current)
{
yield return Tuple.Create(index, length);
index += length;
length = 0;
}
current = iterator.Current;
length++;
}
yield return Tuple.Create(index, length);
}
}
Currently i don't know if there is an already existing LINQ function that provides the same functionality. As far as i have already read it should be possible with SelectMany() (cause in theory you can accomplish any LINQ task with this method), but as an adhoc implementation the above was easier (for me).
These functions could then be used in a way something like this:
var firstString = "bc3231dsc";
var secondString = "bc3462dsc";
var commonChars = CommonChars(firstString, secondString);
var packs = Pack(commonChars);
foreach (var item in packs)
{
Console.WriteLine("Left side: " + firstString.Substring(item.Item1, item.Item2));
Console.WriteLine("Right side: " + secondString.Substring(item.Item1, item.Item2));
Console.WriteLine();
}
Which would you then give this output:
Left side: bc3
Right side: bc3
Left side: 231
Right side: 462
Left side: dsc
Right side: dsc
The biggest drawback is in someway the usage of Tuple cause it leads to the ugly property names Item1 and Item2 which are far away from being instantly readable. But if it is really wanted you could introduce your own simple class holding two integers and has some rock-solid property names. Also currently the information is lost about if each block is shared by both strings or if they are different. But once again it should be fairly simply to get this information also into the tuple or your own class.
static void Main(string[] args)
{
string test1 = "bc3231dsc";
string tes2 = "bc3462dsc";
string firstmatch = GetMatch(test1, tes2, false);
string lasttmatch = GetMatch(test1, tes2, true);
string center1 = test1.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
string center2 = test2.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
}
public static string GetMatch(string fist, string second, bool isReverse)
{
if (isReverse)
{
fist = ReverseString(fist);
second = ReverseString(second);
}
StringBuilder builder = new StringBuilder();
char[] ar1 = fist.ToArray();
for (int i = 0; i < ar1.Length; i++)
{
if (fist.Length > i + 1 && ar1[i].Equals(second[i]))
{
builder.Append(ar1[i]);
}
else
{
break;
}
}
if (isReverse)
{
return ReverseString(builder.ToString());
}
return builder.ToString();
}
public static string ReverseString(string s)
{
char[] arr = s.ToCharArray();
Array.Reverse(arr);
return new string(arr);
}
Pseudo code of what you need..
int stringpos = 0
string resultstart = ""
while not end of string (either of the two)
{
if string1.substr(stringpos) == string1.substr(stringpos)
resultstart =resultstart + string1.substr(stringpos)
else
exit while
}
resultstart has you start string.. you can do the same going backwards...
Another solution you can use is Regular Expressions.
Regex re = new Regex("^bc3.*?dsc$");
String first = "bc3231dsc";
if(re.IsMatch(first)) {
//Act accordingly...
}
This gives you more flexibility when matching. The pattern above matches any string that starts in bc3 and ends in dsc with anything between except a linefeed. By changing .*? to \d, you could specify that you only want digits between the two fields. From there, the possibilities are endless.
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
List<string> common_str = commonStrings(s1,s2);
foreach ( var s in common_str)
Console.WriteLine(s);
}
static public List<string> commonStrings(string s1, string s2){
int len = s1.Length;
char [] match_chars = new char[len];
for(var i = 0; i < len ; ++i)
match_chars[i] = (Char.ToLower(s1[i])==Char.ToLower(s2[i]))? '#' : '_';
string pat = new String(match_chars);
Regex regex = new Regex("(#+)", RegexOptions.Compiled);
List<string> result = new List<string>();
foreach (Match match in regex.Matches(pat))
result.Add(s1.Substring(match.Index, match.Length));
return result;
}
}
for UPDATE CONDITION
using System;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
int len = 9;//s1.Length;//cond.1)
int l_pos = 0;
int r_pos = len;
for(int i=0;i<len && Char.ToLower(s1[i])==Char.ToLower(s2[i]);++i){
++l_pos;
}
for(int i=len-1;i>0 && Char.ToLower(s1[i])==Char.ToLower(s2[i]);--i){
--r_pos;
}
string leftMatch = s1.Substring(0,l_pos);
string center1 = s1.Substring(l_pos, r_pos - l_pos);
string center2 = s2.Substring(l_pos, r_pos - l_pos);
string rightMatch = s1.Substring(r_pos);
Console.Write(
"leftMatch = \"{0}\"\n" +
"center1 = \"{1}\"\n" +
"center2 = \"{2}\"\n" +
"rightMatch = \"{3}\"\n",leftMatch, center1, center2, rightMatch);
}
}

C# Regex for Movie Filename

I have been trying to use a C# Regex unsuccessfully to remove certain strings from a movie name.
Examples of the file names I'm working with are:
EuroTrip (2004) [SD]
Event Horizon (1997) [720]
Fast & Furious (2009) [1080p]
Star Trek (2009) [Unknown]
I'd like to remove anything in square brackets or parenthesis (including the brackets themselves)
So far I'm using:
movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([*\\(\\d{4}\\)])", "");
Which seems to remove the Year and Parenthesis ok, but I just can't figure out how to remove the Square Brackets and content without affecting other parts... I've had miscellaneous results but the closest one has been:
movieTitleToFetch = Regex.Replace(movieTitleToFetch, "([?\\[+A-Z+\\]])", "");
Which left me with:
urorip (2004)
Instead of:
EuroTrip (2004) [SD]
Any whitespace that is left at the ends are ok as I will just perform
movieTitleToFetch = movieTitleToFetch.Trim();
at the end.
Thanks in advance,
Alex
This regex pattern should work ok... maybe needs a bit of tweaking
"[\[\(].+?[\]\)]"
Regex.Replace(movieTitleToFetch, #"[\[\(].+?[\]\)]", "");
This should match anything from either "[" or "(" until the next occurance of "]" or ")"
If that does not work try removing the escape character for the parentheses, like so...
Regex.Replace(movieTitleToFetch, #"[\[(].+?[\])]", "");
#Craigt is pretty much spot on but it's possibly cleaner to ensure that the brackets are matched.
([\[].*?[\]]|[\(].*?[\)])
I'know i'm late on this thread but i wrote a simple algorythm to sanitize the downloaded movies filenames.
This runs these steps:
Removes everything in brackets (if find a year it tries to keep the info)
Removes a list of common used words (720p, bdrip, h264 and so on...)
Assumes that can be languages info in the title and removes them when at the end of remaining string (before special words)
if a year was not found into parenthesis looks at the end of remaining string (as for languages)
Doing this replaces dots and spaces so the title is ready, as example, to be a query for a search api.
Here's the test in XUnit (i used most of italian titles to test it)
using Grappachu.Movideo.Core.Helpers.TitleCleaner;
using SharpTestsEx;
using Xunit;
namespace Grappachu.MoVideo.Test
{
public class TitleCleanerTest
{
[Theory]
[InlineData("Avengers.Confidential.La.Vedova.Nera.E.Punisher.2014.iTALiAN.Bluray.720p.x264 - BG.mkv",
"Avengers Confidential La Vedova Nera E Punisher", 2014)]
[InlineData("Fuck You, Prof! (2013) BDRip 720p HEVC ITA GER AC3 Multi Sub PirateMKV.mkv",
"Fuck You, Prof!", 2013)]
[InlineData("Il Libro della Giungla(2016)(BDrip1080p_H264_AC3 5.1 Ita Eng_Sub Ita Eng)by siste82.avi",
"Il Libro della Giungla", 2016)]
[InlineData("Il primo dei bugiardi (2009) [Mux by Little-Boy]", "Il primo dei bugiardi", 2009)]
[InlineData("Il.Viaggio.Di.Arlo-The.Good.Dinosaur.2015.DTS.ITA.ENG.1080p.BluRay.x264-BLUWORLD",
"il viaggio di arlo", 2015)]
[InlineData("La Mafia Uccide Solo D'estate 2013 .avi",
"La Mafia Uccide Solo D'estate", 2013)]
[InlineData("Ip.Man.3.2015.iTA.AC3.5.1.448.Chi.Aac.BluRay.m1080p.x264.Sub.[scambiofile.info].mkv",
"Ip Man 3", 2015)]
[InlineData("Inferno.2016.BluRay.1080p.AC3.ITA.AC3.ENG.Subs.x264-WGZ.mkv",
"Inferno", 2016)]
[InlineData("Ghostbusters.2016.iTALiAN.BDRiP.EXTENDED.XviD-HDi.mp4",
"Ghostbusters", 2016)]
[InlineData("Transcendence.mkv", "Transcendence", null)]
[InlineData("Being Human (Forsyth, 1994).mkv", "Being Human", 1994)]
public void Clean_should_return_title_and_year_when_possible(string filename, string title, int? year)
{
var res = MovieTitleCleaner.Clean(filename);
res.Title.ToLowerInvariant().Should().Be.EqualTo(title.ToLowerInvariant());
res.Year.Should().Be.EqualTo(year);
}
}
}
and fisrt version of the code
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;
namespace Grappachu.Movideo.Core.Helpers.TitleCleaner
{
public class MovieTitleCleanerResult
{
public string Title { get; set; }
public int? Year { get; set; }
public string SubTitle { get; set; }
}
public class MovieTitleCleaner
{
private const string SpecialMarker = "§=§";
private static readonly string[] ReservedWords;
private static readonly string[] SpaceChars;
private static readonly string[] Languages;
static MovieTitleCleaner()
{
ReservedWords = new[]
{
SpecialMarker, "hevc", "bdrip", "Bluray", "x264", "h264", "AC3", "DTS", "480p", "720p", "1080p"
};
var cultures = CultureInfo.GetCultures(CultureTypes.AllCultures);
var l = cultures.Select(x => x.EnglishName).ToList();
l.AddRange(cultures.Select(x => x.ThreeLetterISOLanguageName));
Languages = l.Distinct().ToArray();
SpaceChars = new[] {".", "_", " "};
}
public static MovieTitleCleanerResult Clean(string filename)
{
var temp = Path.GetFileNameWithoutExtension(filename);
int? maybeYear = null;
// Remove what's inside brackets trying to keep year info.
temp = RemoveBrackets(temp, '{', '}', ref maybeYear);
temp = RemoveBrackets(temp, '[', ']', ref maybeYear);
temp = RemoveBrackets(temp, '(', ')', ref maybeYear);
// Removes special markers (codec, formats, ecc...)
var tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
var title = string.Empty;
for (var i = 0; i < tokens.Length; i++)
{
var tok = tokens[i];
if (ReservedWords.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase)))
{
if (title.Length > 0)
break;
}
else
{
title = string.Join(" ", title, tok).Trim();
}
}
temp = title;
// Remove languages infos when are found before special markers (should not remove "English" if it's inside the title)
tokens = temp.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
for (var i = tokens.Length - 1; i >= 0; i--)
{
var tok = tokens[i];
if (Languages.Any(x => string.Equals(x, tok, StringComparison.OrdinalIgnoreCase)))
tokens[i] = string.Empty;
else
break;
}
title = string.Join(" ", tokens).Trim();
// If year is not found inside parenthesis try to catch at the end, just after the title
if (!maybeYear.HasValue)
{
var resplit = title.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
var last = resplit.Last();
if (LooksLikeYear(last))
{
maybeYear = int.Parse(last);
title = title.Replace(last, string.Empty).Trim();
}
}
// TODO: review this. when there's one dash separates main title from subtitle
var res = new MovieTitleCleanerResult();
res.Year = maybeYear;
if (title.Count(x => x == '-') == 1)
{
var sp = title.Split('-');
res.Title = sp[0];
res.SubTitle = sp[1];
}
else
{
res.Title = title;
}
return res;
}
private static string RemoveBrackets(string inputString, char openChar, char closeChar, ref int? maybeYear)
{
var str = inputString;
while (str.IndexOf(openChar) > 0 && str.IndexOf(closeChar) > 0)
{
var dataGraph = str.GetBetween(openChar.ToString(), closeChar.ToString());
if (LooksLikeYear(dataGraph))
{
maybeYear = int.Parse(dataGraph);
}
else
{
var parts = dataGraph.Split(SpaceChars, StringSplitOptions.RemoveEmptyEntries);
foreach (var part in parts)
if (LooksLikeYear(part))
{
maybeYear = int.Parse(part);
break;
}
}
str = str.ReplaceBetween(openChar, closeChar, string.Format(" {0} ", SpecialMarker));
}
return str;
}
private static bool LooksLikeYear(string dataRound)
{
return Regex.IsMatch(dataRound, "^(19|20)[0-9][0-9]");
}
}
public static class StringUtils
{
public static string GetBetween(this string src, string a, string b,
StringComparison comparison = StringComparison.Ordinal)
{
var idxStr = src.IndexOf(a, comparison);
var idxEnd = src.IndexOf(b, comparison);
if (idxStr >= 0 && idxEnd > 0)
{
if (idxStr > idxEnd)
Swap(ref idxStr, ref idxEnd);
return src.Substring(idxStr + a.Length, idxEnd - idxStr - a.Length);
}
return src;
}
private static void Swap<T>(ref T idxStr, ref T idxEnd)
{
var temp = idxEnd;
idxEnd = idxStr;
idxStr = temp;
}
public static string ReplaceBetween(this string s, char begin, char end, string replacement = null)
{
var regex = new Regex(string.Format("\\{0}.*?\\{1}", begin, end));
return regex.Replace(s, replacement ?? string.Empty);
}
}
}
This does the trick:
#"(\[[^\]]*\])|(\([^\)]*\))"
It removes anything from "[" to the next "]" and anything from "(" to the next ")".
Can you just use:
string MovieTitle="Star Trek (2009) [Unknown]";
movieTitleToFetch= MovieTitle.IndexOf('(')>MovieTitle.IndexOf('[')?
MovieTitle.Substring(0,MovieTitle.IndexOf('[')):
MovieTitle.Substring(0,MovieTitle.IndexOf('('));
Cant we use this instead:-
if(movieTitleToFetch.Contains("("))
movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("("));
Above code will surely return you the perfect movie titles for these strings:-
EuroTrip (2004) [SD]
Event Horizon (1997) [720]
Fast & Furious (2009) [1080p]
Star Trek (2009) [Unknown]
if there occurs a case where you will not have year but only type i.e :-
EuroTrip [SD]
Event Horizon [720]
Fast & Furious [1080p]
Star Trek [Unknown]
then use this
if(movieTitleToFetch.Contains("("))
movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("("));
else if(movieTitleToFetch.Contains("["))
movieTitleToFetch=movieTitleToFetch.Substring(0,movieTitleToFetch.IndexOf("["));
I came up with .+\s(?<year>\(\d{4}\))\s(?<format>\[\w+\]) which matches any of your examples, and contains the year and format as named capture groups to help you replace them.
This pattern translates as:
Any character, one or more repitions
Whitespace
Literal '(' followed by 4 digits followed by literal ')' (year)
Whitespace
Literal '[' followed by alphanumeric, one or more repitions, followed by literal ']' (format)

Please critique my class

I've taken a few school classes along time ago on and to be honest i never really understood the concept of classes. I recently "got back on the horse" and have been trying to find some real world application for creating a class.
you may have seen that I'm trying to parse a lot of family tree data that is in an very old and antiquated format called gedcom
I created a Gedcom Reader class to read in the file , process it and make it available as two lists that contain the data that i found necessary to use
More importantly to me is i created a class to do it so I would very much like to get the experts here to tell me what i did right and what i could have done better ( I wont say wrong because the thing works and that's good enough for me)
Class:
using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
namespace GedcomReader
{
class Gedcom
{
private string GedcomText = "";
public struct INDI
{
public string ID;
public string Name;
public string Sex;
public string BDay;
public bool Dead;
}
public struct FAM
{
public string FamID;
public string Type;
public string IndiID;
}
public List<INDI> Individuals = new List<INDI>();
public List<FAM> Families = new List<FAM>();
public Gedcom(string fileName)
{
using (StreamReader SR = new StreamReader(fileName))
{
GedcomText = SR.ReadToEnd();
}
ReadGedcom();
}
private void ReadGedcom()
{
string[] Nodes = GedcomText.Replace("0 #", "\u0646").Split('\u0646');
foreach (string Node in Nodes)
{
string[] SubNode = Node.Replace("\r\n", "\r").Split('\r');
if (SubNode[0].Contains("INDI"))
{
Individuals.Add(ExtractINDI(SubNode));
}
else if (SubNode[0].Contains("FAM"))
{
Families.Add(ExtractFAM(SubNode));
}
}
}
private FAM ExtractFAM(string[] Node)
{
string sFID = Node[0].Replace("# FAM", "");
string sID = "";
string sType = "";
foreach (string Line in Node)
{
// If node is HUSB
if (Line.Contains("1 HUSB "))
{
sType = "PAR";
sID = Line.Replace("1 HUSB ", "").Replace("#", "").Trim();
}
//If node for Wife
else if (Line.Contains("1 WIFE "))
{
sType = "PAR";
sID = Line.Replace("1 WIFE ", "").Replace("#", "").Trim();
}
//if node for multi children
else if (Line.Contains("1 CHIL "))
{
sType = "CHIL";
sID = Line.Replace("1 CHIL ", "").Replace("#", "");
}
}
FAM Fam = new FAM();
Fam.FamID = sFID;
Fam.Type = sType;
Fam.IndiID = sID;
return Fam;
}
private INDI ExtractINDI(string[] Node)
{
//If a individual is found
INDI I = new INDI();
if (Node[0].Contains("INDI"))
{
//Create new Structure
//Add the ID number and remove extra formating
I.ID = Node[0].Replace("#", "").Replace(" INDI", "").Trim();
//Find the name remove extra formating for last name
I.Name = Node[FindIndexinArray(Node, "NAME")].Replace("1 NAME", "").Replace("/", "").Trim();
//Find Sex and remove extra formating
I.Sex = Node[FindIndexinArray(Node, "SEX")].Replace("1 SEX ", "").Trim();
//Deterine if there is a brithday -1 means no
if (FindIndexinArray(Node, "1 BIRT ") != -1)
{
// add birthday to Struct
I.BDay = Node[FindIndexinArray(Node, "1 BIRT ") + 1].Replace("2 DATE ", "").Trim();
}
// deterimin if there is a death tag will return -1 if not found
if (FindIndexinArray(Node, "1 DEAT ") != -1)
{
//convert Y or N to true or false ( defaults to False so no need to change unless Y is found.
if (Node[FindIndexinArray(Node, "1 DEAT ")].Replace("1 DEAT ", "").Trim() == "Y")
{
//set death
I.Dead = true;
}
}
}
return I;
}
private int FindIndexinArray(string[] Arr, string search)
{
int Val = -1;
for (int i = 0; i < Arr.Length; i++)
{
if (Arr[i].Contains(search))
{
Val = i;
}
}
return Val;
}
}
}
Implementation:
using System;
using System.Windows.Forms;
using GedcomReader;
namespace WindowsFormsApplication1
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void button1_Click(object sender, EventArgs e)
{
string path = #"C:\mostrecent.ged";
string outpath = #"C:\gedcom.txt";
Gedcom GD = new Gedcom(path);
GraphvizWriter GVW = new GraphvizWriter("Family Tree");
foreach(Gedcom.INDI I in GD.Individuals)
{
string color = "pink";
if (I.Sex == "M")
{
color = "blue";
}
GVW.ListNode(I.ID, I.Name, "filled", color, "circle");
if (I.ID == "ind23800")
{MessageBox.Show("stop");}
//"ind23800" [ label="Sarah Mandley",shape="circle",style="filled",color="pink" ];
}
foreach (Gedcom.FAM F in GD.Families)
{
if (F.Type == "par")
{
GVW.ConnNode(F.FamID, F.IndiID);
}
else if (F.Type =="chil")
{
GVW.ConnNode(F.IndiID, F.FamID);
}
}
string x = GVW.SB.ToString();
GVW.SaveFile(outpath);
MessageBox.Show("done");
}
}
I am particularly interested in if anything could be done about the structures i don't know if how i use them in the implementation is the greatest but again it works
Thanks alot
Quick thoughts:
Nested types should not be visible.
ValueTypes (structs) should be immutable.
Fields (class variables) should not be public. Expose them via properties instead.
Check passed arguments for invalid values, like null.
It might be more readable. It's hard to read and understand.
You may study SOLID principles (http://butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod)
Robert C. Martin gave good presentation on Oredev 2008 about clean code (http://www.oredev.org/topmenu/video/agile/robertcmartincleancodeiiifunctions.4.5a2d30d411ee6ffd2888000779.html)
Some recomended books to read about code readability:
Kent Beck "Implemetation patterns"
Robert C Martin "Clean Code" Robert C
Martin "Agile Principles, Patterns
and Practices in C#"
I suggest you check this place out: http://refactormycode.com/.
For some quick things, your naming is the biggest thing I would start to change.
No need to use ALL-CAPS or abbreviated terms.
Also, FxCop will help with a lot of suggested changes. For example, FindIndexinArray would be named FindIndexInArray.
EDIT:
I don't know if this is a bug in your code or by-design, but in FindIndexinArray, you don't break from your loop once you find a match. Do you want the first (break) or last (no break) match in the array?

Categories