How to remove spaces to make a combination of strings [closed] - c#

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 6 years ago.
Improve this question
I've been trying to figure out the best approach to combining words in a string to make combinations of that string. I'm trying to do this for a class project. If the string is "The quick fox", I need to find a way to output "Thequick fox", "the quickfox", and "thequickfox". I've tried using string.split and gluing them back together, but haven't had a lot of luck. The issues is the string input could be of any size.

I decided to try this for fun. The idea here is to split the bigger problems into smaller subproblems. So I first started with strings that had 0 and 1 space. I see that with 0 space, the only possible combinations is the string items. With 1 space, I can either have that space or not.
Then I just have to recursively divide the problem until I get one of the base cases. So to that do that I Skip elements in the split array in increments of 2. That way I am guaranteed to get one of the base cases eventually. Once I do that, I run it through the program again and figure out how to add all the results of that to my current set of combinations.
Here's the code:
class Program
{
static void Main(string[] args)
{
string test1 = "fox";
string test2 = "The quick";
string test3 = "The quick fox";
string test4 = "The quick fox says";
string test5 = "The quick fox says hello";
var splittest1 = test1.Split(' ');
var splittest2 = test2.Split(' ');
var splittest3 = test3.Split(' ');
var splittest4 = test4.Split(' ');
var splittest5 = test5.Split(' ');
var ans1 = getcombinations(splittest1);
var ans2 = getcombinations(splittest2);
var ans3 = getcombinations(splittest3);
var ans4 = getcombinations(splittest4);
var ans5 = getcombinations(splittest5);
}
static List<string> getcombinations(string[] splittest)
{
var combos = new List<string>();
var numspaces = splittest.Count() - 1;
if (numspaces == 1)
{
var addcombos = AddTwoStrings(splittest[0], splittest[1]);
var withSpacesCurrent = addcombos.Item1;
var noSpacesCurrent = addcombos.Item2;
combos.Add(withSpacesCurrent);
combos.Add(noSpacesCurrent);
}
else if (numspaces == 0)
{
combos.Add(splittest[0]);
}
else
{
var addcombos = AddTwoStrings(splittest[0], splittest[1]);
var withSpacesCurrent = addcombos.Item1;
var noSpacesCurrent = addcombos.Item2;
var futureCombos = getcombinations(splittest.Skip(2).ToArray());
foreach (var futureCombo in futureCombos)
{
var addFutureCombos = AddTwoStrings(withSpacesCurrent, futureCombo);
var addFutureCombosNoSpaces = AddTwoStrings(noSpacesCurrent, futureCombo);
var combo1 = addFutureCombos.Item1;
var combo2 = addFutureCombos.Item2;
var combo3 = addFutureCombosNoSpaces.Item1;
var combo4 = addFutureCombosNoSpaces.Item2;
combos.Add(combo1);
combos.Add(combo2);
combos.Add(combo3);
combos.Add(combo4);
}
}
return combos;
}
static Tuple<string, string> AddTwoStrings(string a, string b)
{
return Tuple.Create(a + " " + b, a + b);
}
}
}

This is how I got it working, not sure if it is the best algorithm.
public class Program
{
public static void Main(string[] args)
{
Console.WriteLine("Enter a string");
string input = Console.ReadLine();
//split the input string into an array
string[] arrInput = input.Split(' ');
Console.WriteLine("The combinations are...");
//output the original string
Console.WriteLine(input);
//this loop decide letter combination
for (int i = 2; i <= arrInput.Length; i++)
{
//this loop decide how many outputs we would get for a letter combination
//for ex. we would get 2 outputs in a 3 word string if we combine 2 words
for (int j = i-1; j < arrInput.Length; j++)
{
int end = j; // end index
int start = (end - i) + 1; //start index
string output = Combine(arrInput, start, end);
Console.WriteLine(output);
}
}
Console.ReadKey();
}
//combine array into a string with space except from start to end
public static string Combine(string[] arrInput, int start, int end) {
StringBuilder builder = new StringBuilder();
bool combine = false;
for (int i = 0; i < arrInput.Length; i++) {
//first word in the array... don't worry
if (i == 0) {
builder.Append(arrInput[i]);
continue;
}
//don't append " " if combine is true
combine = (i > start && i <= end) ? true : false;
if (!combine)
{
builder.Append(" ");
}
builder.Append(arrInput[i]);
}
return builder.ToString();
}
}

Related

Is there a way to remove a character from a string in this program?

I am tasked with reading the first character in the sentence and count how many times that character occurs. I must then move on to the next character that has not been counted yet and
static void Main(string[] args)
{
char[] array = fullword.ToCharArray();
foreach (char ch in array)
{
Console.WriteLine(GetHowManyTimeOccurenceCharInString(fullword, ch));
}
}
public static int GetHowManyTimeOccurenceCharInString(string text, char c)
{
int count = 0;
foreach (char ch in text)
{
if (ch.Equals(c))
{
count++;
}
}
return count;
}
}
Ok here it is. Just a little program to show you how to do that. First convert your string into a Char arrar and add that to a List. Then you can remove chars from the list using the RemoveAt(n) method. Finally you change your list into a string again and voila there are "JackandJill" again.
using System;
using System.Collections.Generic;
using System.Text;
namespace RemoveChars
{
internal class Program
{
static void Main(string[] args)
{
string name = "Jack and Jill";
var list = new List<char>();
list.AddRange(name.ToCharArray());
list.RemoveAt(4);
list.RemoveAt(7);
var sb = new StringBuilder();
list.ForEach(x => sb.Append(x));
Console.WriteLine(sb.ToString());
Console.ReadKey();
}
}
}
I believe you are trying to get the number of character occurrences in a string.
This will give you a dictionary of character and count.
public Dictionary<char, int> GetNoOfOccurenceCharInString(string stringToCheck)
{
//Track the character and count in a Dictionary.
Dictionary<char, int> map = new Dictionary<char, int>();
//For String "Jack and Jill" it will store something like this in the memory
// j =2, a = 2, c = 1, k = 1, " " = 2, n = 1, d = 1, i = 1, l = 2
//Shift to Lower case and Loop through each character
foreach (var c in stringToCheck.ToLower())
{
if (c == ' ') //If the current char is a space do not count;
{
continue;
}
//Check if map already has the char, if yes use the count stored in the memory, else initialize to 0
var count = map.ContainsKey(c) ? map[c] : 0;
map[c] = count + 1; //Increment count by 1
}
return map;
}
To test this:
string name1 = "Jack";
string matches = "matches";
string name2 = "Jill";
string fullword = name1 + matches + name2;
//Call our method
var result = GetNoOfOccurenceCharInString(fullword);
//loop through each key in the Dictionary
foreach (var key in result.Keys)
{
//Print out the Key and the Count
Console.WriteLine($"{key}: {result[key]}");
}
The best way to do what you need is to create a counting HashMap. I will put this example made in java.
public static void main(String[] args) {
String name1 = "jack";
String matches = "matches";
String name2 = "jill";
String fullWord = name1 + matches + name2;
//Map to get the number of repeated characters
HashMap<Character, Integer> counts = new HashMap<>();
//New string without repeated characters
StringBuilder noRepeatingCharacter = new StringBuilder();
// Text string with the number of repeated characters
StringBuilder literalCounts = new StringBuilder();
for (int i = 0; i < fullWord.length(); i++) {
//We get the character to establish if it is repeated or not.
char charAr = fullWord.charAt(i);
//We verify if in our map the character exists
if (counts.containsKey(charAr)) {
// Update counts
counts.replace(charAr, counts.get(charAr) + 1);
} else {
// we add to our new string of characters only the non-repeated ones
noRepeatingCharacter.append(charAr);
// If the character does not exist in our map, we add it
counts.put(charAr, 1);
}
}
// So that we can print the count in order, we assign to our variable literalCounts
for (int i = 0; i < noRepeatingCharacter.length(); i++) {
char charAr = noRepeatingCharacter.charAt(i);
literalCounts.append(counts.get(charAr));
}
// Print counts
System.out.println("Counts: " + literalCounts);
}
Output
Output

Diff comparison word by word and display changes

Could be marked as duplicated, but I haven't found a propper solution yet.
I need to write a function that compares 2 pieces of text word by word, and prints out the text showing added/deleted/changed words. For example:
StringOriginal = "I am Tim and I am 27 years old"
StringEdited = "I am Kim and I am not that old".
Result: I am Tim Kim and I am 27 years not that old.
Most of the diff algorithms I find tend to compare char by char. this works fine, untill you have a 2 different words on the same index, with mutual chars.
"I am Tim" edited to
"I am Kim"
Results into:
I am TKim
instead of
I am Tim Kim.
Any pointers?
Split by space both StringOriginal and StringEdited. Loop thru each word of StringOriginal comparing it to the same word index from Edited. Every unequal word should be put to a temporary variable and concatenate it to the result only when you get equal word again from the loop. Use StringBuilder in creating the result. Hope this helps
Split both strings by space, join the resulting arrays via Union, then back to string like this:
string[] arr1 = str1.Split(' ');
string[] arr2 = str1.Split(' ');
var merged = arr1.Union(arr2).ToArray<string>();
var mergedString = string.Join(' ', merged);
little bit old fashion, but you can try this.
string StringOriginal = "I am Tim and I am 27 years old";
string StringEdited = "I am Kim and I am not that old";
string[] StringOriginalArray = StringOriginal.Split();
string[] StringEditedArray = StringEdited.Split();
string[] newStringArray = new string[StringOriginalArray.Length + StringEditedArray.Length];
int i = 0;
int io = 0;
int ie = 0;
while (i < newStringArray.Length)
{
if (io < StringOriginalArray.Length)
{
newStringArray[i] = StringOriginalArray[io];
io++;
i++;
}
if (ie < StringEditedArray.Length)
{
newStringArray[i] = StringEditedArray[ie];
ie++;
i++;
}
}
string[] finalArray = new string[newStringArray.Length];
int f = 0;
for (int k = 0; k < newStringArray.Length; k=k+2)
{
finalArray[f++] = newStringArray[k];
if (newStringArray[k] != newStringArray[k+1])
{
finalArray[f++] = newStringArray[k+1];
}
}
Console.WriteLine(String.Join(" ", finalArray));
Output:
"I am Tim Kim and I am 27 not years that old"
been looking for an answer myself to this question.
haven't been able to find a good solution.
came up with the following. but it's not perfect.
public static class DiffEngine
{
private static Regex r = new Regex(#"(?<=[\s])", RegexOptions.Compiled);
public static string Process(ref string TextA, ref string TextB)
{
var A = r.Split(TextA);
var B = r.Split(TextB);
var max = Math.Max(A.Count(), B.Count());
var sbDel = new StringBuilder("<del>");
var sbIns = new StringBuilder("<ins>");
var sbOutput = new StringBuilder();
var aCurr = string.Empty;
var bCurr = string.Empty;
var aNext = string.Empty;
var bNext = string.Empty;
for (int i = 0; i < max; i++)
{
aCurr = (i > A.Count() - 1) ? string.Empty : A[i];
bCurr = (i > B.Count() - 1) ? string.Empty : B[i];
aNext = (i > A.Count() - 2) ? string.Empty : A[i + 1];
bNext = (i > B.Count() - 2) ? string.Empty : B[i + 1];
if (aCurr == bCurr)
{
sbOutput.Append(aCurr);
}
else
{
if (aNext != bNext)
{
sbDel.Append(aCurr);
sbIns.Append(bCurr);
}
else
{
sbDel.Append(aCurr);
sbIns.Append(bCurr);
sbOutput
.Append(sbDel.ToString())
.Append("</del>")
.Append(sbIns.ToString())
.Append("</ins>");
sbDel.Clear().Append("<del>");
sbIns.Clear().Append("<ins>");
}
}
}
A = null;
B = null;
sbDel = null;
sbIns = null;
return sbOutput.ToString();
}
}

C# How to generate a new string based on multiple ranged index

Let's say I have a string like this one, left part is a word, right part is a collection of indices (single or range) used to reference furigana (phonetics) for kanjis in my word:
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす"
The pattern in detail:
word,<startIndex>(-<endIndex>):<furigana>
What would be the best way to achieve something like this (with a space in front of the kanji to mark which part is linked to the [furigana]):
子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Edit: (thanks for your comments guys)
Here is what I wrote so far:
static void Main(string[] args)
{
string myString = "ABCDEF,1:test;3:test2";
//Split Kanjis / Indices
string[] tokens = myString.Split(',');
//Extract furigana indices
string[] indices = tokens[1].Split(';');
//Dictionnary to store furigana indices
Dictionary<string, string> furiganaIndices = new Dictionary<string, string>();
//Collect
foreach (string index in indices)
{
string[] splitIndex = index.Split(':');
furiganaIndices.Add(splitIndex[0], splitIndex[1]);
}
//Processing
string result = tokens[0] + ",";
for (int i = 0; i < tokens[0].Length; i++)
{
string currentIndex = i.ToString();
if (furiganaIndices.ContainsKey(currentIndex)) //add [furigana]
{
string currentFurigana = furiganaIndices[currentIndex].ToString();
result = result + " " + tokens[0].ElementAt(i) + string.Format("[{0}]", currentFurigana);
}
else //nothing to add
{
result = result + tokens[0].ElementAt(i);
}
}
File.AppendAllText(#"D:\test.txt", result + Environment.NewLine);
}
Result:
ABCDEF,A B[test]C D[test2]EF
I struggle to find a way to process ranged indices:
string myString = "ABCDEF,1:test;2-3:test2";
Result : ABCDEF,A B[test] CD[test2]EF
I don't have anything against manually manipulating strings per se. But given that you seem to have a regular pattern describing the inputs, it seems to me that a solution that uses regex would be more maintainable and readable. So with that in mind, here's an example program that takes that approach:
class Program
{
private const string _kinvalidFormatException = "Invalid format for edit specification";
private static readonly Regex
regex1 = new Regex(#"(?<word>[^,]+),(?<edit>(?:\d+)(?:-(?:\d+))?:(?:[^;]+);?)+", RegexOptions.Compiled),
regex2 = new Regex(#"(?<start>\d+)(?:-(?<end>\d+))?:(?<furigana>[^;]+);?", RegexOptions.Compiled);
static void Main(string[] args)
{
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
string result = EditString(myString);
}
private static string EditString(string myString)
{
Match editsMatch = regex1.Match(myString);
if (!editsMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int ichCur = 0;
string input = editsMatch.Groups["word"].Value;
StringBuilder text = new StringBuilder();
foreach (Capture capture in editsMatch.Groups["edit"].Captures)
{
Match oneEditMatch = regex2.Match(capture.Value);
if (!oneEditMatch.Success)
{
throw new ArgumentException(_kinvalidFormatException);
}
int start, end;
if (!int.TryParse(oneEditMatch.Groups["start"].Value, out start))
{
throw new ArgumentException(_kinvalidFormatException);
}
Group endGroup = oneEditMatch.Groups["end"];
if (endGroup.Success)
{
if (!int.TryParse(endGroup.Value, out end))
{
throw new ArgumentException(_kinvalidFormatException);
}
}
else
{
end = start;
}
text.Append(input.Substring(ichCur, start - ichCur));
if (text.Length > 0)
{
text.Append(' ');
}
ichCur = end + 1;
text.Append(input.Substring(start, ichCur - start));
text.Append(string.Format("[{0}]", oneEditMatch.Groups["furigana"]));
}
if (ichCur < input.Length)
{
text.Append(input.Substring(ichCur));
}
return text.ToString();
}
}
Notes:
This implementation assumes that the edit specifications will be listed in order and won't overlap. It makes no attempt to validate that part of the input; depending on where you are getting your input from you may want to add that. If it's valid for the specifications to be listed out of order, you can also extend the above to first store the edits in a list and sort the list by the start index before actually editing the string. (In similar fashion to the way the other proposed answer works; though, why they are using a dictionary instead of a simple list to store the individual edits, I have no idea…that seems arbitrarily complicated to me.)
I included basic input validation, throwing exceptions where failures occur in the pattern matching. A more user-friendly implementation would add more specific information to each exception, describing what part of the input actually was invalid.
The Regex class actually has a Replace() method, which allows for complete customization. The above could have been implemented that way, using Replace() and a MatchEvaluator to provide the replacement text, instead of just appending text to a StringBuilder. Which way to do it is mostly a matter of preference, though the MatchEvaluator might be preferred if you have a need for more flexible implementation options (i.e. if the exact format of the result can vary).
If you do choose to use the other proposed answer, I strongly recommend you use StringBuilder instead of simply concatenating onto the results variable. For short strings it won't matter much, but you should get into the habit of always using StringBuilder when you have a loop that is incrementally adding onto a string value, because for long string the performance implications of using concatenation can be very negative.
This should do it (and even handle ranged indices), based on the formatting of the input string you have-
using System;
using System.Collections.Generic;
public class stringParser
{
private struct IndexElements
{
public int start;
public int end;
public string value;
}
public static void Main()
{
//input string
string myString = "子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす";
int wordIndexSplit = myString.IndexOf(',');
string word = myString.Substring(0,wordIndexSplit);
string indices = myString.Substring(wordIndexSplit + 1);
string[] eachIndex = indices.Split(';');
Dictionary<int,IndexElements> index = new Dictionary<int,IndexElements>();
string[] elements;
IndexElements e;
int dash;
int n = 0;
int last = -1;
string results = "";
foreach (string s in eachIndex)
{
e = new IndexElements();
elements = s.Split(':');
if (elements[0].Contains("-"))
{
dash = elements[0].IndexOf('-');
e.start = int.Parse(elements[0].Substring(0,dash));
e.end = int.Parse(elements[0].Substring(dash + 1));
}
else
{
e.start = int.Parse(elements[0]);
e.end = e.start;
}
e.value = elements[1];
index.Add(n,e);
n++;
}
//this is the part that takes the "setup" from the parts above and forms the result string
//loop through each of the "indices" parsed above
for (int i = 0; i < index.Count; i++)
{
//if this is the first iteration through the loop, and the first "index" does not start
//at position 0, add the beginning characters before its start
if (last == -1 && index[i].start > 0)
{
results += word.Substring(0,index[i].start);
}
//if this is not the first iteration through the loop, and the previous iteration did
//not stop at the position directly before the start of the current iteration, add
//the intermediary chracters
else if (last != -1 && last + 1 != index[i].start)
{
results += word.Substring(last + 1,index[i].start - (last + 1));
}
//add the space before the "index" match, the actual match, and then the formatted "index"
results += " " + word.Substring(index[i].start,(index[i].end - index[i].start) + 1)
+ "[" + index[i].value + "]";
//remember the position of the ending for the next iteration
last = index[i].end;
}
//if the last "index" did not stop at the end of the input string, add the remaining characters
if (index[index.Keys.Count - 1].end + 1 < word.Length)
{
results += word.Substring(index[index.Keys.Count-1].end + 1);
}
//trimming spaces that may be left behind
results = results.Trim();
Console.WriteLine("INPUT - " + myString);
Console.WriteLine("OUTPUT - " + results);
Console.Read();
}
}
input - 子で子にならぬ時鳥,0:こ;2:こ;7-8:ほととぎす
output - 子[こ]で 子[こ]にならぬ 時鳥[ほととぎす]
Note that this should also work with characters the English alphabet if you wanted to use English instead-
input - iliketocodeverymuch,2:A;4-6:B;9-12:CDEFG
output - il i[A]k eto[B]co deve[CDEFG]rymuch

Reverse a String without using Reverse. It works, but why?

Ok, so a friend of mine asked me to help him out with a string reverse method that can be reused without using String.Reverse (it's a homework assignment for him). Now, I did, below is the code. It works. Splendidly actually. Obviously by looking at it you can see the larger the string the longer the time it takes to work. However, my question is WHY does it work? Programming is a lot of trial and error, and I was more pseudocoding than actual coding and it worked lol.
Can someone explain to me how exactly reverse = ch + reverse; is working? I don't understand what is making it go into reverse :/
class Program
{
static void Reverse(string x)
{
string text = x;
string reverse = string.Empty;
foreach (char ch in text)
{
reverse = ch + reverse;
// this shows the building of the new string.
// Console.WriteLine(reverse);
}
Console.WriteLine(reverse);
}
static void Main(string[] args)
{
string comingin;
Console.WriteLine("Write something");
comingin = Console.ReadLine();
Reverse(comingin);
// pause
Console.ReadLine();
}
}
If the string passed through is "hello", the loop will be doing this:
reverse = 'h' + string.Empty
reverse = 'e' + 'h'
reverse = 'l' + 'eh'
until it's equal to
olleh
If your string is My String, then:
Pass 1, reverse = 'M'
Pass 2, reverse = 'yM'
Pass 3, reverse = ' yM'
You're taking each char and saying "that character and tack on what I had before after it".
I think your question has been answered. My reply goes beyond the immediate question and more to the spirit of the exercise. I remember having this task many decades ago in college, when memory and mainframe (yikes!) processing time was at a premium. Our task was to reverse an array or string, which is an array of characters, without creating a 2nd array or string. The spirit of the exercise was to teach one to be mindful of available resources.
In .NET, a string is an immutable object, so I must use a 2nd string. I wrote up 3 more examples to demonstrate different techniques that may be faster than your method, but which shouldn't be used to replace the built-in .NET Replace method. I'm partial to the last one.
// StringBuilder inserting at 0 index
public static string Reverse2(string inputString)
{
var result = new StringBuilder();
foreach (char ch in inputString)
{
result.Insert(0, ch);
}
return result.ToString();
}
// Process inputString backwards and append with StringBuilder
public static string Reverse3(string inputString)
{
var result = new StringBuilder();
for (int i = inputString.Length - 1; i >= 0; i--)
{
result.Append(inputString[i]);
}
return result.ToString();
}
// Convert string to array and swap pertinent items
public static string Reverse4(string inputString)
{
var chars = inputString.ToCharArray();
for (int i = 0; i < (chars.Length/2); i++)
{
var temp = chars[i];
chars[i] = chars[chars.Length - 1 - i];
chars[chars.Length - 1 - i] = temp;
}
return new string(chars);
}
Please imagine that you entrance string is "abc". After that you can see that letters are taken one by one and add to the start of the new string:
reverse = "", ch='a' ==> reverse (ch+reverse) = "a"
reverse= "a", ch='b' ==> reverse (ch+reverse) = b+a = "ba"
reverse= "ba", ch='c' ==> reverse (ch+reverse) = c+ba = "cba"
To test the suggestion by Romoku of using StringBuilder I have produced the following code.
public static void Reverse(string x)
{
string text = x;
string reverse = string.Empty;
foreach (char ch in text)
{
reverse = ch + reverse;
}
Console.WriteLine(reverse);
}
public static void ReverseFast(string x)
{
string text = x;
StringBuilder reverse = new StringBuilder();
for (int i = text.Length - 1; i >= 0; i--)
{
reverse.Append(text[i]);
}
Console.WriteLine(reverse);
}
public static void Main(string[] args)
{
int abcx = 100; // amount of abc's
string abc = "";
for (int i = 0; i < abcx; i++)
abc += "abcdefghijklmnopqrstuvwxyz";
var x = new System.Diagnostics.Stopwatch();
x.Start();
Reverse(abc);
x.Stop();
string ReverseMethod = "Reverse Method: " + x.ElapsedMilliseconds.ToString();
x.Restart();
ReverseFast(abc);
x.Stop();
Console.Clear();
Console.WriteLine("Method | Milliseconds");
Console.WriteLine(ReverseMethod);
Console.WriteLine("ReverseFast Method: " + x.ElapsedMilliseconds.ToString());
System.Console.Read();
}
On my computer these are the speeds I get per amount of alphabet(s).
100 ABC(s)
Reverse ~5-10ms
FastReverse ~5-15ms
1000 ABC(s)
Reverse ~120ms
FastReverse ~20ms
10000 ABC(s)
Reverse ~16,852ms!!!
FastReverse ~262ms
These time results will vary greatly depending on the computer but one thing is for certain if you are processing more than 100k characters you are insane for not using StringBuilder! On the other hand if you are processing less than 2000 characters the overhead from the StringBuilder definitely seems to catch up with its performance boost.

Multiple string replace in c#

I am dynamically editing a regex for matching text in a pdf, which can contain hyphenation at the end of some lines.
Example:
Source string:
"consecuti?vely"
Replace rules:
.Replace("cuti?",#"cuti?(-\s+)?")
.Replace("con",#"con(-\s+)?")
.Replace("consecu",#"consecu(-\s+)?")
Desired output:
"con(-\s+)?secu(-\s+)?ti?(-\s+)?vely"
The replace rules are built dynamically, this is just an example which causes problems.
Whats the best solution to perform such a multiple replace, which will produce the desired output?
So far I thought about using Regex.Replace and zipping the word to replace with optional (-\s+)? between each character, but that would not work, because the word to replace already contains special-meaning characters in regex context.
EDIT: My current code, doesnt work when replace rules overlap like in example above
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, #"\w+\-\s");
for (int i = 0; i < hyphenatedParts.Count; i++)
{
var partBeforeHyphen = String.Concat(hyphenatedParts[i].Value.TakeWhile(c => c != '-'));
regex = regex.Replace(partBeforeHyphen, partBeforeHyphen + #"(-\s+)?");
}
return regex;
}
the output of this program is "con(-\s+)?secu(-\s+)?ti?(-\s+)?vely";
and as I understand your problem, my code can completely solve your problem.
class Program
{
class somefields
{
public string first;
public string secound;
public string Add;
public int index;
public somefields(string F, string S)
{
first = F;
secound = S;
}
}
static void Main(string[] args)
{
//declaring output
string input = "consecuti?vely";
List<somefields> rules=new List<somefields>();
//declaring rules
rules.Add(new somefields("cuti?",#"cuti?(-\s+)?"));
rules.Add(new somefields("con",#"con(-\s+)?"));
rules.Add(new somefields("consecu",#"consecu(-\s+)?"));
// finding the string which must be added to output string and index of that
foreach (var rul in rules)
{
var index=input.IndexOf(rul.first);
if (index != -1)
{
var add = rul.secound.Remove(0,rul.first.Count());
rul.Add = add;
rul.index = index+rul.first.Count();
}
}
// sort rules by index
for (int i = 0; i < rules.Count(); i++)
{
for (int j = i + 1; j < rules.Count(); j++)
{
if (rules[i].index > rules[j].index)
{
somefields temp;
temp = rules[i];
rules[i] = rules[j];
rules[j] = temp;
}
}
}
string output = input.ToString();
int k=0;
foreach(var rul in rules)
{
if (rul.index != -1)
{
output = output.Insert(k + rul.index, rul.Add);
k += rul.Add.Length;
}
}
System.Console.WriteLine(output);
System.Console.ReadLine();
}
}
You should probably write your own parser, it's probably easier to maintain :).
Maybe you could add "special characters" around pattern in order to protect them like "##" if the strings not contains it.
Try this one:
var final = Regex.Replace(originalTextOfThePage, #"(\w+)(?:\-[\s\r\n]*)?", "$1");
I had to give up an easy solution and did the editing of the regex myself. As a side effect, the new approach goes only twice trough the string.
private string ModifyRegexToAcceptHyphensOfCurrentPage(string regex, int searchedPage)
{
var indexesToInsertPossibleHyphenation = GetPossibleHyphenPositions(regex, searchedPage);
var hyphenationToken = #"(-\s+)?";
return InsertStringTokenInAllPositions(regex, indexesToInsertPossibleHyphenation, hyphenationToken);
}
private static string InsertStringTokenInAllPositions(string sourceString, List<int> insertionIndexes, string insertionToken)
{
if (insertionIndexes == null || string.IsNullOrEmpty(insertionToken)) return sourceString;
var sb = new StringBuilder(sourceString.Length + insertionIndexes.Count * insertionToken.Length);
var linkedInsertionPositions = new LinkedList<int>(insertionIndexes.Distinct().OrderBy(x => x));
for (int i = 0; i < sourceString.Length; i++)
{
if (!linkedInsertionPositions.Any())
{
sb.Append(sourceString.Substring(i));
break;
}
if (i == linkedInsertionPositions.First.Value)
{
sb.Append(insertionToken);
}
if (i >= linkedInsertionPositions.First.Value)
{
linkedInsertionPositions.RemoveFirst();
}
sb.Append(sourceString[i]);
}
return sb.ToString();
}
private List<int> GetPossibleHyphenPositions(string regex, int searchedPage)
{
var originalTextOfThePage = mPagesNotModified[searchedPage];
var hyphenatedParts = Regex.Matches(originalTextOfThePage, #"\w+\-\s");
var indexesToInsertPossibleHyphenation = new List<int>();
//....
// Aho-Corasick to find all occurences of all
//strings in "hyphenatedParts" in the "regex" string
// ....
return indexesToInsertPossibleHyphenation;
}

Categories