Removing matching characters between two strings - c#

I want to remove the characters which are matching between the two given strings. Eg.
string str1 = "Abbbccd";
string str2 = "Ebbd";
From these two strings I want the output as:
"Abcc", only those many matching characters should be removed from str1,which are present in str2.
I tried the following code:
public string Sub(string str1, string str2)
{
char[] arr1 = str1.ToCharArray();
char[] arr2 = str2.ToCharArray();
char[] arrDifference = arr1.Except(arr2).ToArray();
string final = new string(arrDifference);
return final;
}
With this code I get the output as "Ac". It removes all the matching characters between two arrays and stores 'c' only once.

First create this helper method:
IEnumerable<Tuple<char, int>> IndexDistinct(IEnumerable<char> source)
{
var D = new Dictionary<char, int>();
foreach (var c in source)
{
D[c] = D.ContainsKey(c) ? (D[c] + 1) : 0;
yield return Tuple.Create(c, D[c]);
}
}
It converts a string "aabcccd" to [(a,0),(a,1),(b,0),(c,0),(c,1),(c,2),(d,0)]. The idea is to make every character distinct by adding a counting index on equal characters.
Then modify your proposed function like this:
string Sub(string str1, string str2)
{
return new string(
IndexDistinct(str1)
.Except(IndexDistinct(str2))
.Select(x => x.Item1)
.ToArray());
}
Now that you are doing Except on Tuple<char, int> instead of just char, you should get the behavior you specified.

You can do it with lists as well:
List<char> one = new List<char>("Abbbccd".ToCharArray());
List<char> two = new List<char>("Ebbd".ToCharArray());
foreach (char c in two) {
try { one.RemoveAt(one.IndexOf(c)); } catch { }
}
string result = new string(one.ToArray());

Use C#'s string commands to modify the string.
public string testmethod(string str1, string str2)
{
string result = str1;
foreach (char character in str2.ToCharArray())
{
result = result.Replace(character.ToString(), "");
}
return result;
}

Related

How to output unique symbols except case c# without LINQ

Output unique symbols ignoring case
IDictionary<char, int> charDict = new Dictionary<char, int>();
foreach (var ch in text)
{
if (!charDict.TryGetValue(ch, out n)) {
charDict.Add(new KeyValuePair<char, int>(ch, 1));
} else
{
charDict[ch]++;
}
}
Appellodppsafs => Apelodsf
And Is it possible not to use LINQ?
Use a HashSet<char> to remember existing characters (that's what Distinct() does internally)
Assuming your input and expected result are type string
string input = "Appellodppsafs";
HashSet<char> crs = new HashSet<char>();
string result = string.Concat(input.Where(x => crs.Add(char.ToLower(x)))); //Apelodsf
You can try this (if you do not have long strings or performance issues):
string str = "Appellodppsafs";
string result = string.Concat(str.Select(s => $"{s}")
.Distinct(StringComparer.InvariantCultureIgnoreCase));
Console.WriteLine(result);
Output:
Apelodsf

Replace every other of a certain char in a string

I have searched a lot to find a solution to this, but could not find anything. I do however suspect that it is because I don't know what to search for.
First, I have a string that I convert to an array. The string will be formatted like so:
"99.28099822998047,68.375 118.30699729919434,57.625 126.49999713897705,37.875 113.94499683380127,11.048999786376953 96.00499725341797,8.5"
I create the array with the following code:
public static Array StringToArray(string String)
{
var list = new List<string>();
string[] Coords = String.Split(' ', ',');
foreach (string Coord in Coords)
{
list.Add(Coord);
}
var array = list.ToArray();
return array;
}
Now my problem is; I am trying to find a way to convert it back into a string, with the same formatting. So, I could create a string simply using:
public static String ArrayToString(Array array)
{
string String = string.Join(",", array);
return String;
}
and then hopefully replace every 2nd "," with a space (" "). Is this possible? Or are there a whole other way you would do this?
Thank you in advance! I hope my question makes sense.
There is no built-in way of doing what you need. However, it's pretty trivial to achieve what it is you need e.g.
public static string[] StringToArray(string str)
{
return str.Replace(" ", ",").Split(',');
}
public static string ArrayToString(string[] array)
{
StringBuilder sb = new StringBuilder();
for (int i = 0; i <= array.Length-1; i++)
{
sb.AppendFormat(i % 2 != 0 ? "{0} " : "{0},", array[i]);
}
return sb.ToString();
}
If those are pairs of coordinates, you can start by parsing them like pairs, not like separate numbers:
public static IEnumerable<string[]> ParseCoordinates(string input)
{
return input.Split(' ').Select(vector => vector.Split(','));
}
It is easier then to reconstruct the original string:
public static string PrintCoordinates(IEnumerable<string[]> coords)
{
return String.Join(" ", coords.Select(vector => String.Join(",", vector)));
}
But if you absolutely need to have your data in a flat structure like array, it is then possible to convert it to a more structured format:
public static IEnumerable<string[]> Pairwise(string[] coords)
{
coords.Zip(coords.Skip(1), (coord1, coord2) => new[] { coord1, coord2 });
}
You then can use this method in conjunction with PrintCoordinates to reconstruct your initial string.
Here is a route to do it. I don't think other solutions were removing last comma or space. I also include a test.
public static String ArrayToString(Array array)
{
var useComma = true;
var stringBuilder = new StringBuilder();
foreach (var value in array)
{
if (useComma)
{
stringBuilder.AppendFormat("{0}{1}", value, ",");
}
else
{
stringBuilder.AppendFormat("{0}{1}", value, " ");
}
useComma = !useComma;
}
// Remove last space or comma
stringBuilder.Length = stringBuilder.Length - 1;
return stringBuilder.ToString();
}
[TestMethod]
public void ArrayToStringTest()
{
var expectedStringValue =
"99.28099822998047,68.375 118.30699729919434,57.625 126.49999713897705,37.875 113.94499683380127,11.048999786376953 96.00499725341797,8.5";
var array = new[]
{
"99.28099822998047",
"68.375",
"118.30699729919434",
"57.625",
"126.49999713897705",
"37.875",
"113.94499683380127",
"11.048999786376953",
"96.00499725341797",
"8.5",
};
var actualStringValue = ArrayToString(array);
Assert.AreEqual(expectedStringValue, actualStringValue);
}
Another way of doing it:
string inputString = "1.11,11.3 2.22,12.4 2.55,12.8";
List<string[]> splitted = inputString.Split(' ').Select(a => a.Split(',')).ToList();
string joined = string.Join(" ", splitted.Select(a => string.Join(",",a)).ToArray());
"splitted" list will look like this:
1.11 11.3
2.22 12.4
2.55 12.8
"joined" string is the same as "inputString"
Here's another approach to this problem.
public static string ArrayToString(string[] array)
{
Debug.Assert(array.Length % 2 == 0, "Array is not dividable by two.");
// Group all coordinates as pairs of two.
int index = 0;
var coordinates = from item in array
group item by index++ / 2
into pair
select pair;
// Format each coordinate pair with a comma.
var formattedCoordinates = coordinates.Select(i => string.Join(",", i));
// Now concatinate all the pairs with a space.
return string.Join(" ", formattedCoordinates);
}
And a simple demonstration:
public static void A_Simple_Test()
{
string expected = "1,2 3,4";
string[] array = new string[] { "1", "2", "3", "4" };
Debug.Assert(expected == ArrayToString(array));
}

Javascript slugifier to C#

I am looking at converting the JS slugify function by diegok and he is using this JavaScript construct:
function turkish_map() {
return {
'ş':'s', 'Ş':'S', 'ı':'i', 'İ':'I', 'ç':'c', 'Ç':'C', 'ü':'u', 'Ü':'U',
'ö':'o', 'Ö':'O', 'ğ':'g', 'Ğ':'G'
};
}
It is a map of char to char translations. However, I don't know which JS construct is this and how could it be rewritten in C# preferably without spending too much time on rewriting? (There's more to it, this is just one of the functions).
Should I make an array, dictionary, something else?
Dictionary<char, char> turkish_map() {
return new Dictionary<char, char> {
{'ş','s'}, {'Ş','S'}, {'ı','i'}, {'İ','I'} {'ç','c'} , {'Ç','C' }, {'ü','u'}, {'Ü','U'}, {'ö','o'}, {'Ö','O'}, {'ğ','g'}, {'Ğ','G'} };
}
The use it like:
turkish_map()['İ'] // returns I
Or you can save it into field and use it without creating it every time.
Use these methods to remove diacritics, the result will be sSıIcCuUoOgG.
namespace Test
{
public class Program
{
public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm, Func<char, char> customFolding)
{
foreach (char c in src.Normalize(compatNorm ? NormalizationForm.FormKD : NormalizationForm.FormD))
switch (CharUnicodeInfo.GetUnicodeCategory(c))
{
case UnicodeCategory.NonSpacingMark:
case UnicodeCategory.SpacingCombiningMark:
case UnicodeCategory.EnclosingMark:
//do nothing
break;
default:
yield return customFolding(c);
break;
}
}
public static IEnumerable<char> RemoveDiacriticsEnum(string src, bool compatNorm)
{
return RemoveDiacritics(src, compatNorm, c => c);
}
public static string RemoveDiacritics(string src, bool compatNorm, Func<char, char> customFolding)
{
StringBuilder sb = new StringBuilder();
foreach (char c in RemoveDiacriticsEnum(src, compatNorm, customFolding))
sb.Append(c);
return sb.ToString();
}
public static string RemoveDiacritics(string src, bool compatNorm)
{
return RemoveDiacritics(src, compatNorm, c => c);
}
static void Main(string[] args)
{
var str = "şŞıİçÇüÜöÖğĞ";
Console.Write(RemoveDiacritics(str, false));
// output: sSıIcCuUoOgG
Console.ReadKey();
}
}
}
For other characters like ı which wasn't converted, and others as you mentioned as #, ™ you can use the method to remove diacritics then use a regex to remove invalid characters. If you care enough for some characters you can make a Dictionary<char, char> and use it to replace them each one of them.
Then you can do this:
var input = "Şöme-p#ttern"; // text to convert into a slug
var replaces = new Dictionary<char, char> { { '#', 'a' } }; // list of chars you care
var pattern = #"[^A-Z0-9_-]+"; // regex to remove invalid characters
var result = new StringBuilder(RemoveDiacritics(input, false)); // convert Ş to S
// and so on
foreach (var item in replaces)
{
result = result.Replace(item.Key, item.Value); // replace # with a and so on
}
// remove invalid characters which weren't converted
var slug = Regex.Replace(result.ToString(), pattern, String.Empty,
RegexOptions.IgnoreCase);
// output: Some-pattern

String formatting using C#

Is there a way to remove every special character from a string like:
"\r\n 1802 S St Nw<br>\r\n Washington, DC 20009"
And to just write it like:
"1802 S St Nw, Washington, DC 20009"
To remove special characters:
public static string ClearSpecialChars(this string input)
{
foreach (var ch in new[] { "\r", "\n", "<br>", etc })
{
input = input.Replace(ch, String.Empty);
}
return input;
}
To replace all double space with single space:
public static string ClearDoubleSpaces(this string input)
{
while (input.Contains(" ")) // double
{
input = input.Replace(" ", " "); // with single
}
return input;
}
You also may split both methods into a single one:
public static string Clear(this string input)
{
return input
.ClearSpecialChars()
.ClearDoubleSpaces()
.Trim();
}
two ways, you can use RegEx, or you can use String.Replace(...)
Use the Regex.Replace() method, specifying all of the characters you want to remove as the pattern to match.
You can use the C# Trim() method, look here:
http://msdn.microsoft.com/de-de/library/d4tt83f9%28VS.80%29.aspx
System.Text.RegularExpressions.Regex.Replace("\"\\r\\n 1802 S St Nw<br>\\r\\n Washington, DC 20009\"",
#"(<br>)*?\\r\\n\s+", "");
Maybe something like this, using ASCII int values. Assumes all html tags will be closed.
public static class StringExtensions
{
public static string Clean(this string str)
{
string[] split = str.Split(' ');
List<string> strings = new List<string>();
foreach (string splitStr in split)
{
if (splitStr.Length > 0)
{
StringBuilder sb = new StringBuilder();
bool tagOpened = false;
foreach (char c in splitStr)
{
int iC = (int)c;
if (iC > 32)
{
if (iC == 60)
tagOpened = true;
if (!tagOpened)
sb.Append(c);
if (iC == 62)
tagOpened = false;
}
}
string result = sb.ToString();
if (result.Length > 0)
strings.Add(result);
}
}
return string.Join(" ", strings.ToArray());
}
}

How do I remove all non alphanumeric characters from a string except dash?

How do I remove all non alphanumeric characters from a string except dash and space characters?
Replace [^a-zA-Z0-9 -] with an empty string.
Regex rgx = new Regex("[^a-zA-Z0-9 -]");
str = rgx.Replace(str, "");
I could have used RegEx, they can provide elegant solution but they can cause performane issues. Here is one solution
char[] arr = str.ToCharArray();
arr = Array.FindAll<char>(arr, (c => (char.IsLetterOrDigit(c)
|| char.IsWhiteSpace(c)
|| c == '-')));
str = new string(arr);
When using the compact framework (which doesn't have FindAll)
Replace FindAll with1
char[] arr = str.Where(c => (char.IsLetterOrDigit(c) ||
char.IsWhiteSpace(c) ||
c == '-')).ToArray();
str = new string(arr);
1 Comment by ShawnFeatherly
You can try:
string s1 = Regex.Replace(s, "[^A-Za-z0-9 -]", "");
Where s is your string.
Using System.Linq
string withOutSpecialCharacters = new string(stringWithSpecialCharacters.Where(c =>char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-').ToArray());
The regex is [^\w\s\-]*:
\s is better to use instead of space (), because there might be a tab in the text.
Based on the answer for this question, I created a static class and added these. Thought it might be useful for some people.
public static class RegexConvert
{
public static string ToAlphaNumericOnly(this string input)
{
Regex rgx = new Regex("[^a-zA-Z0-9]");
return rgx.Replace(input, "");
}
public static string ToAlphaOnly(this string input)
{
Regex rgx = new Regex("[^a-zA-Z]");
return rgx.Replace(input, "");
}
public static string ToNumericOnly(this string input)
{
Regex rgx = new Regex("[^0-9]");
return rgx.Replace(input, "");
}
}
Then the methods can be used as:
string example = "asdf1234!##$";
string alphanumeric = example.ToAlphaNumericOnly();
string alpha = example.ToAlphaOnly();
string numeric = example.ToNumericOnly();
Want something quick?
public static class StringExtensions
{
public static string ToAlphaNumeric(this string self,
params char[] allowedCharacters)
{
return new string(Array.FindAll(self.ToCharArray(),
c => char.IsLetterOrDigit(c) ||
allowedCharacters.Contains(c)));
}
}
This will allow you to specify which characters you want to allow as well.
Here is a non-regex heap allocation friendly fast solution which was what I was looking for.
Unsafe edition.
public static unsafe void ToAlphaNumeric(ref string input)
{
fixed (char* p = input)
{
int offset = 0;
for (int i = 0; i < input.Length; i++)
{
if (char.IsLetterOrDigit(p[i]))
{
p[offset] = input[i];
offset++;
}
}
((int*)p)[-1] = offset; // Changes the length of the string
p[offset] = '\0';
}
}
And for those who don't want to use unsafe or don't trust the string length hack.
public static string ToAlphaNumeric(string input)
{
int j = 0;
char[] newCharArr = new char[input.Length];
for (int i = 0; i < input.Length; i++)
{
if (char.IsLetterOrDigit(input[i]))
{
newCharArr[j] = input[i];
j++;
}
}
Array.Resize(ref newCharArr, j);
return new string(newCharArr);
}
I´ve made a different solution, by eliminating the Control characters, which was my original problem.
It is better than putting in a list all the "special but good" chars
char[] arr = str.Where(c => !char.IsControl(c)).ToArray();
str = new string(arr);
it´s simpler, so I think it´s better !
Here's an extension method using #ata answer as inspiration.
"hello-world123, 456".MakeAlphaNumeric(new char[]{'-'});// yields "hello-world123456"
or if you require additional characters other than hyphen...
"hello-world123, 456!?".MakeAlphaNumeric(new char[]{'-','!'});// yields "hello-world123456!"
public static class StringExtensions
{
public static string MakeAlphaNumeric(this string input, params char[] exceptions)
{
var charArray = input.ToCharArray();
var alphaNumeric = Array.FindAll<char>(charArray, (c => char.IsLetterOrDigit(c)|| exceptions?.Contains(c) == true));
return new string(alphaNumeric);
}
}
If you are working in JS, here is a very terse version
myString = myString.replace(/[^A-Za-z0-9 -]/g, "");
I use a variation of one of the answers here. I want to replace spaces with "-" so its SEO friendly and also make lower case. Also not reference system.web from my services layer.
private string MakeUrlString(string input)
{
var array = input.ToCharArray();
array = Array.FindAll<char>(array, c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-');
var newString = new string(array).Replace(" ", "-").ToLower();
return newString;
}
There is a much easier way with Regex.
private string FixString(string str)
{
return string.IsNullOrEmpty(str) ? str : Regex.Replace(str, "[\\D]", "");
}

Categories