I have some strings in a .resx file that include sequences like this:
\u26A0 warning
So I use the following code to unescape it:
str = Regex.Unescape(str);
When I look at the result, everything works (with \u) and it shows the corresponding emoji.
But the Regex.Unescape(...) method does not work when the input string includes \U, like this:
\U0001F4D8 book
and it returns this error:
Error: Unrecognized escape sequence \U
My question:
Is there another method in the .NET Framework to unescape sequences that include both \u and \U?
If there is no built-in method, how can I write a helper method to do it manually?
Edit:
When I read the string from the resx file it has double backslashes; I need to convert these Unicode sequences to their characters.
Indeed, according to source code of Regex.Unescape, RegexParser.ScanCharEscape, \U is not handled.
Instead, you could consider a manual conversion with the help of char.ConvertFromUtf32:
string converted = char.ConvertFromUtf32(int.Parse("0001F4D8", NumberStyles.HexNumber));
This is a draft implementation. (The annoying complexity comes from an attempt to distinguish \U and \\U.)
static string Unescape(string str)
{
StringBuilder builder = new StringBuilder();
int startIndex = 0;
while(true)
{
int index = IndexOfBackslashU(str, startIndex);
if (index == -1)
return builder.Append(Regex.Unescape(str.Substring(startIndex))).ToString();
builder.Append(Regex.Unescape(str.Substring(startIndex, index - startIndex)));
string number = str.Substring(index + 2, 8);
builder.Append(char.ConvertFromUtf32(int.Parse(number, NumberStyles.HexNumber)));
startIndex = index + 10;
}
}
static int IndexOfBackslashU(string str, int startIndex)
{
while (true)
{
int index = str.IndexOf(@"\U", startIndex);
if (index == -1)
return index;
bool evenNumberOfPreviousBackslashes = true;
for (int k = index-1; k >= 0 && str[k] == '\\'; k--)
evenNumberOfPreviousBackslashes = !evenNumberOfPreviousBackslashes;
if (evenNumberOfPreviousBackslashes)
return index;
startIndex = index + 2;
}
}
I wrote this method and it solved the problem:
public static string UnescapeIt(string str)
{
var regex = new Regex(@"(?<!\\)(?:\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8})", RegexOptions.Compiled);
return regex.Replace(str,
m =>
{
if (m.Value.IndexOf("\\U", StringComparison.Ordinal) > -1)
return char.ConvertFromUtf32(int.Parse(m.Value.Replace("\\U", ""), NumberStyles.HexNumber));
return Regex.Unescape(m.Value);
});
}
It unescapes \u sequences and converts \U sequences to the corresponding characters, so we can see the emojis.
Use:
str= UnescapeIt(str);
Update:
I changed the regex from
\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}
to
(?<!\\)(?:\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8})
Now the match fails if there is a backslash before \u or \U.
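As a rough sketch of the same idea, the hex digits can be captured in groups so that neither Regex.Unescape nor string replacement is needed inside the callback. The class name UnicodeEscapeDemo and the method name Unescape are made up for this illustration; the pattern is the one from the update above with two capture groups added.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the .NET Framework.
public static class UnicodeEscapeDemo
{
    // Group 1: four hex digits after \u. Group 2: eight hex digits after \U.
    // The lookbehind skips sequences that are preceded by a backslash.
    private static readonly Regex EscapePattern = new Regex(
        @"(?<!\\)(?:\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8}))");

    public static string Unescape(string text)
    {
        return EscapePattern.Replace(text, m =>
        {
            bool isShort = m.Groups[1].Success;
            string hex = isShort ? m.Groups[1].Value : m.Groups[2].Value;
            int codePoint = int.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);

            // \uXXXX is a single UTF-16 code unit (it may be half of a surrogate pair),
            // so it is cast directly. \UXXXXXXXX is a full code point;
            // char.ConvertFromUtf32 throws if the value is not a valid code point.
            return isShort ? ((char)codePoint).ToString() : char.ConvertFromUtf32(codePoint);
        });
    }
}
```

Calling UnicodeEscapeDemo.Unescape(@"\U0001F4D8 book") should give the book emoji followed by " book", while a sequence such as \\u26A0 (with a doubled backslash) is left untouched.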
So I have a string that I need to split by semicolons:
Email address: "one@tw;,.'o"@hotmail.com;"some;thing"@example.com
Both of the email addresses are valid.
So I want to have a List<string> of the following:
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
But the way I am currently splitting the addresses is not working:
var addresses = emailAddressString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim()).ToList();
Because of the multiple ; characters I end up with invalid email addresses.
I have tried a few different ways, even going down working out if the string contains quotes and then finding the index of the ; characters and working it out that way, but it's a real pain.
Does anyone have any better suggestions?
Assuming that double-quotes are not allowed, except for the opening and closing quotes ahead of the "at" sign @, you can use this regular expression to capture e-mail addresses:
((?:[^@"]+|"[^"]*")@[^;]+)(?:;|$)
The idea is to capture either an unquoted [^@"]+ or a quoted "[^"]*" part prior to @, and then capture everything up to the semicolon ; or the end anchor $.
Demo of the regex.
var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world";
var mm = Regex.Matches(input, "((?:[^@\"]+|\"[^\"]*\")@[^;]+)(?:;|$)");
foreach (Match m in mm) {
Console.WriteLine(m.Groups[1].Value);
}
This code prints
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
Demo 1.
If you would like to allow escaped double-quotes inside double-quotes, you could use a more complex expression:
((?:(?:[^@\"]|(?<=\\)\")+|\"([^\"]|(?<=\\)\")*\")@[^;]+)(?:;|$)
Everything else remains the same.
Demo 2.
I obviously started writing my anti regex method at around the same time as juharr (Another answer). I thought that since I already have it written I would submit it.
public static IEnumerable<string> SplitEmailsByDelimiter(string input, char delimiter)
{
var startIndex = 0;
var delimiterIndex = 0;
while (delimiterIndex >= 0)
{
delimiterIndex = input.IndexOf(delimiter, startIndex);
string substring = input;
if (delimiterIndex > 0)
{
substring = input.Substring(0, delimiterIndex);
}
if (!substring.Contains("\"") || substring.IndexOf("\"") != substring.LastIndexOf("\""))
{
yield return substring;
input = input.Substring(delimiterIndex + 1);
startIndex = 0;
}
else
{
startIndex = delimiterIndex + 1;
}
}
}
Then the following
var input = "blah@blah.com;\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;asdasd@asd.co.uk;";
foreach (var email in SplitEmailsByDelimiter(input, ';'))
{
Console.WriteLine(email);
}
Would give this output
blah@blah.com
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
asdasd@asd.co.uk
You can also do this without using regular expressions. The following extension method will allow you to specify a delimiter character and a character to begin and end escape sequences. Note it does not validate that all escape sequences are closed.
public static IEnumerable<string> SpecialSplit(
this string str, char delimiter, char beginEndEscape)
{
int beginIndex = 0;
int length = 0;
bool escaped = false;
foreach (char c in str)
{
if (c == beginEndEscape)
{
escaped = !escaped;
}
if (!escaped && c == delimiter)
{
yield return str.Substring(beginIndex, length);
beginIndex += length + 1;
length = 0;
continue;
}
length++;
}
yield return str.Substring(beginIndex, length);
}
Then the following
var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;\"D;D@blah;blah.com\"";
foreach (var address in input.SpecialSplit(';', '"'))
Console.WriteLine(address);
Will give this output
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
"D;D@blah;blah.com"
Here's a version that works with an additional single escape character. It assumes that two consecutive escape characters should become one literal escape character. The escape character can escape the beginEndEscape character, so that it does not trigger the beginning or end of an escaped sequence, and it can also escape the delimiter. Anything else that follows the escape character is left as is, with the escape character removed.
public static IEnumerable<string> SpecialSplit(
this string str, char delimiter, char beginEndEscape, char singleEscape)
{
StringBuilder builder = new StringBuilder();
bool escapedSequence = false;
bool previousEscapeChar = false;
foreach (char c in str)
{
if (c == singleEscape && !previousEscapeChar)
{
previousEscapeChar = true;
continue;
}
if (c == beginEndEscape && !previousEscapeChar)
{
escapedSequence = !escapedSequence;
}
if (!escapedSequence && !previousEscapeChar && c == delimiter)
{
yield return builder.ToString();
builder.Clear();
continue;
}
builder.Append(c);
previousEscapeChar = false;
}
yield return builder.ToString();
}
Finally, you should probably add null checking for the string that is passed in. Note that both versions will return a sequence containing one empty string if you pass in an empty string.
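One way the null check could look is sketched below. Because a method containing yield return does not run any of its body until the result is enumerated, the check sits in a non-iterator wrapper so that it throws immediately. The names SplitExtensions, SpecialSplitChecked and SplitIterator are invented for this sketch, and the loop is a simplified restatement of the first SpecialSplit above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper around the first SpecialSplit variant.
public static class SplitExtensions
{
    public static IEnumerable<string> SpecialSplitChecked(
        this string str, char delimiter, char beginEndEscape)
    {
        // Runs at call time, because this method is not an iterator.
        if (str == null) throw new ArgumentNullException(nameof(str));
        return SplitIterator(str, delimiter, beginEndEscape);
    }

    private static IEnumerable<string> SplitIterator(
        string str, char delimiter, char beginEndEscape)
    {
        int beginIndex = 0;
        bool escaped = false;
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (c == beginEndEscape)
            {
                escaped = !escaped; // toggle on every opening or closing quote
            }
            if (!escaped && c == delimiter)
            {
                yield return str.Substring(beginIndex, i - beginIndex);
                beginIndex = i + 1;
            }
        }
        // Last segment; for an empty input this is a single empty string.
        yield return str.Substring(beginIndex);
    }
}
```

Whether an empty input should produce one empty string or an empty sequence is a design choice; this sketch keeps the behaviour described above.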
Is there a better way to replace strings?
I am surprised that Replace does not take in a character array or string array. I guess that I could write my own extension but I was curious if there is a better built in way to do the following? Notice the last Replace is a string not a character.
myString.Replace(';', '\n').Replace(',', '\n').Replace('\r', '\n').Replace('\t', '\n').Replace(' ', '\n').Replace("\n\n", "\n");
You can use a replace regular expression.
s/[;,\t\r ]|[\n]{2}/\n/g
s/ at the beginning means a search
The characters between [ and ] are the characters to search for (in any order)
The second / delimits the search-for text and the replace text
In English, this reads:
"Search for ; or , or \t or \r or (space) or exactly two sequential \n and replace it with \n"
In C#, you could do the following: (after importing System.Text.RegularExpressions)
Regex pattern = new Regex("[;,\t\r ]|[\n]{2}");
myString = pattern.Replace(myString, "\n");
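Note that the pattern above replaces each separator individually, so a run such as ",\r" still produces two newlines after a single pass. If the goal is to collapse any run of separators into one newline, a variant with a + quantifier might be closer to what is wanted. This is a sketch with different semantics from the original Replace chain, and SeparatorNormalizer is a made-up name.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: collapses every run of separators into one newline.
public static class SeparatorNormalizer
{
    // The character class lists the separators; '+' makes a whole run match once.
    private static readonly Regex SeparatorRun = new Regex(@"[;,\t\r \n]+");

    public static string Normalize(string text)
    {
        // Strings are immutable, so the result has to be returned or assigned.
        return SeparatorRun.Replace(text, "\n");
    }
}
```

For example, SeparatorNormalizer.Normalize("this;is,\ra\t\n\n\ntest") yields "this\nis\na\ntest".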
If you are feeling particularly clever and don't want to use Regex:
char[] separators = new char[]{' ',';',',','\r','\t','\n'};
string s = "this;is,\ra\t\n\n\ntest";
string[] temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
s = String.Join("\n", temp);
You could wrap this in an extension method with little effort as well.
Edit: Or just wait 2 minutes and I'll end up writing it anyway :)
public static class ExtensionMethods
{
public static string Replace(this string s, char[] separators, string newVal)
{
string[] temp;
temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
return String.Join( newVal, temp );
}
}
And voila...
char[] separators = new char[]{' ',';',',','\r','\t','\n'};
string s = "this;is,\ra\t\n\n\ntest";
s = s.Replace(separators, "\n");
You could use Linq's Aggregate function:
string s = "the\nquick\tbrown\rdog,jumped;over the lazy fox.";
char[] chars = new char[] { ' ', ';', ',', '\r', '\t', '\n' };
string snew = chars.Aggregate(s, (c1, c2) => c1.Replace(c2, '\n'));
Here's the extension method:
public static string ReplaceAll(this string seed, char[] chars, char replacementCharacter)
{
return chars.Aggregate(seed, (str, cItem) => str.Replace(cItem, replacementCharacter));
}
Extension method usage example:
string snew = s.ReplaceAll(chars, '\n');
This is the shortest way:
myString = Regex.Replace(myString, @"[;,\t\r ]|[\n]{2}", "\n");
Strings are just immutable char arrays.
You just need to make it mutable:
either by using StringBuilder,
or by going into the unsafe world and playing with pointers (dangerous, though),
and try to iterate through the array of characters as few times as possible. Note the HashSet here: it avoids traversing the character sequence inside the loop. Should you need an even faster lookup, you can replace the HashSet with an optimized lookup for char (based on an array[256]).
Example with StringBuilder
public static void MultiReplace(this StringBuilder builder,
char[] toReplace,
char replacement)
{
HashSet<char> set = new HashSet<char>(toReplace);
for (int i = 0; i < builder.Length; ++i)
{
var currentCharacter = builder[i];
if (set.Contains(currentCharacter))
{
builder[i] = replacement;
}
}
}
Edit - Optimized version (only valid for ASCII)
public static void MultiReplace(this StringBuilder builder,
char[] toReplace,
char replacement)
{
var set = new bool[256];
foreach (var charToReplace in toReplace)
{
set[charToReplace] = true;
}
for (int i = 0; i < builder.Length; ++i)
{
var currentCharacter = builder[i];
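// Characters above 255 are outside the table and are never replaced.
if (currentCharacter < set.Length && set[currentCharacter])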
{
builder[i] = replacement;
}
}
}
Then you just use it like this:
var builder = new StringBuilder("my bad,url&slugs");
builder.MultiReplace(new []{' ', '&', ','}, '-');
var result = builder.ToString();
Ohhh, the performance horror!
The answer is a bit outdated, but still...
public static class StringUtils
{
#region Private members
[ThreadStatic]
private static StringBuilder m_ReplaceSB;
private static StringBuilder GetReplaceSB(int capacity)
{
var result = m_ReplaceSB;
if (null == result)
{
result = new StringBuilder(capacity);
m_ReplaceSB = result;
}
else
{
result.Clear();
result.EnsureCapacity(capacity);
}
return result;
}
#endregion
public static string ReplaceAny(this string s, char replaceWith, params char[] chars)
{
if (null == chars)
return s;
if (null == s)
return null;
StringBuilder sb = null;
for (int i = 0, count = s.Length; i < count; i++)
{
var temp = s[i];
var replace = false;
for (int j = 0, cc = chars.Length; j < cc; j++)
if (temp == chars[j])
{
if (null == sb)
{
sb = GetReplaceSB(count);
if (i > 0)
sb.Append(s, 0, i);
}
replace = true;
break;
}
if (replace)
sb.Append(replaceWith);
else
if (null != sb)
sb.Append(temp);
}
return null == sb ? s : sb.ToString();
}
}
You may also simply write these string extension methods, and put them somewhere in your solution:
using System.Text;
public static class StringExtensions
{
public static string ReplaceAll(this string original, string toBeReplaced, string newValue)
{
if (string.IsNullOrEmpty(original) || string.IsNullOrEmpty(toBeReplaced)) return original;
if (newValue == null) newValue = string.Empty;
StringBuilder sb = new StringBuilder();
foreach (char ch in original)
{
if (toBeReplaced.IndexOf(ch) < 0) sb.Append(ch);
else sb.Append(newValue);
}
return sb.ToString();
}
public static string ReplaceAll(this string original, string[] toBeReplaced, string newValue)
{
if (string.IsNullOrEmpty(original) || toBeReplaced == null || toBeReplaced.Length <= 0) return original;
if (newValue == null) newValue = string.Empty;
foreach (string str in toBeReplaced)
if (!string.IsNullOrEmpty(str))
original = original.Replace(str, newValue);
return original;
}
}
Call them like this:
"ABCDE".ReplaceAll("ACE", "xy");
xyBxyDxy
And this:
"ABCDEF".ReplaceAll(new string[] { "AB", "DE", "EF" }, "xy");
xyCxyF
Use Regex.Replace, something like this:
string input = "This is text with far too much " +
"whitespace.";
string pattern = "[;,]";
string replacement = "\n";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
More info is in the MSDN documentation for Regex.Replace.
Performance-wise this is probably not the best solution, but it works.
var str = "filename:with&bad$separators.txt";
char[] charArray = new char[] { '#', '%', '&', '{', '}', '\\', '<', '>', '*', '?', '/', ' ', '$', '!', '\'', '"', ':', '@' };
foreach (var singleChar in charArray)
{
str = str.Replace(singleChar, '_');
}
string ToBeReplaceCharacters = @"~()@#$%&+,'""<>|;\/*?";
string fileName = "filename;with<bad:separators?";
foreach (var RepChar in ToBeReplaceCharacters)
{
fileName = fileName.Replace(RepChar.ToString(), "");
}
A .NET Core version for replacing a defined set of chars in a string with a specific char. It leverages the recently introduced Span type and the string.Create method.
The idea is to prepare a replacement array so that no actual comparison operations are required for each char of the string. The replacement process thus resembles the way a state machine works. To avoid initializing all items of the replacement array, let's store oldChar ^ newChar (XOR'ed) values there, which gives the following benefits:
If a char is not changing: ch ^ ch = 0 - no need to initialize non-changing items
The final char can be found by XOR'ing: ch ^ repl[ch]:
ch ^ 0 = ch - not changed chars case
ch ^ (ch ^ newChar) = newChar - replaced char
So the only requirement is to ensure that the replacement array is zeroed when initialized. We'll use ArrayPool<char> to avoid allocations each time the ReplaceAll method is called. And, to ensure that the arrays are zeroed without an expensive call to the Array.Clear method, we'll maintain a pool dedicated to the ReplaceAll method. We'll clear the replacement array (only the items that were set) before returning it to the pool.
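Before the pooled version, here is a stripped-down sketch of just the XOR lookup, using a freshly allocated (and therefore zeroed) table instead of a pool. XorReplacementDemo is a made-up name, and allocating the table on every call defeats the performance goal; the point is only to show that ch ^ table[ch] yields either ch or newChar.

```csharp
using System;

// Hypothetical illustration of the XOR table, without pooling or Span.
public static class XorReplacementDemo
{
    public static string Replace(string text, char newChar, params char[] oldChars)
    {
        // A new array is zero-initialized, so unlisted chars map to 0.
        var table = new char[char.MaxValue + 1];
        foreach (var oldChar in oldChars)
        {
            table[oldChar] = (char)(oldChar ^ newChar);
        }

        var result = new char[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            char ch = text[i];
            // ch ^ 0 == ch (unchanged); ch ^ (ch ^ newChar) == newChar (replaced).
            result[i] = (char)(ch ^ table[ch]);
        }
        return new string(result);
    }
}
```

For example, XorReplacementDemo.Replace("a,b;c", '-', ',', ';') gives "a-b-c" without a single comparison inside the loop.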
public static class StringExtensions
{
private static readonly ArrayPool<char> _replacementPool = ArrayPool<char>.Create();
public static string ReplaceAll(this string str, char newChar, params char[] oldChars)
{
// If nothing to do, return the original string.
if (string.IsNullOrEmpty(str) ||
oldChars is null ||
oldChars.Length == 0)
{
return str;
}
// If only one character needs to be replaced,
// use the more efficient `string.Replace`.
if (oldChars.Length == 1)
{
return str.Replace(oldChars[0], newChar);
}
// Get a replacement array from the pool.
var replacements = _replacementPool.Rent(char.MaxValue + 1);
try
{
// Initialize the replacement array so that each listed
// element holds `oldChar ^ newChar`.
foreach (var oldCh in oldChars)
{
replacements[oldCh] = (char)(newChar ^ oldCh);
}
// Create a string with replaced characters.
return string.Create(str.Length, (str, replacements), (dst, args) =>
{
var repl = args.replacements;
foreach (var ch in args.str)
{
dst[0] = (char)(repl[ch] ^ ch);
dst = dst.Slice(1);
}
});
}
finally
{
// Clear the replacement array.
foreach (var oldCh in oldChars)
{
replacements[oldCh] = char.MinValue;
}
// Return the replacement array back to the pool.
_replacementPool.Return(replacements);
}
}
}
I know this question is super old, but I want to offer 2 options that are more efficient:
1st off, the extension method posted by Paul Walls is good but can be made more efficient by using the StringBuilder class, which is like the string data type but made especially for situations where you will be changing string values more than once. Here is a version I made of the extension method using StringBuilder:
public static string ReplaceChars(this string s, char[] separators, char newVal)
{
StringBuilder sb = new StringBuilder(s);
foreach (var c in separators) { sb.Replace(c, newVal); }
return sb.ToString();
}
I ran this operation 100,000 times and using StringBuilder took 73ms compared to 81ms using string. So the difference is typically negligible, unless you're running many operations or using a huge string.
Secondly, here is a 1 liner loop you can use:
foreach (char c in separators) { s = s.Replace(c, '\n'); }
I personally think this is the best option. It is highly efficient and doesn't require writing an extension method. In my testing this ran the 100k iterations in only 63ms, making it the most efficient.
Here is an example in context:
string s = "this;is,\ra\t\n\n\ntest";
char[] separators = new char[] { ' ', ';', ',', '\r', '\t', '\n' };
foreach (char c in separators) { s = s.Replace(c, '\n'); }
Credit to Paul Walls for the first 2 lines in this example.
I also fiddled around with that problem, and found that most of the solutions here are very slow. The fastest one was actually the LINQ + Aggregate method that dodgy_coder posted.
But I thought, well, that might also be quite heavy on memory allocations, depending on how many old characters there are. So I came up with this:
The idea here is to keep a cached replacement map of the old characters for the current thread, to save allocations. Other than that, it just works on a character array of the input, which is later returned as a string again, and the character array is modified as little as possible.
[ThreadStatic]
private static bool[] replaceMap;
public static string Replace(this string input, char[] oldChars, char newChar)
{
if (input == null) throw new ArgumentNullException(nameof(input));
if (oldChars == null) throw new ArgumentNullException(nameof(oldChars));
if (oldChars.Length == 1) return input.Replace(oldChars[0], newChar);
if (oldChars.Length == 0) return input;
replaceMap = replaceMap ?? new bool[char.MaxValue + 1];
foreach (var oldChar in oldChars)
{
replaceMap[oldChar] = true;
}
try
{
var count = input.Length;
var output = input.ToCharArray();
for (var i = 0; i < count; i++)
{
if (replaceMap[input[i]])
{
output[i] = newChar;
}
}
return new string(output);
}
finally
{
foreach (var oldChar in oldChars)
{
replaceMap[oldChar] = false;
}
}
}
For me this is at most two allocations for the actual input string to work on. A StringBuilder turned out to be much slower for me for some reason. And it is 2 times faster than the LINQ variant.
No "Replace" (Linq only):
string myString = ";,\r\t \n\n=1;;2,,3\r\r4\t\t5 6\n\n\n\n7=";
char NoRepeat = '\n';
string ByeBye = ";,\r\t ";
string myResult = myString.ToCharArray().Where(t => !"STOP-OUTSIDER".Contains(t))
.Select(t => "" + ( ByeBye.Contains(t) ? '\n' : t))
.Aggregate((all, next) => (
next == "" + NoRepeat && all.Substring(all.Length - 1) == "" + NoRepeat
? all : all + next ) );
Having built my own solution, and having looked at the solutions posted here, I settled on an answer that doesn't use complex code and is generally efficient for most parameters.
Cover base cases where other methods are more appropriate. If there are no chars to replace, return the original string. If there is only one, just use the Replace method.
Use a StringBuilder and initialize its capacity to the length of the original string. After all, the new string being built will have the same length as the original string if it's just chars being replaced. This ensures the builder's buffer is allocated only once while the new string is built.
Whether the 'chars' collection is small or large will impact performance. Large collections are better served by hash sets, while smaller collections are not. This is a near-perfect use case for a HybridDictionary, which switches to a hash-based lookup once the collection gets too large. However, we don't care about the values in the dictionary, so I just set them to "true".
Having different methods for StringBuilder versus just a string prevents unnecessary memory allocation. If it's just a string, don't instantiate a StringBuilder until the base cases have been checked. If it's already a StringBuilder, then perform the replacements and return the StringBuilder itself (as other StringBuilder methods like Append do).
I put the replacement char first and the chars to check at the end. This way, I can leverage the params keyword for easily passing additional chars. However, you don't have to do this if you prefer the other order.
namespace Test.Extensions
{
public static class StringExtensions
{
public static string ReplaceAll(this string str, char replacementCharacter, params char[] chars)
{
if (chars.Length == 0)
return str;
if (chars.Length == 1)
return str.Replace(chars[0], replacementCharacter);
StringBuilder sb = new StringBuilder(str.Length);
var searcher = new HybridDictionary(chars.Length);
for (int i = 0; i < chars.Length; i++)
searcher[chars[i]] = true;
foreach (var c in str)
{
if (searcher.Contains(c))
sb.Append(replacementCharacter);
else
sb.Append(c);
}
return sb.ToString();
}
public static StringBuilder ReplaceAll(this StringBuilder sb, char replacementCharacter, params char[] chars)
{
if (chars.Length == 0)
return sb;
if (chars.Length == 1)
return sb.Replace(chars[0], replacementCharacter);
var searcher = new HybridDictionary(chars.Length);
for (int i = 0; i < chars.Length; i++)
searcher[chars[i]] = true;
for (int i = 0; i < sb.Length; i++)
{
var val = sb[i];
if (searcher.Contains(val))
sb[i] = replacementCharacter;
}
return sb;
}
}
}
I would like to know how to reverse the process of the DecodeBinaryBase64 method below so that I can have a matching Encode method. In short: C# code that, given the output of this method, returns the same string that the method took as input.
private static string DecodeBinaryBase64(string stringToDecode)
{
StringBuilder builder = new StringBuilder();
foreach (var b in Convert.FromBase64String(stringToDecode))
builder.Append(string.Format("{0:X2}", b));
return builder.ToString();
}
Here is an example of an encoded string and its decoded counterpart. The result is a SHA1 hash for a file. The above method is an example of understanding how the decoding works to get to the right string.
ENCODED
/KUGOuoESMWYuDb+BTMK1LaGe7k=
DECODED
FCA5063AEA0448C598B836FE05330AD4B6867BB9
or
0xFCA5063AEA0448C598B836FE05330AD4B6867BB9
Updated to reflect correct SHA1 value thanks to Porges and a fix for hex bug found by Dean 'codeka' Hardin.
Implemented Solution
Here is the implementation I have now; it is from Porges' post, distilled down to two methods.
private static string EncodeFileDigestBase64(string digest)
{
byte[] result = new byte[digest.Length / 2];
for (int i = 0; i < digest.Length; i += 2)
result[i / 2] = byte.Parse(digest.Substring(i, 2), System.Globalization.NumberStyles.HexNumber);
if (result.Length != 20)
throw new ArgumentException("Not a valid SHA1 filedigest.");
return Convert.ToBase64String(result);
}
private static string DecodeFileDigestBase64(string encodedDigest)
{
byte[] base64bytes = Convert.FromBase64String(encodedDigest);
return string.Join(string.Empty, base64bytes.Select(x => x.ToString("X2")));
}
I don't believe it's physically possible. The problem is that string.Format("{0:X}", b) will return either 1 or 2 characters depending on whether the input byte is < 16 or not. And you've got no way to know once the string has been joined together.
If you can modify the DecodeBinaryBase64 method so that it always appends two characters for each byte, i.e. by using string.Format("{0:X2}", b), then it will be possible by just taking the input string two characters at a time.
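To make the ambiguity concrete, here is a small sketch that formats two different byte sequences with "X" and with "X2". HexFormatDemo and ToHex are names invented for this illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper used only to compare the "X" and "X2" format strings.
public static class HexFormatDemo
{
    public static string ToHex(byte[] bytes, string format)
    {
        // Formats every byte with the given format string and joins the pieces.
        return string.Concat(bytes.Select(b => b.ToString(format)));
    }
}
```

With "X", both { 0x01, 0x23 } and { 0x12, 0x03 } come out as "123", so the original bytes cannot be recovered. With "X2" they come out as "0123" and "1203", which can be read back two characters at a time.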
If you made that change to your DecodeBinaryBase64, then you can use the following to convert back again:
private static string DecodeBinaryBase64(string stringToDecode)
{
StringBuilder builder = new StringBuilder();
foreach (var b in Convert.FromBase64String(stringToDecode))
builder.Append(string.Format("{0:X2}", b));
return "0x" + builder.ToString();
}
private static string EncodeBinaryBase64(string stringToEncode)
{
var binary = new List<byte>();
for(int i = 2; i < stringToEncode.Length; i += 2)
{
string s = new string(new [] {stringToEncode[i], stringToEncode[i+1]});
binary.Add(byte.Parse(s, NumberStyles.HexNumber));
}
return Convert.ToBase64String(binary.ToArray());
}
(Error checking and so on is missing, though)
Well, you're going from Base-64 to an ASCII/UTF-8 string - and then outputting each character as a 2-digit hex value.
I don't know of any way to automatically get that back. You may have to pull out two characters at a time, cast those as a "char", and use string.format() to turn those back into characters, maybe?
I've never seen the need to take hex output like that, and turn it back into a real string before. Hope that helps.
So I expanded my answer a bit:
/** Here are the methods in question: **/
string Encode(string input)
{
return SHA1ToBase64String(StringToBytes(input));
}
string Decode(string input)
{
return BytesToString(Base64StringToSHA1(input));
}
/****/
string BytesToString(byte[] bytes)
{
return string.Join("",bytes.Select(x => x.ToString("X2")));
}
byte[] StringToBytes(string input)
{
var result = new byte[input.Length/2];
for (var i = 0; i < input.Length; i+=2)
result[i/2] = byte.Parse(input.Substring(i,2), System.Globalization.NumberStyles.HexNumber);
return result;
}
string SHA1ToBase64String(byte[] hash)
{
if (hash.Length != 20)
throw new Exception("Not an SHA-1 hash.");
return Convert.ToBase64String(hash);
}
byte[] Base64StringToSHA1(string input)
{
return Convert.FromBase64String(input);
}
void Main() {
var encoded = "/KUGOuoESMWYuDb+BTMK1LaGe7k=";
var decoded = Decode(encoded);
var reencoded = Encode(decoded);
Console.WriteLine(encoded == reencoded); //True
Console.WriteLine(decoded);
// FCA5063AEA0448C598B836FE05330AD4B6867BB9
}
I guess the confusion in other comments was over whether you want to provide a left-inverse or a right-inverse.
That is, do you want a function "f" that does:
f(Decode(x)) == x // "left inverse"
or:
Decode(f(x)) == x // "right inverse"
I assumed the latter, because you said (1st comment on other answer) that you wanted to be able to replicate Microsoft's encoding. (And what Dean noted - your function wasn't providing reversible output.) :)
Either way the above reimplements your version for correct output, so both functions are inverses of each other.
I have a C# routine that imports data from a CSV file, matches it against a database and then rewrites it to a file. The source file seems to have a few non-ASCII characters that are fouling up the processing routine.
I already have a static method that I run each input field through but it performs basic checks like removing commas and quotes. Does anybody know how I could add functionality that removes non-ASCII characters too?
Here is a simple solution:
public static bool IsASCII(this string value)
{
// ASCII encoding replaces non-ascii with question marks, so we use UTF8 to see if multi-byte sequences are there
return Encoding.UTF8.GetByteCount(value) == value.Length;
}
source: http://snipplr.com/view/35806/
string sOut = Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
Do it all at once
public string ReturnCleanASCII(string s)
{
StringBuilder sb = new StringBuilder(s.Length);
foreach(char c in s)
{
if((int)c > 127) // you probably don't want 127 either
continue;
if((int)c < 32) // I bet you don't want control characters
continue;
if(c == ',')
continue;
if(c == '"')
continue;
sb.Append(c);
}
return sb.ToString();
}
If you wanted to test a specific character, you could use
if ((int)myChar <= 127)
Just getting the ASCII encoding of the string will not tell you that a specific character was non-ASCII to begin with (if you care about that). See MSDN.
Here's an improvement upon the accepted answer:
string fallbackStr = "";
Encoding enc = Encoding.GetEncoding(Encoding.ASCII.CodePage,
new EncoderReplacementFallback(fallbackStr),
new DecoderReplacementFallback(fallbackStr));
string cleanStr = enc.GetString(enc.GetBytes(inputStr));
This method will replace unknown characters with the value of fallbackStr, or if fallbackStr is empty, leave them out entirely. (Note that enc can be defined outside the scope of a function.)
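The difference between the plain ASCII round trip and the version with an empty fallback can be sketched side by side. AsciiCleanDemo and its method names are made up for this example; the encoding calls are the same ones used above.

```csharp
using System;
using System.Text;

// Hypothetical helper comparing the two approaches shown above.
public static class AsciiCleanDemo
{
    // Default ASCII encoding: characters it cannot encode become '?'.
    public static string ReplaceWithQuestionMarks(string s)
    {
        return Encoding.ASCII.GetString(Encoding.ASCII.GetBytes(s));
    }

    // ASCII encoding with an empty replacement fallback:
    // characters it cannot encode are dropped.
    public static string Strip(string s)
    {
        Encoding enc = Encoding.GetEncoding(
            Encoding.ASCII.CodePage,
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));
        return enc.GetString(enc.GetBytes(s));
    }
}
```

For the input "caf\u00E9", the first method returns "caf?" and the second returns "caf".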
It sounds kind of strange that it's accepted to drop the non-ASCII.
Also I always recommend the excellent FileHelpers library for parsing CSV-files.
strText = Regex.Replace(strText, @"[^\u0020-\u007E]", string.Empty);
public string RunCharacterCheckASCII(string s)
{
string str = s;
bool is_find = false;
char ch;
int ich = 0;
try
{
char[] schar = str.ToCharArray();
for (int i = 0; i < schar.Length; i++)
{
ch = schar[i];
ich = (int)ch;
if (ich > 127) // not ascii or extended ascii
{
is_find = true;
schar[i] = '?';
}
}
if (is_find)
str = new string(schar);
}
catch (Exception ex)
{
}
return str;
}