C# - Converting string data to actual string representation

I have a device that sends out string data representing its settings, but the data is encoded in the string as its Unicode(?) representation. For example, the device sends the string "00530079007300740065006D", which represents the string "System". Are there functions built into C# that will convert the string data sent from the device to the actual string? If not, can someone advise me on how best to perform this conversion?

Totally jumping on @Will Dean's solution bandwagon here (so don't mark me as the answer).
If you're using Will's solution a lot, I suggest wrapping it in a string extension:
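using System;
using System.Globalization;
using System.Linq;
public static class StringExtensions
{
// Treats every 4 hex digits as one UTF-16 code unit, e.g. "0053" -> 'S'
public static string HexToString(this string input)
{
return new String(Enumerable.Range(0, input.Length/4)
.Select(idx => (char) int.Parse(input.Substring(idx*4,4),
NumberStyles.HexNumber)).ToArray());
}
}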

This isn't a built-in function, but it is only one line of code:
string input = "00530079007300740065006D";
String output = new String(Enumerable.Range(0, input.Length/4)
.Select(idx => (char)int.Parse(input.Substring(idx * 4,4),
NumberStyles.HexNumber)).ToArray());
Here's another one, which is perhaps a little more high-minded in not doing the hacky char/int cast:
string out2 =
Encoding.BigEndianUnicode.GetString(Enumerable.Range(0, input.Length/2)
.Select(idx => byte.Parse(input.Substring(idx * 2, 2),
NumberStyles.HexNumber)).ToArray());
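On .NET 5 or later, the same byte-based idea can probably be shortened. The sketch below assumes the device always sends big-endian UTF-16 code units as four hex digits each; it lets Convert.FromHexString do the hex parsing and Encoding.BigEndianUnicode do the decoding. The class and method names are illustrative, not part of any answer above.

```csharp
using System;
using System.Text;

public static class DeviceStringDecoder
{
    // Assumption: each character arrives as 4 hex digits of a big-endian
    // UTF-16 code unit, e.g. "0053" -> 'S'.
    public static string Decode(string hex)
    {
        if (hex.Length % 4 != 0)
            throw new FormatException("Expected a multiple of 4 hex digits.");

        // Convert.FromHexString (available since .NET 5) turns "0053" into { 0x00, 0x53 }.
        byte[] bytes = Convert.FromHexString(hex);

        // Decode the byte pairs as big-endian UTF-16.
        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

With this helper, DeviceStringDecoder.Decode("00530079007300740065006D") returns "System". On older frameworks, the Enumerable.Range/byte.Parse version above does the same job.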

Yes, .NET has a built-in way to convert from one encoding to another, say UTF-8 -> ASCII or some such. The class you'll need is System.Text.Encoding (in the System.Text namespace).
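For byte data that is already in one encoding, Encoding.Convert re-encodes it into another. The sketch below is minimal and the wrapper name is illustrative. Note that the device string in the question is hex text, so it would first have to be parsed into bytes before any Encoding can work on it.

```csharp
using System.Text;

public static class EncodingConversion
{
    // Re-encodes big-endian UTF-16 bytes as UTF-8 bytes.
    // Encoding.Convert decodes with the source encoding and encodes with the
    // destination encoding; it does not emit a byte-order mark.
    public static byte[] BigEndianUtf16ToUtf8(byte[] utf16BeBytes) =>
        Encoding.Convert(Encoding.BigEndianUnicode, Encoding.UTF8, utf16BeBytes);
}
```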

Related

How can I convert a string value to the rial money unit?

I'm a beginner in C# and have this string value:
123456
but I want to convert it to my country's money format, like this:
123,456
Digits should always be split into groups of three with commas. For example, if the string is:
1234567890
show the user this:
1,234,567,890
How can I write code for that?
I would suggest converting it to int (or long) first, then calling ToString() with the required format.
int number = int.Parse(numberString); //ex..
number.ToString("N0"); // 1,000,000
If you're asking about culture-specific formatting, then you could do this.
number.ToString("N0", CultureInfo.CreateSpecificCulture("es-US"));
You can explore more on standard numeric formats
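The suggestion above can be wrapped in a small helper, sketched here with an illustrative name. It parses into long so inputs beyond int's range still work, and it uses CultureInfo.InvariantCulture, whose group separator is always a comma in groups of three. Passing a specific culture instead (for example fa-IR) would apply that culture's separators.

```csharp
using System.Globalization;

public static class MoneyFormatting
{
    // Parses a plain digit string and formats it with thousands separators and no decimals.
    public static string GroupThousands(string digits)
    {
        // NumberStyles.None: digits only, no sign, whitespace or separators allowed.
        long value = long.Parse(digits, NumberStyles.None, CultureInfo.InvariantCulture);

        // "N0" = number format with group separators and zero decimal places.
        return value.ToString("N0", CultureInfo.InvariantCulture);
    }
}
```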
Use the standard formatters and the CultureInfo for the desired country.
e.g.:
int i = int.Parse("123456");
string money = i.ToString("C", CultureInfo.CreateSpecificCulture("fa-IR"));
Or, if the system culture is fa-IR:
string money = i.ToString("C");
Which is the same as
string money = i.ToString("C", CultureInfo.CurrentCulture);
Or if you want to use the UI culture (the culture of the requesting browser)
string money = i.ToString("C", CultureInfo.CurrentUICulture);
Since you want to convert your value to currency, I would suggest using the "C" (currency) standard format string provided by .NET.
123456.125M.ToString("C"); // $123,456.13 with an en-US culture
The currency symbol in front of the number is determined by the culture of your machine.
Alternatively, you can supply your own custom format:
123456.125M.ToString("#,0.################"); // 123,456.125
It is not the cleanest way, but I have not yet found a correct way of formatting this generically.
Side note: for currency handling it is generally considered good practice to use decimal, since it does not suffer from binary floating-point rounding issues.
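The side note about decimal can be illustrated with a tiny example (the class name is illustrative). The values 0.1 and 0.2 are not exactly representable in binary floating point, so their double sum is not equal to 0.3, while decimal stores them exactly in base 10.

```csharp
public static class DecimalVsDouble
{
    // With double, 0.1 + 0.2 evaluates to 0.30000000000000004, not 0.3.
    public static bool DoubleSumIsExact() => 0.1 + 0.2 == 0.3;

    // With decimal, the same sum is exactly 0.3.
    public static bool DecimalSumIsExact() => 0.1m + 0.2m == 0.3m;
}
```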
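Please try this; I hope it helps.
Only the part inside the method matters. Here it is returned from a helper, with groups counted from the right so that lengths that are not a multiple of three also work:
using System.Linq;
public class Program
{
public static string AddSeparators(string data)
{
const int separateOnLength = 3; // digits per group
var separated = new string(
data.Select((x, i) => i > 0 && (data.Length - i) % separateOnLength == 0 ? new[] { ',', x } : new[] { x })
.SelectMany(x => x)
.ToArray()
);
return separated; // "1234567890" -> "1,234,567,890"
}
}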

Using C#6 string interpolation like String.Format [duplicate]

C# 6.0 has string interpolation, a nice feature for formatting strings like:
var name = "John";
WriteLine($"My name is {name}");
The example is converted to
var name = "John";
WriteLine(String.Format("My name is {0}", name));
From the localization point of view, it is much better to store strings like:
"My name is {name} {middlename} {surname}"
than in String.Format notation:
"My name is {0} {1} {2}"
How can string interpolation be used for .NET localization? Will there be a way to put $"..." into resource files? Or should strings be stored like "...{name}" and somehow interpolated on the fly?
P.S. This question is NOT about "how to make string.FormatIt extension" (there are A LOT of such libraries, SO answers, etc.). This question is about something like Roslyn extension for "string interpolation" in "localization" context (both are terms in MS .NET vocabulary), or dynamic usage like Dylan proposed.
An interpolated string evaluates the block between the curly braces as a C# expression (e.g. {expression}, {1 + 1}, {person.FirstName}).
This means that the expressions in an interpolated string must reference names in the current context.
For example, this code will not compile:
var nameFormat = $"My name is {name}"; // Cannot use *name*
// before it is declared
var name = "Fred";
WriteLine(nameFormat);
Similarly:
class Program
{
const string interpolated = $"{firstName}"; // Name *firstName* does not exist
// in the current context
static void Main(string[] args)
{
var firstName = "fred";
Console.WriteLine(interpolated);
Console.ReadKey();
}
}
To answer your question:
There is no current mechanism provided by the framework to evaluate interpolated strings at runtime. Therefore, you cannot store strings and interpolate on the fly out of the box.
There are libraries that handle runtime interpolation of strings.
According to this discussion on the Roslyn codeplex site, string interpolation will likely not be compatible with resource files (emphasis mine):
String interpolation could be neater and easier to debug than either String.Format or concatenation...
Dim y = $"Robot {name} reporting
{coolant.name} levels are {coolant.level}
{reactor.name} levels are {reactor.level}"
However, this example is fishy. Most professional programmers won't be writing user-facing strings in code. Instead they'll be storing those strings in resources (.resw, .resx or .xlf) for reasons of localization. So there doesn't seem much use for string interpolation here.
Assuming that your question is more about how to localise interpolated strings in your source code, and not how to handle interpolated string resources...
Given the example code:
var name = "John";
var middlename = "W";
var surname = "Bloggs";
var text = $"My name is {name} {middlename} {surname}";
Console.WriteLine(text);
The output is obviously:
My name is John W Bloggs
Now change the text assignment to fetch a translation instead:
var text = Translate($"My name is {name} {middlename} {surname}");
Translate is implemented like this:
public static string Translate(FormattableString text)
{
return string.Format(GetTranslation(text.Format),
text.GetArguments());
}
private static string GetTranslation(string text)
{
return text; // actually use gettext or whatever
}
You need to provide your own implementation of GetTranslation; it will receive a string like "My name is {0} {1} {2}" and should use GetText or resources or similar to locate and return a suitable translation for this, or just return the original parameter to skip translation.
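As one possible GetTranslation, here is a sketch with a hypothetical in-memory dictionary standing in for gettext or .resx lookups. The key is the compiler-generated composite format (text.Format). A missing entry falls back to the original format string. The Catalogue field is an assumption for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class Translator
{
    // Hypothetical catalogue: compiler-generated format string -> translated format.
    // In a real application this would be loaded from resources or a gettext catalogue.
    public static readonly Dictionary<string, string> Catalogue = new Dictionary<string, string>();

    public static string Translate(FormattableString text)
    {
        // text.Format is e.g. "My name is {0} {1} {2}"; fall back to it when untranslated.
        string format = Catalogue.TryGetValue(text.Format, out string translated) ? translated : text.Format;
        return string.Format(CultureInfo.CurrentCulture, format, text.GetArguments());
    }
}
```

Registering "{2}. {0} {2}, {1}. Don't wear it out." under the key "My name is {0} {1} {2}" reproduces the "Bloggs. John Bloggs, W." example from this answer.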
You will still need to document for your translators what the parameter numbers mean; the text used in the original code string doesn't exist at runtime.
If, for example, in this case GetTranslation returned "{2}. {0} {2}, {1}. Don't wear it out." (hey, localisation is not just about language!) then the output of the full program would be:
Bloggs. John Bloggs, W. Don't wear it out.
Having said this, while using this style of translation is easy to develop, it's hard to actually translate, since the strings are buried in the code and only surface at runtime. Unless you have a tool that can statically explore your code and extract all the translatable strings (without having to hit that code path at runtime), you're better off using more traditional resx files, since they inherently give you a table of text to be translated.
As already said in previous answers: you currently cannot load the format string at runtime (e.g. from resource files) for string interpolation, because it is processed at compile time.
If you don't care about the compile time feature and just want to have named placeholders, you could use something like this extension method:
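using System.Collections.Generic;
using System.Text.RegularExpressions;
public static class StringFormatExtensions
{
public static string StringFormat(this string input, Dictionary<string, object> elements)
{
int i = 0;
var values = new object[elements.Count];
foreach (var elem in elements)
{
// Rewrite {key}, {key:format} or {key,alignment} to the positional {i...} form.
// Requiring ',' or ':' after the key stops "name" from also matching "{names}".
input = Regex.Replace(input, "{" + Regex.Escape(elem.Key) + "(?<format>[,:][^}]+)?}", "{" + i + "${format}}");
values[i++] = elem.Value;
}
return string.Format(input, values);
}
}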
Be aware that you cannot have inline expressions like {i+1} here, and that this code is not optimized for performance.
You can use this with a dictionary you load from resource files or inline like this:
var txt = "Hello {name} on {day:yyyy-MM-dd}!".StringFormat(new Dictionary<string, object>
{
["name"] = "Joe",
["day"] = DateTime.Now,
});
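A single-pass alternative works under the same assumption (named placeholders only, no expressions). It looks up each name with Regex.Replace and a MatchEvaluator instead of rewriting the string once per dictionary entry. This sketch does not handle escaped braces ({{ and }}) or alignment components, and the class name is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NamedFormat
{
    // Matches {name} or {name:format}; the name is letters, digits or underscores.
    private static readonly Regex Placeholder = new Regex(@"\{(?<name>\w+)(?::(?<format>[^}]+))?\}");

    public static string Format(string template, IReadOnlyDictionary<string, object> values, IFormatProvider provider = null)
    {
        provider ??= CultureInfo.CurrentCulture;
        return Placeholder.Replace(template, m =>
        {
            string name = m.Groups["name"].Value;
            if (!values.TryGetValue(name, out object value))
                throw new KeyNotFoundException($"No value supplied for placeholder '{name}'.");

            // Apply the optional format (e.g. yyyy-MM-dd) when the value supports it.
            string format = m.Groups["format"].Success ? m.Groups["format"].Value : null;
            return value is IFormattable f ? f.ToString(format, provider) : value?.ToString() ?? string.Empty;
        });
    }
}
```

Throwing on a missing key is a design choice here; returning the placeholder unchanged would be an equally reasonable alternative.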
String interpolation is difficult to combine with localization because the compiler prefers to translate it to string.Format(...), which does not support localization. However, there is a trick that makes it possible to combine localization and string interpolation; it is described near the end of this article.
Normally string interpolation is translated to string.Format, whose behavior cannot be customized. However, in much the same way as lambda methods sometimes become expression trees, the compiler will switch from string.Format to FormattableStringFactory.Create (a .NET 4.6 method) if the target method accepts a System.FormattableString object.
The problem is, the compiler prefers to call string.Format if possible, so if there were an overload of Localized() that accepted FormattableString, it would not work with string interpolation because the C# compiler would simply ignore it [because there is an overload that accepts a plain string]. Actually, it's worse than that: the compiler also refuses to use FormattableString when calling an extension method.
It can work if you use a non-extension method. For example:
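static class Loca
{
// Called as Loca.lize($"..."), not in extension-method form, so the compiler
// passes a FormattableString. Localized() comes from the localization library
// described in the article.
public static string lize(this FormattableString message)
{ return message.Format.Localized(message.GetArguments()); }
}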
Then you can use it like this:
public class Program
{
public static void Main(string[] args)
{
Localize.UseResourceManager(Resources.ResourceManager);
var name = "Dave";
Console.WriteLine(Loca.lize($"Hello, {name}"));
}
}
It's important to realize that the compiler converts the $"..." string into an old-fashioned format string. So in this example, Loca.lize actually receives "Hello, {0}" as the format string, not "Hello, {name}".
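The overload-resolution behaviour described in this answer can be checked with a small sketch (the class and method names are illustrative). With both a string and a FormattableString overload, an interpolated string binds to the string one, an explicit cast selects the FormattableString one, and FormattableString.Format holds the positional format string rather than the variable names.

```csharp
using System;

public static class OverloadDemo
{
    // With both overloads available, an interpolated string argument binds here.
    public static string Describe(string s) => "string";

    // Only chosen when the argument is already typed as FormattableString.
    public static string Describe(FormattableString s) => "FormattableString";

    // Exposes the compiler-generated composite format string.
    public static string FormatOf(FormattableString s) => s.Format;
}
```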
Using the Microsoft.CodeAnalysis.CSharp.Scripting package you can achieve this.
You will need to create an object to store the data in; below, a dynamic object is used. You could also create a specific class with all the required properties. The reason for wrapping the dynamic object in a class is described here.
public class DynamicData
{
public dynamic Data { get; } = new ExpandoObject();
}
You can then use it as shown below.
var options = ScriptOptions.Default
.AddReferences(
typeof(Microsoft.CSharp.RuntimeBinder.RuntimeBinderException).GetTypeInfo().Assembly,
typeof(System.Runtime.CompilerServices.DynamicAttribute).GetTypeInfo().Assembly);
var globals = new DynamicData();
globals.Data.Name = "John";
globals.Data.MiddleName = "James";
globals.Data.Surname = "Jamison";
var text = "My name is {Data.Name} {Data.MiddleName} {Data.Surname}";
var result = await CSharpScript.EvaluateAsync<string>($"$\"{text}\"", options, globals);
This compiles the snippet of code and executes it, so it is true C# string interpolation. However, you will have to take performance into account, since it actually compiles and executes your code at runtime. To get around this performance hit, you could use CSharpScript.Create to compile the script once and cache it.
The C# 6.0 string interpolation won't help you if the format string is not in your C# source code. In that case, you will have to use some other solution, like this library.
If we use interpolation then we are thinking in terms of methods, not constants. In that case we could define our translations as methods:
public abstract class InterpolatedText
{
public abstract string GreetingWithName(string firstName, string lastName);
}
public class InterpolatedTextEnglish : InterpolatedText
{
public override string GreetingWithName(string firstName, string lastName) =>
$"Hello, my name is {firstName} {lastName}.";
}
We can then load an implementation of InterpolatedText for a specific culture. This also provides a way to implement fallback, as one implementation can inherit from another. If English is the default language and other implementations inherit from it, there will at least be something to display until a translation is provided.
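Here is a sketch of the fallback idea, with a hypothetical German subclass and a simple factory keyed by two-letter language code (for example CultureInfo.CurrentUICulture.TwoLetterISOLanguageName). Because the German class derives from the English one, any method it does not override falls back to English. All names other than InterpolatedText and InterpolatedTextEnglish are illustrative.

```csharp
public abstract class InterpolatedText
{
    public abstract string GreetingWithName(string firstName, string lastName);
}

public class InterpolatedTextEnglish : InterpolatedText
{
    public override string GreetingWithName(string firstName, string lastName) =>
        $"Hello, my name is {firstName} {lastName}.";
}

// Hypothetical translation; anything not overridden falls back to English.
public class InterpolatedTextGerman : InterpolatedTextEnglish
{
    public override string GreetingWithName(string firstName, string lastName) =>
        $"Hallo, mein Name ist {firstName} {lastName}.";
}

public static class InterpolatedTextFactory
{
    // Picks an implementation by two-letter language code, defaulting to English.
    public static InterpolatedText For(string twoLetterLanguage) =>
        twoLetterLanguage switch
        {
            "de" => new InterpolatedTextGerman(),
            _ => new InterpolatedTextEnglish(),
        };
}
```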
This seems a bit unorthodox, but offers some benefits:
Primarily, the string used for interpolation is always stored in a strongly-typed method with clearly-specified arguments.
Given this: "Hello, my name is {0} {1}" can we determine that the placeholders represent first name and last name in that order? There will always be a method which matches values to placeholders, but there's less room for confusion when the interpolated string is stored with its arguments.
Similarly, if we store our translation strings in one place and use them in another, it becomes possible to modify them in a way that breaks the code using them. We can add {2} to a string which will be used elsewhere, and that code will fail at runtime.
Using string interpolation this is impossible. If our translation string doesn't match the available arguments it won't even compile.
There are drawbacks, although I see difficulty in maintaining any solution.
The greatest is portability. If your translation is coded in C# and you switch, it's not the easiest thing to export all of your translations.
It also means that if you wish to farm out translations to different individuals (unless you have one person who speaks everything) then the translators must modify code. It's easy code, but code nonetheless.
Interpolated strings cannot be refactored out of their (variable) scope, because of the variables embedded in them.
The only way to relocate the string literal part is to pass the scope-bound variables as parameters to another location and mark their positions in the string with special placeholders. However, this solution is already "invented" and out there:
string.Format("literal with placeholders", parameters);
or some more advanced library (runtime interpolation) that uses the very same concept (passing parameters).
Then you can refactor the "literal with placeholders" out to a resource.

C# - How to process the string?

I connect to a web service that gives me a response like this (this is not the whole string, but you get the idea):
sResponse = "{\"Name\":\" Bod\u00f8\",\"homePage\":\"http:\/\/www.example.com\"}";
As you can see, the "Bod\u00f8" is not as it should be.
Therefore I tried to convert the Unicode escape (\u00f8) to a char by doing this with the string:
public string unicodeToChar(string sString)
{
StringBuilder sb = new StringBuilder();
foreach (char chars in sString)
{
if (chars >= 32 && chars <= 255)
{
sb.Append(chars);
}
else
{
// Replacement character
sb.Append((char)chars);
}
}
sString = sb.ToString();
return sString;
}
But it won't work, probably because the string contains \u00f8 as six literal characters (a backslash, a u and four hex digits), not as the single character ø.
It would not be a problem if \u00f8 were the only escape I had to convert, but I have many more of them.
That means I can't just use the Replace function :(
Hope someone can help.
You're basically talking about converting from JSON (JavaScript Object Notation). Try this link; near the bottom you'll see a list of publicly available libraries, including some in C#, that might do what you need.
The excellent Json.NET library has no problems decoding unicode escape sequences:
using System.Diagnostics;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
// "\\u00f8" keeps the JSON escape sequence in the string, as in the real response
var sResponse = "{\"Name\":\"Bod\\u00f8\",\"homePage\":\"http://www.ex.com\"}";
var obj = (JObject)JsonConvert.DeserializeObject(sResponse);
var name = ((JValue)obj["Name"]).Value;
var homePage = ((JValue)obj["homePage"]).Value;
Debug.Assert(Equals(name, "Bodø"));
Debug.Assert(Equals(homePage, "http://www.ex.com"));
This also allows you to deserialize to real POCO objects, making the code even cleaner (although less dynamic).
var obj2 = JsonConvert.DeserializeObject<Response>(sResponse);
Debug.Assert(obj2.Name == "Bodø");
Debug.Assert(obj2.HomePage == "http://www.ex.com");
public class Response
{
public string Name { get; set; }
public string HomePage { get; set; }
}
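On .NET Core 3.0 and later, the built-in System.Text.Json parser also decodes these escapes, which avoids a third-party dependency. Here is a minimal sketch; the helper name is illustrative.

```csharp
using System.Text.Json;

public static class JsonDecode
{
    // Parses the JSON text and returns the unescaped value of a top-level
    // string property; escapes such as \u00f8 and \/ are decoded by the parser.
    public static string GetStringProperty(string json, string propertyName)
    {
        using JsonDocument doc = JsonDocument.Parse(json);

        // GetProperty is case-sensitive and throws KeyNotFoundException if absent.
        return doc.RootElement.GetProperty(propertyName).GetString();
    }
}
```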
Perhaps you want to try:
string character = Encoding.UTF8.GetString(chars);
sb.Append(character);
I know this question is getting quite old, but I ran into this problem today while trying to access the Facebook Graph API. I was getting these strange \u00f8 sequences and other variations back.
First I tried a simple replace, as the OP also did (with the help of an online table). But I thought "no way!" after adding two replacements.
So after looking a little more at the "codes" it suddenly hit me...
The "\u" is a prefix, and the 4 characters after it are a hexadecimal character code! So I wrote a simple regex to find every \u followed by 4 hex digits, then converted those 4 digits to an integer and then to a character.
My source is in VB.NET
Private Function DecodeJsonString(ByVal Input As String) As String
For Each m As System.Text.RegularExpressions.Match In New System.Text.RegularExpressions.Regex("\\u([0-9a-fA-F]{4})").Matches(Input)
' ChrW maps the value directly to a UTF-16 char; Chr is code-page based and limited to 0-255 on single-byte systems
Input = Input.Replace(m.Value, ChrW(CInt("&H" & m.Value.Substring(2))))
Next
Return Input
End Function
I also have a C# version here
private string DecodeJsonString(string Input)
{
// \u followed by exactly four hex digits, e.g. \u00f8
foreach (System.Text.RegularExpressions.Match m in new System.Text.RegularExpressions.Regex(@"\\u([0-9a-fA-F]{4})").Matches(Input))
{
Input = Input.Replace(m.Value, ((char)(System.Int32.Parse(m.Value.Substring(2), System.Globalization.NumberStyles.AllowHexSpecifier))).ToString());
}
return Input;
}
I hope it can help someone out... I hate to add libraries when I really only need a few functions from them!
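The same idea can be written as a single Regex.Replace with a MatchEvaluator, which rewrites each match in one pass instead of calling Replace once per match. This sketch assumes only \uXXXX escapes need decoding. It does not handle other JSON escapes such as \n or \/, nor an escaped backslash before u (\\u), which is why a real JSON parser remains the safer option. The class name is illustrative.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class JsonEscapes
{
    // Matches a backslash, 'u', and exactly four hex digits (e.g. \u00f8).
    private static readonly Regex UnicodeEscape = new Regex(@"\\u([0-9a-fA-F]{4})");

    // Replaces each \uXXXX escape with the UTF-16 code unit it encodes.
    // A surrogate pair (two consecutive escapes) decodes to two code units,
    // which together form the original character.
    public static string DecodeUnicodeEscapes(string input) =>
        UnicodeEscape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
}
```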

Is there a way of making strings file-path safe in c#?

My program will take arbitrary strings from the internet and use them for file names. Is there a simple way to remove the bad characters from these strings or do I need to write a custom function for this?
Ugh, I hate it when people try to guess at which characters are valid. Besides being completely non-portable (always thinking about Mono), both of the earlier comments missed more than 25 invalid characters.
foreach (var c in Path.GetInvalidFileNameChars())
{
fileName = fileName.Replace(c, '-');
}
Or in VB:
'Clean just a filename
Dim filename As String = "salmnas dlajhdla kjha;dmas'lkasn"
For Each c In IO.Path.GetInvalidFileNameChars
filename = filename.Replace(c, "")
Next
'See also IO.Path.GetInvalidPathChars
To strip invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars
var validFilename = new string(filename.Where(ch => !invalidFileNameChars.Contains(ch)).ToArray());
To replace invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and an _ for invalid ones
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? '_' : ch).ToArray());
To replace invalid characters (and avoid potential name conflict like Hell* vs Hell$):
static readonly IList<char> invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and replaces each invalid char with Convert.ToChar(its index + 65), so index 0 -> 'A', index 1 -> 'B', ...; indices of 26 and above fall outside A-Z
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? Convert.ToChar(invalidFileNameChars.IndexOf(ch) + 65) : ch).ToArray());
This question has been asked many times before and, as pointed out many times before, IO.Path.GetInvalidFileNameChars is not adequate.
First, there are many names like PRN and CON that are reserved and not allowed for filenames. There are other names not allowed only at the root folder. Names that end in a period are also not allowed.
Second, there are a variety of length limitations. Read the full list for NTFS here.
Third, you can attach to filesystems that have other limitations. For example, ISO 9660 filenames cannot start with "-" but can contain it.
Fourth, what do you do if two processes "arbitrarily" pick the same name?
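The reserved-name, trailing-period and length rules from the first two points can at least be checked in code. Below is a minimal sketch, assuming Windows rules: the reserved device names CON, PRN, AUX, NUL, COM1-COM9 and LPT1-LPT9 (also when followed by an extension, as in NUL.txt), no trailing period or space, and a 255-character limit per name, which is typical for NTFS. The class and method names are made up for illustration, and the root-folder restrictions and other file systems mentioned above are not covered.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WindowsFileNameRules
{
    // Reserved device names (assumed list: CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9).
    private static readonly HashSet<string> ReservedNames = new HashSet<string>(
        new[] { "CON", "PRN", "AUX", "NUL" }
            .Concat(Enumerable.Range(1, 9).Select(i => "COM" + i))
            .Concat(Enumerable.Range(1, 9).Select(i => "LPT" + i)),
        StringComparer.OrdinalIgnoreCase);

    // Assumed limit for a single name component (typical for NTFS).
    private const int MaxNameLength = 255;

    public static bool IsAllowedWindowsName(string name)
    {
        if (string.IsNullOrEmpty(name) || name.Length > MaxNameLength)
            return false;

        // Characters the current platform reports as invalid.
        if (name.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
            return false;

        // Windows rejects names that end in a period or a space.
        char last = name[name.Length - 1];
        if (last == '.' || last == ' ')
            return false;

        // "NUL" and "NUL.txt" are both reserved, so compare the part before the first dot.
        int dot = name.IndexOf('.');
        string stem = dot >= 0 ? name.Substring(0, dot) : name;
        return !ReservedNames.Contains(stem);
    }
}
```

Passing a check like this is necessary but not sufficient; the concurrency problem in the fourth point still remains.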
In general, using externally-generated names for file names is a bad idea. I suggest generating your own private file names and storing human-readable names internally.
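A minimal sketch of that suggestion, assuming a GUID is an acceptable private name and that keeping the human-readable name in a dictionary is enough for illustration (a real application would persist the mapping, for example in a database). FileNameStore and its methods are hypothetical names. Because the generated names never come from the input, none of the four problems above apply to them, and GUID collisions are practically impossible.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical store: files on disk get generated names, and the
// human-readable name is only kept in memory here.
public sealed class FileNameStore
{
    private readonly string _directory;
    private readonly Dictionary<string, string> _displayNames = new Dictionary<string, string>();

    public FileNameStore(string directory)
    {
        _directory = directory;
    }

    // Returns the full path to use on disk for a file whose user-facing name is displayName.
    public string CreatePath(string displayName, string extension = ".dat")
    {
        // "N" format is 32 hex digits with no dashes: always a valid file name.
        string storedName = Guid.NewGuid().ToString("N") + extension;
        _displayNames[storedName] = displayName;
        return Path.Combine(_directory, storedName);
    }

    // Looks up the human-readable name for a stored file name, or null if unknown.
    public string GetDisplayName(string storedName)
    {
        return _displayNames.TryGetValue(storedName, out var name) ? name : null;
    }
}
```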
I agree with Grauenwolf and would highly recommend Path.GetInvalidFileNameChars().
Here's my C# contribution:
string file = @"38?/.\}[+=n a882 a.a*/|n^%$ ad#(-))";
Array.ForEach(Path.GetInvalidFileNameChars(),
c => file = file.Replace(c.ToString(), String.Empty));
P.S. This is more cryptic than it should be; I was trying to be concise.
Here's my version:
static string GetSafeFileName(string name, char replace = '_') {
char[] invalids = Path.GetInvalidFileNameChars();
return new string(name.Select(c => invalids.Contains(c) ? replace : c).ToArray());
}
I'm not sure how the result of GetInvalidFileNameChars is calculated, but the "Get" suggests it's non-trivial, so I cache the results. Further, this only traverses the input string once instead of multiple times, like the solutions above that iterate over the set of invalid chars, replacing them in the source string one at a time. Also, I like the Where-based solutions, but I prefer to replace invalid chars instead of removing them. Finally, my replacement is exactly one character to avoid converting characters to strings as I iterate over the string.
I say all that without having profiled it; this one just "felt" nice to me. :)
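If it ever does show up in a profile, one variation (not the answer's code) is to compute the invalid set once per process in a static field and use a HashSet<char>, so each lookup is constant time instead of a linear scan of the array. SafeFileNames is a hypothetical class name:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class SafeFileNames
{
    // Computed once per process instead of once per call.
    private static readonly HashSet<char> InvalidChars =
        new HashSet<char>(Path.GetInvalidFileNameChars());

    public static string GetSafeFileName(string name, char replace = '_')
    {
        // Single pass over the input; each HashSet lookup is O(1).
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
            sb.Append(InvalidChars.Contains(c) ? replace : c);
        return sb.ToString();
    }
}
```

The replacement character should of course be one that is valid itself.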
Here's the function that I am using now (thanks jcollum for the C# example):
public static string MakeSafeFilename(string filename, char replaceChar)
{
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
filename = filename.Replace(c, replaceChar);
}
return filename;
}
I just put this in a "Helpers" class for convenience.
If you want to quickly strip out all special characters (the result is sometimes more readable as a file name), this works nicely:
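using System.Text.RegularExpressions;
string myCrazyName = "q`w^e!r#t#y$u%i^o&p*a(s)d_f-g+h=j{k}l|z:x\"c<v>b?n[m]q\\w;e'r,t.y/u";
string safeName = Regex.Replace(
myCrazyName,
@"\W", /*Matches any non-word character. In .NET this is Unicode-aware (accented letters are kept); add RegexOptions.ECMAScript to restrict word characters to [A-Za-z0-9_]*/
"",
RegexOptions.IgnoreCase);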
// safeName == "qwertyuiopasd_fghjklzxcvbnmqwertyu"
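The result in the comment above only holds for ASCII input. In .NET, \W is Unicode-aware, so letters such as é survive; with RegexOptions.ECMAScript, \w is limited to [a-zA-Z_0-9]. A small sketch (hypothetical class name) showing both behaviors:

```csharp
using System.Text.RegularExpressions;

public static class WordCharFilter
{
    // Default .NET semantics: \W excludes Unicode letters and digits, so 'é' is kept.
    public static string StripNonWordUnicode(string s) =>
        Regex.Replace(s, @"\W", "");

    // ECMAScript semantics: \w is [a-zA-Z_0-9], so only ASCII word characters survive.
    public static string StripNonWordAscii(string s) =>
        Regex.Replace(s, @"\W", "", RegexOptions.ECMAScript);
}
```

Which one is right depends on whether non-ASCII file names are acceptable on the target system.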
Here's what I just added to ClipFlair's (http://github.com/Zoomicon/ClipFlair) StringExtensions static class (Utils.Silverlight project), based on info gathered from the links to related stackoverflow questions posted by Dour High Arch above:
public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
{
return Regex.Replace(s,
"[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]",
replacement, //can even use a replacement string of any length
RegexOptions.IgnoreCase);
//not using System.IO.Path.InvalidPathChars (deprecated insecure API)
}
static class Utils
{
public static string MakeFileSystemSafe(this string s)
{
return new string(s.Where(IsFileSystemSafe).ToArray());
}
public static bool IsFileSystemSafe(char c)
{
return !Path.GetInvalidFileNameChars().Contains(c);
}
}
Why not convert the string to a Base64 equivalent like this:
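string UnsafeFileName = "salmnas dlajhdla kjha;dmas'lkasn";
// Standard Base64 output can contain '/' (invalid in file names) and '+', so swap them for '_' and '-'.
// Base64 is case-sensitive while Windows file names usually are not, so distinct inputs can still collide there.
string SafeFileName = Convert.ToBase64String(Encoding.UTF8.GetBytes(UnsafeFileName)).Replace('/', '_').Replace('+', '-');
If you want to convert it back so you can read it:
UnsafeFileName = Encoding.UTF8.GetString(Convert.FromBase64String(SafeFileName.Replace('_', '/').Replace('-', '+')));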
I used this to save PNG files with a unique name from a random description.
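Standard Base64 output can contain '/', which is not allowed in file names, and it is case-sensitive while Windows file systems normally are not, so two different inputs could map to the same file there. Hex encoding is a swapped-in alternative that avoids both problems, since the output is only 0-9 and A-F; the cost is two characters per UTF-8 byte, so long names reach the length limit sooner. A sketch assuming .NET 5 or later (for Convert.ToHexString and Convert.FromHexString), with a hypothetical class name:

```csharp
using System;
using System.Text;

public static class HexFileNames
{
    // Encodes any string as uppercase hex of its UTF-8 bytes (always a valid file name).
    public static string Encode(string name) =>
        Convert.ToHexString(Encoding.UTF8.GetBytes(name));

    // Reverses Encode; FromHexString accepts upper- or lowercase digits.
    public static string Decode(string fileName) =>
        Encoding.UTF8.GetString(Convert.FromHexString(fileName));
}
```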
private void textBoxFileName_KeyPress(object sender, KeyPressEventArgs e)
{
e.Handled = CheckFileNameSafeCharacters(e);
}
/// <summary>
/// Decides whether a key typed into a file-name text box should be blocked because it is not valid in a file name.
/// </summary>
/// <param name="e">The KeyPress event arguments from the text box.</param>
/// <returns>true to block the character; false to let it through.</returns>
internal static bool CheckFileNameSafeCharacters(System.Windows.Forms.KeyPressEventArgs e)
{
// Compare as chars: e.KeyChar.Equals(24) boxes the int and is always false, so these
// shortcuts would otherwise be blocked (GetInvalidFileNameChars includes control chars 1-31 on Windows).
if (e.KeyChar == (char)24 || // Ctrl+X
e.KeyChar == (char)3 ||      // Ctrl+C
e.KeyChar == (char)22 ||     // Ctrl+V
e.KeyChar == (char)26 ||     // Ctrl+Z
e.KeyChar == (char)25)       // Ctrl+Y
return false;
if (e.KeyChar == '\b') // backspace
return false;
// Stop the character from being entered if it is not valid in a file name
return Path.GetInvalidFileNameChars().Contains(e.KeyChar);
}
From my older projects, I've found this solution, which has been working perfectly for over 2 years. It replaces illegal chars with "!" and then collapses any double "!!"; use your own char if you prefer.
public string GetSafeFilename(string filename)
{
string res = string.Join("!", filename.Split(Path.GetInvalidFileNameChars()));
while (res.IndexOf("!!") >= 0)
res = res.Replace("!!", "!");
return res;
}
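The Split/Join plus while-loop approach walks the string several times, and it also collapses a "!!" that was already in the input. A single-pass variation (a sketch, not the answer's code; the class name is hypothetical) that only collapses runs of invalid characters:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class RunCollapsingSanitizer
{
    private static readonly HashSet<char> InvalidChars =
        new HashSet<char>(Path.GetInvalidFileNameChars());

    // Replaces each run of consecutive invalid characters with one replacement char.
    public static string GetSafeFilename(string filename, char replacement = '!')
    {
        var sb = new StringBuilder(filename.Length);
        bool inInvalidRun = false;
        foreach (char c in filename)
        {
            if (InvalidChars.Contains(c))
            {
                // Emit the replacement only at the start of a run.
                if (!inInvalidRun)
                    sb.Append(replacement);
                inInvalidRun = true;
            }
            else
            {
                sb.Append(c);
                inInvalidRun = false;
            }
        }
        return sb.ToString();
    }
}
```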
I find using this to be quick and easy to understand:
<Extension()>
Public Function MakeSafeFileName(FileName As String) As String
Return New String(FileName.Where(Function(x) Not IO.Path.GetInvalidFileNameChars.Contains(x)).ToArray())
End Function
This works because a String is an IEnumerable(Of Char), so Where can filter it character by character, and there is a String constructor that takes a Char array.
Many answers suggest using Path.GetInvalidFileNameChars(), which seems like a bad solution to me. I encourage you to use whitelisting instead of blacklisting, because hackers will eventually find a way to bypass a blacklist.
Here is an example of code you could use:
string whitelist = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.";
foreach (char c in filename)
{
if (!whitelist.Contains(c))
{
filename = filename.Replace(c, '-');
}
}
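The whitelist above includes '.', so "." and ".." (which refer to directories) get through unchanged, and it has no digits. A minimal sketch of the same idea with a few guards, assuming letters, digits, '-', '_' and '.' are acceptable, leading and trailing dots should be trimmed, and "unnamed" is an acceptable fallback (all of these are choices made for illustration; WhitelistFileNames is a hypothetical name):

```csharp
using System.Text;

public static class WhitelistFileNames
{
    // Assumed whitelist; adjust to taste.
    private const string Allowed =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_.";

    public static string Sanitize(string filename, char replacement = '-')
    {
        // Single pass: keep whitelisted chars, replace everything else.
        var sb = new StringBuilder(filename.Length);
        foreach (char c in filename)
            sb.Append(Allowed.IndexOf(c) >= 0 ? c : replacement);

        // Trimming dots rules out "." and ".." and trailing periods.
        string result = sb.ToString().Trim('.');
        return result.Length == 0 ? "unnamed" : result;
    }
}
```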
