I connect to a webservice that gives me a response something like this (this is not the whole string, but you get the idea):
sResponse = "{\"Name\":\" Bod\u00f8\",\"homePage\":\"http:\/\/www.example.com\"}";
As you can see, the "Bod\u00f8" is not as it should be.
Therefore I tried to convert the Unicode escape (\u00f8) to a char by doing this with the string:
public string unicodeToChar(string sString)
{
StringBuilder sb = new StringBuilder();
foreach (char chars in sString)
{
if (chars >= 32 && chars <= 255)
{
sb.Append(chars);
}
else
{
// Replacement character
sb.Append((char)chars);
}
}
sString = sb.ToString();
return sString;
}
But it won't work, probably because the string is shown as \\u00f8, and not \u00f8.
Now it would not be a problem if \u00f8 was the only Unicode escape I had to convert, but I have many more of them.
That means that I can't just use the replace function :(
Hope someone can help.
You're basically talking about converting from JSON (JavaScript Object Notation). Try this link--near the bottom you'll see a list of publicly available libraries, including some in C#, that might do what you need.
The excellent Json.NET library has no problems decoding unicode escape sequences:
var sResponse = "{\"Name\":\"Bod\\u00f8\",\"homePage\":\"http://www.ex.com\"}"; // \\u keeps the escape in the JSON text instead of letting the C# compiler decode it
var obj = (JObject)JsonConvert.DeserializeObject(sResponse);
var name = ((JValue)obj["Name"]).Value;
var homePage = ((JValue)obj["homePage"]).Value;
Debug.Assert(Equals(name, "Bodø"));
Debug.Assert(Equals(homePage, "http://www.ex.com"));
This also allows you to deserialize to real POCO objects, making the code even cleaner (although less dynamic).
var obj2 = JsonConvert.DeserializeObject<Response>(sResponse);
Debug.Assert(obj2.Name == "Bodø");
Debug.Assert(obj2.HomePage == "http://www.ex.com");
public class Response
{
public string Name { get; set; }
public string HomePage { get; set; }
}
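If adding Json.NET is not an option, the same decoding can probably be done with the System.Text.Json parser that ships with newer .NET versions. This is only a sketch under that assumption (it swaps in a different library from the one the answer uses); the class and method names are made up for illustration.

```csharp
using System.Text.Json;

public static class JsonEscapeDemo
{
    // Returns the string value of a top-level property. The JSON parser
    // turns \uXXXX escapes (and \/) into the characters they stand for.
    public static string ReadString(string json, string propertyName)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty(propertyName).GetString();
    }
}
```

Calling JsonEscapeDemo.ReadString(sResponse, "Name") on a response that contains Bod\u00f8 should give back "Bodø".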
Perhaps you want to try:
string character = Encoding.UTF8.GetString(chars);
sb.Append(character);
I know this question is getting quite old, but I crashed into this problem as of today, while trying to access the Facebook Graph API. I was getting these strange \u00f8 and other variations back.
First I tried a simple replace as the OP also said (with the help from an online table). But I thought "no way!" after adding 2 replaces.
So after looking a little more at the "codes" it suddenly hit me...
The "\u" is a prefix, and the 4 characters after it are a hexadecimal-encoded char code! So writing a simple regex to find every \u followed by 4 alphanumeric characters, then converting those 4 characters to an integer and then to a character, did the trick.
My source is in VB.NET
Private Function DecodeJsonString(ByVal Input As String) As String
For Each m As System.Text.RegularExpressions.Match In New System.Text.RegularExpressions.Regex("\\u(\w{4})").Matches(Input)
Input = Input.Replace(m.Value, ChrW(CInt("&H" & m.Value.Substring(2)))) ' ChrW handles code points above 255; Chr does not
Next
Return Input
End Function
I also have a C# version here
private string DecodeJsonString(string Input)
{
foreach (System.Text.RegularExpressions.Match m in new System.Text.RegularExpressions.Regex(@"\\u(\w{4})").Matches(Input))
{
Input = Input.Replace(m.Value, ((char)(System.Int32.Parse(m.Value.Substring(2), System.Globalization.NumberStyles.AllowHexSpecifier))).ToString());
}
return Input;
}
I hope it can help someone out... I hate to add libraries when I really only need a few functions from them!
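A slightly more defensive variant of the same idea is sketched below. It assumes the input only contains plain \uXXXX escapes (it does not handle an escaped backslash such as \\u00f8, or the other JSON escapes), restricts the match to hex digits so that text like "\users" is left alone instead of failing to parse, and does the replacement in a single pass. The names JsonUnescape and DecodeUnicodeEscapes are illustrative.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class JsonUnescape
{
    // \u followed by exactly four hexadecimal digits.
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u([0-9a-fA-F]{4})", RegexOptions.Compiled);

    public static string DecodeUnicodeEscapes(string input)
    {
        // Each match is replaced by the UTF-16 code unit it encodes.
        return UnicodeEscape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture)).ToString());
    }
}
```

Using a match evaluator means each escape is converted where it is found, rather than re-scanning the whole string with Replace once per match.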
I am scraping data off of a website to get helpful data for my coworkers, instead of having to refresh the page frequently.
The C# code pulls data straight from the HTML. But the data is encrypted in a strange way, and returns as a non human-readable string, which is not helpful to us.
For example, in the table, a product number may be shown as "14501219". In the HTML, the inner text of the element containing the data is that same number written as a mix of hexadecimal (&#xNN;) and decimal (&#NN;) character references.
I need to know how to:
Parse hex and decimal into int from the same string
Append those results to the eventual output
So far I have worked out this pseudocode, but I don't know how it would look in C# or what conversion methods to use:
for (int i = 0; i < inputString.Length; i++)
{
if (inputString[i] == '&' && inputString[i+1] == '#')
{
if (inputString[i+2] == 'x')
{
//convert to hex
//append to outputList
}
else
{
//convert to decimal
//append to outputList
}
}
else
{
//convert to string literal
}
}
Any help would be greatly appreciated
After you added the string literal example I knew what you were seeing/asking. So that the HTTP client side does not get tripped up on some special characters, they are encoded using their ASCII representation. Most frameworks have a way of working with the encoded URL. For example, in C# you should always make sure to use HttpUtility.UrlDecode and HttpUtility.UrlEncode internally when reading and writing.
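For text made of &#x..; and &#..; sequences like the ones the question's pseudocode looks for, the HTML-decoding counterpart may be the closer fit, since those are HTML numeric character references and not URL percent-escapes. A minimal sketch, assuming System.Net.WebUtility is available; the wrapper name and the sample input are invented for illustration.

```csharp
using System.Net;

public static class EntityDecoding
{
    // WebUtility.HtmlDecode understands both hexadecimal (&#x31;) and
    // decimal (&#52;) numeric character references, so no manual
    // branching on 'x' is needed.
    public static string DecodeCharacterReferences(string innerText)
    {
        return WebUtility.HtmlDecode(innerText);
    }
}
```

For instance, DecodeCharacterReferences("&#x31;&#52;&#x35;&#48;") should return "1450".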
I am confused by all the different escaping mechanisms for strings in C#. What I want is an escaping/unescaping method that:
1) Can be used on any string
2) escape+unescape is guaranteed to return the initial string
3) Replaces all punctuation with something else. If that is too much to ask, then at least commas, braces, and #. I am fine with spaces not being escaped.
4) Is unlikely to ever change.
Does it exist?
EDIT: This is for purposes of serializing and deserializing app-generated attributes. So my object may or may not have values for Attribute1, Attribute2, Attribute3, etc. Simplifying a bit, the idea is to do something like the below. The goal is to have the encoded collection be brief and more-or-less human-readable.
I am asking what methods would make sense to use for Escape and Unescape.
public abstract class GenericAttribute {
const string key1 = "KEY1"; //It is fine to put some restrictions on the keys, i.e. no punctuation
const string key2 = "KEY2";
public abstract string Encode(); // NO RESTRICTIONS ON WHAT ENCODE MIGHT RETURN
public static GenericAttribute FromKeyValuePair (string key, string value) {
switch (key) {
case key1: return new ConcreteAttribute1(value);
case key2: return new ConcreteAttribute2(value);
// etc.
}
}
}
public class AttributeCollection {
Dictionary <string, GenericAttribute> Content {get;set;}
public string Encode() {
string r = "";
bool first = true;
foreach (KeyValuePair<string, GenericAttribute> pair in this.Content) {
if (first) {
first = false;
} else {
r+=",";
}
r+=(pair.Key + "=" + Escape(pair.Value.Encode()));
}
return r;
}
public AttributeCollection(string encodedCollection) {
// input string is the return value of the Encode method
this.Content = new Dictionary<string, GenericAttribute>();
string[] array = encodedCollection.Split(',');
foreach(string component in array) {
int equalsIndex = component.IndexOf('=');
string key = component.Substring(0, equalsIndex);
string value = component.Substring(equalsIndex+1);
GenericAttribute attribute = GenericAttribute.FromKeyValuePair(key, Unescape(value));
this.Content[key]=attribute;
}
}
}
I'm not entirely sure what you're asking, but I believe your intent is for the escape character to be included in the string as-is.
var content = @"\'Hello";
Console.WriteLine(content);
// Output:
\'Hello
By using the @ (a verbatim string literal), the backslash is kept and becomes a part of your string. That covers the server side with C#; how to account for other languages and escape formats is something only you would know.
You can find some great information on C# escaping here:
MSDN Blog
Try using HttpServerUtility.UrlEncode and HttpServerUtility.UrlDecode. I think that will encode and decode all the things you want.
See the MSDN Docs and here is a description of the mapping on Wikipedia.
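As a sketch of how Escape and Unescape from the question could be filled in without a dependency on System.Web, the pair below uses Uri.EscapeDataString and Uri.UnescapeDataString. This is one possible choice and not the only one; the class name is invented. Commas, braces, '=' and '#' are percent-encoded, while letters, digits and a few unreserved marks (- . _ ~) stay readable. Note that spaces become %20, and older .NET Framework versions are known to leave some punctuation such as ! * ' ( ) unescaped.

```csharp
using System;

public static class AttributeEscaping
{
    // Percent-encodes everything except unreserved characters.
    public static string Escape(string value) => Uri.EscapeDataString(value);

    // Reverses Escape: every %XX sequence is turned back into its character.
    public static string Unescape(string value) => Uri.UnescapeDataString(value);
}
```

With this, AttributeEscaping.Escape("a,b={c}#d") gives "a%2Cb%3D%7Bc%7D%23d", so splitting the encoded collection on ',' and '=' no longer cuts through a value.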
So I have data like this:
((4886.03 12494.89 "LYR3_SIG2"))
It is always going to be SPACE delimited thus I want to use Regex to place each into a property.
Yes, I was playing around with some regex
string q = "4886.03 12494.89 \"LYR3_SIG2";
string clean = Regex.Replace(q, @"[^\w\s]", string.Empty);
but what I aim to do is to put each of the 3 values into a class like this
public class BowTies
{
public double XCoordinate { get; set; }
public double YCoordinate { get; set; }
public string Layer { get; set; }
}
Now I originally was parsing the data into a property
t = streamReader.ReadLine();
if ((t != null) && Regex.IsMatch(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))"))
currentVioType.Bowtie = new ParseType() { Formatted = Regex.Match(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))").Value.Trim('(', ')'), Original = t };
But now I really want to put that data into the two doubles and the string.
This data is space delimited: ((4886.03 12494.89 "LYR3_SIG2"))
I started down the path of refactoring, but for the moment I am not using regex to get the doubles (which are ALWAYS going to be the first 2 values, followed by a string), so I started doing this:
currentAddPla.Bows.Add(new BowTies() { XCoordinate = 44.33, YCoordinate = 344.33, Layer = Regex.Match(t, "(\\(\\()[a-zA-Z_,\\s\".0-9-]{1,}(\"\\)\\))").Value.Trim('(', ')')});
But I obviously need to parse this properly: the first value (a double) should go into XCoordinate and the 2nd value into YCoordinate. The regex currently captures ALL the data, whereas for the 3rd value it needs to capture only "LYR3_SIG2". That should be doable with regex, right?
It is always going to be SPACE delimited thus
RegEx for this sounds like overkill. Have you considered using string.Split(' ');, eg:
string s = "((4886.03 12494.89 \"LYR3_SIG2\"))";
s = s.Replace("(", string.Empty).Replace(")", string.Empty);
string[] arr = s.Split(' ');
currentAddPla.Bows.Add(new BowTies() {
XCoordinate = Convert.ToDouble(arr[0]),
YCoordinate = Convert.ToDouble(arr[1]),
Layer = arr[2].Trim('"')}); // third element is index 2; Trim drops the surrounding quotes
You should just use String.Split instead of RegEx. The data is formatted simply enough that RegEx would be overkill even if it worked well here. On top of that the language which defines your data is not regular ( http://en.wikipedia.org/wiki/Regular_language ) and thus cannot be reliably parsed with RegEx. It may be working right now because the data inside the parens is simply formatted but languages which have matching braces are context-free and in general are not able to be parsed with regular expressions.
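The Split approach could be wrapped up roughly as follows. This is a sketch that assumes every line has exactly the three space-separated fields shown in the question and that the numbers always use '.' as the decimal separator (hence the invariant culture); BowTieParser is an invented name, and BowTies is repeated here only so the block compiles on its own.

```csharp
using System;
using System.Globalization;

public class BowTies
{
    public double XCoordinate { get; set; }
    public double YCoordinate { get; set; }
    public string Layer { get; set; }
}

public static class BowTieParser
{
    public static BowTies Parse(string line)
    {
        // Strip the outer parentheses, then split on spaces.
        string[] parts = line.Trim().Trim('(', ')')
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 3)
            throw new FormatException("Expected two numbers and a layer name.");

        return new BowTies
        {
            // InvariantCulture so "4886.03" parses the same on every machine.
            XCoordinate = double.Parse(parts[0], CultureInfo.InvariantCulture),
            YCoordinate = double.Parse(parts[1], CultureInfo.InvariantCulture),
            Layer = parts[2].Trim('"')
        };
    }
}
```

A layer name that itself contains spaces would break the three-field assumption and would need a different split strategy.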
I am working on a simple windows forms application that the user enters a string with delimiters and I parse the string and only get the variables out of the string.
So for example if the user enters:
2X + 5Y + z^3
I extract the values 2,5 and 3 from the "equation" and simply add them together.
This is how I get the integer values from a string.
int thirdValue;
string temp;
temp = Regex.Match(variables[3], @"\d+").Value;
thirdValue = int.Parse(temp);
variables is just an array of strings I use to store strings after parsing.
However, I get the following error when I run the application:
Input string was not in a correct format
Why is everyone moaning about this question and marking it down? It's incredibly easy to explain what is happening, and the questioner was right to ask it as he did. There is nothing wrong whatsoever.
Regex.Match(variables[3], @"\d+").Value
returns an empty string if the input (here variables[3]) doesn't contain any digits, and passing that empty string to int.Parse is what throws the "Input string was not in a correct format" FormatException. In other words, the .Match failed and the .Value is empty. (If variables[3] itself does not exist, you get an IndexOutOfRangeException instead.)
Quite honestly this is a feature masquerading as a bug if you ask me, but it is by design: a failed match hands back a blank string rather than throwing, and the error only surfaces on the next line. It is for this reason you were advised by astef to not even bother with Regex, because the failure is confusing. But he got marked down too!
The way round it is to use this simple additional method they also made
if (Regex.IsMatch(variables[3], @"\d+")){
temp = Regex.Match(variables[3], @"\d+").Value;
thirdValue = int.Parse(temp); // only parse when a match was found
}
If this still doesn't work for you, you cannot use Regex for this. I have seen a C# service where this doesn't work and throws incorrect errors, so I had to stop using Regex.
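The guard can also be written so the regex runs only once, by checking Match.Success and using int.TryParse in place of int.Parse. A minimal sketch; NumberExtraction and TryGetFirstInt are hypothetical names, and the pattern is narrowed to ASCII digits so that whatever matches is something int.TryParse accepts.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class NumberExtraction
{
    // Returns true and sets value when text contains a run of digits
    // that fits in an int; returns false otherwise, without throwing.
    public static bool TryGetFirstInt(string text, out int value)
    {
        value = 0;
        Match match = Regex.Match(text, "[0-9]+");
        return match.Success &&
               int.TryParse(match.Value, NumberStyles.None,
                            CultureInfo.InvariantCulture, out value);
    }
}
```

A failed match has an empty Value, which is exactly the input int.Parse rejects, so checking Success first avoids the exception.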
I prefer simple and lightweight solutions without Regex:
static class Program
{
static void Main()
{
Console.WriteLine("2X + 65Y + z^3".GetNumbersFromString().Sum());
Console.ReadLine();
}
static IEnumerable<int> GetNumbersFromString(this string input)
{
StringBuilder number = new StringBuilder();
foreach (char ch in input)
{
if (char.IsDigit(ch))
number.Append(ch);
else if (number.Length > 0)
{
yield return int.Parse(number.ToString());
number.Clear();
}
}
if (number.Length > 0) // avoid parsing an empty string when the input ends with a non-digit
yield return int.Parse(number.ToString());
}
}
You can change the string to a char array, check whether each char is a digit, and add the digits up.
string temp = textBox1.Text;
char[] arra = temp.ToCharArray();
int total = 0;
foreach (char t in arra)
{
if (char.IsDigit(t))
{
total += int.Parse(t + "");
}
}
textBox1.Text = total.ToString();
This should solve your problem:
string temp;
temp = Regex.Matches(textBox1.Text, @"\d+", RegexOptions.IgnoreCase)[2].Value;
int thirdValue = int.Parse(temp);
My program will take arbitrary strings from the internet and use them for file names. Is there a simple way to remove the bad characters from these strings or do I need to write a custom function for this?
Ugh, I hate it when people try to guess at which characters are valid. Besides being completely non-portable (always thinking about Mono), both of the earlier comments missed more than 25 invalid characters.
foreach (var c in Path.GetInvalidFileNameChars())
{
fileName = fileName.Replace(c, '-');
}
Or in VB:
'Clean just a filename
Dim filename As String = "salmnas dlajhdla kjha;dmas'lkasn"
For Each c In IO.Path.GetInvalidFileNameChars
filename = filename.Replace(c, "")
Next
'See also IO.Path.GetInvalidPathChars
To strip invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars
var validFilename = new string(filename.Where(ch => !invalidFileNameChars.Contains(ch)).ToArray());
To replace invalid characters:
static readonly char[] invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and an _ for invalid ones
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? '_' : ch).ToArray());
To replace invalid characters (and avoid potential name conflict like Hell* vs Hell$):
static readonly IList<char> invalidFileNameChars = Path.GetInvalidFileNameChars();
// Builds a string out of valid chars and replaces invalid chars with a unique letter (Moves the Char into the letter range of unicode, starting at "A")
var validFilename = new string(filename.Select(ch => invalidFileNameChars.Contains(ch) ? Convert.ToChar(invalidFileNameChars.IndexOf(ch) + 65) : ch).ToArray());
This question has been asked many times before and, as pointed out many times before, IO.Path.GetInvalidFileNameChars is not adequate.
First, there are many names like PRN and CON that are reserved and not allowed for filenames. There are other names not allowed only at the root folder. Names that end in a period are also not allowed.
Second, there are a variety of length limitations. Read the full list for NTFS here.
Third, you can attach to filesystems that have other limitations. For example, ISO 9660 filenames cannot start with "-" but can contain it.
Fourth, what do you do if two processes "arbitrarily" pick the same name?
In general, using externally-generated names for file names is a bad idea. I suggest generating your own private file names and storing human-readable names internally.
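To illustrate the first point, a check along these lines could reject the reserved device names and trailing periods that GetInvalidFileNameChars does not cover. It is a sketch only: it assumes the Windows device names CON, PRN, AUX, NUL, COM1-COM9 and LPT1-LPT9, treats everything before the first dot as the base name, and does nothing about length limits, other filesystems, or name collisions. WindowsFileNameRules and LooksSafe are made-up names.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WindowsFileNameRules
{
    private static readonly string[] ReservedDeviceNames =
    {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"
    };

    public static bool LooksSafe(string fileName)
    {
        if (string.IsNullOrWhiteSpace(fileName)) return false;
        if (fileName.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0) return false;

        // Names ending in a period or a space are not allowed on Windows.
        if (fileName.EndsWith(".", StringComparison.Ordinal) ||
            fileName.EndsWith(" ", StringComparison.Ordinal)) return false;

        // "CON.txt" is as problematic as "CON", so compare the part before the first dot.
        int dot = fileName.IndexOf('.');
        string baseName = dot < 0 ? fileName : fileName.Substring(0, dot);
        return !ReservedDeviceNames.Contains(baseName.Trim(), StringComparer.OrdinalIgnoreCase);
    }
}
```

Even with such a check in place, generating private file names and storing the human-readable name separately remains the more robust design.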
I agree with Grauenwolf and would highly recommend the Path.GetInvalidFileNameChars()
Here's my C# contribution:
string file = @"38?/.\}[+=n a882 a.a*/|n^%$ ad#(-))";
Array.ForEach(Path.GetInvalidFileNameChars(),
c => file = file.Replace(c.ToString(), String.Empty));
p.s. -- this is more cryptic than it should be -- I was trying to be concise.
Here's my version:
static string GetSafeFileName(string name, char replace = '_') {
char[] invalids = Path.GetInvalidFileNameChars();
return new string(name.Select(c => invalids.Contains(c) ? replace : c).ToArray());
}
I'm not sure how the result of GetInvalidFileNameChars is calculated, but the "Get" suggests it's non-trivial, so I cache the results. Further, this only traverses the input string once instead of multiple times, like the solutions above that iterate over the set of invalid chars, replacing them in the source string one at a time. Also, I like the Where-based solutions, but I prefer to replace invalid chars instead of removing them. Finally, my replacement is exactly one character to avoid converting characters to strings as I iterate over the string.
I say all that w/o doing the profiling -- this one just "felt" nice to me. : )
Here's the function that I am using now (thanks jcollum for the C# example):
public static string MakeSafeFilename(string filename, char replaceChar)
{
foreach (char c in System.IO.Path.GetInvalidFileNameChars())
{
filename = filename.Replace(c, replaceChar);
}
return filename;
}
I just put this in a "Helpers" class for convenience.
If you want to quickly strip out all special characters which is sometimes more user readable for file names this works nicely:
string myCrazyName = "q`w^e!r#t#y$u%i^o&p*a(s)d_f-g+h=j{k}l|z:x\"c<v>b?n[m]q\\w;e'r,t.y/u";
string safeName = Regex.Replace(
myCrazyName,
@"\W", /*Matches any nonword character. For ASCII input this is equivalent to '[^A-Za-z0-9_]'*/
"",
RegexOptions.IgnoreCase);
// safeName == "qwertyuiopasd_fghjklzxcvbnmqwertyu"
Here's what I just added to ClipFlair's (http://github.com/Zoomicon/ClipFlair) StringExtensions static class (Utils.Silverlight project), based on info gathered from the links to related stackoverflow questions posted by Dour High Arch above:
public static string ReplaceInvalidFileNameChars(this string s, string replacement = "")
{
return Regex.Replace(s,
"[" + Regex.Escape(new String(System.IO.Path.GetInvalidFileNameChars())) + "]",
replacement, //can even use a replacement string of any length
RegexOptions.IgnoreCase);
//not using System.IO.Path.InvalidPathChars (deprecated insecure API)
}
static class Utils
{
public static string MakeFileSystemSafe(this string s)
{
return new string(s.Where(IsFileSystemSafe).ToArray());
}
public static bool IsFileSystemSafe(char c)
{
return !Path.GetInvalidFileNameChars().Contains(c);
}
}
Why not convert the string to a Base64 equivalent like this:
string UnsafeFileName = "salmnas dlajhdla kjha;dmas'lkasn";
string SafeFileName = Convert.ToBase64String(Encoding.UTF8.GetBytes(UnsafeFileName));
If you want to convert it back so you can read it:
UnsafeFileName = Encoding.UTF8.GetString(Convert.FromBase64String(SafeFileName));
I used this to save PNG files with a unique name from a random description.
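One caveat with plain Base64 is that its alphabet includes '/', which is not valid in a file name, so some inputs would still produce an unusable name. A possible workaround, sketched here with invented names, is to swap '+' and '/' for '-' and '_' (the substitution used by the URL-safe Base64 variant) and swap them back before decoding.

```csharp
using System;
using System.Text;

public static class Base64FileName
{
    public static string Encode(string name)
    {
        string base64 = Convert.ToBase64String(Encoding.UTF8.GetBytes(name));
        // '/' is a path separator and '+' is awkward in URLs; replace both.
        return base64.Replace('+', '-').Replace('/', '_');
    }

    public static string Decode(string safeName)
    {
        // Undo the substitution before handing the text to the Base64 decoder.
        string base64 = safeName.Replace('-', '+').Replace('_', '/');
        return Encoding.UTF8.GetString(Convert.FromBase64String(base64));
    }
}
```

For example, "???" encodes to "Pz8/" in standard Base64 but to "Pz8_" here. Base64 is case-sensitive, so on a case-insensitive filesystem two different inputs could still map to names that collide.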
private void textBoxFileName_KeyPress(object sender, KeyPressEventArgs e)
{
e.Handled = CheckFileNameSafeCharacters(e);
}
/// <summary>
/// This is a good function for making sure that a user who is naming a file uses proper characters
/// </summary>
/// <param name="e"></param>
/// <returns></returns>
internal static bool CheckFileNameSafeCharacters(System.Windows.Forms.KeyPressEventArgs e)
{
if (e.KeyChar == (char)24 || // compare as char; Equals(24) boxes an int and never matches
e.KeyChar == (char)3 ||
e.KeyChar == (char)22 ||
e.KeyChar == (char)26 ||
e.KeyChar == (char)25)//Control-X, C, V, Z and Y
return false;
if (e.KeyChar.Equals('\b'))//backspace
return false;
char[] charArray = Path.GetInvalidFileNameChars();
if (charArray.Contains(e.KeyChar))
return true;//Stop the character from being entered into the control since it is not valid in a file name
else
return false;
}
From my older projects, I've found this solution, which has been working perfectly for over 2 years. I'm replacing illegal chars with "!" and then collapsing any doubled "!!" into one; substitute your own replacement char.
public string GetSafeFilename(string filename)
{
string res = string.Join("!", filename.Split(Path.GetInvalidFileNameChars()));
while (res.IndexOf("!!") >= 0)
res = res.Replace("!!", "!");
return res;
}
I find using this to be quick and easy to understand:
<Extension()>
Public Function MakeSafeFileName(FileName As String) As String
Return FileName.Where(Function(x) Not IO.Path.GetInvalidFileNameChars.Contains(x)).ToArray
End Function
This works because a string is IEnumerable as a char array and there is a string constructor that takes a char array.
Many answers suggest using Path.GetInvalidFileNameChars(), which seems like a bad solution to me. I encourage you to use whitelisting instead of blacklisting, because hackers will eventually find a way to bypass a blacklist.
Here is an example of code you could use :
string whitelist = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.";
foreach (char c in filename)
{
if (!whitelist.Contains(c))
{
filename = filename.Replace(c, '-');
}
}
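The same whitelist idea can be done in a single pass with a StringBuilder, which avoids calling Replace on the whole string for every rejected character. In this sketch the allowed set is an assumption: digits, '_' and '-' are added to the letters and '.' from the example above, and should be adjusted to taste. WhitelistFileName and Sanitize are illustrative names.

```csharp
using System.Text;

public static class WhitelistFileName
{
    // Assumed allowed set: ASCII letters, digits, '.', '_' and '-'.
    private const string Allowed =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._-";

    public static string Sanitize(string fileName, char replacement = '-')
    {
        var result = new StringBuilder(fileName.Length);
        foreach (char c in fileName)
        {
            // Keep whitelisted characters, substitute everything else.
            result.Append(Allowed.IndexOf(c) >= 0 ? c : replacement);
        }
        return result.ToString();
    }
}
```

With these settings, Sanitize("my file?.txt") returns "my-file-.txt".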