Ok I'm not sure how to explain this, but take a look at this:
string fornavn = borgerXML.GetElementsByTagName("Fornavne")[id].InnerXml
Say I have 4 of these strings, which are getting different elements from the XML file.
On every single one of these, I want to do the following:
.Replace("æ", "æ")
.Replace("Ã?", "Æ")
.Replace("ø", "ø")
.Replace("Ã?", "Ø")
.Replace("Ã¥", "å")
.Replace("Ã?", "Å");
If I could store these 6 lines of code as a method or something, I'd only have to paste that method for every one of those 4 string elements, instead of pasting 4 times 6 lines of code.
Is there some way to do this?
Thanks in advance.
The actual problem here is that the file, which according to your comment, specifies that it is encoded using the iso-8859-1 encoding:
<?xml version="1.0" encoding="iso-8859-1"?>
does not in fact match that encoding, try this with LINQPad:
void Main()
{
string wrong = "æ ø å";
var bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(wrong);
string correct = Encoding.UTF8.GetString(bytes);
correct.Dump();
}
This will output:
æ ø å
(note that I only used every second letter as the characters you've posted in the question are not an exact match to how they are actually encoded).
To fix this, go back to the code that produces the file and ensure it actually uses iso-8859-1 encoding when writing the file, instead of UTF-8.
public string Convert(string source)
{
return source.Replace("æ", "æ")
.Replace("Ã?", "Æ")
.Replace("ø", "ø")
.Replace("Ã?", "Ø")
.Replace("Ã¥", "å")
.Replace("Ã?", "Å");
}
If I undestood you correctly. For each string you want to "process" call
Convert(value);
I borrow a bit from here Regex Replace - Multiple Characters (actually you would have found this yourself with the search function) to make it a more generic solution which you can use for more or less characters:
define a function that can substitute dynamically:
string MultiReplace(string original, Dictionary<string, string> replacements)
{
//initialize result with original in case no replacements have been requested
string result = original;
foreach (var r in replacements)
{
result = result.Replace(r.Key, r.Value);
}
return result;
}
Define a multi-use variable to save overhead of creating it every time you need:
//You can create & initialize a dictionary variable like this:
var charmap = new Dictionary<string, string>
{
{ "æ", "æ" },
{ "Ã?", "Æ" },
{ "ø", "ø" },
{ "Ã?", "Ø" },
{ "Ã¥", "å" },
{ "Ã?", "Å" },
};
Call this function with your specific task:
var a = MultiReplace(fornavn, charmap);
In general it should be possible to fix this on the XML level though.
BR Florian
Related
I have created a MadLibs style game where the user enters responses to prompts which in turn replace blanks, represented by %s0, %s1 etc., in a story. I have this working using a for loop but someone else suggested I could do it using regex. What I have so far is below, which replaces all instances of %s+number with "wibble". What I was wondering is if it is possible to store the number found by the regex in a temporary variable and in turn use that to return a value from the list Words? E.g. return Regex.Replace(story, pattern, Global.Words[x]); where x is the number returned by the regex pattern as it goes over the string.
static void Main(string[] args)
{
Globals.Words = new List<string>();
Globals.Words.Add("nathan");
Globals.Words.Add("bob");
var text = "Once upon a time there was a %s0 and it was %s1";
Console.WriteLine(FindEscapeCharacters(text));
}
public static string FindEscapeCharacters(string story)
{
var pattern = #"%s([0-9]+)";
return Regex.Replace(story, "%s([0-9]+)", "wibble");
}
Thanks in advance, Nathan.
Not a direct answer to your question about regexes, but if I understand you correctly, there is an easier way to do this:
string baseString = "I have a {0} {1} in my {0} {2}.";
List<string> words = new List<string>() { "red", "cat", "hat" };
string outputString = String.Format(baseString, words.ToArray());
outputString will be I have a red cat in my red hat..
Is that not what you want, or is there more to your question that I'm missing?
Minor elaboration
String.Format uses the following signature:
string Format(string format, params object[] values)
The neat thing about params is that you can either list values separately:
var a = String.Format("...", valueA, valueB, valueC);
but you can also pass in an array directly:
var a = String.Format("...", valueArray);
Note that you can't mix and match the two approaches.
Yes, you are very close in your attempt with Regex.Replace; the last step is to change constant "wibble" into lambda match => how_to_replace_the_match:
var text = "Once upon a time there was a %s0 and it was %s1";
// Once upon a time there was a nathan and it was bob
var result = Regex.Replace(
text,
"%s([0-9]+)",
match => Globals.Words[int.Parse(match.Groups[1].Value)]);
Edit: In case you don't want working with capturing groups by their numbers, you can name them explicitly:
// Once upon a time there was a nathan and it was bob
var result = Regex.Replace(
text,
"%s(?<number>[0-9]+)",
match => Globals.Words[int.Parse(match.Groups["number"].Value)]);
There is an overload of Regex.Replace that, rather than taking a string for the last argument, takes a MatchEvaluator delegate - a function that takes a Match object and returns a string.
You could make that function parse the integer from the Match's Groups[1].Value property and then use that to index into your list, returning the string you find.
I will receive an response in the form of JSON string.
We have an existing tool developed in C# which will take input in XML format.
Hence i am converting the JSON string obtained from server using Newtonsoft.JSON to XML string and passing to the tool.
Problem:
When converting JSON response to XML, I am getting an error
"Failed to process request. Reason: The ' ' character, hexadecimal
value 0x20, cannot be included in a name."
The above error indicates that the JSON Key contains a space [For Example: \"POI Items\":[{\"lat\":{\"value\":\"00\"}] which cannot be converted to XML element.
Is there any approach to identify spaces only JSON key's ["POI Items"] and remove the spaces in it?
Also suggest any alternative solution so that we needn't change the existing solution?
Regards,
Sudhir
You can use Json.Net and replace the names while loading the json..
JsonSerializer ser = new JsonSerializer();
var jObj = ser.Deserialize(new JReader(new StringReader(json))) as JObject;
var newJson = jObj.ToString(Newtonsoft.Json.Formatting.None);
.
public class JReader : Newtonsoft.Json.JsonTextReader
{
public JReader(TextReader r) : base(r)
{
}
public override bool Read()
{
bool b = base.Read();
if (base.CurrentState == State.Property && ((string)base.Value).Contains(' '))
{
base.SetToken(JsonToken.PropertyName,((string)base.Value).Replace(" ", "_"));
}
return b;
}
}
Input : {"POI Items":[{"lat":{"value":"00","ab cd":"de fg"}}]}
Output: {"POI_Items":[{"lat":{"value":"00","ab_cd":"de fg"}}]}
I recommend using some sort of Regex.Replace().
Search the input string for something like:
\"([a-zA-Z0-9]+) ([a-zA-Z0-9]+)\":
and then replace something like (mind the missing space):
\"(1)(2)\":
The 1st pair of parenthesis contain the first word in a variable name, the 2nd pair of parenthesis means the 2nd word. The : guarantees that this operation will be done in variable names only (not in string data). the JSON variable names are inside a pair of \"s.
Maybe it's not 100% correct but you can start searching by this.
For details check MSDN, and some Regex examples
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx
I need a way to convert special characters like this:
Helloæ
To normal characters. So this word would end up being Helloae. So far I have tried HttpUtility.Decode, or a method that would convert UTF8 to win1252, but nothing worked. Is there something simple and generic that would do this job?
Thank you.
EDIT
I have tried implementing those two methods using posts here on OC. Here's the methods:
public static string ConvertUTF8ToWin1252(string _source)
{
Encoding utf8 = new UTF8Encoding();
Encoding win1252 = Encoding.GetEncoding(1252);
byte[] input = _source.ToUTF8ByteArray();
byte[] output = Encoding.Convert(utf8, win1252, input);
return win1252.GetString(output);
}
// It should be noted that this method is expecting UTF-8 input only,
// so you probably should give it a more fitting name.
private static byte[] ToUTF8ByteArray(this string _str)
{
Encoding encoding = new UTF8Encoding();
return encoding.GetBytes(_str);
}
But it did not worked. The string remains the same way.
See: Does .NET transliteration library exists?
UnidecodeSharpFork
Usage:
var result = "Helloæ".Unidecode();
Console.WriteLine(result) // Prints Helloae
There is no direct mapping between æ and ae they are completely different unicode code points. If you need to do this you'll most likely need to write a function that maps the offending code points to the strings that you desire.
Per the comments you may need to take a two stage approach to this:
Remove the diacritics and combining characters per the link to the possible duplicate
Map any characters left that are not combining to alternate strings
switch(badChar){
case 'æ':
return "ae";
case 'ø':
return "oe";
// and so on
}
I want to pass a string array (separated by commas), then use a function to split the passed array by a comma, and add in a delimiter in place of the comma.
I will show you what I mean in further detail with some broken code:
String FirstData = "1";
String SecondData = "2" ;
String ThirdData = "3" ;
String FourthData = null;
FourthData = AddDelimiter(FirstData,SecondData,ThirdData);
public String AddDelimiter(String[] sData)
{
// foreach ","
String OriginalData = null;
// So, here ... I want to somehow split 'sData' by a ",".
// I know I can use the split function - which I'm having
// some trouble with - but I also believe there is some way
// to use the 'foreach' function? I wish i could put together
// some more code here but I'm a VB6 guy, and the syntax here
// is killing me. Errors everywhere.
return OriginalData;
}
Syntax doesn't matter much here, you need to get to know the Base Class Library. Also, you want to join strings apparently, not split it:
var s = string.Join(",", arrayOFStrings);
Also, if you want to pass n string to a method like that, you need the params keyword:
public string Join( params string[] data) {
return string.Join(",", data);
}
To split:
string[] splitString = sData.Split(new char[] {','});
To join in new delimiter, pass in the array of strings to String.Join:
string colonString = String.Join(":", splitString);
I think you are better off using Replace, since all you want to do is replace one delimiter with another:
string differentDelimiter = sData.Replace(",", ":");
If you have several objects and you want to put them in an array, you can write:
string[] allData = new string[] { FirstData, SecondData, ThirdData };
you can then simply give that to the function:
FourthData = AddDelimiter(allData);
C# has a nice trick, if you add a params keyword to the function definition, you can treat it as if it's a function with any number of parameters:
public String AddDelimiter(params String[] sData) { … }
…
FourthData = AddDelimiter(FirstData, SecondData, ThirdData);
As for the actual implementation, the easiest way is to use string.Join():
public String AddDelimiter(String[] sData)
{
// you can use any other string instead of ":"
return string.Join(":", sData);
}
But if you wanted to build the result yourself (for example if you wanted to learn how to do it), you could do it using string concatenation (oneString + anotherString), or even better, using StringBuilder:
public String AddDelimiter(String[] sData)
{
StringBuilder result = new StringBuilder();
bool first = true;
foreach (string s in sData)
{
if (!first)
result.Append(':');
result.Append(s);
first = false;
}
return result.ToString();
}
One version of the Split function takes an array of characters. Here is an example:
string splitstuff = string.Split(sData[0],new char [] {','});
If you don't need to perform any processing on the parts in between and just need to replace the delimiter, you could easily do so with the Replace method on the String class:
string newlyDelimited = oldString.Replace(',', ':');
For large strings, this will give you better performance, as you won't have to do a full pass through the string to break it apart and then do a pass through the parts to join them back together.
However, if you need to work with the individual parts (to recompose them into another form that does not resemble a simple replacement of the delimiter), then you would use the Split method on the String class to get an array of the delimited items and then plug those into the format you wish.
Of course, this means you have to have some sort of explicit knowledge about what each part of the delimited string means.
I connect to a webservice that gives me a response something like this(This is not the whole string, but you get the idea):
sResponse = "{\"Name\":\" Bod\u00f8\",\"homePage\":\"http:\/\/www.example.com\"}";
As you can see, the "Bod\u00f8" is not as it should be.
Therefor i tried to convert the unicode (\u00f8) to char by doing this with the string:
public string unicodeToChar(string sString)
{
StringBuilder sb = new StringBuilder();
foreach (char chars in sString)
{
if (chars >= 32 && chars <= 255)
{
sb.Append(chars);
}
else
{
// Replacement character
sb.Append((char)chars);
}
}
sString = sb.ToString();
return sString;
}
But it won't work, probably because the string is shown as \u00f8, and not \u00f8.
Now it would not be a problem if \u00f8 was the only unicode i had to convert, but i got many more of the unicodes.
That means that i can't just use the replace function :(
Hope someone can help.
You're basically talking about converting from JSON (JavaScript Object Notation). Try this link--near the bottom you'll see a list of publicly available libraries, including some in C#, that might do what you need.
The excellent Json.NET library has no problems decoding unicode escape sequences:
var sResponse = "{\"Name\":\"Bod\u00f8\",\"homePage\":\"http://www.ex.com\"}";
var obj = (JObject)JsonConvert.DeserializeObject(sResponse);
var name = ((JValue)obj["Name"]).Value;
var homePage = ((JValue)obj["homePage"]).Value;
Debug.Assert(Equals(name, "Bodø"));
Debug.Assert(Equals(homePage, "http://www.ex.com"));
This also allows you to deserialize to real POCO objects, making the code even cleaner (although less dynamic).
var obj = JsonConvert.DeserializeObject<Response>(sResponse);
Debug.Assert(obj2.Name == "Bodø");
Debug.Assert(obj2.HomePage == "http://www.ex.com");
public class Response
{
public string Name { get; set; }
public string HomePage { get; set; }
}
Perhaps you want to try:
string character = Encoding.UTF8.GetString(chars);
sb.Append(character);
I know this question is getting quite old, but I crashed into this problem as of today, while trying to access the Facebook Graph API. I was getting these strange \u00f8 and other variations back.
First I tried a simple replace as the OP also said (with the help from an online table). But I thought "no way!" after adding 2 replaces.
So after looking a little more at the "codes" it suddenly hit me...
The "\u" is a prefix, and the 4 characters after that is a hexadecimal encoded char code! So writing a simple regex to find all \u with 4 alphanumerical characters after, and afterwards converting the last 4 characters to integer and then to a character made the deal.
My source is in VB.NET
Private Function DecodeJsonString(ByVal Input As String) As String
For Each m As System.Text.RegularExpressions.Match In New System.Text.RegularExpressions.Regex("\\u(\w{4})").Matches(Input)
Input = Input.Replace(m.Value, Chr(CInt("&H" & m.Value.Substring(2))))
Next
Return Input
End Function
I also have a C# version here
private string DecodeJsonString(string Input)
{
foreach (System.Text.RegularExpressions.Match m in new System.Text.RegularExpressions.Regex(#"\\u(\w{4})").Matches(Input))
{
Input = Input.Replace(m.Value, ((char)(System.Int32.Parse(m.Value.Substring(2), System.Globalization.NumberStyles.AllowHexSpecifier))).ToString());
}
return Input;
}
I hope it can help someone out... I hate to add libraries when I really only need a few functions from them!