I am writing a program that reads text from a file and displays it on a graph when the mouse hovers over a data point. My problem is that when I read the data from the text file and show it on the graph, some characters appear as "?" instead of the actual character. (Sorry, I cannot post images.)
Here is my code to read from the file, including my attempt to change the encoding (no success):
string myString = File.ReadAllText(@"read.txt");
Encoding enc_to = Encoding.GetEncoding("iso-8859-1");
Encoding enc_from = Encoding.UTF8;
byte[] InitialBytes =enc_from.GetBytes(myString);
byte[] FinalBytes = Encoding.Convert(enc_from, enc_to, InitialBytes);
string myMessage = enc_to.GetString(FinalBytes);
Please note that I don't want to show the string with MessageBox.Show; I want to show it as a tooltip.
Here is the text in the read.txt file:
3 stands of 5½"
Here is how it is shown:
3 stands of 5�"
Use Encoding.Default:
string myString = File.ReadAllText(@"read.txt", Encoding.Default);
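To see why the "?" appears, here is a minimal sketch. It assumes the file was saved in a single-byte ANSI code page; iso-8859-1 stands in for that code page here, and the class and method names are made up for illustration. Note that on .NET Core and .NET 5+ Encoding.Default is UTF-8, so on those runtimes the code page would have to be named explicitly instead.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadDemo
{
    // Hypothetical helper: writes text to a temporary file with one encoding
    // and reads it back with another.
    public static string WriteThenRead(string text, Encoding writeAs, Encoding readAs)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, writeAs);
            return File.ReadAllText(path, readAs);
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        // Assumption: the file on disk uses a single-byte code page.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string sample = "3 stands of 5\u00BD\"";

        // Reading as UTF-8: the lone byte 0xBD is invalid and becomes U+FFFD.
        string wrong = WriteThenRead(sample, latin1, Encoding.UTF8);
        // Reading with the encoding the file was written in restores the text.
        string right = WriteThenRead(sample, latin1, latin1);

        Console.WriteLine(wrong.IndexOf('\uFFFD') >= 0); // True
        Console.WriteLine(right == sample);               // True
    }
}
```

The point is that the reader and the writer have to agree on the encoding; no conversion applied after a wrong decode can bring the replaced character back.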
I have a text file that contains a single word in Arabic, and I want to read it.
My code is:
string text = System.IO.File.ReadAllText(@"C:\CINPROCESSING\nom.txt");
Console.WriteLine(text);
The result is printed as unknown characters: ????
How can I fix it?
Thanks,
Set the right code page for your text:
System.IO.File.ReadAllText(@"C:\CINPROCESSING\nom.txt", System.Text.Encoding.GetEncoding(codepage))
The code page may be 1256 (Windows Arabic).
Your code reads the text correctly into the variable text (debug and see).
However, displaying Arabic characters in the Windows console is another issue (check how to solve it here).
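One commonly suggested step for the console side is to switch the console's output encoding to UTF-8 before writing. This is only a sketch of that idea, not necessarily what the linked answer proposed; the class and method names are illustrative, and the console font still has to contain Arabic glyphs (older Windows consoles may also not shape or order the letters correctly).

```csharp
using System;
using System.Text;

static class ConsoleDemo
{
    // Hypothetical helper: asks the console to accept UTF-8 output
    // instead of the legacy OEM code page.
    public static void UseUtf8Console()
    {
        Console.OutputEncoding = Encoding.UTF8;
    }

    static void Main()
    {
        UseUtf8Console();
        // An Arabic sample word written with escapes so the source file's
        // own encoding does not matter.
        Console.WriteLine("\u0645\u0631\u062D\u0628\u0627");
    }
}
```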
You can try this:
string text = System.IO.File.ReadAllText(@"C:\CINPROCESSING\nom.txt", Encoding.Default);
Console.WriteLine(text);
Try specifying the encoding using this StreamReader constructor:
StreamReader arabic_reader = new StreamReader(filePath, System.Text.Encoding.UTF8, true);
OR
string text = System.IO.File.ReadAllText(@"C:\CINPROCESSING\nom.txt", Encoding.UTF8);
Try :
StreamReader reader = new StreamReader(filePath, System.Text.Encoding.UTF8, true);
For reference: http://msdn.microsoft.com/en-us/library/ms143457.aspx
I've had some success encoding the RTFBody from a MailItem using UTF8Encoding. I'm able to compose a new email, do all the new-email stuff and click send. Upon hitting send, I append a tag to the email, and the tag is also added to the categories. This all works, and all of it goes through the RTFBody.
The problem comes when I reply to RTF emails, which, for testing purposes, are just the emails I sent to my lonesome self. When I send the reply email and new tags were added, I remove the old tags first and then add the new tags. When I set the RTFBody in the reply email with the edited string that contains the new tags, I get a "not enough memory or disk space" error. This doesn't happen when I just remove the tags with the same function.
Below is the code I'm using:
private void ChangeRTFBody(string replaceThis, string replaceWith)
{
    byte[] rtfBytes = Globals.ThisAddIn.email.RTFBody as byte[];
    System.Text.Encoding encoding = new System.Text.UTF8Encoding();
    string rtfString = encoding.GetString(rtfBytes);
    rtfString = rtfString.Replace(replaceThis, replaceWith);
    rtfBytes = encoding.GetBytes(rtfString);
    // The error is thrown on the next line, only on reply and only when
    // I replace with new tags.
    Globals.ThisAddIn.email.RTFBody = rtfBytes;
}
These are the calls I make:
Delete old tag: ChangeRTFBody(lastTag, "");
Add new tag: ChangeRTFBody("}}\0", newTag + "}}\0");
As I said, this works when I create a new email and send it, but not when I try to reply to the same email. It also seems that the size of the byte[] almost doubles after the delete: when I check it during the delete it's at about 15k bytes, and when I check during the add it jumps to over 30k bytes. The error occurs when I try to assign the newly inflated byte[] to the RTFBody.
Thanks for any help and tips and sorry about all the reading.
I had the same problem and came across what I think is an easier way to replace text in an Outlook RTF body: using the Word.Document object model. You will first need to add a reference to Microsoft.Office.Interop.Word to your project.
Then add the using directive:
using Word = Microsoft.Office.Interop.Word;
Then your code would look like this:
Word.Document doc = Inspector.WordEditor as Word.Document;
// body text
string text = doc.Content.Text;
// find text location
int textLocation = text.IndexOf(replaceThis);
if (textLocation > -1)
{
    // get range
    int textLocationEnd = textLocation + replaceThis.Length;
    // init range
    Word.Range myRange = doc.Range(textLocation, textLocationEnd);
    // replace text
    myRange.Text = replaceWith;
}
I've tried googling around but wasn't able to find what charset that this text below belongs to:
具有éœé›»ç”¢ç”Ÿè£ç½®ä¹‹å½±åƒè¼¸å…¥è£ç½®
But putting <meta http-equiv="Content-Type" Content="text/html; charset=utf-8"> and keeping that string into an HTML file, I was able to view the Chinese characters properly:
具有靜電產生裝置之影像輸入裝置
So my question is:
What tools can I use to detect the character set of this text?
And how do I convert/encode/decode them properly in C#?
Updates:
For completeness' sake, I've updated this test.
[TestMethod]
public void TestMethod1()
{
string encodedText = "具有éœé›»ç”¢ç”Ÿè£ç½®ä¹‹å½±åƒè¼¸å…¥è£ç½®";
Encoding utf8 = new UTF8Encoding();
Encoding window1252 = Encoding.GetEncoding("Windows-1252");
byte[] postBytes = window1252.GetBytes(encodedText);
string decodedText = utf8.GetString(postBytes);
string actualText = "具有靜電產生裝置之影像輸入裝置";
Assert.AreEqual(actualText, decodedText);
}
What is happening when you save the "bad" string in a text file with a meta tag declaring the correct encoding is that your text editor is saving the file with Windows-1252 encoding, but the browser is reading the file and interpreting it as UTF-8. Since the "bad" string is incorrectly decoded UTF-8 bytes with the Windows-1252 encoding, you are reversing the process by encoding the file as Windows-1252 and decoding as UTF-8.
Here's an example:
using System.Text;
using System.Windows.Forms;
namespace Demo
{
class Program
{
static void Main(string[] args)
{
string s = "具有靜電產生裝置之影像輸入裝置"; // Unicode
Encoding Windows1252 = Encoding.GetEncoding("Windows-1252");
Encoding Utf8 = Encoding.UTF8;
byte[] utf8Bytes = Utf8.GetBytes(s); // Unicode -> UTF-8
string badDecode = Windows1252.GetString(utf8Bytes); // Mis-decode as Latin1
MessageBox.Show(badDecode,"Mis-decoded"); // Shows your garbage string.
string goodDecode = Utf8.GetString(utf8Bytes); // Correctly decode as UTF-8
MessageBox.Show(goodDecode, "Correctly decoded");
// Recovering from bad decode...
byte[] originalBytes = Windows1252.GetBytes(badDecode);
goodDecode = Utf8.GetString(originalBytes);
MessageBox.Show(goodDecode, "Re-decoded");
}
}
}
Even with correct decoding, you'll still need a font that supports the characters being displayed. If your default font doesn't support Chinese, you still might not see the correct characters.
The correct thing to do is figure out why the string you have was decoded as Windows-1252 in the first place. Sometimes, though, data in a database is stored incorrectly to begin with and you have to resort to these games to fix the problem.
string test = "敭畳灴獩楫n"; //incoming data. must be mesutpiskin
byte[] bytes = Encoding.Unicode.GetBytes(test);
string s = string.Empty;
for (int i = 0; i < bytes.Length; i++)
{
s += (char)bytes[i];
}
s = s.Trim((char)0);
MessageBox.Show(s);
//s=mesutpiskin
I'm not really sure what you mean, but I'm guessing you want to convert between a string in a certain encoding in byte array form and a string. Let's assume the character encoding is called "FooBar":
This is how you encode and decode:
Encoding myEncoding = Encoding.GetEncoding("FooBar");
string myString = "lala";
byte[] myEncodedBytes = myEncoding.GetBytes(myString);
string myDecodedString = myEncoding.GetString(myEncodedBytes);
You can learn more about the Encoding class over at MSDN.
Answering your question at the end of your post:
If you want to determine the text encoding at runtime, take a look at ude: http://code.google.com/p/ude/
For converting between character sets you can use Encoding.Convert: http://msdn.microsoft.com/en-us/library/system.text.encoding.convert(v=vs.100).aspx
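As a rough illustration of the Encoding.Convert call mentioned above, the sketch below re-encodes UTF-8 bytes as iso-8859-1. The helper name Transcode and the sample word are made up for the example; the conversion only preserves characters that exist in the target character set.

```csharp
using System;
using System.Text;

static class ConvertDemo
{
    // Hypothetical helper: re-encodes bytes from one character set to another.
    public static byte[] Transcode(byte[] input, Encoding from, Encoding to)
    {
        return Encoding.Convert(from, to, input);
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // "größe": two of the five characters need two bytes each in UTF-8.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("gr\u00F6\u00DFe");
        byte[] latin1Bytes = Transcode(utf8Bytes, Encoding.UTF8, latin1);

        Console.WriteLine(utf8Bytes.Length);   // 7
        Console.WriteLine(latin1Bytes.Length); // 5
    }
}
```

Encoding.Convert works on bytes whose encoding you already know. It does not repair a string that was decoded with the wrong encoding in the first place.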
It's Windows Latin 1. I pasted the Chinese text as UTF-8 into BBEDIT (a text editor for Mac) and re-opened the file as Windows Latin 1 and bang, the exact diacritics appeared.
i have an input like: DisplaygröÃe
And i want output like: Displaygröÿe
In Notepad++ I solved the problem by converting to ANSI, encoding as UTF-8, and converting back to ANSI.
I need to do this programmatically in C#.
I've tried converting to and from ANSI, UTF-8 and Latin-1, and none of them work properly; the output shows ? when I use a function that calls Encoding.Default.GetBytes, then
res = Encoding.Convert(src1, dest1, bytes) and
EncodingDest.GetChars(res);
where EncodingDest is the output encoding.
The code runs in a console application, but the result is the same in WPF.
It doesn't matter which encoding is used for the output as long as it works; the same problems also occur for countries like Spain, Italy or Sweden.
use System.Text.Encoding
var ascii = Encoding.ASCII.GetBytes("DisplaygröÃe");
var utf8 = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, ascii);
var output = Encoding.UTF8.GetString(utf8);
When you output a string somewhere (like a TextWriter, or a Stream, or a byte[]), you should always specify the encoding, unless you want the UTF-8 output (the default one):
using (StreamWriter sw = new StreamWriter("file.txt", false, Encoding.GetEncoding("windows-1252")))
    sw.WriteLine("Displaygröÿe");
@DanM: You need to know what character set your input is in.
"DisplaygröÃe" is what you will see if you take the string "Displaygröße" (suggested by Vlad) encode it to bytes as UTF-8, and then incorrectly decode it as latin1.
If you do the same with Displaygröÿe, you would see "DisplaygröÃ¿e" (the inverted question mark is literally there; it is not a placeholder for something that can't be displayed). Technically, "DisplaygröÃe" probably has another character between the Ã and the e, but it is a control code and is thus invisible to you.
If you have a character set foo, this is true: my_string = foo_decode(foo_encode(my_string)). If you have another character set bar, this is true: barf = bar_decode(foo_encode(my_string)), where barf is garbage like you're seeing.
If you don't know what character set your input is in, you will only decode it correctly by chance.
It appears that your input files are in UTF-8, and you will need to decode the bytes from the file as such. (I don't speak enough C# to help you here... I only speak character encodings.)
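The encode/decode identities above can be sketched in C# roughly as follows, with UTF-8 playing the role of foo and iso-8859-1 the role of bar. The helper names Garble and Recover are invented for the illustration, and the recovery step only works if the wrong decode lost no bytes, which holds for iso-8859-1 in .NET because it maps every byte value to a character.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    static readonly Encoding Utf8 = Encoding.UTF8;
    static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // bar_decode(foo_encode(s)): encode as UTF-8, then wrongly decode as Latin-1.
    public static string Garble(string s)
    {
        return Latin1.GetString(Utf8.GetBytes(s));
    }

    // Undo the wrong decode: get the original bytes back, then decode as UTF-8.
    public static string Recover(string garbled)
    {
        return Utf8.GetString(Latin1.GetBytes(garbled));
    }

    static void Main()
    {
        string original = "Displaygr\u00F6\u00DFe"; // "Displaygröße"
        string garbled = Garble(original);

        // The second byte of the encoded sharp s decodes to the invisible
        // control character U+009F.
        Console.WriteLine(garbled[12] == '\u009F');       // True
        Console.WriteLine(Recover(garbled) == original);  // True
    }
}
```

When the bytes come straight from a file, the simpler fix is to decode them as UTF-8 once, for example by passing Encoding.UTF8 to the reader, so that no recovery step is needed.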
using (var rdr = new StreamReader(fs, Encoding.GetEncoding(1252))) {
result = rdr.ReadToEnd();
}
We had a similar problem when sending data to a text printer, and the only approach I got working is this (written as an extension method):
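public static byte[] ToAnsiMemBytes(this string input)
{
    int length = input.Length;
    byte[] result = new byte[length];
    IntPtr bytes = IntPtr.Zero;
    try
    {
        // Convert the string to the system ANSI code page in unmanaged memory.
        bytes = Marshal.StringToCoTaskMemAnsi(input);
        Marshal.Copy(bytes, result, 0, length);
    }
    catch (Exception)
    {
        result = null;
    }
    finally
    {
        // Release the unmanaged buffer so it is not leaked.
        Marshal.FreeCoTaskMem(bytes);
    }
    return result;
}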
I have a string of richtext characters/tokens that I would like to feed to a richtextbox in code.
string rt = @" {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Arial;}{\f1\fnil\fprq2\fcharset0 Biondi;}}"+
@"{\colortbl ;\red255\green0\blue0;}"+
@"{\*\generator Msftedit 5.41.15.1507;}\viewkind4\uc1\pard\f0\fs20\par"+
@"\cf1\f1 hello\cf0\f0 \ul world\par}";
I have attempted this :
System.IO.MemoryStream strm = new System.IO.MemoryStream();
byte[] b = Encoding.ASCII.GetBytes(rt);
strm.BeginRead(b, 0, b.Length, null, null);
richTextBox1.LoadFile(strm, RichTextBoxStreamType.RichText);
It didn't work.
Can anyone give me a few suggestions?
BTW, the rich text comes from saving a file in WordPad, opening it in Notepad, and using the text within to build my string.
Rich textbox has a property named Rtf. Set that property to your string value. Also, your string has an extra space as the first character. I had to remove that before I saw your Hello World.
Expanding on gbogumil's answer:
string rt = @"{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Arial;}{\f1\fnil\fprq2\fcharset0 Biondi;}}"+
@"{\colortbl ;\red255\green0\blue0;}"+
@"{\*\generator Msftedit 5.41.15.1507;}\viewkind4\uc1\pard\f0\fs20\par"+
@"\cf1\f1 hello\cf0\f0 \ul world\par}";
this.richTextBox1.Rtf = rt;