Encode and decode strange .shm file data to and from Base64 in C#

First, a depressing fact: https://www.base64decode.org/ can do what I want to do.
I'm trying to encode and decode (to and from Base64) a model file (.shm) generated by the image processing tool MVTec Halcon, because I want to store it in an XML file.
If I open it, it has this strange form:
HSTF ÿÿÿÿ¿€ Q¿ÙG®záH?Üä4©±w?­Eè}‰#?ð ................
I'm using these methods to encode and decode it:
public static string Base64Encode(string text)
{
Byte[] textBytes = Encoding.Default.GetBytes(text);
return Convert.ToBase64String(textBytes);
}
public static string Base64Decode(string base64EncodedData)
{
Byte[] base64EncodedBytes = Convert.FromBase64String(base64EncodedData);
return Encoding.Default.GetString(base64EncodedBytes);
}
And I call the methods from a GUI like this:
var model = File.ReadAllText(@"C:\Users\\Desktop\model_region_nut.txt");
var base64 = ImageConverter.Base64Encode(model);
File.WriteAllText(@"C:\Users\\Desktop\base64.txt", base64);
var modelneu = ImageConverter.Base64Decode(File.ReadAllText(@"C:\Users\\Desktop\base64.txt"));
File.WriteAllText(@"C:\Users\\Desktop\modelneu.txt", modelneu);
my result for modelneu is:
HSTF ?????? Q??G?z?H???4??w??E?}??#??
As you can see, lots of characters are missing. I guess the problem is caused by using Encoding.Default.
Thanks for your help,
Michel

If you're working with binary data, there is no reason at all to go through text decoding and encoding. Doing so only risks corrupting the data in various ways, even if you're using a consistent character encoding.
Just use File.ReadAllBytes() instead of File.ReadAllText() and skip the unnecessary Encoding step.
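For illustration, here is the same byte-level round trip sketched in Java, where Files.readAllBytes and java.util.Base64 play the roles of File.ReadAllBytes and Convert.ToBase64String/FromBase64String. The class name and sample bytes are made up (loosely modeled on the HSTF header shown in the question; the real .shm layout is not known here). lossyViaText reproduces the question's mistake of turning binary data into a string first.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Base64;

class BinaryBase64RoundTrip {
    // Read the file as raw bytes and Base64-encode them; no text decoding involved.
    static String encodeFile(Path file) throws IOException {
        return encode(Files.readAllBytes(file));
    }

    // Decode the Base64 text and write the original bytes back unchanged.
    static void decodeToFile(String base64, Path file) throws IOException {
        Files.write(file, decode(base64));
    }

    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    // The question's mistake: turning binary data into a string first.
    // Invalid UTF-8 sequences are replaced with U+FFFD, so bytes are lost.
    static byte[] lossyViaText(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample: an ASCII tag followed by non-text bytes.
        byte[] data = {'H', 'S', 'T', 'F', 0, (byte) 0xFF, (byte) 0xBF, (byte) 0x80};
        System.out.println(Arrays.equals(data, decode(encode(data)))); // true
        System.out.println(Arrays.equals(data, lossyViaText(data)));   // false
    }
}
```

The Base64 string only ever holds ASCII characters, so it can be stored in XML safely; the binary payload itself never passes through a character encoding.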

The problem is reading the file with an unspecified encoding; check this question.
As mentioned there, you can use the ReadAllText overload that takes an encoding, and when writing you must also specify the encoding for WriteAllText. I suggest using UTF-8:
var model = File.ReadAllText(@"C:\Users\pichlerm\Desktop\model_region_nut.txt", Encoding.UTF8);
var base64 = ImageConverter.Base64Encode(model);
File.WriteAllText(@"C:\Users\\Desktop\base64.txt", base64, Encoding.UTF8);
var modelneu = ImageConverter.Base64Decode(File.ReadAllText(@"C:\Users\\Desktop\base64.txt"));
File.WriteAllText(@"C:\Users\pichlerm\Desktop\modelneu.txt", modelneu);

Related

Flurl AddFile fileName Encoding

I'm trying to use Flurl to send a file like this:
public ImportResponse Import(ImportRequest request, string fileName, Stream stream)
{
    return FlurlClient(Routes.Import, request)
        .PostMultipartAsync(mp => mp
            .AddJson("json", request)
            .AddFile("file", stream, ConvertToAcsii(fileName)))
        .Result<ImportResponse>();
}
fileName = "Файл импорта тарифов (1).xlsx"
But in the POST handler I get this:
Request.Files.FirstOrDefault().FileName =
"=?utf-8?B?0KTQsNC50Lsg0LjQvNC/0L7RgNGC0LAg0YLQsNGA0LjRhNC+0LIgKDEpLnhsc3g=?="
Any suggestions?
The filename appears to be encoded using MIME encoded-word syntax. (Flurl doesn't do this directly; it presumably happens deeper down in the HttpClient libraries when non-ASCII characters are detected.) .NET doesn't directly support decoding this format, but you can do it yourself fairly easily. If you strip the =?utf-8?B? from the beginning and ?= from the end, what you're left with is your filename, Base64-encoded.
Here's one way you could do it:
var base64 = Request.Files.FirstOrDefault().FileName.Split('?')[3];
var bytes = Convert.FromBase64String(base64);
var filename = Encoding.UTF8.GetString(bytes);
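A rough Java equivalent of the same idea, assuming the value always holds a single "B" (Base64) encoded word. EncodedWordDecoder is a hypothetical helper, not part of Flurl or .NET; unlike the Split('?')[3] shortcut it also honours the charset named in the word, but it does not handle the "Q" encoding or several encoded words in one value.

```java
import java.nio.charset.Charset;
import java.util.Base64;

class EncodedWordDecoder {
    // Decodes one RFC 2047 "B" encoded word of the form =?charset?B?payload?=.
    // Anything that does not look like an encoded word is returned unchanged.
    static String decode(String value) {
        if (value.length() < 4 || !value.startsWith("=?") || !value.endsWith("?=")) {
            return value;
        }
        // Strip "=?" and "?=", then split into charset, encoding and payload.
        String[] parts = value.substring(2, value.length() - 2).split("\\?", 3);
        if (parts.length != 3 || !parts[1].equalsIgnoreCase("B")) {
            throw new IllegalArgumentException("Only a single B (Base64) encoded word is handled");
        }
        byte[] bytes = Base64.getDecoder().decode(parts[2]);
        return new String(bytes, Charset.forName(parts[0]));
    }

    public static void main(String[] args) {
        String header = "=?utf-8?B?0KTQsNC50Lsg0LjQvNC/0L7RgNGC0LAg0YLQsNGA0LjRhNC+0LIgKDEpLnhsc3g=?=";
        System.out.println(decode(header));
    }
}
```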

Converting a byte[] string back to byte[] array

I have a scenario with a class like this:
class Document
{
    public string Name { get; set; }
    public byte[] Contents { get; set; }
}
I'm implementing import/export functionality that keeps the document's contents in binary, so the document ends up in a JSON file along with the other fields, and the contents look something like this:
UEsDBBQABgAIAAAAIQCitGbRsgEAALEHAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAACAAAAAAA==
When I upload this file back, I get it as a string containing the same data, but when I try to convert it back to a byte[] the file becomes corrupt.
How can I achieve this?
I use something like this to convert it:
var ss = sr.ReadToEnd();
MemoryStream stream = new MemoryStream();
StreamWriter writer = new StreamWriter(stream);
writer.Write(ss);
writer.Flush();
stream.Position = 0;
var bytes = default(byte[]);
bytes = stream.ToArray();
This looks like Base64. Use:
System.Convert.ToBase64String(b)
https://msdn.microsoft.com/en-us/library/dhx0d524%28v=vs.110%29.aspx
And
System.Convert.FromBase64String(s)
https://msdn.microsoft.com/en-us/library/system.convert.frombase64string%28v=vs.110%29.aspx
You need to decode it from Base64, like this (assuming you've read the file into ss as a string):
var bytes = Convert.FromBase64String(ss);
There are several things going on here. You need to know the encoding used by the default StreamWriter; if none is specified, it defaults to UTF-8. However, .NET strings are always UTF-16 internally.
MemoryStream from string - confusion about Encoding to use
I would suggest using System.Convert.ToBase64String(someByteArray) and its counterpart System.Convert.FromBase64String(someString) to handle this for you.
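To make the difference concrete, here is a small Java sketch (class and method names are hypothetical). The sample string in the question starts with UEsDBBQA, which decodes to the bytes 50 4B 03 04 14 00, the "PK" signature of a ZIP container such as .docx or .xlsx. Writing the Base64 text through a StreamWriter, as the question does, produces the bytes of the characters U, E, s, D, ... instead of those original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class DocumentContentsDemo {
    // What the StreamWriter approach effectively does:
    // it stores the *characters* of the Base64 text as UTF-8 bytes.
    static byte[] wrongConversion(String base64Text) {
        return base64Text.getBytes(StandardCharsets.UTF_8);
    }

    // What is needed: interpret the text as Base64 and recover the original bytes.
    static byte[] rightConversion(String base64Text) {
        return Base64.getDecoder().decode(base64Text);
    }

    public static void main(String[] args) {
        // First bytes of a ZIP-based file, used as sample contents.
        byte[] contents = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        String stored = Base64.getEncoder().encodeToString(contents); // goes into the JSON
        System.out.println(stored);                                        // UEsDBBQA
        System.out.println(Arrays.equals(contents, wrongConversion(stored))); // false
        System.out.println(Arrays.equals(contents, rightConversion(stored))); // true
    }
}
```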

How to encode and decode Broken Chinese/Unicode characters?

I've tried googling but couldn't find which charset the text below belongs to:
具有éœé›»ç”¢ç”Ÿè£ç½®ä¹‹å½±åƒè¼¸å…¥è£ç½®
But when I put that string into an HTML file with <meta http-equiv="Content-Type" Content="text/html; charset=utf-8">, I was able to view the Chinese characters properly:
具有靜電產生裝置之影像輸入裝置
So my question is:
What tools can I use to detect the character set of this text?
And how do I convert/encode/decode them properly in C#?
Updates:
For completeness' sake, here is my updated test:
[TestMethod]
public void TestMethod1()
{
string encodedText = "具有éœé›»ç”¢ç”Ÿè£ç½®ä¹‹å½±åƒè¼¸å…¥è£ç½®";
Encoding utf8 = new UTF8Encoding();
Encoding window1252 = Encoding.GetEncoding("Windows-1252");
byte[] postBytes = window1252.GetBytes(encodedText);
string decodedText = utf8.GetString(postBytes);
string actualText = "具有靜電產生裝置之影像輸入裝置";
Assert.AreEqual(actualText, decodedText);
}
What happens when you save the "bad" string in a text file with a meta tag declaring the correct encoding is that your text editor saves the file with Windows-1252 encoding, but the browser reads the file and interprets it as UTF-8. Since the "bad" string is UTF-8 bytes incorrectly decoded with the Windows-1252 encoding, you reverse the process by encoding the file as Windows-1252 and decoding it as UTF-8.
Here's an example:
using System.Text;
using System.Windows.Forms;
namespace Demo
{
class Program
{
static void Main(string[] args)
{
string s = "具有靜電產生裝置之影像輸入裝置"; // Unicode
Encoding Windows1252 = Encoding.GetEncoding("Windows-1252");
Encoding Utf8 = Encoding.UTF8;
byte[] utf8Bytes = Utf8.GetBytes(s); // Unicode -> UTF-8
string badDecode = Windows1252.GetString(utf8Bytes); // Mis-decode as Latin1
MessageBox.Show(badDecode,"Mis-decoded"); // Shows your garbage string.
string goodDecode = Utf8.GetString(utf8Bytes); // Correctly decode as UTF-8
MessageBox.Show(goodDecode, "Correctly decoded");
// Recovering from bad decode...
byte[] originalBytes = Windows1252.GetBytes(badDecode);
goodDecode = Utf8.GetString(originalBytes);
MessageBox.Show(goodDecode, "Re-decoded");
}
}
}
Even with correct decoding, you'll still need a font that supports the characters being displayed. If your default font doesn't support Chinese, you still might not see the correct characters.
The correct thing to do is figure out why the string you have was decoded as Windows-1252 in the first place. Sometimes, though, data in a database is stored incorrectly to begin with and you have to resort to these games to fix the problem.
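The same mis-decode and repair, sketched in Java (the class name is hypothetical). One caveat worth knowing: the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Windows-1252, so whether they survive the round trip depends on the platform. 靜 is E9 9D 9C in UTF-8, which is why only "éœ" shows up for it in the garbled text. The sample below deliberately uses characters (具有輸入, written as escapes) whose UTF-8 bytes avoid those positions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeRepair {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Reproduce the bug: UTF-8 bytes wrongly decoded as Windows-1252.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), WINDOWS_1252);
    }

    // Undo it: re-encode with the wrong charset to recover the bytes, then decode as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(WINDOWS_1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u5177\u6709\u8F38\u5165"; // the four characters 具有輸入
        String garbled = misdecode(original);
        System.out.println(garbled);
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```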
string test = "敭畳灴獩楫n"; //incoming data. must be mesutpiskin
byte[] bytes = Encoding.Unicode.GetBytes(test);
string s = string.Empty;
for (int i = 0; i < bytes.Length; i++)
{
s += (char)bytes[i];
}
s = s.Trim((char)0);
MessageBox.Show(s);
//s=mesutpiskin
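The loop above re-encodes the text as UTF-16LE (Encoding.Unicode) and maps each byte to a char, which recovers ASCII text that was wrongly decoded as UTF-16LE. A Java sketch of the same idea (hypothetical class name): decoding with ISO-8859-1 maps every byte 0x00 to 0xFF to the char with the same value, which is exactly what the (char)bytes[i] cast does, and the NUL padding is trimmed from both ends like Trim((char)0).

```java
import java.nio.charset.StandardCharsets;

class Utf16Misread {
    static String recover(String garbled) {
        // Re-encode as UTF-16LE, then map each byte straight to a char.
        String s = new String(garbled.getBytes(StandardCharsets.UTF_16LE), StandardCharsets.ISO_8859_1);
        // Trim NUL characters from both ends.
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == '\0') start++;
        while (end > start && s.charAt(end - 1) == '\0') end--;
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // Same sample as the C# snippet, written with escapes.
        String incoming = "\u656D\u7573\u7074\u7369\u696Bn";
        System.out.println(recover(incoming)); // mesutpiskin
    }
}
```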
I'm not really sure what you mean, but I'm guessing you want to convert between a string in a certain encoding in byte array form and a string. Let's assume the character encoding is called "FooBar":
This is how you encode and decode:
Encoding myEncoding = Encoding.GetEncoding("FooBar");
string myString = "lala";
byte[] myEncodedBytes = myEncoding.GetBytes(myString);
string myDecodedString = myEncoding.GetString(myEncodedBytes);
You can learn more about the Encoding class over at MSDN.
Answering the questions at the end of your post:
If you want to determine the text encoding at runtime, look at this: http://code.google.com/p/ude/
For converting between character sets you can use http://msdn.microsoft.com/en-us/library/system.text.encoding.convert(v=vs.100).aspx
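Java has no direct counterpart of Encoding.Convert, but the same effect can be sketched by decoding with the source charset and encoding with the target one. The helper name is hypothetical, and the sample bytes are "größe" in Windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcode {
    // Decode with the source charset, then encode with the destination charset.
    static byte[] convert(Charset from, Charset to, byte[] bytes) {
        return new String(bytes, from).getBytes(to);
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        byte[] ansi = {0x67, 0x72, (byte) 0xF6, (byte) 0xDF, 0x65}; // "größe" in Windows-1252
        byte[] utf8 = convert(windows1252, StandardCharsets.UTF_8, ansi);
        System.out.println(utf8.length); // 7: ö and ß take two bytes each in UTF-8
    }
}
```

As with Encoding.Convert, this only helps if the source charset is known; characters that the target charset cannot represent are replaced.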
It's Windows Latin 1. I pasted the Chinese text as UTF-8 into BBEdit (a text editor for Mac), reopened the file as Windows Latin 1, and bang, the exact diacritics appeared.

c# encoding issue with?

I have an input like: DisplaygrÃ¶Ãe
And I want output like: Displaygröÿe
In Notepad++ I solved the problem by converting to ANSI, encoding as UTF-8 and converting back to ANSI.
I need to do this programmatically in C#.
I've tried converting to/from ANSI, UTF-8 and Latin-1, and none of them work properly. It shows ? with a function that uses Encoding.Default.GetBytes, then
res = Encoding.Convert(src1, dest1, bytes) and
EncodingDest.GetChars(res);
where EncodingDest represents the output encoding.
The code runs in a console application, but I get the same result in WPF.
It doesn't matter which output encoding is used as long as it works; the same problems also occur for countries like Spain, Italy or Sweden.
Use System.Text.Encoding:
var ascii = Encoding.ASCII.GetBytes("DisplaygröÃe");
var utf8 = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, ascii);
var output = Encoding.UTF8.GetString(utf8);
When you output a string somewhere (like a TextWriter, or a Stream, or a byte[]), you should always specify the encoding, unless you want the UTF-8 output (the default one):
using (StreamWriter sw = new StreamWriter("file.txt", false, Encoding.GetEncoding("windows-1252")))
    sw.WriteLine("Displaygröÿe");
@DanM: You need to know what character set your input is in.
"DisplaygrÃ¶Ãe" is what you will see if you take the string "Displaygröße" (suggested by Vlad), encode it to bytes as UTF-8, and then incorrectly decode it as Latin-1.
If you do the same with Displaygröÿe, you would see "DisplaygrÃ¶Ã¿e" (the inverted question mark is literally there; it is not a placeholder for something that can't be displayed). Technically, "DisplaygrÃ¶Ãe" probably has another character between the Ã and e, but it is a control code, and is thus invisible to you.
If you have a character set foo, this is true: my_string = foo_decode(foo_encode(my_string)). If you have another character set bar, this is true: barf = bar_decode(foo_encode(my_string)), where barf is garbage like you're seeing.
If you don't know what character set your input is in, you will only decode it correctly by chance.
It appears that your input files are in UTF-8, and you will need to decode the bytes from the file as such. (I don't speak enough C# to help you here... I only speak character encodings.)
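A Java sketch of that advice (the class name and temporary file are illustrative): the same UTF-8 bytes read back as UTF-8 give the expected text, while reading them as Latin-1 produces the "Ã¶Ã" pattern described above, including the invisible U+009F control character left over from ß.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadAsUtf8 {
    // Decode the file's bytes as UTF-8 (what the input actually is).
    static String readUtf8(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.UTF_8);
    }

    // The mistake: decoding the same bytes as Latin-1.
    static String readLatin1(Path file) throws IOException {
        return Files.readString(file, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".txt");
        Files.write(file, "Displaygr\u00F6\u00DFe".getBytes(StandardCharsets.UTF_8));
        System.out.println(readUtf8(file));
        String wrong = readLatin1(file);
        System.out.println(wrong.equals("Displaygr\u00C3\u00B6\u00C3\u009Fe")); // true
        Files.delete(file);
    }
}
```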
using (var rdr = new StreamReader(fs, Encoding.GetEncoding(1252))) {
result = rdr.ReadToEnd();
}
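We had a similar problem when sending data to a text printer, and the only thing I got working was this (written as an extension method):
using System;
using System.Runtime.InteropServices;
public static byte[] ToAnsiMemBytes(this string input)
{
    int length = input.Length;
    byte[] result = new byte[length];
    IntPtr bytes = IntPtr.Zero;
    try
    {
        // Assumes a single-byte ANSI code page, so the byte count equals input.Length.
        bytes = Marshal.StringToCoTaskMemAnsi(input);
        Marshal.Copy(bytes, result, 0, length);
    }
    catch (Exception)
    {
        result = null;
    }
    finally
    {
        // Release the unmanaged buffer allocated by StringToCoTaskMemAnsi.
        if (bytes != IntPtr.Zero)
            Marshal.FreeCoTaskMem(bytes);
    }
    return result;
}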

Base64 decode in C# or Java

I have a Base64-encoded object with the following header:
application/x-xfdl;content-encoding="asc-gzip"
What is the best way to proceed in decoding the object? Do I need to strip the first line? Also, if I turn it into a byte array (byte[]), how do I un-gzip it?
Thanks!
I think I misspoke initially. By saying the header was
application/x-xfdl;content-encoding="asc-gzip"
I meant this was the first line of the file. So, in order to use the Java or C# libraries to decode the file, does this line need to be stripped?
If so, what would be the simplest way to strip the first line?
To decode the Base64 content in C# you can use the Convert Class static methods.
byte[] bytes = Convert.FromBase64String(base64Data);
You can also use the GZipStream Class to help deal with the GZipped stream.
Another option is SharpZipLib. This will allow you to extract the original data from the compressed data.
I was able to use the following code to convert an .xfdl document into a Java DOM Document.
I used iHarder's Base64 utility to do the Base64 Decode.
private static final String FILE_HEADER_BLOCK =
"application/vnd.xfdl;content-encoding=\"base64-gzip\"";
public static Document OpenXFDL(String inputFile)
throws IOException,
ParserConfigurationException,
SAXException
{
try{
//create file object
File f = new File(inputFile);
if(!f.exists()) {
throw new IOException("Specified File could not be found!");
}
//open file stream from file
FileInputStream fis = new FileInputStream(inputFile);
//Skip past the MIME header
fis.skip(FILE_HEADER_BLOCK.length());
//Decompress from base 64
Base64.InputStream bis = new Base64.InputStream(fis,
Base64.DECODE);
//UnZIP the resulting stream
GZIPInputStream gis = new GZIPInputStream(bis);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(gis);
gis.close();
bis.close();
fis.close();
return doc;
}
catch (ParserConfigurationException pce) {
throw new ParserConfigurationException("Error parsing XFDL from file.");
}
catch (SAXException saxe) {
throw new SAXException("Error parsing XFDL into XML Document.");
}
}
Still working on successfully modifying and re-encoding the document.
Hope this helps.
In Java, you can use the Apache Commons Base64 class
String decodedString = new String(Base64.decodeBase64(encodedBytes));
It sounds like you're dealing with data that is both gzipped and Base 64 encoded. Once you strip off any mime headers, you should convert the Base64 data to a byte array using something like Apache commons codec. You can then wrap the byte[] in a ByteArrayInputStream object and pass that to a GZipInputStream which will let you read the uncompressed data.
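A minimal sketch of that pipeline using only the JDK (java.util.Base64 from Java 8 and InputStream.readAllBytes from Java 9) instead of Commons Codec. It assumes the first line is the MIME-type header terminated by a newline and that everything after it is Base64 of gzip data; the exact "asc-gzip" layout is an assumption here, and the XML sample is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class XfdlDecoder {
    // Drops everything up to and including the first '\n' (the MIME-type line).
    static String stripFirstLine(String fileText) {
        int newline = fileText.indexOf('\n');
        return newline < 0 ? "" : fileText.substring(newline + 1);
    }

    // Base64-decodes the body (the MIME decoder skips line breaks), then gunzips it.
    static byte[] decodeBody(String base64Body) throws IOException {
        byte[] compressed = Base64.getMimeDecoder().decode(base64Body);
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    // Inverse operation, used here only to build sample input.
    static String encodeBody(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(data);
        }
        return Base64.getMimeEncoder().encodeToString(buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        String xml = "<XFDL><page/></XFDL>"; // hypothetical content
        String file = "application/x-xfdl;content-encoding=\"asc-gzip\"\r\n"
                + encodeBody(xml.getBytes(StandardCharsets.UTF_8));
        byte[] decoded = decodeBody(stripFirstLine(file));
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // <XFDL><page/></XFDL>
    }
}
```

Stripping by line rather than by a fixed header length avoids depending on the exact header text, which differs between the question (application/x-xfdl, asc-gzip) and the answer above (application/vnd.xfdl, base64-gzip).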
For Java, have you tried Java's built-in java.util.zip package? Alternatively, Apache Commons has the Commons Compress library to work with zip, tar and other compressed file types. As for decoding Base64, there are several open source libraries, or you can use Sun's sun.misc.BASE64Decoder class.
Copied from elsewhere, for Base64 I link to commons-codec-1.6.jar:
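import org.apache.commons.codec.binary.Base64;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;

public static String decode(String input) throws Exception {
    byte[] bytes = Base64.decodeBase64(input);
    StringBuilder buffer = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
            new GZIPInputStream(new ByteArrayInputStream(bytes))))) {
        char[] charBuffer = new char[1024];
        int read;
        while ((read = in.read(charBuffer)) != -1) {
            // Append only the characters actually read, not the whole buffer.
            buffer.append(charBuffer, 0, read);
        }
    }
    return buffer.toString();
}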
