Detect Stream or Byte Array Encoding on Windows Phone - C#

I'm trying to read XML files that I download with WebClient.OpenReadAsync() in a Windows Phone application. The problem is that sometimes the XML doesn't come in UTF-8; it may come in another encoding such as "ISO-8859-1", so the accented characters end up garbled.
I was able to load one of the ISO-8859-1 XML files perfectly using this code:
var buff = e.Result.ReadFully(); //Gets byte array from the stream
var resultBuff = Encoding.Convert(Encoding.GetEncoding("ISO-8859-1"), Encoding.UTF8, buff);
var result = Encoding.UTF8.GetString(resultBuff, 0, resultBuff.Length);
It works beautifully with ISO-8859-1 (the text comes out perfectly afterwards), but it messes up the UTF-8 XML files.
So the idea is to detect the encoding of the byte array or stream before doing this; if it's not UTF-8, convert the data with the method above using the detected encoding.
I've been searching the internet for a method that can detect the encoding, but I can't find any!
Does anybody know how I could do this kind of thing on Windows Phone?
Thanks!

You can look for the "Content-Type" value in the WebClient.ResponseHeaders property; if you are lucky, the server sets it to indicate the media type plus its encoding (e.g. "text/html; charset=ISO-8859-4").
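A minimal sketch of that idea (assuming the completed-event handler can reach the WebClient instance via the sender argument, and that the server actually declares a charset; everything else is the same as in the question):
// Inside the OpenReadCompleted handler
var client = (WebClient)sender;
string contentType = client.ResponseHeaders["Content-Type"]; // e.g. "text/xml; charset=ISO-8859-1"
Encoding encoding = Encoding.UTF8; // fall back to UTF-8 when no charset is declared
if (!string.IsNullOrEmpty(contentType))
{
    foreach (string part in contentType.Split(';'))
    {
        string trimmed = part.Trim();
        if (trimmed.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
            encoding = Encoding.GetEncoding(trimmed.Substring("charset=".Length));
    }
}
var buff = e.Result.ReadFully(); // same extension method as in the question
var result = encoding.GetString(buff, 0, buff.Length);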

Related

C# Protobuf: 'Protocol message contained a tag with an invalid wire type.'

I am trying to read some protobuf data from a colleague - he created it in C++ with the encoding set to Unicode, and the protobuf transfer mode is binary.
In python, this works perfectly:
with open('test.out', 'rb') as f:
    dfile = protoclass_pb2.DataFile()
    dfile.ParseFromString(f.read())
    print(dfile.MetaData.author)
just like a charm.
In C#, however, I try:
string filepath = "test.out";
FileStream fst = new FileStream(filepath, FileMode.Open, FileAccess.Read);
DataFile data = DataFile.Parser.ParseDelimitedFrom(fst);
fst.Close();
and get the Exception:
Google.Protobuf.InvalidProtocolBufferException: 'Protocol message contained a tag with an invalid wire type.'
I tried setting an encoding on the stream somehow, but as far as I can tell, I can only set an encoding on a StreamReader, not on the Stream itself.
Just reading the file content into a byte array produces the same bytes in both languages.
How can I read the data in to C#?
Found the problem: the colleague serialized the message with SerializeToOstream and I tried reading it with ParseDelimitedFrom.
Easily fixed by either using SerializeDelimitedToOstream to serialize or ParseFrom to read.
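For the C# side, a minimal sketch of the non-delimited read (the DataFile property names are an assumption carried over from the Python sample above):
string filepath = "test.out";
using (FileStream fst = new FileStream(filepath, FileMode.Open, FileAccess.Read))
{
    // SerializeToOstream writes a plain, non-delimited message,
    // so it has to be read back with ParseFrom rather than ParseDelimitedFrom.
    DataFile data = DataFile.Parser.ParseFrom(fst);
    Console.WriteLine(data.MetaData.Author); // property names assumed from the Python sample
}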

Can we convert a live video stream into a byte array?

I'm using C#/.NET and I'm getting a live video stream from a URL (rtsp://streamurl). Now I want to know whether we can convert this live stream into a byte array, so that I can use the NReco.VideoConverter component to encode the stream with H.264 and then stream it via a server.
I'm currently gathering details and studying the basics of NReco.VideoEncoder. It has a method to convert a live video stream, but as input it requires a System.IO.Stream instead of a URL path. That's why I'm asking this question. Thanks!
I have no experience with NReco.VideoEncoder, so this is just a guess:
When looking at your link to the interface you'll see:
public ConvertLiveMediaTask ConvertLiveMedia(
Stream inputStream,
string inputFormat,
string outputFile,
string outputFormat,
ConvertSettings settings
)
Stream (the first input parameter) is very flexible, so you should be able to use anything from a file as well as from the web. So you should be able to do it this way (I haven't compiled this code):
// convert url to stream
WebRequest request = WebRequest.Create(url); // your rtsp url?
request.Timeout = 30 * 60 * 1000;
request.UseDefaultCredentials = true;
request.Proxy.Credentials = request.Credentials;
WebResponse response = request.GetResponse();
using (Stream stream = response.GetResponseStream())
{
    var converter = new FFMpegConverter(); // init converter
    converter.ConvertLiveMedia(
        stream,                            // put your stream here
        "???",                             // problem here... no RTSP entry found in the Format list, so you might need to know the actual video format
        @"C:\whateverpath\whatever.hevc",  // extension?
        Format.h265,
        new ConvertSettings());            // settings parameter from the signature above
}
I don't see how RTSP is supported here, and you might need to know what kind of video encoding is packed into the RTSP stream first, otherwise the converter won't understand the input (at least when using the interface you mentioned).
And that's what I meant in my comment: You need to know the data structure of the (byte) stream to know how to interpret the bits or you have to make a guess.
Their website states the feature:
Live video stream transcoding from C# Stream (or Webcam, RTSP URL, file) to C# Stream (or streaming server URL, file)
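As for the literal question of getting a byte array out of the stream, a minimal sketch (my own, not from NReco's documentation) that buffers the response stream into an array; note that a truly live stream never ends, so in practice you would read and process bounded chunks in a loop instead:
byte[] streamBytes;
using (var memory = new MemoryStream())
{
    // "stream" is the response stream from the snippet above.
    // CopyTo blocks until the source stream ends, so only use it for
    // a bounded recording, not an endless live feed.
    stream.CopyTo(memory);
    streamBytes = memory.ToArray();
}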

C# : Change String Encoding?

I'm struggling with the encoding of one of my strings.
On a mail-sending WS, I'm receiving a bad string containing "�" instead of "é" (that's what I'm seeing in Visual Studio's debug mode, at least).
The character comes from some JSON that is deserialized into my DTO when it enters the WS.
Changing the Content-Type of the JSON doesn't solve it.
So I thought I'd change the encoding of my string myself, because the JSON encoding issue looks like a VS deserialization problem (I started a thread here if any of you want to take a look at it).
I tried:
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding defaultEncoding = Encoding.Default;
byte[] bytes = defaultEncoding.GetBytes(messedUpString);
byte[] isoBytes = Encoding.Convert(defaultEncoding, iso, bytes);
cleanString = iso.GetString(isoBytes);
Or:
byte[] bytes = Encoding.Default.GetBytes(messedUpString);
cleanString = Encoding.UTF8.GetString(bytes);
And it's not really effective... I get rid of the "�" char, which is the nice part, but in cleanString I'm receiving "?" instead of the expected "é", and that is not really nice, or at least not the expected behavior.
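For context, a small sketch of my own (not part of the original question) showing why the "?" appears: once the string already contains U+FFFD ("�"), the original byte is gone, and the replacement character is not representable in a single-byte code page, so the round trip can only substitute "?":
string messedUpString = "caf\uFFFD"; // what the deserializer handed over instead of "café"
byte[] bytes = Encoding.Default.GetBytes(messedUpString); // U+FFFD falls back to 0x3F ('?')
string cleanString = Encoding.UTF8.GetString(bytes); // "caf?" - the accent cannot be recovered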
In fact, everything was fine in my application.
I used SOAPUI to test, and that was my error.
I downloaded a REST plugin for my browser, tried from there, and everything worked.
Thanks for the help though #MattiVirkkunen

Sending quotation marks in a GCM Payload (and other special characters that break syntax)

I'm struggling to find a feasible solution to this. I've tried looking around but can't find any documentation regarding this issue. If a customer sends out a message with quote(s), it breaks the payload syntax and Android spits back a 400 Bad Request error.
The only solution I can think of is doing my own translations and validations: allow only the basics, and for the restricted characters do my own "parsing", i.e. take a quote, replace it with "/q", and then replace "/q" back in the app when received. I don't like this solution because it puts logic in the app; if I forget something, I want to be able to change it immediately rather than update everyone's phone, app, etc.
I'm looking for an existing encoding I could apply that is processed correctly by the GCM servers, so the messages are accepted, broadcast, and received by the phone with the characters intact.
Base64 encoding should get rid of the special characters. Just encode it before sending and decode it again on receiving:
Edit: sorry, I only have a Java/Android sample here; I don't know exactly how Xamarin works and what functions it provides:
// before sending
byte[] data = message.getBytes("UTF-8");
String base64Message = Base64.encodeToString(data, Base64.DEFAULT);
// on receiving
byte[] data = Base64.decode(base64Message, Base64.DEFAULT);
String message= new String(data, "UTF-8");
.NET translation of #tknell's solution
Decode:
Byte[] data = System.Convert.FromBase64String(encodedString);
String decoded = System.Text.Encoding.UTF8.GetString(data);
Encode:
Byte[] data = System.Text.Encoding.UTF8.GetBytes(decodedString);
String encoded = System.Convert.ToBase64String(data);
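A quick round trip in C# (my own sketch) showing that quotes and other special characters survive the payload intact:
string original = "He said \"hello\" & left";
string encoded = System.Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(original)); // safe to put in the GCM payload
string decoded = System.Text.Encoding.UTF8.GetString(System.Convert.FromBase64String(encoded)); // back on the device
System.Diagnostics.Debug.Assert(original == decoded); // characters arrive intact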

How do I get this encoding right with ANTLR?

I'm working on a project for school. We are making a static code analyzer.
A requirement for this is to analyse C# code in Java, which has been going well so far with ANTLR.
I have made some example C# code to scan with ANTLR in Visual Studio. I analyse every C# file in the solution. But it does not work: I get a memory leak and the error message:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at org.antlr.runtime.Lexer.emit(Lexer.java:151)
at org.antlr.runtime.Lexer.nextToken(Lexer.java:86)
at org.antlr.runtime.CommonTokenStream.fillBuffer(CommonTokenStream.java:119)
at org.antlr.runtime.CommonTokenStream.LT(CommonTokenStream.java:238)
After a while I thought it was an issue with encoding, because all the files are in UTF-8 and I think it can't read the encoded stream. So I opened Notepad++ and changed the encoding of every file to ANSI, and then it worked. I don't really understand what ANSI means: is it a single character set or some kind of organisation?
I want to change the encoding from any encoding (probably UTF-8) to this ANSI encoding so I won't get memory leaks anymore.
This is the code that makes the Lexer and Parser:
InputStream inputStream = new FileInputStream(new File(filePath));
CharStream charStream = new ANTLRInputStream(inputStream);
CSharpLexer cSharpLexer = new CSharpLexer(charStream);
CommonTokenStream commonTokenStream = new CommonTokenStream(cSharpLexer);
CSharpParser cSharpParser = new CSharpParser(commonTokenStream);
Does anyone know how to change the encoding of the InputStream to the right encoding?
And what does Notepad++ do when I change the encoding to ANSI?
When reading text files you should set the encoding explicitly. Try your examples with the following change:
CharStream charStream = new ANTLRInputStream(inputStream, "UTF-8");
I solved this issue by putting the InputStream into a BufferedStream and then removing the Byte Order Mark.
I guess my parser didn't like that encoding, because I also tried setting the encoding explicitly.
