Send XML message to MSMQ without any formatting - C#

I need to send an XmlDocument object to MSMQ. I don't have a class to deserialize it into (the XML may vary). The default formatter, XmlMessageFormatter, will "pretty print" the object. This causes a problem, since
<text></text>
will be converted to
<text>
</text>
(i.e. a carriage return plus indentation spaces). The message is read by a process using the default XmlMessageFormatter, and this hasn't been an issue while nodes have data in them. Further down the line, however, a process (out of my control) will interpret these new characters as data and raise an error.
I know I could write some code to convert them by setting IsEmpty = true, giving <text />, but I'd like a solution that doesn't alter the object at all.
BinaryMessageFormatter prefixes the data with a BOM (which the receiver is not expecting), and ActiveXMessageFormatter double-bytes the string (again causing issues at the other end).
I would rather avoid having to write a custom message formatter. I've tried setting various options on the XmlMessageFormatter, but they've had little effect. Any ideas would be much appreciated.
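For reference, a rough sketch of the IsEmpty workaround mentioned in the question (included for completeness only, since the goal is to leave the document untouched) might look like this:
using System.Xml;

// Mark childless elements as empty so they serialize as <text /> instead of
// being pretty-printed across two lines. Sketch only, not the accepted approach.
var doc = new XmlDocument();
doc.LoadXml("<root><text></text></root>");
foreach (XmlElement element in doc.SelectNodes("//*"))
{
    if (!element.HasChildNodes)
        element.IsEmpty = true;
}
// doc.OuterXml is now "<root><text /></root>"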

MSMQ operates on raw blobs. You do not have to use a formatter unless you want to.
To send a message and get it back byte-for-byte identical, use the BodyStream property.
Example:
var queue = new MessageQueue(@".\private$\queueName");
var msg = new Message();
msg.BodyStream = new MemoryStream(Encoding.UTF8.GetBytes("<root><test></test></root>"));
queue.Send(msg);
The resulting message contains exactly the bytes written above.
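On the receiving side the same property can be read back without any formatter. A minimal sketch, assuming the same queue path and UTF-8 content as the snippet above:
using System.IO;
using System.Messaging;
using System.Text;

// Read the body bytes exactly as they were sent, bypassing all formatters.
var queue = new MessageQueue(@".\private$\queueName");
using (var msg = queue.Receive())
using (var reader = new StreamReader(msg.BodyStream, Encoding.UTF8))
{
    string xml = reader.ReadToEnd(); // "<root><test></test></root>", byte-for-byte
}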

Related

C# - Deserializing when whitespace between tags is delimited

I am posting some XML to an API Gateway method in AWS, which has an integration to SNS. An SQS queue is subscribed to the topic, and I have a C# process which polls the queue intermittently and needs to deserialize the XML.
The trouble is that the whitespace between the XML tags gets escaped somewhere along the line, so tabs become \t and new lines become \r\n, and these end up as literal character sequences inside the string.
Example XML which is posted to API Gateway:
<?xml version="1.0" encoding="utf-8"?>
<ProfileInformation>
<Username>bgs264</Username>
</ProfileInformation>
String which is read off the SQS queue:
<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<ProfileInformation>\n\t<Username>bgs264</Username>\n</ProfileInformation>
Note that the attributes in the declaration end up as \" and the whitespace posted ends up as \t, \r\n, etc.
To be clear, these aren't just the debugger displaying a real tab character as \t; the backslash escape sequences are literally present in the string.
So when I try to deserialize, using
using (var reader = new StringReader(message))
{
    var myObj = serializer.Deserialize(reader) as ProfileInformation;
}
I get:
InvalidOperationException: There is an error in XML document (1, 15).
It refers to the first \ character in the declaration, as in version=\"1.0\"
My immediate idea was simply to string.Replace \t with an empty string, etc., but that's unacceptable because the user's username might legitimately be bgs\t264, and the replace would corrupt it. In that case I presume I would get bgs\\t264 in the message, so a replace would erroneously leave me with bgs\264.
So I need to fix these \n\t characters where they occur between XML tags.
For what it's worth, I also have a lambda written in Go which has no problem with this and simply deserializes the exact same string straight into XML. So it must be possible.
My initial thoughts:
Can I somehow decode the string before passing it for deserialization? I tried this with HttpUtility.HtmlDecode, but I don't think it's actually HTML that I'm trying to decode!
Is there a different XML library I can use that would work?
I would guess, and some googling seems to support the theory, that the message you're seeing has been converted to JSON and the escape sequences are a consequence of that.
The ideal approach would be to investigate and prevent this from happening. I don't know enough about SNS to advise, and you indicate this is a non-starter, so the simplest approach is to reverse this process once you receive the message.
You can use a JSON library like Json.NET to do this:
var jsonString = string.Format("\"{0}\"", message);
var xmlString = JsonConvert.DeserializeObject<string>(jsonString);
using (var reader = new StringReader(xmlString))
{
    var profileInformation = (ProfileInformation)serializer.Deserialize(reader);
}

Deserializing ServiceBus content in Azure Logic App

I'm trying to read the content body of a message in an Azure Logic App, but I'm not having much success. I have seen a lot of suggestions which say that the body is base64 encoded, and suggest using the following to decode:
@{json(base64ToString(triggerBody()?['ContentData']))}
The base64ToString(...) part is decoding the content into a string correctly, but the string appears to contain a prefix with some extra serialization information at the start:
@string3http://schemas.microsoft.com/2003/10/Serialization/�3{"Foo":"Bar"}
There are also some extra characters in that string that are not being displayed in my browser. So the json(...) function doesn't accept the input, and gives an error instead.
InvalidTemplate. Unable to process template language expressions in action 'HTTP' inputs at line '1' and column '2451': 'The template language function 'json' parameter is not valid. The provided value @string3http://schemas.microsoft.com/2003/10/Serialization/�3{"Foo":"bar" } cannot be parsed: Unexpected character encountered while parsing value: @. Path '', line 0, position 0.. Please see https://aka.ms/logicexpressions#json for usage details.'.
For reference, the messages are added to the topic using the .NET service bus client (the client shouldn't matter, but this looks rather C#-ish):
await TopicClient.SendAsync(new BrokeredMessage(JsonConvert.SerializeObject(item)));
How can I read this correctly as a JSON object in my Logic App?
This is caused by how the message is placed on the ServiceBus, specifically in the C# code. I was using the following code to add a new message:
var json = JsonConvert.SerializeObject(item);
var message = new BrokeredMessage(json);
await TopicClient.SendAsync(message);
This code looks fine, and works between different C# services no problem. The problem is caused by the way the BrokeredMessage(Object) constructor serializes the payload given to it:
Initializes a new instance of the BrokeredMessage class from a given object by using DataContractSerializer with a binary XmlDictionaryWriter.
That means the content is serialized as binary XML, which explains the prefix and the unrecognizable characters. This is hidden by the C# implementation when deserializing, and it returns the object you were expecting, but it becomes apparent when using a different library (such as the one used by Azure Logic Apps).
There are two alternatives to handle this problem:
Make sure the receiver can handle messages in binary XML format
Make sure the sender actually uses the format we want, e.g. JSON.
Paco de la Cruz's answer handles the first case, using substring, indexOf and lastIndexOf:
@json(substring(base64ToString(triggerBody()?['ContentData']), indexof(base64ToString(triggerBody()?['ContentData']), '{'), add(1, sub(lastindexof(base64ToString(triggerBody()?['ContentData']), '}'), indexof(base64ToString(triggerBody()?['ContentData']), '}')))))
As for the second case, fixing the problem at the source simply involves using the BrokeredMessage(Stream) constructor instead. That way, we have direct control over the content:
var json = JsonConvert.SerializeObject(item);
var bytes = Encoding.UTF8.GetBytes(json);
var stream = new MemoryStream(bytes);
var message = new BrokeredMessage(stream, true);
await TopicClient.SendAsync(message);
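For completeness, a C# consumer of a stream-bodied message would read the raw content back and deserialize it itself, roughly like this (the receivedMessage variable and the Item type are illustrative, not from the original post):
using System.IO;
using System.Text;
using Newtonsoft.Json;

// Rough receive-side sketch for a stream-bodied BrokeredMessage: pull the raw
// JSON out of the body stream and deserialize it with Json.NET.
var bodyStream = receivedMessage.GetBody<Stream>();
using (var reader = new StreamReader(bodyStream, Encoding.UTF8))
{
    var item = JsonConvert.DeserializeObject<Item>(reader.ReadToEnd());
}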
You can use the substring function together with indexOf and lastIndexOf to get only the JSON substring.
Unfortunately, it's rather complex, but it should look something like this:
@json(substring(base64ToString(triggerBody()?['ContentData']), indexof(base64ToString(triggerBody()?['ContentData']), '{'), add(1, sub(lastindexof(base64ToString(triggerBody()?['ContentData']), '}'), indexof(base64ToString(triggerBody()?['ContentData']), '}')))))
More info on how to use these functions here.
HTH
Paco de la Cruz's solution worked for me, though I had to swap the last '}' in the expression for a '{'; otherwise it finds the wrong end of the data segment.
I also split it into two steps to make it a little more manageable.
First I get the decoded string out of the message into a variable (that I've called MC) using:
@{base64ToString(triggerBody()?['ContentData'])}
then in another logic app action do the substring extraction:
@{substring(variables('MC'),indexof(variables('MC'),'{'),add(1,sub(lastindexof(variables('MC'),'}'),indexof(variables('MC'),'{'))))}
Note that the last string literal '{' is reversed from Paco's solution.
This is working for my test cases, but I'm not sure how robust this is.
Also, I've left it as a String, I do the conversion to JSON later in my logic app.
UPDATE
We have found that just occasionally (2 in several hundred runs) the text that we want to discard can contain the '{' character.
I have modified our expression to explicitly locate the start of the data segment, which for me is:
'{"IntegrationRequest"'
so the substitution becomes:
@{substring(variables('MC'),indexof(variables('MC'),'{"IntegrationRequest"'),add(1,sub(lastindexof(variables('MC'),'}'),indexof(variables('MC'),'{"IntegrationRequest"'))))}

MSMQ. Displaying message body

I'm facing a problem displaying the message body. I send a test message (using XmlMessageFormatter) to a queue using C# (I have Windows 7).
How can I remove the hex from the message body preview?
I noticed something interesting: if the body is less than 612 bytes, the XML is displayed fine, but if the body is more than 612 bytes, hex appears instead.
I can't use BinaryMessageFormatter, because I need the message body property to show plain XML. (If I use BinaryMessageFormatter, hex is displayed too.)
I tried creating a custom formatter (TxtFormatter); hex was displayed there as well.
I found a solution: just use ActiveXMessageFormatter.
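A minimal sending sketch based on that (queue path and payload are placeholders): set ActiveXMessageFormatter on the queue before sending, so the body is stored as a plain string that the MSMQ console can preview as text rather than hex.
using System.Messaging;

// Queue path and payload are placeholders; the point is the formatter choice.
var queue = new MessageQueue(@".\private$\testQueue")
{
    Formatter = new ActiveXMessageFormatter()
};
queue.Send("<root><test>hello</test></root>", "test message");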
If you are using XmlMessageFormatter while passing a non-XML string, you should ensure the following:
If the message is an object, ensure that it has the [Serializable] attribute; otherwise the message body will be displayed in hexadecimal format.
Ensure the attributes of the resulting serialized object map correctly and have valid values, if they are defined in an XSD (XML Schema Definition) file.
Or you could use MSMQ Studio to view MSMQ messages. https://msmq-studio.com

SQL Service Broker Queue Handling Foreign Characters in the message

Setup: I have a form built in ASP.NET/C# that, on submit, XML-serializes its object model and calls a stored procedure with that XML-serialized data as the sole parameter. The stored procedure sends that data to a SQL Service Broker queue. The message sent to the broker queue must be valid XML that obeys the message contract set on the queue. That message is picked up by BizTalk and processed accordingly.
Problem: Originally the data submitted to me was just regular English characters (essentially held to the ASCII charset), but a requirement is on the horizon to support foreign characters as well. In my testing, I've noticed that if I try to submit something with foreign characters (Chinese, Arabic, etc.), I get an error in the queue and the message that gets to BizTalk ends up with "?????" in place of the foreign characters. I've added the UTF-16 XML header to the top of the document, but that doesn't seem to help.
Question: Is there a way I can cast the incoming XML message as nvarchar and still have it be considered valid XML by the queue? I don't want to change the actual type on the queue or recreate it. I'd prefer to change the message in the stored proc alone in some way that allows it to get on the queue.
Thanks in advance for your help.
I ended up handling this by encoding the characters as HTML entities and then security-escaping them. I ran into some issues using the HttpUtility library to handle the encoding on its own, so I've included the method I used below.
I wish I could give direct credit for this; I can't remember where I found it, but thank you to whoever it was:
private string EncodeToHTML(string text)
{
    // call the normal HtmlEncode first
    char[] chars = HttpUtility.HtmlEncode(text).ToCharArray();
    StringBuilder encodedValue = new StringBuilder();

    foreach (char c in chars)
    {
        if ((int)c > 127) // above normal ASCII
            encodedValue.Append("&#" + (int)c + ";");
        else
            encodedValue.Append(c);
    }
    return encodedValue.ToString();
}
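A hypothetical usage sketch (the procedure, parameter, and helper names below are made up for illustration, and HttpUtility requires a reference to System.Web): encode the serialized XML just before calling the stored procedure, so anything outside ASCII reaches the broker queue as numeric character references such as &#27979; rather than being mangled into '?'.
using System.Data;
using System.Data.SqlClient;

// Hypothetical wiring; only EncodeToHTML comes from the answer above.
string xmlPayload = SerializeFormModel(model);   // existing XML-serialization step (assumed)
string encodedXml = EncodeToHTML(xmlPayload);

using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand("dbo.usp_EnqueueMessage", connection))
{
    command.CommandType = CommandType.StoredProcedure;
    command.Parameters.Add("@MessageXml", SqlDbType.NVarChar, -1).Value = encodedXml;
    connection.Open();
    command.ExecuteNonQuery();
}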

C# Issue with reading XML with chars of different encodings in it

I faced a problem with reading XML. I found a solution, but there are still some questions. The incorrect XML file is encoded in UTF-8 and has the appropriate mark in its header, but it also includes a character encoded in UTF-16 - 'é'. This code was used to read the XML file to validate its content:
var xDoc = XDocument.Load(taxFile);
It raises an exception for the incorrect XML file: "Invalid character in the given encoding. Line 59, position 104." The quick fix is as follows:
XDocument xDoc = null;
using (var oReader = new StreamReader(taxFile, Encoding.UTF8))
{
    xDoc = XDocument.Load(oReader);
}
This code doesn't raise an exception for the incorrect file, but the 'é' character is loaded as �. My first question is: why does it work?
Another point: using XmlReader doesn't raise an exception until the node containing 'é' is read.
XmlReader xmlTax = XmlReader.Create(filePath);
And again the workaround with StreamReader helps. The same question applies.
It seems like this fix is not good enough, because one day :) XML encoded in another format may appear and it could be processed in the wrong way. BUT I've tried processing a UTF-16 formatted XML file and it worked fine (with the reader configured for UTF-8).
The final question is whether there are any options for XDocument/XmlReader to ignore character-encoding problems or something like this.
Looking forward to your replies. Thanks in advance.
The first thing to note is that the XML file is in fact flawed - mixing text encodings in the same file like this should not be done. The error is even more obvious when the file actually has an explicit encoding embedded.
As for why it can be read without an exception using StreamReader, it's because Encoding contains settings that control what happens when incompatible data is encountered.
Encoding.UTF8 is documented to use fallback characters. From http://msdn.microsoft.com/en-us/library/system.text.encoding.utf8.aspx:
The UTF8Encoding object that is returned by this property may not have the appropriate behavior for your application. It uses replacement fallback to replace each string that it cannot encode and each byte that it cannot decode with a question mark ("?") character.
You can instantiate the encoding yourself to get different settings. This is most probably what XDocument.Load() does, as it would generally be bad to hide errors by default.
http://msdn.microsoft.com/en-us/library/system.text.utf8encoding.aspx
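For example, a minimal sketch (the file name is a placeholder): a UTF8Encoding constructed with throwOnInvalidBytes set to true uses an exception fallback instead of the replacement fallback, so the bad bytes surface as a DecoderFallbackException rather than a silent �.
using System.IO;
using System.Text;
using System.Xml.Linq;

// Instantiate the encoding yourself with strict (throwing) fallback behaviour
// rather than relying on Encoding.UTF8's replacement fallback.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
using (var reader = new StreamReader("taxFile.xml", strictUtf8))
{
    var xDoc = XDocument.Load(reader); // throws at the offending byte sequence
}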
If you are being sent such broken XML files, step 1 is to complain (loudly) about it; there is no valid reason for such behavior. If you then absolutely must process them anyway, I suggest having a look at the UTF8Encoding class and its DecoderFallback property. It seems you should be able to implement a custom DecoderFallback and DecoderFallbackBuffer to add logic that will understand the UTF-16 byte sequence.
