I want to read the alert message from the object
{
alert = "1\n2\n3";
sound = default;
}
I have tried deserializing it as JSON with Newtonsoft, and I've also tried converting it to a class, but both failed because of the formatting.
That is not valid JSON, so the best thing you can do is try to parse it yourself.
You can use a full-fledged parser/lexer for it, like ANTLR. You might get enough with some C# or regex, depending on other outputs to expect.
This regex might be a start:
alert = \"(.*?)\";.* sound = (.*?);
I have tried serializing it to JSON
But that's not JSON.
So you could try with some regex to extract the desired value:
var match = Regex.Match(payloadStr, @"alert\s=\s\""(.+)\""");
if (match.Success)
{
string alertText = match.Groups[1].Value;
}
How reliable this regex is depends very much on the custom format being used and on the values it can contain.
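To make the regex approach concrete, here is a minimal sketch for the payload shown in the question. The class and method names (AlertPayload, ExtractAlert) are made up for illustration, and the code assumes that the \n inside the quotes is a literal backslash followed by n, which the question does not confirm.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlertPayload
{
    // Hypothetical helper: returns the text between the quotes after "alert =",
    // or null when the payload has no such entry.
    public static string ExtractAlert(string payload)
    {
        // Lazy capture, so the match stops at the first closing quote followed by ';'.
        Match match = Regex.Match(payload, @"alert\s*=\s*""(.*?)"";", RegexOptions.Singleline);
        if (!match.Success)
        {
            return null;
        }
        // Assumption: "\n" arrives as two characters (backslash + 'n'), so turn it into a real newline.
        return match.Groups[1].Value.Replace("\\n", "\n");
    }

    public static void Main()
    {
        string payload = "{\nalert = \"1\\n2\\n3\";\nsound = default;\n}";
        Console.WriteLine(ExtractAlert(payload)); // prints 1, 2 and 3 on separate lines
    }
}
```

If the alert text can itself contain an escaped quote, the lazy capture will stop too early, so treat this as a starting point only.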
I'm getting some JSON from an outside source that can't be changed, and apparently they don't understand the rules for escaping characters in JSON string values. So they have a string value that might contain tabs, for example, which should have been escaped, as well as invalid escape sequences like \$. I'm trying to parse this with JSON.Net, but it keeps falling over on these sequences.
For example, the source might look something like this:
{
"someRegularProp": 10,
"aNormalString": "foo bar etc",
"anInvalidString": "foo <tab \$100"
}
and it's parsed with
var obj = JObject.Parse(json);
So I can fix this specific case with something like:
json = json.Replace("\t", "").Replace("\\$", "$"); // note: in this case I'm fine with just stripping the tabs out
But is there a general way to fix these problems to remove invalid escape sequences before parsing? Because I don't know what other invalid sequences they might put in there?
I don't see a general way. Obviously they are using a buggy library, or no library at all, to generate this output. Unless you can find out more about how it is produced, all you can do is test as much of their output as possible to find all the problems.
Perhaps write a script that fetches as much output as possible and validates all of it; then you can be at least a bit more sure.
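For the escape sequences specifically, one heuristic (not a general fix) is to keep only the escapes that JSON defines and drop the backslash from everything else. The sketch below assumes that dropping the backslash is acceptable, as it was for \$ in the question; the names JsonEscapeRepair and DropInvalidEscapes are hypothetical. It does not handle raw tabs or a \u that is not followed by four hex digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsonEscapeRepair
{
    // Characters that may legally follow a backslash in a JSON string.
    private const string ValidEscapes = "\"\\/bfnrtu";

    // Hypothetical helper: removes the backslash from any escape that JSON does not define.
    public static string DropInvalidEscapes(string json)
    {
        // Each match consumes a backslash plus the following character, so an
        // escaped backslash ("\\") is handled as one pair and is never misread.
        return Regex.Replace(json, @"\\(.)", m =>
        {
            char next = m.Groups[1].Value[0];
            return ValidEscapes.IndexOf(next) >= 0 ? m.Value : next.ToString();
        }, RegexOptions.Singleline);
    }

    public static void Main()
    {
        string broken = "{ \"anInvalidString\": \"foo \\$100\" }";
        Console.WriteLine(DropInvalidEscapes(broken)); // { "anInvalidString": "foo $100" }
    }
}
```

Running the cleaned text through JObject.Parse afterwards still tells you whether anything else is wrong with it.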
I have a string that I need to use as the body of a JSON object. I know it's possible that the data could have quotes in it, so I go through it and add an escape character to each quote, like so:
string NewComment = comment.Replace("\"", "\\\"");
However, in some edge cases a quote still makes it through. I don't know if this is something to do with UTF or some other issue, but I am trying to find a function that would safely create a JSON-compatible string. I figured there has to be something like this out there, or a regex way of doing so.
Basically, the TL;DR is: how do I create a JSON-syntax-safe string from a C# string?
The simple answer is: don't do it this way. What if the string already contains escaped quotes? With such a simple replacement, "Hello \"World\"" becomes "Hello \\"World\\"", which is invalid. A JSON library such as JSON.Net (Newtonsoft.Json) is going to save you so many headaches in the long run.
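As a rough illustration of letting the library do the escaping, the sketch below serializes a plain string and an object with JsonConvert.SerializeObject from the Newtonsoft.Json package. The class name SafeJsonString and the sample comment text are made up for the example.

```csharp
using System;
using Newtonsoft.Json;

public static class SafeJsonString
{
    public static void Main()
    {
        // Sample input containing both a backslash and quotes.
        string comment = "C:\\temp \"quoted\"";

        // Serializing a plain string yields a quoted, fully escaped JSON string literal.
        string literal = JsonConvert.SerializeObject(comment);
        Console.WriteLine(literal); // "C:\\temp \"quoted\""

        // Serializing an object escapes each string property in the same way.
        string body = JsonConvert.SerializeObject(new { comment });
        Console.WriteLine(body); // {"comment":"C:\\temp \"quoted\""}

        // Round trip: deserializing gives back the original text.
        string back = JsonConvert.DeserializeObject<string>(literal);
        Console.WriteLine(back == comment); // True
    }
}
```

Backslashes and control characters are escaped along with the quotes, which are the cases a hand-written Replace tends to miss.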
I have a literal string that contains the details of a JSON array, and I need to extract a value from it in C#.
The string looks like the following:
"{\"Field1\":[],\"Field2\":333,\"Field3\":\"string\"....
Field2 is the field I wish to get in this instance, but I have no idea how to do that in C#.
Check out the Newtonsoft.Json package on nuget.org, it can parse the JSON for you and then you can retrieve the keys by name
Since the value is in JSON format, use JSON.Net to deserialize it into a C# type; then you can read the value as you would read any other property of a class.
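A minimal sketch of that suggestion, using JObject from Newtonsoft.Json.Linq. The sample JSON is a shortened stand-in containing only the three fields visible in the question, and FieldReader and ReadField2 are hypothetical names.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class FieldReader
{
    // Hypothetical helper: parses the JSON text and reads the integer stored under "Field2".
    public static int ReadField2(string json)
    {
        JObject obj = JObject.Parse(json);
        // Keys are case-sensitive, so the name must match the JSON exactly.
        return (int)obj["Field2"];
    }

    public static void Main()
    {
        // Shortened sample; the real payload in the question continues after Field3.
        string json = "{\"Field1\":[],\"Field2\":333,\"Field3\":\"string\"}";
        Console.WriteLine(ReadField2(json)); // 333
    }
}
```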
Another way (apart from using external packages/addons whatever) would be writing a small regex-function like this:
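// jString is assumed to be a field that holds the raw JSON text.
public string GetField(string fieldName)
{
    // Matches fieldName":value, and captures the value up to the next comma.
    Regex rgxGetField = new Regex(Regex.Escape(fieldName) + "\\\":(.*?),");
    Match mGetField = rgxGetField.Match(jString);
    return mGetField.Groups[1].Value;
}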
Of course, this only works for the format you posted in your question.
You want to deserialize the JSON string. See How to Deserialize JSON data? for a number of excellent answers.
Thanks for all the help.
I ended up using the following:
dynamic d = JObject.Parse(jsonString);
field2 = d.Field2; // member name must match the JSON key, including case
I am using the Newtonsoft JSON deserializer. How can one clean JSON for XSS (cross-site scripting)? Should I clean the JSON string before deserializing, or write some kind of custom converter/sanitizer? I am not 100% sure about the best way to approach this.
Below is an example of JSON that has a dangerous script injected and needs "cleaning". I want a way to manage this before I deserialize it. But we need to assume all kinds of XSS scenarios, including Base64-encoded script and so on, so the problem is more complex than a simple regex string replace.
{ "MyVar" : "hello<script>bad script code</script>world" }
Here is a snapshot of my deserializer ( JSON -> Object ):
public T Deserialize<T>(string json)
{
T obj;
var JSON = cleanJSON(json); //OPTION 1 sanitize here
var customConverter = new JSONSanitizer();// OPTION 2 create a custom converter
obj = JsonConvert.DeserializeObject<T>(json, customConverter);
return obj;
}
JSON is posted from a 3rd party UI interface, so it's fairly exposed, hence the server-side validation. From there, it gets serialized into all kinds of objects and is usually stored in a DB, later to be retrieved and outputted directly in HTML based UI so script injection must be mitigated.
Ok, I am going to try to keep this rather short, because this is a lot of work to write up the whole thing. But, essentially, you need to focus on the context of the data you need to sanitize. From comments on the original post, it sounds like some values in the JSON will be used as HTML that will be rendered, and this HTML comes from an un-trusted source.
The first step is to extract whichever JSON values need to be sanitized as HTML, and for each of those objects you need to run them through an HTML parser and strip away everything that is not in a whitelist. Don't forget that you will also need a whitelist for attributes.
HTML Agility Pack is a good starting place for parsing HTML in C#. How to do this part is a separate question in my opinion - and probably a duplicate of the linked question.
Your worry about base64 strings seems a little over-emphasized in my opinion. It's not like you can simply put aW5zZXJ0IGg0eCBoZXJl into an HTML document and the browser will render it. It can be abused through JavaScript (which your whitelist will prevent) and, to some extent, through data: URLs (but this isn't THAT bad, as the JavaScript will run in the context of the data page; not good, but you aren't automatically gobbling up cookies with this). If you have to allow <a> tags, part of the process needs to be validating that the URL is http(s) (or whatever schemes you want to allow).
Ideally, you would avoid this uncomfortable situation, and instead use something like markdown - then you could simply escape the HTML string, but this is not always something we can control. You'd still have to do some URL validation though.
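The two simpler pieces mentioned above, escaping a string for HTML output and checking a URL's scheme, can be sketched with the framework's own WebUtility.HtmlEncode and Uri.TryCreate. This is not a whitelist sanitizer; the class and method names are hypothetical, and the allowed schemes (http and https) are an assumption to adjust to your needs.

```csharp
using System;
using System.Net;

public static class OutputSafety
{
    // Escapes untrusted text so that it renders as text rather than markup.
    public static string EscapeForHtml(string untrusted)
    {
        return WebUtility.HtmlEncode(untrusted);
    }

    // Accepts only absolute http(s) URLs; javascript:, data: and other schemes are rejected.
    public static bool IsAllowedLink(string url)
    {
        return Uri.TryCreate(url, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }

    public static void Main()
    {
        Console.WriteLine(EscapeForHtml("hello<script>bad script code</script>world"));
        // hello&lt;script&gt;bad script code&lt;/script&gt;world
        Console.WriteLine(IsAllowedLink("https://example.com/page")); // True
        Console.WriteLine(IsAllowedLink("javascript:alert(1)"));      // False
    }
}
```

Escaping at output time only fits values that are meant to be shown as plain text; values that must stay HTML still need the parser-and-whitelist approach described earlier.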
Interesting! Thanks for asking. We normally use html.urlencode for web forms. I have an enterprise web API running that has validations like this, and we created a custom regex to validate input. Please have a look at this MSDN link.
This is a sample model, named KeyValue here, created to parse the request:
public class KeyValue
{
public string Key { get; set; }
}
Step 1: Trying with a custom regex
var json = @"[{ 'MyVar' : 'hello<script>bad script code</script>world' }]";
JArray readArray = JArray.Parse(json);
IList<KeyValue> blogPost = readArray.Select(p => new KeyValue { Key = (string)p["MyVar"] }).ToList();
// Validate each value; calling ToString() on the list would only return its type name.
if (blogPost.Any(p => !Regex.IsMatch(p.Key,
    @"^[\p{L}\p{Zs}\p{Lu}\p{Ll}\']{1,40}$")))
    Console.WriteLine("Invalid");
// ^ means start looking at this position.
// \p{..} matches any character in the Unicode category named by {..}.
// {L} matches any letter.
// {Lu} matches uppercase letters.
// {Ll} matches lowercase letters.
// {Zs} matches space separators.
// \' matches an apostrophe.
// {1,40} specifies the number of characters: no less than 1 and no more than 40.
// $ means stop looking at this position.
Step 2: Using HttpUtility.UrlEncode - this newtonsoft website link suggests the below implementation.
string json = @"[{ 'MyVar' : 'hello<script>bad script code</script>world' }]";
JArray readArray = JArray.Parse(json);
IList<KeyValue> blogPost = readArray.Select(p => new KeyValue {Key =HttpUtility.UrlEncode((string)p["MyVar"])}).ToList();
I have a C# program that retrieves some JSON data and uses Newtonsoft JSON to deserialize it.
Because I use Persian characters in my program, the JSON is shown escaped, like this: \u060c \u067e\u0644\u0627\u06a9 .... After I retrieve the JSON data in my program, these characters still appear in that escaped form, but after I deserialize it they are converted to ???? characters.
What should I do?
Your JSON deserializer is broken; \uXXXX is supposed to be turned into proper characters.
To do that yourself, use this function
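// Turns every occurrence of \uXXXX into a proper character.
// Requires: using System.Globalization; using System.Text.RegularExpressions;
string UnencodeJSONUnicode(string str) {
    return Regex.Replace(str,
        @"\\u(?<value>[0-9a-fA-F]{4})",
        match => {
            string digits = match.Groups["value"].Value;
            int number = int.Parse(digits, NumberStyles.HexNumber);
            // Cast to char: char.ConvertFromUtf32 throws for surrogate halves (\uD800-\uDFFF).
            return ((char)number).ToString();
        });
}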
(Untested code; I don't have VS available at the moment. Some exception handling would probably be nice too)
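Before decoding by hand, it may be worth checking what the deserializer actually returns. The sketch below feeds Json.NET a JSON string made of the escapes from the question and compares the result with the expected characters. If the comparison passes, the ???? is more likely introduced later, for example when the text is displayed or stored with a non-Unicode encoding; that second part is an assumption, not something the question confirms.

```csharp
using System;
using Newtonsoft.Json;

public static class UnicodeEscapes
{
    public static void Main()
    {
        // JSON text containing \uXXXX escapes for Persian characters, as in the question.
        string json = "\"\\u067e\\u0644\\u0627\\u06a9\"";

        // The parser turns each \uXXXX escape into the corresponding character.
        string decoded = JsonConvert.DeserializeObject<string>(json);

        // Compare against the same characters written as C# escapes.
        Console.WriteLine(decoded == "\u067e\u0644\u0627\u06a9"); // True
        Console.WriteLine(decoded.Length);                        // 4
    }
}
```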
Looks like it has been JSON encoded, so you need to decode it. The DataContractJsonSerializer class can do this.
See this MSDN link for more information.