Regex for getting key:value from JSON in C#

What is the pattern for matching a-z, A-Z, 0-9, spaces and special characters, so that a URL value is detected as well?
This is my input string:
{id:1622415796,name:Vincent Dagpin,picture:https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/573992_1622415796_217083925_q.jpg}
This is the pattern so far:
([a-z_]+):[ ]?([\d\s\w]*(,|}))
Expected Result:
id:1622415796
name:Vincent Dagpin
picture:https://fbcdn-profile-a.akamaihd.net/hprofile-ak-snc4/573992_1622415796_217083925_q.jpg
The problem is that I can't get the last part, the picture URL.
Any help, please?

If this is the only kind of json input you expect and further json parsing is very unlikely, a full json parser would be overkill.
A string split may be all you need, jsonString.Split(',', '{', '}');
The regex for that would be along the lines of [{},]([a-z_]+):[ ]?(.+?)(?=,|})
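As a rough sketch of how that pattern could be applied, the snippet below runs it over the input and collects each key and value from the two capture groups. The class and method names (KeyValueExtractor, Extract) are made up for illustration and are not from the question. It also assumes that no value contains a comma or a closing brace, since the lazy value stops at the first one it sees.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class KeyValueExtractor
{
    // Pattern from the answer: a key preceded by '{', '}' or ',',
    // then a lazy value that stops right before the next ',' or '}'.
    private static readonly Regex Pair =
        new Regex(@"[{},]([a-z_]+):[ ]?(.+?)(?=,|})");

    public static List<KeyValuePair<string, string>> Extract(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pair.Matches(input))
        {
            // Group 1 is the key, group 2 is the value.
            result.Add(new KeyValuePair<string, string>(
                m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }
}
```

Calling KeyValueExtractor.Extract on the input string from the question should give three pairs: id, name and picture, with the full URL as the picture value. The colon inside "https:" is not treated as a key separator because it is not preceded by '{', '}' or ','.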
If you can modify the json string that's being sent, you can key the regex on something else, like double quotes. Here's one I'm using that requires knowing the json key name: new System.Text.RegularExpressions.Regex("(?<=\"" + key + "\"+ *: *\"+).*(?=\")");

I don't think a regex is the right solution. C# already contains the tools you need in JavaScriptSerializer. Check out the answer here to see how.

Related

Regex not matching when input string contains an ampersand

I am trying to come up with a regex that starts with a letter followed by only letters, spaces, commas, dots, ampersands, apostrophes and hyphens.
However, the ampersand character is giving me headaches. Whenever it appears in the input string, the regex no longer matches.
I am using the regex in an ASP.NET project (C#), in the 'Format' property of a TextInput (a custom control created in the project). The control calls Regex.IsMatch(Text, Format) to validate the input.
For example, using this regex:
^[a-zA-Z][a-zA-Z&.,'\- ]*$
The results are:
John' william-david Pass
John, william'david allen--tony-'' Pass
John, william&david Fail
Whenever I put a & in the input string the regex no longer matches, but without it everything works fine.
How can I fix my issue? Why would the ampersand be causing a problem?
Notes:
I've tried to escape the ampersand with ^[a-zA-Z][a-zA-Z\&.,'\- ]*$ but it has the same issue
I've tried to put the ampersand at the beginning or end of the character class, as in ^[a-zA-Z][&a-zA-Z.,'\- ]*$ or ^[a-zA-Z][a-zA-Z.,'\-\& ]*$, but that doesn't work either
Your problem is somewhere else. The following expression evaluates to true:
Regex.IsMatch(@"John, william&david", @"^[a-zA-Z][a-zA-Z&.,'\- ]*$")
See https://dotnetfiddle.net/WDvQNP
You mentioned in the comments that your problem pertains to C#, so I'll answer your question in that context. If ampersand (&) is truly giving you issues in your character class, you should specify it in an alternate manner.
Luckily, C# regexes support hex escape sequences, which means that you can specify & as \x26.
For example, instead of:
^[a-zA-Z][a-zA-Z&.,'\- ]*$
use
^[a-zA-Z][a-zA-Z\x26.,'\- ]*$
If that doesn't fix your issue, then your issue is not the &, it's something else.
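To check this for yourself, a small sketch along these lines compares the literal and the hex-escaped pattern on the same input. The class name AmpersandCheck is made up for illustration. The third check described below rests on an assumption that the question does not confirm: that the text might be HTML-encoded (so that & arrives as &amp;) before it reaches Regex.IsMatch.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class AmpersandCheck
{
    // The pattern from the question, with a literal ampersand.
    public const string Literal = @"^[a-zA-Z][a-zA-Z&.,'\- ]*$";

    // The same pattern with the ampersand written as a hex escape.
    public const string HexEscaped = @"^[a-zA-Z][a-zA-Z\x26.,'\- ]*$";

    public static bool Matches(string text, string pattern)
    {
        return Regex.IsMatch(text, pattern);
    }
}
```

Both AmpersandCheck.Matches("John, william&david", AmpersandCheck.Literal) and the same call with AmpersandCheck.HexEscaped return true. If the input were "John, william&amp;david" instead, both would return false, because the semicolon is not in the character class. That is one possible "something else" worth ruling out.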

Regex for ignoring consecutive quotation marks in string

I have built a parser in Sprache and C# for files using a format I don't control. Using it I can correctly convert:
a = "my string";
into
my string
The parser (for the quoted text only) currently looks like this:
public static readonly Parser<string> QuotedText =
from open in Parse.Char('"').Token()
from content in Parse.CharExcept('"').Many().Text().Token()
from close in Parse.Char('"').Token()
select content;
However, the format I'm working with escapes quotation marks by doubling them, e.g.:
a = "a ""string"".";
When attempting to parse this nothing is returned. It should return:
a ""string"".
Additionally
a = "";
should be parsed into a string.Empty or similar.
I've tried regexes unsuccessfully based on answers like this doing things like "(?:[^;])*", or:
public static readonly Parser<string> QuotedText =
from content in Parse.Regex("""(?:[^;])*""").Token()
This doesn't work (i.e. no matches are returned in the above cases). I think my beginner's regex skills are getting in the way. Does anybody have any hints?
EDIT: I was testing it here - http://regex101.com/r/eJ9aH1
If I'm understanding you correctly, this is the kind of regex you're looking for:
"(?:""|[^"])*"
See the demo.
1. " matches an opening quote
2. (?:""|[^"])* matches two quotes or any chars that are not a quote (including newlines), repeating
3. " matches the closing quote.
But it's always going to boil down to whether your input is balanced. If not, you'll be getting false positives. And if you have a string such as "string"", which should be matched: "string"", "", or nothing? That's a tough decision, one that, fortunately, you don't have to make if you are sure of your input.
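For what it's worth, here is a minimal sketch of that pattern in plain C# (System.Text.RegularExpressions rather than Sprache). It adds one capture group around the content so the outer quotes can be dropped, which is my adaptation and not part of the pattern above. The names QuotedTextReader and TryReadContent are made up for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class QuotedTextReader
{
    // As a regex: "((?:""|[^"])*)"
    // Opening quote, then any run of doubled quotes or non-quote
    // characters (captured in group 1), then the closing quote.
    private static readonly Regex Quoted =
        new Regex(@"""((?:""""|[^""])*)""");

    public static bool TryReadContent(string line, out string content)
    {
        Match m = Quoted.Match(line);
        // Doubled quotes are left as they are, as the question expects.
        content = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

With this, the line a = "a ""string""."; yields a ""string""., and the line a = ""; yields an empty string while still reporting a match.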
You can likely adapt your desired output from this pattern:
"(.+".+")"|(".+?")|("")
example:
http://regex101.com/r/lO1vZ4
If you only want to ignore consecutive double quotes, try this:
("{2,})
Live demo
This regex "("+) might help you to match extra unwanted double quotes.
here is the DEMO

deserialize xml attribute and handle newline and other special characters

I've tried finding the answer to this for the last 2 days and I just can't find anything that will work with our code.
We have an incoming xml response formatted as below and need to be able to handle newline and other special characters inside of attributes.
The one we're having issues with is "agent-notes". We cannot find an XPath function that converts the special characters into \r, \n, etc.
"anything
everything
something" should be "anything \r \n everything \r \n something"
Unfortunately, you can't. The agent property value is valid, and you cannot assume it will be converted for you in an XPath search. You will have to convert your search path by replacing every \n\r with the encoded newline character reference. If it's the value that you are expecting to be converted, then you can use the HttpUtility.HtmlDecode method.
I've had this problem before and suffered the same frustration. Coding is not always a perfect science, as much as you would like it to be.

Regex for Multiple Key:Value Search Terms

I need a .NET (C#) regular expression for parsing a string of search terms. The terms are key:value pairs and are delimited by spaces. The thing that's throwing me for a loop is the fact that the key:value pairs may have spaces in the value.
Here's an example string:
f:john l:smith c:san francisco st:ca
I expect to get back the following terms:
f:john
l:smith
c:san francisco
st:ca
Any help? Thanks.
I think that this one will work. It uses a lookahead to make sure that the last word doesn't have a : terminating it.
\b\w+:[\w\s]+\b(?!:)
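A quick sketch of how this could be used from C#. The names SearchTermParser and Parse are made up for illustration. Note that every match except the last one ends with the space that separates it from the next key, so the sketch trims each match.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class SearchTermParser
{
    // Pattern from the answer: key, colon, then words and spaces,
    // backtracking until the match no longer ends right before a ':'.
    private static readonly Regex Term =
        new Regex(@"\b\w+:[\w\s]+\b(?!:)");

    public static List<string> Parse(string query)
    {
        var terms = new List<string>();
        foreach (Match m in Term.Matches(query))
        {
            // Drop the separating space the match picks up at the end.
            terms.Add(m.Value.Trim());
        }
        return terms;
    }
}
```

For the example string, SearchTermParser.Parse("f:john l:smith c:san francisco st:ca") gives the four terms f:john, l:smith, c:san francisco and st:ca.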
This is my try:
([\w]+)\:([\w\s]+)\s(?=([\w]+)\:)?
2 caveats:
Each match will have three captures in it. Ignore the last one.
The input text must have a space at the end.

Removing String Escape Codes

My program outputs strings like "Wzyryrff}av{v5~fvzu: Bb``igbuz~+\177Ql\027}C5]{H5LqL{" and the problem is the escape codes (\\\ instead of \, \177 instead of the character, etc.)
I need a way to unescape the string of all escape codes (mainly just the \\\ and octal \027 types). Is there something that already does this?
Thanks
Reference: http://www.tailrecursive.org/postscript/escapes.html
The strings are an encrypted value and I need to decrypt them, but I'm getting the wrong values since the strings are escaped
It sounds more like it's encoded rather than simply escaped (if \177 is really a character). So, try decoding it.
There is nothing built in to do exactly this kind of escaping.
You will need to parse and replace these sequences yourself.
The \xxx octal escapes can be found with a RegEx (\\\d{3}), iterating over the matches will allow you to parse out the octal part and get the replacement character for it (then a simple replace will do).
The others appear to be simple to replace with string.Replace.
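Something like the following could do both steps in one pass. It is only a sketch under a few assumptions: that a doubled backslash stands for a single backslash, that octal escapes always have exactly three digits in the range 0-7 (so the pattern uses [0-7]{3} rather than \d{3}), and that each octal value maps directly to the character with that code. The names EscapeDecoder and Unescape are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named for illustration only.
public static class EscapeDecoder
{
    // A backslash followed by either another backslash
    // or exactly three octal digits.
    private static readonly Regex Escape =
        new Regex(@"\\(\\|[0-7]{3})");

    public static string Unescape(string input)
    {
        return Escape.Replace(input, m =>
        {
            string body = m.Groups[1].Value;
            if (body == @"\")
            {
                return @"\"; // escaped backslash becomes one backslash
            }
            // Parse the digits as base 8 and emit that character.
            return ((char)Convert.ToInt32(body, 8)).ToString();
        });
    }
}
```

Handling both escape kinds in a single regex pass avoids the ordering problem of two separate replacements, where an escaped backslash followed by digits could be mistaken for an octal escape.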
If the string is encrypted then you probably need to treat it as binary and not text. You need to know how it is encoded and decode it accordingly. The fact that you can view it as text is incidental.
If you want to replace specific contents you can just use the .Replace() method.
i.e. myInput.Replace(@"\\", @"\")
I am not sure why the "\" is a problem for you. If it is actually an escape code then it should be fine, since \\ represents \ in a string.
What is the reason you need to "remove" the escape codes?
