I'm looking for a way to wrap all integers 17 or more digits long in a JSON-formatted string in quotes (essentially making them strings when deserialized).
Someone facing the same issue in JavaScript posted here: "Convert all the integer value to string in JSON".
I suspect there is a way to use Regex.Replace() here, but translating the syntax and regex flavor between the two languages has me a bit lost.
So far I have:
string pattern = @"/:\s*(\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d\d+)\s*([,\}])/g";
content = Regex.Replace(content,pattern, @":""{1}""{2}");
Zero-width negative lookahead/lookbehind (https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx#grouping_constructs) is what you should be using to make sure there are no quotes at the start or end. That way you don't need to know about the exact JSON format when you do the replacement:
string pattern = @"(?<![""\w])(\d{17,})(?![""\w])";
content = Regex.Replace(content, pattern, "\"$1\"");
This solution doesn't care whether there is a space between the : and the number. It also handles numbers in arrays, such as [ 12345678901234567, 12345678901234567 ], or on their own.
Regex still isn't an ideal solution unless you know what content will be passed into it, because this breaks as soon as a long number appears inside a string value, e.g. "abc 12345678901234567 def".
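One way around the string-value problem, assuming the input is well-formed JSON, is to let the regex match whole string literals too and only quote the numbers it finds outside them; a MatchEvaluator returns the string literals unchanged. This is a sketch, not a JSON parser. The class and method names are hypothetical, and allowing an optional minus sign while skipping decimals and exponents are assumptions about what should count as an integer:

```csharp
using System.Text.RegularExpressions;

static class JsonBigIntQuoter
{
    // Alternative 1: a complete JSON string literal (escaped quotes included).
    // Alternative 2: an integer of 17+ digits with an optional minus sign that is
    //   not glued to a letter, '.', '+' or '-' (so decimals and exponents are skipped).
    private static readonly Regex Token = new Regex(
        @"""(?:[^""\\]|\\.)*""|(?<![\w.+-])-?\d{17,}(?![\w.])");

    public static string QuoteLongIntegers(string json)
    {
        return Token.Replace(json, m =>
            m.Value[0] == '"'
                ? m.Value                   // string literal: leave untouched
                : "\"" + m.Value + "\"");  // bare long integer: wrap in quotes
    }
}
```

Because the engine scans left to right, an opening quote always starts a string-literal match, so digits inside strings never reach the number alternative. For arbitrary input, a real JSON parser is still the more robust option.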
wrap all integers over 17 digits long in a json-formatted string in quotes
I would use the following:
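string pattern = "([^\"\\d])(\\d{17,})([^\"\\d])";
content = Regex.Replace(content, pattern, "$1\"$2\"$3");
The first line matches every run of 17 or more digits that is not directly preceded or followed by a quote or another digit, so values that are already strings are skipped. The characters on either side are captured as well ($1 and $3), because the pattern consumes them.
The second line wraps the digits in double quotes and writes the surrounding characters back, so separators such as : and , are not lost. Because the separator is consumed, two long numbers separated by a single character (e.g. inside an array with no spaces) will only have the first one quoted.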
If your JSON is minified (no whitespace after the colons), the regex can be simpler. Keeping the colon in the replacement makes sure the resulting JSON is still valid:
string pattern = ":(\\d{17,})";
content = Regex.Replace(content, pattern, ":\"$1\"");
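A minimal sketch of the minified-JSON variant, assuming there is no whitespace after the colons. The pattern consumes the colon, so it is written back in the replacement; the class name is hypothetical:

```csharp
using System.Text.RegularExpressions;

static class MinifiedJsonQuoter
{
    // Assumes minified JSON: a long integer value sits directly after ':'.
    // The colon is consumed by the match, so the replacement restores it.
    public static string QuoteLongIntegers(string json) =>
        Regex.Replace(json, @":(\d{17,})", ":\"$1\"");
}
```

Since it only looks for digits right after a colon, long integers inside arrays are left as numbers.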
I have two strings that come from two different machine readings of a code line. The readings are not always accurate and may be missing some characters. Both are readings of the same code line, but they use two different reading technologies, because neither one is always accurate.
The strings should have the form of this example string:
string s = "<0123456<:012345678:00112233445566778899<";
but because of reading errors, they could look like this:
string reading1 = "?012?456<:012?45678:00112?33445566?78899<";
string reading2 = "??<0?23456??012?45676?00112?3344556?778890????";
where a question mark is an unreadable character, which could be any of 0-9, <, or :. Question marks can also come from noise at the start or end of the code line, as in reading2.
Also, the length of the number in each of the three parts of s can vary.
I am new to regex, and I need a way to use it to get one final string that fills in the missing characters from both strings. For the readings above, the final string should be:
string finalString = "<0123456<:012?45678:00112?33445566778899<";
reading1 has priority: if the same character differs between the two strings, the character from reading1 is used (as with the last 9 in the code line). Characters that are unreadable in both strings should remain question marks in the final string.
Since I am new to regex, I am not sure whether this can be implemented with it. I searched and found many regex examples, but none of them matches my problem; some solve only parts of it.
Thank you.
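Regex is probably not the right tool for the merging step itself. Assuming the two readings have already been aligned to the same length (the hard part here, since reading2 has extra noise at the start and end), the priority rule described above can be applied character by character. A minimal sketch; MergeAligned is a hypothetical helper and does not perform the alignment:

```csharp
using System;
using System.Text;

static class ReadingMerger
{
    // Hypothetical helper: assumes both readings are ALREADY aligned
    // (same length, same character positions). Alignment is a separate problem.
    public static string MergeAligned(string reading1, string reading2)
    {
        if (reading1.Length != reading2.Length)
            throw new ArgumentException("Readings must be aligned to the same length.");

        var result = new StringBuilder(reading1.Length);
        for (int i = 0; i < reading1.Length; i++)
        {
            char a = reading1[i], b = reading2[i];
            // reading1 wins whenever it was readable; otherwise fall back to
            // reading2. If both are '?', the '?' from reading2 is kept.
            result.Append(a != '?' ? a : b);
        }
        return result.ToString();
    }
}
```

For example, merging the aligned fragments "?012?456<" and "<0?23456?" gives "<0123456<".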
I am looking to parse a string in C# to extract the relevant data segments from it.
The rule for one part of the data stream, the Address, is:
Address with $ between address lines. Terminated with “^” if less than 29 characters.
Some examples:
28 Atol Av$Suite 2$^
Hiawatha Park$Apt 2037^
340 Brentwood Dr.$Fall Estate
There are similar rules for other segments, but if I have a solid plan for this one I can adapt it for the rest of the parsing.
I am wondering whether there is a regex that could be used.
I have .{0,29}\^, which seems to do the trick. I wasn't escaping the ^ initially.
thanks,
Dan
You can use string.Split() to do this.
string[] substrings = address.Split('$'); // address holds the segment, e.g. "28 Atol Av$Suite 2$^"
Now you have an array of strings that contains the values between the '$' characters.
Then, I imagine you just want to get rid of the '^' character on the last element of the array (if it exists).
int index = substrings.Length - 1;
substrings[index] = substrings[index].TrimEnd('^');
You can use regular expressions and Regex.Split(), but you really don't need them if all you need to do is split on '$' and trim '^'. Writing a regular expression for this would be overkill.
EDIT: Now that I think of it, you could split on both '$' and '^' and just discard the empty entries, saving you the trimming step.
string[] substrings = address.Split("$^".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
I'll leave the pre-edit code as-is, since it's more explicit and explains the usage better.
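A small helper wrapping the split-on-both-characters version, checked against the three sample addresses from the question. With the pre-edit approach, "28 Atol Av$Suite 2$^" leaves an empty last element after TrimEnd('^'), because the text ends in "$^"; RemoveEmptyEntries avoids that, but it would also drop a genuinely empty line such as the one in "a$$b". The class name is hypothetical:

```csharp
using System;

static class AddressParser
{
    // Splits an address segment on '$' (line separator) and '^' (terminator),
    // dropping empty pieces such as the one left by a trailing "$^".
    public static string[] ParseLines(string address) =>
        address.Split(new[] { '$', '^' }, StringSplitOptions.RemoveEmptyEntries);
}
```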
I am trying to use Regex.Split to split a string while keeping all of its contents, including the delimiters I use. The string is a math problem, for example 5+9/2*1-1. I have it working when the string contains a + sign, but I don't know how to add more than one operator to the delimiter list. I have looked online at multiple pages, but everything I try gives me errors. Here is the Regex.Split line I have (it works for +; now I need it to also handle -, * and /):
string[] everything = Regex.Split(inputBox.Text, @"(\+)");
Use a character class to match any of the math operations: [*/+-]
string input = "5+9/2*1-1";
string pattern = @"([*/+-])";
string[] result = Regex.Split(input, pattern);
Be aware that character classes allow ranges, such as [0-9], which matches any digit from 0 up to 9. Therefore, to avoid accidental ranges, you can escape the - or place it at either the beginning or end of the character class.
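A runnable sketch of the character-class approach, with the - escaped instead of placed last; both forms match the same four operators. Because the pattern is wrapped in a capturing group, Regex.Split includes each matched operator in the result. The class name is hypothetical:

```csharp
using System.Text.RegularExpressions;

static class MathTokenizer
{
    // The capturing group makes Regex.Split keep each operator as its own element.
    // '-' is escaped here; placing it first or last in the class works the same.
    public static string[] Tokenize(string expression) =>
        Regex.Split(expression, @"([*/+\-])");
}
```

If the expression starts with an operator (e.g. a leading minus in "-5+2"), Regex.Split returns an empty string as the first element, and spaces in the input stay attached to the numbers ("5 + 9" gives "5 ", "+", " 9"), so you may want to strip whitespace and filter empty entries first.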
Been scratching my head all day about this one!
Ok, so I have a string which contains the following:
?\"width=\"1\"height=\"1\"border=\"0\"style=\"display:none;\">');
I want to convert that string to the following:
?\"width=1height=1border=0style=\"display:none;\">');
I could theoretically just do a String.Replace on "\"1\"" etc., but this isn't really viable, as the string could contain any number within the expression.
I also thought about removing the string "\"", but there are other occurrences of it that I don't want replaced.
I have been attempting to use the Regex.Replace method, as I believe it exists to solve problems like mine. Here's what I've got:
chunkContents = Regex.Replace(chunkContents, "\".\"", ".");
Now that really messes things up (it replaces the correct elements, but with a full stop), but I think you can see what I am attempting to do. I am also worried that this will only work for single digits (\"1\" rather than \"11\"). That led me to think about using "*" or "+" rather than ".", but I foresaw the problem of this picking up all of the text in between the desired characters (which are dotted all over the place), whereas I only want to replace the ones with digits between them.
Hope I've explained that clearly enough, will be happy to provide any extra info if needed :)
Try this
var str = "?\"width=\"1\"height=\"1234\"border=\"0\"style=\"display:none;\">');";
str = Regex.Replace(str, "\"(\\d+)\"", "$1");
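(\\d+) is a capturing group that looks for one or more digits, and $1 references what the group captured. Note that in a regular (non-verbatim) C# literal, \" is just an escaped quote, so this string contains plain quotes; if your text contains literal backslashes before the quotes, the pattern has to match those as well.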
This works when the text contains literal backslashes before the quotes:
String input = @"?\""width=\""1\""height=\""1\""border=\""0\""style=\""display:none;\"">');";
//replace the entire match of the regex with only what's captured (the number)
String result = Regex.Replace(input, @"\\""(\d+)\\""", match => match.Result("$1"));
//control string for expected result
String shouldBe = @"?\""width=1height=1border=0style=\""display:none;\"">');";
//prints True
Console.WriteLine(result.Equals(shouldBe).ToString());
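If you can't be sure whether the text will contain literal backslashes before the quotes, a pattern with an optional backslash on each side handles both forms. A minimal sketch; the class name is hypothetical, and note that it would also accept a backslash on only one side of the number:

```csharp
using System.Text.RegularExpressions;

static class QuotedNumberStripper
{
    // \\? makes each backslash optional, so both "1" and \"1\" are unwrapped.
    // Only runs of digits between the quotes are affected.
    public static string Strip(string input) =>
        Regex.Replace(input, @"\\?""(\d+)\\?""", "$1");
}
```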
I have a string full of a few hundred words.
How would I get each "word" (which can also be a single letter, number or punctuation mark) and remove it from the string as it is found?
Is this possible?
Example:
String:
"this is a string full of words and letters and also some punctuation! and num6er5."
As far as the algorithm is concerned, there are exactly 15 words in the above string.
What you're trying to do is known as tokenizing.
In C#, the string Split() method works pretty well. If it's called without any parameters, as in Niedermair's code, it returns an array of strings split on whitespace, like this:
"I have spaces" -> {"I", "have", "spaces"}
You can also pass the characters to split on as parameters to Split() (for instance, ',' or ';' to handle simple CSV files).
The Split() method doesn't care what the pieces contain, so letters, numbers and other characters are all handled.
As for removing the words from the string: you might want to write the string into a buffer to achieve this, but I honestly think that's going too far. Strings are immutable, which means any time you remove the "next word" you'll have to create a whole new string object.
It will be a lot easier to just Split() the entire string, throw the string away, and work with the array from there on.
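Following that suggestion, a minimal sketch that splits once and puts the tokens in a Queue<string>, so "find the next word and remove it" becomes Dequeue() instead of rebuilding the string. The class name is hypothetical; an empty separator array tells Split to use whitespace:

```csharp
using System;
using System.Collections.Generic;

static class WordTokenizer
{
    // Splits on whitespace (an empty separator array means "any whitespace")
    // and drops empty entries produced by repeated spaces.
    public static Queue<string> Tokenize(string text) =>
        new Queue<string>(text.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries));
}
```

On the example string from the question this yields the 15 "words" expected, and each Dequeue() hands back the next one ("this", then "is", and so on).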